How to process pairs of columns in Python or R

Asked 2 years ago, Updated 2 years ago, 39 views

I'm learning Python and R on my own.

        a   b   c    d   e    f  ...  x
ID_1   11  17  32    9  62  299
ID_2    3  71  22  929  11   39
ID_3   43  21   2   41  42    8
ID_4    9   1  99   78   2    1
.
.
.
ID_X

Given a table like the one above,
I would like to calculate a×b only for rows where column a is 10 or more (ID_1, ID_3), and c×d only for rows where column c is 10 or more (ID_1, ID_2, ID_4), but I don't know how to do this cleanly.
Is there a way to automatically split a table with more than 100 rows into pairs of adjacent columns and process each pair? Thank you for your help.

Addendum
I'm sorry my explanation was hard to follow.
The real table is large: there are more columns after column f and more rows after ID_4.
So I can't handle each pair by hand (check column a, calculate, check column c, calculate, and so on).
Please tell me a good way to handle this.

Addendum 2

       a/b  c/d  e/f  ...  x
ID_1   0.6  3.6  0.2
ID_2    NA  0.0  0.3
ID_3   2.0   NA  5.3
ID_4    NA  1.3   NA
.
.
.
ID_X

The output I have in mind looks like the table above.

Currently

> odd <- DF[seq(1, ncol(DF), by = 2)]   # odd-numbered columns: a, c, e, ...
> even <- DF[seq(2, ncol(DF), by = 2)]  # even-numbered columns: b, d, f, ...
> odd / even

With this I managed to divide each pair of adjacent columns, but
I still don't know how to restrict the calculation to rows where the odd-numbered column (a, c, e, ...) is 10 or more.

python r

2022-09-30 19:51

3 Answers

Here is an example in Python.

Preprocessing

PS C:\> python.exe
Python 3.5.2 |Anaconda 4.2.0 (64-bit)| (default, Jul  5 2016, 11:41:13) [MSC v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> text = '''a b c d e f
... ID_1 11 17 32 9 62 299
... ID_2 3 71 22 929 11 39
... ID_3 43 21 2 41 42 8
... ID_4 9 1 99 78 2 1'''
>>> data = [x.split() for x in text.splitlines()]
>>> data
[['a', 'b', 'c', 'd', 'e', 'f'], ['ID_1', '11', '17', '32', '9', '62', '299'], ['ID_2', '3', '71', '22', '929', '11', '39'], ['ID_3', '43', '21', '2', '41', '42', '8'], ['ID_4', '9', '1', '99', '78', '2', '1']]
>>> header = data.pop(0)
>>> header
['a', 'b', 'c', 'd', 'e', 'f']
>>> data
[['ID_1', '11', '17', '32', '9', '62', '299'], ['ID_2', '3', '71', '22', '929', '11', '39'], ['ID_3', '43', '21', '2', '41', '42', '8'], ['ID_4', '9', '1', '99', '78', '2', '1']]
>>> header = ['index'] + header
>>> import pandas as pd
>>> df = pd.DataFrame(data, columns=header)
>>> df = df.set_index('index').astype(int)  # convert the string cells to integers
>>> df
        a   b   c    d   e    f
index
ID_1   11  17  32    9  62  299
ID_2    3  71  22  929  11   39
ID_3   43  21   2   41  42    8
ID_4    9   1  99   78   2    1
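
As a side note, the same table can probably be loaded without splitting the text by hand. The following is a minimal sketch, assuming the text layout shown above (a header line that has no label for the ID column). It swaps the manual `split()` approach for `pd.read_csv` with a whitespace separator; the column label `index` is simply carried over from the session above.

```python
import io

import pandas as pd

text = """a b c d e f
ID_1 11 17 32 9 62 299
ID_2 3 71 22 929 11 39
ID_3 43 21 2 41 42 8
ID_4 9 1 99 78 2 1"""

# The header line lacks a label for the ID column, so read it separately
# and prepend a label ("index" is an arbitrary choice).
header = text.splitlines()[0].split()

df = pd.read_csv(
    io.StringIO(text),
    sep=r"\s+",                 # split on any run of whitespace
    skiprows=1,                 # skip the header line read above
    names=["index"] + header,   # supply the full list of column labels
    index_col="index",          # use the ID column as the row index
)
print(df)
```

Because `read_csv` infers numeric types on its own, no separate integer conversion step is needed in this variant.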

Rows where a is greater than or equal to 10

>>> df[df.a >= 10]
        a   b   c   d   e    f
index
ID_1   11  17  32   9  62  299
ID_3   43  21   2  41  42    8

For rows where a is greater than or equal to 10, multiply a by b

>>> df[df.a >= 10].apply(lambda s: s.a * s.b, axis=1)
index
ID_1    187
ID_3    903
dtype: int64

For rows where c is greater than or equal to 10, multiply c by d

>>> df[df.c >= 10].apply(lambda s: s.c * s.d, axis=1)
index
ID_1      288
ID_2    20438
ID_4     7722
dtype: int64
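
The row-wise `apply` calls above can also be written with plain column arithmetic, which tends to be faster on large tables. This is only a sketch: `product_where` and its `threshold` parameter are hypothetical names made up for illustration, and the sample frame is rebuilt here so that the snippet runs on its own.

```python
import pandas as pd

# Sample data from the question, rebuilt so the snippet is self-contained.
df = pd.DataFrame(
    [[11, 17, 32, 9, 62, 299],
     [3, 71, 22, 929, 11, 39],
     [43, 21, 2, 41, 42, 8],
     [9, 1, 99, 78, 2, 1]],
    index=["ID_1", "ID_2", "ID_3", "ID_4"],
    columns=list("abcdef"),
)


def product_where(frame, left, right, threshold=10):
    """Multiply two columns, keeping only rows where `left` >= threshold.

    Hypothetical helper for illustration; it is not part of pandas.
    """
    rows = frame[frame[left] >= threshold]  # filter the rows first
    return rows[left] * rows[right]         # then multiply column-wise


ab = product_where(df, "a", "b")
cd = product_where(df, "c", "d")
print(ab)
print(cd)
```

Filtering first and multiplying whole columns avoids calling a Python function once per row.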


2022-09-30 19:51

I'm not sure what output you want, but here is one way to do it in R:

# indices of the odd-numbered columns
odd_col = seq(1L, ncol(DF), by = 2L)
lapply(odd_col, function(jj) {
  # select the rows where this column is 10 or more
  idx = DF[[jj]] >= 10
  # multiply by the adjacent column
  DF[[jj]][idx] * DF[[jj + 1L]][idx]
})
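
For readers working in Python, roughly the same loop over column pairs might look like the following. It is a sketch rather than a translation checked against the R output: the sample frame, the `results` dictionary, and the threshold of 10 are assumptions taken from the question.

```python
import pandas as pd

# Sample data from the question, rebuilt so the snippet is self-contained.
df = pd.DataFrame(
    [[11, 17, 32, 9, 62, 299],
     [3, 71, 22, 929, 11, 39],
     [43, 21, 2, 41, 42, 8],
     [9, 1, 99, 78, 2, 1]],
    index=["ID_1", "ID_2", "ID_3", "ID_4"],
    columns=list("abcdef"),
)

results = {}
# Pair each odd-numbered column (a, c, e) with its right-hand neighbour (b, d, f).
for left, right in zip(df.columns[::2], df.columns[1::2]):
    keep = df[left] >= 10  # rows that meet the condition
    results[(left, right)] = df.loc[keep, left] * df.loc[keep, right]

for pair, values in results.items():
    print(pair, values.to_dict())
```

Each entry can have a different length, since the set of rows that pass the condition differs from pair to pair.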


2022-09-30 19:51

In pandas, the slice [::2] selects every other element, so you can calculate as follows:

import pandas as pd
import numpy as np

# sample data
df = pd.DataFrame([[11, 17, 32, 9, 62, 299],
                   [3, 71, 22, 929, 11, 39],
                   [43, 21, 2, 41, 42, 8],
                   [9, 1, 99, 78, 2, 1]],
                  columns=['a', 'b', 'c', 'd', 'e', 'f'])

# columns a, c, e, ...
df1 = df.iloc[:, ::2]
# columns b, d, f, ...
df2 = df.iloc[:, 1::2]
# Calculate only where the value is 10 or more (NaN elsewhere). Division is used here.
df1[df1 >= 10] / np.array(df2)
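
To get closer to the layout the question asks for (columns labelled a/b, c/d, e/f, with NA where the condition fails), the slicing above can be combined with `DataFrame.where`. This is a minimal sketch, assuming the table has an even number of columns; the names `odd`, `even`, and `ratio` are made up for illustration.

```python
import pandas as pd

# Sample data from the question, rebuilt so the snippet is self-contained.
df = pd.DataFrame(
    [[11, 17, 32, 9, 62, 299],
     [3, 71, 22, 929, 11, 39],
     [43, 21, 2, 41, 42, 8],
     [9, 1, 99, 78, 2, 1]],
    index=["ID_1", "ID_2", "ID_3", "ID_4"],
    columns=list("abcdef"),
)

odd = df.iloc[:, ::2]    # columns a, c, e, ...
even = df.iloc[:, 1::2]  # columns b, d, f, ...

# Keep values of 10 or more, replace the rest with NaN, then divide
# position by position (to_numpy() sidesteps column-label alignment).
ratio = odd.where(odd >= 10) / even.to_numpy()

# Label each result column after the pair it came from, e.g. "a/b".
ratio.columns = [f"{a}/{b}" for a, b in zip(odd.columns, even.columns)]

print(ratio.round(1))
```

Values that fall exactly on a half, such as 42/8 = 5.25, may display as 5.2 rather than 5.3, because `round` follows NumPy's round-half-to-even rule.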


2022-09-30 19:51



© 2024 OneMinuteCode. All rights reserved.