Understanding Data Processing with Python Pandas

Asked 1 years ago, Updated 1 years ago, 395 views

I'm almost a beginner.It's good to look into various things, but I can't figure out how to write them, so I can't help but get stuck.Thank you for your help.

There is a data file like chr_effect.txt.
There are 31 columns of animal, pnt, chr1...chr29 with 1 and 2 pnt for each animal.For convenience, we only show the first five animals (10 lines), but we actually have about 5,000 animals, or 10,000 lines of data.

There is another file like chr_pnt.txt.
For convenience, we only show the first 10 lines, but actually 100,000 rows x 29 columns (chr1...chr29) with 1 or 2 indicating pnt.

What I want to do is:
1. Extract the value of chr_effect corresponding to each element of chr_pnt (uniquely determined by chr and pnt).
2. Calculate the sum of chr1...chr29 from the value of chr_effect taken out.
3. Return the average and standard deviation of the calculated sum (100,000 units).
4. Do the above for each animal and output the file.

The image of the file you want to print is solutions.txt.
I'm sorry that the explanation is difficult to understand.

I personally feel ashamed to say that I have a hard time reading data as follows.

import pandas as pd

chr_effect=pd.read_table('chr_effect.txt', delim_whitespace=True)
chr_pnt=pd.read_table('chr_pnt.txt', delim_whitespace=True)

chr_effect.txt

animal pnt chr1 chr2 chr3 chr4 chr5 chr6 chr7 chr8 chr9 chr10 chr11 chr12 chr14 chr15 chr16 chr17 chr18 chr19 chr20 chr21 chr22 chr23 chr24 chr25 chr26 chr28 chr29
27790   1   -3.249619881    1.776331581 -2.24113552 -4.817298276    -5.48548554 -1.759651526    5.334048371 3.891744788 1.884854677 -3.242535628    3.756604968 2.986742453 0.612356541 -2.86922751 4.352907409 0.90651085  -0.412862385    0.296373936 -5.889000773    -0.68745163 0.298499644 0.814954378 -0.095186506    0.865718907 2.540816382 5.789061632 -0.559834391    -2.53661836 0.751069038
27790   2   -0.390512283    0.3496771   -2.139579662    -4.59481831 -0.776924186    -0.35638789 -0.526063862    -0.477037798    -1.569054067    -2.90589106 -1.465912498    -3.364848089    0.975073498 -9.574933303    -2.323818571    -2.219380737    -3.775384093    -0.309000541    -0.003886077    1.91834523  -0.124754068    2.23336583  -2.7314078  -1.352587759    -2.121087552    -0.59768476 0.519127176 5.32087313  1.659182504
75952   1   1.860075236 -1.952826593    7.941593806 6.971151872 -5.39318137 24.5906086  -0.627978002    1.558862775 -2.044366214    1.593249157 -1.53647959 -5.210752328    3.478248395 -5.889005176    -0.485182934    1.525452838 -5.212397443    1.330459383 3.158410718 -1.316064027    3.041514078 2.484066153 -1.601287478    4.598523036 1.176341264 -0.732658875    -1.546846435    -0.241240957    -2.703045534
75952   2   0.07320026  -1.193285236    3.664270318 -1.198182299    -3.285812454    -7.21470412 1.759143544 7.648100385 1.693592132 0.946791605 -2.87792026 -2.67183088 -1.945447857    2.474190902 -2.621106236    -2.764357816    -2.279304115    0.00797417  0.663827378 1.58174674  1.271269494 1.272317695 1.142039742 0.076808881 -0.46131242 3.365393607 -1.174178405    -2.626477213    -1.37721366
90414   1   1.739067807 2.705188308 -1.245923538    -0.002399875    -2.599279152    -0.154139815    0.303458033 2.786962772 -2.988691596    0.725134479 -1.551469947    1.406454822 1.488023864 0.978901455 1.780002113 -5.183237296    -1.602299904    0.057587453 4.119123527 -3.148129947    -0.699950319    -0.354722639    0.366632135 4.998152571 1.543905898 3.172006996 -5.345136372    -0.056620664    2.422692998
90414   2   -2.519037307    -0.926182651    0.66829595  0.083873692 -3.127323408    -7.296881507    -0.797579   -1.692284026    0.446947886 -2.593041865    -4.321641322    1.195738671 -1.478727635    4.047270398 2.312901751 -5.3839035  -0.904768118    -3.425975528    -0.418056083    0.648341358 -0.674819627    -1.525995063    0.219910922 0.381569648 -1.513227344    2.566777598 -2.742408197    2.510699832 -1.25012918
90470   1   1.451065566 7.687896532 1.118122984 0.217045435 -4.390935698    -2.358229388    -8.346397709    2.006546339 5.738527934 -3.025387626    -4.028971005    1.712205845 3.627846534 6.986557249 0.963232976 -6.304871835    0.176607468 0.800108489 -0.063414724    6.655480908 -3.68568447 -1.681859217    0.952996904 -3.51742161 -1.585046358    -0.40142409 -1.222321719    5.864682736 0.77443274
90470   2   -3.236474166    -0.828191866    0.864194126 -2.386854158    -5.98932058 -4.859134618    -1.443351926    0.280750195 7.801439742 -4.921804746    1.155552973 1.624703212 -0.895162234    -5.903743854    0.901627188 0.187514121 -2.314349528    -5.426454814    3.349431154 6.005233669 -1.902131074    -1.325578071    0.787564088 -0.522419842    -1.345004348    1.169492376 1.537055754 6.767023934 -3.057591512
102853  1   0.01378943  10.9234678  1.456693154 0.034962635 -5.202454665    -1.662830588    -4.066262419    10.26374255 6.980704894 -1.44364735 -10.15811404    -4.508814067    4.862243682 7.606665536 -0.010935424    -1.73585101 -0.57417706 -1.087272208    4.78751793  7.514152847 -1.479511504    -1.681859217    0.472674332 -0.056568908    -3.122885588    4.59333307  2.501062924 2.730274228 0.727288217
102853  2   1.398403246 1.072480483 -8.505226254    2.64111658  -2.297556361    -1.40125181 -0.318009921    13.65483956 -0.776215002    1.719216586 -3.03228246 -5.971878011    -0.004130784    2.885798136 2.377923162 0.57592465  -3.414375536    -2.701702732    3.752473244 0.186915701 3.673363686 -2.915896525    0.89251478  5.26937868  -1.79715952 2.484896944 -1.231909699    2.871125798 1.623539915

chr_pnt.txt

 chr 1 chr 2 chr 3 chr 4 chr 5 chr 6 chr 7 chr 8 chr 9 chr 10 chr 11 chr 12 chr 14 chr 15 chr 16 chr 17 chr 18 chr 19 chr 20 chr 21 chr 22 chr 23 chr 24 chr 25 chr 26 chr 27 chr 28 chr 29
2   2   1   1   2   2   2   1   1   1   1   1   1   2   1   2   2   2   2   1   2   1   2   1   2   2   1   1   2
1   2   2   1   2   1   2   1   1   1   1   1   1   2   2   2   1   1   2   1   1   1   1   2   2   2   2   1   1
1   1   2   2   1   1   1   2   2   2   2   1   1   2   2   1   1   1   1   1   2   2   2   1   1   1   1   1   1
2   1   2   2   1   2   2   1   2   1   1   2   1   2   2   1   2   2   2   1   2   2   1   1   2   1   1   2   2
1   2   2   1   1   2   2   1   2   1   1   1   1   2   1   1   2   1   1   2   2   1   1   2   2   2   1   2   1
2   1   2   2   2   2   2   1   1   1   1   1   1   1   2   1   1   1   2   2   1   1   2   2   2   1   1   1   1
2   2   1   2   1   1   1   1   2   1   1   1   2   2   2   1   1   2   1   1   2   2   1   2   2   2   1   1   2
1   2   1   2   1   1   1   2   2   2   2   1   2   2   1   1   2   2   1   1   2   2   1   2   1   1   1   2   1
2   2   2   2   1   1   1   2   1   2   2   2   1   1   2   2   2   1   2   1   1   1   2   1   1   2   1   2   1
1   1   1   1   1   1   1   2   1   1   2   2   1   2   1   2   1   2   1   2   1   1   2   1   1   2   1   1   1

solutions.txt

animal average std
27790   -13.80468698    10.76483559
75952   11.4056846  18.58066141
90414   -10.91855174    8.140109257
90470   -3.89969823 11.36792823
102853  20.69108446 11.8625067

python pandas bioinformatics

2022-12-08 01:16

2 Answers

If you want to follow the instructions below (although the content of solution.txt in the question differs from the results):

import pandas as pd

chr_effect=pd.read_table('chr_effect.txt', delim_whitespace=True)
chr_pnt=pd.read_table('chr_pnt.txt', delim_whitespace=True)

def mapping (animal):
    a=animal.set_index('pnt')
    return chr_pnt.apply(lambdac:c.apply(lambdax:a.loc[x,c.name]))\
                  .sum(axis=1).agg(average='mean', std='std')

chrx = chr_effect.groupby('animal').apply(mapping)
print(chrx)


2022-12-08 02:29

I've looked into various things, but I don't know how to write them

Below is the procedure for calculating the sum of the sum.You should be able to do the same


as it is managed by animal&pnt ① First, chr_pnt is counted by pnt (29 items up to ch29)

print((dfpnt==1).sum(axis=0).to_list())#pnt==1
# [5, 4, 4, 4, 7, 6, 5, 6, 5, 7, 6, 7, 8, 2, 4, 6, 5, 5, 5, 7, 4, 6, 5, 5, 4, 4, 9, 6, 7]

ch Extract chr_effect for each pnt

display(dfeff[dfeff.pnt==1])

あと Multiplication

 df1,df2 = (dfeff extract * dfpnt aggregate for pin(1,2))
dfsum = df1 + df2

# When adding pnt1, pnt2 without separating them
dfsum=sum(dfeff extract * dfpnt aggregate for pin(1,2))

To sum up the above,

import pandas as pd
dffef=pd.read_csv(chr_effect, sep='\s+', index_col='animal')
dfpnt=pd.read_csv(chr_pnt, sep='\s+')

dfsum=sum(dfeff.loc [dfeff.pnt==p, 'chr1': 'chr29']
                 .mul(dfpnt==p).sum(axis='index'),axis='columns')
                                                      for pin (1, 2)

display(dfsum.sum(axis='columns'))
# animal
# 27790    -185.712504
# 75952     159.058371
# 90414     -85.219206
# 90470     -75.583044
# 102853    203.524688
# dtype —float64


2022-12-08 05:02

If you have any answers or tips


© 2024 OneMinuteCode. All rights reserved.