I'm almost a beginner.It's good to look into various things, but I can't figure out how to write them, so I can't help but get stuck.Thank you for your help.
There is a data file like chr_effect.txt.
There are 31 columns of animal, pnt, chr1...chr29 with 1 and 2 pnt for each animal.For convenience, we only show the first five animals (10 lines), but we actually have about 5,000 animals, or 10,000 lines of data.
There is another file like chr_pnt.txt.
For convenience, we only show the first 10 lines, but actually 100,000 rows x 29 columns (chr1...chr29) with 1 or 2 indicating pnt.
What I want to do is:
1. Extract the value of chr_effect corresponding to each element of chr_pnt (uniquely determined by chr and pnt).
2. Calculate the sum of chr1...chr29 from the value of chr_effect taken out.
3. Return the average and standard deviation of the calculated sum (100,000 units).
4. Do the above for each animal and output the file.
The image of the file you want to print is solutions.txt.
I'm sorry that the explanation is difficult to understand.
I personally feel ashamed to say that I have a hard time reading data as follows.
import pandas as pd
chr_effect=pd.read_table('chr_effect.txt', delim_whitespace=True)
chr_pnt=pd.read_table('chr_pnt.txt', delim_whitespace=True)
chr_effect.txt
animal pnt chr1 chr2 chr3 chr4 chr5 chr6 chr7 chr8 chr9 chr10 chr11 chr12 chr14 chr15 chr16 chr17 chr18 chr19 chr20 chr21 chr22 chr23 chr24 chr25 chr26 chr28 chr29
27790 1 -3.249619881 1.776331581 -2.24113552 -4.817298276 -5.48548554 -1.759651526 5.334048371 3.891744788 1.884854677 -3.242535628 3.756604968 2.986742453 0.612356541 -2.86922751 4.352907409 0.90651085 -0.412862385 0.296373936 -5.889000773 -0.68745163 0.298499644 0.814954378 -0.095186506 0.865718907 2.540816382 5.789061632 -0.559834391 -2.53661836 0.751069038
27790 2 -0.390512283 0.3496771 -2.139579662 -4.59481831 -0.776924186 -0.35638789 -0.526063862 -0.477037798 -1.569054067 -2.90589106 -1.465912498 -3.364848089 0.975073498 -9.574933303 -2.323818571 -2.219380737 -3.775384093 -0.309000541 -0.003886077 1.91834523 -0.124754068 2.23336583 -2.7314078 -1.352587759 -2.121087552 -0.59768476 0.519127176 5.32087313 1.659182504
75952 1 1.860075236 -1.952826593 7.941593806 6.971151872 -5.39318137 24.5906086 -0.627978002 1.558862775 -2.044366214 1.593249157 -1.53647959 -5.210752328 3.478248395 -5.889005176 -0.485182934 1.525452838 -5.212397443 1.330459383 3.158410718 -1.316064027 3.041514078 2.484066153 -1.601287478 4.598523036 1.176341264 -0.732658875 -1.546846435 -0.241240957 -2.703045534
75952 2 0.07320026 -1.193285236 3.664270318 -1.198182299 -3.285812454 -7.21470412 1.759143544 7.648100385 1.693592132 0.946791605 -2.87792026 -2.67183088 -1.945447857 2.474190902 -2.621106236 -2.764357816 -2.279304115 0.00797417 0.663827378 1.58174674 1.271269494 1.272317695 1.142039742 0.076808881 -0.46131242 3.365393607 -1.174178405 -2.626477213 -1.37721366
90414 1 1.739067807 2.705188308 -1.245923538 -0.002399875 -2.599279152 -0.154139815 0.303458033 2.786962772 -2.988691596 0.725134479 -1.551469947 1.406454822 1.488023864 0.978901455 1.780002113 -5.183237296 -1.602299904 0.057587453 4.119123527 -3.148129947 -0.699950319 -0.354722639 0.366632135 4.998152571 1.543905898 3.172006996 -5.345136372 -0.056620664 2.422692998
90414 2 -2.519037307 -0.926182651 0.66829595 0.083873692 -3.127323408 -7.296881507 -0.797579 -1.692284026 0.446947886 -2.593041865 -4.321641322 1.195738671 -1.478727635 4.047270398 2.312901751 -5.3839035 -0.904768118 -3.425975528 -0.418056083 0.648341358 -0.674819627 -1.525995063 0.219910922 0.381569648 -1.513227344 2.566777598 -2.742408197 2.510699832 -1.25012918
90470 1 1.451065566 7.687896532 1.118122984 0.217045435 -4.390935698 -2.358229388 -8.346397709 2.006546339 5.738527934 -3.025387626 -4.028971005 1.712205845 3.627846534 6.986557249 0.963232976 -6.304871835 0.176607468 0.800108489 -0.063414724 6.655480908 -3.68568447 -1.681859217 0.952996904 -3.51742161 -1.585046358 -0.40142409 -1.222321719 5.864682736 0.77443274
90470 2 -3.236474166 -0.828191866 0.864194126 -2.386854158 -5.98932058 -4.859134618 -1.443351926 0.280750195 7.801439742 -4.921804746 1.155552973 1.624703212 -0.895162234 -5.903743854 0.901627188 0.187514121 -2.314349528 -5.426454814 3.349431154 6.005233669 -1.902131074 -1.325578071 0.787564088 -0.522419842 -1.345004348 1.169492376 1.537055754 6.767023934 -3.057591512
102853 1 0.01378943 10.9234678 1.456693154 0.034962635 -5.202454665 -1.662830588 -4.066262419 10.26374255 6.980704894 -1.44364735 -10.15811404 -4.508814067 4.862243682 7.606665536 -0.010935424 -1.73585101 -0.57417706 -1.087272208 4.78751793 7.514152847 -1.479511504 -1.681859217 0.472674332 -0.056568908 -3.122885588 4.59333307 2.501062924 2.730274228 0.727288217
102853 2 1.398403246 1.072480483 -8.505226254 2.64111658 -2.297556361 -1.40125181 -0.318009921 13.65483956 -0.776215002 1.719216586 -3.03228246 -5.971878011 -0.004130784 2.885798136 2.377923162 0.57592465 -3.414375536 -2.701702732 3.752473244 0.186915701 3.673363686 -2.915896525 0.89251478 5.26937868 -1.79715952 2.484896944 -1.231909699 2.871125798 1.623539915
chr_pnt.txt
chr 1 chr 2 chr 3 chr 4 chr 5 chr 6 chr 7 chr 8 chr 9 chr 10 chr 11 chr 12 chr 14 chr 15 chr 16 chr 17 chr 18 chr 19 chr 20 chr 21 chr 22 chr 23 chr 24 chr 25 chr 26 chr 27 chr 28 chr 29
2 2 1 1 2 2 2 1 1 1 1 1 1 2 1 2 2 2 2 1 2 1 2 1 2 2 1 1 2
1 2 2 1 2 1 2 1 1 1 1 1 1 2 2 2 1 1 2 1 1 1 1 2 2 2 2 1 1
1 1 2 2 1 1 1 2 2 2 2 1 1 2 2 1 1 1 1 1 2 2 2 1 1 1 1 1 1
2 1 2 2 1 2 2 1 2 1 1 2 1 2 2 1 2 2 2 1 2 2 1 1 2 1 1 2 2
1 2 2 1 1 2 2 1 2 1 1 1 1 2 1 1 2 1 1 2 2 1 1 2 2 2 1 2 1
2 1 2 2 2 2 2 1 1 1 1 1 1 1 2 1 1 1 2 2 1 1 2 2 2 1 1 1 1
2 2 1 2 1 1 1 1 2 1 1 1 2 2 2 1 1 2 1 1 2 2 1 2 2 2 1 1 2
1 2 1 2 1 1 1 2 2 2 2 1 2 2 1 1 2 2 1 1 2 2 1 2 1 1 1 2 1
2 2 2 2 1 1 1 2 1 2 2 2 1 1 2 2 2 1 2 1 1 1 2 1 1 2 1 2 1
1 1 1 1 1 1 1 2 1 1 2 2 1 2 1 2 1 2 1 2 1 1 2 1 1 2 1 1 1
solutions.txt
animal average std
27790 -13.80468698 10.76483559
75952 11.4056846 18.58066141
90414 -10.91855174 8.140109257
90470 -3.89969823 11.36792823
102853 20.69108446 11.8625067
If you want to follow the instructions below (although the content of solution.txt
in the question differs from the results):
import pandas as pd
chr_effect=pd.read_table('chr_effect.txt', delim_whitespace=True)
chr_pnt=pd.read_table('chr_pnt.txt', delim_whitespace=True)
def mapping (animal):
a=animal.set_index('pnt')
return chr_pnt.apply(lambdac:c.apply(lambdax:a.loc[x,c.name]))\
.sum(axis=1).agg(average='mean', std='std')
chrx = chr_effect.groupby('animal').apply(mapping)
print(chrx)
I've looked into various things, but I don't know how to write them
Below is the procedure for calculating the sum of the sum.You should be able to do the same
as it is managed by animal&pnt
① First, chr_pnt is counted by pnt (29 items up to ch29)
print((dfpnt==1).sum(axis=0).to_list())#pnt==1
# [5, 4, 4, 4, 7, 6, 5, 6, 5, 7, 6, 7, 8, 2, 4, 6, 5, 5, 5, 7, 4, 6, 5, 5, 4, 4, 9, 6, 7]
ch Extract chr_effect for each pnt
display(dfeff[dfeff.pnt==1])
あと Multiplication
df1,df2 = (dfeff extract * dfpnt aggregate for pin(1,2))
dfsum = df1 + df2
# When adding pnt1, pnt2 without separating them
dfsum=sum(dfeff extract * dfpnt aggregate for pin(1,2))
To sum up the above,
import pandas as pd
dffef=pd.read_csv(chr_effect, sep='\s+', index_col='animal')
dfpnt=pd.read_csv(chr_pnt, sep='\s+')
dfsum=sum(dfeff.loc [dfeff.pnt==p, 'chr1': 'chr29']
.mul(dfpnt==p).sum(axis='index'),axis='columns')
for pin (1, 2)
display(dfsum.sum(axis='columns'))
# animal
# 27790 -185.712504
# 75952 159.058371
# 90414 -85.219206
# 90470 -75.583044
# 102853 203.524688
# dtype —float64
© 2024 OneMinuteCode. All rights reserved.