A question for anyone comfortable with Python's np.unique

Asked 2 years ago, Updated 2 years ago, 18 views

I have an Excel file and need to remove duplicate rows from it, but there are more conditions involved than I expected, so I'm stuck.

1. I converted the Excel data to a CSV file, loaded it with pandas, and successfully converted it to a NumPy array. Let's call this array data.

2. Next, I kept only the rows whose value in the second-to-last column is between 20180101 and 20180630 inclusive, using np.delete to drop the rest. At this point the shape is (489, 8). Let's call the result data1.

3. I pulled out just the station column with NumPy indexing and called np.unique on it with return_index=True to get the position of each station's first occurrence. For example, the first station, ADJ, appears three times, so its first position is 0; BBK then appears three times, so its first position is 3. The resulting 1-D array has length 144. Let's call it idx.

4. Using idx, which holds the positions of the unique stations, I want to extract the corresponding rows of data1 from step 2. For example, if idx = [0, 3, 6, ...], can I pick out rows 0, 3, 6, and so on of data1 and collect them into an array of shape (144, 8)?

I'd appreciate your help.

python

2022-09-20 15:05

1 Answer

>>> import pandas as pd
>>> df = pd.DataFrame({"station":["AJD", "AJD", "BBK", "BBK"],
           "channel":["HGE", "HGN", "HGE", "HGE"],
           "network":["KG", "KG", "KG", "KG"],
           "lat":[34.74, 34.74, 35.57, 35.57],
           "lon":[126.12, 126.12, 129.43, 129.43],
           "ele":[100, 100, 100, 100],
           "st":[20140101, 20140101, 20170131, 20140101],
           "end":[99991231, 99991231, 99991231, 20170130]})
>>> df
  station channel network    lat     lon  ele        st       end
0     AJD     HGE      KG  34.74  126.12  100  20140101  99991231
1     AJD     HGN      KG  34.74  126.12  100  20140101  99991231
2     BBK     HGE      KG  35.57  129.43  100  20170131  99991231
3     BBK     HGE      KG  35.57  129.43  100  20140101  20170130
>>> df.groupby("station").first()
        channel network    lat     lon  ele        st       end
station                                                        
AJD         HGE      KG  34.74  126.12  100  20140101  99991231
BBK         HGE      KG  35.57  129.43  100  20170131  99991231
>>> df.groupby("station").first().reset_index()
  station channel network    lat     lon  ele        st       end
0     AJD     HGE      KG  34.74  126.12  100  20140101  99991231
1     BBK     HGE      KG  35.57  129.43  100  20170131  99991231
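
The date filter from step 2 of the question can be done in pandas as well, before removing the duplicates. The following is only a minimal sketch: the sample rows are made up for illustration, and it assumes the date to filter on is the st column, as in the frame above.

```python
import pandas as pd

# Hypothetical sample rows; the column names follow the frame above.
df = pd.DataFrame({
    "station": ["AJD", "AJD", "BBK", "BBK", "CHS"],
    "channel": ["HGE", "HGN", "HGE", "HGN", "HGE"],
    "st": [20180101, 20180215, 20180301, 20170101, 20180701],
    "end": [99991231, 99991231, 99991231, 99991231, 99991231],
})

# Keep rows whose start date lies in [20180101, 20180630], inclusive on both ends.
in_range = df[df["st"].between(20180101, 20180630)]

# Keep the first row of each station, preserving the original row order and index.
first_per_station = in_range.drop_duplicates(subset="station", keep="first")
print(first_per_station)
```

Unlike groupby("station").first(), drop_duplicates returns each station's first row unchanged, whereas first() takes the first non-null value per column, so the two can differ when the data contains missing values.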

Is this what you want?
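
If you would rather stay in NumPy, as in the question, indexing data1 with idx should give the rows you describe. This is a rough sketch under assumptions: the small array below is a made-up stand-in for your data, with the station name in the first column and the start date in the second-to-last column.

```python
import numpy as np

# Hypothetical stand-in for the loaded array (station, channel, st, end).
data = np.array([
    ["AJD", "HGE", 20180101, 99991231],
    ["AJD", "HGN", 20180101, 99991231],
    ["AJD", "HGZ", 20170101, 20171231],
    ["BBK", "HGE", 20180201, 99991231],
    ["BBK", "HGN", 20180201, 99991231],
    ["CHS", "HGE", 20180701, 99991231],
], dtype=object)

# Step 2: a boolean mask keeps rows with 20180101 <= st <= 20180630,
# which avoids having to compute the rows to pass to np.delete.
st = data[:, -2].astype(int)
mask = (st >= 20180101) & (st <= 20180630)
data1 = data[mask]

# Step 3: index of the first occurrence of each station name.
station = data1[:, 0].astype(str)
_, idx = np.unique(station, return_index=True)

# np.unique orders idx by station name; sort it to keep the original row order.
idx = np.sort(idx)

# Step 4: indexing with an integer array selects those rows in one go.
unique_rows = data1[idx]
print(unique_rows.shape)
```

With your real data, data1[idx] would have shape (144, 8), since idx has length 144 and data1 has 8 columns.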


2022-09-20 15:05


