I would like to print out the same id in Pandas together, and then output it in the order of timestamp.

Asked 1 years ago, Updated 1 years ago, 254 views

Run Environment
Windows 10 Python 3.X
pandas

Data Description

The original data below is part of a csv version of the log.
This is a column called id, timestamp.
timestamp is arranged in ascending order with id arranged accordingly.
The id is basically organized, but due to delays, other ids may be output in the form of interrupts first, and this is the log for that part.

What do you want to do

I would like to sort this original data as shown in data I want to process
I would like to print out the same id together, and then output it in the order of timestamp.
I would like to sort by the timestamp at the beginning of each id.
How can I sort it?

Original data

Data you want to process

Tried

I did sort using sort_values.
We can sort the data by using the same id, but since the id is sorted in ascending order,
In principle, we could not sort them in the order of timestamp.

csv_data='"
id, timestamp
idefs, 2022-12-15T00:00:01.0000000Z
idefs, 2022-12-15T00:00:02.0000000Z
idefs, 2022-12-15T00:00:03.0000000Z
idefs, 2022-12-15T00:00:04.0000000Z
aaefsd, 2022-12-15T00:00:05.0000000Z
idefs, 2022-12-15T00:00:06.0000000Z
aaefsd, 2022-12-15T00:00:07.0000000Z
aaefsd, 2022-12-15T00:00:08.0000000Z
qwerty, 2022-12-15T00:00:09.0000000Z
aaefsd, 2022-12-15T00:00:10.0000000Z
aaefsd, 2022-12-15T00:00:11.0000000Z
qwerty, 2022-12-15T00:00:12.0000000Z
qwerty, 2022-12-15T00:00:13.0000000Z
qwerty, 2022-12-15T00:00:14.0000000Z
qwerty, 2022-12-15T00:00:15.0000000Z
'''

df = pd.read_csv(csv_data)

df.sort_values(['id', 'timestamp']) 

Results

id timestamp
4aaefsd2022-12-15T00:00:05.0000000Z
6aaefsd2022-12-15T00:00:07.0000000Z
7aaefsd2022-12-15T00:00:08.0000000Z
9aaefsd2022-12-15T00:00:10.0000000Z
10aaefsd2022-12-15T00:00:11.0000000Z
0 idefs2022-12-15T00:00:01.0000000Z
1 idefs2022-12-15T00:00:02.0000000Z
2 idefs2022-12-15T00:00:03.0000000Z
3 idefs2022-12-15T00:00:04.0000000Z
5 idefs 2022-12-15T00:00:06.0000000Z
8qwerty 2022-12-15T00:00:09.0000000Z
11 qwerty 2022-12-15T00:00:12.0000000Z
12 qwerty 2022-12-15T00:00:13.0000000Z
13qwerty 2022-12-15T00:00:14.0000000Z
14 qwerty 2022-12-15T00:00:15.0000000Z

python pandas

2023-01-03 23:30

2 Answers

dfx=df.groupby('id', sort=False).apply(lambdax:x['timestamp'].sort_values())\
        .reset_index(level=0).reset_index(drop=True)
print(dfx)


2023-01-04 01:13

Assuming that the id order of the data to be processed is the order of appearance of the id of the original data, we will answer.
※ The pandas version is 1.1.0

import pandas as pd
importio

data=""
id, timestamp
idefs, 2022-12-15T00:00:01.0000000Z
idefs, 2022-12-15T00:00:02.0000000Z
idefs, 2022-12-15T00:00:03.0000000Z
idefs, 2022-12-15T00:00:04.0000000Z
aaefsd, 2022-12-15T00:00:05.0000000Z
idefs, 2022-12-15T00:00:06.0000000Z
aaefsd, 2022-12-15T00:00:07.0000000Z
aaefsd, 2022-12-15T00:00:08.0000000Z
qwerty, 2022-12-15T00:00:09.0000000Z
aaefsd, 2022-12-15T00:00:10.0000000Z
aaefsd, 2022-12-15T00:00:11.0000000Z
qwerty, 2022-12-15T00:00:12.0000000Z
qwerty, 2022-12-15T00:00:13.0000000Z
qwerty, 2022-12-15T00:00:14.0000000Z
qwerty, 2022-12-15T00:00:15.0000000Z
"""

df = pd.read_csv(io.StringIO(data))
df["timestamp"] = pd.to_datetime(df["timestamp"])
print(df)
#        id timestamp
#0 idefs2022-12-1500:00:01 + 00:00
#1 idefs2022-12-15 00:00:02 + 00:00
#2 idefs2022-12-15 00:00:03 + 00:00
#3 idefs2022-12-15 00:00:04 + 00:00
#4 aaefsd 2022-12-15 00:00:05 + 00:00
#5 idefs2022-12-1500:00:06+00:00
#6 aaefsd 2022-12-15 00:00:07 + 00:00
#7 aaefsd 2022-12-1500:00:08 + 00:00
#8 qwerty 2022-12-15 00:00:09 + 00:00
#9 aaefsd 2022-12-15 00:00:10 + 00:00
#10 aaefsd 2022-12-15 00:00:11 + 00:00
#11 qwerty 2022-12-1500:00:12 + 00:00
#12 qwerty 2022-12-15 00:00:13 + 00:00
#13 qwerty 2022-12-15 00:00:14 + 00:00
#14 qwerty 2022-12-15 00:00:15 + 00:00

key_dict={}
for i,vin enumerate(df["id"].drop_duplicates()):
    key_dict[v] = i+1
    
def key(s):
    returns.map(key_dict) 

df=df.sort_values(["id", "timestamp", key=key)
print(df)
#        id timestamp
#0 idefs2022-12-1500:00:01 + 00:00
#1 idefs2022-12-15 00:00:02 + 00:00
#2 idefs2022-12-15 00:00:03 + 00:00
#3 idefs2022-12-15 00:00:04 + 00:00
#5 idefs2022-12-1500:00:06+00:00
#4 aaefsd 2022-12-15 00:00:05 + 00:00
#6 aaefsd 2022-12-15 00:00:07 + 00:00
#7 aaefsd 2022-12-1500:00:08 + 00:00
#9 aaefsd 2022-12-15 00:00:10 + 00:00
#10 aaefsd 2022-12-15 00:00:11 + 00:00
#8 qwerty 2022-12-15 00:00:09 + 00:00
#11 qwerty 2022-12-1500:00:12 + 00:00
#12 qwerty 2022-12-15 00:00:13 + 00:00
#13 qwerty 2022-12-15 00:00:14 + 00:00
#14 qwerty 2022-12-15 00:00:15 + 00:00


2023-01-04 05:01

If you have any answers or tips


© 2024 OneMinuteCode. All rights reserved.