The data in df["click"] is a string in the form yyyymmddHHMMSS, as shown below.
["20211122000000", "20211122000000", "20211122000000", "20211122000000" ...]
The code below is being used to convert to datetime values.
df["click"] = df["click"].apply(pd.to_datetime, errors="coerce")
But the DataFrame has more than 1 million rows, so this is too slow. Is it possible to convert the string data to datetime (yyyy-mm-dd HH:MM:SS) using numpy?
Or, if not numpy, is there anything faster than the code I'm using? I'm limited to pandas because PySpark, Koalas, and Dask are not available.
python mongodb
pd.to_datetime can take a Series as its argument, so there is no need to call it row by row with apply.
df["click_dt"] = pd.to_datetime(df["click"], errors="coerce")
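As a rough illustration of the difference, here is a minimal sketch that runs both styles on a tiny made-up DataFrame. The sample values (including the deliberately invalid "not a date") are assumptions for the demo, not data from the question.

```python
import pandas as pd

# Hypothetical sample data; the last value is intentionally invalid.
df = pd.DataFrame({"click": ["20211122000000", "20211122093015", "not a date"]})

# Row-by-row conversion, as in the question: one pd.to_datetime call per value.
slow = df["click"].apply(pd.to_datetime, errors="coerce")

# Vectorized conversion: the whole Series is handed to pd.to_datetime at once.
fast = pd.to_datetime(df["click"], errors="coerce")

# With errors="coerce", values that cannot be parsed become NaT instead of raising.
print(fast)
```

Both calls coerce the invalid value to NaT, but the second one avoids the per-row Python overhead, which is what matters at a million rows.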
If the format of the time strings is fixed, passing it explicitly through the format argument means pd.to_datetime does not have to infer the format, so it will be faster. For the format in your question, it would probably look like the code below.
df["click_dt"] = pd.to_datetime(df["click"], format="%Y%m%d%H%M%S", errors="coerce")
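To check this on your own machine, something like the following sketch may help. It builds a made-up column of 200,000 strings plus one malformed value (both the size and the values are assumptions), converts it with an explicit format, and prints the elapsed time. No expected timing is given because it depends on the hardware and the pandas version.

```python
import time

import pandas as pd

# Hypothetical data: many valid timestamps plus one malformed value.
values = ["20211122093015"] * 200_000 + ["bad-value"]
df = pd.DataFrame({"click": values})

start = time.perf_counter()
# An explicit format lets pandas skip format inference;
# errors="coerce" turns the malformed value into NaT.
df["click_dt"] = pd.to_datetime(
    df["click"], format="%Y%m%d%H%M%S", errors="coerce"
)
elapsed = time.perf_counter() - start

print(df["click_dt"].iloc[0])        # first parsed timestamp
print(df["click_dt"].isna().sum())   # number of values coerced to NaT
print(f"converted {len(df)} rows in {elapsed:.3f} s")
```

The resulting column has a datetime dtype, and pandas displays it as yyyy-mm-dd HH:MM:SS, which is the form asked for in the question.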
© 2024 OneMinuteCode. All rights reserved.