I am trying to change the values in a pandas DataFrame by looping over every row. (In this case, I want to remove the % from each value.)
When I do, the following warning appears, and the processing also takes a long time.
I followed the link in the warning message below and tried dataframe._setitem_with_indexer, but that only produced an error or a similar warning, and I still cannot change the values.
I would appreciate it if you could tell me the correct syntax for assigning back to the same column using df.iloc.
There were no errors or warnings when the columns on the left and right sides were different.
SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
·Code before the change
for i in range(len(df)):
    df['column'].iloc[i] = (df['column'].iloc[i].split('%'))[0]
·Code after the change
for i in range(len(df)):
    df['column'].iloc._setitem_with_indexer(i, (df['column'].iloc[i].split('%'))[0])
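SettingWithCopyWarning is raised for chained indexing. The expression below first evaluates df['column'] and then applies iloc[i] to that intermediate result, so the assignment may land on a temporary copy rather than on df itself, and the two-step lookup also takes longer to process: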
df['column'].iloc[i]
If you write it like this instead, the lookup is done in a single step, so it is faster:
df.loc[i, 'column']
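As a minimal sketch of the single-step form applied to the loop from the question (the sample values and the names df2 and col_pos are made up for illustration): df.loc is label-based, so the first loop assumes the default integer index, while the second loop uses df.iloc with a column position and should work for any index.

```python
import pandas as pd

# Hypothetical sample data, not taken from the question
df = pd.DataFrame({"column": ["10%", "25%", "7%"]})

# Label-based: one indexing operation on df itself, so no chained assignment.
# Assumes the default RangeIndex, where label i equals position i.
for i in range(len(df)):
    df.loc[i, "column"] = df.loc[i, "column"].split("%")[0]

# Position-based: df.iloc needs the integer position of the column.
df2 = pd.DataFrame({"column": ["10%", "25%", "7%"]}, index=["a", "b", "c"])
col_pos = df2.columns.get_loc("column")
for i in range(len(df2)):
    df2.iloc[i, col_pos] = df2.iloc[i, col_pos].split("%")[0]

print(df["column"].tolist())
print(df2["column"].tolist())
```

This only removes the warning; the row-by-row loop itself is still slow on large frames.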
That is not the only problem here. Looping with for over a pandas DataFrame is very slow. In this case, the str accessor lets you apply string methods to every element of the column at once, which is both easier and faster:
df['column'] = df['column'].str.split('%').str[0]
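A small sketch of why the element-wise pick matters, using made-up sample values: str.split produces one list per row, so a plain [0] on the result would select the first row by label rather than the first piece of every row. The names pieces, first_part, and first_part_alt are illustrative only.

```python
import pandas as pd

# Hypothetical sample data for illustration
df = pd.DataFrame({"column": ["abc%def", "efg%ghi"]})

# One list of pieces per row
pieces = df["column"].str.split("%")

# .str[0] takes the first piece of each row's list
first_part = pieces.str[0]

# Alternative: expand the pieces into columns and take column 0
first_part_alt = df["column"].str.split("%", n=1, expand=True)[0]

print(first_part.tolist())
print(first_part_alt.tolist())
```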
Also, values often cannot be converted to numbers simply because of a trailing %. In that case you can use rstrip, which is just as easy and fast:
df['column'] = df['column'].str.rstrip('%')
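If the goal is to end up with numbers, one possible follow-up is to pass the stripped strings to pd.to_numeric. This is only a sketch: the sample values and the extra column name "value" are assumptions for illustration, not part of the question.

```python
import pandas as pd

# Hypothetical percentage strings
df = pd.DataFrame({"column": ["10%", "25.5%", "7%"]})

# Strip the trailing % from every element, then convert the strings to numbers
df["value"] = pd.to_numeric(df["column"].str.rstrip("%"))

print(df)
```

By default pd.to_numeric raises an error on strings it cannot parse; errors="coerce" turns those into NaN instead.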
The following is the sample data:
import pandas as pd
df = pd.DataFrame([["abc%def", 15], ["efg%ghi", 22]], columns=["column", "num"])
Here is an easy way to achieve your goal:
df["column"] = df["column"].apply(lambda s: s.split("%")[0])
I think the approach above is sufficient unless you are handling gigabytes of data.
A more technical approach is to use vectorize from numpy.
This can be a little faster than the approach above.
import numpy as np
f = np.vectorize(lambda s: s.split("%")[0])
df["column"] = f(df["column"])
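To check how the approaches compare on your own data, a rough timing sketch like the following may help. The data size, the helper names (with_str, with_apply, with_vectorize, first_piece), and the repeat count are arbitrary choices for illustration. Passing otypes=[object] is a defensive assumption so the result stays an array of Python strings. Note that the numpy documentation describes np.vectorize as a convenience wrapper that is essentially a loop, so any speedup is likely to be modest.

```python
import timeit

import numpy as np
import pandas as pd

# Made-up data: the two sample strings repeated many times
df = pd.DataFrame({"column": ["abc%def", "efg%ghi"] * 5000})

def with_str(s):
    # pandas str accessor
    return s.str.split("%").str[0]

def with_apply(s):
    # Python-level function applied to each element
    return s.apply(lambda v: v.split("%")[0])

# otypes=[object] keeps the output as Python strings
first_piece = np.vectorize(lambda v: v.split("%")[0], otypes=[object])

def with_vectorize(s):
    return pd.Series(first_piece(s.to_numpy()), index=s.index)

candidates = {"str": with_str, "apply": with_apply, "vectorize": with_vectorize}
results = {name: fn(df["column"]) for name, fn in candidates.items()}

# Time each approach; the numbers depend on the machine and library versions
for name, fn in candidates.items():
    seconds = timeit.timeit(lambda: fn(df["column"]), number=20)
    print(f"{name}: {seconds:.4f} s for 20 runs")
```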
A still faster way is to use libraries such as Cython or numba for static typing and compilation, but numba does not appear to optimize string types, so it would probably need special handling. I do not understand Cython well enough, so I will only mention it here.
Below is the site that I used as a reference.
https://pandas.pydata.org/pandas-docs/stable/enhancingperf.html