How to replace pandas values inside a for loop

Asked 2 years ago, Updated 2 years ago, 47 views

I am trying to change the values in a pandas DataFrame by looping over every row. (In this case I want to remove the % from each value.)

When I do this, the following warning appears, and the processing also takes a long time.
I followed the link in the warning and tried dataframe._setitem_with_indexer, but that produced either an error or a similar warning, so I have not been able to fix it.

I would appreciate it if you could tell me the correct syntax for assigning back to the same column using df.iloc.
There were no errors or warnings when the columns on the left and right sides were different.

SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy

·Original code

for i in range(len(df)):
    df['column'].iloc[i] = df['column'].iloc[i].split('%')[0]

·Changed code

for i in range(len(df)):
    df['column'].iloc._setitem_with_indexer(i, df['column'].iloc[i].split('%')[0])

python3 pandas

2022-09-30 18:24

2 Answers

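SettingWithCopyWarning appears when you use chained indexing like the following. pandas first evaluates df['column'] and then applies iloc[i] to that intermediate result, so the assignment may land on a temporary copy instead of df, and the two-step lookup also takes a long time: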

df['column'].iloc[i]

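If you write it like this instead, the lookup is done in a single step on df itself, so it is faster and the warning goes away. (loc is label-based, so this assumes the index is the default 0, 1, 2, ...)

df.loc[i, 'column']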
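
As a minimal sketch of the difference, here is the original loop rewritten with the single-step form. The sample values are made up (the real df is not shown in the question), and the index is assumed to be the default RangeIndex so that labels and positions coincide.

```python
import pandas as pd

# Hypothetical sample data; the real df is not shown in the question.
df = pd.DataFrame({"column": ["10%", "25%", "7%"]})

# One indexing call on df itself, so the assignment targets the original frame.
# df.loc is label-based; with the default RangeIndex the labels are 0..len(df)-1.
for i in range(len(df)):
    df.loc[i, "column"] = df.loc[i, "column"].split("%")[0]

# Position-based alternative when the index is not 0..n-1:
# df.iloc[i, df.columns.get_loc("column")] = ...
result = df["column"].tolist()
print(result)
```

This still loops row by row, so it only addresses the warning, not the speed of the loop itself.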

That's not the only problem here. Using a for loop with pandas is very slow. In this case, the str accessor lets you apply a string method to every element of the column at once, which is both simpler and faster.

df['column'] = df['column'].str.split('%').str[0]

Also, values often cannot be converted to numbers only because of a % on the right-hand end. In that case you can use rstrip, which is just as simple and fast.

df['column'] = df['column'].str.rstrip('%')
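
To illustrate both vectorized forms, here is a small sketch on made-up values (the data and variable names are assumptions, not taken from the question). It also shows the numeric conversion that stripping the % usually leads to, using pd.to_numeric.

```python
import pandas as pd

# Hypothetical percentage strings.
s = pd.Series(["12.5%", "40%", "3%"])

# Split each element on '%', then take the first piece of each row's result.
first_part = s.str.split("%").str[0]

# Remove trailing '%' characters only; the rest of the string is untouched.
stripped = s.str.rstrip("%")

# Once the '%' is gone, the strings can be converted to numbers.
numbers = pd.to_numeric(stripped)
print(numbers)
```

Note that .str[0] after .str.split selects the first piece for every row, whereas a plain [0] would select the row labelled 0.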


2022-09-30 18:24

import pandas as pd
df = pd.DataFrame([["abc%def", 15], ["efg%ghi", 22]], columns=["column", "num"])

Taking the above as sample data, a simple way to achieve your goal is:

df["column"] = df["column"].apply(lambda s: s.split("%")[0])

I think the above is sufficient unless you are dealing with gigabytes of data.
A more technical approach is to use vectorize from numpy.
This is a little faster than the above.

import numpy as np
f = np.vectorize(lambda s: s.split("%")[0])
df["column"] = f(df["column"])
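
A quick way to check that these approaches agree, and to time them on your own data, is sketched below. The helper function names, the row count, and the repeat count are arbitrary assumptions, and timings vary by machine and library version, so no numbers are quoted here.

```python
import timeit

import numpy as np
import pandas as pd

# The sample rows from this answer, repeated to get a measurable workload.
df = pd.DataFrame([["abc%def", 15], ["efg%ghi", 22]] * 5000, columns=["column", "num"])

def with_apply(col):
    # Python-level function called once per element.
    return col.apply(lambda s: s.split("%")[0])

def with_vectorize(col):
    # np.vectorize returns a NumPy array, so wrap it back into a Series.
    f = np.vectorize(lambda s: s.split("%")[0])
    return pd.Series(f(col), index=col.index)

def with_str(col):
    # pandas string accessor, applied to the whole column.
    return col.str.split("%").str[0]

candidates = (with_apply, with_vectorize, with_str)
results = {fn.__name__: fn(df["column"]).tolist() for fn in candidates}

for fn in candidates:
    seconds = timeit.timeit(lambda: fn(df["column"]), number=5)
    print(f"{fn.__name__}: {seconds:.4f} s for 5 runs")
```

Measuring on data that resembles the real column is more reliable than a general rule about which variant is fastest.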

An even faster option is to use libraries such as cython or numba for static typing and compilation. However, numba does not seem to optimize string types, so it would probably need special handling, and I don't know cython well enough, so I will only mention them here.

Below is the site I used as a reference.
https://pandas.pydata.org/pandas-docs/stable/enhancingperf.html


2022-09-30 18:24
