Correlating with the previous day's data in Pandas

Asked 2 years ago, Updated 2 years ago, 45 views

Suppose calorie consumption is related to the next day's weight change.
When calculating the correlation, I think the values should come from different days rather than the same day, but
how do I define the DataFrame for that?
(I want to define a DataFrame and calculate the correlation with df.corr().)

Normally I would write df = pd.DataFrame(table[['calorie consumption', 'weight']]) and
get the result from df.corr(), but I don't know how to pair each day's calorie consumption with the next day's weight, like this:

2000-01-01 calorie consumption with 2000-01-02 weight
2000-01-02 calorie consumption with 2000-01-04 weight
...

date        calorie consumption   weight
2000-01-01  300 kcal              60 kg
2000-01-02  400 kcal              59.9 kg
2000-01-04  800 kcal              59.8 kg
2000-01-05  100 kcal              59.6 kg
2000-01-07  200 kcal              59.6 kg
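A minimal sketch of that pairing, assuming the kcal and kg units are stripped so both columns hold plain numbers and the dates are used as the index ("next weight" is just an illustrative column name). Because the dates are irregular, shift(-1) works on rows rather than calendar days, which happens to match the pairs listed above (2000-01-02 calories with 2000-01-04 weight):

```python
import pandas as pd

# Sample data from the table above, units stripped so the columns are numeric.
df = pd.DataFrame(
    {
        "calorie consumption": [300, 400, 800, 100, 200],  # kcal
        "weight": [60.0, 59.9, 59.8, 59.6, 59.6],  # kg
    },
    index=pd.to_datetime(
        ["2000-01-01", "2000-01-02", "2000-01-04", "2000-01-05", "2000-01-07"]
    ),
)

# Pair each row's calorie consumption with the weight in the NEXT row.
# shift(-1) moves every weight value up one row; the last row becomes NaN.
pairs = pd.DataFrame(
    {
        "calorie consumption": df["calorie consumption"],
        "next weight": df["weight"].shift(-1),  # hypothetical column name
    }
)
print(pairs)
```

The last row has no following weight, so it ends up as NaN and has to be dropped (or ignored) before the correlation is computed.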

python python3 pandas

2022-09-30 21:42

1 Answer

This article probably applies.
How to Offset Pandas Pearson Correlation with Datetime Index
Question:

I'm trying to get a correlation value for a previous week's inputs to the following week's output.
For the sake of this example I've set it up where each week's input will be the following week's Output, and the df.corr() should give a 1.0000 result.

And as a newbie, here's where I'm stuck: I don't see a shift option built into the function, and I'm not sure how to do this.

Answer:

If you call .corr on a DataFrame, it produces a correlation matrix.
In your case, you just want the correlation between two time series, which you can get with the code below. Note that the .corr method of a Series requires the parameter other, which is the series to compute the correlation with.

df["Input"].corr(df["Output"].shift(-1), method='pearson', min_periods=1)  # 1
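To see what the shift does, here is a small sketch with made-up weekly numbers (the dates and values are hypothetical) that mimics the quoted setup: each week's Input reappears as the following week's Output. The same-week correlation is not 1, while the shifted one is:

```python
import pandas as pd

# Hypothetical weekly data: each week's Input becomes the NEXT week's Output.
idx = pd.date_range("2020-01-05", periods=8, freq="W")
inputs = pd.Series([3.0, 7.0, 1.0, 9.0, 4.0, 6.0, 2.0, 8.0], index=idx)
df = pd.DataFrame({"Input": inputs, "Output": inputs.shift(1)})

# Same-week correlation: not 1, because Output lags Input by one week.
same_week = df["Input"].corr(df["Output"])

# Lagged correlation: shift(-1) lines each Input up with the next week's Output.
lagged = df["Input"].corr(df["Output"].shift(-1), method="pearson", min_periods=1)

print(same_week, lagged)
```

Series.corr aligns the two Series on their index and excludes rows where either value is missing, which is why the NaN that shift(-1) leaves in the last row does no harm here.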

If you want the correlation matrix instead, first create a DataFrame with the shifted Output and then compute the correlation:

temp_df = pd.concat([df['Input'], df['Output'].shift(-1)], axis=1).dropna()
temp_df.corr(method='pearson', min_periods=1)

#         Input  Output
# Input     1.0     1.0
# Output    1.0     1.0
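A sketch, again with hypothetical weekly numbers, checking that the matrix route and the Series.corr route give the same value. Renaming the shifted column (the label "Output (next week)" is made up) keeps the matrix labels from hiding the fact that the column is lagged:

```python
import pandas as pd

# Hypothetical weekly data: Output is Input delayed by one week.
idx = pd.date_range("2020-01-05", periods=6, freq="W")
df = pd.DataFrame({"Input": [3.0, 7.0, 1.0, 9.0, 4.0, 6.0]}, index=idx)
df["Output"] = df["Input"].shift(1)

# Build the two-column frame with a descriptive name for the shifted column.
temp_df = pd.concat(
    [df["Input"], df["Output"].shift(-1).rename("Output (next week)")], axis=1
).dropna()
matrix = temp_df.corr(method="pearson", min_periods=1)

# Both routes compute Pearson r on the same non-NaN pairs.
series_r = df["Input"].corr(df["Output"].shift(-1))

print(matrix)
print(series_r)
```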

Applying the last part of the quoted answer to your data, wouldn't it look like the following?
However, the article is two years old, so I'm not sure the details still apply.
Give it a try:

temp_df = pd.concat([df['calorie consumption'], df['weight'].shift(-1)], axis=1).dropna()
temp_df.corr(method='pearson', min_periods=1)
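Applied to the sample table from the question, a sketch might look like this. It assumes the columns hold plain numbers rather than strings such as "300 kcal" (strings would need converting first). It also shows a hedged alternative: if "next day" should mean the next calendar day rather than the next recorded row, Series.shift with freq="D" moves the index instead of the values, and dates without a next-day record drop out:

```python
import pandas as pd

# The question's sample data, units stripped (assumption: numeric columns).
df = pd.DataFrame(
    {
        "calorie consumption": [300, 400, 800, 100, 200],  # kcal
        "weight": [60.0, 59.9, 59.8, 59.6, 59.6],  # kg
    },
    index=pd.to_datetime(
        ["2000-01-01", "2000-01-02", "2000-01-04", "2000-01-05", "2000-01-07"]
    ),
)

# Row-based lag: each row's calories vs. the weight in the next recorded row.
temp_df = pd.concat([df["calorie consumption"], df["weight"].shift(-1)], axis=1).dropna()
matrix = temp_df.corr(method="pearson", min_periods=1)
r_next_row = matrix.loc["calorie consumption", "weight"]

# Calendar-based lag: shift(-1, freq="D") moves the weight INDEX back one day,
# so each date's calories meet the weight recorded exactly one day later.
calendar_pairs = pd.concat(
    [df["calorie consumption"], df["weight"].shift(-1, freq="D")], axis=1
).dropna()

print(r_next_row)
print(calendar_pairs)
```

On this five-row sample only two calendar-day pairs survive, so a correlation computed from them would not mean much; the row-based version is the one that matches the pairing in the question (2000-01-02 calories with 2000-01-04 weight).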


2022-09-30 21:42



© 2024 OneMinuteCode. All rights reserved.