I want to speed up integration and linear interpolation with pandas and NumPy

Asked 1 year ago, updated 1 year ago, 498 views

I'm writing numerical-analysis software in Python, but it is slow and I want to speed it up.

After profiling with cProfile, I found that two functions, integrate() and SciPy's interp1d, account for approximately 47% of the total processing time (total: 54.4 s, integrate: 14.7 s, interp1d: 11.2 s).
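A profile like this one can be collected with the standard-library cProfile and pstats modules. Running the whole script with python -m cProfile -s cumulative script.py (script.py standing in for your own file) also works. Below is a minimal sketch for profiling a single function call; profile_top is a hypothetical helper, not part of the original code.

```python
import cProfile
import io
import pstats


def profile_top(func, *args, n=10):
    """Run func(*args) under cProfile and return (result, report).

    The report lists the n entries with the largest cumulative time,
    which is how hot spots such as integrate() and interp1d show up.
    """
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        result = func(*args)
    finally:
        profiler.disable()
    out = io.StringIO()
    # Sort by cumulative time (time spent in the function and everything it calls)
    pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(n)
    return result, out.getvalue()
```

Wrapping the main loop in a function and passing it to profile_top gives a report similar to the one summarized above.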

Here is a simplified version of the relevant code:

#-*-coding:utf-8-*-

import numpy as np
import pandas as pd
from scipy.interpolate import interp1d


def integrate(matrix, t, start_time=0.0):
    time = np.array(matrix["time"])
    # boolean mask: samples with start_time <= time <= t
    matches = np.where(start_time <= time, True, False) * np.where(time <= t, True, False)
    matched_time = time[matches]
    # lengths of the intervals between consecutive matched samples
    dt = (matched_time - np.insert(matched_time, 0, 0)[0:len(matched_time)])[1:]
    x = np.array(matrix["vel_x"])[matches]
    y = np.array(matrix["vel_y"])[matches]
    z = np.array(matrix["vel_z"])[matches]

    # each sample is held until the next one; the last sample is held until t
    return np.array([
        np.sum(x[:len(matched_time)-1]*dt) + x[-1]*(t-matched_time[-1]),
        np.sum(y[:len(matched_time)-1]*dt) + y[-1]*(t-matched_time[-1]),
        np.sum(z[:len(matched_time)-1]*dt) + z[-1]*(t-matched_time[-1]),
    ])


if __name__ == "__main__":
    data = pd.DataFrame([[0.0, 0.0, 0.0, 0.0], [1.0, 1.0, 0.5, 1.5], [2.0, 2.0, 4.0, 3.0], [3.0, 3.0, 9.0, 10.0]], columns=["time", "vel_x", "vel_y", "vel_z"])
    print("data:\n", data)
    # data:
    #      time  vel_x  vel_y  vel_z
    #   0   0.0    0.0    0.0    0.0
    #   1   1.0    1.0    0.5    1.5
    #   2   2.0    2.0    4.0    3.0
    #   3   3.0    3.0    9.0   10.0

    for t in np.arange(0, 3, 0.0001):
        position = integrate(data, t)
        # a new interpolator is built on every iteration
        vel_x_t = interp1d(data["time"], data["vel_x"], bounds_error=False, fill_value=(0, 0))(t) if isinstance(data, pd.DataFrame) else data

        # sampling
        if t == 2.5:
            print("t:",t)
            print("integrate result(position):\n", position)
            print("interpolate result(vel_x_t):\n",vel_x_t)
            # t: 2.5
            # integrate result(position):
            #  [2.  2.5 3. ]
            # interpolate result(vel_x_t):
            #  2.5
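Two expressions in the question's integrate() can be written more directly. The product of two np.where(..., True, False) arrays is the same as combining the conditions with &, and the insert-and-slice difference is exactly np.diff. A small check on the example times:

```python
import numpy as np

time = np.array([0.0, 1.0, 2.0, 3.0])
start_time, t = 0.0, 2.5

# Mask as written in the question: product of two boolean arrays
mask_question = np.where(start_time <= time, True, False) * np.where(time <= t, True, False)
# Equivalent, more direct form
mask_direct = (start_time <= time) & (time <= t)

matched_time = time[mask_direct]
# Interval lengths as computed in the question
dt_question = (matched_time - np.insert(matched_time, 0, 0)[0:len(matched_time)])[1:]
# Equivalent: differences between consecutive samples
dt_direct = np.diff(matched_time)
```

Besides being shorter, the direct forms avoid creating the intermediate arrays that np.where and np.insert allocate on every call.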


*integrate() works as follows:
 "Given data that records times t and the state x at each of those times, it calculates the integral of x up to a given time t = t1."

I first wrote integrate() with a for loop and then with pandas apply, but both were slow, so, as shown above, I rewrote it to pull the columns out as np.array and process them directly.
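As written, integrate() is a left-rectangle (zero-order hold) integral: each velocity sample is held constant until the next sample, and the last one is held until t. Because the loop evaluates it for many increasing values of t, another option, not used in the answers, is to compute the running integral once with np.cumsum and look up each t with np.searchsorted, so that all positions come out of a single call. This is a sketch that assumes time is sorted ascending and that integration starts at the first sample (as in the example, where start_time=0.0 and time[0] is 0.0). The name integrate_all is hypothetical.

```python
import numpy as np


def integrate_all(time, vel, ts):
    """Left-rectangle (zero-order hold) integral of each velocity column,
    evaluated at every query time in ts with one set of array operations.

    time: (N,) sample times, sorted ascending; integration starts at time[0]
    vel:  (N, K) velocity samples, one column per component
    ts:   (M,) query times, each >= time[0]
    returns an (M, K) array of positions
    """
    time = np.asarray(time, dtype=float)
    vel = np.asarray(vel, dtype=float)
    ts = np.asarray(ts, dtype=float)
    dt = np.diff(time)  # (N-1,) interval lengths
    # cum[k] = integral from time[0] to time[k] = sum over i < k of vel[i] * dt[i]
    cum = np.zeros_like(vel)
    cum[1:] = np.cumsum(vel[:-1] * dt[:, None], axis=0)
    # k = index of the last sample at or before each query time
    k = np.searchsorted(time, ts, side="right") - 1
    # add the partial interval from time[k] to t, with vel[k] held constant
    return cum[k] + vel[k] * (ts - time[k])[:, None]
```

With the question's data, integrate_all(data["time"], data[["vel_x", "vel_y", "vel_z"]], np.arange(0, 3, 0.0001)) would return one row of positions per t, replacing the per-t calls to integrate(). The trade-off is memory: the result holds every position at once.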

interp1d is a simple operation, so I haven't come up with a way to fundamentally speed it up. It may be slow simply because it is called inside a loop, but I would like it to be at least a little faster.
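Both an interp1d object and np.interp accept an array of query points, so the interpolation does not have to be done one t at a time. The sketch below assumes the query times are known in advance (here the np.arange(0, 3, 0.0001) from the question) and uses the vel_y column from the example data. For linear interpolation, np.interp with left=0.0 and right=0.0 gives the same values as interp1d(..., bounds_error=False, fill_value=(0, 0)).

```python
import numpy as np
from scipy.interpolate import interp1d

time = np.array([0.0, 1.0, 2.0, 3.0])
vel_y = np.array([0.0, 0.5, 4.0, 9.0])
ts = np.arange(0, 3, 0.0001)  # all query times from the question's loop

# Build the interpolator once and evaluate it on the whole array in one call
f = interp1d(time, vel_y, bounds_error=False, fill_value=(0, 0))
vel_y_scipy = f(ts)

# Plain NumPy alternative: linear interpolation, 0 outside [time[0], time[-1]]
vel_y_numpy = np.interp(ts, time, vel_y, left=0.0, right=0.0)
```

np.interp only does linear interpolation on a 1-D grid, but for that case it skips constructing an interpolator object entirely.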

Is it possible to rewrite these so they run faster?
Thank you for your help.

python pandas numpy

2022-12-24 08:25

2 Answers

The loop repeats the same calculations on every iteration, so I moved them out of the loop. In my environment it runs 2.2 to 2.4 times faster.

import numpy as np
import pandas as pd
from scipy.interpolate import interp1d

def integrate(time, vel_x, vel_y, vel_z, t, start_time=0.0):
    # samples with start_time <= time <= t
    matches = np.logical_and(start_time <= time, time <= t)
    matched_time = time[matches]
    dt = np.diff(matched_time)
    x, y, z = vel_x[matches], vel_y[matches], vel_z[matches]
    t -= matched_time[-1]  # time elapsed since the last matched sample

    return np.array([
        np.sum(x[:-1]*dt) + x[-1]*t,
        np.sum(y[:-1]*dt) + y[-1]*t,
        np.sum(z[:-1]*dt) + z[-1]*t,
    ])

if __name__ == "__main__":
    data = pd.DataFrame([
      [0.0, 0.0, 0.0, 0.0], [1.0, 1.0, 0.5, 1.5],
      [2.0, 2.0, 4.0, 3.0], [3.0, 3.0, 9.0, 10.0]
    ], columns=["time", "vel_x", "vel_y", "vel_z"])
    print("data:\n", data)

    # build the interpolator once, outside the loop
    vel_x = interp1d(data["time"], data["vel_x"], bounds_error=False, fill_value=(0, 0))
    # rows of args: time, vel_x, vel_y, vel_z as NumPy arrays, extracted once
    args = data.values.T
    for t in np.arange(0, 3, 0.0001):
        position = integrate(*args, t)
        vel_x_t = vel_x(t)
    
        # sampling
        if t == 2.5:
            print("t:",t)
            print("integrate result(position):\n", position)
            print("interpolate result(vel_x_t):\n",vel_x_t)
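If time is sorted ascending (as in the sample data), the boolean mask can also be replaced by np.searchsorted. It finds the matching range by binary search and allows plain slices (views) instead of boolean indexing (which copies). This is a sketch under that assumption. The name integrate_sorted is hypothetical, and it should return the same values as the integrate() above.

```python
import numpy as np


def integrate_sorted(time, vel_x, vel_y, vel_z, t, start_time=0.0):
    """Same result as the boolean-mask version, assuming time is sorted ascending."""
    # lo: first index with time >= start_time; hi: one past the last index with time <= t
    lo = np.searchsorted(time, start_time, side="left")
    hi = np.searchsorted(time, t, side="right")
    matched_time = time[lo:hi]  # a slice, so no copy is made
    dt = np.diff(matched_time)
    rest = t - matched_time[-1]  # time elapsed since the last matched sample
    return np.array([
        np.sum(v[lo:hi - 1] * dt) + v[hi - 1] * rest
        for v in (vel_x, vel_y, vel_z)
    ])
```

It can be called exactly like the function above, for example integrate_sorted(*args, t).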


2022-12-24 08:59

The other answer (by metropolis) already seems to speed things up a lot, so this one looks at a different part.

Computing x, y, and z together as one array seems to make it slightly faster on Colab, but it is probably better to compare the two versions in your actual environment.

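import numpy as np

def integrate(matrix, t, start_time=0.0):
    # matrix has shape (4, N): row 0 is time, rows 1-3 are vel_x, vel_y, vel_z
    time, xyz = np.split(matrix, [1])
    time = time[0]
    matches = (start_time <= time) & (time <= t)
    matched_time = time[matches]
    # interval lengths, with the last sample held until t
    dt = np.append(np.diff(matched_time), t - matched_time[-1])
    xyz = xyz[:, matches]
    res = np.sum(xyz * dt, axis=1)
    return res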

# Caller (the rest is omitted; data and args are prepared as in the other answer)
if __name__ == "__main__":
    for t in np.arange(0, 3, 0.0001):
        position = integrate(args, t)  # pass args without '*': this version takes the whole (4, N) array
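For comparing the versions in your own environment, timeit from the standard library is enough. This sketch assumes the two functions are renamed integrate_separate (the first answer's signature) and integrate_stacked (this answer's) so both can live in one file; those names are hypothetical. With only four samples the timing mostly reflects per-call overhead, so substitute your real data for args.

```python
import timeit

import numpy as np


def integrate_separate(time, vel_x, vel_y, vel_z, t, start_time=0.0):
    # First answer: x, y, z handled as separate arrays
    matches = np.logical_and(start_time <= time, time <= t)
    matched_time = time[matches]
    dt = np.diff(matched_time)
    x, y, z = vel_x[matches], vel_y[matches], vel_z[matches]
    t -= matched_time[-1]
    return np.array([
        np.sum(x[:-1] * dt) + x[-1] * t,
        np.sum(y[:-1] * dt) + y[-1] * t,
        np.sum(z[:-1] * dt) + z[-1] * t,
    ])


def integrate_stacked(matrix, t, start_time=0.0):
    # This answer: x, y, z processed together as one (3, M) array
    time, xyz = np.split(matrix, [1])
    time = time[0]
    matches = (start_time <= time) & (time <= t)
    matched_time = time[matches]
    dt = np.append(np.diff(matched_time), t - matched_time[-1])
    return np.sum(xyz[:, matches] * dt, axis=1)


args = np.array([
    [0.0, 1.0, 2.0, 3.0],   # time
    [0.0, 1.0, 2.0, 3.0],   # vel_x
    [0.0, 0.5, 4.0, 9.0],   # vel_y
    [0.0, 1.5, 3.0, 10.0],  # vel_z
])

n = 5_000
t_sep = timeit.timeit(lambda: integrate_separate(*args, 2.5), number=n)
t_stk = timeit.timeit(lambda: integrate_stacked(args, 2.5), number=n)
print(f"separate: {t_sep:.3f} s, stacked: {t_stk:.3f} s for {n} calls")
```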


2022-12-24 09:27



© 2024 OneMinuteCode. All rights reserved.