This is the result of my own trial and error at speeding this up, but I am embarrassed to say there is still room for improvement.
Could you tell me how to make it faster?
For reference, there are actually 17 nested loops (ab, cde, ...), and since most of the variables range over 3 values, I end up evaluating roughly 40 million combinations.
for ab in range(3):
    for cde in range(2):
        for fg in range(3):
            for hi in range(3):
                Return = np.r_[Return_AB[ab],    # Return_AB contains (1, 41) np.array
                               Return_CDE[cde],  # Return_CDE contains (1, 41) np.array
                               Return_FG[fg],    # Return_FG contains (1, 41) np.array
                               Return_HI[hi]]    # Return_HI contains (1, 41) np.array
                Return_total = np.sum(Return, axis=0)
                Return_dif = Return_total - BM   # BM is a (1, 41) DataFrame
                Num0 = max(Num0_AB[ab], Num0_CDE[cde], Num0_FG[fg], Num0_HI[hi])  # values from 4 to 8
                Win_Pro = (Return_dif.iloc[:, Num0:] > 0).sum(axis=1) / (Number_Date - Num0)
                if Win_Pro.item() < 1:
                    continue
                Cum_return = np.prod(Return_dif.iloc[:, Num0:] + 1, axis=1) - 1
                if Cum_return.item() < 0.1:
                    continue
                TE = Return_dif.iloc[:, Num0:].std(axis=1)
                Result.append([Win_Pro.item(), Cum_return.item(), TE.item(), Num0, ab, cde, fg, hi])
In the case of this question, replacing the loops with NumPy or pandas vectorized calculations would require a lot of memory, so it is better to keep the loops as they are and use Numba or Cython instead.
Numba is easy to use, so why not try Numba first?
import numba

@numba.jit
def calc():
    NMAX = 10000000  # large enough that the result arrays cannot overflow
    Win_Pro = np.zeros(NMAX)
    Cum_return = np.zeros(NMAX)
    TE = np.zeros(NMAX)
    N = np.zeros(NMAX, dtype=int)
    A = np.zeros((NMAX, 4), dtype=int)
    n = 0
    for ab in range(3):
        for cde in range(2):
            for fg in range(3):
                for hi in range(3):
                    Return_total = (Return_AB[ab]      # Return_AB contains (41,) np.array
                                    + Return_CDE[cde]  # Return_CDE contains (41,) np.array
                                    + Return_FG[fg]    # Return_FG contains (41,) np.array
                                    + Return_HI[hi])   # Return_HI contains (41,) np.array
                    Return_dif = Return_total - BM  # BM converted to a (41,) np.array; a pandas DataFrame or Series can be converted with df.values
                    Num0 = max(Num0_AB[ab], Num0_CDE[cde], Num0_FG[fg], Num0_HI[hi])  # values from 4 to 8
                    Win_Pro[n] = (Return_dif[Num0:] > 0).sum() / (Number_Date - Num0)
                    if Win_Pro[n] < 1:
                        continue
                    Cum_return[n] = np.prod(Return_dif[Num0:] + 1) - 1
                    if Cum_return[n] < 0.1:
                        continue
                    TE[n] = Return_dif[Num0:].std()
                    N[n] = Num0
                    A[n] = np.array([ab, cde, fg, hi])
                    n += 1
    return Win_Pro[:n], Cum_return[:n], TE[:n], N[:n], A[:n, :]
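One caveat: Numba's fast nopython mode only understands NumPy arrays, not pandas objects, so convert the inputs with .values before the loop. As a minimal sketch of the per-combination scoring that the jitted loop performs (function and variable names here are hypothetical, not from the original code), in plain NumPy:

```python
import numpy as np

def evaluate(return_dif, num0, number_date):
    """Score one parameter combination on a (41,)-style excess-return array."""
    tail = return_dif[num0:]                        # drop the first num0 observations
    win_pro = (tail > 0).sum() / (number_date - num0)
    cum_return = np.prod(tail + 1) - 1              # compounded excess return
    te = tail.std()                                 # tracking error
    return win_pro, cum_return, te

# toy excess-return series standing in for Return_total - BM
rd = np.array([0.0, -0.1, 0.1, 0.2, 0.1, 0.05])
wp, cr, te = evaluate(rd, 2, 6)  # wp == 1.0: every kept observation is positive
```

Because the body uses only array slicing and NumPy reductions, the same function decorated with @numba.njit compiles cleanly, whereas .iloc would not.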
Return = np.r_[...] concatenates ndarrays, but it is faster to skip the concatenation and simply add the arrays together directly.
Also, Result was a list of lists; since list processing is slow, I changed it to preallocated ndarrays. Converting the results to a pandas DataFrame afterwards (e.g. with pd.concat) is useful for later analysis.
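To illustrate that last step, here is a minimal sketch of assembling the arrays returned by calc() into one DataFrame with pd.concat (the small arrays and column names are hypothetical placeholders):

```python
import numpy as np
import pandas as pd

# stand-ins for the arrays calc() would return
win_pro = np.array([1.0, 1.0])
cum_ret = np.array([0.15, 0.22])
te = np.array([0.01, 0.02])
n0 = np.array([4, 5])
a = np.array([[0, 1, 2, 0],
              [1, 0, 2, 1]])  # loop indices ab, cde, fg, hi per result

df = pd.concat(
    [pd.DataFrame({"Win_Pro": win_pro, "Cum_return": cum_ret,
                   "TE": te, "Num0": n0}),
     pd.DataFrame(a, columns=["ab", "cde", "fg", "hi"])],
    axis=1)  # one row per surviving parameter combination
```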
If Numba is still too slow, you can use Cython. The following official documentation will help you get started with Cython:
·Cython: Working with NumPy
·Pandas: Enhancing Performance
Cython is not that difficult, but declaring the types of variables takes some time. For example, if you are using Jupyter Notebook, first load Cython's magic function:
%load_ext Cython
The following code will work for now:
%%cython
def calc(Return_AB, Return_CDE, Return_FG, Return_HI, Num0_AB, Num0_CDE, Num0_FG, Num0_HI, BM):
    # hereinafter abbreviated
Declaring the types of the variables will speed up the processing further.
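As a sketch of what those type declarations look like (the function name and signature are hypothetical; this must be run in a %%cython cell or compiled as a .pyx file, so it is Cython rather than plain Python):

```cython
%%cython
import numpy as np
cimport numpy as cnp

def win_rate(cnp.ndarray[cnp.float64_t, ndim=1] return_dif,
             int num0, int number_date):
    cdef int i           # C-typed loop index: the loop compiles to plain C
    cdef int wins = 0
    for i in range(num0, return_dif.shape[0]):
        if return_dif[i] > 0:
            wins += 1
    return wins / <double>(number_date - num0)
```

The typed buffer declaration (cnp.ndarray[cnp.float64_t, ndim=1]) and the cdef int counters are what let Cython skip Python-object overhead inside the loop.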