Understanding How to Give Indexes When Converting a Numpy Array to Dataframe

Asked 1 year ago, Updated 1 year ago, 63 views

When converting an n-dimensional numpy array to a DataFrame, how can I build the DataFrame so that the first column holds the index along the first dimension, the second column holds the index along the second dimension, and so on up to the n-th column for the n-th dimension, with column n+1 holding the array element itself?

Up to now, I have prepared a 2D numpy array with m1*m2*...*mn rows and n+1 columns, filled it using n nested for loops, and converted it to a DataFrame. Once the array reaches about 10 dimensions, however, the indentation becomes too deep and the approach gets tedious.

mynp = np.zeros((m1*m2*m3*...*mn, n+1))

for l1 in range(m1):
    for l2 in range(m2):
        for l3 in range(m3):
            ……
                for ln in range(mn):
                    # flat row number for the index (l1, l2, ..., ln)
                    row = l1*m2*m3*...*mn + l2*m3*m4*...*mn + ... + ln
                    mynp[row] = [l1, l2, ..., ln, x[l1, l2, ..., ln]]

mypd=pd.DataFrame(mynp)

python pandas numpy

2022-09-30 21:42

2 Answers

By using numpy.indices(), I think we can implement it simply without loops as follows:

idx = np.indices(arr.shape).reshape(arr.ndim, -1)
ret = np.vstack([idx, arr.reshape(-1)]).T

https://docs.scipy.org/doc/numpy/reference/generated/numpy.indices.html
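To see why the two lines above work, it may help to look at what np.indices returns for a tiny shape. The following is a minimal sketch; the shape (2, 3) is chosen only for illustration and does not come from the question.

```python
import numpy as np

# np.indices((2, 3)) has shape (2, 2, 3): one grid of indices per dimension.
grid = np.indices((2, 3))

# Flattening each grid gives one row per dimension, one column per element,
# in the same (row-major) order that arr.reshape(-1) uses.
idx = grid.reshape(2, -1)
print(idx)
# [[0 0 0 1 1 1]
#  [0 1 2 0 1 2]]
```

Each column of idx is the full index of one element, so stacking the flattened values underneath and transposing yields one row per element.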

Below is a working example:

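import numpy as np
import pandas as pd

arr = np.arange(5*4*3*2).reshape(5, 4, 3, 2)
# one row of indices per dimension, one column per element
idx = np.indices(arr.shape).reshape(arr.ndim, -1)
# append the flattened values as the last row, then transpose
ret = np.vstack([idx, arr.reshape(-1)]).T
df = pd.DataFrame(ret)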
#     0  1  2  3    4
#0    0  0  0  0    0
#1    0  0  0  1    1
#2    0  0  1  0    2
#3    0  0  1  1    3
#4    0  0  2  0    4
#..  .. .. .. ..  ...
#115  4  3  0  1  115
#116  4  3  1  0  116
#117  4  3  1  1  117
#118  4  3  2  0  118
#119  4  3  2  1  119
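One thing to watch for: np.vstack puts the indices and the values into a single array, so they share one dtype. With a float array the index columns come out as floats as well. A possible variation, sketched below, builds the DataFrame column by column so each column keeps its own dtype. The function name array_to_long_frame and the column names dim0, dim1, ..., value are hypothetical and chosen only for this example.

```python
import numpy as np
import pandas as pd

def array_to_long_frame(arr):
    """Return one row per element: n index columns plus a value column.

    Hypothetical helper; column names are illustrative only.
    """
    idx = np.indices(arr.shape).reshape(arr.ndim, -1)
    # Build columns separately so integer indices are not upcast to float.
    data = {f"dim{k}": idx[k] for k in range(arr.ndim)}
    data["value"] = arr.reshape(-1)
    return pd.DataFrame(data)

arr = np.arange(24, dtype=float).reshape(2, 3, 4)
df = array_to_long_frame(arr)
print(df.dtypes)
print(df.head())
```

The row order is the same as in the vstack version, so the two results should differ only in column names and dtypes.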


2022-09-30 21:42


Take the direct (Cartesian) product of the index ranges, turn each resulting index tuple into a list, and append the corresponding array element to the end. The code below then adds each such list to the DataFrame as a row.
To change the number of dimensions, set the array's shape in the shape variable.

import itertools
import numpy as np
import pandas as pd

shape = (2, 3, 3)
a_3d = np.arange(18).reshape(*shape)
# index range for each dimension
ll = [tuple(range(i)) for i in shape]
df = pd.DataFrame(columns=list(range(len(shape) + 1)))
for index, index_product in enumerate(itertools.product(*ll)):
    row = list(index_product)
    # append the element at this index as the last column
    row.append(a_3d[index_product])
    df.loc[index] = row

print(df)

The sample input array (a_3d) in this code is:

[[[ 0  1  2]
  [ 3  4  5]
  [ 6  7  8]]

 [[ 9 10 11]
  [12 13 14]
  [15 16 17]]]

The output is as follows:

    0  1  2   3
0   0  0  0   0
1   0  0  1   1
2   0  0  2   2
3   0  1  0   3
4   0  1  1   4
5   0  1  2   5
6   0  2  0   6
7   0  2  1   7
8   0  2  2   8
9   1  0  0   9
10  1  0  1  10
11  1  0  2  11
12  1  1  0  12
13  1  1  1  13
14  1  1  2  14
15  1  2  0  15
16  1  2  1  16
17  1  2  2  17
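Adding rows one at a time with df.loc can get slow as the array grows, because the DataFrame is enlarged on every iteration. A possible variation, sketched here under the same shape and array as above, collects all rows in a list first and builds the DataFrame once. This is an alternative arrangement of the same itertools.product idea, not the answerer's original code.

```python
import itertools
import numpy as np
import pandas as pd

shape = (2, 3, 3)
a_3d = np.arange(18).reshape(shape)

# One tuple per element: the n indices followed by the element itself.
rows = [
    (*ix, a_3d[ix])
    for ix in itertools.product(*(range(n) for n in shape))
]

# Build the DataFrame in a single step instead of row by row.
df = pd.DataFrame(rows, columns=range(len(shape) + 1))
print(df)
```

The rows come out in the same order as in the loop version, since itertools.product varies the last index fastest.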


2022-09-30 21:42



© 2024 OneMinuteCode. All rights reserved.