"Data must be 1-dimensional" error occurs in pandas.

Asked 2 years ago, Updated 2 years ago, 68 views

array([[-1,  0,  1,  1,  1,  0,  2,  3,  4,  5,  6,  7,  8,  7,  0],
       [-1,  9, 10, 11, 12, 11, 13, 11, 14, 11, 15, 12, 16, 17, 18],
       [-1,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0],
       [-1,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0],
       [-1,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0],
       [-1, 19, 20, 19, 21, 22, 23, 23, 22, 24, 22,  0,  0,  0,  0]])

When I try to build a pandas DataFrame from the array above, I get the following error.
Why does it occur? By the way, if I assign the same values as a plain list, as in A = [values in the same form as above], no error occurs.

Execution Code

static = pd.DataFrame({
    "label": labels,
    "feature1": features,
})

Error Contents

---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
<ipython-input-28-8de74a8845ff> in <module>
      5 static = pd.DataFrame({
      6     "label": labels,
----> 7     "feature1": features,
      8 })
      9 

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\frame.py in __init__(self, data, index, columns, dtype, copy)
    409             )
    410         elif isinstance(data, dict):
--> 411             mgr = init_dict(data, index, columns, dtype=dtype)
    412         elif isinstance(data, ma.MaskedArray):
    413             import numpy.ma.mrecords as mrecords

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\init_dict(data, index, columns, dtype)
    255             arr if not is_datetime64tz_dtype(arr) else arr.copy() for arr in arrays
    256         ]
--> 257     return arrays_to_mgr(arrays, data_names, index, columns, dtype=dtype)
    258 
    259 

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\internals\construction.py in arrays_to_mgr(arrays, arr_names, index, columns, dtype)
     80 
     81     # don't force copy because getting jammed in an ndarray anyway
---> 82     arrays = _homogenize(arrays, index, dtype)
     83 
     84     # from BlockManager perspective

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\internals\construction.py in _homogenize(data, index, dtype)
    321                 val = lib.fast_multiget(val, oindex.values, default=np.nan)
    322             val = sanitize_array(
--> 323                 val, index, dtype=dtype, copy=False, raise_cast_failure=False
    324             )
    325 

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\internals\construction.py in sanitize_array(data, index, dtype, copy, raise_cast_failure)
    727     elif subarr.ndim > 1:
    728         if isinstance(data, np.ndarray):
--> 729             raise Exception("Data must be 1-dimensional")
    730         else:
    731             subarr = com.asarray_tuplesafe(data, dtype=dtype)

Exception: Data must be 1-dimensional

python python3 pandas numpy

2022-09-30 10:29

2 Answers

This happens because, as the error message says, each value passed in the dict (labels and features here) must be one-dimensional array data.
Reference: pandas.DataFrame

Examples
Constructing DataFrame from dictionary.

>>> d = {'col1': [1, 2], 'col2': [3, 4]}
>>> df = pd.DataFrame(data=d)
>>> df
   col1  col2
0     1     3
1     2     4
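
To see what the error message is referring to, it can help to check the number of dimensions of the value before passing it in. The sketch below uses a small made-up array as a stand-in for the `features` in the question (the real data is not shown in full), and catches a generic `Exception` because the exact exception class and message depend on the pandas version.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the 2-D `features` array from the question.
features = np.array([[-1, 0, 1],
                     [-1, 9, 10]])
print(features.ndim)  # 2, so it cannot serve as a single column

try:
    pd.DataFrame({"feature1": features})
    failed = False
except Exception as exc:  # class and message vary between pandas versions
    failed = True
    print(type(exc).__name__, exc)

# A single row of the array is 1-dimensional and is accepted as a column.
ok = pd.DataFrame({"feature1": features[1]})
print(ok)
```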

For example, the following works without error.

static = pd.DataFrame({
    "label": ["labels"],
    "feature1": ["features"],
})

The result looks like this.

>>> static
    label  feature1
0  labels  features

Alternatively, define labels and features as variables holding one-dimensional array data.

labels = [-1, 0, 1, 1, 1, 0, 2, 3, 4, 5, 6, 7, 8, 7, 0]
features = [-1, 9, 10, 11, 12, 11, 13, 11, 14, 11, 15, 12, 16, 17, 18]

static = pd.DataFrame({
    "label": labels,
    "feature1": features,
})

The result looks like this.

>>> static
    label  feature1
0      -1        -1
1       0         9
2       1        10
3       1        11
4       1        12
5       0        11
6       2        13
7       3        11
8       4        14
9       5        11
10      6        15
11      7        12
12      8        16
13      7        17
14      0        18
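
If the values actually live in one 2-D NumPy array, a possible way to get such 1-D columns is to index individual rows. This is only a sketch: the name `data` and the assumption that the first row holds the labels and the second the features are mine, not something stated in the question.

```python
import numpy as np
import pandas as pd

# Hypothetical 2-D array whose two rows mirror the values used above.
data = np.array([
    [-1, 0, 1, 1, 1, 0, 2, 3, 4, 5, 6, 7, 8, 7, 0],
    [-1, 9, 10, 11, 12, 11, 13, 11, 14, 11, 15, 12, 16, 17, 18],
])

# Indexing a single row of a 2-D array yields a 1-D array.
labels = data[0]
features = data[1]

static = pd.DataFrame({"label": labels, "feature1": features})
print(static.shape)  # (15, 2)
```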

By A = ..., do you mean something of this form?

A = [[-1, 0, 1, 1, 1, 0, 2, 3, 4, 5, 6, 7, 8, 7, 0],
       [-1,  9, 10, 11, 12, 11, 13, 11, 14, 11, 15, 12, 16, 17, 18],
       [-1,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0],
       [-1,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0],
       [-1,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0],
       [-1, 19, 20, 19, 21, 22, 23, 23, 22, 24, 22,  0,  0,  0,  0]]
static = pd.DataFrame(A)
>>> static
   0   1   2   3   4   5   6   7   8   9  10  11  12  13  14
0 -1   0   1   1   1   0   2   3   4   5   6   7   8   7   0
1 -1   9  10  11  12  11  13  11  14  11  15  12  16  17  18
2 -1   0   0   0   0   0   0   0   0   0   0   0   0   0   0
3 -1   0   0   0   0   0   0   0   0   0   0   0   0   0   0
4 -1   0   0   0   0   0   0   0   0   0   0   0   0   0   0
5 -1  19  20  19  21  22  23  23  22  24  22   0   0   0   0
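
As far as I know, a 2-D NumPy array can be passed to pd.DataFrame directly in the same way as the nested list, as long as it is not wrapped in a dict. A minimal sketch with a shortened, made-up version of A (the column names in the last step are assumptions for illustration):

```python
import numpy as np
import pandas as pd

A = [[-1, 0, 1],
     [-1, 9, 10]]        # shortened, hypothetical version of A
arr = np.array(A)

from_list = pd.DataFrame(A)
from_array = pd.DataFrame(arr)

# Both frames have the same shape and hold the same values.
same_values = bool((from_list.values == from_array.values).all())
print(from_list.shape, from_array.shape, same_values)

# Transposing turns each original row into a named column.
by_column = pd.DataFrame(arr.T, columns=["label", "feature1"])
print(by_column)
```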

Perhaps what you really want is a DataFrame whose columns (horizontal axis) are named by labels and whose rows (vertical axis) are named by features, or the other way around?
In that case, it would be written as follows.

static = pd.DataFrame(A,
    columns=labels,
    index=features
)

or

static = pd.DataFrame(A,
    columns=features,
    index=labels
)
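
One thing to keep in mind with this form: columns needs one name per column of the data and index needs one name per row, otherwise pandas raises a ValueError. The sketch below uses a zero-filled placeholder with the same 6 x 15 shape as A; `row_names` and `col_names` are hypothetical names introduced only for illustration.

```python
import numpy as np
import pandas as pd

A = np.zeros((6, 15), dtype=int)             # placeholder with the shape of A
row_names = [f"row_{i}" for i in range(6)]   # hypothetical, one name per row
col_names = [f"col_{j}" for j in range(15)]  # hypothetical, one name per column

static = pd.DataFrame(A, index=row_names, columns=col_names)
print(static.shape)  # (6, 15)

# Swapping the two lists no longer matches the shape of the data.
try:
    pd.DataFrame(A, index=col_names, columns=row_names)
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True
print(mismatch_rejected)
```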

Reference:
Pandas.DataFrame Structure and How to Create It

import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(3, 4),
                  columns=['col_0', 'col_1', 'col_2', 'col_3'],
                  index=['row_0', 'row_1', 'row_2'])

print(df)
#        col_0  col_1  col_2  col_3
# row_0      0      1      2      3
# row_1      4      5      6      7
# row_2      8      9     10     11

Regarding the follow-up question:
For example,

static = pd.DataFrame(A)

and

static = pd.DataFrame(features)

would be treated the same way.

If you want to treat features as data and labels as column names:

labels = ['C0', 'C1', 'C2', 'C3', 'C4', 'C5']
features = [[1, 2, 3, 4, 5, 6],
            [7, 8, 9, 10, 11, 12],
            [13, 14, 15, 16, 17, 18],
            [19, 20 ,21, 22, 23, 24],
            [25, 26, 27, 28, 29, 30]]

static=pd.DataFrame(features,
    columns=labels
)

Here's the result:

>>> static
   C0  C1  C2  C3  C4  C5
0   1   2   3   4   5   6
1   7   8   9  10  11  12
2  13  14  15  16  17  18
3  19  20  21  22  23  24
4  25  26  27  28  29  30

However, @metropolis's comment seems to suggest something slightly different.
It produces the result below, but is this the shape you want?
Or would it be awkward to work with?

>>> static = pd.DataFrame({'feature1': iter(features)})
>>> static
                   feature1
0        [1, 2, 3, 4, 5, 6]
1     [7, 8, 9, 10, 11, 12]
2  [13, 14, 15, 16, 17, 18]
3  [19, 20, 21, 22, 23, 24]
4  [25, 26, 27, 28, 29, 30]
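
On whether this shape is awkward to use: a column whose cells are lists is stored with object dtype, so the individual numbers are not directly available as columns. If that becomes a problem, one possible way back to a regular table is to expand the lists again. A sketch, using only the first two rows of the example and a plain list instead of iter(features):

```python
import pandas as pd

features = [[1, 2, 3, 4, 5, 6],
            [7, 8, 9, 10, 11, 12]]   # first two rows of the example above

nested = pd.DataFrame({"feature1": features})
print(nested.shape)                   # (2, 1): one column holding lists
print(nested["feature1"].dtype)       # object

# Expand the lists back into one numeric column per element.
expanded = pd.DataFrame(nested["feature1"].tolist(), index=nested.index)
print(expanded.shape)                 # (2, 6)
```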


2022-09-30 10:29

What is the difference between the two forms of features below?

array([[-1,  0,  1,  1,  1,  0,  2,  3,  4,  5,  6,  7,  8,  7,  0],
       [-1,  9, 10, 11, 12, 11, 13, 11, 14, 11, 15, 12, 16, 17, 18],
       [-1,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0],
       [-1,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0],
       [-1,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0],
       [-1, 19, 20, 19, 21, 22, 23, 23, 22, 24, 22,  0,  0,  0,  0]])
features = [[1, 2, 3, 4, 5, 6],
            [7, 8, 9, 10, 11, 12],
            [13, 14, 15, 16, 17, 18],
            [19, 20, 21, 22, 23, 24],
            [25, 26, 27, 28, 29, 30]]
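
One difference that can be checked directly: the first form is a NumPy ndarray (its printed form starts with "array("), while the second is a plain Python list of lists. Inside a dict passed to pd.DataFrame the two appear to be handled differently, which would explain why the list version gives no error. A small sketch with made-up values; the generic `Exception` is caught because the exception class differs between pandas versions.

```python
import numpy as np
import pandas as pd

as_list = [[1, 2, 3], [4, 5, 6]]    # plain Python list of lists
as_array = np.array(as_list)        # NumPy ndarray with ndim == 2

print(type(as_list).__name__, type(as_array).__name__)  # list ndarray

# Inside a dict, the list is taken as a 1-D sequence of two list objects.
from_list = pd.DataFrame({"feature1": as_list})
print(from_list.shape)  # (2, 1)

# The 2-D ndarray, by contrast, is rejected as a single column.
try:
    pd.DataFrame({"feature1": as_array})
    array_rejected = False
except Exception:  # exception class differs between pandas versions
    array_rejected = True
print(array_rejected)
```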


2022-09-30 10:29



© 2024 OneMinuteCode. All rights reserved.