array([[-1, 0, 1, 1, 1, 0, 2, 3, 4, 5, 6, 7, 8, 7, 0],
[-1, 9, 10, 11, 12, 11, 13, 11, 14, 11, 15, 12, 16, 17, 18],
[-1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[-1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[-1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[-1, 19, 20, 19, 21, 22, 23, 23, 22, 24, 22, 0, 0, 0, 0]])
When I try to build a pandas DataFrame from the array above, I get the error below.
Why does it occur? Incidentally, if I run A = [values in the same form as above], no error occurs.
Execution Code
static = pd.DataFrame({
    "label": labels,
    "feature1": features,
})
Error Contents
---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
<ipython-input-28-8de74a8845ff> in <module>
      5 static = pd.DataFrame({
      6     "label": labels,
----> 7     "feature1": features,
      8 })
      9
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\frame.py in __init__(self, data, index, columns, dtype, copy)
    409             )
    410         elif isinstance(data, dict):
--> 411             mgr = init_dict(data, index, columns, dtype=dtype)
    412         elif isinstance(data, ma.MaskedArray):
    413             import numpy.ma.mrecords as mrecords
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\init_dict(data,index,columns,dtype)
    255             arr if not is_datetime64tz_dtype(arr) else arr.copy() for arr in arrays
    256         ]
--> 257     return arrays_to_mgr(arrays, data_names, index, columns, dtype=dtype)
    258
    259
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\internals\construction.py in arrays_to_mgr(arrays, arr_names, index, columns, dtype)
     80
     81     # don't force copy because getting jammed in an ndarray anyway
---> 82     arrays = _homogenize(arrays, index, dtype)
     83
     84     # from BlockManager perspective
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\internals\construction.py in _homogenize(data, index, dtype)
    321                 val = lib.fast_multiget(val, oindex.values, default=np.nan)
    322             val = sanitize_array(
--> 323                 val, index, dtype=dtype, copy=False, raise_cast_failure=False
    324             )
    325
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\internals\construction.py in sanitize_array(data, index, dtype, copy, raise_cast_failure)
    727     elif subarr.ndim > 1:
    728         if isinstance(data, np.ndarray):
--> 729             raise Exception("Data must be 1-dimensional")
    730         else:
    731             subarr = com.asarray_tuplesafe(data, dtype=dtype)
Exception: Data must be 1-dimensional
As the error message indicates, this happens because labels and features each need to be one-dimensional array data when they are passed as dictionary values.
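A minimal sketch of this check, assuming features is a 2-D NumPy array like the one in the question. The small arrays below are made up for illustration. It prints the number of dimensions, shows that a 2-D array is rejected as a dictionary value, and shows that a single row (which is 1-D) is accepted. The exception class and message differ between pandas versions, so the sketch simply catches Exception.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the question's data: 1-D labels, 2-D features.
labels = np.array([-1, 0, 1])
features = np.array([[-1, 9, 10],
                     [-1, 0, 0]])

print(labels.ndim, features.ndim)  # number of dimensions of each array

# A 2-D array used as a dictionary value is rejected.
try:
    pd.DataFrame({"label": labels, "feature1": features})
    raised = False
except Exception as exc:  # class and message vary by pandas version
    raised = True
    print(type(exc).__name__, exc)

# One row of features is 1-D, so it can be used as a column.
static = pd.DataFrame({"label": labels, "feature1": features[0]})
print(static)
```

Checking ndim before building the DataFrame is usually the quickest way to see which value is causing the error.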
pandas.DataFrame
Examples
Constructing DataFrame from dictionary.
>>> d = {'col1': [1, 2], 'col2': [3, 4]}
>>> df = pd.DataFrame(data=d)
>>> df
   col1  col2
0     1     3
1     2     4
For example, the following runs without an error.
static = pd.DataFrame({
    "label": ["labels"],
    "feature1": "features",
})
It looks like this.
>>> static
    label  feature1
0  labels  features
Alternatively, define labels and features as variables that hold one-dimensional array data.
labels = [-1, 0, 1, 1, 1, 0, 2, 3, 4, 5, 6, 7, 8, 7, 0]
features = [-1, 9, 10, 11, 12, 11, 13, 11, 14, 11, 15, 12, 16, 17, 18]
static = pd.DataFrame({
    "label": labels,
    "feature1": features,
})
The result looks like this.
>>> static
label feature1
0 -1 -1
1 0 9
2 1 10
3 1 11
4 1 12
5 0 11
6 2 13
7 3 11
8 4 14
9 5 11
10 6 15
11 7 12
12 8 16
13 7 17
14 0 18
Regarding A =, is it in the following form?
A = [[-1, 0, 1, 1, 1, 0, 2, 3, 4, 5, 6, 7, 8, 7, 0],
[-1, 9, 10, 11, 12, 11, 13, 11, 14, 11, 15, 12, 16, 17, 18],
[-1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[-1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[-1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[-1, 19, 20, 19, 21, 22, 23, 23, 22, 24, 22, 0, 0, 0, 0]]
static=pd.DataFrame(A)
>>> static
    0   1   2   3   4   5   6   7   8   9  10  11  12  13  14
0  -1   0   1   1   1   0   2   3   4   5   6   7   8   7   0
1  -1   9  10  11  12  11  13  11  14  11  15  12  16  17  18
2  -1   0   0   0   0   0   0   0   0   0   0   0   0   0   0
3  -1   0   0   0   0   0   0   0   0   0   0   0   0   0   0
4  -1   0   0   0   0   0   0   0   0   0   0   0   0   0   0
5  -1  19  20  19  21  22  23  23  22  24  22   0   0   0   0
Perhaps what you really want is a DataFrame whose columns (horizontal axis) are named by labels and whose rows (vertical axis) are named by features, or the other way around?
In that case, write it as follows.
static = pd.DataFrame(A,
    columns=labels,
    index=features
)
or
static = pd.DataFrame(A,
    columns=features,
    index=labels
)
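A small sketch of the columns/index form, using a made-up 2 x 3 table in place of A and hypothetical names (col_names, row_names) that are not from the question. The number of column names has to equal the number of columns in the data, and the number of index names has to equal the number of rows; otherwise pandas raises an error.

```python
import pandas as pd

# Made-up 2 x 3 table standing in for A.
A = [[-1, 0, 1],
     [-1, 9, 10]]
col_names = ["c0", "c1", "c2"]  # hypothetical: one name per column (3)
row_names = ["r0", "r1"]        # hypothetical: one name per row (2)

static = pd.DataFrame(A, columns=col_names, index=row_names)
print(static)

# Values can then be looked up by row name and column name.
print(static.loc["r1", "c2"])
```

For the 6 x 15 data in the question, this would mean 15 column names and 6 row names.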
Reference:
pandas.DataFrame Structure and How to Create It
df = pd.DataFrame(np.arange(12).reshape(3, 4),
                  columns=['col_0', 'col_1', 'col_2', 'col_3'],
                  index=['row_0', 'row_1', 'row_2'])
print(df)
#        col_0  col_1  col_2  col_3
# row_0      0      1      2      3
# row_1      4      5      6      7
# row_2      8      9     10     11
Regarding the additional questions
For example,
static = pd.DataFrame(A)
and
static = pd.DataFrame(features)
are handled the same way.
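A quick way to confirm this, sketched with small made-up values: build one DataFrame from a nested list and one from a 2-D NumPy array holding the same numbers, then compare shapes and values.

```python
import numpy as np
import pandas as pd

A = [[1, 2, 3],
     [4, 5, 6]]          # nested Python list
features = np.array(A)   # the same values as a 2-D NumPy array

df_list = pd.DataFrame(A)
df_array = pd.DataFrame(features)

# Same shape and same values either way.
print(df_list.shape, df_array.shape)
print(np.array_equal(df_list.values, df_array.values))
```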
If you want to treat features
as data and labels
as column names:
labels=['C0', 'C1', 'C2', 'C3', 'C4', 'C5']
features = [[1, 2, 3, 4, 5, 6],
[7, 8, 9, 10, 11, 12],
[13, 14, 15, 16, 17, 18],
[19, 20, 21, 22, 23, 24],
[25, 26, 27, 28, 29, 30]]
static=pd.DataFrame(features,
columns=labels
)
Here's the result:
>>> static
C0 C1 C2 C3 C4 C5
0 1 2 3 4 5 6
1 7 8 9 10 11 12
2 13 14 15 16 17 18
3 19 20 21 22 23 24
4 25 26 27 28 29 30
However, @metropolis's comment seems to mean something slightly different.
That approach gives the result below. Is this the form you want?
Or would it be hard to work with?
>>> static = pd.DataFrame({'feature1': iter(features)})
>>> static
feature1
0 [1, 2, 3, 4, 5, 6]
1 [7, 8, 9, 10, 11, 12]
2 [13, 14, 15, 16, 17, 18]
3 [19, 20, 21, 22, 23, 24]
4 [25, 26, 27, 28, 29, 30]
What is the difference between the two features below?
array([[-1, 0, 1, 1, 1, 0, 2, 3, 4, 5, 6, 7, 8, 7, 0],
[-1, 9, 10, 11, 12, 11, 13, 11, 14, 11, 15, 12, 16, 17, 18],
[-1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[-1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[-1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[-1, 19, 20, 19, 21, 22, 23, 23, 22, 24, 22, 0, 0, 0, 0]])
features = [[1, 2, 3, 4, 5, 6], \
[7, 8, 9, 10, 11, 12],\
[13, 14, 15, 16, 17, 18],\
[19, 20 ,21, 22, 23, 24],\
[25, 26, 27, 28, 29, 30]]
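A rough sketch of the difference, using small made-up values. The first form is a NumPy ndarray, which carries a shape and a number of dimensions (2 here). The second is a plain Python list whose elements happen to be lists. As a dictionary value, pandas accepts the nested list as a one-dimensional sequence of list objects, so converting the array with tolist() gives the same kind of one-column result as the iter(features) example.

```python
import numpy as np
import pandas as pd

arr = np.array([[1, 2, 3],
                [4, 5, 6]])  # 2-D ndarray
nested = [[1, 2, 3],
          [4, 5, 6]]         # list of lists

print(type(arr).__name__, arr.ndim, arr.shape)
print(type(nested).__name__, len(nested))

# A nested list is accepted as one column whose cells are lists.
from_list = pd.DataFrame({"feature1": nested})

# tolist() turns the ndarray into the same nested-list form.
from_array = pd.DataFrame({"feature1": arr.tolist()})
print(from_array)
```

Whether a column of lists is convenient depends on what you do next; one value per cell, as in pd.DataFrame(features, columns=labels), is generally easier to filter and aggregate.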