I would like to check if the data in the column column
in Excel is numeric or non-numeric.I try to check using isdigit
, but df
is not considered a string. How do I determine the variable df
that contains Excel data read by pandas
as a string?
As a condition, I'd like to take out the data one by one and check if it's a number, so it's better not to check them all together.
It doesn't make any sense to use pandas
to read Excel files.
I would appreciate it if you could let me know if there is any other good way.
import pandas as pd
df=pd.read_excel("test.xlsx", sheet_name="sheet1")
for df_chindf ["column"]:
if df_ch.isdigit():
print ("Number")
else:
print ("Non-numeric")
Excel data
import pandas as pd
df=pd.read_excel("test.xlsx", sheet_name="sheet1", header=None, names=['column', dtype=str)
for df_chindf ['column']:
if df_ch.isdigit():
print ("Number")
else:
print ("Non-numeric")
I have tried the above, but I get the following error:
Where is the cause?
if df_ch.isdigit():
AttributeError: 'float' object has no attribute' isdigit'
If the specified column is recognized as a numeric item, isdigit()
should get an error like this=>AttributeError: 'int' object has no attribute'
その If such an error occurs, it would be better to specify it in the question
You may want to use .astype(str)
to treat such data as strings
>>import pandas as pd
>>>df=pd.DataFrame({'A': [100,200], 'B': [110,120], 'C': [210,300]})
>>df
ABC Corporation
0 100 110 210
1 200 120 300
>>df.dtypes
Aint64
Bint64
Cint64
dtype:object
>>>df['C'].astype(str)
0 210
1 300
Name:C, dtype:object
>> [v.isdigit() for vindf ['C'].astype(str)]
[True, True]
If you do not need to use pandas
, you can use openpyxl
to load Excel to get the cell type.
Below is a sample code that retrieves the cell type with the data_type
property of the cell.(pip install openpyxl
)
sample code
from openpyxl import load_workbook
wb=load_workbook("test.xlsx")
ws = wb ["Sheet1" ]
column=ws ["A"]
for row in range (1,len(column)):
cell=column [row]
ts = "Number" if cell.data_type == "n" else "Non-numeric"
print(f"{cell.value} is {ts}")
import pandas as pd
import numpy as np
df=pd.read_excel("test.xlsx", sheet_name="sheet1", header=None, names=['column', dtype=str)
print(np.where(df['column'].str.isdigit(), 'numeric', 'non-numeric')))
#
["Number", "Number", "Number", "Number", "Number", "Number", "Number", "Number", "Number", "Number", "Number", "Number", "Number", "Number", "Number", "Number", "Number", "Number", "Number"]
"Number", "Number", "Number", "Number"]
"Because the conditions for judging as ""numeric"" are ambiguous, we decided whether it was an integer value or not."
Also, I don't know if I want to judge input values such as '0003 as strings or numbers, so I decided to judge them as numbers this time.
[Enter the following data in column 'A' of 'Sheet1']
1
'0002.2
'0003
'a'
'0004
'0005
import pandas as pd
import pathlib
file_path=pathlib.Path(__file__).parent/'test.xlsx'
df=pd.read_excel(file_path, sheet_name=0, header=None)
for df [0]:
isDigit=False
if isinstance(d,str):
if d.isdigit():
isDigit = True
elif isinstance(d, int):
isDigit = True
ifisDigit:
print(repr(d), 'Integer value.')
else:
print(repr(d), 'Non-integer value.')
1 integer value.
Other than the '0002.2' integer value.
Integer value '0003'.
Other than the 'a' integer value.
Integer value '0005'.
You will need to prepare and process various things in advance.
< u l > < l i > < p > < code > of the column name columnspieces, I don't have a place where which defines < / code >.dealing with any of the following is required. < / > < u l > Excel image which is presented < l i > (01 01 < / code > < code > ) from the first line data row is inserted into the column names for a column name here, but the world/s Columbia < code > < / code > is, I take it, < / >header=None
とname=
に列名リスト(画像の場合1列だけなので['column']
)を指定しておくThere is no place to define the column name column
.dealing with any of the following is required. < / >< u l >
Excel image which is presented < l i > (01 01 < / code > < code > ) from the first line data row is inserted into the column names for a column name here, but the world/s Columbia < code > < / code > is, I take it, < / >
header=None
とname=
に列名リスト(画像の場合1列だけなので['column']
)を指定しておくBy default, data that can be converted to numbers is converted to numbers (int or float) as much as possible and stored in DataFrame.dealing with out of the question of any of these followings. < / >< u l > d_elli Excel (< l i >) < code > a parameter so as to be dealt a string giving the time Specifies the type = str < / code >. < / > < l i > < code > for < / code > (< code > to judge, [ ' Columbia ' ] < / code > ) as it is of a data frame when a string designating the row of the loop and converts it into a character string not merely a series (< code >. Series ([ ' Columbia ' ], harpooned somebody, and type = ' Missing StringTokenizer ' ) < / code >) will be issued. < / >
For example, it's going to be one of these things.
Read all data as a string:
import pandas as pd
df=pd.read_excel("test.xlsx", sheet_name="sheet1", header=None, names=['column', dtype=str)
for df_chindf ['column']:
if df_ch.isdigit():
print ("Number")
else:
print ("Non-numeric")
Convert only that column to a string when checking:
import pandas as pd
df=pd.read_excel("test.xlsx", sheet_name="sheet1", header=None, names=['column'])
for df_chimpd.Series(df['column', dtype='string'):
if df_ch.isdigit():
print ("Number")
else:
print ("Non-numeric")
add
I tried my first answer and got an error because there was a blank somewhere.
With this article in mind, you can add keep_default_na=False
to the read_excel() parameter, or you can use df=df.filna(')
to process the read df.
Python Pandas read_excel dtype str replace nan by blank(') when reading or when writing via to_csv
pandas.read_excel
pandas.DataFrame.filna
It is recommended to add keep_default_na=False
to the parameters.
If df=df.filna(')
is processed, the entire column is treated as a floating-point number when read_excel()
has at least one data that can be considered a floating-point number instead of NaN
.
© 2025 OneMinuteCode. All rights reserved.