I want to check Excel data by numbers or non-numeric ones.

Asked 2 years ago, Updated 2 years ago, 299 views

I would like to check if the data in the column column in Excel is numeric or non-numeric.I try to check using isdigit, but df is not considered a string. How do I determine the variable df that contains Excel data read by pandas as a string?
As a condition, I'd like to take out the data one by one and check if it's a number, so it's better not to check them all together.
It doesn't make any sense to use pandas to read Excel files.
I would appreciate it if you could let me know if there is any other good way.

import pandas as pd
df=pd.read_excel("test.xlsx", sheet_name="sheet1")
for df_chindf ["column"]:
   if df_ch.isdigit():
        print ("Number")
  else:
        print ("Non-numeric")

Excel data

Enter a description of the image here

import pandas as pd
df=pd.read_excel("test.xlsx", sheet_name="sheet1", header=None, names=['column', dtype=str)
for df_chindf ['column']:
   if df_ch.isdigit():
      print ("Number")
   else:
      print ("Non-numeric")

I have tried the above, but I get the following error:
Where is the cause?

 if df_ch.isdigit():
AttributeError: 'float' object has no attribute' isdigit'

python pandas excel

2022-09-30 22:05

5 Answers

If the specified column is recognized as a numeric item, isdigit() should get an error like this=>AttributeError: 'int' object has no attribute'

その If such an error occurs, it would be better to specify it in the question

You may want to use .astype(str) to treat such data as strings

>>import pandas as pd
>>>df=pd.DataFrame({'A': [100,200], 'B': [110,120], 'C': [210,300]})
>>df
     ABC Corporation
0  100  110  210
1  200  120  300
>>df.dtypes
Aint64
Bint64
Cint64
dtype:object
>>>df['C'].astype(str)
0    210
1    300
Name:C, dtype:object
>> [v.isdigit() for vindf ['C'].astype(str)]
[True, True]


2022-09-30 22:05

If you do not need to use pandas, you can use openpyxl to load Excel to get the cell type.

Below is a sample code that retrieves the cell type with the data_type property of the cell.(pip install openpyxl)

sample code

from openpyxl import load_workbook
wb=load_workbook("test.xlsx")
ws = wb ["Sheet1" ]
column=ws ["A"]
for row in range (1,len(column)):
    cell=column [row]
    ts = "Number" if cell.data_type == "n" else "Non-numeric" 
    print(f"{cell.value} is {ts}")


2022-09-30 22:05

import pandas as pd
import numpy as np

df=pd.read_excel("test.xlsx", sheet_name="sheet1", header=None, names=['column', dtype=str)

print(np.where(df['column'].str.isdigit(), 'numeric', 'non-numeric')))

#
["Number", "Number", "Number", "Number", "Number", "Number", "Number", "Number", "Number", "Number", "Number", "Number", "Number", "Number", "Number", "Number", "Number", "Number", "Number"]
 "Number", "Number", "Number", "Number"]


2022-09-30 22:05

"Because the conditions for judging as ""numeric"" are ambiguous, we decided whether it was an integer value or not."
Also, I don't know if I want to judge input values such as '0003 as strings or numbers, so I decided to judge them as numbers this time.

 [Enter the following data in column 'A' of 'Sheet1']
1
'0002.2
'0003
'a'
'0004
'0005

import pandas as pd
import pathlib

file_path=pathlib.Path(__file__).parent/'test.xlsx'

df=pd.read_excel(file_path, sheet_name=0, header=None)
for df [0]:
    isDigit=False
    if isinstance(d,str):
        if d.isdigit():
            isDigit = True
    elif isinstance(d, int):
        isDigit = True

    ifisDigit:
        print(repr(d), 'Integer value.')
    else:
        print(repr(d), 'Non-integer value.')

1 integer value.
Other than the '0002.2' integer value.
Integer value '0003'.
Other than the 'a' integer value.
Integer value '0005'.


2022-09-30 22:05

You will need to prepare and process various things in advance.

< u l > < l i > < p > < code > of the column name columnspieces, I don't have a place where which defines < / code >.dealing with any of the following is required. < / > < u l > Excel image which is presented < l i > (01 01 < / code > < code > ) from the first line data row is inserted into the column names for a column name here, but the world/s Columbia < code > < / code > is, I take it, < / >
  • Excelの内容を変更しないのなら、read_excel()の時にパラメータとしてheader=Nonename=に列名リスト(画像の場合1列だけなので['column'])を指定しておく
  • < l i > < code > people evacuated from their homes without the specified column name a serial number starting with < / code > (Excel < code > of the line A, B, 1, is, equal to 1, 2 < / code > < code > ) is assigned an index of information is used.None < code > to the parameter to the Reader =. you determine < / code > < / > < l i > < p > By default, converted into data numerical value is transformed into numeric (int and float ) as much as possible, a Data stored in the Framework.dealing with out of the question of any of these followings. < / > < u l > d_elli Excel (< l i >) < code > a parameter so as to be dealt a string giving the time Specifies the type = str < / code >. < / > < l i > < code > for < / code > (< code > to judge, [ ' Columbia ' ] < / code > ) as it is of a data frame when a string designating the row of the loop and converts it into a character string not merely a series (< code >. Series ([ ' Columbia ' ], harpooned somebody, and type = ' Missing StringTokenizer ' ) < / code >) will be issued. < / >

    There is no place to define the column name column.dealing with any of the following is required. < / >< u l > Excel image which is presented < l i > (01 01 < / code > < code > ) from the first line data row is inserted into the column names for a column name here, but the world/s Columbia < code > < / code > is, I take it, < / >

  • Excelの内容を変更しないのなら、read_excel()の時にパラメータとしてheader=Nonename=に列名リスト(画像の場合1列だけなので['column'])を指定しておく
  • < l i > < code > people evacuated from their homes without the specified column name a serial number starting with < / code > (Excel < code > of the line A, B, 1, is, equal to 1, 2 < / code > < code > ) is assigned an index of information is used.None < code > to the parameter to the Reader =. you determine < / code > < / >

    By default, data that can be converted to numbers is converted to numbers (int or float) as much as possible and stored in DataFrame.dealing with out of the question of any of these followings. < / >< u l > d_elli Excel (< l i >) < code > a parameter so as to be dealt a string giving the time Specifies the type = str < / code >. < / > < l i > < code > for < / code > (< code > to judge, [ ' Columbia ' ] < / code > ) as it is of a data frame when a string designating the row of the loop and converts it into a character string not merely a series (< code >. Series ([ ' Columbia ' ], harpooned somebody, and type = ' Missing StringTokenizer ' ) < / code >) will be issued. < / >

    For example, it's going to be one of these things.
    Read all data as a string:

    import pandas as pd
    df=pd.read_excel("test.xlsx", sheet_name="sheet1", header=None, names=['column', dtype=str)
    for df_chindf ['column']:
       if df_ch.isdigit():
          print ("Number")
       else:
          print ("Non-numeric")
    

    Convert only that column to a string when checking:

    import pandas as pd
    df=pd.read_excel("test.xlsx", sheet_name="sheet1", header=None, names=['column'])
    for df_chimpd.Series(df['column', dtype='string'):
       if df_ch.isdigit():
          print ("Number")
       else:
          print ("Non-numeric")
    

    add

    I tried my first answer and got an error because there was a blank somewhere.
    With this article in mind, you can add keep_default_na=False to the read_excel() parameter, or you can use df=df.filna(') to process the read df.
    Python Pandas read_excel dtype str replace nan by blank(') when reading or when writing via to_csv
    pandas.read_excel
    pandas.DataFrame.filna

    It is recommended to add keep_default_na=False to the parameters.
    If df=df.filna(') is processed, the entire column is treated as a floating-point number when read_excel() has at least one data that can be considered a floating-point number instead of NaN.


    2022-09-30 22:05

    If you have any answers or tips


    © 2024 OneMinuteCode. All rights reserved.