I want to do K-means clustering in Python, but I get an error: ValueError: Can only index by location with a [integer,...]

Asked 1 year ago, Updated 1 year ago, 341 views

What I want to solve

Extract the data that meets a certain threshold from the original simulation data (this part is complete).

Next, I tried to cluster that data and extract the destinations, but an error occurred at this step.

Run Environment

·The original data is in Excel
·Python is run from the command prompt
·The program is written in Notepad and saved as hoge.py
·The script is run from the command prompt

Problems/errors encountered

 C:\datasyori>python hoge.py
    latitude longitude
0  35.693590  139.712202
1  35.693497  139.712096
2  35.693217  139.712261
3  35.693549  139.712430
4  35.693621  139.712501
Traceback (most recent call last):
  File "C:\Users\mable\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\indexing.py", line 769, in _validate_tuple_indexer
    self._validate_key(k, i)
  File "C:\Users\mable\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\indexing.py", line 1378, in _validate_key
    raise ValueError(f"Can only index by location with a [{self._valid_types}]")
ValueError: Can only index by location with a [integer, integer slice (START point is INCLUDED, END point is EXCLUDED), listlike of integers, boolean array]

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\datasyori\hoge.py", line 95, in <module>
    Cn = C.iloc [Tn, 0]
  File "C:\Users\mable\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\indexing.py", line 961, in __getitem__
    return self._getitem_tuple(key)
  File "C:\Users\mable\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\indexing.py", line 1458, in _getitem_tuple
    tup = self._validate_tuple_indexer(tup)
  File "C:\Users\mable\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\indexing.py", line 771, in _validate_tuple_indexer
    raise ValueError(
ValueError: Location based indexing can only have [integer, integer slice (START point is INCLUDED, END point is EXCLUDED), listlike of integers, boolean array] types

Relevant Source Code

# Extracting destinations

from matplotlib import pyplot as plt
from sklearn import datasets, preprocessing
from sklearn.cluster import KMeans
import numpy as np
import pandas as pd
import cartopy.crs as ccrs
import cartopy.io.shapereader as shpreader

pd.set_option('display.max_rows',600)
# Load preprocessed csv
yomi=pd.read_csv("simulationkai.csv")
df=pd.read_csv("simulationkai.csv", usecols=["longitude", "latitude"])

# Convert to DataFrame
print(df.head())
# data shaping
X = df

 
# clustering
cls=KMeans(n_clusters=4)

result=cls.fit(X)
X['cluster'] = result.labels_
PC = pd.DataFrame (X['cluster'])
PC
df.head()
# Add cluster (cluster number) to yomi's data frame
yomi ['cluster_id'] = PC
yomi

# Save yomi (with cluster number added to original data) to allclsdata.csv
yomi.to_csv("allclsdata.csv")

D=X.sort_values(by="cluster")
D=D.drop_duplicates(subset='cluster')
D
# Count the number of data in each cluster
V = X ['cluster'].value_counts()
V
# Save the number and number of data for each cluster to clsvalue.csv
V.to_csv("clsvalue.csv")


# Checking cluster center of gravity
C=pd.DataFrame(result.cluster_centers_)
C

C.iloc [0,:]


lat=X ['latitude'].tolist()
lon=X ['longitude'].tolist()

clat=C[0].tolist()
clon=C[1].tolist()


# For clusters 0 to 3, count the data points after removing duplicate subjects, and append each count to a CSV one cluster at a time.
from csv import writer
# pp = pd.DataFrame
#ppi=pd.DataFrame
# Extract only data from Nth cluster with While statement from yomi
i = 0
while i<=3:
  yomic=yomi [yomi['cluster_id']==i]
# Remove duplicate subject id from Nth cluster df
  yomics=yomic.drop_duplicates(subset=['id_questionnaire'])
# Add the number of rows of Nth processed data to CSV
  # file = [i,len(yomics)]
  #ppi=pp.append([file], ignore_index=True)
  # ppi.to_csv("pp.csv")
  list_data=[i,len(yomics)]
  with open('pp.csv', 'a', newline='') as f_object:
   writer_object = writer(f_object)
   writer_object.writerow(list_data)  
   f_object.close()
  i=i+1
# else:
  # ppi.to_csv("pp.csv") 

# Save the number of people in pp.csv in descending order to pps.csv
PP = pd.read_csv("pp.csv", names = ["cls", "people"])
T = PP.sort_values (by = ["people"], ascending = False)
T.to_csv("pps.csv")
PP.to_csv("pp.csv")

# Pull the cluster numbers from above pps.csv in order to extract the coordinates of the numbers from C.
num = 0
while num<=3:
  Tn = T.iloc [num, 0]
  # Tno=Tn+1
  Cn = C.iloc [Tn, 0]
  Cn2 = C.iloc [Tn, 1]
  list_data2 = [Tn, Cn, Cn2 ]
  with open('point.csv', 'a', newline='') as f_object:
   writer_object = writer(f_object)
   writer_object.writerow(list_data2)  
   f_object.close()
  num = num+1  

dfh=pd.read_csv("point.csv", names=["cluster_id", "latitude", "longitude"])
B=pd.read_csv("pps.csv", usecols=["people"])
# dfh2 = pd.DataFrame (B['people'])
dfh['people'] = B
dfh.to_csv("point.csv")

What I tried myself

It seems the value used when computing Cn has the wrong type, but I am still learning and do not understand the cause well.

Supplementary Information

Python 3.10.4 (tags/v3.10.4:9d38120, Mar 23 2022, 23:13:41) [MSC v. 1929 64bit (AMD64)] on win32

python

2022-10-22 09:15

1 Answer

First, the error information shows that the error occurs at line 95,

 Cn=C.iloc [Tn, 0]

and the message says that location-based indexing (.iloc[]) can only accept an integer, an integer slice, a list of integers, or a Boolean array.
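To see this rule in isolation, here is a minimal sketch (not part of the original answer) using a small hypothetical DataFrame C that stands in for the cluster centers: an integer position works with .iloc, while a string position raises the same ValueError as in the question.

```python
import pandas as pd

# Hypothetical cluster centers (latitude, longitude), standing in for
# pd.DataFrame(result.cluster_centers_) in the question's script.
C = pd.DataFrame([[35.6936, 139.7122],
                  [35.6935, 139.7121]])

# An integer position is accepted: row 1, column 0.
print(C.iloc[1, 0])

# A string is not a valid position for .iloc, so pandas raises ValueError.
try:
    C.iloc["cls", 0]
except ValueError as err:
    print("ValueError:", err)
```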
So I inserted the following just before line 95,

print(num,Tn,type(Tn))

created suitable input data (simulationkai.csv), and ran the script. I found that:
·On the first run there is no error, and Tn takes integer values (1, 0, 2, 3).
·From the second run on, the error occurs and Tn is 'cls', of type str (string).

From this, I suspected that a CSV file created during the first run was affecting later runs. Tracing back Tn -> T -> PP -> pp.csv led me to line 76:

with open('pp.csv', 'a', newline='') as f_object:

Since this line is also executed on the first iteration of the while loop (i=0), rows were being appended ('a') to the pp.csv left over from earlier runs. So I changed the loop to overwrite the file ('w') only on the first iteration, as shown below, and the error no longer occurred.

with open('pp.csv', 'w' if i==0 else 'a', newline='') as f_object:
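The effect of this change can be checked with a small standalone sketch. It simulates running the counting loop twice, once with append-only mode and once with 'w' on the first iteration; the helper names write_counts and count_rows, the temporary files, and the per-cluster counts are hypothetical and only illustrate the file-mode behavior.

```python
import csv
import os
import tempfile


def write_counts(path, counts, overwrite_first):
    """Write one [cluster, count] row per cluster, like the question's loop.

    If overwrite_first is True, the first iteration opens the file with 'w'
    (discarding rows left by earlier runs); later iterations append with 'a'.
    """
    for i, n in enumerate(counts):
        mode = "w" if (overwrite_first and i == 0) else "a"
        with open(path, mode, newline="") as f_object:
            csv.writer(f_object).writerow([i, n])


def count_rows(path):
    """Return the number of CSV rows in the file."""
    with open(path, newline="") as f:
        return sum(1 for _ in csv.reader(f))


counts = [120, 85, 40, 200]  # hypothetical per-cluster counts
tmpdir = tempfile.mkdtemp()
append_only = os.path.join(tmpdir, "append_only.csv")
fixed = os.path.join(tmpdir, "fixed.csv")

for _ in range(2):  # simulate running the script twice
    write_counts(append_only, counts, overwrite_first=False)
    write_counts(fixed, counts, overwrite_first=True)

print("append only:", count_rows(append_only))  # rows pile up across runs
print("w on first iteration:", count_rows(fixed))  # one row per cluster
```

Opening the file once with 'w' before the loop would have the same effect. Note also that the script's PP.to_csv("pp.csv") rewrites pp.csv with a header row ("cls", "people") and an index column. Because pd.read_csv with names= treats the first line of the file as data, a later run that appends to that file likely carries the "cls" header into T, which is consistent with Tn being 'cls'; overwriting on the first iteration discards that leftover header too.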


2022-10-22 09:15



© 2024 OneMinuteCode. All rights reserved.