I extract the rows for each cluster ID from my clustered data, starting from zero, remove the duplicates, and save each result to its own file. When I want to increase the number of clusters, the only way I know is to add more code by hand. What code can I use to simplify this?
For example, with 100 clusters, I want to extract clusters 0 to 99 in order, remove the duplicates from each, and save them.
So far I have copied the code and changed the number each time, which makes it very long. Doing this up to 100 would be a lot of effort, so I want to simplify it.
import numpy as np
import pandas as pd
# Load csv 0
df = pd.read_csv("allclsdata.csv")
X = df[df["cluster_id"] == 0]
X
sinX = X.drop_duplicates(subset=["id_questionnaire"], keep='last')
sinX
sinX.to_csv("clusternai0.csv")
# Load csv
df = pd.read_csv("allclsdata.csv")
# Extract Cluster ID
X = df[df["cluster_id"] == 1]
X
# Remove duplicates
sinX = X.drop_duplicates(subset=["id_questionnaire"], keep='last')
sinX
# Save
sinX.to_csv("clusternai1.csv")
# Load csv2
df = pd.read_csv("allclsdata.csv")
X = df[df["cluster_id"] == 2]
X
sinX = X.drop_duplicates(subset=["id_questionnaire"], keep='last')
sinX
sinX.to_csv("clusternai2.csv")
# Load csv3
df = pd.read_csv("allclsdata.csv")
X = df[df["cluster_id"] == 3]
X
sinX = X.drop_duplicates(subset=["id_questionnaire"], keep='last')
sinX
sinX.to_csv("clusternai3.csv")
# Load csv4
df = pd.read_csv("allclsdata.csv")
X = df[df["cluster_id"] == 4]
X
sinX = X.drop_duplicates(subset=["id_questionnaire"], keep='last')
sinX
sinX.to_csv("clusternai4.csv")
# Load csv5
df = pd.read_csv("allclsdata.csv")
X = df[df["cluster_id"] == 5]
X
sinX = X.drop_duplicates(subset=["id_questionnaire"], keep='last')
sinX
sinX.to_csv("clusternai5.csv")
# Load csv6
df = pd.read_csv("allclsdata.csv")
X = df[df["cluster_id"] == 6]
X
sinX = X.drop_duplicates(subset=["id_questionnaire"], keep='last')
sinX
sinX.to_csv("clusternai6.csv")
# Load csv7
df = pd.read_csv("allclsdata.csv")
X = df[df["cluster_id"] == 7]
X
sinX = X.drop_duplicates(subset=["id_questionnaire"], keep='last')
sinX
sinX.to_csv("clusternai7.csv")
# Load csv 8
df = pd.read_csv("allclsdata.csv")
X = df[df["cluster_id"] == 8]
X
sinX = X.drop_duplicates(subset=["id_questionnaire"], keep='last')
sinX
sinX.to_csv("clusternai8.csv")
# Load csv9
df = pd.read_csv("allclsdata.csv")
X = df[df["cluster_id"] == 9]
X
sinX = X.drop_duplicates(subset=["id_questionnaire"], keep='last')
sinX
sinX.to_csv("clusternai9.csv")
# Load csv 10
df = pd.read_csv("allclsdata.csv")
X = df[df["cluster_id"] == 10]
X
sinX = X.drop_duplicates(subset=["id_questionnaire"], keep='last')
sinX
sinX.to_csv("clusternai10.csv")
# Load csv 11
df = pd.read_csv("allclsdata.csv")
X = df[df["cluster_id"] == 11]
X
sinX = X.drop_duplicates(subset=["id_questionnaire"], keep='last')
sinX
sinX.to_csv("clusternai11.csv")
# Load csv12
df = pd.read_csv("allclsdata.csv")
X = df[df["cluster_id"] == 12]
X
sinX = X.drop_duplicates(subset=["id_questionnaire"], keep='last')
sinX
sinX.to_csv("clusternai12.csv")
# Load csv 13
df = pd.read_csv("allclsdata.csv")
X = df[df["cluster_id"] == 13]
X
sinX = X.drop_duplicates(subset=["id_questionnaire"], keep='last')
sinX
sinX.to_csv("clusternai13.csv")
# Load csv 14
df = pd.read_csv("allclsdata.csv")
X = df[df["cluster_id"] == 14]
X
sinX = X.drop_duplicates(subset=["id_questionnaire"], keep='last')
sinX
sinX.to_csv("clusternai14.csv")
Python 3.10.4 (tags/v3.10.4:9d38120, Mar 23 2022, 23:13:41) [MSC v.1929 64 bit (AMD64)] on win32
python
I don't have your data, so I haven't been able to test this, but grouping by cluster_id should replace all of the repeated blocks:
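import pandas as pd

# Read the file once, then let groupby split it by cluster ID
df = pd.read_csv("allclsdata.csv")
for n, sdf in df.groupby('cluster_id'):
    # n is the cluster ID, sdf holds only the rows of that cluster
    sinX = sdf.drop_duplicates(subset=["id_questionnaire"], keep='last')
    display(sinX)  # display() is available in Jupyter/IPython; use print(sinX) elsewhere
    sinX.to_csv(f'clusternai{n}.csv')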
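Since the original data is not available, here is a minimal sketch of the same groupby approach that can be run as is. The sample DataFrame, the `value` column, the helper name `save_deduplicated_clusters`, and the temporary output directory are all made up for illustration; only `cluster_id`, `id_questionnaire`, and the `clusternai{n}.csv` naming come from the question.

```python
import tempfile
from pathlib import Path

import pandas as pd


def save_deduplicated_clusters(df, out_dir, prefix="clusternai"):
    """Write one de-duplicated CSV per cluster_id and return the paths written.

    Hypothetical helper, not part of pandas.
    """
    out_dir = Path(out_dir)
    written = {}
    # groupby yields (cluster ID, sub-frame) pairs for every ID present in the data
    for cluster_id, group in df.groupby("cluster_id"):
        # Keep only the last row for each id_questionnaire within this cluster
        deduped = group.drop_duplicates(subset=["id_questionnaire"], keep="last")
        path = out_dir / f"{prefix}{cluster_id}.csv"
        deduped.to_csv(path)
        written[cluster_id] = path
    return written


# Made-up sample data standing in for allclsdata.csv
sample = pd.DataFrame(
    {
        "cluster_id": [0, 0, 0, 1, 1, 2],
        "id_questionnaire": [10, 10, 11, 20, 21, 30],
        "value": [1, 2, 3, 4, 5, 6],
    }
)

with tempfile.TemporaryDirectory() as tmp:
    paths = save_deduplicated_clusters(sample, tmp)
    # Read the files back to count the rows that were saved per cluster
    row_counts = {cid: len(pd.read_csv(p)) for cid, p in paths.items()}
    file_names = sorted(p.name for p in paths.values())

print(file_names)
print(row_counts)
```

With the real file, the call would presumably be `save_deduplicated_clusters(pd.read_csv("allclsdata.csv"), ".")`. Note that groupby only produces files for cluster IDs that actually occur in the data, so the number of clusters never has to be written into the code; if an empty file is wanted for a missing ID, a loop over `range(100)` with `df[df["cluster_id"] == n]` would be needed instead.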