Error in clustering images using scikit-learn/k-means

Asked 2 years ago, Updated 2 years ago, 151 views

I have a program that extracts SURF features from the jpg image files in a directory, clusters all of the SURF descriptors with the k-means method to find visual words, and then uses those visual words to turn each image's local features into a bag-of-words vector.
In a trial with about 90 images the clustering worked well, but when I tried 1900 images I got the following error message:

Traceback (most recent call last):
  File "sample.py", line 27, in <module>
    c=km.predict(d)
  File "C:\Anaconda3\lib\site-packages\sklearn\cluster\k_means_.py", line 1460, in predict
    X=self._check_test_data(X)
  File "C:\Anaconda3\lib\site-packages\sklearn\cluster\k_means_.py", line 794, in _check_test_data
    warn_on_dtype=True)
  File "C:\Anaconda3\lib\site-packages\sklearn\utils\validation.py", line 407, in check_array
    context))
ValueError: Found array with 0 sample(s) (shape=(0, 64)) while a minimum of 1 is required.

How can I do it correctly?
Here's the code.

import mahotas as mh
import numpy as np
from glob import glob
from mahotas.features import surf
from sklearn.cluster import MiniBatchKMeans
from sklearn.feature_extraction.text import TfidfTransformer

picture_category_num = 5
feature_category_num = 128

# image surf
images=glob('./*.jpg')
alldescriptors = []
for im in images:
  im=mh.imread(im,as_grey=True)
  im = im.astype(np.uint8)
  alldescriptors.append(surf.surf(im,descriptor_only=True))

# image surf->basic feature
concatenated=np.concatenate(alldescriptors)
km = MiniBatchKMeans(feature_category_num)
km.fit(concatenated)

# image surf and basic features ->features
features = []
for d in alldescriptors:
  c=km.predict(d)
  features.append(np.array([np.sum(c==ci) for ci in range(feature_category_num)]))
features=np.array(features)

# features ->tfidf
transformer=TfidfTransformer()
tfidf=transformer.fit_transform(features)
tfidf.toarray() 
# not use tfidf
# tfidf=features

# categorization
km=MiniBatchKMeans(n_clusters=picture_category_num, init='random', n_init=1, verbose=1)
km.fit (tfidf)

# print result
images=np.array(images)
print('completed')
with open("result.txt", "w") as f:
  for i in range(picture_category_num):
    print('image category {0}'.format(i), file=f)
    print(images[km.labels_==i], file=f)

python opencv scikit-learn

2022-09-30 21:17

1 Answer

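Judging from the error message, I think the 1900 images include some from which SURF could not extract any features, so km.predict is being called with an empty array. For example, you could skip the prediction for those images: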

  if d.size>0:
    c=km.predict(d)
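
A minimal sketch of how that guard might fit into the histogram loop from the question. The helper name `bow_histograms` and its `predict` parameter are hypothetical; `predict` stands in for something like `km.predict`. In this sketch, images with no descriptors get an all-zero histogram, so `features` keeps one row per image and stays aligned with `images`.

```python
import numpy as np

def bow_histograms(alldescriptors, predict, n_words):
    """Build one visual-word histogram per image.

    predict is assumed to behave like km.predict: it maps an (n, 64)
    descriptor array to n integer cluster labels in range(n_words).
    """
    features = []
    for d in alldescriptors:
        if d.size > 0:
            c = predict(d)
            # Count how often each visual word occurs in this image.
            hist = np.bincount(c, minlength=n_words)
        else:
            # No SURF descriptors: skip predict and keep an all-zero row.
            hist = np.zeros(n_words, dtype=np.int64)
        features.append(hist)
    return np.array(features)
```

With the question's variables this could be called as `features = bow_histograms(alldescriptors, km.predict, feature_category_num)`.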

Alternatively, why not skip appending to alldescriptors for images that yield zero features?
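
One way this filtering might look, as a sketch only. The helper `drop_empty` is a hypothetical name, not part of mahotas or scikit-learn. It drops the image name together with its empty descriptor array, on the assumption that the later `images[km.labels_==i]` lookup needs both lists to have the same length.

```python
import numpy as np

def drop_empty(images, alldescriptors):
    """Keep only images that have at least one descriptor row."""
    kept = [(name, d) for name, d in zip(images, alldescriptors) if d.size > 0]
    kept_images = [name for name, _ in kept]
    kept_descriptors = [d for _, d in kept]
    return kept_images, kept_descriptors
```

Applied right after the SURF loop, for example as `images, alldescriptors = drop_empty(images, alldescriptors)`, every remaining entry has at least one descriptor by the time `km.predict` is called.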

-- This post is metropolis's comment, posted as a community wiki.


2022-09-30 21:17


