I have a program that uses SURF to extract local features from the JPEG images in a directory, clusters all of the SURF descriptors with k-means to obtain visual words, and then converts each image's local features into a bag-of-words histogram.
As a trial, it clustered about 90 images without problems, but when I tried 1900 images I got the following error message:
Traceback (most recent call last):
  File "sample.py", line 27, in <module>
    c=km.predict(d)
  File "C:\Anaconda3\lib\site-packages\sklearn\cluster\k_means_.py", line 1460, in predict
    X=self._check_test_data(X)
  File "C:\Anaconda3\lib\site-packages\sklearn\cluster\k_means_.py", line 794, in _check_test_data
    warn_on_dtype=True)
  File "C:\Anaconda3\lib\site-packages\sklearn\utils\validation.py", line 407, in check_array
    context))
ValueError: Found array with 0 sample(s) (shape=(0, 64)) while a minimum of 1 is required.
How can I fix this?
Here's the code.
import mahotas as mh
import numpy as np
from glob import glob
from mahotas.features import surf
from sklearn.cluster import MiniBatchKMeans
from sklearn.feature_extraction.text import TfidfTransformer
picture_category_num = 5
feature_category_num = 128
# image surf
images=glob('./*.jpg')
alldescriptors = [ ]
for im in images:
    im = mh.imread(im, as_grey=True)
    im = im.astype(np.uint8)
    alldescriptors.append(surf.surf(im, descriptor_only=True))
# image surf->basic feature
concatenated=np.concatenate(alldescriptors)
km = MiniBatchKMeans(feature_category_num)
km.fit(concatenated)
# image surf and basic features ->features
features = [ ]
for d in alldescriptors:
    c = km.predict(d)
    features.append(np.array([np.sum(c == ci) for ci in range(feature_category_num)]))
features = np.array(features)
# features ->tfidf
transformer=TfidfTransformer()
tfidf=transformer.fit_transform(features)
tfidf.toarray()
# not use tfidf
# tfidf=features
# categorization
km=MiniBatchKMeans(n_clusters=picture_category_num, init='random', n_init=1, verbose=1)
km.fit (tfidf)
# print result
images=np.array(images)
print('completed')
f = open("result.txt", "w")
for i in range(picture_category_num):
    print('image category {0}'.format(i), file=f)
    print(images[km.labels_ == i], file=f)
f.close()
Judging from the error message, I think the 1900 images include some from which SURF cannot extract any features, so km.predict receives an empty array. For example, you could guard the call:
if d.size > 0:
    c = km.predict(d)
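The guard can be folded into the histogram loop so that a skipped prediction does not leave a stale or undefined `c` behind. The following is only a minimal sketch, assuming that an image without descriptors should still get a row (all zeros) so the feature matrix stays aligned with `images`. The helper name `bag_of_words` and the use of `np.bincount` are illustrative choices, not part of the original code.

```python
import numpy as np

def bag_of_words(descriptor_sets, model, n_words):
    """Build one visual-word histogram per image.

    `model` is any fitted clusterer with a predict() method, for example
    the MiniBatchKMeans instance `km` from the question (assumption).
    """
    features = []
    for d in descriptor_sets:
        if d.size > 0:
            # Assign each descriptor to its nearest visual word, then count.
            words = model.predict(d)
            hist = np.bincount(words, minlength=n_words)
        else:
            # No descriptors for this image: use an all-zero histogram so
            # the number of rows still matches the number of images.
            hist = np.zeros(n_words, dtype=np.int64)
        features.append(hist)
    return np.array(features)
```

With the question's variables this would be called as `features = bag_of_words(alldescriptors, km, feature_category_num)`. Keeping a row for every image means `images[km.labels_ == i]` later in the script still indexes correctly.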
Alternatively, why not avoid appending to alldescriptors in the first place for images that yield zero features?
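A minimal sketch of handling zero-feature images at the list level, assuming the intent is to leave them out of both `images` and `alldescriptors` so the two lists stay in step. The helper name `drop_empty` and its return values are hypothetical and only for illustration.

```python
import numpy as np

def drop_empty(paths, descriptor_sets):
    """Split images into those with descriptors and those without.

    Returns (kept_paths, kept_descriptors, skipped_paths).
    """
    kept_paths, kept_descriptors, skipped_paths = [], [], []
    for path, d in zip(paths, descriptor_sets):
        if np.asarray(d).size > 0:
            kept_paths.append(path)
            kept_descriptors.append(d)
        else:
            # SURF found nothing in this image; remember it for reporting.
            skipped_paths.append(path)
    return kept_paths, kept_descriptors, skipped_paths
```

In the question's script this could run right after the extraction loop, for example `images, alldescriptors, skipped = drop_empty(images, alldescriptors)`, before `np.concatenate` is called. Filtering both lists together matters because the final step indexes `images` with `km.labels_`, which has one entry per row of the feature matrix.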
--This post is metropolis's Comment posted as a community wiki
© 2024 OneMinuteCode. All rights reserved.