After reading https://employment.en-japan.com/engineerhub/entry/2017/04/28/110000, I ran the article's source code with the same images, as is.
However, training never moves on to the next epoch, even after the first epoch appears to finish.
Looking at the CPU, the process seems to be working, but nothing advances at all.
What are some possible causes?
from keras.applications.inception_v3 import InceptionV3
from keras.applications.inception_v3 import preprocess_input
from keras.models import Sequential, Model
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Convolution2D, MaxPooling2D, ZeroPadding2D, GlobalAveragePooling2D, AveragePooling2D
from keras.preprocessing.image import ImageDataGenerator
from keras.callbacks import ModelCheckpoint, CSVLogger, LearningRateScheduler, ReduceLROnPlateau
from keras.optimizers import SGD
from keras.regularizers import l2
import matplotlib.image as mpimg
from scipy.misc import imresize
import numpy as np
import keras.backend as K
import os.path
K.clear_session()
img_size=299
# Data augmentation for the training set
train_datagen = ImageDataGenerator(
    featurewise_center=False,
    samplewise_center=False,
    featurewise_std_normalization=False,
    samplewise_std_normalization=False,
    rotation_range=10,
    width_shift_range=0.2,
    height_shift_range=0.2,
    horizontal_flip=True,
    vertical_flip=False,
    zoom_range=[.8, 1],
    channel_shift_range=30,
    fill_mode='reflect')
test_datagen = ImageDataGenerator()
# Image loading
def load_images(root, nb_img):
    all_imgs = []
    all_classes = []
    for i in range(nb_img):
        img_name = "%s/dog.%d.jpg" % (root, i + 1)
        img_arr = mpimg.imread(img_name)
        resize_img_ar = imresize(img_arr, (img_size, img_size))
        all_imgs.append(resize_img_ar)
        all_classes.append(0)
    for i in range(nb_img):
        img_name = "%s/cat.%d.jpg" % (root, i + 1)
        img_arr = mpimg.imread(img_name)
        resize_img_ar = imresize(img_arr, (img_size, img_size))
        all_imgs.append(resize_img_ar)
        all_classes.append(1)
    return np.array(all_imgs), np.array(all_classes)
X_train, y_train = load_images('./train', 1000)
X_test, y_test = load_images('./train', 400)
train_generator = train_datagen.flow(X_train, y_train, batch_size=64, seed=13)
test_generator = test_datagen.flow(X_test, y_test, batch_size=64, seed=13)
# Load the InceptionV3 model without its final (top) layer
base_model = InceptionV3(weights='imagenet', include_top=False)
# Final layer configuration
x = base_model.output
x = GlobalAveragePooling2D()(x)
predictions = Dense(1, kernel_initializer="glorot_uniform", activation="sigmoid", kernel_regularizer=l2(.0005))(x)
model = Model(inputs=base_model.input, outputs=predictions)
# Freeze the base_model layers so their weights are not updated
for layer in base_model.layers:
    layer.trainable = False
opt = SGD(lr=.01, momentum=.9)
model.compile(optimizer=opt, loss='binary_crossentropy', metrics=['accuracy'])
checkpointer = ModelCheckpoint(filepath='model.{epoch:02d}-{val_loss:.2f}.hdf5', verbose=1, save_best_only=True)
csv_logger = CSVLogger('model.log')
reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.2,
                              patience=5, min_lr=0.001)
history = model.fit_generator(train_generator,
                              steps_per_epoch=2000,
                              epochs=10,
                              validation_data=test_generator,
                              validation_steps=800,
                              verbose=1,
                              callbacks=[reduce_lr, csv_logger, checkpointer])
Addendum
1. I judged that the epoch had ended because the output stops as shown below.
Even though processing does not seem to progress at all (not even the hdf5 file that should be written at the end of an epoch is created), the CPU and other resources are clearly busy.
Epoch 1/2
1/5 [=====>........................] - ETA: 264s - loss: 0.7831 - acc: 0.6094
2/5 [===========>..................] - ETA: 154s - loss: 0.7622 - acc: 0.5859
3/5 [=================>............] - ETA: 85s - loss: 0.7396 - acc: 0.5729
4/5 [=======================>......] - ETA: 38s - loss: 0.7270 - acc: 0.5703
According to a comment by sayaka1202 on the site where this question was cross-posted, setting
epochs = 3
batch_size = 64
nb_train_samples = 2000
nb_validation_samples = 800
and changing the fit_generator arguments accordingly (see the sketch below) seems to have made it work.
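A minimal sketch of what "changing fit_generator accordingly" could mean, assuming the point is that steps_per_epoch and validation_steps count batches rather than samples, so they should be the sample counts divided by batch_size (the names nb_train_samples and nb_validation_samples are taken from the comment above):

epochs = 3
batch_size = 64
nb_train_samples = 2000
nb_validation_samples = 800

history = model.fit_generator(
    train_generator,
    steps_per_epoch=nb_train_samples // batch_size,        # about 31 batches per epoch
    epochs=epochs,
    validation_data=test_generator,
    validation_steps=nb_validation_samples // batch_size,  # about 12 validation batches
    verbose=1,
    callbacks=[reduce_lr, csv_logger, checkpointer])

For comparison, the original call with steps_per_epoch=2000 and validation_steps=800 asks for 2000 training batches and 800 validation batches per epoch (128,000 and 51,200 images at batch_size=64), which on a CPU can easily look like the run has frozen.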
When I trained with Keras and TensorFlow, the epoch reached its end in a few tens of seconds, the display then froze for about 40 minutes, and only after that did the next epoch start.
I assumed this was simply low performance, since I was training on a desktop machine.
Isn't it just that the processing is heavy and takes a long time?
The site you referred to seems to use an AWS instance; are you running in the same kind of environment?
If the CPU is busy, it doesn't look like the process has actually stopped...
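If in doubt, one way to tell a genuine hang from slow end-of-epoch work (the validation pass at the end of an epoch in particular prints nothing until it finishes) is to log timestamps from a callback. A rough sketch using a custom keras.callbacks.Callback subclass; the name ProgressLogger is just an illustrative choice:

import time
from keras.callbacks import Callback

# Print a timestamp every 10 training batches and at each epoch boundary,
# so it is visible whether work is still being done between log lines.
class ProgressLogger(Callback):
    def on_batch_end(self, batch, logs=None):
        if batch % 10 == 0:
            print("batch %d done at %s" % (batch, time.strftime("%H:%M:%S")))

    def on_epoch_end(self, epoch, logs=None):
        print("epoch %d finished at %s" % (epoch, time.strftime("%H:%M:%S")))

# Then pass it along with the other callbacks, e.g.
# callbacks=[reduce_lr, csv_logger, checkpointer, ProgressLogger()]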