Resource exhausted: understanding "OOM when allocating tensor" errors

Asked 2 years ago, Updated 2 years ago, 107 views

What I want to do

  • Train YOLOv3 on image data using Google Colab.

What I've already done

  • Collect 20 training images and resize them
  • Annotate the collected images with VoTT, upload them to Colab as a zip, and unzip them
  • Convert the data to YOLO format
  • Convert it for Keras (a rough sketch of these steps is shown after this list)
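For reference, a minimal Colab sketch of these preparation steps could look like the code below. The file name annotations.zip and the extraction directory are placeholders I made up, and the weight-conversion command is the one documented in the upstream keras-yolo3 README; adjust both to your own setup.

# Minimal sketch of the preparation steps above (file names are placeholders).
import subprocess
import zipfile

# Unzip the VoTT-annotated images that were uploaded to Colab as a zip.
with zipfile.ZipFile("annotations.zip") as zf:  # hypothetical upload name
    zf.extractall(".")

# Convert the pre-trained Darknet weights into a Keras .h5 model
# (command as documented in the upstream keras-yolo3 README).
subprocess.run(
    ["python", "convert.py", "yolov3.cfg", "yolov3.weights", "model_data/yolo.h5"],
    check=True,
)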

Where I'm stuck

After doing the above, I ran train.py and got the following error:

2019-09-09 08:03:13.567901: W tensorflow/core/common_runtime/bfc_allocator.cc:319] ****************************************************************************************************
2019-09-09 08:03:13.568544: W tensorflow/core/framework/op_kernel.cc:1502] OP_REQUIRES failed at fused_batch_norm_op.cc:487 : Resource exhausted: OOM when allocating tensor with shape[32,512,20,20] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
Traceback (most recent call last):
  File "train.py", line 195, in <module>
    _main()
  File "train.py", line 89, in _main
    callbacks=[logging, checkpoint, reduce_lr, early_stopping])
  File "/usr/local/lib/python3.6/dist-packages/keras/legacy/interfaces.py", line 91, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/keras/engine/training.py", line 1418, in fit_generator
    initial_epoch=initial_epoch)
  File "/usr/local/lib/python3.6/dist-packages/keras/engine/training_generator.py", line 217, in fit_generator
    class_weight=class_weight)
  File "/usr/local/lib/python3.6/dist-packages/keras/engine/training.py", line 1217, in train_on_batch
    outputs = self.train_function(ins)
  File "/usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py", line 2715, in __call__
    return self._call(inputs)
  File"/usr/local/lib/python 3.6/dist-packages/keras/backend/tensorflow_backend.py", line 2675, in_call
    fetched=self._callable_fn(*array_vals)
  File"/usr/local/lib/python 3.6/dist-packages/tensorflow/python/client/session.py", line 1458, in__call__
    run_metadata_ptr)
tensorflow.python.framework.errors_impl.ResourceExhaustedError:2rooterror(s)found.
  (0) Resource exhausted: OOM when allocating sensor with shape [32,512,20,20] and type float on/job: localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
     [[{{nodebatch_normalization_41/FusedBatchNorm}]]
Hint: If you want to see a list of allocated sensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

     [[loss_1/add_74/_5299]]
Hint: If you want to see a list of allocated sensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

  (1) Resource exhausted: OOM when allocating sensor with shape [32,512,20,20] and type float on/job: localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
     [[{{nodebatch_normalization_41/FusedBatchNorm}]]
Hint: If you want to see a list of allocated sensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

0 successful operations.
0 derived errors ignored.

As I am a Python beginner, for the time being I would just like to get the training to run, even as a test.
Thank you for your cooperation.

Package used: https://github.com/sleepless-se/keras-yolo3

python3 machine-learning tensorflow keras google-colaboratory

2022-09-30 15:55

1 Answer

It's a GPU out-of-memory (OOM) error. Try training on the CPU instead.
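For example, a minimal sketch of forcing the training onto the CPU; hiding the GPU has to happen before TensorFlow or Keras is imported, e.g. at the very top of train.py:

import os

# Hide the GPU so TensorFlow falls back to the CPU. This must run before
# TensorFlow or Keras is imported (e.g. at the very top of train.py).
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"

import keras  # imported after setting the environment variable on purpose

Note that training YOLOv3 on a CPU will be very slow. Since the error shows tensors with shape[32, ...], the batch size is 32; if the fork's train.py exposes a batch_size setting, lowering it (for example to 8 or 4) is another common way to stay within the Colab GPU's memory.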


2022-09-30 15:55


