Is the NVIDIA Tesla V100 the only NVIDIA Volta GPU that can compute in half precision?

Asked 1 year ago, Updated 1 year ago, 78 views

To speed up deep-learning computation, I would like to use half precision.
According to section 4.1 of this paper, NVIDIA Volta GPUs seem to be able to compute in half precision. Besides the NVIDIA Tesla V100, which GPUs specifically can do this? For example, the GeForce GTX TITAN X or the GTX 10 series (Pascal)?
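For reference, here is a minimal sketch of how one might check what the installed card reports, assuming PyTorch with CUDA is available (the compute-capability thresholds in the comments are my own reading of the generations involved):

```python
# Minimal sketch (assumes PyTorch with CUDA): check the GPU's compute capability.
# 7.0+ (Volta/Turing and later) means Tensor Cores with fast FP16;
# 6.0 (GP100-class Pascal) has double-rate FP16 but no Tensor Cores;
# most other pre-Volta parts only emulate FP16 slowly.
import torch

if torch.cuda.is_available():
    name = torch.cuda.get_device_name(0)
    major, minor = torch.cuda.get_device_capability(0)
    print(f"{name}: compute capability {major}.{minor}")
    if (major, minor) >= (7, 0):
        print("Tensor Cores available -> fast FP16 / mixed precision expected")
    elif (major, minor) == (6, 0):
        print("GP100-class Pascal -> double-rate FP16, no Tensor Cores")
    else:
        print("FP16 likely emulated -> little or no speedup over FP32")
else:
    print("No CUDA-capable GPU detected")
```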

deep-learning gpu

2022-09-29 21:32

2 Answers

From that paper:

NVIDIA Volta GPUs introduce Tensor Cores that enable efficient half precision floating point (FP) computations that are several times faster than full precision operations. (...) This can be mitigated by scaling values to fit into the FP16 range.

So half precision (FP16) via Tensor Cores certainly looks like a feature introduced with the Volta core.

The Volta-core products are listed on Wikipedia (en):

  • Tesla V100
  • Tesla V100S PCIe
  • Titan V
  • Titan V CEO Edition
  • Quadro GV100

These appear to be aimed at data centers, and the release date is 2017-12.

The successor is the Turing core (the latest at the time, released 2018-09), where Tensor Cores appear to be adopted across the board.

Turing Tensor Cores
Turing GPUs include an enhanced version of the Tensor Cores first introduced in the Volta GV100 GPU. The Turing Tensor Core design adds INT8 and INT4 precision modes for inferencing workloads that can tolerate quantization. FP16 is also fully supported for workloads that require higher precision.

Therefore, I believe that GPUs of the Volta generation and later, including the Turing core, support FP16. Turing product list (Wikipedia (en))

GeForce GTX TITAN X or GeForce GTX 1080 Ti

These are Maxwell and Pascal cores, so they are a generation too old and will not help here.
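As an illustration of how FP16 is actually used on a Volta/Turing GPU, here is a minimal PyTorch mixed-precision sketch (the model, sizes, and optimizer are arbitrary placeholders, not from the question); the loss scaling corresponds to the "scaling values to fit into the FP16 range" mentioned in the quoted paper:

```python
# Minimal mixed-precision training sketch using PyTorch automatic mixed precision (AMP).
# On Volta/Turing GPUs the FP16 matmuls inside autocast run on Tensor Cores.
import torch
import torch.nn as nn

device = "cuda"
model = nn.Sequential(nn.Linear(1024, 1024), nn.ReLU(), nn.Linear(1024, 10)).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler()        # loss scaling keeps small gradients inside the FP16 range

x = torch.randn(256, 1024, device=device)
target = torch.randint(0, 10, (256,), device=device)

for step in range(10):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():         # ops run in FP16 where safe, FP32 elsewhere
        loss = nn.functional.cross_entropy(model(x), target)
    scaler.scale(loss).backward()           # scale the loss before backprop
    scaler.step(optimizer)
    scaler.update()
```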


2022-09-29 21:32

Double-rate FP16, which doubles theoretical compute throughput (FLOPS) compared to FP32, was first implemented in Volta's predecessor: the GP100 core of the Pascal architecture.
Specifically, this applies to the Tesla P100 and the Quadro GP100.

However, even within the same Pascal architecture, GPUs with the GP102 or GP104 core have no FP16 arithmetic units and emulate FP16 on the FP32 units, so the theoretical FP16 throughput is, on the contrary, only 1/64 of FP32. These are closer to incremental improvements over Maxwell, where gaming performance mattered more than machine learning.

In the Maxwell architecture, most cores, including the high-end GM200, do not support FP16 computation at all, but the mobile Tegra X1 is an exception that supports double-rate FP16.

The Turing architecture supports double-rate FP16 via dedicated FP16 arithmetic units even in lower-end gaming products that lack Tensor Cores (e.g., the GTX 16 series).

AMD also has GPUs supporting double-rate FP16, starting with Vega (GCN 5th gen).

Check the spec sheet to see if the GPU supports double-rate FP16.
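If the spec sheet is unclear, a rough timing comparison can also show whether FP16 is actually faster on a given card. The sketch below (matrix size and iteration count are arbitrary choices, and PyTorch with CUDA is assumed) times large matrix multiplications in both precisions:

```python
# Rough micro-benchmark: time a large matmul in FP32 and FP16.
# On GPUs with double-rate FP16 or Tensor Cores the FP16 run should be clearly faster;
# on GP102/GP104-class Pascal it may even be slower.
import time
import torch

def bench(dtype, size=4096, iters=20):
    a = torch.randn(size, size, device="cuda", dtype=dtype)
    b = torch.randn(size, size, device="cuda", dtype=dtype)
    torch.cuda.synchronize()
    start = time.time()
    for _ in range(iters):
        a @ b
    torch.cuda.synchronize()
    return (time.time() - start) / iters

print(f"FP32: {bench(torch.float32) * 1000:.2f} ms per matmul")
print(f"FP16: {bench(torch.float16) * 1000:.2f} ms per matmul")
```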


2022-09-29 21:32
