Is the NVIDIA Tesla V100 the only NVIDIA Volta GPU that can compute in half precision?

Asked 1 year ago, Updated 1 year ago, 78 views

To speed up deep-learning computation, I would like to use half precision.
According to section 4.1 of this paper, NVIDIA Volta GPUs seem to be able to compute in half precision. Besides the NVIDIA Tesla V100, which GPUs specifically can do this? For example, the GeForce GTX TITAN X or the GTX 10 series (Pascal)?
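For reference, here is a minimal sketch of how one might check what the installed card reports, assuming PyTorch with CUDA is available (the compute-capability thresholds in the comments are my own reading of the generations involved):

```python
# Minimal sketch (assumes PyTorch with CUDA): check the GPU's compute capability.
# 7.0+ (Volta/Turing and later) means Tensor Cores with fast FP16;
# 6.0 (GP100-class Pascal) has double-rate FP16 but no Tensor Cores;
# most other pre-Volta parts only emulate FP16 slowly.
import torch

if torch.cuda.is_available():
    name = torch.cuda.get_device_name(0)
    major, minor = torch.cuda.get_device_capability(0)
    print(f"{name}: compute capability {major}.{minor}")
    if (major, minor) >= (7, 0):
        print("Tensor Cores available -> fast FP16 / mixed precision expected")
    elif (major, minor) == (6, 0):
        print("GP100-class Pascal -> double-rate FP16, no Tensor Cores")
    else:
        print("FP16 likely emulated -> little or no speedup over FP32")
else:
    print("No CUDA-capable GPU detected")
```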

deep-learning gpu

2022-09-29 21:32

2 Answers

From that paper:

NVIDIA Volta GPUs introduce Tensor Cores that enable efficient half precision floating point (FP) computations that are several times faster than full precision operations. (...) This can be mitigated by scaling values to fit into the FP16 range.

So half precision (FP16) via Tensor Cores certainly looks like a feature introduced with the Volta core.

The Volta-core products are listed on Wikipedia (en):

  • Tesla V100
  • Tesla V100S PCIe
  • Titan V
  • Titan V CEO Edition
  • Quadro GV100

These appear to be aimed at data centers, and the release date is 2017-12.

The successor is the Turing core (the latest at the time, released 2018-09), where Tensor Cores appear to be adopted across the board.

Turing Tensor Cores
Turing GPUs include an enhanced version of the Tensor Cores first introduced in the Volta GV100 GPU. The Turing Tensor Core design adds INT8 and INT4 precision modes for inferencing workloads that can tolerate quantization. FP16 is also fully supported for workloads that require higher precision.

Therefore, I believe that GPUs of the Volta generation and later, including the Turing core, support FP16. Turing product list (Wikipedia (en))

GeForce GTX TITAN X or GeForce GTX 1080 Ti

These are Maxwell and Pascal cores, so they are a generation too old and will not help here.
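As an illustration of how FP16 is actually used on a Volta/Turing GPU, here is a minimal PyTorch mixed-precision sketch (the model, sizes, and optimizer are arbitrary placeholders, not from the question); the loss scaling corresponds to the "scaling values to fit into the FP16 range" mentioned in the quoted paper:

```python
# Minimal mixed-precision training sketch using PyTorch automatic mixed precision (AMP).
# On Volta/Turing GPUs the FP16 matmuls inside autocast run on Tensor Cores.
import torch
import torch.nn as nn

device = "cuda"
model = nn.Sequential(nn.Linear(1024, 1024), nn.ReLU(), nn.Linear(1024, 10)).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler()        # loss scaling keeps small gradients inside the FP16 range

x = torch.randn(256, 1024, device=device)
target = torch.randint(0, 10, (256,), device=device)

for step in range(10):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():         # ops run in FP16 where safe, FP32 elsewhere
        loss = nn.functional.cross_entropy(model(x), target)
    scaler.scale(loss).backward()           # scale the loss before backprop
    scaler.step(optimizer)
    scaler.update()
```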


2022-09-29 21:32

Double-rate FP16, which doubles theoretical compute throughput (FLOPS) compared to FP32, was first implemented in Volta's predecessor: the GP100 core of the Pascal architecture.
Specifically, this applies to the Tesla P100 and the Quadro GP100.

However, even within the same Pascal architecture, GPUs with the GP102 or GP104 core have no FP16 arithmetic units and emulate FP16 on the FP32 units, so the theoretical FP16 throughput is, on the contrary, only 1/64 of FP32. These are closer to incremental improvements over Maxwell, where gaming performance mattered more than machine learning.

In the Maxwell architecture, most cores, including the high-end GM200, do not support FP16 computation at all, but the mobile Tegra X1 is an exception that supports double-rate FP16.

The Turing architecture supports double-rate FP16 via dedicated FP16 arithmetic units even in lower-end gaming products that lack Tensor Cores (e.g., the GTX 16 series).

AMD also has GPUs supporting double-rate FP16, starting with Vega (GCN 5th gen).

Check the spec sheet to see if the GPU supports double-rate FP16.
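If the spec sheet is unclear, a rough timing comparison can also show whether FP16 is actually faster on a given card. The sketch below (matrix size and iteration count are arbitrary choices, and PyTorch with CUDA is assumed) times large matrix multiplications in both precisions:

```python
# Rough micro-benchmark: time a large matmul in FP32 and FP16.
# On GPUs with double-rate FP16 or Tensor Cores the FP16 run should be clearly faster;
# on GP102/GP104-class Pascal it may even be slower.
import time
import torch

def bench(dtype, size=4096, iters=20):
    a = torch.randn(size, size, device="cuda", dtype=dtype)
    b = torch.randn(size, size, device="cuda", dtype=dtype)
    torch.cuda.synchronize()
    start = time.time()
    for _ in range(iters):
        a @ b
    torch.cuda.synchronize()
    return (time.time() - start) / iters

print(f"FP32: {bench(torch.float32) * 1000:.2f} ms per matmul")
print(f"FP16: {bench(torch.float16) * 1000:.2f} ms per matmul")
```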


2022-09-29 21:32
