Tensor Cores in NVIDIA Volta

The Next Generation of Deep Learning

The NVIDIA® Tesla® V100 GPU is powered by NVIDIA Volta, a revolutionary new GPU architecture. Its streaming multiprocessors are 50 percent more energy efficient than those of the previous-generation NVIDIA® Pascal™ architecture, enabling major boosts in 32-bit (FP32) and 64-bit (FP64) floating-point performance. But the biggest advancement? The introduction of Tensor Cores.

A Breakthrough In Training And Inference

Designed specifically for deep learning, Tensor Cores deliver groundbreaking performance: up to 12X higher peak teraflops (TFLOPS) for training and 6X higher peak TFLOPS for inference than Pascal. As a result, Volta delivers 3X performance speedups in both training and inference over the previous generation.

Each of Tesla V100's 640 Tensor Cores operates on 4x4 matrices, and its data paths are custom-designed to dramatically increase floating-point compute throughput with high energy efficiency.
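A Tensor Core's 4x4 operation is a matrix multiply-accumulate, D = A × B + C, where A and B hold FP16 values and the accumulator can be FP32. The sketch below is a numerical model in plain Python, not a way to program the hardware. It shows why one such operation amounts to 4 × 4 × 4 = 64 fused multiply-adds and why accumulating in FP32 matters. The helper names (`to_fp16`, `to_fp32`, `tensor_core_mma`) are hypothetical. Rounding the accumulator after every step in a fixed order is also a simplifying assumption, because the hardware's internal summation order and rounding may differ.

```python
import struct
from typing import Callable, List, Tuple

Matrix = List[List[float]]


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision (FP16) value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 single-precision (FP32) value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def tensor_core_mma(a: Matrix, b: Matrix, c: Matrix,
                    acc_round: Callable[[float], float] = to_fp32) -> Tuple[Matrix, int]:
    """Hypothetical emulation of D = A x B + C on 4x4 matrices.

    A and B are rounded to FP16. The accumulator starts from C and is rounded
    with acc_round after every step (FP32 by default). Returns D and the
    number of fused multiply-adds (FMAs) performed.
    """
    n = 4
    fma_count = 0
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            acc = acc_round(c[i][j])
            for k in range(n):
                # The product of two FP16 values is exact in FP64 (and FP32),
                # so after the inputs are rounded, only accumulation rounds.
                acc = acc_round(acc + to_fp16(a[i][k]) * to_fp16(b[k][j]))
                fma_count += 1
            d[i][j] = acc
    return d, fma_count


ones = [[1.0] * 4 for _ in range(4)]
big_c = [[2048.0] * 4 for _ in range(4)]

d32, fmas = tensor_core_mma(ones, ones, big_c)          # FP32 accumulate
d16, _ = tensor_core_mma(ones, ones, big_c, to_fp16)    # FP16 accumulate, for contrast
print(fmas, d32[0][0], d16[0][0])                       # 64 2052.0 2048.0
```

In this model, the FP32 accumulator starts at 2048 and adds four products of 1.0 to reach 2052. An FP16 accumulator stays at 2048, because FP16 values near 2048 are spaced 2 apart and each intermediate 2049 rounds back to 2048.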

Efficiency and Performance Accelerated

Deep Learning Training in Less Than a Workday

Volta is equipped with 640 Tensor Cores, each performing 64 floating-point fused multiply-add (FMA) operations per clock. Together, they deliver up to 125 TFLOPS for training and inference applications. Developers can therefore train deep learning models in mixed precision, computing in FP16 and accumulating in FP32. This achieves both a 3X speedup over the previous generation and convergence to the network's expected accuracy. That 3X speedup is a key breakthrough of Tensor Core technology, and it means deep learning training can now be completed in hours.
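The 125 TFLOPS figure follows from the per-core rate, the core count, and the clock speed. The sketch below makes that arithmetic explicit. Two sets of inputs are assumptions that do not appear in this text: the 1.53 GHz boost clock, and the Tesla P100 peak figures of 10.6 TFLOPS in FP32 and 21.2 TFLOPS in FP16. With those assumed inputs, the ratios land close to the 12X training and 6X inference peak-TFLOPS claims above.

```python
# Peak Tensor Core throughput from the figures in the text, plus assumed inputs.
TENSOR_CORES = 640            # from the text
FMA_PER_CORE_PER_CLOCK = 64   # from the text (one 4x4x4 multiply-accumulate)
FLOPS_PER_FMA = 2             # one multiply plus one add
BOOST_CLOCK_HZ = 1.53e9       # assumption: Tesla V100 (SXM2) boost clock of about 1530 MHz

# Assumption: Tesla P100 (SXM2) peak figures used for the comparison, in TFLOPS.
P100_FP32_TFLOPS = 10.6
P100_FP16_TFLOPS = 21.2


def peak_tflops(cores: int, fma_per_clock: int, clock_hz: float) -> float:
    """Peak TFLOPS = cores x FMAs per clock x 2 FLOPs per FMA x clock (Hz) / 1e12."""
    return cores * fma_per_clock * FLOPS_PER_FMA * clock_hz / 1e12


v100 = peak_tflops(TENSOR_CORES, FMA_PER_CORE_PER_CLOCK, BOOST_CLOCK_HZ)
training_ratio = v100 / P100_FP32_TFLOPS    # Tensor Core peak vs. assumed P100 FP32 peak
inference_ratio = v100 / P100_FP16_TFLOPS   # Tensor Core peak vs. assumed P100 FP16 peak

print(f"V100 Tensor Core peak: {v100:.1f} TFLOPS")
print(f"vs. P100 FP32: {training_ratio:.1f}x, vs. P100 FP16: {inference_ratio:.1f}x")
```

Under these assumptions, the script prints a peak of 125.3 TFLOPS and ratios of about 11.8x and 5.9x. These are peak rates, so real workloads will see smaller gains, consistent with the 3X end-to-end speedups quoted in the text.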

47X Higher Throughput than a CPU Server on Deep Learning Inference

For inference, Tesla V100 also achieves more than a 3X performance advantage over the previous generation and is 47X faster than a CPU-based server. When inference runs on NVIDIA's TensorRT Programmable Inference Accelerator, these speedups come largely from Tensor Cores executing the work in mixed precision.

A Major Boost in Computing Performance

Read the whitepaper about Tensor Cores and the NVIDIA Volta architecture.