Powering New Levels
of User Engagement

Boost Throughput and Responsiveness in Deep Learning Inference Workloads.

AI is constantly challenged to keep up with exploding volumes of data and still deliver fast responses. Meet the challenges with NVIDIA® Tesla®, the world’s fastest, most efficient data center platform for inference. Tesla supports all deep learning workloads and provides the optimal inference solution—combining the highest throughput, best efficiency, and best flexibility to power AI-driven experiences. 


For Universal Data Centers

The Tesla V100 has 125 teraflops of inference performance per GPU. A single server with eight Tesla V100s can produce a petaflop of compute.

For Ultra-Efficient Scale-Out Servers

The Tesla P4 accelerates any scale-out server, offering an incredible 60X higher energy efficiency compared to CPUs.

For Inference-Throughput Servers

The Tesla P40 offers great inference performance, INT8 precision, and 24 GB of onboard memory for an amazing user experience.


50X Higher Throughput to Keep Up with Expanding Workloads

Volta-powered Tesla V100 GPUs give data centers a dramatic boost in throughput for deep learning workloads to extract intelligence from today’s tsunami of data. A server with a single Tesla V100 can replace up to 50 CPU-only servers for deep learning inference workloads, so you get dramatically higher throughput with lower acquisition cost.

Unprecedented Efficiency for Low-Power, Scale-Out Servers

The ultra-efficient Tesla P4 GPU accelerates density-optimized, scale-out servers with a small form factor and a 50 W/75 W power footprint. It delivers an incredible 52X better energy efficiency than CPUs for deep learning inference workloads, so hyperscale customers can scale within their existing infrastructure and service the exponential growth in demand for AI-based applications.

A Dedicated Decode Engine for New
AI-Based Video Services

The Tesla P4 GPU can analyze up to 39 HD video streams in real time. Powered by a dedicated hardware-accelerated decode engine, it works in parallel with the NVIDIA CUDA® cores performing inference. By integrating deep learning into the pipeline, customers can offer new levels of smart, innovative functionality that facilitates video search and other video-related services.
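The decode engine and the CUDA cores form a producer/consumer pipeline: one unit decodes frames while the other runs inference on frames already decoded. The sketch below is an illustrative Python threading analogue of that design, not NVIDIA code; `decode_frame` and `infer` are hypothetical stand-ins for the hardware decode engine and the inference kernels.

```python
import queue
import threading

def decode_frame(stream_id, frame_no):
    # Hypothetical stand-in for the hardware decode engine:
    # turns a compressed frame into a decoded frame record.
    return {"stream": stream_id, "frame": frame_no}

def infer(frame):
    # Hypothetical stand-in for inference running on the CUDA cores.
    return {"stream": frame["stream"], "frame": frame["frame"], "label": "person"}

def run_pipeline(num_streams=2, frames_per_stream=3):
    frames = queue.Queue()   # decoded frames awaiting inference
    results = []
    done = object()          # sentinel marking end of the decode stage

    def decoder():
        # Producer: the decode stage fills the queue...
        for s in range(num_streams):
            for f in range(frames_per_stream):
                frames.put(decode_frame(s, f))
        frames.put(done)

    def engine():
        # Consumer: ...while the inference stage drains it in parallel.
        while True:
            item = frames.get()
            if item is done:
                break
            results.append(infer(item))

    t1 = threading.Thread(target=decoder)
    t2 = threading.Thread(target=engine)
    t1.start(); t2.start()
    t1.join(); t2.join()
    return results
```

With the defaults above, `run_pipeline()` yields one inference result per decoded frame (6 in total); the point is that neither stage waits for the other to finish an entire stream before starting work.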

Faster Deployment with NVIDIA TensorRT and DeepStream SDK

NVIDIA TensorRT is a high-performance neural-network inference engine for production deployment of deep learning applications. With TensorRT, neural networks trained in 32-bit or 16-bit floating point can be optimized for reduced-precision INT8 operations on the Tesla P4 or FP16 operations on the Tesla V100. NVIDIA DeepStream SDK taps into the power of Tesla GPUs to simultaneously decode and analyze video streams.
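The reduced-precision INT8 path rests on mapping FP32 tensors onto 8-bit integers with a scale factor. The sketch below illustrates that symmetric per-tensor quantization idea in plain NumPy; it is a minimal conceptual example, not the TensorRT API, and the function names are my own.

```python
import numpy as np

def quantize_int8(x):
    # Symmetric per-tensor quantization: map the FP32 range
    # [-max|x|, +max|x|] onto the INT8 range [-127, 127].
    scale = np.max(np.abs(x)) / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Recover an FP32 approximation of the original tensor.
    return q.astype(np.float32) * scale

weights = np.array([0.5, -1.27, 0.02, 1.0], dtype=np.float32)
q, scale = quantize_int8(weights)
approx = dequantize(q, scale)
# The round-trip error is bounded by roughly half a quantization step.
assert np.max(np.abs(approx - weights)) <= scale
```

TensorRT refines this basic idea with calibration, which picks scale ranges from representative input data rather than the raw tensor maximum, trading a little clipping for finer resolution where values actually concentrate.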

Performance Specs

Tesla V100: The Universal Data Center GPU
Single-Precision Performance (FP32): 14 teraflops (PCIe), 15.7 teraflops (SXM2)
Half-Precision Performance (FP16): 112 teraflops (PCIe), 125 teraflops (SXM2)
GPU Memory: 16 GB HBM2
Memory Bandwidth: 900 GB/s
System Interface/Form Factor: Dual-Slot, Full-Height PCI Express Form Factor; SXM2/NVLink
Power: 250 W (PCIe), 300 W (SXM2)

Tesla P4 for Ultra-Efficient, Scale-Out Servers
Single-Precision Performance (FP32): 5.5 teraflops
Integer Operations (INT8): 22 TOPS*
GPU Memory: 8 GB
Memory Bandwidth: 192 GB/s
System Interface/Form Factor: Low-Profile PCI Express Form Factor
Power: 50 W/75 W
Hardware-Accelerated Video Engine: 1x Decode Engine, 2x Encode Engines

Tesla P40 for Inference-Throughput Servers
Single-Precision Performance (FP32): 12 teraflops
Integer Operations (INT8): 47 TOPS*
GPU Memory: 24 GB
Memory Bandwidth: 346 GB/s
System Interface/Form Factor: Dual-Slot, Full-Height PCI Express Form Factor
Power: 250 W
Hardware-Accelerated Video Engine: 1x Decode Engine, 2x Encode Engines

*Tera-Operations per Second with Boost Clock Enabled



iFLYTEK’s Voice Cloud Platform uses NVIDIA Tesla P4 and P40 GPUs for training and inference to increase speech recognition accuracy.


NVIDIA Inception Program startup Valossa is using NVIDIA GPUs to accelerate deep learning and divine viewer behavior from video data. 


JD uses the NVIDIA AI inference platform to achieve a 40X increase in video detection efficiency.


The Tesla V100, P4, and P40 are available now for deep learning inference.