Benchmarking RTX 2080 Ti vs Pascal GPUs vs Tesla V100 with DL tasks

November 06, 2018


Testing

This post presents benchmark results for Turing and Pascal GPUs obtained with a popular Deep Learning Benchmark. PyTorch-based tests in both floating-point precisions (FP32 and FP16) were chosen for the comparison.
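The benchmark's own code is not reproduced here, but the measurement boils down to timing forward (eval) and forward-plus-backward (train) passes over the torchvision models. Below is a minimal sketch of such a timing loop; the batch size, input resolution, warm-up and step counts are assumptions, not the benchmark's exact settings:

```python
import time
import torch
import torchvision.models as models

def benchmark(model, train=False, half=False, batch=16, steps=20):
    """Average time in ms of one forward (eval) or forward+backward (train) pass."""
    model = model.cuda()
    x = torch.randn(batch, 3, 224, 224, device='cuda')
    if half:
        model, x = model.half(), x.half()
    model.train(train)

    def step():
        if train:
            model.zero_grad()
            model(x).sum().backward()
        else:
            with torch.no_grad():
                model(x)

    for _ in range(5):           # warm-up: exclude CUDA init and cuDNN autotuning
        step()
    torch.cuda.synchronize()     # wait for queued kernels before starting the clock
    t0 = time.time()
    for _ in range(steps):
        step()
    torch.cuda.synchronize()
    return (time.time() - t0) / steps * 1000.0

for name, ctor in [('VGG-16', models.vgg16),
                   ('ResNet-152', models.resnet152),
                   ('DenseNet-161', models.densenet161)]:
    net = ctor()
    print('%s  eval: %.1f ms  train: %.1f ms'
          % (name, benchmark(net, train=False), benchmark(net, train=True)))
```

The explicit torch.cuda.synchronize() calls matter: CUDA kernels launch asynchronously, so timing without synchronization would only measure how fast the CPU can enqueue work.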

For reproducibility, all tests were performed within the docker container nvcr.io/nvidia/pytorch:18.10-py3 (an nvidia-docker image with PyTorch 1.0a0, CUDA 10, and cuDNN 7400), which can be obtained from the Nvidia NGC Registry. The proprietary Nvidia driver version was 410.73. All other test parameters were kept the same as in the original tests, which allows a direct comparison between the results.

The testing machine was powered by an AMD Ryzen 7 1700X CPU with 64 GB of RAM. All hardware ran at stock frequencies, without overclocking. The only exception is the Tesla V100 setup: an AWS p3.2xlarge cloud instance was used for that test.

Results

The tables contain the time of a forward (eval) pass or a forward and backward (train) pass for different models. Relative results, with the GTX 1080 Ti as the reference, are given in brackets; for example, the Tesla V100's 21.4 ms VGG-16 eval time is (21.4 - 39.6) / 39.6 ≈ -46.0% relative to the 1080 Ti's 39.6 ms.

FP32 results
| GPU | VGG-16 eval | VGG-16 train | ResNet-152 eval | ResNet-152 train | DenseNet-161 eval | DenseNet-161 train | Average |
|---|---|---|---|---|---|---|---|
| Tesla V100 | 21.4 ms (-46.0%) | 74.4 ms (-41.1%) | 36.9 ms (-37.5%) | 151.6 ms (-24.0%) | 37.6 ms (-41.3%) | 156.7 ms (-25.5%) | -35.9% |
| RTX 2080 Ti | 28.4 ms (-28.3%) | 97.5 ms (-22.9%) | 42.7 ms (-27.6%) | 151.2 ms (-24.2%) | 46.6 ms (-27.2%) | 155.9 ms (-25.9%) | -26.0% |
| GTX 1080 Ti | 39.6 ms | 126.4 ms | 59.0 ms | 199.5 ms | 64.0 ms | 210.4 ms | 0.0% |
| GTX 1070 | 65.9 ms (+66.4%) | 205.6 ms (+62.7%) | 102.4 ms (+73.6%) | 333.9 ms (+67.4%) | 109.0 ms (+70.3%) | 348.7 ms (+65.7%) | +67.7% |
FP16 results
| GPU | VGG-16 eval | VGG-16 train | ResNet-152 eval | ResNet-152 train | DenseNet-161 eval | DenseNet-161 train | Average |
|---|---|---|---|---|---|---|---|
| Tesla V100 | 11.9 ms (-67.3%) | 42.2 ms (-63.7%) | 30.4 ms (-38.5%) | 110.5 ms (-43.7%) | 32.6 ms (-38.1%) | 121.3 ms (-37.0%) | -48.0% |
| RTX 2080 Ti | 19.3 ms (-40.0%) | 70.7 ms (-39.1%) | 25.0 ms (-49.4%) | 101.8 ms (-48.1%) | 30.7 ms (-41.7%) | 116.4 ms (-39.6%) | -44.1% |
| GTX 1080 Ti | 36.4 ms | 116.1 ms | 49.4 ms | 196.2 ms | 52.7 ms | 192.6 ms | 0.0% |
| GTX 1070 | 61.2 ms (+68.1%) | 190.9 ms (+64.4%) | 86.1 ms (+74.3%) | 309.3 ms (+57.6%) | 88.2 ms (+67.4%) | 306.2 ms (+59.0%) | +65.1% |

To sum up, the new generation of GPUs shows a less-than-30% increase in FP32 computational power over the Pascal 1080 Ti. However, one should note the up-to-50% increase in FP16 performance, achieved through hardware support for half-precision arithmetic. Such an increase can make a huge difference in practical applications, especially for inference speed-ups.
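For reference, switching a PyTorch model to half precision for inference comes down to casting the weights and the input batch to FP16. A minimal sketch follows (the model choice and batch size are arbitrary; FP16 training additionally needs techniques such as loss scaling, which are out of scope here):

```python
import torch
import torchvision.models as models

# Cast both the weights and the input batch to FP16.
model = models.resnet152().cuda().half().eval()
x = torch.randn(16, 3, 224, 224, device='cuda').half()

with torch.no_grad():
    out = model(x)  # on Volta/Turing, cuDNN can dispatch FP16 convolutions to Tensor Cores

print(out.dtype)  # torch.float16
```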

An interesting point is that the Nvidia RTX 2080 Ti's performance in this test is on par with the Nvidia Titan V's results (see here, but mind the difference in software versions). Software versions make a big difference: for instance, see an older benchmark of the Tesla V100 run in a docker container with CUDA 9.0. It is also worth mentioning that the Tesla V100 performs significantly better on the VGG-16 network, probably due to architecture-specific optimizations.

[UPDATE] 19.01.2019: Added Tesla V100 test results.