Benchmarking RTX 2080 Ti vs Pascal GPUs vs Tesla V100 with DL tasks
November 06, 2018
3 min read
Testing
This post presents benchmarking results for Turing, Volta, and Pascal GPUs using a popular Deep Learning benchmark. PyTorch-based tests in both floating-point precisions (FP32 and FP16) were chosen for the comparison.
For reproducibility, all tests were performed inside a Docker container, nvcr.io/nvidia/pytorch:18.10-py3 (an nvidia-docker image with PyTorch 1.0a0, CUDA 10, and cuDNN 7400), which can be obtained from the Nvidia NGC Registry. The proprietary Nvidia driver version was 410.73. All other test parameters were kept the same as in the original tests, which allows a direct comparison of the results.
The testing machine was powered by an AMD Ryzen 7 1700X CPU with 64 GB of RAM. All hardware ran at stock frequencies, without overclocking. The only exception is the Tesla V100 setup: an AWS p3.2xlarge cloud instance was used for that test.
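For reference, the timing logic of such a benchmark can be reproduced with a short PyTorch script. The sketch below is illustrative rather than the benchmark's actual code: the `benchmark` helper, the batch size, and the step count are assumptions, not values taken from the original test suite.

```python
import time

import torch
import torchvision.models as models


def benchmark(model, batch_size=16, steps=20, device="cuda"):
    """Return (eval_ms, train_ms) per-pass timings; illustrative only."""
    model = model.to(device)
    x = torch.randn(batch_size, 3, 224, 224, device=device)

    # Warm-up pass so cuDNN autotuning does not skew the measurement.
    model(x).sum().backward()
    torch.cuda.synchronize()

    # eval: forward pass only.
    model.eval()
    start = time.time()
    with torch.no_grad():
        for _ in range(steps):
            model(x)
    torch.cuda.synchronize()
    eval_ms = (time.time() - start) / steps * 1000

    # train: forward and backward passes.
    model.train()
    start = time.time()
    for _ in range(steps):
        model(x).sum().backward()
    torch.cuda.synchronize()
    train_ms = (time.time() - start) / steps * 1000
    return eval_ms, train_ms


for name, ctor in [("VGG-16", models.vgg16),
                   ("ResNet-152", models.resnet152),
                   ("DenseNet-161", models.densenet161)]:
    eval_ms, train_ms = benchmark(ctor())
    print(f"{name}: eval {eval_ms:.1f} ms, train {train_ms:.1f} ms")
```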
Results
The tables contain the times of a forward (eval) or a forward and backward (train) pass for different models. The relative results, with the GTX 1080 Ti as the reference, are given in brackets.
**FP32 (single precision)**

| GPU | VGG-16 eval | VGG-16 train | ResNet-152 eval | ResNet-152 train | DenseNet-161 eval | DenseNet-161 train | Average |
|---|---|---|---|---|---|---|---|
| Tesla V100 | 21.4 ms (-46.0%) | 74.4 ms (-41.1%) | 36.9 ms (-37.5%) | 151.6 ms (-24.0%) | 37.6 ms (-41.3%) | 156.7 ms (-25.5%) | -35.9% |
| RTX 2080 Ti | 28.4 ms (-28.3%) | 97.5 ms (-22.9%) | 42.7 ms (-27.6%) | 151.2 ms (-24.2%) | 46.6 ms (-27.2%) | 155.9 ms (-25.9%) | -26.0% |
| GTX 1080 Ti | 39.6 ms | 126.4 ms | 59.0 ms | 199.5 ms | 64.0 ms | 210.4 ms | 0.0% |
| GTX 1070 | 65.9 ms (+66.4%) | 205.6 ms (+62.7%) | 102.4 ms (+73.6%) | 333.9 ms (+67.4%) | 109.0 ms (+70.3%) | 348.7 ms (+65.7%) | +67.7% |
**FP16 (half precision)**

| GPU | VGG-16 eval | VGG-16 train | ResNet-152 eval | ResNet-152 train | DenseNet-161 eval | DenseNet-161 train | Average |
|---|---|---|---|---|---|---|---|
| Tesla V100 | 11.9 ms (-67.3%) | 42.2 ms (-63.7%) | 30.4 ms (-38.5%) | 110.5 ms (-43.7%) | 32.6 ms (-38.1%) | 121.3 ms (-37.0%) | -48.0% |
| RTX 2080 Ti | 19.3 ms (-40.0%) | 70.7 ms (-39.1%) | 25.0 ms (-49.4%) | 101.8 ms (-48.1%) | 30.7 ms (-41.7%) | 116.4 ms (-39.6%) | -44.1% |
| GTX 1080 Ti | 36.4 ms | 116.1 ms | 49.4 ms | 196.2 ms | 52.7 ms | 192.6 ms | 0.0% |
| GTX 1070 | 61.2 ms (+68.1%) | 190.9 ms (+64.4%) | 86.1 ms (+74.3%) | 309.3 ms (+57.6%) | 88.2 ms (+67.4%) | 306.2 ms (+59.0%) | +65.1% |
To sum up, the new GPU generation reduces FP32 computation times by less than 30% compared with the Pascal GTX 1080 Ti. However, one should note the reduction of up to 50% in FP16 mode, achieved through hardware support for half-precision calculations. Such a gain can make a huge difference in practical applications, especially for speeding up inference.
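The FP16 numbers above rely on running the models in half precision. A minimal sketch of how that mode is typically enabled in PyTorch is shown below, assuming a plain `.half()` cast of weights and inputs; the benchmark's actual FP16 path may differ (for instance, mixed-precision setups keep some layers in FP32 for numerical stability).

```python
import torch
import torchvision.models as models

# Cast both the model weights and the input batch to FP16.
# On Volta/Turing hardware, cuDNN can route half-precision
# convolutions to the dedicated FP16 units (Tensor Cores).
model = models.resnet152().half().cuda().eval()
x = torch.randn(16, 3, 224, 224, device="cuda", dtype=torch.half)

with torch.no_grad():
    out = model(x)  # forward pass computed in FP16
print(out.dtype)    # torch.float16
```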
It is worth noting that the RTX 2080 Ti's performance in this test is on par with the Nvidia Titan V results (see here, but mind the difference in software versions). The software versions do make a big difference: for instance, see an older benchmark of the Tesla V100 run in a Docker container with CUDA 9.0. It is also worth mentioning that the Tesla V100 performs significantly better on the VGG-16 network, probably due to architecture-specific optimizations.
[UPDATE] 19.01.2019: Added Tesla V100 test results.