Aurora1 AV1 vs. NVIDIA NVENC HEVC
TL;DR: Watch this video to get the essential data in 34 seconds. Here’s what you are going to see: Aurora1 beats the venerable NVIDIA NVENC HEVC encoder that is built into the popular NVIDIA Tesla T4 by delivering the same or higher quality at half the bitrate (5Mbps for 1080p60). To keep the comparison fair, Aurora1 was run in its real-time mode, as required for cloud gaming and live streaming services.
Video engineers seeking to optimize the quality of the video they stream face many challenges. Everything from the codec standard to the choice of resolution and bitrate to the streaming format (HLS or DASH) affects the user experience. As important as these decisions are, the choice of encoding technology is just as critical, starting with whether to use a software or a hardware-based video encoder.
As video encoding experts, we’ve chosen to innovate on the video encoding process using software. The temptation to use the encoder available in hardware can be strong, given that it feels safe and often requires minimal integration. Because the NVIDIA Tesla T4 is so widely deployed, we thought it would be interesting to compare the VMAF scores (average and 5th percentile) of our Aurora1 AV1 software encoder with those of the NVIDIA NVENC HEVC encoder running on the NVIDIA Tesla T4 GPU.
GOP structure- GOP length set to 512 frames, closed GOPs (each GOP starts with a keyframe).
Slicing/tiling- For cloud gaming, slicing or tiling is necessary to improve error resilience against packet loss. The NVIDIA Tesla T4 does not provide tiling for HEVC, so we used slicing instead. This puts the NVIDIA encoder at a disadvantage, since uniform tiling offers shorter boundary lengths and fewer dependency breaks than slicing. The Aurora1 encoder split each frame into a 4×4 tile grid, while the NVIDIA encoder divided each frame into 16 uniform slices.
Rate Control- For Cloud Gaming applications, the Rate Control is almost always CBR or a tight VBR with a small overflow margin of 10%.
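To make the boundary-length point from the slicing/tiling item concrete, here is a minimal sketch that counts internal boundary pixels for a 1080p frame. It assumes an idealized layout in which each slice is a full-width horizontal band; real HEVC slices follow CTU raster order and may not align to full rows, so treat the numbers as an approximation. The function names are hypothetical and not part of either encoder.

```python
def tile_boundary_px(width, height, cols, rows):
    """Internal boundary length (in pixels) of a uniform cols x rows tile grid."""
    vertical = (cols - 1) * height    # vertical cuts run the full frame height
    horizontal = (rows - 1) * width   # horizontal cuts run the full frame width
    return vertical + horizontal


def slice_boundary_px(width, num_slices):
    """Internal boundary length when slices are idealized as full-width bands."""
    return (num_slices - 1) * width


WIDTH, HEIGHT = 1920, 1080

tiles = tile_boundary_px(WIDTH, HEIGHT, cols=4, rows=4)
slices = slice_boundary_px(WIDTH, num_slices=16)

print(f"4x4 tiles: {tiles} px of internal boundary")   # 3*1080 + 3*1920 = 9000
print(f"16 slices: {slices} px of internal boundary")  # 15*1920 = 28800
```

Under this simplification, both layouts produce 16 independently decodable regions, but the slice layout has roughly three times as much boundary across which prediction is broken.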
Aurora1 AV1 Encoder- version 1.3.5-6-g67e2108e2 6-g67e2108e2@2021-1-24.
ENCODER CONFIG: Aurora1_sample_encoder.exe 1920 1080 60 Fifa17.yuv fifa17.ivf 512 0 0 ultrafast game cbr 5000 0 4 4 2 -1
Presets used: ‘ultrafast’ and ‘fast’, each with the same ‘game’ tuning.
NVIDIA Tesla T4 HEVC HW Encoder- driver 436.3, SDK8 reference encoder.
NvEncoder8.exe -i [yuv-file] -size 1920 1080 -codec 1 -preset lowLatencyHP -fps 60 -bitrate [targetBitrate] -vbvMaxBitrate [1.1x targetBitrate] -vbvSize [33ms payload given target bitrate] -numB 0 -picStruct 1 -rcmode 32 -devicetype 1 -inputFormat 0 -deviceID 0 -o [h265-file]
‘numB 0’ – no B frames (due to low latency requirement)
‘picStruct 1’ – progressive source
‘inputFormat 0’ – yuv420p
‘deviceID 0’ – always activate GPU0
‘rcmode 32’ – sets dual-pass VBR
‘codec 1’ – specifies HEVC codec
‘vbvMaxBitrate’ – peak bitrate, allows 10% variance from the target bitrate
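The bracketed placeholders in the NVENC command line are derived from the target bitrate. The sketch below shows that arithmetic for the two target bitrates discussed in this article. It assumes the values are expressed in bits per second and that the 33 ms payload is simply the target bitrate multiplied by 0.033 s (about two frame intervals at 60 fps); the helper name and the units are assumptions, not something stated in the encoder documentation quoted here.

```python
def vbv_params(target_bps, overflow_pct=10, window_ms=33):
    """Return (vbvMaxBitrate, vbvSize) for a given target bitrate.

    Integer arithmetic is used so the results are exact.
    Units are assumed to be bits per second and bits.
    """
    max_bitrate = target_bps * (100 + overflow_pct) // 100  # 1.1x the target
    vbv_size = target_bps * window_ms // 1000               # bits sent in 33 ms
    return max_bitrate, vbv_size


for target in (5_000_000, 10_000_000):
    max_bitrate, vbv_size = vbv_params(target)
    print(f"target={target} vbvMaxBitrate={max_bitrate} vbvSize={vbv_size}")
```

For a 5Mbps target this gives a peak of 5.5Mbps and a buffer of 165,000 bits, which is what keeps the VBR mode close to CBR behavior.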
All video content is in the YUV 4:2:0 format at a resolution of 1920×1080 and 60 frames per second. VMAF scores were computed using vmafossexec.exe (vmaf_4k_v0.6.1.pkl model). Because rate control is sensitive at the start of a stream, the first 20 frames were excluded from the VMAF score calculations that follow.
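As an illustration of how the two reported statistics could be derived from per-frame VMAF scores, here is a minimal sketch. It assumes the per-frame scores are already available as a list of floats, and it uses the nearest-rank definition of a percentile; the exact pooling method used for the published numbers is not stated, so both the function name and the percentile definition are assumptions.

```python
def summarize_vmaf(frame_scores, skip=20, pct=5):
    """Return (mean, pct-th percentile) of per-frame scores after skipping warm-up frames."""
    kept = sorted(frame_scores[skip:])
    if not kept:
        raise ValueError("no frames left after skipping warm-up frames")
    mean = sum(kept) / len(kept)
    # Nearest-rank percentile: smallest score with at least pct% of frames at or below it.
    rank = max(1, (pct * len(kept) + 99) // 100)  # integer ceiling of pct% of n
    return mean, kept[rank - 1]


# Synthetic data only: 20 warm-up frames followed by scores 1..100.
scores = [0.0] * 20 + [float(v) for v in range(1, 101)]
mean, p5 = summarize_vmaf(scores)
print(mean, p5)  # 50.5 5.0
```

The 5th percentile matters for gaming content because it captures the worst moments of a clip, which an average alone can hide.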
According to the paper “VMAF Reproducibility: Validating a Perceptual Practical Video Quality Metric” by Reza Rassool, after extensive subjective testing and correlating the results using VMAF, a score of 93 can optimally serve the majority of an audience.
Across the video library tested, at the 10Mbps target bitrate, both the NVIDIA NVENC HEVC and Aurora1 encoders exceeded the 93 VMAF score on all but a few clips: Aurora1 fell short on a single video and NVIDIA NVENC on two. This suggests that for applications where bitrate is not a constraint, and users can reliably receive a 10Mbps video stream, either encoder will get the job done.
However, the situation changes when the required delivery bitrate is reduced from 10Mbps to 5Mbps. Here, only Aurora1 met or exceeded the target VMAF score of 93 on every video except ‘Unravel’, while the NVIDIA NVENC encoder missed the target on both the high-action clip ‘FIFA17’ and ‘Unravel’.
There is a common belief that only hardware encoders are suitable for latency-critical applications. However, this comparison shows a significant gap in VMAF scores between the two encoders, with Aurora1 producing consistently better quality while still operating in its real-time mode. Software is therefore not only capable of keeping up with hardware, it is also better suited to applications that require high quality and high bitrate efficiency.