Visionular Intelligent Optimization Technology
This paper introduces, with technical detail, Visionular’s Content-Adaptive Encoding (CAE) Intelligent Optimization technology. It combines content-adaptive encoding algorithms that operate deep inside the codec, powered by advanced machine learning, image processing, and image enhancement, and controlled by a subjectively aligned quality assessment mechanism, to deliver the most effective video encoding solutions on the market.
Built for maximum flexibility and modern workflows, our Intelligent Optimization technology works across use cases ranging from premium VOD and live broadcast streaming to ultra-low-latency RTC video conferencing and communications applications.
- Improved visual quality. Regardless of bandwidth limitations, our encoders produce a better perceptual experience: more detailed, more visually pleasing images.
- Reduced operational cost. Our technology delivers higher perceptual quality while achieving bitrate savings of 50% or more, which translates directly into lower storage and bandwidth costs.
The market is flooded with CAE technologies, and the vendors behind them all make very similar claims. We know it is difficult to assess which solution is best for your application and use case.
This paper gives you the information you need to understand what makes our approach better and why we can guarantee the result you are looking for: higher quality with fewer bits.
Common Approaches to CAE
Per-Title encoding covers content-based adaptive bitrate as well as adaptive resolution assignment. Before Per-Title coding, it was standard practice to adopt a single fixed, predefined parameter set that the encoder would use for all video content.
Over time, many services have optimized their encoding recipes, yet a problem persists: what to do when a video library is diverse, containing both high-complexity and low-complexity titles, or when a single title contains long low-complexity sections followed by shorter high-complexity ones.
The following table from Netflix shows a fixed ladder of (bitrate, resolution) pairs applied across titles. For any given resolution, the corresponding bitrate must be high enough to cover the majority of video assets without noticeable degradation.
Nevertheless, Netflix found two downsides to the fixed-recipe approach. First, for some titles, bitrate is wasted. Consider animation or interview scenes: there is no need to spend 5800 kbps on a 1080p encode, and yet, whenever the bandwidth is available to the user, the ABR mechanism will stream the video at its maximum bitrate, consuming excessive network resources. On the other hand, a user limited to 2000 kbps on their network connection could reliably stream 720p at 1750 kbps, rather than the 480p specified in the table.
As every video encoding engineer has seen, once the encoding bitrate reaches a certain point, further bitrate increments yield no meaningful subjective quality improvement. For instance, once PSNR reaches 45 dB, additional bitrate (and the PSNR gain that comes with it) barely affects subjective quality.
This phenomenon is illustrated in the following figures: JND (just noticeable difference) based subjective evaluation scores for x264 across different user scenarios show that once QP falls below a certain level and the bitrate rises above a certain point, the JND scores stop improving.
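To make the saturation effect concrete, here is a minimal sketch (with illustrative numbers, not measured data) of how one might locate the point on a rate-quality curve beyond which additional bits stop paying for themselves:

```python
# Minimal sketch: locate the "knee" of a rate-quality curve, i.e. the point
# beyond which extra bitrate buys almost no additional quality.
# The (bitrate_kbps, quality) samples below are illustrative, not measured data.

def find_saturation_point(samples, min_gain_per_mbps=1.0):
    """Return the first point after which the marginal quality gain
    (quality units per extra Mbps) drops below `min_gain_per_mbps`."""
    samples = sorted(samples)
    for (r0, q0), (r1, q1) in zip(samples, samples[1:]):
        gain_per_mbps = (q1 - q0) / ((r1 - r0) / 1000.0)
        if gain_per_mbps < min_gain_per_mbps:
            return r0, q0  # further bitrate increments are not worth it
    return samples[-1]

rd_curve = [(500, 78.0), (1000, 86.0), (1750, 91.0),
            (2350, 93.5), (3000, 94.2), (5800, 94.6)]
print(find_saturation_point(rd_curve))  # -> (3000, 94.2)
```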
The objective of Per-Title coding is to determine, for every video in the library, the minimum bitrate that achieves the desired visual quality and subjective visual sensitivity. As illustrated in the following figure, Per-Title technologies use adaptive resolution: the ideal resolution and its corresponding bitrate are selected per title.
At each resolution, video quality improves monotonically with bitrate until it reaches a threshold, where the curve levels off. Different categories of content, different sources within the same category, and even different segments of the same source each call for their own encoding parameter set. And if the CAE mechanism can steer the encoder’s rate controller, bits can be allocated to the regions that are visually significant, allowing the encoder to maintain sufficient subjective quality with the fewest bits possible.
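The following sketch illustrates the common convex-hull formulation of this idea. It assumes a hypothetical probe function, encode_and_measure, that performs a fast trial encode at a given resolution and bitrate and returns a quality score such as VMAF; the candidate resolutions and bitrates are placeholders:

```python
# Sketch of per-title ladder construction: probe-encode each candidate
# (resolution, bitrate) pair, then keep only the points on the upper convex
# hull of the rate-quality cloud, so every ladder rung uses the resolution
# that gives the best quality at that bitrate.
# `encode_and_measure` is a hypothetical probe function (e.g. a fast trial
# encode followed by a VMAF measurement); it is not a real library call.

RESOLUTIONS = [(1920, 1080), (1280, 720), (854, 480), (640, 360)]
BITRATES_KBPS = [400, 800, 1500, 2500, 4000, 6000]

def build_per_title_ladder(source, encode_and_measure):
    points = []
    for res in RESOLUTIONS:
        for rate in BITRATES_KBPS:
            quality = encode_and_measure(source, res, rate)
            points.append((rate, quality, res))

    # Upper convex hull over (bitrate, quality): walk points in bitrate
    # order, skip points that add bits without adding quality, then pop any
    # previous point that breaks the diminishing-returns (concave) shape.
    points.sort(key=lambda p: (p[0], -p[1]))
    hull = []
    for rate, quality, res in points:
        if hull and quality <= hull[-1][1]:
            continue  # more bits but no quality gain: off the hull
        while len(hull) >= 2:
            (r1, q1, _), (r2, q2, _) = hull[-2], hull[-1]
            # Pop hull[-1] if its marginal slope is not strictly decreasing.
            if (q2 - q1) * (rate - r2) <= (quality - q2) * (r2 - r1):
                hull.pop()
            else:
                break
        hull.append((rate, quality, res))
    return hull
```

Each surviving hull point pairs a bitrate with the resolution that maximizes quality at that bitrate, which is exactly the per-title ladder described above.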
Image example: intelligent optimization for a UGC music video scenario. The left side shows the video transcoded by FFmpeg-x264 at 241×136 and 200 kbps; the right side shows the same video transcoded by Visionular’s AuroraCloud Intelligent Transcoder at double the resolution (426×240) but the same bitrate (200 kbps).
By categorizing content, video algorithms can be chosen that best fit that content, achieving a higher compression ratio and greater quality enhancement. Based on the user scenario and the complexity of the video, two levels of categorization may be used.
At the user-scenario level, video content can be classified into social, gaming, animation, education, sports, action, drama, video conferencing, and screen share, and further subdivided if needed based on scene complexity. Deep learning with extensively trained convolutional neural networks (CNNs) achieves classification accuracy greater than 95%.
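As one illustration of how such a classifier can be built (a generic sketch assuming PyTorch and torchvision, not Visionular’s production model), a pretrained image backbone can be fine-tuned on frames sampled from labeled clips:

```python
# Minimal sketch of a scene-category classifier: fine-tune a pretrained
# image backbone on frames sampled from labeled clips. This illustrates the
# general approach only; it is not Visionular's production model.
import torch
import torch.nn as nn
from torchvision import models

CATEGORIES = ["social", "gaming", "animation", "education", "sports",
              "action", "drama", "video_conferencing", "screen_share"]

def build_classifier(num_classes=len(CATEGORIES)):
    net = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
    net.fc = nn.Linear(net.fc.in_features, num_classes)  # new class head
    return net

def classify_clip(net, frames):
    """Average per-frame logits over sampled frames; `frames` is an
    (N, 3, 224, 224) tensor of normalized RGB frames from one clip."""
    net.eval()
    with torch.no_grad():
        logits = net(frames).mean(dim=0)
    return CATEGORIES[int(logits.argmax())]
```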
Categorization builds the foundation for deploying adaptive video compression techniques. For each class, the following algorithms can be exploited and combined to best fit the specific category (a recipe-selection sketch follows the list):
- Combinations of compression algorithms and compression tools.
- Advanced preprocessing algorithms and parameter tuning.
- ROI (region of interest, e.g. face regions) algorithms.
- Objective quality metrics, including VMAF for quality validation.
- Per-Title coding using parameter tuning and selection.
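As a sketch of how categorization drives recipe selection, consider a simple lookup from detected category to encoder settings. The specific parameter values here are illustrative assumptions, not Visionular’s tuned recipes; the flags themselves are standard FFmpeg/x264 options:

```python
# Illustrative mapping from detected content category to an encoding recipe.
# The parameter choices are assumptions for the sake of the sketch; the
# flags shown are standard FFmpeg/x264 options.
RECIPES = {
    "animation":    ["-c:v", "libx264", "-tune", "animation", "-crf", "21"],
    "screen_share": ["-c:v", "libx264", "-tune", "stillimage", "-crf", "20"],
    "sports":       ["-c:v", "libx264", "-tune", "film", "-crf", "23",
                     "-g", "60"],  # shorter GOP for fast motion
    "drama":        ["-c:v", "libx264", "-tune", "film", "-crf", "22"],
}
DEFAULT = ["-c:v", "libx264", "-crf", "23"]

def ffmpeg_command(src, dst, category):
    return ["ffmpeg", "-i", src, *RECIPES.get(category, DEFAULT), dst]

print(" ".join(ffmpeg_command("in.mp4", "out.mp4", "animation")))
```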
Consider the online education (remote learning) scenario, where screen content makes up much of the image; here it is important to design and develop screen-content-oriented algorithms. For celebrity and interview “talk” shows, facial region-of-interest algorithms are important for preserving detailed textures. Image processing matters just as much for sports video, where moving-object detection and motion deblurring are critical.
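One way such a face-ROI step can be realized (a sketch assuming OpenCV’s bundled Haar cascade is adequate for the content; a production system would track faces frame by frame) is to translate detections into FFmpeg’s addroi filter, which passes a quantizer-offset hint to the encoder:

```python
# Sketch of a face-ROI step: detect faces on a sampled frame with OpenCV's
# bundled Haar cascade, then express each face as an FFmpeg `addroi` filter
# (negative qoffset = spend more bits there). Simplification: one detection
# is reused for the whole clip instead of per-frame tracking.
import cv2

def face_roi_filter(frame_bgr):
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    # One addroi filter per detected face, lowering the quantizer inside it.
    return ",".join(
        f"addroi=x={x}:y={y}:w={w}:h={h}:qoffset=-0.2"
        for (x, y, w, h) in faces)
```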
Covering this wide range of user scenarios requires intelligent optimization technology with adaptive methodologies that meet the requirements of every video in the library.
AI Video Preprocessing Introduction
AI video preprocessing can play a significant role in improving video quality. We have engineered a series of preprocessing algorithm modules, each targeting a specific quality enhancement: coding artifact removal, denoising, spikiness removal, sharpening, contrast enhancement, saturation enhancement, deblurring, and more. These algorithms are jointly tuned and applied within our CAE video optimization technology.
AI video preprocessing provides further visual quality enhancement and can be combined with the encoding algorithms to achieve overall rate-distortion optimization. Regions that are visually significant to the human eye are enhanced, while visually insignificant regions are attenuated through smoothing and low-pass filtering to achieve a higher compression ratio. Our AI preprocessing algorithms are optimized and tuned for the best tradeoff between quality and computational complexity.
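The following minimal sketch captures this region-adaptive idea, using local variance as a crude stand-in for the learned significance maps described above; it sharpens visually busy regions and low-pass filters flat ones:

```python
# Minimal sketch of region-adaptive preprocessing: sharpen visually busy
# (high local variance) regions and low-pass-filter flat ones so they cost
# fewer bits. Local variance is a crude proxy for the learned saliency a
# real CAE pipeline would use.
import cv2
import numpy as np

def adaptive_preprocess(frame, ksize=9, var_threshold=200.0):
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY).astype(np.float32)
    mean = cv2.blur(gray, (ksize, ksize))
    var = cv2.blur(gray * gray, (ksize, ksize)) - mean * mean
    weight = np.clip(var / var_threshold, 0.0, 1.0)[..., None]  # (H, W, 1)

    blurred = cv2.GaussianBlur(frame, (5, 5), 0)               # cheap-to-code
    sharpened = cv2.addWeighted(frame, 1.5, blurred, -0.5, 0)  # unsharp mask
    out = weight * sharpened + (1.0 - weight) * blurred
    return out.astype(np.uint8)
```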
At Foothill Ventures, we believe in startup companies that ride the transformative power of major technology shifts such as deep learning in computer vision. Visionular’s founders are world-class technologists in the fields of video codecs and AI-driven optimization. We feel privileged to support their venture with our resources and experience.
I invested in Visionular because the team is at the forefront of innovation in video encoding and image processing for real-time, low-latency video communications and premium video streaming applications.