Optimizing User Generated Content for the World

David Lea
VP of Video Solutions at Visionular
YouTube has become the go-to platform for user-generated content (UGC), with over 2 billion daily users, 4 billion hours of video watched daily, and 550 hours of video uploaded every minute.
With such a vast amount of content being continuously uploaded, YouTube’s engineers must continually optimize their video processing pipeline to handle the enormous volume of UGC.
In a recent episode of the VideoVerse podcast, Balu Chowdary Adsumilli, Head of Media Algorithms at YouTube, discussed the challenges of processing and transcoding UGC on YouTube’s platform.
Joining Balu on the podcast were Zoe Liu, the Co-Founder and CTO of Visionular, and Thomas Davies, Distinguished Engineer at Visionular.
UGC and the Challenge it Poses
The conversation opened with how UGC videos differ from professionally produced content.
UGC comes from regular people, so there’s a wide range of quality, compression issues, editing problems, and even artifacts in the original videos uploaded to YouTube.
These issues make maintaining high-quality video content challenging while keeping the creator’s intent intact.
That’s why YouTube has developed no-reference metrics that evaluate UGC quality without a pristine original to compare against, such as those created by Professor Al Bovik’s team at UT Austin and other teams collaborating with YouTube.
When a video is uploaded, YouTube runs a series of metrics to determine its quality. Machine learning and deep nets have enabled extracting even more information from these metrics, like content attributes, to help understand and improve the video.
The UVQ metric, developed by YouTube, is one example of this approach, using three different nets to assess the video’s content, distortion, and compression. This method offers valuable insights into the video’s history and characteristics as uploaded by the user and before it gets compressed.
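The three-net structure can be illustrated with a minimal sketch. The real UVQ (which Google has open-sourced) uses trained deep networks for content, distortion, and compression analysis plus a learned aggregation step; here each subnet is replaced by a hypothetical placeholder function, and the simple averaging and score range are illustrative assumptions, not YouTube's actual aggregation.

```python
# Hypothetical sketch of a UVQ-style no-reference quality score.
# Each subnet below is a stub standing in for a trained deep net.

def content_score(frame):
    # Stand-in for a ContentNet-style model: semantic/content analysis.
    return frame["content"]

def distortion_score(frame):
    # Stand-in for a DistortionNet-style model: capture/editing artifacts.
    return frame["distortion"]

def compression_score(frame):
    # Stand-in for a CompressionNet-style model: prior compression damage.
    return frame["compression"]

def uvq_style_score(frames):
    """Aggregate per-frame subnet outputs into one MOS-like score (1-5).

    Simple averaging is an assumption; the real model learns this step.
    """
    per_frame = [
        (content_score(f) + distortion_score(f) + compression_score(f)) / 3.0
        for f in frames
    ]
    mean = sum(per_frame) / len(per_frame)
    return 1.0 + 4.0 * mean  # map [0, 1] onto a 1-5 quality scale

frames = [
    {"content": 0.9, "distortion": 0.8, "compression": 0.7},
    {"content": 0.9, "distortion": 0.6, "compression": 0.5},
]
print(round(uvq_style_score(frames), 2))  # -> 3.93
```

The point of the decomposition is diagnostic: because each subnet scores a different cause of quality loss, the system can tell *why* an upload looks bad (e.g. it was already heavily compressed before upload) rather than just *that* it looks bad.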
One of YouTube’s challenges is balancing creator intent and video quality. It might be tempting to preprocess and “fix” videos before transcoding, but doing so could risk compromising the creator’s vision.
Example of a video with a high level of compression and many artifacts in the original upload itself.
This is especially true with the rise of short-form content on platforms like TikTok and YouTube Shorts, where it’s essential to respect creators’ original intentions, even if the video quality isn’t perfect.
Balu Adsumilli, YouTube’s Head of Media Algorithms
Thomas inquired if YouTube’s video processing pipeline has a ranking system based on factors like popularity and download numbers and if they revisit videos that gain popularity later.
Balu explained that they process videos based on predicted popularity, with insights from their partners and creator managers. Their processing approach is divided into three segments – the head, body, and long tail of their ‘dinosaur’ or ‘cockatoo’ model.
- Head segment: The most popular videos receive the most CPU resources to deliver the best possible quality across resolutions, bitrates, device specifications, and other factors. Because these videos draw the most attention, the system processes them most heavily.
- Body segment: Balances quality against factors such as expected watch time and the size of the audience.
- Long tail segment: Videos with few views (sometimes as low as zero!), processed with minimal CPU resources to save on power consumption.
Balu next emphasized that they do not compromise on quality but rather allocate CPU resources, tune the compression parameters, and optimize storage and availability based on the video’s popularity, the expected number of views, and other factors.
Feedback to Users based on the Metrics Gathered
Considering that YouTube analyzes every video before transcoding it and gathers loads of intelligence and metrics, Thomas asked if they could use these metrics to provide feedback to creators on their video quality.
Balu agreed that involving creators in improving their video quality is essential, particularly popular creators.
He also highlighted the importance of conveying feedback in a way that creators can understand and relate to, which is where their UVQ metric comes into play.
Balu mentioned that their approach is more interactive, especially with top-end creators, as they work on helping them understand and improve their video quality.
Developing and Modeling Metrics at Scale
Continuing the discussion on video quality metrics, Zoe asked Balu how YouTube identifies, creates, and models metrics and how they correlate with quality and creator intent on such a large scale.
Balu explained that for large-scale systems, having an objective metric that closely correlates with subjective scores is vital.
That’s why YouTube has developed new ways of creating subjective scores involving methods like golden eye, lab testing, creative focus testing, surveys, and crowdsourced testing.
They’ve collaborated with academia and industry to ensure these methods are rooted correctly in the subjective sense.
Using the ground truth data and datasets they’ve created, they’ve gained insights into how viewers enjoy UGC and how it differs from professional content.
The UVQ Training Framework (Source)
By correlating creator understanding with metric-driven approaches, they’ve developed UVQ models that
- outperform other metrics across various content categories
- correlate better with subjective scores than other metrics do.
However, Balu clarified that his team focuses on algorithm development, while sister teams handle scaling within the video infrastructure.
Finally, Balu concluded this segment by mentioning that UVQ is now open-source and can be used by anyone, and their team is exploring ways of using it in video compression.
Handling Vertical Videos in YouTube Shorts
The podcast then shifted to vertical video and how its handling differs from standard 16:9 or 4:3 aspect-ratio videos.
Zoe specifically asked about the handling of YouTube Shorts, a relatively new genre in the world of video creation, and the difference in their processing compared to conventional YouTube videos.
Vertical videos in YouTube Shorts
Balu began his answer by underscoring the fact that YouTube Shorts is different in terms of both creator and viewer engagement. Being a mobile-first video creation and distribution platform, YouTube Shorts focuses on vertical video consumption, which differs significantly from traditional horizontal video.
Balu discussed how vertical videos break traditional norms in video creation, changing where viewers focus their attention within the frame. Processing vertical video therefore requires a different approach to optimization within the content.
Balu also emphasized that creator intent changes for YouTube Shorts, as there is more focus on features like text, emojis, and overlays than traditional videos.
The challenges of compressing YouTube Shorts include dealing with high-frequency content (which shows up as large coefficients after transforms like the DCT and is costly to encode) and the growing number of reaction videos, where the video playing in the background is as important as the creator reacting to it in the foreground.
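Why sharp overlays are expensive can be shown with a tiny numerical sketch: a hard edge (like the boundary of text or an emoji) spreads far more energy into the high-frequency half of a DCT spectrum than a smooth gradient does, and high-frequency coefficients cost bits to encode. The 1-D DCT-II below and the two toy signals are purely illustrative.

```python
import math

# Illustrative sketch: sharp edges (text, emoji, overlay boundaries)
# produce large high-frequency DCT coefficients, which are expensive
# to compress. Signals below are made-up toy data.

def dct2(x):
    """Unnormalized 1-D DCT-II of a short signal."""
    n = len(x)
    return [
        sum(x[i] * math.cos(math.pi * k * (2 * i + 1) / (2 * n)) for i in range(n))
        for k in range(n)
    ]

def high_freq_energy(x):
    """Energy in the top half of the DCT spectrum."""
    coeffs = dct2(x)
    return sum(c * c for c in coeffs[len(coeffs) // 2:])

smooth = [0.0, 0.25, 0.5, 0.75, 1.0, 0.75, 0.5, 0.25]  # gentle ramp
edge = [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]        # hard, text-like edge

print(high_freq_energy(smooth) < high_freq_energy(edge))  # -> True
```

In a real encoder the same effect plays out per 2-D transform block: blocks covering overlay edges retain many significant high-frequency coefficients after quantization, so text-heavy Shorts need more bits (or suffer more ringing) at a given quality level.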
Dancers react to RRR’s Naatu Naatu. The video demonstrates a composition of a high-action foreground, with a talking-heads inset video and a highly textured background image. Source
Closing the Loop with UVQ
Thomas then asked Balu how they close the loop with their sophisticated metrics (that correlate well with subjective quality) and whether applying these algorithms results in greater engagement with videos and higher viewer numbers.
Balu explained that while discussing internal metrics is difficult, they have observed increased satisfaction and higher engagement rates among users viewing high-quality content.
YouTube has also witnessed greater creator happiness and fewer issues or complaints regarding quality.
And to fully realize the benefits of these metrics, they need to be available, exercised, and correlated across various aspects of the platform, such as preprocessing pipelines, compression engines, and user recommendations.
Handling Audio at YouTube – Challenges Faced
The final part of the podcast focused on audio quality challenges faced at YouTube.
Balu explained that audio’s unique challenge compared to video is that perceived audio quality is more closely tied to bitrate, making it difficult to break listeners’ assumption that a given bitrate implies a given level of quality.
In audio, the primary challenges involve dealing with various formats and format conversions.
Open-source initiatives have seen significant success in the video domain, but they haven’t been as successful in audio. This has led to a reliance on proprietary technologies, such as Dolby Atmos, which are more prevalent in the audio industry.
Spatial audio is becoming increasingly popular, and YouTube caters to high-end creators and movie studios that produce premium partner content or high-value content (HVC).
However, evaluating audio quality in user-generated content (UGC) can be tricky due to issues like clipping, saturation, and jarring noises, making it difficult to assess audio quality accurately. To address these challenges, YouTube employs various algorithms, such as dynamic range compression and audio loudness equalization across videos or playlists.
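Loudness equalization across a playlist can be sketched simply: scale each clip so it hits a shared target level, so viewers don't reach for the volume knob between videos. Real systems measure perceptual loudness (e.g. ITU-R BS.1770 / LUFS) with gating and true-peak limits; the RMS-based version below is a deliberate simplification, and the target level is an arbitrary choice.

```python
import math

# Simplified sketch of loudness equalization across clips in a playlist:
# rescale each clip to a common RMS level. Real pipelines use perceptual
# loudness (LUFS), not raw RMS; this is illustration only.

def rms(samples):
    """Root-mean-square level of a list of audio samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def equalize_loudness(clips, target_rms=0.1):
    """Return each clip rescaled so its RMS matches target_rms."""
    out = []
    for samples in clips:
        level = rms(samples)
        gain = target_rms / level if level > 0 else 1.0
        out.append([s * gain for s in samples])
    return out

quiet = [0.01, -0.01, 0.02, -0.02]   # toy "quiet vlog" clip
loud = [0.5, -0.5, 0.4, -0.4]        # toy "loud music" clip
for clip in equalize_loudness([quiet, loud]):
    print(round(rms(clip), 3))       # both clips land at the target level
```

A linear gain like this preserves the clip's internal dynamics; the dynamic range compression Balu also mentions is a separate, nonlinear step that reduces the gap between a clip's loudest and quietest moments.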
However, as the understanding of audio quality in UGC improves, it will become easier to transition to a more metric-driven experience in the audio space and improve the end-user experience.
Conclusion
In conclusion, the podcast provided valuable insights into the challenges and complexities of processing and transcoding UGC videos on YouTube’s platform.
Balu’s team has to continuously develop and optimize processing pipelines and algorithms to handle the enormous volume of videos uploaded every minute while maintaining quality standards.
As new technologies emerge and real-time processing becomes more crucial, YouTube’s pipelines must adapt to meet these challenges.

Click the link to access Episode 16 of the podcast: The VideoVerse
At Foothill Ventures, we believe in startup companies that ride the transformative power of major technology shifts such as deep learning in computer vision. Visionular’s founders are world-class technologists in their field of video codec and AI-driven optimization. We feel privileged to support their adventure with our resources and experience.
I invested in Visionular because the team is at the forefront of innovations in video encoding and image processing for real-time low latency video communications and premium video streaming applications.