YouTube's Existence is Insane: How Video Compression, Encode, & Decode Work (Basics)

video description

Sponsor: Get 10% off Squarespace purchases (https://geni.us/BqEpf). This video goes through the very top-level basics of how videos work. Most of the discussion is hardware-agnostic, talking about video encode, decode, and compression. At GN, none of us are experts in these topics (and they exit our usual coverage spectrum), so graphics engineer Tom Petersen will be joining to help provide the foundational knowledge as a part of our educational series of engineering discussions. Towards the end, he talks about the hardware-level choices that affect media processing. This is the last of our series of 3 videos with Tom Petersen. Check the others below, and check back for videos with other engineers later! Watch our educational video on graphics/video drivers and game optimization: https://www.youtube.com/watch?v=Qp3BGu3vixk Watch the video on Simulation Time Error & Presentmon: https://www.youtube.com/watch?v=C_RO8bJop8o
Date: 2024-04-03

Comments and reviews: 20

jackykoning
What needs to be implemented is a quality feedback system, preferably using ssimulacra2, but that's unlikely.
Archiving video in a watchable quality is just about as important as encoding it for livestreaming (if not more important).
The feedback system would take the output frame, run it through the decoder, and measure the quality of the output. Say we do have ssimulacra2 and I want a minimum quality of 70: the feedback would send the frame back to the encoder if it is below 70 and tell it to use more bitrate. Say I also want to limit the maximum to 80: anything above 80 is sent back and the encoder is told to use less bitrate for that frame.
Archiving does not need to be realtime. It is perfectly fine to require 10 retries, especially since the encoder runs at 250 fps.
So let's say I have a video of an aircraft landing. There isn't much motion, even though you probably think there is. I know that a minimum value of 70 for ssimulacra2 is good enough for archiving, so I tell the encoder to set the minimum to 70. I also know that about 85 is pretty much lossless, so it never needs to be above that number, and I tell the encoder to go for a max of 85. Then I say that I want the average to be about 75. You might also have a "stop if within" function: if the score from ssimulacra2 is within 5% of the target, continue to the next frame. Obviously it needs to know that some frames can't be re-encoded in a lower quality because they are literally the same frame as the last one, so you don't even have to try.
Implementing that would be a game changer, especially for, say, Linus, who could then re-encode his stuff without losing quality (in theory) by archiving it at a score of 90, since Linus, as far as I can tell, is storing it all raw. As for me, I want that because I have quite a lot of footage from 20 years ago that takes up a lot of space, but re-encoding it takes a massive amount of work and time. I have written a tool that does exactly what I just described, but it has major flaws, like the requirement of frequent keyframes.
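The feedback loop described in this comment could be sketched roughly as follows. This is only an illustration, not the commenter's tool: `encode` and `score` are hypothetical callables standing in for a real encoder and a ssimulacra2-style metric, the 1.25x/0.8x bitrate steps are arbitrary assumptions, and `toy_encode`/`toy_score` are made-up stand-ins so the example runs on its own.

```python
from typing import Callable, Tuple


def encode_frame_with_feedback(
    frame: bytes,
    encode: Callable[[bytes, int], bytes],   # hypothetical: (frame, bitrate) -> encoded bytes
    score: Callable[[bytes, bytes], float],  # hypothetical: (frame, encoded) -> quality score
    min_score: float = 70.0,
    max_score: float = 85.0,
    start_bitrate: int = 4000,
    max_retries: int = 10,
) -> Tuple[bytes, int, float]:
    """Re-encode one frame until its score lands inside [min_score, max_score]."""
    bitrate = start_bitrate
    encoded = encode(frame, bitrate)
    quality = score(frame, encoded)
    for _ in range(max_retries):
        if quality < min_score:
            bitrate = int(bitrate * 1.25)         # too ugly: spend more bits
        elif quality > max_score:
            bitrate = max(1, int(bitrate * 0.8))  # needlessly good: spend fewer bits
        else:
            break                                 # inside the window: accept the frame
        encoded = encode(frame, bitrate)
        quality = score(frame, encoded)
    return encoded, bitrate, quality


# Toy stand-ins (assumptions): the "encoded frame" just records the bitrate,
# and the "quality" rises linearly with bitrate up to 100.
def toy_encode(frame: bytes, bitrate: int) -> bytes:
    return bitrate.to_bytes(4, "big")


def toy_score(frame: bytes, encoded: bytes) -> float:
    return min(100.0, int.from_bytes(encoded, "big") / 100)


encoded, bitrate, quality = encode_frame_with_feedback(b"frame", toy_encode, toy_score)
print(bitrate, quality)
```

Because the loop is capped by `max_retries`, a frame that keeps bouncing between "too low" and "too high" is simply accepted after the last attempt, which matches the idea that archiving can afford a handful of retries per frame.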

Ranguvar13
Excellent video! I'll point friends who are new to video/image compression this way.
One thing to note, from haunting video compression forums for many years: while the fixed-function decode hardware on GPUs has been very fast and as fully featured as software decode since 2006 or so, and of course the same quality as software decoding, the same can't be said for encode even today. The x264 guys, to my best recollection, viewed GPU encode as a marketing exercise, and saw very little speedup in leveraging GPU hardware themselves in comparison to other possible optimizations. Still today I'm not aware of any leveraging of GPU hardware by x265 or rav1e.
NVIDIA has the best GPU hw encode quality currently, and they're all somewhat better now, but not much. At least on NVIDIA and AMD, the encoders use much more of the actual programmable shader hardware than decode does (slowing games, etc.), they can't use many of the advanced features of the H.264/HEVC/AV1 formats, and they can struggle to compete on quality with 2-4 modern CPU cores running e.g. x264 --preset veryfast. If you can, try isolating a few cores and running software encode - you may be surprised, especially since you can still use the decode hardware on the input video if it's compressed.

gamersnexus
Computer file compression is fascinating, but to me video compression especially. Here's a good way to explain 2 of the steps:
Spatial & Temporal Redundancy Search:
If a video's scene is a black screen for 10 seconds, and then a light shines in the middle, you can store those 10 seconds of video as a single black frame, and even after that, when the light shines, store only the pixels that changed. The black pixels around the light that don't get lit don't take extra storage because you, well, don't store them!
Symbol Coding:
Basically you're trading storage space for processing time. Let's say the binary for a frame is: 1110100001. If you store that, it takes 10 bits. You can look at the patterns, and instead store [3 times 1] 01 [4 times 0] 1. Because you have an external instruction manual that gives you the most frequent patterns, and the math to take care of them, you end up cutting up to half the size of the file. However, now you're tasking your video processor more heavily because it has to do extra math to figure out the binary composition of each frame. That's the trade-off: less storage but higher decoding time.
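The black-screen example from the redundancy step can be sketched in a few lines. This is a simplified illustration, assuming frames are small grids of brightness values; real codecs use motion-compensated prediction and residuals rather than a literal dictionary of changed pixels, and the helper names here are made up.

```python
def frame_delta(prev, curr):
    """Return {(row, col): new_value} for only the pixels that changed."""
    return {
        (r, c): value
        for r, row in enumerate(curr)
        for c, value in enumerate(row)
        if prev[r][c] != value
    }


def apply_delta(prev, delta):
    """Rebuild the current frame from the previous frame plus the changes."""
    frame = [list(row) for row in prev]
    for (r, c), value in delta.items():
        frame[r][c] = value
    return frame


# A 5x5 black frame, then a light turns on in the middle.
black = [[0] * 5 for _ in range(5)]
lit = [row[:] for row in black]
lit[2][2] = 255

delta = frame_delta(black, lit)
print(delta)                      # only the one changed pixel is stored
print(frame_delta(black, black))  # identical frames cost nothing extra
```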

Dudewitbow
Adjacent to the bit compression, it's why some GPU tech may get even bigger in the future. Increasing resolution, HDR and stuff are all very data heavy. It's why companies like Nvidia are investing in experimental tech for streaming, with RTX video enhancement to upscale videos a year back. Recently, Nvidia has been toying with their own implementation of Auto HDR. Removing those 2 functions from the source end, and generating the higher resolution and color range virtually on the display side, can save a LOT of data, on top of the upcoming (likely going to be standard eventually) AV1 encoding. The combination of the 3 is likely going to be the headlining feature set for a new type of Nvidia Shield product, to make the ultimate box for streaming video content and, adjacently, the most immersive version of cloud gaming for those who use gamepass/geforce now or equivalent services.

maxwell_edison849
Who cares about HDR? I have had multiple HDR-supported monitors over the years but literally never knew how to turn it on. Turning on HDR mode in Windows always just made colors bad, and most people say to NOT do it that way. But guess what: I'm convinced 99% of consumers with HDR devices (including myself) do not know if they have HDR enabled on any specific piece of content at any point in time, because there is seemingly no TURN ON HDR BUTTON on any form of content ever, except in Windows and on the monitor itself, both of which often don't let you enable it or don't do what you think they do.
Maybe if we admitted that 99% of people don't have or know how to watch in HDR we'd not have any issue with file sizes ever.

magicmanchloe
16:33 The frequency quantization is really complicated; the best way I’ve heard it explained is as a map of comparison. Basically you’re comparing each pixel value to the pixels next to it to create a map of where the biggest differences in contrast are in the photo. It’s basically a mathematical way of figuring out what the most important data in the image is. Then you can discard the information below a certain threshold, and when you do the math backwards, converting it back to an image, it looks like almost nothing has happened.
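A rough sketch of "discard what's below a threshold, then do the math backwards" is shown here on a single row of 8 pixels. It assumes the transform is a discrete cosine transform (DCT), which JPEG and many video codecs use; the pixel values and the threshold of 4.0 are arbitrary, and real encoders divide by a quantization table rather than applying one hard cutoff.

```python
import math


def dct(samples):
    """Orthonormal DCT-II: turn pixel values into frequency coefficients."""
    n = len(samples)
    return [
        math.sqrt((1 if k == 0 else 2) / n)
        * sum(samples[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        for k in range(n)
    ]


def idct(coeffs):
    """Inverse of dct(): turn frequency coefficients back into pixel values."""
    n = len(coeffs)
    return [
        sum(
            math.sqrt((1 if k == 0 else 2) / n)
            * coeffs[k]
            * math.cos(math.pi * (i + 0.5) * k / n)
            for k in range(n)
        )
        for i in range(n)
    ]


def drop_small(coeffs, threshold):
    """Zero out coefficients whose magnitude is below the threshold."""
    return [c if abs(c) >= threshold else 0.0 for c in coeffs]


pixels = [52, 55, 61, 66, 70, 61, 64, 73]  # one row of brightness values
threshold = 4.0
coeffs = dct(pixels)
kept = drop_small(coeffs, threshold)
restored = idct(kept)

print(sum(1 for c in kept if c != 0.0), "of", len(kept), "coefficients kept")
print([round(p) for p in restored])
```

The zeroed coefficients are what later compress to almost nothing, while the restored row stays close to the original because the discarded terms were small to begin with.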

user-jp7tw3sd3x
I'd just like to nitpick on a detail. Huffman encoding with fixed pre-defined tables is no longer how modern video codecs work. It was true for MPEG-1/2/4, WMV1/2/3, etc. H.264/AVC uses context-adaptive variable-length coding (CAVLC) or the significantly better context-adaptive binary arithmetic coding (CABAC); H.265/HEVC uses CABAC only, and AV1 uses its own adaptive arithmetic coder.
The adaptive part means that there are no fixed tables, encoding codes are generated and updated depending on the actual usage (probability) of the input values.
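To make the "adaptive" idea concrete, here is a toy model that updates symbol probabilities as it goes and adds up the ideal cost in bits (minus log2 of the probability) that an arithmetic coder could approach. This is not CABAC or any codec's actual model, just a frequency counter under the assumption of a four-symbol alphabet; the function names and sample data are invented for illustration.

```python
import math
from collections import Counter


def adaptive_code_length(symbols, alphabet):
    """Ideal bit cost when probabilities are learned from the symbols seen so far."""
    counts = Counter({s: 1 for s in alphabet})  # start with no preference
    total_bits = 0.0
    for s in symbols:
        p = counts[s] / sum(counts.values())
        total_bits += -math.log2(p)  # frequent symbols become cheap
        counts[s] += 1               # update the model after each symbol
    return total_bits


def fixed_code_length(symbols, alphabet):
    """Bit cost with a fixed table that gives every symbol the same length."""
    return len(symbols) * math.log2(len(alphabet))


data = "aaaaaaabaaaaaaac"  # heavily skewed toward 'a'
adaptive_bits = adaptive_code_length(data, "abcd")
fixed_bits = fixed_code_length(data, "abcd")
print(round(adaptive_bits, 1), "bits adaptive vs", fixed_bits, "bits fixed")
```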

ZeroUm_
The drop of bytes in chroma during the color space transformation is why some text gets fuzzy and hard to read on the desktop when you go from 4k30 or 4k60 to 4k120 on an HDMI 1.4 (2.0, I need to check) or DisplayPort 1.2 connection.
The bandwidth isn't enough to keep up with what you're trying to push, so the color space drops from YUV444(RGB) to YUV422 or lower, decreasing the color resolution and thus the small details. It's fine for video and gaming, but it sucks when reading static text.
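The arithmetic behind that drop can be checked with a short calculation. It assumes 8 bits per sample and counts only raw pixel data, ignoring blanking intervals and link encoding overhead, so real requirements are somewhat higher; compare the results against the rated bandwidth of whatever link you are using.

```python
# Average bits per pixel at 8 bits per sample:
# 4:4:4 keeps full chroma, 4:2:2 halves it, 4:2:0 quarters it.
BITS_PER_PIXEL = {"4:4:4": 24, "4:2:2": 16, "4:2:0": 12}


def raw_gbps(width, height, fps, subsampling):
    """Raw video data rate in gigabits per second (no blanking, no overhead)."""
    return width * height * fps * BITS_PER_PIXEL[subsampling] / 1e9


for mode in BITS_PER_PIXEL:
    print(mode, round(raw_gbps(3840, 2160, 120, mode), 2), "Gbit/s at 4k120")
```

Going from 4:4:4 to 4:2:0 halves the data rate, which is why a link that cannot carry full-chroma 4k120 may still manage it with subsampled color.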

Robinthefox88
I only have a very high level understanding of video encode/decode, so it always melts my brain just how insanely complex it all is, and just how smart everyone is to not only come up with the theory, but then turn that into actual silicon and software to seamlessly perform these tasks
This video did a great job of getting deeper into the weeds of it all without being overwhelming and still very interesting, but then again, I'm pretty sure Tom reading a phone directory would be just as fascinating

javajoint
One thing that was not mentioned when they talked about Temporal Difference is what happens when the sun goes down. A shifting light source changes the lighting all over the scene (as opposed to a car driving by a static house with ambient lighting). In other words, Temporal is more about motion / positional differences than overall lighting. I understand why they did not deep dive into that, as it is a whole tangent, but it's good to be aware of. Very well done though!

Craigerry
Tom is so darn smart and a great presenter, we don't deserve him
E: since this does turn into a product pitch for ARC, I wonder if there could be a use case in the future (if it's possible) for streamers to use a dual video card setup, one for the game they're playing and one for video encoding and compression in the same PC. I'd have to guess that this is currently possible with a dual PC setup, but it's certainly a curiosity.

Originalimoc
I've recently been baffled by the fact that hard-coded 4:2:0 downsampling looks really bad when lots of edges are present, like text. Why can't we just treat color the same as resolution (because decreasing resolution is basically a dumb compression algorithm)? We should just set a bitrate limit, then let the smart encoder do the chroma subsampling alongside the other compression. DSC for monitors is kind of doing this.

gamersnexus
Suggestion for (d)GPU media file testing: how much data is sent/received through the PCIe bus while encoding/decoding video at certain resolutions/refresh rates/color settings?
Why? This might be relevant for operating dedicated GPUs explicitly meant for transcoding purposes (remember that being one recommendation for early ARC GPUs) via Thunderbolt or USB 4 adapters.

Mantarhochen
Thank you for this interesting video!
I would have wished for a deeper dive into the current codecs and what they are doing better than the previous generations. In fact, I have a hard time imagining what can be done better at any step of the way. Maybe better identifying the pixels that can be dropped at the 4th, frequency-related step?
Nonetheless a good video!

Nobody-vr5nl
I love that GN does this content. It doesn't have to appeal to everyone. Just genuinely nerdy content that few fully understand (including me) is great.
I work in a place with a lot of nerds. Some love to talk about stuff they probably shouldn't, and I love listening to them, even if I don't really get it. They are so passionate about what they do and it's great.

TheNerd
I would say the simplest way of describing Huffman is: if you have a Word document that has the letters AAAA in it and you want to compress it,
you can compress it as 4A (like 4 times A). For example, AAAA BBB ZZZZZ CCCC could instead be: 4A 3B 5Z 4C
Of course the example is extremely oversimplified and almost an insult, but you get the idea of what it does.
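The counting trick in this example is short enough to write out. Note that replacing repeats with a count is usually called run-length encoding; Huffman coding proper instead gives shorter bit codes to symbols that occur more often. The function names below are made up for this sketch, and the decoder assumes every encoded symbol is a single character.

```python
from itertools import groupby


def rle_encode(text):
    """Collapse each run of repeated characters into '<count><char>'."""
    return " ".join(f"{len(list(group))}{char}" for char, group in groupby(text))


def rle_decode(encoded):
    """Expand '<count><char>' tokens back into the original text."""
    return "".join(token[-1] * int(token[:-1]) for token in encoded.split())


packed = rle_encode("AAAABBBZZZZZCCCC")
print(packed)              # 4A 3B 5Z 4C
print(rle_decode(packed))  # AAAABBBZZZZZCCCC
```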

cromefire_
And the great thing is that with Intel's VAAPI, HW-accelerated video on Linux is pretty simple these days, well, except on NVIDIA of course, but on Intel and AMD platforms it's great. Intel and AMD should really work together more often on common APIs, as Intel is just plain better at designing those APIs, while AMD is perfectly capable of using these better APIs.

magicmanchloe
Edit: someone else explained it better
11:08 slight correction: 4:4:4, 4:2:2 and 4:2:0 actually refer to the data in the grid. So in that grid of pixels, 4:2:2 represents 4 vertical and horizontal luminance samples, 2 horizontal chroma samples and 2 vertical chroma samples. 4:2:0 is the same but has no vertical chroma samples.

TheGamerUnknown
Question: How does video streaming work when there are temporal components? You can't send it frame-by-frame because of the temporal parts, but sending the entire video may be impractical for high-resolution long videos. Is the video 'chunked' into groups of frames that have all the temporal information they need?

DEMENTO01
Another interesting thing in codecs is lossless codecs: you can still save quite a lot of bandwidth without losing any information whatsoever, like with FFV1 and others. Also, YUV chroma subsampling (aka 4:2:0 and such) has been a thing since before digital video existed; afaik it has its roots in color TV signals.