NVENC, NVDEC, VRAM, bitrate, bit depth, compression standards and performance
Hello slothtechtv! It seems it is you and me here currently.
I have been meaning to switch my library to h.265 and have been looking around to find information about the performance of each card and chip. I would love to test all of my theories but sadly I don't have any h.265 hardware yet. This could all be very interesting for your YT channel as well as interesting to the community. This mostly applies to Plex servers that have lots of users. This post will get very technical.
I started my research on reddit and came across elpamsoft's matrix: https://elpamsoft.com/downloads/nVidia%20NVENC%20NVDEC%20Matrix.pdf
Elpamsoft also maintains a newer page here: https://www.elpamsoft.com/?p=Plex-Hardware-Transcoding but it lacks the h.265/h.264 FPS differentiation.
As it seems, not all cards are equal, and having e.g. a Pascal NVENC/NVDEC chip on your GPU does not mean that you will get the same performance on a lower/higher tier GPU with the same chip. Speccing a GPU for HW transcoding for your Plex server is not straightforward at all!
Elpamsoft is going by NVidia's numbers from their "benchmarks", but those bitrates are not indicative of real-world Plex usage.
Here is for NVDEC (page 6): https://developer.download.nvidia.com/designworks/video-codec-sdk/secure/7.1/01/NVDEC_DA-06209-001_v08.pdf?-0ei-lpY62Nz3Sqptll9QW-_Nnm7TzLRVyKDzTdyldvH7wI8ETPFsjnCwzuuDHlxk5-jCtIgacO-uCGlLuicHK3gvnC2H9vHASDGG3d5knjC11JCOqXZuUUffRvbMrvXdR-JvbI2seJn5wupRwgyv2gkojrzqx7_klrEBIxV-l06MmM
Here is for NVENC (page 9): https://developer.download.nvidia.com/designworks/video-codec-sdk/secure/7.1/01/NVENC_DA-06209-001_v08.pdf?NI--jMcWJvHfzG0IWcp0L0HPYWU1TFm7V46oKH8mUMH9gbEM-ZHadx71FFqFDbmmjNJWbJByULGin5HQU7-E2xUnaNCUw1hlid0KQk47UQwJs4UAcvfHFVQ2C9WMVnpyA88JfN0HzzuUTaubsyI9RwEPOHTwtDVJQIjVX4N-beYH8hg
Elpamsoft's research is centered around VRAM. Cards with the same chip but different amounts of VRAM exhibit different characteristics.
Because elpamsoft's documentation is not very well written, I will explain it myself:
Every NVENC/NVDEC chip is able to encode/decode a certain amount of FPS of a video. That amount varies with the video compression standard (h.264, h.265 or others), the bitrate, the bit depth and the chroma subsampling (4:2:0 or 4:4:4).
Every GPU has a certain amount of VRAM (and VRAM bandwidth, for that matter). A Quadro P2000 has the same NVDEC/NVENC chips as a GTX 1060. The Quadro has 5GB of VRAM while the 1060 has 6GB (obviously talking about the 6GB version).

Elpamsoft has a handy "Streams for VRAM" section in its original pdf (link 1 above) that tells us how many h.264 (I presume) 1080p 15Mbps to 720p 4Mbps streams can fit in each card's VRAM. Elpamsoft says that going over that amount of streams "will cause new transcodes to buffer indefinitely, even once VRAM usage has dropped below the maximum. The Plex client will need to stop the play request and request it again once VRAM usage has dropped". This is obviously unwanted for a Plex server, and it is why it suggests 17 streams for the P2000 and 20 for the 1060.

Moreover, if we have a card with a chip that can decode/encode more streams than the amount that fits in VRAM, it will cause the streams to buffer, but that is better. Again, elpamsoft is quoting NVidia's numbers, but NVidia used a 20Mbps h.265 1080p file, which is a high bitrate and not common among real-world Plex users. That is why we need to test for ourselves.
Since we put GPUs in our Plex servers to transcode h.265 to h.264, our main focus is 1080p h.265 decoding (8 or 10 bit) and 1080p h.264 encoding performance. You could also throw in 720p h.264 encoding for when we are streaming to devices that are unable to play h.265 or are at locations with low internet bandwidth.
I will make some assumptions for my use case (which I think is quite common) and explain what I would test:
I will be decoding h.265 1080p, either 8 or 10 bit, at 10mbps, 5mbps or 3.5mbps. That is six tests. Elpamsoft is using NVidia's 1080p 20mbps 8 bit numbers, which is a quite high bitrate for h.265. I know you haven't enabled decoding on your Linux computer, but you should. I suppose you know of the patch: https://github.com/revr3nd/plex-nvdec
Then I will be encoding to h.264 1080p (not sure what bitrate Plex defaults to) and/or 720p 4mbps. Combine these two encode targets with the six decode tests above and you get twelve tests in total, more if you encode to different 1080p bitrates.
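To make the test matrix concrete, here is a quick sketch enumerating the combinations described above (the config labels are just placeholders I made up for illustration):

```python
from itertools import product

# Decode side: h.265 1080p at two bit depths and three bitrates = 6 tests.
decode_configs = [
    (depth, bitrate)
    for depth in ("8bit", "10bit")
    for bitrate in ("10mbps", "5mbps", "3.5mbps")
]

# Encode side: the two h.264 targets mentioned above.
encode_configs = ["1080p_h264", "720p_4mbps_h264"]

# Every decode config paired with every encode target.
test_matrix = list(product(decode_configs, encode_configs))

print(len(decode_configs))  # 6 decode tests
print(len(test_matrix))     # 12 transcode combinations
```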
Basically we need to find out:
- How many FPS can, in this case, the P2000 decode when fed 1) h.265 1080p 8 bit at 10mbps, 5mbps and 3.5mbps and 2) h.265 1080p 10 bit at the same bitrates. If you want to test more bitrates, feel free to do so.
- How many FPS it can encode to 1080p (whatever mbps Plex transcodes to, maybe 8?) and 720p 4mbps. We could also test different profiles like High performance, High quality etc.
- How much VRAM each of the combinations above consumes and what is the number of streams that either the VRAM, the NVENC chip or the NVDEC chip constrains us to in each of the combinations above.
The only way I see to calculate those FPS numbers is through the use of ffmpeg (and probably the version Plex is using). If you have any other ideas, please do share.
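For what it's worth, here is a rough sketch of how such an ffmpeg run could be assembled. `hevc_cuvid` (NVDEC decode) and `h264_nvenc` (NVENC encode) are real ffmpeg components, but whether they are available depends on how your ffmpeg build was configured; the file name, bitrates and scale filter here are placeholders, not tested values:

```python
# Build an ffmpeg benchmark command: decode h.265 on NVDEC, encode h.264 on
# NVENC, discard the output, and let -benchmark report timing/FPS info.
def build_benchmark_cmd(src, target_bitrate="8M", scale=None):
    cmd = ["ffmpeg", "-y", "-benchmark",
           "-c:v", "hevc_cuvid",      # NVDEC h.265 decoder
           "-i", src,
           "-c:v", "h264_nvenc",      # NVENC h.264 encoder
           "-b:v", target_bitrate]
    if scale:                          # e.g. "1280:720" for the 720p test
        cmd += ["-vf", f"scale={scale}"]
    cmd += ["-f", "null", "-"]         # discard output; we only want speed
    return cmd

print(" ".join(build_benchmark_cmd("sample_1080p_10bit_10mbps.mkv")))
```

You would then run the command (e.g. via subprocess) and read the FPS figure ffmpeg prints while transcoding as fast as it can.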
Total performance is dictated by the lowest FPS number. For example, if you can decode 1080p h.265 10 bit 10mbps at 200 FPS but encode 1080p h.264 8mbps at 400 FPS, 200 FPS is your limit on how fast you can transcode. That also means that cards that have multiple NVDEC or NVENC chips are still limited by their lowest FPS number! That is why, again in elpamsoft's matrix, the GTX 1070 has 22 max recommended streams: it can encode h.264 at 1262 FPS but it can only decode h.264 at 658 FPS. See "Multiple NVENC Chips" again in the first link.
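The bottleneck logic above boils down to a simple min(); using the example numbers from this thread (the ~24 FPS playback rate is my assumption for a typical file):

```python
# The transcode pipeline runs at the speed of its slowest stage.
decode_fps = 200   # h.265 1080p 10-bit 10mbps decode (example from above)
encode_fps = 400   # h.264 1080p 8mbps encode (example from above)

pipeline_fps = min(decode_fps, encode_fps)

# At ~24 FPS playback, that throughput caps the real-time stream count.
max_realtime_streams = pipeline_fps // 24

print(pipeline_fps)           # 200
print(max_realtime_streams)   # 8
```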
I know all this information is very baffling at first, but it is not that hard to get your head around. I wish I could do the tests myself, but I am still saving up for my P2000. If you want to go through with this and/or have any questions or just need different ideas, feel free to comment/message.
P.S. Please switch the site to HTTPS. I don't feel comfortable typing passwords on HTTP sites.
Quoting my earlier post: "Moreover, if we have a card with a chip that can decode/encode more streams than the amount that fits in VRAM, it will cause the streams to buffer, but that is better."
I got this wrong. If we have a card that can NVENC/NVDEC fewer streams than those that fit in VRAM, streams will buffer, but clients will not need to stop and restart their playback.
First, let me apologize for how long it took me to respond to this. You spent a lot of time researching this topic thoroughly and I wanted to write a meaningful response, so it took me a bit of time to fully wrap my head around everything you had written. Thank you for taking the time to post here.
I looked into migrating my library to h265, but after quickly going over the power costs (along with the time) it seemed like quite an endeavor to take on, so I went the direction of a migration through attrition. It's been slow going.
Yeah, that elpamsoft article/page is only general data that points you towards what's better or worse; it's not actually THAT accurate when it comes down to it. I've found that memory is helpful, but they seem to weigh it a bit unevenly, and by their measurement VRAM == LOTS OF TRANSCODES. That is not really the case, as most 1080p transcodes usually only require 100-300 megabytes of VRAM. So an 8GB card could get you between 27 and 81 streams, which is quite a range depending on bitrate etc., as you mentioned above.
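A quick back-of-the-envelope check on that 27-81 range, using the per-stream figures above (observed averages, not a spec):

```python
# An 8GB card, with each 1080p transcode taking somewhere between
# 100MB (light) and 300MB (heavy, high-bitrate) of VRAM.
vram_mb = 8 * 1024

low = vram_mb // 300    # worst case: every transcode is heavy
high = vram_mb // 100   # best case: every transcode is light

print(low, high)  # 27 81
```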
The questions you pose are totally valid and would be difficult to answer, but I'll work on them, as I think it's worth getting to the bottom of what Plex is doing and which actual ffmpeg libraries it's using (I'm pretty sure I read a post on their forum where a dev said it was 'complicated' and that they had some custom ffmpeg libraries built for Plex, which is why they haven't updated them in a while).
I'll get HTTPS going, sorry for my laziness, LOL. I feel your concern and pain.
I want to add one piece to my reply, which is this: I haven't enabled decoding because you can actually get better performance with less VRAM by only doing hardware encoding on the GPU (IN SOME CASES).
Let me explain:
I have 2 2690v2 CPUs; the DECODE portion is easy for the CPUs to do, but the ENCODE part is the harder part. If I JUST do encode on the GPU, I end up using about 100 megabytes of VRAM per stream on average. This means that with 8GB of VRAM I could theoretically do 80 transcodes, taking only total VRAM into account. That does not ACTUALLY translate to 80 transcodes in the real world; NVENC and the CPU are my bottlenecks, and I see about 45-50 h.265 1080p 6mbps to h.264 @ 10mbps transcodes.
I hope this was at least partially helpful to you. I'll try and get some of those tests you requested done and put a video up, as I think it would contribute to the community as a whole. Thank you again for contributing what you have here -- I feel honored.
Hey again! Thanks for getting back to me! No worries about late replies etc!
I did not actually know your Plex server specs until your recent video. Not enabling NVDEC totally makes sense with such powerful CPUs. I was wondering what the performance would be when offloading decoding to the CPU since, again, elpamsoft's papers suggest that most VRAM is used by the decoding process rather than the encoding (comparing a 1080p 15mbps source transcoded to 1080p versus 720p, there is only a 20MB difference, which hinted to me that decoding is the most VRAM-hungry part).
No worries on the HTTPS as well, I was just a bit bummed seeing that and maybe exaggerated a bit.
Your potential findings and research on Plex and ffmpeg libraries is very interesting! Please do update if you have any newer information with an article or video.
Thanks again for your awesome and unique content and for getting back to me!
Quoting your reply above: "I have 2 2690v2 cpus, the DECODE portion is easy for the CPUs to do but the ENCODE part is the harder part. If i JUST do encode on the GPU i end up using 100Megabytes of VRAM on average per stream."
I have a question about your current setup. Are you doing DECODE in Ubuntu? I know that out of the box it isn't supported; I've been following this thread here: https://forums.plex.tv/t/hardware-accelerated-decode-nvidia-for-linux/233510/355 . Watching your video, the "it may change soon" comment makes it sound like you have been following it too. I didn't know if you currently use the hack or are just doing the decode in software?