[Question]: bgr0 input, yuv444 10 bit output? #488

Open
markg85 opened this issue Jul 4, 2024 · 19 comments

@markg85

markg85 commented Jul 4, 2024

Hi,

I went through a lot of documentation and code and could find this little clue.

Do note that this is for desktop streaming. I want to preserve as much quality as possible, so i'd prefer not to convert from bgr0 to anything else at all. If i look at the color spaces that HEVC as a format should support, then RGB is definitely one of them.

That makes me assume the AMF codec can handle bgr0 as format.
The question is: How do i output a format that both AMF and HEVC support that preserves the most quality?

I have ffmpeg with the 10 bit for HEVC with AMF working. If there's a hint for what to add to support this then i can probably hack it in. I just need the hint to know what to do :)

Best regards,
Mark

@MikhailAMD
Collaborator

I understand the need and here are a few points:

  • The best way to preserve small details for the VDI (desktop streaming) use case is 4:4:4 encoding/decoding. Currently it is not supported by AMD HW.
  • HEVC or AV1 10-bit encoding allows preservation of more color information. This is needed for the HDR use case.
  • Direct submission of RGBA surfaces into the AMF encoder doesn't change the fact that they will be color-converted to YUV 4:2:0. The main benefit here is performance. The conversion happens inside the VCN encoder and is "free", meaning it doesn't add latency, doesn't use GFX or Compute GPU engines and doesn't add memory bandwidth. From a quality perspective RGBA input doesn't change anything.
  • Another thing affecting encoded image quality is the rate control mode and the various rate control parameters. For VDI, it usually makes sense to use variable bitrate and to increase peak bitrate and VBV.
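
To put numbers on what 4:4:4, 4:2:2 and 4:2:0 mean for a 2560x1440 frame, here is a minimal Python sketch. The table and function names are made up for this illustration and are not part of AMF or ffmpeg.

```python
# Chroma subsampling expressed as (horizontal divisor, vertical divisor)
# applied to each of the two chroma planes (U and V).
SUBSAMPLING = {
    "4:4:4": (1, 1),
    "4:2:2": (2, 1),
    "4:2:0": (2, 2),
}


def frame_samples(width, height, scheme):
    """Return (luma_samples, chroma_samples) for one frame."""
    h_div, v_div = SUBSAMPLING[scheme]
    luma = width * height
    # Two chroma planes, each reduced by the scheme's divisors.
    chroma = 2 * (width // h_div) * (height // v_div)
    return luma, chroma


if __name__ == "__main__":
    for scheme in SUBSAMPLING:
        luma, chroma = frame_samples(2560, 1440, scheme)
        print(f"{scheme}: luma={luma} chroma={chroma} "
              f"chroma/luma={chroma / luma:.2f}")
```

With 4:2:0 each chroma plane carries a quarter of the samples of the luma plane, which is why thin colored edges such as text outlines are the first thing to suffer.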

@markg85
Author

markg85 commented Jul 4, 2024

Hi @MikhailAMD,

Thank you for your helpful and informative answers, as always! That's much appreciated :)

Your suggestions are nice and make sense but it's only to get the most quality out of a limited color format. For desktop streaming that's just gonna give you a washed out desktop.

Considering that the AMF API does know how to convert many color formats to 4:2:0, it isn't a far-fetched thought to assume the encoder can handle those too, can it not? Else the VCN hardware literally is a pipeline of many formats in -> 1 format out. Possible, but it sounds like a waste of resources to not add in the extra bits to at least do a 1:1 format mapping (no conversion); to me that seems technically easier too.

I suppose i'm asking "in subtle ways" if AMF, with the current hardware of say VCN 4.0, could technically support higher quality color spaces?

@MikhailAMD
Collaborator

MikhailAMD commented Jul 4, 2024

You need to clarify the term "quality". It can mean multiple things:

  • Small details
  • Color range
  • Compression artefacts

You also need to clarify "support higher quality color spaces". One thing is input format support. Another thing is encoder output support.
The resulting quality is defined by the properties of the H264, HEVC or AV1 specs supported by current HW, which are the following:

  • 4:2:0 only
  • 8 or 10bit
  • Full or studio color range. (forgot to mention earlier)
  • Bitrate and all related

@markg85
Author

markg85 commented Jul 4, 2024

Ha :)
Okay, "quality" in this context means having the capacity for a high bitrate (say up to a gigabit if needed, but realistically it should be doable in 20mbit max). The result essentially needs to be "pixel perfect", but with video compression to make it feasible to stream a desktop over a local network. Ideally capped at 20mbit so that it's not exclusive to lan but can be used over the internet too. That does add in a whole range of different issues but let's not go there in this post.
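
For a sense of scale, the 20 mbit/sec cap can be turned into a per-pixel budget. This is a back-of-the-envelope Python sketch; the helper names are made up here, and the 12 bits per pixel figure assumes uncompressed 8-bit 4:2:0 (8 bits of luma plus two quarter-resolution 8-bit chroma planes).

```python
def bits_per_pixel(bitrate_bps, width, height, fps):
    """Average number of encoded bits available per pixel per frame."""
    return bitrate_bps / (width * height * fps)


def compression_ratio(bitrate_bps, width, height, fps, raw_bits_per_pixel=12):
    """Ratio between the raw stream size and the target bitrate."""
    raw_bps = width * height * fps * raw_bits_per_pixel
    return raw_bps / bitrate_bps


if __name__ == "__main__":
    bpp = bits_per_pixel(20_000_000, 2560, 1440, 60)
    ratio = compression_ratio(20_000_000, 2560, 1440, 60)
    print(f"{bpp:.4f} bits per pixel, roughly {ratio:.0f}:1 compression")
```

At 2560x1440 and 60 fps that works out to about 0.09 bits per pixel, so "pixel perfect" at 20 mbit/sec leans heavily on most of the desktop not changing between frames.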

It doesn't have to output 10 bit as long as it does the encoding based on 10 bit. This is solely to reduce banding hell.
Everything thus far is possible with the current AMF HEVC codec.

In terms of color this is still poor quality as the desktop stream would look washed out. So the YUV compression (it's color/luminosity compression) of 4:2:0 is just too little color data (chrominance signals, the U and V) to have a good looking desktop stream. 4:2:2 would already look much better and perhaps even be sufficient. But ideally 4:4:4 is what's needed here.

@MikhailAMD
Collaborator

IMHO, 4:2:0 is not related to washed-out color but only to reduced clarity of the text on symbol borders.
For washed-out color you may check AMF_VIDEO_CONVERTER_COLOR_PROFILE_FULL_709 if you are using the AMF color converter. If you directly submit RGBA to the AMF HEVC encoder, you may set AMF_VIDEO_ENCODER_HEVC_NOMINAL_RANGE_FULL, or a similar property on your converter. Of course, the color converter on the client side should also use the corresponding "full" parameter.
And 8-bit HEVC should be more than enough to show the full gamut of the desktop; no need for 10-bit HEVC.
You may also make a small experiment: record desktop to a file with AMF DVR sample and then replay it on the same machine to check if you have significant color wash-out. If you see the difference, send a screenshot with desktop and player in view.
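
Why a range mismatch looks washed out can be shown with the standard 8-bit luma mapping between full range (0..255) and studio range (16..235). This is only an illustrative Python sketch of that arithmetic, not AMF code, and the function names are made up for the example.

```python
def full_to_limited_luma(v):
    """Map an 8-bit full-range luma value (0..255) to studio range (16..235)."""
    return round(16 + v * 219 / 255)


def limited_to_full_luma(v):
    """Inverse mapping, clamped to the 0..255 range."""
    return max(0, min(255, round((v - 16) * 255 / 219)))


if __name__ == "__main__":
    for v in (0, 128, 255):
        encoded = full_to_limited_luma(v)
        print(f"source {v:3d} -> studio {encoded:3d} -> "
              f"decoded with expansion {limited_to_full_luma(encoded):3d}, "
              f"shown without expansion {encoded:3d}")
```

If one side writes studio range and the other side displays it as if it were full range, black ends up at 16 and white at 235, which reads as a grey veil. That is why both the encoder and the client have to agree on the "full" setting.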

@markg85
Author

markg85 commented Jul 4, 2024

Thank you so much for your hints @MikhailAMD!

I'm passing in bgr0 and changed the out color profile to AMF_VIDEO_CONVERTER_COLOR_PROFILE_FULL_709. This gives me the following result (forget the small mis-alignment):
image
Top row: AMF
Bottom row: local native
This is near perfect, good enough :)

Another snapshot:
image
It's obvious which side is AMF.

Now gradients. Color banding hell...
image

You can easily see banding in the bottom part, that's AMF.
After changing AMF to force 10 bit color input (and changing the input format to p010le) i get this as result:
image

Looks the same as 8 bit? Yup, that's right. No change indeed.

Below is a full settings output of 8 bit and bgr0, i found these settings to be working quite well.

0124-06-05 01:40:00 6CC006C0 [AMFEncoderCoreHevc]   Debug: SetProperty HevcUsage:5
0124-06-05 01:40:00 6CC006C0 [AMFEncoderCoreHevc]   Debug: SetProperty HevcFrameSize:2560,1440
0124-06-05 01:40:00 6CC006C0 [AMFEncoderCoreHevc]   Debug: SetProperty TL0.QL0.HevcFrameRate:60,1
0124-06-05 01:40:00 6CC006C0 [AMFEncoderCoreHevc]   Debug: SetProperty HevcProfile:2
0124-06-05 01:40:00 6CC006C0 [AMFEncoderCoreHevc]   Debug: SetProperty HevcTier:0
0124-06-05 01:40:00 6CC006C0 [AMFEncoderCoreHevc]   Debug: SetProperty HevcQualityPreset:0
0124-06-05 01:40:00 6CC006C0 [AMFEncoderCoreHevc]   Debug: SetProperty HevcOutColorProfile:7
0124-06-05 01:40:00 6CC006C0 [AMFEncoderCoreHevc]   Debug: SetProperty HevcNominalRange:1
0124-06-05 01:40:00 6CC006C0 [AMFEncoderCoreHevc]   Debug: SetProperty HevcColorBitDepth:8
0124-06-05 01:40:00 6CC006C0 [AMFEncoderCoreHevc]   Debug: SetProperty HevcOutColorTransferChar:2
0124-06-05 01:40:00 6CC006C0 [AMFEncoderCoreHevc]   Debug: SetProperty HevcOutColorPrimaries:2
0124-06-05 01:40:00 6CC006C0 [AMFEncoderCoreHevc]   Debug: SetProperty HevcGOPSPerIDR:1
0124-06-05 01:40:00 6CC006C0 [AMFEncoderCoreHevc]   Debug: SetProperty HevcGOPSize:250
0124-06-05 01:40:00 6CC006C0 [AMFEncoderCoreHevc]   Debug: SetProperty HevcDeBlockingFilter:false
0124-06-05 01:40:00 6CC006C0 [AMFEncoderCoreHevc]   Debug: SetProperty HevcHeaderInsertionMode:0
0124-06-05 01:40:00 6CC006C0 [AMFEncoderCoreHevc]   Debug: SetProperty HevcRateControlPreAnalysisEnable:0
0124-06-05 01:40:00 6CC006C0 [AMFEncoderCoreHevc]   Debug: SetProperty HevcRateControlMethod:0
0124-06-05 01:40:00 6CC006C0 [AMFEncoderCoreHevc]   Debug: SetProperty HevcEnableVBAQ:false
0124-06-05 01:40:00 6CC006C0 [AMFEncoderCoreHevc]   Debug: SetProperty HevcHalfPixel:true
0124-06-05 01:40:00 6CC006C0 [AMFEncoderCoreHevc]   Debug: SetProperty HevcQuarterPixel:true
0124-06-05 01:40:00 6CC006C0 [AMFEncoderCoreHevc]   Debug: SetProperty TL0.QL0.HevcEnforceHRD:false
0124-06-05 01:40:00 6CC006C0 [AMFEncoderCoreHevc]   Debug: SetProperty TL0.QL0.HevcFillerDataEnable:false
0124-06-05 01:40:00 6CC006C0 [AMFEncoderCoreHevc]   Debug: SetProperty TL0.QL0.HevcTargetBitrate:15000000
0124-06-05 01:40:00 6CC006C0 [AMFEncoderCoreHevc]   Debug: SetProperty TL0.QL0.HevcPeakBitrate:15000000
0124-06-05 01:40:00 6CC006C0 [AMFEncoderCoreHevc]   Debug: SetProperty TL0.QL0.HevcVBVBufferSize:15000000
0124-06-05 01:40:00 6CC006C0 [AMFEncoderCoreHevc]   Debug: AMFEncoderCoreHevcImpl::Init(BGRA, 2560, 1440)

If we can only fix that banding, that'd be super! Yes, i'm using main10 so that's not it...
I tried forcing it to 10 bit... But if bgr0 is the input then the video turns completely green!
And with p010le i can have 10 bit but it doesn't help one bit (hehe pun) in mitigating color banding.

@MikhailAMD
Collaborator

OK, it is more clear now. The gradient bands you have are most likely due to compression, not color conversion. I would recommend several changes in your parameters:

  • Change HevcRateControlMethod:0 (constant QP) to 2 (VBR with PEAK).
  • Change HevcPeakBitrate:15000000 to 25000000 (at least 30% above HevcTargetBitrate).
  • HevcGOPSize:250 - every 250 frames you will see an IDR frame. They are more compressed than P-frames, so you get a periodic loss of quality. Some applications set GOP to 0 - no IDRs - and if a frame is lost, explicitly request an IDR by setting a property on the input surface.
  • As an experiment set HevcTargetBitrate to 50000000 and HevcPeakBitrate to 70000000 - check if you see any difference.
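
For anyone driving this through ffmpeg, the bitrate suggestions above could be assembled roughly as follows. This is a Python sketch under assumptions: it assumes ffmpeg's hevc_amf exposes rate control method 2 as `-rc vbr_peak` and that `-b:v`, `-maxrate` and `-bufsize` map to HevcTargetBitrate, HevcPeakBitrate and HevcVBVBufferSize. The helper name is made up.

```python
def vbr_peak_args(target_bps, headroom_percent=30, bufsize_bits=None):
    """Build ffmpeg arguments for peak-constrained VBR with hevc_amf.

    headroom_percent is how far the peak may exceed the target; the
    30% default follows the rule of thumb given above.
    """
    if target_bps <= 0 or headroom_percent < 0:
        raise ValueError("bitrate must be positive and headroom non-negative")
    # Integer arithmetic avoids floating point rounding surprises.
    peak_bps = target_bps * (100 + headroom_percent) // 100
    args = [
        "-c:v", "hevc_amf",
        "-rc", "vbr_peak",          # assumed ffmpeg name for method 2
        "-b:v", str(target_bps),    # target bitrate
        "-maxrate", str(peak_bps),  # peak bitrate
    ]
    if bufsize_bits is not None:
        args += ["-bufsize", str(bufsize_bits)]  # VBV buffer size
    return args


if __name__ == "__main__":
    print(" ".join(vbr_peak_args(15_000_000)))
    # The 50 Mbit target with 70 Mbit peak experiment is 40% headroom.
    print(" ".join(vbr_peak_args(50_000_000, headroom_percent=40)))
```

The GOP suggestion is left out on purpose, since how a GOP size of 0 is passed through ffmpeg to AMF is not something this sketch can confirm.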

@markg85
Author

markg85 commented Jul 6, 2024

Hi @MikhailAMD

Thank you for these suggestions!
Changing to VBR does indeed seem to give slightly better results. And certainly gives more control in bandwidth as the bitrate is now actually used...

I tried vbr_latency too but i didn't notice anything different. Do you have any numbers on what the latency difference between just vbr and vbr_latency should be?

I also tried hqvbr but that failed:

[hevc_amf @ 0x625d4dc160c0] encoder->Init() failed with error 4
[vost#0:0/hevc_amf @ 0x625d4dc15dc0] Error while opening encoder - maybe incorrect parameters such as bit_rate, rate, width or height.
[vf#0:0 @ 0x625d4dc16680] Error sending frames to consumers: Internal bug, should not have happened
[vf#0:0 @ 0x625d4dc16680] Task finished with error code: -558323010 (Internal bug, should not have happened)
[vf#0:0 @ 0x625d4dc16680] Terminating thread with return code -558323010 (Internal bug, should not have happened)
[vost#0:0/hevc_amf @ 0x625d4dc15dc0] Could not open encoder before EOF
[vost#0:0/hevc_amf @ 0x625d4dc15dc0] Task finished with error code: -22 (Invalid argument)
[vost#0:0/hevc_amf @ 0x625d4dc15dc0] Terminating thread with return code -22 (Invalid argument)
[out#0/nut @ 0x625d4dc14500] Nothing was written into output file, because at least one of its streams received no packets.

A big issue i still have is YUV 4:2:0.
In general the quality is great (enough) and certainly very usable. Having it be through hardware instead of software also is a great help! But... YUV... Especially red text is sooooo freaking ugly:
image

Forget what i'm doing there (certainly not playing with flv... but i tried out some ffmpeg container formats for latency reasons). Just having YUV 4:2:2 would already be such an immense improvement here!

For completeness sake, here's my full set of settings:

0124-06-06 15:07:18 C36006C0 [AMFEncoderCoreHevc]   Debug: SetProperty HevcUsage:5
0124-06-06 15:07:18 C36006C0 [AMFEncoderCoreHevc]   Debug: SetProperty HevcFrameSize:2560,1440
0124-06-06 15:07:18 C36006C0 [AMFEncoderCoreHevc]   Debug: SetProperty TL0.QL0.HevcFrameRate:60,1
0124-06-06 15:07:18 C36006C0 [AMFEncoderCoreHevc]   Debug: SetProperty HevcProfile:2
0124-06-06 15:07:18 C36006C0 [AMFEncoderCoreHevc]   Debug: SetProperty HevcTier:0
0124-06-06 15:07:18 C36006C0 [AMFEncoderCoreHevc]   Debug: SetProperty HevcQualityPreset:0
0124-06-06 15:07:18 C36006C0 [AMFEncoderCoreHevc]   Debug: SetProperty HevcOutColorProfile:7
0124-06-06 15:07:18 C36006C0 [AMFEncoderCoreHevc]   Debug: SetProperty HevcNominalRange:1
0124-06-06 15:07:18 C36006C0 [AMFEncoderCoreHevc]   Debug: SetProperty HevcColorBitDepth:8
0124-06-06 15:07:18 C36006C0 [AMFEncoderCoreHevc]   Debug: SetProperty HevcOutColorTransferChar:2
0124-06-06 15:07:18 C36006C0 [AMFEncoderCoreHevc]   Debug: SetProperty HevcOutColorPrimaries:2
0124-06-06 15:07:18 C36006C0 [AMFEncoderCoreHevc]   Debug: SetProperty HevcGOPSPerIDR:1
0124-06-06 15:07:18 C36006C0 [AMFEncoderCoreHevc]   Debug: SetProperty HevcGOPSize:60
0124-06-06 15:07:18 C36006C0 [AMFEncoderCoreHevc]   Debug: SetProperty HevcDeBlockingFilter:false
0124-06-06 15:07:18 C36006C0 [AMFEncoderCoreHevc]   Debug: SetProperty HevcHeaderInsertionMode:0
0124-06-06 15:07:18 C36006C0 [AMFEncoderCoreHevc]   Debug: SetProperty HevcRateControlPreAnalysisEnable:0
0124-06-06 15:07:18 C36006C0 [AMFEncoderCoreHevc]   Debug: SetProperty HevcRateControlMethod:1
0124-06-06 15:07:18 C36006C0 [AMFEncoderCoreHevc]   Debug: SetProperty TL0.QL0.HevcVBVBufferSize:8000000
0124-06-06 15:07:18 C36006C0 [AMFEncoderCoreHevc]   Debug: SetProperty HevcInitialVBVBufferFullness:48
0124-06-06 15:07:18 C36006C0 [AMFEncoderCoreHevc]   Debug: SetProperty HevcEnableVBAQ:false
0124-06-06 15:07:18 C36006C0 [AMFEncoderCoreHevc]   Debug: SetProperty HevcHalfPixel:true
0124-06-06 15:07:18 C36006C0 [AMFEncoderCoreHevc]   Debug: SetProperty HevcQuarterPixel:true
0124-06-06 15:07:18 C36006C0 [AMFEncoderCoreHevc]   Debug: SetProperty TL0.QL0.HevcEnforceHRD:false
0124-06-06 15:07:18 C36006C0 [AMFEncoderCoreHevc]   Debug: SetProperty TL0.QL0.HevcFillerDataEnable:false
0124-06-06 15:07:18 C36006C0 [AMFEncoderCoreHevc]   Debug: SetProperty TL0.QL0.HevcTargetBitrate:10000000
0124-06-06 15:07:18 C36006C0 [AMFEncoderCoreHevc]   Debug: SetProperty TL0.QL0.HevcPeakBitrate:12000000
0124-06-06 15:07:18 C36006C0 [AMFEncoderCoreHevc]   Debug: SetProperty TL0.QL0.HevcPeakBitrate:20000000
0124-06-06 15:07:18 C36006C0 [AMFEncoderCoreHevc]   Debug: AMFEncoderCoreHevcImpl::Init(BGRA, 2560, 1440)

A couple observations:

  • I used a GOP of 60 instead of 0. I got weird artifacts at 0 and it looks nice at 60 as well so.. why not ;)
  • Note the multiple SetProperty TL0.QL0.HevcPeakBitrate... yeah, no clue. I'm trying this all through ffmpeg and in my command line i just have one -maxrate 20M so no clue why there's 2 with different values there.
  • I added a -bufsize 8M because of the explanation of bufsize from ffmpeg here. The way i read it is that this property tries to keep the bitrate around that number. Not set on this yet, i still have to play a bit more with it.
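
One way to reason about the -bufsize value is as a time window: the buffer size in bits divided by the rate that drains it. The Python sketch below assumes that simple leaky-bucket reading of the VBV buffer; the function names are made up for illustration.

```python
def vbv_window_seconds(bufsize_bits, rate_bps):
    """How long the VBV buffer lasts when drained at rate_bps."""
    return bufsize_bits / rate_bps


def vbv_window_frames(bufsize_bits, rate_bps, fps):
    """The same window expressed in frames."""
    return vbv_window_seconds(bufsize_bits, rate_bps) * fps


if __name__ == "__main__":
    # Values from the settings above: -bufsize 8M, -maxrate 20M, 60 fps.
    print(vbv_window_seconds(8_000_000, 20_000_000))     # seconds at the peak rate
    print(vbv_window_frames(8_000_000, 20_000_000, 60))  # frames at the peak rate
    print(vbv_window_seconds(8_000_000, 10_000_000))     # seconds at the target rate
```

Under that reading, 8M against a 20M peak is a window of about 0.4 seconds (24 frames at 60 fps). A smaller buffer keeps the bitrate closer to the target at every moment, while a larger one lets the encoder spend more on sudden full-screen changes.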

This is what it would look like with YUV 4:2:2 (tried on libx264)
image

It's still bad but acceptable.
And just for fun, here's YUV 4:4:4 (also on libx264)
image

But as i want to stay within 20 mbit/sec i'm guessing YUV 4:2:2 + HEVC is the maximum achievable. Before one thinks AV1 here, that really doesn't matter much. HEVC and AV1 have similar bandwidth usages. A practical difference here is that i can have a hardware encode on one pc and hardware decode on the other with HEVC. I can't with AV1 (receiving end doesn't have AV1 hardware decoding).

My wishlist for encoding here:

  • YUV 4:2:2! Preferable also 4:4:4
  • 10 bit input encoding from rgb0 as input. I can just see stepping in gradient with 8 bit.
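
How wide a band gets from quantization alone can be estimated for a grey ramp spanning the screen. This Python sketch assumes a single-channel horizontal gradient and ignores compression, which is suggested earlier in the thread as the more likely cause of the banding seen here. The function name is made up.

```python
def band_width_px(gradient_px, bit_depth, full_range=True):
    """Average width in pixels of one step of a ramp covering gradient_px."""
    if full_range:
        levels = 2 ** bit_depth
    else:
        # Studio-range luma: 16..235 at 8 bit, 64..940 at 10 bit.
        levels = 219 * 2 ** (bit_depth - 8) + 1
    return gradient_px / levels


if __name__ == "__main__":
    for depth in (8, 10):
        print(f"{depth}-bit full range: {band_width_px(2560, depth):.1f} px per step")
        print(f"{depth}-bit studio range: "
              f"{band_width_px(2560, depth, full_range=False):.1f} px per step")
```

Across 2560 pixels, an 8-bit full-range ramp has steps about 10 pixels wide and a 10-bit one about 2.5 pixels wide. That gain only exists if the source itself has more than 8 bits per channel, and bgr0 does not.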

Now here's a fun little detail. While i am using main10 all the time and rgb0 all the time, for AMF HEVC that is converted into yuv420p. But if i keep everything the same but just change my input from bgr0 to yuv420p10le the encoder errors with:

[hevc_amf @ 0x64f9b8fda0c0] encoder->Init() failed with error 14
[vost#0:0/hevc_amf @ 0x64f9b8fd9dc0] Error while opening encoder - maybe incorrect parameters such as bit_rate, rate, width or height.
[vf#0:0 @ 0x64f9b8fda680] Error sending frames to consumers: Internal bug, should not have happened
[vf#0:0 @ 0x64f9b8fda680] Task finished with error code: -558323010 (Internal bug, should not have happened)
[vf#0:0 @ 0x64f9b8fda680] Terminating thread with return code -558323010 (Internal bug, should not have happened)
[vost#0:0/hevc_amf @ 0x64f9b8fd9dc0] Could not open encoder before EOF
[vost#0:0/hevc_amf @ 0x64f9b8fd9dc0] Task finished with error code: -22 (Invalid argument)
[vost#0:0/hevc_amf @ 0x64f9b8fd9dc0] Terminating thread with return code -22 (Invalid argument)

But ... if i change my input from yuv420p10le to p010le then it works! It does add a color conversion in software where it would be much better if that was done internally with bgr0 in AMF. It does show that the AMF encoder probably already has everything it needs to do this.
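
For context on why yuv420p10le and p010le are not interchangeable: both are 10-bit 4:2:0, but the memory layout differs. yuv420p10le has three separate planes with the sample in the low 10 bits of each 16-bit word. P010 has a Y plane plus one interleaved UV plane with the sample in the high 10 bits. Below is a minimal Python sketch of that repacking on plain lists; the function name is made up and this is not how ffmpeg or AMF implement it.

```python
def yuv420p10_to_p010(y, u, v):
    """Repack planar 10-bit samples into P010-style 16-bit words.

    y, u, v are lists of integers in 0..1023. Returns (y_plane, uv_plane)
    where every word holds the sample in its upper 10 bits.
    """
    if len(u) != len(v):
        raise ValueError("U and V planes must have the same length")
    if any(not 0 <= s <= 1023 for s in (*y, *u, *v)):
        raise ValueError("samples must be 10-bit values")
    y_plane = [s << 6 for s in y]  # shift into the high bits
    uv_plane = []
    for cb, cr in zip(u, v):
        uv_plane.append(cb << 6)   # U first
        uv_plane.append(cr << 6)   # then V
    return y_plane, uv_plane


if __name__ == "__main__":
    print(yuv420p10_to_p010([1023, 0], [512], [256]))
```

No sample value changes in this step, so in principle the repacking is lossless. Any visible difference would have to come from elsewhere, for example range or matrix handling.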

@markg85
Author

markg85 commented Jul 6, 2024

What is the correct AMF surface format for yuv420p10le (AV_PIX_FMT_YUV420P10LE)? Not AMF_SURFACE_P010?

@markg85
Author

markg85 commented Jul 6, 2024

Hmm, a new observation.
image

I've noticed that with p010le as format, everything looks like there's a soft grey filter over the whole capture.
In the above image, a section of the Steam UI where it's most notable, the top is p010le, the bottom is rgb0.

In both cases the format is passed in as ffmpeg -pix_fmt <format>.

It's tricky to find out if there's something wrong with AMF, with the arguments i give it or with the software format conversion. I tried passing in this exact format to libx264 as well but it doesn't recognize this format and automatically switches to yuv420p10le ... It does look as expected (no grey veil over it). So inconclusive... I'm guessing AMF but that might as well be just a missing or wrong argument somewhere.

To aid debugging, here's the full set of properties passed into AMF with p010le:

0124-06-06 21:22:26 A7E006C0 [AMFEncoderCoreHevc]   Debug: SetProperty HevcUsage:5
0124-06-06 21:22:26 A7E006C0 [AMFEncoderCoreHevc]   Debug: SetProperty HevcFrameSize:2560,1440
0124-06-06 21:22:26 A7E006C0 [AMFEncoderCoreHevc]   Debug: SetProperty TL0.QL0.HevcFrameRate:60,1
0124-06-06 21:22:26 A7E006C0 [AMFEncoderCoreHevc]   Debug: SetProperty HevcProfile:2
0124-06-06 21:22:26 A7E006C0 [AMFEncoderCoreHevc]   Debug: SetProperty HevcTier:0
0124-06-06 21:22:26 A7E006C0 [AMFEncoderCoreHevc]   Debug: SetProperty HevcQualityPreset:0
0124-06-06 21:22:26 A7E006C0 [AMFEncoderCoreHevc]   Debug: SetProperty HevcOutColorProfile:7
0124-06-06 21:22:26 A7E006C0 [AMFEncoderCoreHevc]   Debug: SetProperty HevcNominalRange:1
0124-06-06 21:22:26 A7E006C0 [AMFEncoderCoreHevc]   Debug: SetProperty HevcColorBitDepth:10
0124-06-06 21:22:26 A7E006C0 [AMFEncoderCoreHevc]   Debug: SetProperty HevcOutColorTransferChar:2
0124-06-06 21:22:26 A7E006C0 [AMFEncoderCoreHevc]   Debug: SetProperty HevcOutColorPrimaries:2
0124-06-06 21:22:26 A7E006C0 [AMFEncoderCoreHevc]   Debug: SetProperty HevcGOPSPerIDR:1
0124-06-06 21:22:26 A7E006C0 [AMFEncoderCoreHevc]   Debug: SetProperty HevcGOPSize:60
0124-06-06 21:22:26 A7E006C0 [AMFEncoderCoreHevc]   Debug: SetProperty HevcDeBlockingFilter:false
0124-06-06 21:22:26 A7E006C0 [AMFEncoderCoreHevc]   Debug: SetProperty HevcHeaderInsertionMode:0
0124-06-06 21:22:26 A7E006C0 [AMFEncoderCoreHevc]   Debug: SetProperty HevcRateControlPreAnalysisEnable:0
0124-06-06 21:22:26 A7E006C0 [AMFEncoderCoreHevc]   Debug: SetProperty HevcRateControlMethod:2
0124-06-06 21:22:26 A7E006C0 [AMFEncoderCoreHevc]   Debug: SetProperty TL0.QL0.HevcVBVBufferSize:5000000
0124-06-06 21:22:26 A7E006C0 [AMFEncoderCoreHevc]   Debug: SetProperty HevcInitialVBVBufferFullness:48
0124-06-06 21:22:26 A7E006C0 [AMFEncoderCoreHevc]   Debug: SetProperty HevcEnableVBAQ:false
0124-06-06 21:22:26 A7E006C0 [AMFEncoderCoreHevc]   Debug: SetProperty HevcHalfPixel:true
0124-06-06 21:22:26 A7E006C0 [AMFEncoderCoreHevc]   Debug: SetProperty HevcQuarterPixel:true
0124-06-06 21:22:26 A7E006C0 [AMFEncoderCoreHevc]   Debug: SetProperty TL0.QL0.HevcEnforceHRD:false
0124-06-06 21:22:26 A7E006C0 [AMFEncoderCoreHevc]   Debug: SetProperty TL0.QL0.HevcFillerDataEnable:false
0124-06-06 21:22:26 A7E006C0 [AMFEncoderCoreHevc]   Debug: SetProperty TL0.QL0.HevcTargetBitrate:10000000
0124-06-06 21:22:26 A7E006C0 [AMFEncoderCoreHevc]   Debug: SetProperty TL0.QL0.HevcPeakBitrate:15000000
0124-06-06 21:22:26 A7E006C0 [AMFEncoderCoreHevc]   Debug: SetProperty TL0.QL0.HevcPeakBitrate:20000000
0124-06-06 21:22:26 A7E006C0 [AMFEncoderCoreHevc]   Debug: AMFEncoderCoreHevcImpl::Init(P010, 2560, 1440)

@DimkaTsv

DimkaTsv commented Jul 7, 2024

YUV 4:2:2! Preferable also 4:4:4

We all wish... But, currently AMD hardware does not support it, so it would be impossible to use even if there were AMF parameters for this. But we can wait for 10-bit preanalysis meanwhile).

In general the quality is great (enough) and certainly very usable. Having it be through hardware instead of software also is a great help! But... YUV... Especially red text is sooooo freaking ugly:

Weird. For my captures (did a screenshot of your YUV444 sample, pasted with 100% resolution scaling and captured window with screenshot), text sharpness doesn't drop as much as red color precision (due to YUV conversion).
YUV420 / NV12 / Rec. 709 based on OBS parameters. (This is 8-bit HEVC)
image

YUV420 / P010 / Rec. 709 (this is 10-bit HEVC)
image

My 10-bit 420 result looks closer to your 422 example and text has slightly higher clarity, imo. Still not as bright and precise as 444 capture sample, ofc. But i guess you can check if 10-bit encode makes things better on your side?
May i also ask what device you are encoding on? (mine is 7800XT) Maybe there are some architectural differences that could've led to this result? Or is it a potential difference in software/OS?

@markg85
Author

markg85 commented Jul 7, 2024

Hi @DimkaTsv,

While it might seem like you're doing an apples-apples comparison, i think it's more like an apples-oranges one in this case :)

OBS does a lot. I don't know what, but you should remove external factors and limit it to just an ffmpeg command. That's what i'm testing here after all. The eventual intent is live desktop streaming for my own use. There's no OBS in there at all. As you asked, my GPU is a 7900XT.

I did do another test with bgr0 and nv12 both at 8 bit (as 10 bit seems to introduce some grey veil annoyance).
Even this has a surprising color difference as output!
image

The AMF properties for BGR0:

0124-06-07 16:50:02  42006C0 [AMFEncoderCoreHevc]   Debug: SetProperty HevcUsage:5
0124-06-07 16:50:02  42006C0 [AMFEncoderCoreHevc]   Debug: SetProperty HevcFrameSize:2560,1440
0124-06-07 16:50:02  42006C0 [AMFEncoderCoreHevc]   Debug: SetProperty TL0.QL0.HevcFrameRate:60,1
0124-06-07 16:50:02  42006C0 [AMFEncoderCoreHevc]   Debug: SetProperty HevcProfile:1
0124-06-07 16:50:02  42006C0 [AMFEncoderCoreHevc]   Debug: SetProperty HevcTier:0
0124-06-07 16:50:02  42006C0 [AMFEncoderCoreHevc]   Debug: SetProperty HevcQualityPreset:0
0124-06-07 16:50:02  42006C0 [AMFEncoderCoreHevc]   Debug: SetProperty HevcOutColorProfile:7
0124-06-07 16:50:02  42006C0 [AMFEncoderCoreHevc]   Debug: SetProperty HevcNominalRange:1
0124-06-07 16:50:02  42006C0 [AMFEncoderCoreHevc]   Debug: SetProperty HevcColorBitDepth:8
0124-06-07 16:50:02  42006C0 [AMFEncoderCoreHevc]   Debug: SetProperty HevcOutColorTransferChar:2
0124-06-07 16:50:02  42006C0 [AMFEncoderCoreHevc]   Debug: SetProperty HevcOutColorPrimaries:2
0124-06-07 16:50:02  42006C0 [AMFEncoderCoreHevc]   Debug: SetProperty HevcGOPSPerIDR:1
0124-06-07 16:50:02  42006C0 [AMFEncoderCoreHevc]   Debug: SetProperty HevcGOPSize:250
0124-06-07 16:50:02  42006C0 [AMFEncoderCoreHevc]   Debug: SetProperty HevcDeBlockingFilter:false
0124-06-07 16:50:02  42006C0 [AMFEncoderCoreHevc]   Debug: SetProperty HevcHeaderInsertionMode:0
0124-06-07 16:50:02  42006C0 [AMFEncoderCoreHevc]   Debug: SetProperty HevcRateControlPreAnalysisEnable:0
0124-06-07 16:50:02  42006C0 [AMFEncoderCoreHevc]   Debug: SetProperty HevcRateControlMethod:0
0124-06-07 16:50:02  42006C0 [AMFEncoderCoreHevc]   Debug: SetProperty HevcEnableVBAQ:false
0124-06-07 16:50:02  42006C0 [AMFEncoderCoreHevc]   Debug: SetProperty HevcHalfPixel:true
0124-06-07 16:50:02  42006C0 [AMFEncoderCoreHevc]   Debug: SetProperty HevcQuarterPixel:true
0124-06-07 16:50:02  42006C0 [AMFEncoderCoreHevc]   Debug: SetProperty TL0.QL0.HevcEnforceHRD:false
0124-06-07 16:50:02  42006C0 [AMFEncoderCoreHevc]   Debug: SetProperty TL0.QL0.HevcFillerDataEnable:false
0124-06-07 16:50:02  42006C0 [AMFEncoderCoreHevc]   Debug: SetProperty TL0.QL0.HevcTargetBitrate:15000000
0124-06-07 16:50:02  42006C0 [AMFEncoderCoreHevc]   Debug: SetProperty TL0.QL0.HevcPeakBitrate:15000000
0124-06-07 16:50:02  42006C0 [AMFEncoderCoreHevc]   Debug: SetProperty TL0.QL0.HevcVBVBufferSize:15000000
0124-06-07 16:50:02  42006C0 [AMFEncoderCoreHevc]   Debug: AMFEncoderCoreHevcImpl::Init(BGRA, 2560, 1440)

And the AMF properties for NV12:

0124-06-07 16:54:12 5B0006C0 [AMFEncoderCoreHevc]   Debug: SetProperty HevcUsage:5
0124-06-07 16:54:12 5B0006C0 [AMFEncoderCoreHevc]   Debug: SetProperty HevcFrameSize:2560,1440
0124-06-07 16:54:12 5B0006C0 [AMFEncoderCoreHevc]   Debug: SetProperty TL0.QL0.HevcFrameRate:60,1
0124-06-07 16:54:12 5B0006C0 [AMFEncoderCoreHevc]   Debug: SetProperty HevcProfile:1
0124-06-07 16:54:12 5B0006C0 [AMFEncoderCoreHevc]   Debug: SetProperty HevcTier:0
0124-06-07 16:54:12 5B0006C0 [AMFEncoderCoreHevc]   Debug: SetProperty HevcQualityPreset:0
0124-06-07 16:54:12 5B0006C0 [AMFEncoderCoreHevc]   Debug: SetProperty HevcOutColorProfile:7
0124-06-07 16:54:12 5B0006C0 [AMFEncoderCoreHevc]   Debug: SetProperty HevcNominalRange:1
0124-06-07 16:54:12 5B0006C0 [AMFEncoderCoreHevc]   Debug: SetProperty HevcColorBitDepth:8
0124-06-07 16:54:12 5B0006C0 [AMFEncoderCoreHevc]   Debug: SetProperty HevcOutColorTransferChar:2
0124-06-07 16:54:12 5B0006C0 [AMFEncoderCoreHevc]   Debug: SetProperty HevcOutColorPrimaries:2
0124-06-07 16:54:12 5B0006C0 [AMFEncoderCoreHevc]   Debug: SetProperty HevcGOPSPerIDR:1
0124-06-07 16:54:12 5B0006C0 [AMFEncoderCoreHevc]   Debug: SetProperty HevcGOPSize:250
0124-06-07 16:54:12 5B0006C0 [AMFEncoderCoreHevc]   Debug: SetProperty HevcDeBlockingFilter:false
0124-06-07 16:54:12 5B0006C0 [AMFEncoderCoreHevc]   Debug: SetProperty HevcHeaderInsertionMode:0
0124-06-07 16:54:12 5B0006C0 [AMFEncoderCoreHevc]   Debug: SetProperty HevcRateControlPreAnalysisEnable:0
0124-06-07 16:54:12 5B0006C0 [AMFEncoderCoreHevc]   Debug: SetProperty HevcRateControlMethod:0
0124-06-07 16:54:12 5B0006C0 [AMFEncoderCoreHevc]   Debug: SetProperty HevcEnableVBAQ:false
0124-06-07 16:54:12 5B0006C0 [AMFEncoderCoreHevc]   Debug: SetProperty HevcHalfPixel:true
0124-06-07 16:54:12 5B0006C0 [AMFEncoderCoreHevc]   Debug: SetProperty HevcQuarterPixel:true
0124-06-07 16:54:12 5B0006C0 [AMFEncoderCoreHevc]   Debug: SetProperty TL0.QL0.HevcEnforceHRD:false
0124-06-07 16:54:12 5B0006C0 [AMFEncoderCoreHevc]   Debug: SetProperty TL0.QL0.HevcFillerDataEnable:false
0124-06-07 16:54:12 5B0006C0 [AMFEncoderCoreHevc]   Debug: SetProperty TL0.QL0.HevcTargetBitrate:15000000
0124-06-07 16:54:12 5B0006C0 [AMFEncoderCoreHevc]   Debug: SetProperty TL0.QL0.HevcPeakBitrate:15000000
0124-06-07 16:54:12 5B0006C0 [AMFEncoderCoreHevc]   Debug: SetProperty TL0.QL0.HevcVBVBufferSize:15000000
0124-06-07 16:54:12 5B0006C0 [AMFEncoderCoreHevc]   Debug: AMFEncoderCoreHevcImpl::Init(NV12, 2560, 1440)

In both cases the output is yuv420p.
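
Comparing dumps like the two above by eye is error-prone, so a small script helps. This Python sketch parses the `SetProperty name:value` lines and the `Init(...)` line exactly as they appear in these logs. The regular expressions and the "Init" pseudo-key are assumptions made for this example, not an AMF log specification.

```python
import re

PROP_RE = re.compile(r"SetProperty\s+([^\s:]+):(\S+)")
INIT_RE = re.compile(r"::Init\((\w+),\s*(\d+),\s*(\d+)\)")


def parse_amf_log(text):
    """Return {property: [values in order of appearance]}."""
    props = {}
    for line in text.splitlines():
        match = PROP_RE.search(line)
        if match:
            props.setdefault(match.group(1), []).append(match.group(2))
            continue
        match = INIT_RE.search(line)
        if match:
            # Store the surface format and size under a made-up key.
            props.setdefault("Init", []).append(",".join(match.groups()))
    return props


def duplicated(props):
    """Properties that were set more than once."""
    return {name: values for name, values in props.items() if len(values) > 1}


def diff(a, b):
    """Properties whose values differ between two parsed logs."""
    names = sorted(set(a) | set(b))
    return {n: (a.get(n), b.get(n)) for n in names if a.get(n) != b.get(n)}


if __name__ == "__main__":
    bgra = parse_amf_log(
        "Debug: SetProperty HevcColorBitDepth:8\n"
        "Debug: AMFEncoderCoreHevcImpl::Init(BGRA, 2560, 1440)\n")
    nv12 = parse_amf_log(
        "Debug: SetProperty HevcColorBitDepth:8\n"
        "Debug: AMFEncoderCoreHevcImpl::Init(NV12, 2560, 1440)\n")
    print(diff(bgra, nv12))
```

Run over the two full dumps above, the only difference should be the surface format on the Init line, which points at the input conversion rather than at an encoder property. The `duplicated` helper also surfaces cases like the double HevcPeakBitrate noted earlier.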

I tried the same nv12 on libx264 and while it doesn't look exactly like the above bgr0 image it does match it very closely. It's definitely different. This does lead me to think that there is something wrong with color conversions in AMF. It might be a flag that needs tweaking but perhaps i'm also hitting up against actual driver bugs here?

@DimkaTsv

DimkaTsv commented Jul 7, 2024

While it might seem like you're doing an apples-apples comparison, i think it's more like an apples-oranges one in this case :)

You definitely have a point here, but as a simple Windows user, and not so advanced at that, it is quite hard for me to write own screen capture implementation. So i used what i have to make a comparison, sorry for that.
On the other hand... You're also comparing apples to pears in this case. YUV to YUV will lose more of the color data, while RGB to YUV will also lose brightness, unless it's a lossless conversion of course. If i am wrong, feel free to correct me.

As you asked, my GPU is a 7900XT.

Ok, that should rule out HW diff, then. Sorry for bothering. It's not that i am able to resolve a problem, just pointed out that text clarity / color precision with red is quite common issue, but not to this extent (even YouTube will degrade red text quality quite significantly). I remember having similar colors on capture with old 1060 and 2060. Granted, NVENC does support YUV444, so there is that.

Even found some old recordings i have. And red color on them is very similar looking.
For example - 2060 default capture
image

7800XT default capture
image

So from my experience color capture with 8bit 420 techniques as a whole is far from perfect.
I also heard there is libx264rgb for more precise color capture, as it is patched to take RGB input?

@markg85
Author

markg85 commented Jul 7, 2024

You definitely have a point here, but as a simple Windows user, and not so advanced at that, it is quite hard for me to write own screen capture implementation. So i used what i have to make a comparison, sorry for that.

I'm also using existing tools. ffmpeg :) Well, i did patch in the 10 bit support but only because a patch already existed and it seemed easy enough for me to add it.

On the other hand... You're also comparing apples to pears in this case. YUV to YUV will lose more of the color data, while RGB to YUV will also lose brightness, unless it's a lossless conversion of course. If i am wrong, feel free to correct me.

It's more complex.

  • Capture surface RGB -> RGB pix format (no software conversion on the cpu) -> yuv420 output (converted by AMF internally because it understands RGB input)
  • Capture surface RGB -> nv12 pix format (software conversion) -> yuv420 output (converted by AMF internally because it understands nv12 input)

I get that there is a difference with RGB -> NV12.
But i don't get the difference between NV12 (cpu) and NV12 (AMF)...

Now i could be wrong but i think the nv12 -> yuv420 conversion should be lossless. They both represent the exact same data but just in a different memory layout. This does rest on the assumption that i'm correct here, which would also mean that AMF does something with nv12 input that libx264 doesn't. The output is different while it shouldn't be.
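
The layout claim is easy to check on paper: NV12 stores a Y plane followed by one interleaved UVUV... plane, while planar yuv420p stores Y, U and V as three separate planes. Here is a minimal Python sketch of the round trip on raw bytes, with function names made up for the example.

```python
def nv12_to_yuv420p(y_plane, uv_plane):
    """Split NV12's interleaved UV plane into separate U and V planes."""
    if len(uv_plane) % 2:
        raise ValueError("interleaved UV plane must have an even length")
    return y_plane, uv_plane[0::2], uv_plane[1::2]


def yuv420p_to_nv12(y_plane, u_plane, v_plane):
    """Interleave separate U and V planes back into one UV plane."""
    if len(u_plane) != len(v_plane):
        raise ValueError("U and V planes must have the same length")
    uv_plane = bytearray()
    for cb, cr in zip(u_plane, v_plane):
        uv_plane.append(cb)
        uv_plane.append(cr)
    return y_plane, bytes(uv_plane)


if __name__ == "__main__":
    # A 2x2 frame: four luma samples, one U sample and one V sample.
    y, uv = bytes([10, 20, 30, 40]), bytes([100, 200])
    planar = nv12_to_yuv420p(y, uv)
    print(planar)
    print(yuv420p_to_nv12(*planar) == (y, uv))
```

Since only the byte order changes, this step cannot alter colors. A visible difference between the two paths would point at the earlier RGB to YUV conversion (matrix or range) rather than at the NV12 layout itself.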

I also heard there is libx264rgb for more precise color capture, as it is patched to take RGB input?

Yep, i know. It works wonderfully!
But it has 2 very big downsides.

  1. It's cpu bound. So if you stream a game that is taxing on your cpu then your encoding - especially your mouse - will go nuts. I tried, it's not usable. For non-gaming desktop streaming it works perfectly fine.
  2. In my case i'm streaming 1440p. I want high quality. Which means, for x264(rgb), that you'd easily have a bitrate of 40-100 mbit/sec. Again, no problem in a local network environment (my case) but just too much for over the interwebs.

@DimkaTsv

DimkaTsv commented Jul 7, 2024

Again, no problem in a local network environment (my case) but just too much for over the interwebs.

To think about a reasonable bitrate for the interwebs... Do you have your own streaming server hosted somewhere?
Otherwise, if you intend to use Twitch or YouTube, then your key problem will primarily be the ingest server limitation. And at such a low bitrate, color precision would often be much less of a problem compared to general image sharpness and movement quality. Such is the tradeoff...

Twitch limits you to 6mbps unless you are popular afaik, and that forces even 1080p users to downscale their streams to 720p, not to mention 1440p. Or use capture cards and a second dedicated PC with CPU encode just to squeeze out more quality. And if you go over the limit, then the ingest server will transcode the source into a lower quality stream anyway iirc.

One reason i mention this is that you considered raw RGB or YUV444 encode, which, despite giving you higher color precision, will also likely require a higher bitrate for the same movement quality due to more data being encoded. But the issue you are still left with atm is that AMD HW physically doesn't support YUV444, and 10bit stream support is still pretty limited.

@markg85
Author

markg85 commented Jul 7, 2024

@DimkaTsv Those are all assumptions that are just not relevant. I don't think i ever even hinted at streaming to a webservice.

You apparently want to know... My case is simple. I have a local pc/server "in a closet" that is a powerhouse. I have thin/light devices elsewhere (laptop, raspberry pi, etc..). Within this local network environment i want to stream the gorgeous graphics from my high powered "server" to my low powered devices. With that in mind, i don't care if it's 10mbit, 100mbit or even a gigabit! As long as it works and feels fast and fluid.

You didn't ask/imply about this yet but i'll answer it in advance 😛 I'm not going for the sunshine/moonlight combination because those are pre-made, which makes it much harder for me to try out a tiny change. That's - to me - surprisingly easy to do in ffmpeg. For reference, within a minute i can make an ffmpeg one-liner change, build it and run with it.

Now i prefer to also have this high quality desktop stream working over the internet - for my own personal private use, not for youtube/twitch/whatever - and for that reason i try to make something work that i deem acceptable within 20mbit/sec (also because that's my max upload bandwidth..).

Does that answer your assumptions/questions? 😉

@MikhailAMD
Collaborator

There are a lot of good discussions here, be sure they are noted as well as the wish list. On practical terms:

In one of the parameter sets from @markg85 (not the latest one) I see a couple of problems:

  • HevcVBVBufferSize:5000000 - should be close to target bitrate for VDI use case.
  • HevcInitialVBVBufferFullness:48 - should be max - 100
  • If you set target bitrate before setting peak bitrate AMF tries to adjust peak bitrate so you see additional trace.
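As a rough illustration of the VBV buffer size point above, the sketch below computes how much video time a buffer covers at a given target bitrate. It assumes the buffer size is expressed in bits; the 20 Mbit/s target is a hypothetical value borrowed from the upload limit mentioned earlier in the thread, not the bitrate of the parameter set in question, and the function name is made up for this example.

```python
def vbv_window_seconds(buffer_bits: int, bitrate_bps: int) -> float:
    """Seconds of video the VBV buffer holds at the given bitrate."""
    return buffer_bits / bitrate_bps


TARGET_BITRATE = 20_000_000  # hypothetical target: 20 Mbit/s

# Buffer value from the parameter set: covers a quarter of a second here.
small_window = vbv_window_seconds(5_000_000, TARGET_BITRATE)

# Buffer sized equal to the target bitrate: covers about one second.
matched_window = vbv_window_seconds(TARGET_BITRATE, TARGET_BITRATE)
```

Under these assumptions a buffer of 5000000 covers 0.25 s of a 20 Mbit/s stream, while a buffer equal to the target bitrate covers one second.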

vbr_latency forces less variations in frame-over-frame bitrate compared to vbr_peak.

HEVC 10 bit (main10) - several points:

  • Main input YUV format is P010
  • yuv420p10le is not supported
  • HEVC 10 bit is designed for HDR.
  • Configuration starts with setting COLOR_BIT_DEPTH to 10
  • By submitting RGBA8 for HEVC 10-bit you are asking for SDR-to-HDR conversion. Currently HW doesn't support it. The internal shader converter will do the job.
  • HDR inputs are P010, R10G10B10A2, RGBA_F16.
  • Any RGBA input for HEVC 10-bit requires all color parameters (INPUT and OUTPUT) to be set. In addition INPUT_HDR_METADATA has to be set. Grey effect indicates that color conversion is not properly programmed.
  • If you submit P010 to the AMF encoder and do the color conversion using an FFmpeg filter, you still need to set a lot of color parameters on the filter or the SDR-to-HDR conversion will not be correct. You also need to set the AMF OUTPUT color parameters to inform the downstream decoder of the color properties.
  • Most of INPUT color conversion parameters come from IDXGIOutput. The OUTPUT you may want to hard-code or use defaults.
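On the P010 versus yuv420p10le items in the list above: both carry 10-bit 4:2:0 samples in 16-bit little-endian words, but P010 keeps the 10 bits in the most significant bits and interleaves U and V (like NV12), while yuv420p10le keeps them in the least significant bits and uses three separate planes. A minimal sketch of the repacking, assuming even dimensions and no row padding (the function name is hypothetical, not an FFmpeg or AMF API):

```python
import struct


def yuv420p10le_to_p010(frame: bytes, width: int, height: int) -> bytes:
    """Repack planar 10-bit 4:2:0 (low-bit aligned) as P010 (high-bit aligned)."""
    y_count = width * height
    c_count = y_count // 4  # samples per chroma plane in 4:2:0
    total = y_count + 2 * c_count
    samples = struct.unpack(f"<{total}H", frame)
    y = samples[:y_count]
    u = samples[y_count:y_count + c_count]
    v = samples[y_count + c_count:]
    # Shift each 10-bit value into the top 10 bits of its 16-bit word.
    out = [s << 6 for s in y]
    # Interleave chroma as U0 V0 U1 V1 ...
    for cb, cr in zip(u, v):
        out.append(cb << 6)
        out.append(cr << 6)
    return struct.pack(f"<{total}H", *out)


# A 2x2 frame: four luma samples, one U sample, one V sample.
planar = struct.pack("<6H", 0, 64, 512, 1023, 300, 700)
p010 = yuv420p10le_to_p010(planar, 2, 2)
p010_words = struct.unpack("<6H", p010)
```

This only changes the packing. It does not set any of the color parameters that the list says are still required.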

Pls remind me if I missed something.

@markg85
Author

markg85 commented Jul 8, 2024

Hi @MikhailAMD,

So the 10 bit support for the VDI case isn't very useful i suppose? As that requires HDR on the AMF side to make things work properly. That seems a bit much, i might play with it to see if i can get it to work properly but from what you've said it seems like 8 bit is the way to go instead.

Regarding HevcInitialVBVBufferFullness:48, nice hint! I did not set anything with that value so i'll dig into ffmpeg and see if i can change it and if that changes the results much.

What i'm missing is why BGR0 vs NV12 as input gives a (slight but observable) different output color? It's all 8 bit in this case, no HDR or 10 bit. The output is yuv420p so the conversion inside AMF must be from BGR0 -> -> yuv420p. This should be the same color output as if i were passing in nv12 yet it's not. Do you have a clue about what's going on here?
image

@MikhailAMD
Collaborator

It looks like you are using FFmpeg to convert from BGRA to NV12 and FFmpeg color converter is programmed to apply STUDIO range. The STUDIO range keeps color values for luma 16-235 so you don't have true black. For chroma it is 16-240. Full range is 0-255 for both.
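A small sketch of what the range difference means for luma. It assumes BT.709 luma coefficients purely for illustration (the thread does not say which matrix either converter uses), and the function names are hypothetical.

```python
# BT.709 luma coefficients (an assumption for this example).
KR, KG, KB = 0.2126, 0.7152, 0.0722


def rgb_to_luma(r: int, g: int, b: int, full_range: bool) -> int:
    """Convert 8-bit RGB to 8-bit luma in full (0-255) or studio (16-235) range."""
    y = KR * r + KG * g + KB * b  # 0..255 before any range scaling
    if full_range:
        return round(y)
    return round(16 + y * 219 / 255)


def studio_to_full(y: int) -> int:
    """Expand a studio-range luma value back to full range, clamped to 0-255."""
    return max(0, min(255, round((y - 16) * 255 / 219)))


black_studio = rgb_to_luma(0, 0, 0, full_range=False)
black_full = rgb_to_luma(0, 0, 0, full_range=True)
white_studio = rgb_to_luma(255, 255, 255, full_range=False)
white_full = rgb_to_luma(255, 255, 255, full_range=True)
```

With these formulas black maps to 16 in studio range and 0 in full range, and white to 235 versus 255. If one path produces studio-range data and the other full-range data, and both are displayed the same way, a slight brightness and contrast shift like the one described would be expected.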
