Add H264RtpDepacketizer #1082
@paullouisageneau can/should I add more fields to `Message`? It would be nice to know Duration + Discontinuity.
Yes, feel free to add them, but not directly in `Message`, as it adds overhead to every packet in every transport. A `shared_ptr<FrameInfo>` would make sense, like for the reliability information. You should also add the frame timestamp there.
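A minimal sketch of that suggestion, for illustration only. `FrameInfo`, its field names, the simplified `Message` stand-in, and the `makeFrame` helper are all hypothetical here and are not libdatachannel's actual definitions; the point is only that ordinary packets pay for one null pointer rather than for every metadata field.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical per-frame metadata; the field names are illustrative only.
struct FrameInfo {
    uint32_t timestamp = 0;                            // RTP timestamp of the frame
    std::optional<std::chrono::milliseconds> duration; // empty while the duration is unknown
    bool discontinuity = false;                        // true if data was lost before this frame
};

// Simplified stand-in for a message type (not the library's real Message).
struct Message {
    std::vector<std::byte> data;
    std::shared_ptr<FrameInfo> frameInfo; // stays null for messages that are not frames
};

// Hypothetical helper: wraps a depacketized payload and attaches its metadata.
std::shared_ptr<Message> makeFrame(std::vector<std::byte> payload, uint32_t timestamp) {
    auto message = std::make_shared<Message>();
    message->data = std::move(payload);
    message->frameInfo = std::make_shared<FrameInfo>();
    message->frameInfo->timestamp = timestamp;
    return message;
}
```

With this layout, a transport that never produces frames only carries an empty `shared_ptr` per message, which is the overhead concern raised above.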
src/h264rtpdepacketizer.cpp
Outdated
auto first = this->rtp_buffer.begin();
auto last = this->rtp_buffer.begin() + (packets_in_timestamp - 1);

messages = buildFrame(first, last);
I think there are issues with the handling of `messages`. For instance:
- If there is a single timestamp in `rtp_buffer` (for instance a single frame), no new frame is depacketized (because of the break just above). In that case it looks like the input messages in `messages` won't be cleared and will leak to the next element in the media processing chain.
- If there are two frames depacketized in a single call, `messages` will be replaced for each frame, so frames will be dropped and only the last one will be passed to the chain.
Maybe I'm missing parts of the logic, but if the principle is to flush the current frame when the next timestamp is seen, couldn't such a simple approach do the job for `H264RtpDepacketizer::incoming`?
message_vector result;
for (auto message : messages) {
    [...] // check message type and size
    auto p = reinterpret_cast<const RtpHeader *>(message->data());
    // A new timestamp means the buffered packets form a complete frame: flush it
    if (!rtp_buffer.empty() && current_timestamp != p->timestamp()) {
        result.push_back(buildFrame(rtp_buffer.begin(), rtp_buffer.end()));
        rtp_buffer.clear();
    }
    current_timestamp = p->timestamp();
    rtp_buffer.push_back(std::move(message));
}
// Pass only the completed frames on to the next element in the chain
messages.swap(result);
`current_timestamp` could be a class member (or read from a packet in `rtp_buffer` before the loop).
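To make the flush-on-timestamp-change idea concrete, here is a self-contained sketch under simplifying assumptions. `Packet`, `Frame`, and `TimestampFlusher` are hypothetical stand-ins invented for this example (a packet is reduced to a timestamp plus payload), and the current timestamp is kept as a class member, one of the two options mentioned above. It is not the code from this PR.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Simplified stand-ins: a packet carries only its RTP timestamp and payload,
// and a frame is the list of payloads that shared one timestamp.
struct Packet {
    uint32_t timestamp;
    std::vector<uint8_t> payload;
};
using Frame = std::vector<std::vector<uint8_t>>;

class TimestampFlusher {
public:
    // Returns every frame completed by this call; the result may be empty,
    // and it may hold several frames if several timestamps were seen.
    std::vector<Frame> incoming(std::vector<Packet> packets) {
        std::vector<Frame> result;
        for (auto &packet : packets) {
            // A timestamp change closes the frame that is currently buffered.
            if (!mBuffer.empty() && mCurrentTimestamp != packet.timestamp)
                result.push_back(buildFrame());
            mCurrentTimestamp = packet.timestamp;
            mBuffer.push_back(std::move(packet));
        }
        return result;
    }

private:
    // Moves the buffered payloads into a frame and empties the buffer.
    Frame buildFrame() {
        Frame frame;
        for (auto &packet : mBuffer)
            frame.push_back(std::move(packet.payload));
        mBuffer.clear();
        return frame;
    }

    std::vector<Packet> mBuffer;
    uint32_t mCurrentTimestamp = 0;
};
```

Both cases raised in the review fall out of the structure: a call that sees only one timestamp returns an empty result instead of leaking its input, and a call that crosses two timestamp boundaries returns two frames. The trade-off is that the most recent frame stays buffered until a packet with a different timestamp arrives.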
Both should be handled now!
- If incoming RTP packets aren't enough to build a frame: `messages.clear()` is called, so messages aren't leaked.
- Many frames in a single call: I merge the lists now. If an incoming RTP packet results in multiple frames being available, it works!
It looks correct now, even if I'm still a bit puzzled by the convoluted approach.
Thanks for adding the depacketizer, this is great! For visibility, this PR partially implements #676.
How do I use this depacketizer in c_api?
Force-pushed from ac302f1 to b42ff73.
@paullouisageneau Can I get another review, please? Sorry for the delay, I will be on top of this now :)
src/h264rtpdepacketizer.cpp
Outdated
auto firstByte = std::to_integer<uint8_t>(pkt->at(headerSize));
auto secondByte = std::to_integer<uint8_t>(pkt->at(headerSize + 1));
auto naluType = firstByte & naluTypeBitmask;
Is there a reason for redefining the parsing logic and constants rather than relying on the helper structs in `nalunit.hpp`?
@paullouisageneau Wasn't aware of it! I just looked a bit and I don't believe they are applicable. `nalunit.hpp` seems to be just concerned with detecting/splitting NAL units and not the actual understanding of them? I am all for expanding `nalunit.hpp` to include this logic too, though, if you want that in this commit.
Good point. No need to expand the logic, but maybe you could just use the header struct from `nalunit.hpp` to read the fields here?
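As an illustration of reading the fields through a header struct instead of ad hoc bitmasks, here is a small sketch. `NalHeaderView`, `FuHeaderView`, and the three free functions are hypothetical names made up for this example and are not the contents of `nalunit.hpp`. The bit layouts are the standard H.264 NAL unit header and the FU header used by FU-A packets.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical view of the one-byte H.264 NAL unit header:
//   bit 7: forbidden_zero_bit, bits 6-5: nal_ref_idc, bits 4-0: nal_unit_type
struct NalHeaderView {
    uint8_t first = 0;
    bool forbiddenBit() const { return (first >> 7) != 0; }
    uint8_t nri() const { return (first >> 5) & 0x03; }
    uint8_t unitType() const { return first & 0x1F; }
};

// Hypothetical view of the FU header, the second payload byte of an FU-A packet:
//   bit 7: start, bit 6: end, bit 5: reserved, bits 4-0: original nal_unit_type
struct FuHeaderView {
    uint8_t second = 0;
    bool isStart() const { return (second >> 7) != 0; }
    bool isEnd() const { return ((second >> 6) & 0x01) != 0; }
    uint8_t unitType() const { return second & 0x1F; }
};

// Copies the header bytes out of the payload; the caller must guarantee
// that the payload holds at least two bytes.
inline NalHeaderView readNalHeader(const std::byte *payload) {
    return NalHeaderView{std::to_integer<uint8_t>(payload[0])};
}
inline FuHeaderView readFuHeader(const std::byte *payload) {
    return FuHeaderView{std::to_integer<uint8_t>(payload[1])};
}

// Rebuilds the header of the fragmented NAL unit: F and NRI come from the
// FU indicator, the type comes from the FU header.
inline uint8_t reconstructHeader(NalHeaderView indicator, FuHeaderView fu) {
    return static_cast<uint8_t>((indicator.first & 0xE0) | fu.unitType());
}
```

For example, a payload starting with `0x7C 0x85` is an FU-A packet (type 28, NRI 3) that starts a fragmented IDR slice (type 5), and the reconstructed NAL header byte is `0x65`. Copying the bytes into the structs avoids casting the payload pointer, at the cost of not matching the cast-based style discussed in this thread.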
@paullouisageneau Can I get another review please? I also fixed the 'Outdated' comments. I can't respond to them inline on GitHub though :/
Force-pushed from b42ff73 to 838e21f.
After this commit lands I am going to add [...]. Thanks for merging+reviewing so much @paullouisageneau
@Sean-Der It looks good, thank you for your work! Would you mind replacing the `firstByte` and `secondByte` manipulation with casts to `NalUnitHeader` and `NalUnitFragmentHeader` so it is not implemented twice?
Force-pushed from 838e21f to ee1b355.
@paullouisageneau Done! Can I get another review? I added one small method to [...]. With some more refactoring/exposing things we could drop even more.
Inverse of H264RtpPacketizer. Takes incoming H264 packets and emits H264 NALUs.
Co-authored-by: Paul-Louis Ageneau
Force-pushed from ee1b355 to 70a1fc3.
@paullouisageneau OK, I think I got it this time :) Mind taking a look? If this is good, I'm going to start on Opus + FrameInfo.
It looks good, thank you! If you have the opportunity to add Opus and `FrameInfo` metadata, it would be my pleasure to review it.
This commit adds an H265 depacketizer which takes incoming H265 RTP packets and emits H265 access units. It is closely based on the `H264RtpDepacketizer` added by @Sean-Der in paullouisageneau#1082.

I originally started with a version of this commit that was closer to the `H264RtpDepacketizer` and which emitted individual H265 NALUs in `H265RtpDepacketizer::buildFrames()`. This resulted in calling my `Track::onFrame()` callback for each NALU, which did not work well with the decoder that I'm using, which wants to see the VPS/SPS/PPS NALUs as a unit before initializing the decoder (https://intel.github.io/libvpl/v2.10/API_ref/VPL_func_vid_decode.html#mfxvideodecode-decodeheader).

So for the `H265RtpDepacketizer` I've tried to make it emit access units rather than NALUs. An "access unit" is (RFC 7798):

> A set of NAL units that are associated with each other according to a specified classification rule, that are consecutive in decoding order, *and that contain exactly one coded picture.*

"Exactly one coded picture" seems to correspond with what a caller might expect an "onFrame" callback to do. Maybe the `H264RtpDepacketizer` should be revised to similarly emit H264 access units rather than NALUs, too. At least, I could not find a way to receive individual NALUs from the depacketizer and run the VPL decoder without needing to do my own buffering/copying of the NALUs.

With this commit I can now do the following:

* Generate encoded bitstream output from the Intel VPL encoder.
* Pass the output of the encoder one frame at a time to libdatachannel's `Track::send()` on a track with an `H265RtpPacketizer` media handler.
* Transport the video track over a WebRTC connection to a libdatachannel peer.
* Depacketize it with the `H265RtpDepacketizer` media handler in this commit.
* Pass the depacketized output via my `Track::onFrame()` callback to the Intel VPL decoder in "complete frame" mode (https://intel.github.io/libvpl/v2.10/API_ref/VPL_enums.html#_CPPv428MFX_BITSTREAM_COMPLETE_FRAME). Each "onFrame" callback corresponds to a single call to the decoder API to decode a frame.
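The difference between emitting NALUs and emitting access units can be sketched as follows. This is a simplified illustration, not the code from that commit: `TimedNalu` and `buildAccessUnits` are hypothetical names, the sketch assumes that consecutive NAL units with the same RTP timestamp belong to the same access unit, and it assumes the consumer wants each NAL unit prefixed with a 4-byte Annex B start code.

```cpp
#include <cstdint>
#include <vector>

using Bytes = std::vector<uint8_t>;

// A depacketized NAL unit tagged with the RTP timestamp it arrived under.
struct TimedNalu {
    uint32_t timestamp;
    Bytes data; // NAL unit bytes without a start code
};

// Groups consecutive NAL units that share a timestamp into one buffer per
// access unit, prefixing each NAL unit with a 4-byte start code.
std::vector<Bytes> buildAccessUnits(const std::vector<TimedNalu> &nalus) {
    static const Bytes startCode = {0x00, 0x00, 0x00, 0x01};
    std::vector<Bytes> accessUnits;
    bool first = true;
    uint32_t current = 0;
    for (const auto &nalu : nalus) {
        // A new timestamp opens a new access unit.
        if (first || nalu.timestamp != current) {
            accessUnits.emplace_back();
            current = nalu.timestamp;
            first = false;
        }
        auto &accessUnit = accessUnits.back();
        accessUnit.insert(accessUnit.end(), startCode.begin(), startCode.end());
        accessUnit.insert(accessUnit.end(), nalu.data.begin(), nalu.data.end());
    }
    return accessUnits;
}
```

With this grouping, parameter sets (VPS/SPS/PPS) that share a timestamp with the first slice of a picture reach the callback in the same buffer as that slice, so a decoder can be given one buffer per frame without the caller doing its own buffering.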