screencopy: add support for separate cursor capturing #105
base: master
Conversation
Funny. We had a discussion about this on #sway-devel a few days ago. I'll post it here for reference:
Your proposal is different from what we discussed, but I don't see anything wrong with this approach in a broad sense. Although, maybe cursor position seems a little bit out of place in a screen capturing protocol?
Interesting, I didn't know about this discussion. Regarding the cursor position, it's perfectly within scope. How else will you know where to put the cursor once you receive it? It can't be tied to …
If you have a virtual pointer, you can know its cursor position. However, I'm not sure it's safe to assume that we're always using this with a virtual pointer. I guess that you could argue that the position of the cursor is required to reconstruct the "whole picture" and is therefore fully in scope for this.
Requiring a virtual pointer would basically limit this protocol to remote desktop clients. One of the use cases I have in mind is supporting metadata cursor mode in …
I agree sending the cursor position is desirable. The cursor coordinates should be output-buffer-local so that the client can just blit the cursor buffer onto the primary buffer without having to perform any scale/transform/etc guessing work. I'm not sure how to best translate this into protocol design, but I'd prefer the output capture and the cursor information to be atomically requested by the client. With this patch, the client first requests the cursor info, then captures the output and cursor. All of this relies on global state -- but I don't want to end up in a situation where the client requests cursor data and the compositor has already painted the cursor on the output buffer.
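As a rough illustration of the atomicity concern above, one could imagine a single request that creates the output capture and the cursor report together, so the compositor decides once, for the same frame, whether the cursor is composited or delivered separately. All names and arguments below are invented for illustration; this is not the actual patch or draft:

```xml
<!-- Hypothetical sketch only: atomically request the output capture and
     the cursor information, so the compositor never paints the cursor
     into the output buffer between two separate requests. -->
<request name="capture_output_with_cursor">
  <description summary="capture an output and its cursor atomically">
    Capture the given output without the cursor composited, and emit
    cursor events carrying the cursor image and its position in
    output-buffer-local coordinates, taken from the same frame.
  </description>
  <arg name="frame" type="new_id" interface="zwlr_screencopy_frame_v1"/>
  <arg name="output" type="object" interface="wl_output"/>
</request>
```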
Regarding using output-buffer-local coordinates for the mouse, I agree it would make composing the cursor much easier. However, it might interact poorly with …

I agree the potential race condition is difficult to resolve given the existing protocol design. However, I think the situation you described is only possible when switching between hardware and software cursors. In that case, both the cursor and output will be updated, and the situation should last for no more than one frame, which might be acceptable. One of the ways I thought of is by replacing the …
Maybe we should consider what a "perfect" protocol for capturing cursors would look like and work from there?
Yes, that sounds like a good idea. I wouldn't be against a screencopy-v2 protocol. With DMA-BUFs and damage tracking, v1 has become quite weird.
But ideally, screencopy-v2 should also include support for capturing toplevels, right?
I wrote a draft for what a v2 screencopy might look like. The most notable change is the separation of screenshot and screencast into separate objects, so that one object doesn't have to handle both, avoiding the current "weirdness" with damage tracking and DMA-BUFs in v1.

The screenshot part is effectively the same as version 1 of the v1 protocol, providing screenshots with …

The screencast portion is designed as a continually updated stream: once the client provides a suitable buffer, the compositor will continue to send updates until told otherwise. Race conditions between updates are avoided because updates are implemented in the protocol as atomic units between …

I am not sure how to implement capturing toplevels (with xdg-foreign? wlr foreign toplevel management?) or if it is desirable, but the protocol should be easily extendable to support that use case.
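The split described above might take roughly the following shape in protocol XML: a screencast object that accepts a buffer once, then frames each update with begin/end events so the client applies it atomically. This is a sketch of the idea, not the actual draft; every name here is invented:

```xml
<!-- Hypothetical sketch of a v2-style screencast object. The client
     attaches a buffer, then receives a stream of updates; all state
     between update_begin and update_end belongs to one atomic update. -->
<interface name="screencast_sketch" version="1">
  <request name="attach_buffer">
    <arg name="buffer" type="object" interface="wl_buffer"/>
  </request>
  <event name="update_begin">
    <description summary="start of one atomic update"/>
  </event>
  <event name="update_end">
    <description summary="all state for this update has been sent and
      may now be applied by the client"/>
  </event>
  <request name="stop">
    <description summary="stop receiving updates"/>
  </request>
</interface>
```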
It would be nice to be able to capture only the cursor without capturing the screen. This should be possible with the protocol in this PR, but not with the v2 draft, if I understand correctly. My use case is OBS game capture, where the image is already captured from the game process itself.
The v2 draft can be extended to support your use case by adding …
What's the point of the "continue" request? You need to supply a new buffer to be copied into, so the copy/begin request should suffice.

It seems that you can only supply one buffer to each screencast begin request. I take it that the user should use this interface to capture either the output buffer or the cursor buffer? If that is to be the way things are done, then perhaps it's better to have separate interfaces for output buffer capturing and cursor capturing. Another way of doing this would be to have the …

It's probably better to rename the protocol to …
The current v2 draft lets you pass one buffer for the frame that would be reused for every frame. You cause the next frame to be captured by calling …

The cursor is passed to the client as a …

I'll think a bit more about this before trying to upstream.
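One possible reading of this flow, sketched as protocol XML: a single long-lived buffer, an explicit request to latch the next frame into it, and the cursor exposed through its own object. The request and interface names are invented here for illustration, since the actual identifiers are elided in the discussion above:

```xml
<!-- Hypothetical sketch: one reused frame buffer, with an explicit
     request driving each capture, and a separate object for the cursor. -->
<request name="capture_next_frame">
  <description summary="copy the next frame into the previously
    supplied buffer"/>
</request>
<request name="get_cursor_capture">
  <description summary="create a separate object that delivers the
    cursor image and position independently of the output capture"/>
  <arg name="id" type="new_id" interface="cursor_capture_sketch"/>
</request>
```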
It's good to be able to use double or triple buffering so that you can wait for a new buffer while you're reading from the last one. If we want the compositor to only copy the damaged regions, we could supply something like "buffer age" with each buffer. See https://emersion.fr/blog/2019/intro-to-damage-tracking/
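One way the buffer-age idea could be expressed in the protocol, sketched here with invented names and purely as an assumption about how it might look: the client queues several buffers, and when the compositor fills one it reports how many frames old that buffer's previous contents are, so only the accumulated damage needs to be copied or repainted.

```xml
<!-- Hypothetical sketch of multi-buffering with buffer age. -->
<request name="queue_buffer">
  <arg name="buffer" type="object" interface="wl_buffer"/>
</request>
<event name="buffer_done">
  <arg name="buffer" type="object" interface="wl_buffer"/>
  <arg name="age" type="uint"
       summary="frames since this buffer was last filled; 0 means the
                previous contents are undefined"/>
</event>
```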
It does not work in practice.
Sorry, late to the party, but what about changing the design to use a capability/request bitmap instead of a cursor mode?
And then have a two-stage process:
This way we could extend this protocol very easily with additional modes and ensure that all extracted information is from the same point in time.
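The capability-bitmap idea above could look roughly like this in protocol XML: the compositor first advertises which capture sources it supports, and the client then requests any subset of them in a single atomic capture. All names and values here are invented for illustration:

```xml
<!-- Hypothetical sketch of the two-stage capability/request design. -->
<enum name="capture_source" bitfield="true">
  <entry name="output" value="1"/>
  <entry name="cursor" value="2"/>
</enum>
<!-- Stage 1: the compositor advertises its capabilities. -->
<event name="capabilities">
  <arg name="sources" type="uint" enum="capture_source"
       summary="bitmap of sources the compositor can capture"/>
</event>
<!-- Stage 2: the client requests a subset, captured atomically. -->
<request name="capture">
  <arg name="sources" type="uint" enum="capture_source"
       summary="bitmap of sources to capture from the same point in time"/>
</request>
```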
wlr-protocols has migrated to gitlab.freedesktop.org. This pull request has been moved to: https://gitlab.freedesktop.org/wlroots/wlr-protocols/-/merge_requests/105
This is a proposal for version 4 of the screencopy protocol, which will support separate cursor capturing. The intended use case is custom composition of the cursor after capture, which is suitable for applications like wayvnc or Looking Glass, and also for implementing the metadata cursor mode in xdg-desktop-portal-wlr.
Judging by existing discussions, the ideal place to add this feature is the screencopy protocol, which is exactly what is done here.
Issues regarding software cursors have been brought up here. I think there is no real nice solution to capture software cursors separately, nor does it make sense to try. An application could always draw what looks like a cursor on the screen, which could also be nowhere near the actual pointer location. Therefore, specifying 0 for `overlay_cursor` cannot guarantee that no visible cursor appears in the captured image.

Instead, I think the solution is to capture only hardware cursors separately, and if that is not possible, then we simply report that the cursor is invisible and have it be composed onto the screen. This works for all custom composition use cases. This approach is also taken by other capturing APIs, such as DXGI Desktop Duplication on Windows.
Implementation-wise, this should be fairly simple. The existing `capture_output` and `capture_output_region` can remain unchanged. `capture_cursor` can be implemented by reading the hardware cursor, and `zwlr_screencopy_pointer_reporter_v1` can be implemented by a few calls in `wlr_seat_pointer.c`.

/cc @emersion @any1 @Xyene
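A rough sketch of how the hardware-cursor-only semantics described above might read in protocol XML. The argument list and wording are invented for illustration and are not the actual proposed patch; only the request and interface names come from the description above:

```xml
<!-- Hypothetical sketch of the proposed capture_cursor request. -->
<request name="capture_cursor">
  <description summary="capture the hardware cursor">
    Copy the hardware cursor into the given frame. If the cursor is
    currently a software cursor, report it as invisible instead; it
    will already be composited into the output capture.
  </description>
  <arg name="frame" type="new_id" interface="zwlr_screencopy_frame_v1"/>
  <arg name="seat" type="object" interface="wl_seat"/>
</request>
```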