Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MSC3981: /relations recursion #3981

Merged
merged 25 commits into from
Jan 30, 2024
Merged
Changes from 16 commits
Commits
Show all changes
25 commits
Select commit Hold shift + click to select a range
098b343
Initial Draft for MSC 3981: /relations recursion
justjanne Mar 19, 2023
abddab8
fix accidentally copy-pasted title
justjanne Mar 20, 2023
4f61e43
Remove unnecessarily specific sentence
justjanne Mar 22, 2023
381a355
Add warning to avoid unlimited recursion
justjanne Mar 22, 2023
bbcc81a
Clean-up links in MSC
justjanne Mar 28, 2023
f7e5ca3
Apply reviewer suggestions
justjanne Mar 28, 2023
ffb316d
More clarifications on examples
justjanne Mar 28, 2023
077f6b1
Address feedback on formatting
justjanne Apr 19, 2023
c638630
Address feedback on linking related specs
justjanne Apr 19, 2023
dad341e
Rephrase technical definition of the parameter
justjanne Apr 19, 2023
6d88fe8
Correct mistake in examples
justjanne Apr 19, 2023
123eef1
Rephrase paragraph on O(n+1) issue
justjanne Apr 19, 2023
65cab28
fix: correct mixup between event type and rel type
justjanne May 9, 2023
6baafd8
feat: add clarification for why this was needed
justjanne May 9, 2023
b6119c5
feat: add clarification for how it affects intermediate events
justjanne May 9, 2023
cfd4223
feat: add clarification for how it affects intermediate events
justjanne May 9, 2023
94561ad
Add /version unstable feature flag
weeman1337 May 11, 2023
f404254
feat: improve phrasing of unstable prefix section
justjanne May 11, 2023
03e8d68
feat: add clarification for how clients can make use of this
justjanne May 11, 2023
0e50bdb
Clarify unstable features map
dbkr Dec 13, 2023
a4d8e73
Attempt at resolving the discussion threads for 3981.
dbkr Dec 13, 2023
a027354
Add note that security is discussed elsewhere.
dbkr Jan 3, 2024
6ace9f4
Add recursion_depth response parameter.
dbkr Jan 3, 2024
c160d36
Note that recursion_depth is sent un-prefixed.
dbkr Jan 3, 2024
bdae577
Add summary of security discussions
dbkr Jan 16, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
129 changes: 129 additions & 0 deletions proposals/3981-relations-recursion.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,129 @@
# MSC3981: `/relations` recursion

The [`/relations`] API allows clients to retrieve events which directly relate
to a given event.

This API has been used as basis of many other features and MSCs since then,
including threads.

Threads was one of the first usages of this API that allowed nested relations -
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FTR I think this is only an issue with thread & reference relations?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is saying only thread & reference relations can have second-order relations? In which case I think probably (although I'd be surprised if references ever did) but I'm not sure it makes a huge difference here.

an event may have an [`m.reaction`] or [`m.replace`] relation to another event,
which in turn may have an `m.thread` relation to the thread root.

This forms a tree of relations, which can currently only be traversed
efficiently in hierarchical, but not in chronological order. Yet, for some
functionality – e.g., to determine which event a read receipt should
reference as per [MSC3771] – chronological order is necessary.
anoadragon453 marked this conversation as resolved.
Show resolved Hide resolved

justjanne marked this conversation as resolved.
Show resolved Hide resolved
Previously, clients would be unable to obtain a consistent ordering of
events in threads or related to threads.
Workarounds such as sending a separate `/relations` request per individual
message in each thread are only able to approximate the actual ordering,
as they rely on timestamps.
Copy link
Contributor

@MadLittleMods MadLittleMods May 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Expanding from my previous comment asking for context,

This paragraph touches a bit on the context of why which is good but not technically why. Perhaps an example like below but where it breaks down with the current implementation. Or point out in that example how it breaks down without this MSC.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hopefully I've made this a lot more straightforward now, ie. for me it's really just a question of needing a way to get all the events you need to render a thread, and that includes relations to the thread events.


## Proposal

It is proposed to add the `recurse` parameter to the `/relations` API, defined
as follows:

> Whether to include events which relate only indirectly to the given event.
anoadragon453 marked this conversation as resolved.
Show resolved Hide resolved
>
> If set to false, only events which have direct a relation with the given
> event will be included.
>
> If set to true, all events which relate to the given event, or relate to
> events that relate to the given event, will be included.
>
> It is recommended that at least 3 levels of relationships are traversed.
uhoreg marked this conversation as resolved.
Show resolved Hide resolved
> Implementations should be careful to not infinitely recurse.
Copy link
Contributor

@ShadowJonathan ShadowJonathan May 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This paragraph weakens the reliability of the behaviour, IMO, because the most pessimistic outcome of using this API would be that you only have 3 levels of recursively related events, while the behaviour of infinite recursion is not defined.


To address the levels problem, what about making this parameter (or another parameter) accept the amount of levels a client would see the server want to try to resolve? (The server would simply go up to that many levels of search, specifying recurse=1000 to a single-relation event would not cause the server to do any extra work)


To address the issue of infinite recursion, while I'm currently not aware of any way an event can be edited to re-define its relation to another event, it doesn't mean that such a feature could exist in the future (to, for example, edit the message you've replied to, when you made a mistake).

As such, what about specifying that, if a server detects a cycle, it will simply cut it off, and return as-is? This way, the guarantee that it returns all recursively-related events at least once will still be satisfied.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This paragraph weakens the reliability of the behaviour, IMO, because the most pessimistic outcome of using this API would be that you only have 3 levels of recursively related events

As there are currently no relations defined that allow you to get more than 3 levels of relations, that's an entirely academic exercise for the time being.

I'd suggest that we change this if and when we've defined types of relations that can nest deeper than this. At that point we'd also have a better understanding of what the edge cases might look like.

Copy link
Contributor

@ShadowJonathan ShadowJonathan May 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My opinion is is that this adds tech debt, if the intent is to recurse a "reasonable" amount of times, and "reasonable" is considered to be 3 levels - due to the current set of relations there exists - at the time of writing, then that wording should be used.

An exact number shouldn't be used, because that bit of behaviour will surely become outdated when more relation types emerge, and would require an MSC to change, as it is defined behaviour.

The fact that this number is 3, given reasonable optimisation concerns and trade-offs, should be an implementation detail, and not a recommendation.


What I'm more worried about is that this sentence weakens the guarantee that all recursively related events are returned, and that a client would have to call this endpoint again with some select events "just to be sure" that they got "all of them", if the client wants to be pedantic. (Which it must, if it wants to display consistent and correct behaviour, when the behaviour of the server is to not return "all" recursively referred events, but only "some", capping off at an arbitrary level)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@clokep You originally suggested the number of 3, what's your opinion on this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I am inclined to agree that it's easy to imagine needing >3 and having to change this.

Maybe a number like 10 could be a good minimum - then we'd "know" that usage patterns similar to what exists now would definitely be covered, and the only cases that would not be would be ones that allow arbitrary-length reference chains.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@clokep You originally suggested the number of 3, what's your opinion on this?

This text was the recommended text (I think from @turt2live) in MSC3771 and MSC3773 and exists in a similar form in the spec today:

Following the event relationships, it has a parent event which qualifies for one of the above. Implementations should not recurse infinitely, though: a maximum of 3 hops is recommended to cover indirect relationships.

tl;dr I'd leave it as is, but maybe the section on why a depth parameter was chosen needs to be expanded.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

at least 3 levels

Should this not be "at most 3 levels"?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the spec is trying to ensure servers do the bare minimum level of recursion here: if they want to do more, they can.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the spec is trying to ensure servers do the bare minimum level of recursion here: if they want to do more, they can.

Yes, this is the goal. There are warnings not to infinitely recurse, but we wanted implementations to have a minimum required recursion amount.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks - in that case I'd say this is fine as-is then. Infinite recursion is warned against elsewhere and we don't need to tell people exactly how to write their servers. That said, I have to admit I'm struggling to understand the meaning of @ShadowJonathan 's original comment so please correct me if I've misunderstood.

>
> One of: `[true false]`.

In order to be backwards compatible the `recurse` parameter must be
optional (defaulting to `false`).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if existing client implementations are good enough to face the recurse parameter set to true by default? That would bring in the awesomeness of recursion to the existing fleet automagically (still with a caveat of limiting recursion to a sane value).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe, but personally I'd rather just flip the flag in the client once I know it worked that go scouting for what clients implement it and try to make sure they all behave sensibly when the server breaks (old) spec.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My main concern here is that we may extend the API surface for no actual use, with one option (false, that is) being inherently broken in some way. I mean, it's just not right when reactions don't get pulled along with other messages in the same thread, right? I'm totally fine to punt the removal of this into some future MSC, just in case.


In this situation, topological ordering is intended to refer to the same
ordering that would be applied to the events when requested via the `/messages`
api given the same `dir` parameter.

Regardless of the value of the `recurse` parameter, events will always be
returned in topological ordering. Pagination and limiting shall also apply to
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem statement at the beginning of this proposal says "chronological order is necessary" to correctly evaluate read receipts. How will this proposal help address this when the API always returns events in topological order?

Copy link
Contributor

@MadLittleMods MadLittleMods Jul 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't feel like "chronological order" is needed at all and I think it just might be a mistake in wording since topological_ordering is mentioned after. Additionally, origin_server_ts is an untrusted field (can be set to whatever integer) and is even designed to be deliberately massaged by application services so it's not useful to rely on for anything.

What read receipts probably care about is topological_ordering in the room DAG (which /messages uses) but the problem is the timeline in clients like Element mix in a bunch of messages from /sync which are in the order they were received on the server (stream_ordering). So the client has no clue what the latest message in the main timeline or thread timeline is from a topological_ordering perspective.

Good reading: https://github.com/matrix-org/synapse/blob/develop/docs/development/room-dag-concepts.md#depth-and-stream-ordering

In terms of a solution to always choose the correct event for the read receipt, maybe we only construct timelines from /messages or ask for the PDU version of the events that include the prev_events in order to construct the DAG on the client and mix in messages appropriately. Maybe this is just a behind-the-scenes representation on the client we keep track of. Or some new endpoints which handle the magic for us, etc (many options).

AFAICT, this MSC improves read receipts on threads because it makes it easier to get the list of events for a given thread in topological_order. (although I'm not totally clear on everything here and asked for better clarification -> #3981 (comment))

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a ton Eric! 🙏

There's another recent proposal, MSC4033, that attempts to solve the ordering problem more holistically which might partially obsolete this one (as far as the ordering problematic is concerned).

Copy link
Contributor

@MadLittleMods MadLittleMods Jul 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Thanks for cross-linking! MSC4033 is definitely tackling the right problem and exposes the same confusion I had while trying to draft my last reply. The spec is not clear what order read receipts use, topological_ordering vs stream_ordering.

I assumed it was topological_ordering because you need to specify event ID's in some places and the threaded receipts example shows a DAG but then also the receipts seem to also use timestamps (it's all very ambiguous).

In any case, the problem of /messages vs /sync using different orderings is the main thing to fight against. The good thing about that proposal is you could just take the highest order that you see in a given timeline for the read receipt (I think) (noted this in a comment).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Likewise I've removed order-related stuff from the equation here, as I think it was a distraction, and in any case this flag should only change whether this API includes related events or not. It shouldn't affect the order.

topological ordering.

Filters specified via `event_type` or `rel_type` will be applied to leaves
and intermediate events on the graph formed by the related events.
Events that would match the filter, but whose only relation to the original
given event is through a non-matching intermediate event, will not be included.

## Potential issues

Naive implementations might be tempted to provide support for this parameter
through a thin shim which is functionally identical to the client doing
separate recursive `/relations` requests itself. This is ill-advised.

Such an implementation would, given a specifically crafted set of events,
allow a client to cause unreasonable load.

## Alternatives
justjanne marked this conversation as resolved.
Show resolved Hide resolved

1. Clients could fetch all relations recursively client-side, which would
increase network traffic and server load significantly.
2. A new, specialised endpoint could be created for threads, specifically
designed to present separate timelines that, in all other ways, would
behave identically to `/messages`.
3. Twitter-style threads (see [MSC2836]).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I proposed aggregations (in the mess that is MSC1849) the intention was basically to define a graph of relations linking events together:

If it helps, you can think of relations as a "subject verb object" triple, where the subject is the relation event itself; the verb is the rel_type field of the m.relationship and the object is the event_id field.

I assumed that when users wanted to do complicated graph queries (e.g. "give me all the events in this thread, including their aggregations") or "give me the siblings of this twitter thread") we'd end up with a generic API for navigating the relations graph - and that's what kegan & I came up with in [MSC2836].

Personally, my hunch is still that a single generic API for navigating the relation graph would have been a better solution than having lots of incremental MSCs which gradually nudge the existing APIs in the right direction. However, we've ended up down the incrementalist path anyway now, and this MSC seems a good enough way to solve the immediate problem of "give me all the events which relate to this thread".

However, I have a horrible feeling that we're promptly going to hit another limitation for another relations use case (e.g. "I want to see the last 20 beacon locations for the 5 most recent location-shares in this thread. Including if the location-share event subsequently got edited" - or a similar one for VoIP signalling using aggregations, or even MSC4016-style file transfer updates via aggregations)... and find ourselves wanting a more generic API, again.

I'm also dubious about "the server may just ignore your recursion request, impose an arbitrary depth, and the client will have to figure it out". I'd have thought it would be better for the client to state its requirements (i.e. "i need depth=3 for my use case") and for the server to fulfil those... but to cap resource usage by the server then batching up the responses in whatever size it's most comfortable with (or by capping the pagination requests that the client makes). So even if the client asks for some horribly inefficient query, the server can decline to calculate more than 10 results at a time or whatever. Otherwise it's passing a lot of complexity onto the client to work around the server's limitations, and the design of Matrix has always been for the server to do the heavy-lifting rather than the client, in order to make clients as easy to write as possible.

So, TL;DR: I don't love this, but it's an acceptable solution for where we're at right now and I'm not going to block it, lest perfect be the enemy of good. In future I really hope we converge on a robust generic way in a new MSC (or MSC2836 gets dusted off) in order to walk the relations DAG rather than sprouting incremental tweaks.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel fairly similarly about this really in that much more advanced uses of relations will likely require a more sophisticated API that can properly control recursion depth and what rel_types get recursively returned, etc. I also think the way we've chosen to implement features, ie. with threads as a live feature in Matrix now that, apart from not working very well itself is also causing other parts of the protocol to be flakey: we need to be fairly tactical here. Obviously we can introduce a more full-featured API separately but equally it's a shame that servers will have both APIs to maintain.

4. Alternatively a `depth` parameter could have been specified, as in [MSC2836].
We believe that a customizable depth would add unnecessary constraints to
server implementers, as different server implementations may have different
performance considerations and may choose different limits. Additionally,
the maximum currently achievable depth is still low enough to avoid this
becoming an issue.

## Security considerations

None.
dbkr marked this conversation as resolved.
Show resolved Hide resolved

## Examples

Given the following graph:

```mermaid
flowchart RL
subgraph Thread
G
E-->D
B-->A
end
B-.->|m.thread|A
G-.->|m.thread|A
E-.->|m.annotation|B
D-.->|m.edit|A
G-->F-->E
D-->C-->B
```

`/messages` with `dir=f` would
return `[A, B, C, D, E, F, G]`.

`/relations` on event `A` with `rel_type=m.thread` and `dir=f` would
return `[B, G]`.

`/relations` on event `A` with `recurse=true` and `dir=f` would
return `[B, D, E, G]`.

`/relations` on event `A` with `recurse=true`, `dir=b` and `limit=2` would
return `[G, E]`.

`/relations` on event `A` with `rel_type=m.annotation`,
`event_type=m.reaction` and `recurse=true` would return `[G, E]`.

## Unstable prefix

Before standardization, `org.matrix.msc3981.recurse` should be used in place
of `recurse`.

[MSC2836]: https://github.com/matrix-org/matrix-spec-proposals/pull/2836
[MSC3771]: https://github.com/matrix-org/matrix-spec-proposals/pull/3771
[`/relations`]: https://spec.matrix.org/v1.6/client-server-api/#get_matrixclientv1roomsroomidrelationseventid
[`m.reaction`]: https://github.com/matrix-org/matrix-spec-proposals/pull/2677
[`m.replace`]: https://spec.matrix.org/v1.6/client-server-api/#event-replacements