Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature: Add support for relationship schema to be passed in at runtime #2

Open
nickchomey opened this issue Oct 2, 2024 · 2 comments

Comments

@nickchomey
Copy link
Owner

nickchomey commented Oct 2, 2024

Feature description

Right now this connector has no knowledge of relationships - it simply receives openCDC records and does CRUD on SurrealDB.

It would be good to be able to automatically

  1. Embed records from one table in another
  • e.g. if a record comes in from the post table and has a field post.user_id = 123, it should automatically change 123 to user:123 so that the user record gets embedded in the post record.
  • Likewise when the table and fields are created, if a relationship schema is available then it should define the TYPE for the field DEFINE FIELD user ON TABLE post TYPE record<user>;
  1. Create bidirectional relationships between records
  • Eg run RELATE user:123 -> wrote -> post:5678

Update from later

Should almost never use embedded records

  1. not bidirectional
  2. dont clean up after themselves - if you embed a record then delete it, the record you embedded it into still thinks its there. Relate table will remove the relation when one side is deleted.

I wrote more here: surrealdb/docs.surrealdb.com#920


To do this, we will need to be able to define and import relationship schemas at startup, and ideally be able to update them while the server is running.

It would surely be best to do this outside of the Conduit configuration files and/or HTTP API.

Ideas:

  • JSON or YAML file that define all relationships
  • NATS client that can do CRUD on the relationship schemas
  • The schema should also support defining RELATE metadata fields

Schema Settings:

  • which relationships/"foreign keys" shoudl be created
    • should they use RELATE or just embed a record from other table?
    • should we perhaps instead just use DEFINE EVENT for all relationships, such that surreal automatically handles their creation each time a record comes in?
  • DEFINE INDEX - there's various types - unique, composite, fulltext, vector etc...
@nickchomey
Copy link
Owner Author

nickchomey commented Oct 15, 2024

There's a variety of ways to create the relations upon snapshot

  • Pre-define events that will create relationships each time gets triggered by a record create/insert (doc here)
  • Bulk insert relations afterward
    • Either use the records that we already have and run INSERT RELATION like we did for the initial bulk insert (doc here), and this doc) - this cant work via RPC.
    • or do a select & relate like this - also doesn't work. It does like a cartesian product/matrix multiplication between the two arrays - connecting all of them to each other.
LET $person = (SELECT VALUE id FROM person);
LET $product = (SELECT VALUE id FROM product);

RELATE $person->order->$product TIMEOUT 3s;

Seems like the 2nd bulk insert option might work for the snapshot- especially since there's no RPC method for bulk insert_relation. Defining events before would surely cause a huge drag by creating relations one-by-one while bulk inserting. Doing the select then relate would only really work if was done at the very end of the snapshot, but then has a huge amount of rows to deal with.

But can then define the events after the initial bulk snapshot, to serve essentially as automatic Foreign Keys in an sql table during CDC. When doing realtime sync, relationships will get created automatically - its surely the only reasonable way to do it.

Yet, the question remains: how do you specify those relations?

The conduit folks suggested that the relationship schema be defined/embedded in each opencdc record by a pipeline processor, and this connector simply have a handler to insert relations if the record has particular relationship fields. (The Neo4j connector simply requires records to have some particular fields so that they can create relations)

I suppose it could work for the post-snapshot batch relation thing...

Or we could have a JSON/YAML file read at runtime that defines the relations. And would avoid the need to do anything with modifying records with a pipeline processor. But it wouldnt be useful if there's multiple destination connectors in the pipeline - better to have a single processor do it all?

Could define it all via an embedded NATS client, allowing updates etc. But that's not something that other users could take advantage of.

@nickchomey
Copy link
Owner Author

nickchomey commented Oct 29, 2024

In lieu of a batch mechanism in the RPC insert_relation command command, surely I can just unmarshal the batch relations struct/map into a string and submit it with the RPC query command and pass a query, vars pair that contains the actual SurrealQL INSERT RELATION INTO batch relation query.

Though, will need to be careful with Sql injection. The vars parameter is apparently sanitized automatically. But not sure how easy/possible it would be to use this with a large batch query.

https://surrealdb.com/docs/surrealdb/reference-guide/security-best-practices#query-safety

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant