Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Redesign lifting maps #116

Draft
wants to merge 20 commits into
base: main
Choose a base branch
from

Conversation

luisfpereira
Copy link
Collaborator

@luisfpereira luisfpereira commented Nov 25, 2024

This PR attempts to propose a redesign for the lifting/feature lifting maps.

This suggestion comes from the realization that the hierarchical structure of the current implementation may be too deep (see diagram below; empty arrows represent inheritance; black diamonds represent composition; blue represents abstract).

liftings

Moreover, the implementation of AbstractLifting.forward seems to suggest it implements a pipeline of the form:

data2domain conversion -> lifting -> domain2domain conversion -> feature lifting -> domain2dict

More clarification on the above:

  • data2domain/domain2domain/domain2dict conversion: a conversion between data structures for the same domain

  • lifting: a domain gets transformed in another

  • feature lifting: the features of a domain are computed/updated

This redesign just makes this pipeline more explicit.

To achieve this goal, three abstract structures are added (see diagram below): Converter, LiftingMap, FeatureLiftingMap.
AbstractLifting is refactored into a instantiable class named LiftingTransform.

An additional "data structure" for a Complex is added: PlainComplex. It simply refactors a dict that is used in the library in something we can control in an easier manner, i.e. operations performed on dict are now encapsulated with PlainComplex.
I'm calling it PlainComplex (for the lack of a better name) because it does not contain all the information stored in e.g. toponex.SimplicialComplex. For example, features are stored as a list of tensors, which means we lose (explicit) track of the original simplex indices (something I guess can be recovered using the stored matrices).

liftings_new

Advantages of the redesign:

  • easier to test (see e.g. test file changed in the first commit of this PR: it will suffice to test the lifting map and not the full pipeline. for example, sign does not need to be tested for SimplicialCliqueLifting as it is not a property of the lifting)

  • lifting maps can be composed in a more natural way avoiding having to always convert between data and the data structure of interest for the computation (will e.g. simplify A Random Latent Clique Lifting from Graphs to Simplicial Complexes pyt-team/challenge-icml-2024#63)

  • steps of the pipeline are independent (e.g. GraphLifting.contains_edge_attr is currently used in Graph2SimplicialLifting._get_lifted_topology, making a supposedly independent pipeline dependent; this is very prone to bugs and may confuse people, see e.g. Neighbourhood Complex Lifting (Graph to Simplicial) pyt-team/challenge-icml-2024#41)

  • it will likely be easier to use liftings from other libraries

Disadvantages of the redesign (there's probably more!):

  • steps of the pipeline need to be compatible

For now, I've applied the changes to SimplicialCliqueLifting and minimally changed the tests to show the code works (some imports need to be commented out for this to work, but I've locally check it runs), so we can discuss the changes.
If they're approved, I'll extend this to the other liftings and start implementing the liftings from the challenge.

Notice that configuration and the discovery of liftings will also need to be greatly modified.

@luisfpereira luisfpereira marked this pull request as draft November 25, 2024 22:17
@luisfpereira
Copy link
Collaborator Author

For completion, let me add more examples of what I mean by composing objects (this applies for all the three main objects of the pipeline - Converter, LiftingMap, FeatureLiftingMap) and expand on the notion of Converter.

NB: by pipeline in this PR I mean the current implementation of AbstractLifting.forward (or the new LiftingTransform.forward).

For Converter, I've created more objects than the ones represented in the diagram (check implementation).
In particular, there's ConverterComposition. This object allows to implement e.g. Complex2Dict by composing Complex2PlainComplex and PlainComplex2Dict .
Notice that above all, this gives us flexibility: I could have implemented Complex2Dict directly, but this way we can reuse code without trading-off performance (the steps to implement directly Complex2Dict would involve all the steps the composition does anyway, so direct implementation can't be much faster).

Notice this kind of composition is only possible with the disentanglement of the "pipeline", i.e. by making the pipeline steps explicit and implemented by external objects. If we decide to have methods like _generate_graph_from_data or _get_lifted_topology, this kind of compositions can't be done without creating a big inheritance mess.

For new contributions, I expect that most contributors will not to have to implement any Converter (especially after we implement all the challenge PRs) and only need to care about LiftingMap.
This is because there's not that many data structures we want to use and with composition, there's no combinatorial explosion happening (e.g. Complex2Dict is just syntax sugar, i.e. I didn't really need to create that object).

Expanding on the composition of LiftingMap, I think the main gain is going from a pipeline like:

data -> domain -> lifting_1 -> domain -> data -> domain -> lifting_2 -> domain -> data

to something like:

data -> domain -> lifting_1 -> domain -> lifting_2 -> domain -> data

i.e. domain2data/data2domain conversions become optional, which will probably bring in some performance gains.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant