Function to convert Program Graph to PyTorch Geometric Graph #174
Comments
Hi @zehanort, a PyTorch Geometric converter would be great! I would very happily review a patch for that, thanks a lot :)
Just thinking ahead - I'm a little wary of adding large dependencies like pytorch-geometric. Perhaps stick the converter in its own module like
CC'ing @Zacharias030 as I believe he has some experience working with ProGraML using PyTorch Geometric.
Cheers,
Hi @zehanort, I think this is a great pitch and I would welcome such an addition to the codebase very much! Note that in #107 we were also willing to introduce a dependency on PyTorch Geometric's Data and Batch classes.
Hi @ChrisCummins, @Zacharias030 and @zehanort,
For my Master's thesis I have been working with ProGraML and PyTorch Geometric, and I have an implementation of the My approach has been to use the The method I have been using takes as input a

    from typing import Dict, Optional

    import torch
    from programl.proto import ProgramGraph  # import path assumed
    from torch_geometric.data import HeteroData


    def to_pyg(graph: ProgramGraph, vocabulary: Optional[Dict[str, int]] = None) -> HeteroData:
        # 4 lists, one per edge type
        # (control, data, call and type edges)
        adjacencies = [[], [], [], []]
        edge_positions = [[], [], [], []]

        # Create the adjacency lists
        for edge in graph.edge:
            adjacencies[edge.flow].append([edge.source, edge.target])
            edge_positions[edge.flow].append(edge.position)

        node_text = [node.text for node in graph.node]

        vocab_ids = None
        if vocabulary is not None:
            # Tokens missing from the vocabulary map to index len(vocabulary)
            vocab_ids = [
                vocabulary.get(node.text, len(vocabulary))
                for node in graph.node
            ]

        # Pass from list to tensor (integer dtype, also for empty lists)
        adjacencies = [torch.tensor(adj, dtype=torch.long) for adj in adjacencies]
        edge_positions = [torch.tensor(pos, dtype=torch.long) for pos in edge_positions]
        if vocabulary is not None:
            vocab_ids = torch.tensor(vocab_ids)

        # Placeholder for edge types without edges: shape [2, 0], integer dtype
        empty = torch.empty((2, 0), dtype=torch.long)

        # Create the graph structure
        hetero_graph = HeteroData()

        # Text and vocabulary index of each node
        hetero_graph['nodes']['text'] = node_text
        hetero_graph['nodes'].x = vocab_ids

        # Add the adjacency lists
        hetero_graph['nodes', 'control', 'nodes'].edge_index = (
            adjacencies[0].t().contiguous() if adjacencies[0].nelement() > 0 else empty
        )
        hetero_graph['nodes', 'data', 'nodes'].edge_index = (
            adjacencies[1].t().contiguous() if adjacencies[1].nelement() > 0 else empty
        )
        hetero_graph['nodes', 'call', 'nodes'].edge_index = (
            adjacencies[2].t().contiguous() if adjacencies[2].nelement() > 0 else empty
        )
        hetero_graph['nodes', 'type', 'nodes'].edge_index = (
            adjacencies[3].t().contiguous() if adjacencies[3].nelement() > 0 else empty
        )

        # Add the edge positions
        hetero_graph['nodes', 'control', 'nodes'].edge_attr = edge_positions[0]
        hetero_graph['nodes', 'data', 'nodes'].edge_attr = edge_positions[1]
        hetero_graph['nodes', 'call', 'nodes'].edge_attr = edge_positions[2]
        hetero_graph['nodes', 'type', 'nodes'].edge_attr = edge_positions[3]

        return hetero_graph

It first gathers the adjacency lists of the graph, the position attribute of the edges and the text of the nodes. If the vocabulary is given, it converts the text tokens to their respective vocabulary indices. After that, the lists are transformed into tensors and stored in their respective attributes. As you can see, using the I will create a pull request in the following days so that you can do further testing.
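The first two steps described above (grouping edges by flow type and mapping node text to vocabulary indices) can be tried out without PyTorch or ProGraML installed. The following is only a minimal sketch: `Edge`, `group_edges_by_flow` and `encode_tokens` are hypothetical stand-ins invented for illustration, and the flow numbering (0 = control, 1 = data, 2 = call, 3 = type) is taken from the order of the lists in the snippet above.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


# Hypothetical stand-in for the protobuf edge message (assumption: only the
# four fields read by the converter are modelled).
@dataclass
class Edge:
    source: int
    target: int
    flow: int      # 0 = control, 1 = data, 2 = call, 3 = type
    position: int


NUM_FLOWS = 4


def group_edges_by_flow(
    edges: List[Edge],
) -> Tuple[List[List[List[int]]], List[List[int]]]:
    """Return one adjacency list and one position list per flow type."""
    adjacencies: List[List[List[int]]] = [[] for _ in range(NUM_FLOWS)]
    positions: List[List[int]] = [[] for _ in range(NUM_FLOWS)]
    for edge in edges:
        adjacencies[edge.flow].append([edge.source, edge.target])
        positions[edge.flow].append(edge.position)
    return adjacencies, positions


def encode_tokens(tokens: List[str], vocabulary: Dict[str, int]) -> List[int]:
    """Map tokens to vocabulary indices; unknown tokens get len(vocabulary)."""
    out_of_vocabulary = len(vocabulary)
    return [vocabulary.get(token, out_of_vocabulary) for token in tokens]


edges = [Edge(0, 1, 0, 0), Edge(1, 2, 1, 0), Edge(0, 2, 1, 1)]
adjacencies, positions = group_edges_by_flow(edges)
token_ids = encode_tokens(["add", "foo"], {"add": 0, "ret": 1})
```

Flow types without any edges end up as empty lists here, which is why the tensor conversion step has to handle the empty case separately.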
That's great, thank you @igabirondo16! Looking forward to your PR.
🚀 Feature
It would be nice to have a `programl.to_pyg` function to convert one or more Program Graphs to `torch_geometric.data.Data`, i.e. to PyTorch Geometric graphs.
Motivation
This would be extremely helpful for setting up ML/DL pipelines with custom GNNs using the PyTorch Geometric library, which offers a lot of utilities for machine/deep learning tasks on graphs and seems to have gained a lot of popularity lately, especially in research.
Pitch
My idea is a 1-1 map from the nodes, edges and node features of the Program Graph to those of the PyG Graph, as well as turning the edge type of the Program Graph (i.e., the CONTROL / DATA / CALL enum values) into a single edge feature of the PyG Graph. Unfortunately, PyTorch Geometric does not (yet) explicitly support graph-level features. They seem to support only node-level features, node-level targets and graph-level targets for the time being. Therefore, a reasonable thing to do is to extend the `torch_geometric.data.Data` object with an additional attribute, as proposed in the documentation. Extending the first introductory example from the docs:
I believe I am not forgetting anything (feel free to remind me if I do!).
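To make the pitch concrete, here is a rough sketch of the proposed mapping using plain Python lists. It is not the example from the original post and not PyTorch Geometric code. The function `to_flat_graph`, the key `graph_features` and the enum values 0/1/2 are assumptions made for illustration; the keys `x`, `edge_index` and `edge_attr` mirror the attribute names of `torch_geometric.data.Data`.

```python
from typing import Any, Dict, List, Tuple

# Assumed numeric values for the edge-type enum (illustrative only).
CONTROL, DATA, CALL = 0, 1, 2


def to_flat_graph(
    node_features: List[List[int]],
    edges: List[Tuple[int, int, int]],  # (source, target, edge type)
    graph_features: List[int],
) -> Dict[str, Any]:
    """1-1 map of a program graph onto Data-like fields.

    The edge type becomes a single per-edge feature, and graph-level
    features are carried as an extra attribute.
    """
    sources = [source for source, _, _ in edges]
    targets = [target for _, target, _ in edges]
    return {
        "x": node_features,                    # node-level features
        "edge_index": [sources, targets],      # shape [2, num_edges]
        "edge_attr": [[flow] for _, _, flow in edges],  # one feature per edge
        "graph_features": graph_features,      # hypothetical extra attribute
    }


flat = to_flat_graph(
    node_features=[[0], [1], [2]],
    edges=[(0, 1, CONTROL), (1, 2, DATA)],
    graph_features=[7],
)
```

In an actual converter the lists would become tensors and the dictionary a `Data` object, but the layout of the fields would stay the same.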
If you don't have something like that in the works and you are interested, I would love to work on it and send a PR eventually. I intend to write such a tool anyway (i.e. Program Graph -> PyG Graph), so I would love to contribute it to the project as well.