Skip to content
Tim L edited this page Jun 12, 2013 · 59 revisions

What is first

  • Centrifuge is implemented using Tinkerpop's Blueprints Graph API.
  • Centrifuge was conceived to provide a more informative layout for the LOD Cloud Diagram, but applies to any graph structure.

What we will cover

The page describes a graph clustering and layout algorithm.

Let's get to it

Centrifuge concept

The Centrifuge algorithm applies to arbitrary graph structures and its output reflects the connectivity of the nodes within the graph. The algorithm determines a set of spanning trees rooted at a delegate for each connected component.

G a graph
define centrifuge(G):
   for each connected component C of G:
      N  := delegate(C) # determine C's delegate node N
      C' := reduce(C,N) # reduce C by removing N and its neighborhood
      centrifuge(C')
  • Choosing a delegate can be a user defined function. The naive default is to select the most connected node.
  • Removing N's neighborhood can be a user-defined function. The naive default is to remove all of N's edges.

The name of this algorithm is inspired by the physical properties of the contents of a physical centrifuge, where the larger or smaller constituents of the subject material are dispelled incrementally as the centrifuge increases angular velocity.

Centrifuge implementation

Centrifuge is implemented in the Java package edu.rpi.tw.visualization.graph.layout.centrifuge (in src/java and vsr.jar)

  • edu.rpi.tw.visualization.graph.layout.centrifuge.Centrifuge contains the main method which takes as a single argument the file path to the RDF file containing the void:Linkset assertions.
Applied to void:Linksets

The following SPARQL query is used to create the in-memory Blueprints graph, which is illustrated below in pink.

select distinct ?linkset ?dataset ?triples ?overlap ?target
where {
    ?dataset
       a datafaqs:CKANDataset;
       void:subset ?linkset;
    .
    optional{ ?dataset tag:taggedWithTag ?tag     }
    optional{ ?dataset void:triples      ?triples }
    
    ?linkset 
       void:target  ?target;
       void:triples ?overlap;
    .
    ?target a datafaqs:CKANDataset .
    filter(?dataset != ?target)
}
Implementation call stack

Modeling the Clusters

How to model the output of the centrifuge clustering algorithm:

@base <http://datafaqs.tw.rpi.edu/source/datafaqs/dataset/how-o-is-lod/version/2013-06-04/>

<component/1>
   a vsr:Root, prov:Collection;
   centrifuge:depth 0;
   dcterms:hasPart <http://datahub.io/dataset/dbpedia>,
                   <component/1/1>,
                   <component/1/2>,
                   <component/1/3>,
                   <component/1/4>,
                   <component/1/5>,
                   <http://thedatahub.org/dataset/fao-geopolitical-ontology>,
                   <http://thedatahub.org/dataset/sears>;
.
<component/1/1>
   a centrifuge:Primary, prov:Collection;
   centrifuge:depth 1;
   dcterms:hasPart <http://thedatahub.org/dataset/rkb-explorer-dblp>;
.
<component/1/2>
   a centrifuge:Secondary, prov:Collection;
   centrifuge:depth 1;
   dcterms:hasPart <http://thedatahub.org/dataset/eu-agencies-bodies>
.
<component/1/3>
   a centrifuge:Secondary, prov:Collection;
   centrifuge:depth 1;
   dcterms:hasPart <http://thedatahub.org/dataset/normesh>
.
<component/1/4>
   a centrifuge:Secondary, prov:Collection;
   centrifuge:depth 1;
   dcterms:hasPart <http://thedatahub.org/dataset/productontology>
.
<component/1/5>
   a centrifuge:Secondary, prov:Collection;
   centrifuge:depth 1;
   dcterms:hasPart <http://thedatahub.org/dataset/doapspace>
.
<http://datahub.io/dataset/dbpedia>
   a centrifuge:Primary;
.
<http://thedatahub.org/dataset/fao-geopolitical-ontology>
   a centrifuge:Secondary;
.
<http://thedatahub.org/dataset/sears>;
   a centrifuge:Secondary;
.




<http://datahub.io/dataset/linked-crunchbase>
   dcterms:isPartOf <component/1/global-leafs>;
   # The removal of component 1's delegate would always isolate this dataset.
.
<http://datahub.io/dataset/ecco-tcp-linked-data>
   dcterms:isPartOf <component/1/global-leafs>;
.
<http://datahub.io/dataset/normesh>
   dcterms:isPartOf <component/1/6>;
   # A component created by the removal of <component/1>'s delegate.
   # Will connect to two other datasets.
.

What is next

  • Apply to Reactome; it's part of bio2rdf