KGCreator Interface #7
Possible implementation:

```python
import urllib.parse

import rdflib
from sklearn.base import BaseEstimator, TransformerMixin


class DataFrame2Graph(BaseEstimator, TransformerMixin):
    """Turn every DataFrame cell into a (row index, column, value) triple."""

    def __init__(self, base_url='http://example.com/'):
        self.base_url = base_url

    def fit(self, x, y=None):
        # Stateless transformer: nothing to learn.
        return self

    def _uri(self, value):
        # str() is needed because index labels and column names may be non-strings;
        # quote() percent-encodes the value so it forms a valid IRI under base_url.
        return rdflib.URIRef(self.base_url + urllib.parse.quote(str(value)))

    def transform(self, df):
        df = df.astype(str)
        graph = rdflib.Graph()
        for subject, row in df.iterrows():
            subject = self._uri(subject)
            # Series.items() replaces iteritems(), which was removed in pandas 2.0.
            for predicate, obj in row.items():
                graph.add((subject, self._uri(predicate), self._uri(obj)))
        return graph
```
We omit rdflib for scalability reasons. Please try your suggested implementation on the provided datasets; you will see that it takes ages :)
A class similar to the one suggested above is implemented here, although creating a KG with rdflib appears to take more time than creating RDF N-Triples from scratch (i.e., avoiding rdflib). FYI:
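For comparison, writing N-Triples "from scratch" could look roughly like the minimal sketch below. It reuses the `base_url` plus percent-encoding IRI scheme of `DataFrame2Graph`, but emits text lines instead of building an in-memory rdflib `Graph`. The function name `to_ntriples` and the dict-of-dicts input (for instance what pandas `DataFrame.to_dict(orient="index")` returns) are assumptions for illustration, not the project's actual code.

```python
import urllib.parse
from typing import Iterator, Mapping


def to_ntriples(rows: Mapping[object, Mapping[object, object]],
                base_url: str = "http://example.com/") -> Iterator[str]:
    """Yield one N-Triples line per (subject, predicate, object) cell.

    Hypothetical helper: `rows` maps a subject to a {predicate: object} dict.
    """
    def iri(value: object) -> str:
        # quote() percent-encodes spaces, '<', '>', '"' and non-ASCII characters,
        # so the result is safe inside an N-Triples <...> IRI.
        return "<" + base_url + urllib.parse.quote(str(value)) + ">"

    for subject, row in rows.items():
        s = iri(subject)
        for predicate, obj in row.items():
            yield f"{s} {iri(predicate)} {iri(obj)} .\n"
```

Because it is a generator, the lines can be streamed straight to disk, e.g. `f.writelines(to_ntriples(rows))`, without holding the whole graph in memory.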
@heindorf, please close this issue if the answers are satisfactory.
Currently, `KGCreator` is initialized with `__init__(self, path, logger=None)`, and the `transform` method returns a file path. I would suggest omitting both parameters:

- `path`: The `transform` method should not return a path but the actual data, to make the class `KGCreator` more independent of the file system and because sklearn's `transform` method returns the actual data instead of a file path. I would suggest that the `transform` method return an rdflib `Graph`.
- `logger`: Logging could instead be done with Python's standard `logging` module, e.g., via `logging.getLogger(...)`.
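A minimal sketch of the proposed interface, under stated assumptions: the constructor takes no `path` and no `logger`, logging goes through a module-level `logging.getLogger(__name__)`, and `transform` returns the data itself. Two deliberate deviations from the proposal: it returns a list of (subject, predicate, object) IRI strings rather than an rdflib `Graph`, to stay dependency-free in light of the scalability concern raised above, and it accepts a dict-of-dicts instead of a DataFrame. The `base_url` parameter is borrowed from `DataFrame2Graph`; everything else here is hypothetical, not the project's API.

```python
import logging
import urllib.parse
from typing import List, Mapping, Tuple

# Module-level logger replaces the `logger` constructor argument.
logger = logging.getLogger(__name__)

Triple = Tuple[str, str, str]


class KGCreator:
    """Hypothetical sketch of the proposed interface (no `path`, no `logger`)."""

    def __init__(self, base_url: str = "http://example.com/"):
        self.base_url = base_url

    def fit(self, X, y=None):
        # Nothing to learn; returning self follows the sklearn convention.
        return self

    def _iri(self, value: object) -> str:
        return self.base_url + urllib.parse.quote(str(value))

    def transform(self, X: Mapping[object, Mapping[object, object]]) -> List[Triple]:
        """Return the triples themselves instead of a file path."""
        triples: List[Triple] = []
        for subject, row in X.items():
            for predicate, obj in row.items():
                triples.append((self._iri(subject), self._iri(predicate), self._iri(obj)))
        logger.info("Created %d triples", len(triples))
        return triples

    def fit_transform(self, X, y=None) -> List[Triple]:
        return self.fit(X, y).transform(X)
```

Inheriting from sklearn's `BaseEstimator` and `TransformerMixin`, as `DataFrame2Graph` does, would supply `get_params`/`set_params` and `fit_transform` automatically, so the hand-written `fit_transform` could then be dropped.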