We performed extensive experiments to measure the query processing capabilities of well-known triple stores through their SPARQL endpoints. In particular, we stress these triple stores with multiple parallel requests from different querying agents. Our experiments reveal the maximum query processing capacity of each triple store, beyond which it becomes vulnerable to denial-of-service (DoS) attacks. We hope this analysis will help triple store developers design workload-aware RDF engines that improve the availability of their public SPARQL endpoints by avoiding such DoS attacks.
All of the data and results presented in our evaluation are available online from https://github.com/dice-group/RDF-Triplestores-Evaluation under the Apache License 2.0.
| Dataset | RDF Dump | Queries |
|---|---|---|
| DBpedia-3.5.1 | Download | Download queries generated by FEASIBLE |
| WatDiv-10M | Download | Download |
| WatDiv-100M | Download | Download |
| WatDiv-1Billion | Download | Download |
| Triplestore | Download | Related info |
|---|---|---|
| Virtuoso | here | Set the virtuoso.ini file accordingly. The file we used in our experiments is given here. |
| Fuseki-TDB | here | Download and unzip both apache-jena-fuseki-3.13.1 and apache-jena-3.13.1. Follow this tutorial for further guidance. |
| GraphDB | Docker Hub | `sudo docker run -p 127.0.0.1:7200:7200 -v /path/to/dataset/files:/path/to/dataset/files --name <container_name> -e "GDB_JAVA_OPTS= -Dgraphdb.workbench.importDirectory=/path/to/dataset/files" ontotext/graphdb:9.0.0-free` |
| Blazegraph | here | After unzipping, run the BlazegraphStah.sh script, as given here. |
| Parliament | here | After unzipping, run `./StartParliament.sh` (given here) to start the server; then, in a new terminal, run `java -cp "clientJars/*" com.bbn.parliament.jena.joseki.client.RemoteInserter <hostname> <port> <inputfile> [<graph-name>]` to upload the dataset. |
We used IGUANA, a benchmark execution framework, which can be downloaded from here. We set the `iguana.config` file (as given here) for each individual experiment according to the following guidelines:

- `connection1.service=http://localhost:8895/sparql` sets the SPARQL endpoint address.
- `connection1.update.service=http://localhost:8895/sparql` (optional) is used for update/write operations. Our experiments use only read operations (queries), so this line is disabled (commented out).
- `sparqlConfig1=<variable>, org.aksw.iguana.tp.tasks.impl.stresstest.worker.impl.SPARQLWorker, 600000, <path/to/queries.txt file>, 0, 0` — here `<variable>` is the number of workers (1, 2, 4, 8, 16, 32, 64, or 128), and 600000 milliseconds (= 10 minutes) is the query timeout.
- `stresstestArg.timeLimit=3600000` sets the time to complete one experiment (in milliseconds; 3600000 ms = 1 hour).
- All the experiments are read-based, so the update component of IGUANA is disabled (commented out).
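Putting the guidelines above together, a minimal `iguana.config` fragment might look like the following sketch. The endpoint port and query file path are placeholders from our setup, and the worker count of 16 is just one of the values we varied; only the four properties named above are used:

```properties
# SPARQL endpoint to benchmark (read queries only)
connection1.service=http://localhost:8895/sparql

# Update endpoint: not needed for our read-only runs, so it stays commented out
#connection1.update.service=http://localhost:8895/sparql

# <workers>, worker class, query timeout (ms), query file, ...
# 16 workers here is only an example; we ran 1, 2, 4, 8, 16, 32, 64, and 128
sparqlConfig1=16, org.aksw.iguana.tp.tasks.impl.stresstest.worker.impl.SPARQLWorker, 600000, /path/to/queries.txt, 0, 0

# One experiment runs for 1 hour (value in milliseconds)
stresstestArg.timeLimit=3600000
```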
Once the config file is ready, start the experiment with the following steps:

- Inside the parent IGUANA folder, start IGUANA with `./start-iguana.sh`. Once it has started successfully, open a new terminal and launch the experiment by sending the `iguana.config` file to IGUANA's core processor: `./send-config.sh iguana.config`.
- Expect the results in the same folder as a `results_*.nt` file. Extract the Queries-per-Second (QpS) values from it with `grep queriesPerSecond <result_file> > new_file`. The resulting `new_file` contains only QpS values in RDF format (the object of each triple is the QpS value).
- Upload this `new_file` to a triple store (Virtuoso in our case), run a SELECT query to retrieve all the QpS values, and save the result as a CSV file. Take the average of these values: this is the QpS value (throughput) of the triplestore for the given number of parallel users.
- Repeat the experiments for different numbers of users, and then for the other triplestores in the same manner.

NOTE: Ensure the availability of the endpoint before starting IGUANA. A bash script, `all_exp_script` (shown here), can be edited with little effort to perform all the experiments in one go.
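The post-processing step above (extracting QpS values from the `results_*.nt` file and averaging them) can also be done directly on the command line, without uploading to Virtuoso. A minimal sketch, assuming each QpS value appears as the quoted literal object of a `queriesPerSecond` triple (the resource names and values in the sample file are made up for illustration):

```shell
#!/usr/bin/env sh
# Create a tiny sample results file with the same triple shape
# (hypothetical IRIs; only the predicate/literal layout matters here).
cat > results_sample.nt <<'EOF'
<http://example.org/task1> <http://iguana-benchmark.eu/properties/queriesPerSecond> "12.5"^^<http://www.w3.org/2001/XMLSchema#double> .
<http://example.org/task2> <http://iguana-benchmark.eu/properties/queriesPerSecond> "7.5"^^<http://www.w3.org/2001/XMLSchema#double> .
<http://example.org/task1> <http://iguana-benchmark.eu/properties/noOfQueries> "100"^^<http://www.w3.org/2001/XMLSchema#long> .
EOF

# Keep only the queriesPerSecond triples, as in the steps above ...
grep queriesPerSecond results_sample.nt > qps_file

# ... then pull the quoted literal out of each triple and average the values.
awk -F'"' '{ sum += $2; n++ } END { printf "%.2f\n", sum / n }' qps_file
```

Running this prints the average QpS of the sample file, `10.00`; pointing the `grep` at a real `results_*.nt` file gives the per-run throughput the same way.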
Diversity scores across different SPARQL query features of the selected benchmarks, together with the percentages of SPARQL clauses and join vertex types of the queries we used, can be downloaded from here, as explained here.
All the results were obtained in `.nt` format, but we converted them to CSV, as discussed above. The overall results, along with plots, can be downloaded from here, while the individual benchmark results are available via the links below:
WatDiv-100-Million triples results
WatDiv-1-Billion triples results
WatDiv-10-Million triples results
Our findings and methodologies are detailed in the following publication:
- Khan, H., Ali, M., Ngonga Ngomo, A.-C., & Saleem, M. (2021). When is the Peak Performance Reached? An Analysis of RDF Triple Stores. SEMANTiCS 2021. Link to the paper
@inproceedings{hashim2021peak,
title={When is the Peak Performance Reached? An Analysis of RDF Triple Stores},
  author={Khan, Hashim and Ali, Manzoor and Ngonga Ngomo, Axel-Cyrille and Saleem, Muhammad},
booktitle={Further with Knowledge Graphs: Proceedings of the 17th International Conference on Semantic Systems, 6-9 September 2021, Amsterdam, The Netherlands},
volume={53},
pages={154},
year={2021},
organization={IOS Press}
}
- Hashim Khan (DICE, Paderborn University)
- Manzoor Ali (DICE, Paderborn University)
- Axel-Cyrille Ngonga Ngomo (DICE, Paderborn University)
- Muhammad Saleem (AKSW, University of Leipzig)