This repository contains the dataset, experimental scripts, and results for a study of ontology partitioning and SPARQL query optimization. The research focuses on reducing the execution time of complex SPARQL queries by splitting large RDF/XML ontologies into parts and executing queries against those parts in parallel in Apache Jena Fuseki.
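As a rough illustration of the parallel execution idea described above, the following minimal sketch sends one query to several SPARQL endpoints concurrently and concatenates the result rows. It is not the repository's benchmarking code. The function names `query_endpoint` and `query_partitions` are hypothetical, and the sketch assumes that each ontology part is served as its own Fuseki dataset and that every answer can be computed inside a single part, so that simple concatenation of the bindings is valid.

```python
import json
import urllib.parse
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def query_endpoint(endpoint_url, sparql, timeout=60):
    """POST a SPARQL SELECT query to one endpoint and return its result bindings."""
    # SPARQL 1.1 Protocol: query sent as a URL-encoded form parameter.
    data = urllib.parse.urlencode({"query": sparql}).encode("utf-8")
    request = urllib.request.Request(
        endpoint_url,
        data=data,
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["results"]["bindings"]


def query_partitions(endpoint_urls, sparql, run_query=query_endpoint):
    """Run the same query against every partition concurrently and merge the rows.

    Assumption: each answer lies entirely within one partition, so the merged
    result is the concatenation of the per-partition bindings.
    """
    if not endpoint_urls:
        return []
    with ThreadPoolExecutor(max_workers=len(endpoint_urls)) as pool:
        per_partition = pool.map(lambda url: run_query(url, sparql), endpoint_urls)
        return [row for rows in per_partition for row in rows]


# Example call (hypothetical dataset names on a local Fuseki server):
# rows = query_partitions(
#     ["http://localhost:3030/part1/sparql", "http://localhost:3030/part2/sparql"],
#     "SELECT ?s WHERE { ?s ?p ?o } LIMIT 10",
# )
```

Threads are used here because the work is dominated by waiting on HTTP responses. The `run_query` parameter makes it possible to substitute another client or a stub for testing.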
- ./data - stores all ontology files (RDF/XML structures), both original and partitioned, as well as any sample datasets or additional resources needed for experiments and demonstrations
- ./benchmarking-data - experiment data
- ./benchmarking-data/benchmark.xlsx - final test results: tables and charts
- ./benchmarking-data/sparql-queries - test SPARQL queries categorized by execution time (1 = fast, 2 = medium, 3 = slow)
- ./benchmarking-data/results-time - JSON files capturing the execution time of SPARQL queries in each category (1 = fast, 2 = medium, 3 = slow) across ontology partition configurations (1-15 parts)
- ./benchmarking-data/scripts - Python scripts for running the benchmarks and calculating the results
- ./scripts/ontology-creation - Python scripts for ontology creation (PDF to JSON; JSON to RDF/XML ontology with different splitting options)
- ./parsed-pdfs-json - files related to the PDFs in the dataset, including the original PDFs (optional) and the JSON output of the parsing scripts
- ./docs/ - methodology, findings, and implementation details (TODO)
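To make the timing measurements listed above more concrete, here is a minimal sketch of how execution times could be collected and compared across partition counts. It is an assumption-laden illustration, not the scripts from `./benchmarking-data/scripts`. The helper names `time_query` and `summarize`, the in-memory layout (partition count mapped to a list of durations in seconds), and the choice of the median as the summary statistic are all hypothetical.

```python
import statistics
import time


def time_query(run_query, repeats=5):
    """Return the wall-clock durations (in seconds) of `repeats` calls to run_query()."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(timings_by_parts):
    """Map each partition count to its median time and its speedup over 1 part.

    `timings_by_parts` is a dict such as {1: [4.1, 4.0], 2: [2.2, 2.1]} and
    must contain an entry for 1 part, which serves as the baseline.
    """
    baseline = statistics.median(timings_by_parts[1])
    summary = {}
    for parts, durations in sorted(timings_by_parts.items()):
        median = statistics.median(durations)
        summary[parts] = {"median_s": median, "speedup": baseline / median}
    return summary
```

A speedup above 1 for a given partition count means the query ran faster than against the unpartitioned ontology. The median is used here only because it is less sensitive to occasional slow runs than the mean.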
Please support @malakhovks. Despite the war in Ukraine, R&D in the fields of digital health and ontology engineering is being resumed:
Via credit card: https://send.monobank.ua/jar/5ad56oNAcD
Public Address to Receive USDT (BEP20): 0x1128A7b84728123dd4F55176c378754Dd396A674
Pay me via Trust Wallet: https://link.trustwallet.com/send?asset=c20000714_t0x55d398326f99059fF775485246999027B3197955&address=0x1128A7b84728123dd4F55176c378754Dd396A674
- SPARQL query optimization
- Ontology partitioning (sharding)
- Parallel query execution
- Apache Jena Fuseki performance benchmarking
- Semantic Web & RDF processing
The repository will be updated with further optimizations, including machine learning-based query performance prediction and dynamic ontology partitioning.
Contributions and discussions are welcome!
If you use this repository in your research, please cite it as follows:
🔹 APA citation format for articles:
- Palagin, O.V., Petrenko, M.G., Kaverinskiy, V.V., & Malakhov, K.S. (2025). Method for Increasing the Efficiency of OWL/RDF-Structures Processing in Apache Jena Semantic Web Framework Environment. Cybernetics and Systems Analysis, __(_), __ - __. https://doi.org/
- Kaverinskiy, V.V., Petrenko, M.G., & Malakhov, K.S. (2025).
🔹 BibTeX citation format for the repository:
@misc{OntoSplit,
author = {Kyrylo Malakhov and Vladislav Kaverinskiy},
title = {OntoSplit: Ontology Partitioning and SPARQL Query Optimization},
year = {2024},
howpublished = {GitHub Repository},
url = {https://github.com/knowledge-ukraine/OntoSplit}
}
EBSCO article dataset (domain: rehabilitation medicine), with a JSON file for every article:
wget -O ./ebsco-rehabilitation-dataset.zip https://cdn.e-rehab.pp.ua/u/ebsco-rehabilitation-dataset.zip
This study would not have been possible without the financial support of the National Research Foundation of Ukraine (Open Funder Registry: 10.13039/100018227). Our work was funded by Grant contract: