This repository contains the RDF/XML ontologies, scripts, and results for the study on ontology partitioning and SPARQL query optimization. The research focuses on reducing the execution time of complex SPARQL queries by splitting large RDF/XML ontologies and leveraging parallel query execution in the Apache Jena Semantic Web Framework (Fuseki).

knowledge-ukraine/OntoSplit

OntoSplit

This repository contains the dataset, experimental scripts, and results for the study on ontology partitioning and SPARQL query optimization. The research focuses on improving the execution time of complex SPARQL queries by splitting large RDF/XML ontologies and leveraging parallel query execution in Apache Jena Fuseki.
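
The parallel execution idea described above can be sketched as a fan-out of one query over several partition endpoints, followed by a merge of the returned rows. This is a minimal illustration, not the repository's benchmarking code: the function names `query_endpoint` and `query_partitions` are hypothetical, and the sketch assumes that each partition is served as its own dataset with a standard SPARQL 1.1 Protocol endpoint and that every solution of the query can be answered inside a single partition.

```python
import json
import urllib.parse
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def query_endpoint(endpoint_url, sparql, timeout=60):
    """POST a SELECT query to one SPARQL endpoint and return its result bindings."""
    body = urllib.parse.urlencode({"query": sparql}).encode("utf-8")
    request = urllib.request.Request(
        endpoint_url,
        data=body,
        headers={
            "Accept": "application/sparql-results+json",
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    return payload["results"]["bindings"]


def query_partitions(endpoint_urls, sparql, fetch=query_endpoint):
    """Run the same query on every partition concurrently and merge the rows.

    Rows returned by more than one partition are kept only once.
    `fetch` is injectable so the merge logic can be exercised without a server.
    """
    if not endpoint_urls:
        return []
    # One worker thread per partition; pool.map keeps the endpoint order.
    with ThreadPoolExecutor(max_workers=len(endpoint_urls)) as pool:
        per_partition = list(pool.map(lambda url: fetch(url, sparql), endpoint_urls))

    merged, seen = [], set()
    for rows in per_partition:
        for row in rows:
            key = json.dumps(row, sort_keys=True)  # canonical form of one row
            if key not in seen:
                seen.add(key)
                merged.append(row)
    return merged
```

A call might look like `query_partitions(["http://localhost:3030/part1/sparql", "http://localhost:3030/part2/sparql"], "SELECT ?s WHERE { ?s ?p ?o } LIMIT 10")`, where the dataset names `part1` and `part2` are placeholders. Queries that join triples stored in different partitions, or that use aggregates, `ORDER BY`, or `LIMIT`, would need extra handling on the merge side.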

📂 Contents:

  • 📂 ./data - all ontology files (RDF/XML structures), both original and partitioned, plus any sample datasets or additional resources needed for experiments and demonstrations
  • 📊 ./benchmarking-data - experimental data
  • 📊 ./benchmarking-data/benchmark.xlsx - final test results: tables and charts
  • 📜 ./benchmarking-data/sparql-queries - test SPARQL queries categorized by execution time (1 - fast, 2 - medium, 3 - slow)
  • 📜 ./benchmarking-data/results-time - JSON files recording the execution time of SPARQL queries in each category (1 - fast, 2 - medium, 3 - slow) across ontology partition configurations (1–15 parts)
  • 🔧 ./benchmarking-data/scripts - Python scripts for running the benchmarks and calculating the results
  • 🔧 ./scripts/ontology-creation - Python scripts for ontology creation (PDF to JSON; JSON to RDF/XML ontology with different splitting options)
  • 📕 ./parsed-pdfs-json - files related to the PDFs from the dataset, including the original PDFs (optional) and the JSON output of the parsing scripts
  • 📖 ./docs/ - methodology, findings, and implementation details (TODO)
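
To make the idea of a partitioned ontology concrete, here is a minimal sketch that distributes the top-level node elements of an RDF/XML document over a fixed number of smaller documents. It is an assumption-laden illustration, not the splitting logic of the scripts in ./scripts/ontology-creation: the function name `split_rdf_xml` and the round-robin strategy are hypothetical, and blank nodes shared between top-level elements through `rdf:nodeID` would lose their identity across parts.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def split_rdf_xml(xml_text, parts):
    """Distribute top-level node elements of an RDF/XML document over `parts` documents."""
    if parts < 1:
        raise ValueError("parts must be >= 1")
    # Keep the conventional "rdf" prefix when the parts are serialized again.
    ET.register_namespace("rdf", RDF_NS)
    root = ET.fromstring(xml_text)
    if root.tag != f"{{{RDF_NS}}}RDF":
        raise ValueError("expected an rdf:RDF root element")

    # Each part gets its own rdf:RDF root with the original root attributes
    # (for example xml:base), then node elements are assigned round-robin.
    outputs = [ET.Element(root.tag, dict(root.attrib)) for _ in range(parts)]
    for index, node in enumerate(root):
        outputs[index % parts].append(node)
    return [ET.tostring(out, encoding="unicode") for out in outputs]
```

Each returned string can be written to its own file and loaded as a separate dataset. Round-robin assignment balances the number of top-level elements per part but ignores how resources reference each other, so a real partitioning scheme would likely group related resources together.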

🚀 Sponsor this project

Please support @malakhovks. Despite the war in Ukraine, R&D in the fields of Digital Health and Ontology Engineering is being resumed:

Via credit card: https://send.monobank.ua/jar/5ad56oNAcD

Public Address to Receive USDT (BEP20): 0x1128A7b84728123dd4F55176c378754Dd396A674

Pay me via Trust Wallet: https://link.trustwallet.com/send?asset=c20000714_t0x55d398326f99059fF775485246999027B3197955&address=0x1128A7b84728123dd4F55176c378754Dd396A674

🔍 Key Topics:

  • SPARQL query optimization
  • Ontology partitioning (sharding)
  • Parallel query execution
  • Apache Jena Fuseki performance benchmarking
  • Semantic Web & RDF processing

🚀 Future Work:

The repository will be updated with further optimizations, including machine learning-based query performance prediction and dynamic ontology partitioning.

Contributions and discussions are welcome!

📖 How to Cite

If you use this repository in your research, please cite it as follows:

🔹 APA citation format for articles:

  • Palagin, O.V., Petrenko, M.G., Kaverinskiy, V.V., & Malakhov, K.S. (2025). Method for Increasing the Efficiency of OWL/RDF-Structures Processing in Apache Jena Semantic Web Framework Environment. Cybernetics and Systems Analysis, __(_), __ - __. https://doi.org/
  • Kaverinskiy, V.V., Petrenko, M.G., & Malakhov, K.S. (2025).

🔹 BibTeX citation format for repository:

@misc{OntoSplit,
  author = {Kyrylo Malakhov and Vladislav Kaverinskiy},
  title = {OntoSplit: Ontology Partitioning and SPARQL Query Optimization},
  year = {2024},
  howpublished = {GitHub Repository},
  url = {https://github.com/knowledge-ukraine/OntoSplit}
}

📕 Dataset

EBSCO articles dataset (domain: rehabilitation medicine), with a JSON file for every article

wget -O ./ebsco-rehabilitation-dataset.zip https://cdn.e-rehab.pp.ua/u/ebsco-rehabilitation-dataset.zip

💳 Funding

This study would not have been possible without the financial support of the National Research Foundation of Ukraine (Open Funder Registry: 10.13039/100018227). Our work was funded by Grant contract:
