Skip to content

julius383/kamusi

Repository files navigation

Swahili-English Dictionary Data and Interface

This repository contains code that extracts data from a Swahili-English dictionary in PDF form into a JSON dataset. Additionally it contains a simple CLI for searching through the data as well as a PWA serving the same purpose.

An article explaining the code in this repo is can be read online here or from docs/kamusi.qmd

Usage

The finished results are already part of the repo and can be used in two ways. The CLI interface requires the availability of duckdb and a Python environment with rich installed. First load the database into DuckDB and create a FTS index with:

duckdb data/kamusi.db < create_kamusi.sql

Then query for a term by passing it as an argument to query_kamusi.py

Example Usage of CLI

Using the PWA can be done by visiting the online version here or alternatively starting a server in the website directory of this repo:

cd website && python -m http.server

Alternatively if you want to run the extraction code then install the project's Python requirements:

pip install --requirement requirements.txt

Then run python kamusi.py

  • TODO: add link to GitHub repo to PWA