This app converts various sources into dictionary format suitable for e-readers. Currently only kindle format is supported.
As an input you can specify tab delimited file, tab delimited pronunciation files and wiktionary data. All sources can be specified at the same time and the app will mix them in one dictionary.
The internal structure of the dictionary is "word class" -> "meaning" -> "translation". However, kindle presentation is for clarity and more concise display organised differently.
-
Wiktionary
Download latest English Wiktionary backup file from: Wikimedia Downloads search for
enwiktionary
, click on it and on the page findpages-articles.xml.bz2
file. -
Pronunciation files
There is a project on github.com collecting pronunciation data. Go to ipa-dict
For this you need to have Kindle Previewer installed.
Download Windows version and install it with wine
.
Then you can take an inspiration in [convert-en-cs.sh] file.
Download Mac version and run:
/Applications/Kindle\ Previewer\ 3.app/Contents/lib/fc/bin/kindlegen -c1 -gen_ff_mobi7 -dont_append_source data/kindle-en-cs/content.opf
Download Windows version and run:
"C:\users\pejuko\Local Settings\Application Data\Amazon\Kindle Previewer 3\lib\fc\bin\kindlegen.exe" -c1 -gen_ff_mobi7 -dont_append_source data/kindle-en-cs/content.opf
It is recommended to build the app in release mode. Processing wiktionary data may be very slow. On modern computers it takes around 3 minutes and another 2 minutes for kindlegen.
To generate English-Czech dictionary run:
cargo run --release -- -w data/enwiktionary.xml.bz2 -wp Czech -o data/kindle-en-cs -t "English-Czech dictionary" -a pejuko
Run:
cargo run -- -h
Or try to find inspiration in convert-en-cs.sh file.
Code and documentation: MIT
Dictionaries published here are available under the Creative Commons Attribution-ShareAlike License and are generated solely from wiktionary data.