- author: tajpulo
- version: 1.0.0
We need a specification which assigns categories to tokens in a file.
- Consider a formal grammar (which involves strings). Given a document, you want to extract all occurring strings.
- Consider a formal grammar. Given a document, you want to syntax-highlight the occurring tokens.
- Consider a formal grammar (specifying some program module). Given a document, you want to extract the public API defined within.
In these cases, it makes sense to partition the document (which was written according to a formal grammar) into a sequence of tokens (identified by byte offsets) and assign categories to them. This process is very common in parsers (“tokenization”). syntok is a specification to serialize this data in a grammar-independent manner. That’s it (and nothing more).
The categorization (the choice of categories) is not part of the specification; it is up to the parser writer. Hierarchical structures are not represented in syntok files.
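To make the idea of "tokens identified by byte offsets with categories" concrete, here is a minimal sketch in Python. It is not part of syntok: the `Token` type, the toy grammar, and the category names (`string`, `number`, `identifier`, `whitespace`, `other`) are all assumptions chosen for illustration, and the sketch says nothing about how a syntok file serializes this data.

```python
import re
from typing import NamedTuple


class Token(NamedTuple):
    start: int     # byte offset of the first byte (inclusive)
    end: int       # byte offset one past the last byte (exclusive)
    category: str  # category name, chosen by the parser writer


# Hypothetical categories for a toy grammar; syntok does not prescribe them.
# The final alternative matches any single byte, so no byte is ever skipped.
TOKEN_PATTERN = re.compile(
    rb'(?P<string>"[^"\\]*(?:\\.[^"\\]*)*")'
    rb'|(?P<number>[0-9]+)'
    rb'|(?P<identifier>[A-Za-z_][A-Za-z0-9_]*)'
    rb'|(?P<whitespace>\s+)'
    rb'|(?P<other>.)',
    re.DOTALL,
)


def tokenize(document: bytes) -> list[Token]:
    """Partition the document into contiguous, non-overlapping tokens."""
    return [
        Token(m.start(), m.end(), m.lastgroup)
        for m in TOKEN_PATTERN.finditer(document)
    ]


def is_partition(tokens: list[Token], length: int) -> bool:
    """Check that the tokens cover bytes 0..length without gaps or overlaps."""
    position = 0
    for token in tokens:
        if token.start != position or token.end <= token.start:
            return False
        position = token.end
    return position == length


if __name__ == "__main__":
    source = 'name = "żółw" 42'.encode("utf-8")
    for token in tokenize(source):
        print(token, source[token.start:token.end])
```

Working on bytes rather than characters keeps the offsets unambiguous for non-ASCII input, which is presumably why the tokens are identified by byte offsets.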
We want interoperability between tools, and syntok is a universal serialization format for this job.
- Do you write a parser?
  Make sure to enable serialization of the parsed document into a syntok file.
- Do you want to process a source document?
  Let the grammar-specific tool generate syntok output and then process the syntok file. Be aware that syntok does not make it easy to extract grammar-specific details such as the document hierarchy. It targets syntax highlighting and syntactically simple use cases.
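As an illustration of the consumer side, the following sketch assumes the token data has already been read from a syntok file into a list of `(start, end, category)` tuples. Reading the file is deliberately left out, and the function names, the category names, and the color palette are hypothetical. It covers two of the simple use cases mentioned above: extracting all tokens of one category and producing colorized terminal output.

```python
# Assumed in-memory form of the token data: (start, end, category) tuples
# with end-exclusive byte offsets. How they are read from a syntok file
# is outside the scope of this sketch.

RESET = "\x1b[0m"
# Hypothetical palette mapping category names to ANSI color codes.
PALETTE = {"string": "\x1b[32m", "number": "\x1b[36m"}


def extract(document: bytes, tokens, category: str) -> list[bytes]:
    """Return the byte content of every token with the given category."""
    return [document[start:end] for start, end, cat in tokens if cat == category]


def colorize(document: bytes, tokens, palette=PALETTE) -> str:
    """Wrap each token in the ANSI color of its category, if it has one."""
    parts = []
    for start, end, category in tokens:
        # Assumes token boundaries fall on UTF-8 character boundaries;
        # invalid sequences are replaced instead of raising an error.
        text = document[start:end].decode("utf-8", errors="replace")
        color = palette.get(category)
        parts.append(f"{color}{text}{RESET}" if color else text)
    return "".join(parts)


if __name__ == "__main__":
    document = b'x = "hi" 7'
    tokens = [
        (0, 1, "identifier"), (1, 2, "whitespace"), (2, 3, "other"),
        (3, 4, "whitespace"), (4, 8, "string"), (8, 9, "whitespace"),
        (9, 10, "number"),
    ]
    print(extract(document, tokens, "string"))
    print(colorize(document, tokens))
```

Neither function needs to know the grammar of the source document, which is the point of a grammar-independent token format.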
A syntok file is an XML file described in this document:
… with the following (relevant) tools:
- An XSD file to verify some properties of a syntok file
- A Python script to verify the remaining properties of a syntok file
- A Python script taking a syntok file to generate colorized CLI output (TODO)
- A JavaScript-powered webpage taking a syntok file to generate colorized output on an HTML page (TODO)
- A Python script to generate a syntok template based on Unicode categories
- A Python script taking a tree-sitter dump and the original file to generate the syntok file
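The idea of categorizing by Unicode categories can be sketched as follows. This is only a guess at the general approach, not the actual script: the function name `unicode_category_runs` is hypothetical, and the sketch stops at in-memory tuples instead of writing a syntok file. It groups consecutive characters that share a Unicode general category and reports each run with its UTF-8 byte offsets.

```python
import unicodedata
from itertools import groupby


def unicode_category_runs(text: str) -> list[tuple[int, int, str]]:
    """Split text into runs of equal Unicode general category.

    Returns (start, end, category) tuples, where start and end are
    end-exclusive byte offsets into the UTF-8 encoding of the text.
    """
    runs = []
    offset = 0
    for category, chars in groupby(text, key=unicodedata.category):
        size = len("".join(chars).encode("utf-8"))
        runs.append((offset, offset + size, category))
        offset += size
    return runs


if __name__ == "__main__":
    for run in unicode_category_runs("ab 12\u00e9"):
        print(run)
```

Such a grammar-free categorization could serve as a starting point that a parser writer then refines with grammar-specific categories.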
See the LICENSE file (Hint: MIT license).
Please report any issues on the GitHub issues page.