Skip to content

mit-nlp/Text.jl

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

23 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

TEXT: Numerous tools for text processing

Build Status

This package is a julia implementation of:

  1. Text classification based on BoW models (e.g. topic/langauge id)
  2. Language ID (training and processing) based on word and character n-grams
  3. Lewis's SMART stop list for English
  4. tfidf/tfllr text feature normalization
  5. ngram feature extractors

Prerequistes

  • Stage - Needed for logging and memoization (Note: requires manual install)
  • Ollam - online learning modules (Note: requires manual install)
  • Devectorize - macro-based devectorization
  • DataStructures - for DefaultDict
  • Devectorize
  • GZip
  • Iterators - for iterator helper functions

Install

This is an experimental package which is not currently registered in the julia central repository. You can install via:

Pkg.clone("https://github.com/saltpork/Stage.jl")
Pkg.clone("https://github.com/mit-nlp/Ollam.jl")
Pkg.clone("https://github.com/mit-nlp/Text.jl")

Usage

See test/runtests.jl for detailed usage.

License

This package was created for the DARPA XDATA and Memex program under an Apache v2 License.

About

Numerous tools for text processing

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages