
Analyze key metrics like number of words, readability, complexity, code duplication, … of any kind of text


ad-si/Textalyzer


Textalyzer

Analyze key metrics like number of words, readability, complexity, etc. of any kind of text.

Screenshots: CLI and Web versions

Usage

# Word frequency histogram
textalyzer histogram <filepath>

# Find duplicated code blocks (default: minimum 3 non-empty lines)
textalyzer duplication <path> [<additional paths...>]

# Find duplications with at least 5 non-empty lines
textalyzer duplication --min-lines=5 <path> [<additional paths...>]

# Include single-line duplications
textalyzer duplication --min-lines=1 <path> [<additional paths...>]
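To illustrate what the `histogram` subcommand computes, here is a minimal Rust sketch of a word-frequency count. It is not Textalyzer's actual implementation. It assumes that a "word" is a maximal run of alphanumeric characters compared case-insensitively, and the function name `word_histogram` is hypothetical.

```rust
use std::collections::HashMap;

/// Hypothetical helper (not Textalyzer's API): count how often each word occurs.
/// Assumption: words are maximal runs of alphanumeric characters, lowercased.
fn word_histogram(text: &str) -> Vec<(String, usize)> {
    let mut counts: HashMap<String, usize> = HashMap::new();
    for word in text
        .split(|c: char| !c.is_alphanumeric())
        .filter(|w| !w.is_empty())
    {
        *counts.entry(word.to_lowercase()).or_insert(0) += 1;
    }
    let mut histogram: Vec<(String, usize)> = counts.into_iter().collect();
    // Most frequent first; ties broken alphabetically for deterministic output.
    histogram.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    histogram
}

fn main() {
    let text = "The cat saw the dog. The dog ran.";
    for (word, count) in word_histogram(text) {
        // Right-align the count, similar to a simple text histogram.
        println!("{count:>4} {word}");
    }
}
```

Sorting ties alphabetically keeps the output stable across runs, because `HashMap` iteration order is unspecified.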

The duplication command analyzes files for duplicated text blocks. It can:

  • Analyze multiple files or recursively scan directories
  • Filter duplications by a minimum number of non-empty lines with --min-lines=N
  • Detect single-line duplications when using --min-lines=1
  • Rank duplications by number of consecutive lines
  • Show all occurrences with file and line references
  • Process files in parallel on all available CPU cores
  • Use memory mapping to process large files with minimal memory overhead
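The core idea behind `--min-lines` can be sketched as a sliding window over non-empty lines. The following Rust sketch is an illustration under simplifying assumptions, not Textalyzer's algorithm. It assumes that lines are compared after trimming whitespace, and it reports only fixed-size windows of `min_lines` lines without merging overlaps into longer blocks or ranking them by length. It also processes a single thread and reads from in-memory strings. The function `find_duplicates` is hypothetical.

```rust
use std::collections::HashMap;

/// Where a duplicated block starts: (file name, 1-based line number).
type Location = (String, usize);

/// Hypothetical helper (not Textalyzer's API): find every block of `min_lines`
/// consecutive non-empty lines that occurs more than once across `files`.
/// Simplifications: lines are trimmed before comparison, and overlapping
/// windows are not merged into longer maximal blocks.
fn find_duplicates(files: &[(&str, &str)], min_lines: usize) -> Vec<(String, Vec<Location>)> {
    assert!(min_lines >= 1, "min_lines must be at least 1");
    let mut seen: HashMap<String, Vec<Location>> = HashMap::new();

    for (name, contents) in files {
        // Keep only non-empty lines, remembering their original line numbers.
        let lines: Vec<(usize, &str)> = contents
            .lines()
            .enumerate()
            .map(|(i, l)| (i + 1, l.trim()))
            .filter(|(_, l)| !l.is_empty())
            .collect();

        // Every run of `min_lines` consecutive non-empty lines is a candidate block.
        for window in lines.windows(min_lines) {
            let key = window.iter().map(|(_, l)| *l).collect::<Vec<_>>().join("\n");
            seen.entry(key).or_default().push((name.to_string(), window[0].0));
        }
    }

    // A block is duplicated if it was seen at more than one location.
    let mut dups: Vec<(String, Vec<Location>)> =
        seen.into_iter().filter(|(_, locs)| locs.len() > 1).collect();
    // Deterministic order: by first occurrence.
    dups.sort_by(|a, b| a.1[0].cmp(&b.1[0]));
    dups
}

fn main() {
    let a = "fn a() {\n    let x = 1;\n    let y = 2;\n}\n";
    let b = "fn b() {\n\n    let x = 1;\n    let y = 2;\n}\n";
    for (block, locs) in find_duplicates(&[("a.rs", a), ("b.rs", b)], 3) {
        println!("{} occurrences of:\n{block}", locs.len());
        for (file, line) in locs {
            println!("  {file}:{line}");
        }
    }
}
```

In this example the blank line in `b.rs` is skipped, so the last three non-empty lines of both files still match. With `min_lines` set to 1, each repeated single line is reported on its own.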

Related

  • jscpd - Copy/paste detector for programming source code.
  • megalinter - Code quality and linter tool.
  • pmd - Source code analysis tool.
  • qlty - Code quality and security analysis tool.
  • superdiff - Find duplicate code blocks in files.
  • wf - Command-line utility for counting word frequency.

Rewrite in Rust

This CLI tool was originally written in JavaScript and was later rewritten in Rust to improve performance.

Before:

hyperfine --warmup 3 'time ./cli/index.js examples/1984.txt'
Benchmark #1: time ./cli/index.js examples/1984.txt
  Time (mean ± σ):     390.3 ms ±  15.6 ms    [User: 402.6 ms, System: 63.5 ms]
  Range (min … max):   366.7 ms … 425.7 ms

After:

hyperfine --warmup 3 'textalyzer histogram examples/1984.txt'
Benchmark #1: textalyzer histogram examples/1984.txt
  Time (mean ± σ):      40.4 ms ±   2.5 ms    [User: 36.0 ms, System: 2.7 ms]
  Range (min … max):    36.9 ms …  48.7 ms

Pretty impressive: a roughly 10x performance improvement (mean 390.3 ms → 40.4 ms)! 😁
