A new version of phraug, which is a set of simple Python scripts for pre-processing large files

zygmuntz/phraug2

phraug2

A new version of phraug (pronounced "frog") with improved command-line argument parsing, thanks to jofusa.

This is a set of simple Python scripts for pre-processing large files: tasks like splitting and format conversion. The name phraug comes from a great book, Made to Stick, by Chip and Dan Heath.

See http://fastml.com/processing-large-files-line-by-line/ for the basic idea.

There's always at least one input file and usually one or more output files. An input file always stays unchanged.
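The line-by-line idea can be sketched roughly as follows. This is a minimal illustration, not code taken from the phraug2 scripts; the function name `process_lines` and its `transform` parameter are hypothetical. The input is opened read-only and streamed, so it stays unchanged and memory use does not grow with file size.

```python
def process_lines(input_path, output_path, transform):
    """Stream input_path to output_path, applying transform to each line.

    Hypothetical helper for illustration only; not part of phraug2.
    """
    # The input is opened read-only, so it is never modified.
    with open(input_path) as src, open(output_path, "w") as dst:
        for line in src:  # a file object yields one line at a time
            dst.write(transform(line))
```

For example, `process_lines("in.csv", "out.csv", str.upper)` would write an upper-cased copy of a (hypothetical) `in.csv` without loading the whole file into memory.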

For documentation, run a script with the -h flag.

Example:

>python split.py
usage: split.py [-h] [-p PROBABILITY] [-r RANDOM_SEED] [-s] [-c]
                input_file output_file1 output_file2
split.py: error: too few arguments

>python split.py -h
usage: split.py [-h] [-p PROBABILITY] [-r RANDOM_SEED] [-s] [-c]
                input_file output_file1 output_file2

split a file into two randomly, line by line.

positional arguments:
  input_file            path to an input file
  output_file1          path to the first output file
  output_file2          path to the second output file

optional arguments:
  -h, --help            show this help message and exit
  -p PROBABILITY, --probability PROBABILITY
                        probability of writing to the first file (default 0.9)
  -r RANDOM_SEED, --random_seed RANDOM_SEED
                        random seed
  -s, --skip_headers    skip the header line
  -c, --copy_headers    copy the header line to both output files
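
The behaviour described by this help text could be sketched along the following lines. This is an illustrative reconstruction based only on the options listed above, not the actual source of split.py. The function name `split_file` is hypothetical, and it assumes that "skip the header line" means the header is dropped from both outputs.

```python
import random


def split_file(input_path, output_path1, output_path2,
               probability=0.9, random_seed=None,
               skip_headers=False, copy_headers=False):
    """Randomly split a file into two, line by line (illustrative sketch)."""
    # A dedicated generator makes the split reproducible for a given seed.
    rng = random.Random(random_seed)

    with open(input_path) as src, \
            open(output_path1, "w") as out1, \
            open(output_path2, "w") as out2:
        if skip_headers or copy_headers:
            header = src.readline()  # consume the first line
            if copy_headers:
                out1.write(header)
                out2.write(header)
            # Assumption: with skip_headers alone the header is discarded.

        for line in src:
            # Each line goes to the first file with the given probability,
            # otherwise to the second file.
            if rng.random() < probability:
                out1.write(line)
            else:
                out2.write(line)
```

Because each line is assigned independently, the first output holds about 90% of the lines with the default probability, not exactly 90%.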
