This small tool was created as part of an NGS data analysis homework assignment at the Bioinformatics Institute :)
BODBuilder provides a pipeline for constructing a De Bruijn graph according to its mathematical definition. It also uses heuristics to simplify the graph and prune presumably erroneous connections.
- Takes one or more files in any supported data format (no need to specify the format explicitly).
- Builds the primary directed multigraph G: creates connections in the graph using nodes of size `k` and primary edges of size `k+1` between them. Each k-mer is processed together with its reverse complement.
- Simplifies the graph until convergence: contracts edges and creates a minor graph G' of the graph G. Its defining property is that it has no passing vertices.
- Removes tips from the graph (low-coverage edges, judged against a user-specified threshold).
- Tries to remove bulges from the graph by removing every poorly covered edge.
- Stores the graph in `.dot` format.
- If the `--draw` option is provided, draws the graph in `.png` format.
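
The graph-building step above can be illustrated with a minimal sketch. This is not BODBuilder's actual code: the function names `reverse_complement` and `build_edges` are hypothetical, and the sketch assumes reads contain only `A`, `C`, `G` and `T`. Every (k+1)-mer of a read and of its reverse complement becomes an edge from its prefix k-mer to its suffix k-mer, and the counter value serves as a raw coverage estimate.

```python
from collections import Counter

# Translation table for complementing DNA bases (assumes only A, C, G, T).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def build_edges(reads, k):
    """Count (k+1)-mer edges between k-mer nodes.

    Both each read and its reverse complement are scanned, so every
    k-mer is handled together with its reverse-complement counterpart.
    Returns a Counter mapping (source k-mer, target k-mer) to the
    number of times the corresponding (k+1)-mer was seen.
    """
    edges = Counter()
    for read in reads:
        for strand in (read, reverse_complement(read)):
            for i in range(len(strand) - k):
                edge = strand[i:i + k + 1]          # one (k+1)-mer
                edges[(edge[:-1], edge[1:])] += 1   # prefix k-mer -> suffix k-mer
    return edges


edges = build_edges(["ACGTC"], k=3)
for (src, dst), count in sorted(edges.items()):
    print(src, "->", dst, "seen", count, "time(s)")
```

In this toy input the (k+1)-mer `ACGT` is its own reverse complement, so the edge `ACG -> CGT` is counted on both strands.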
The final graph is guaranteed to contain only edges whose coverage is high relative to the mean edge coverage, which is measured after condensing the graph and removing tips and low-coverage edges. Moreover, the final graph has no passing vertices.
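
To make the simplification idea concrete, here is a rough sketch of condensing and coverage filtering. It rests on several assumptions that the README does not confirm: a "passing vertex" is taken to be a vertex with exactly one incoming and one outgoing edge, the coverage of a merged edge is taken as the mean over the (k+1)-mers it contains, and edges at or above the threshold are kept. The names `Edge`, `condense` and `drop_low_coverage` are hypothetical and are not part of BODBuilder.

```python
from dataclasses import dataclass


@dataclass
class Edge:
    src: str         # k-mer at the start of the edge
    dst: str         # k-mer at the end of the edge
    seq: str         # full sequence spelled by the edge
    coverage: float  # mean coverage of the (k+1)-mers merged into this edge


def condense(edges, k):
    """Merge edge pairs through passing vertices until none remain."""
    edges = list(edges)
    changed = True
    while changed:
        changed = False
        incoming, outgoing = {}, {}
        for e in edges:
            outgoing.setdefault(e.src, []).append(e)
            incoming.setdefault(e.dst, []).append(e)
        for v in incoming:
            ins, outs = incoming[v], outgoing.get(v, [])
            # Assumed definition: one edge in, one edge out, not a self-loop.
            if len(ins) == 1 and len(outs) == 1 and ins[0] is not outs[0]:
                a, b = ins[0], outs[0]
                n_a, n_b = len(a.seq) - k, len(b.seq) - k  # (k+1)-mers per edge
                merged = Edge(a.src, b.dst, a.seq + b.seq[k:],
                              (a.coverage * n_a + b.coverage * n_b) / (n_a + n_b))
                edges = [e for e in edges if e is not a and e is not b] + [merged]
                changed = True
                break  # degree tables are stale now, so rebuild them
    return edges


def drop_low_coverage(edges, threshold):
    """Keep only edges whose coverage reaches the threshold."""
    return [e for e in edges if e.coverage >= threshold]


primary = [
    Edge("ACG", "CGT", "ACGT", 2),
    Edge("CGT", "GTC", "CGTC", 1),
    Edge("GAC", "ACG", "GACG", 1),
]
condensed = condense(primary, k=3)
print([(e.seq, round(e.coverage, 2)) for e in condensed])
```

After removing low-coverage edges, new passing vertices can appear, so a pipeline along these lines would call `condense` again on the filtered edge list.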
- Clone the repository:

      git clone https://github.com/Abusagit/BODBuilder.git

- Install the requirements (if needed):

      pip install numpy tqdm networkx

- Typical command. Keep `--draw` to obtain a picture in `.png` format (this assumes you have added the tool directory to PATH or provide an absolute path):

      py <path_to_repository>/build_graph.py -i <input_sequence_file> [<another_sequence_file>] \
          -o <output_directory> \
          -k <kmer size - odd> \
          -b <lower bound of coverage for edge removal> \
          --draw

Argument | Description |
---|---|
`-h, --help` | Show this help message and exit |
`-i, --input [fasta\|fastq\|fna\|fa]` | Path to your file(s) with data for the graph |
`-o, --outdir [./]` | Output directory |
`-k, --kmer-size` | Size of the k-mer used for graph building |
`-b, --bad_cov [100]` | Coverage threshold for clipping edges |
`--draw` | Pass this option to get a picture of the final graph |
`--force` | Overwrite the output directory if it isn't empty |
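
For readers who want to script around the tool, the option table can be mirrored with `argparse` roughly as follows. This is a sketch reconstructed from the table only, not the parser in `build_graph.py`: the argument types, which options are required, and reading the bracketed values `./` and `100` as defaults are all assumptions, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the documented options (assumed types/defaults)."""
    parser = argparse.ArgumentParser(prog="build_graph.py")
    parser.add_argument("-i", "--input", nargs="+", required=True,
                        help="Path to your file(s) with data for the graph")
    parser.add_argument("-o", "--outdir", default="./",
                        help="Output directory")
    parser.add_argument("-k", "--kmer-size", type=int, required=True,
                        help="Size of the k-mer used for graph building")
    parser.add_argument("-b", "--bad_cov", type=float, default=100,
                        help="Coverage threshold for clipping edges")
    parser.add_argument("--draw", action="store_true",
                        help="Draw a picture of the final graph")
    parser.add_argument("--force", action="store_true",
                        help="Overwrite the output directory if it isn't empty")
    return parser


# Example invocation with a hypothetical input file name.
args = build_parser().parse_args(["-i", "reads.fastq", "-k", "55", "--draw"])
print(args.input, args.outdir, args.kmer_size, args.bad_cov, args.draw, args.force)
```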