Hi guys,

Recently we've realised that the way we run the HINGE pipeline causes a process in the chain to get killed for large PacBio and ONT datasets. I think this is related to issue #130, where a large, single .las file is read into memory. The dataset and .las file properties are as follows:

PacBio dataset: 1,282,848 reads, 6.4 Gb yield
hinge.las file size: 138 GB

ONT dataset: 756,656 reads, 6.4 Gb yield
hinge.las (gzipped) file size: 104 GB

I'm pasting the log from our wrapper and the error for the PacBio dataset (the error for the ONT dataset was the same). What should I add to the pipeline to prevent this issue?
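(A generic way to confirm the kill came from the kernel's OOM killer rather than from HINGE itself; these are standard Linux checks, not HINGE-specific, and assume shell access to the compute node:)

    ls -lh hinge.las                                   # merged alignments, ~138 GB here
    free -h                                            # RAM actually available on the node
    dmesg | grep -iE 'killed process|out of memory'    # OOM-killer evidence in the kernel log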
Running HINGE for sample EQ0170-E01-c05-1
*********Executing the command**************
hinge correct-head EQ0170E01c051_27667_subreads.fasta EQ0170E01c051_27667_subreads_f.fasta fasta_map.txt
------------------------------------------------
*********Executing the command**************
fasta2DB hinge EQ0170E01c051_27667_subreads_f.fasta
------------------------------------------------
*********Executing the command**************
DBsplit -x500 -s100 hinge
------------------------------------------------
*********Executing the command**************
HPC.daligner -t5 -T32 hinge| csh -v > /dev/null 2>&1
------------------------------------------------
*********Executing the command**************
LAmerge hinge.las hinge*.las
------------------------------------------------
*********Executing the command**************
DASqv -c100 hinge hinge.las > /dev/null 2>&1
------------------------------------------------
*********Executing the command**************
hinge filter --db hinge --las hinge.las -x hinge --config /HINGE/utils/nominal.ini
------------------------------------------------
[2017-11-29 03:10:07.279] [log] [info] Reads filtering
[2017-11-29 03:10:07.279] [log] [info] name of db: hinge, name of .las file hinge.las
[2017-11-29 03:10:07.280] [log] [info] name of fasta: , name of .paf file
[2017-11-29 03:10:07.280] [log] [info] Parameters passed in
[filter]
length_threshold = 1000;
quality_threshold = 0.23;
n_iter = 3; // filter iteration
aln_threshold = 1000;
min_cov = 5;
cut_off = 300;
theta = 300;
use_qv = true;
[running]
n_proc = 12;
[draft]
min_cov = 10;
trim = 200;
edge_safe = 100;
tspace = 900;
step = 50;
[consensus]
min_length = 4000;
trim_end = 200;
best_n = 1;
quality_threshold = 0.23;
[layout]
hinge_slack = 1000
min_connected_component_size = 8
[2017-11-29 03:10:07.808] [log] [info] Las files: hinge.las
[2017-11-29 03:10:07.808] [log] [info] # Reads: 1175869
[2017-11-29 03:10:48.786] [log] [info] No debug restrictions.
[2017-11-29 03:10:49.845] [log] [info] use_qv_mask set to true
[2017-11-29 03:10:49.845] [log] [info] use_qv_mask set to true
[2017-11-29 03:10:49.845] [log] [info] number processes set to 12
[2017-11-29 03:10:49.845] [log] [info] LENGTH_THRESHOLD = 1000
[2017-11-29 03:10:49.845] [log] [info] QUALITY_THRESHOLD = 0.23
[2017-11-29 03:10:49.845] [log] [info] N_ITER = 3
[2017-11-29 03:10:49.845] [log] [info] ALN_THRESHOLD = 1000
[2017-11-29 03:10:49.845] [log] [info] MIN_COV = 5
[2017-11-29 03:10:49.845] [log] [info] CUT_OFF = 300
[2017-11-29 03:10:49.845] [log] [info] THETA = 300
[2017-11-29 03:10:49.845] [log] [info] EST_COV = 0
[2017-11-29 03:10:49.845] [log] [info] reso = 40
[2017-11-29 03:10:49.845] [log] [info] use_coverage_mask = true
[2017-11-29 03:10:49.845] [log] [info] COVERAGE_FRACTION = 3
[2017-11-29 03:10:49.845] [log] [info] MIN_REPEAT_ANNOTATION_THRESHOLD = 10
[2017-11-29 03:10:49.845] [log] [info] MAX_REPEAT_ANNOTATION_THRESHOLD = 20
[2017-11-29 03:10:49.845] [log] [info] REPEAT_ANNOTATION_GAP_THRESHOLD = 300
[2017-11-29 03:10:49.845] [log] [info] NO_HINGE_REGION = 500
[2017-11-29 03:10:49.845] [log] [info] HINGE_MIN_SUPPORT = 7
[2017-11-29 03:10:49.845] [log] [info] HINGE_BIN_PILEUP_THRESHOLD = 7
[2017-11-29 03:10:49.845] [log] [info] HINGE_READ_UNBRIDGED_THRESHOLD = 6
[2017-11-29 03:10:49.845] [log] [info] HINGE_BIN_LENGTH = 200
[2017-11-29 03:10:49.845] [log] [info] HINGE_TOLERANCE_LENGTH = 100
[2017-11-29 03:10:50.025] [log] [info] name of las: hinge.las
[2017-11-29 03:10:50.025] [log] [info] Load alignments from hinge.las
[2017-11-29 03:10:50.025] [log] [info] # Alignments: 1527933550
/HINGE/inst/bin/hinge: line 8: 12721 Killed Reads_filter "$@"
hinge filter --db hinge --las hinge.las -x hinge --config /HINGE/utils/nominal.ini did not produce a return code of 0, quiting!
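For scale: the log reports 1,527,933,550 alignments in the single 138 GB hinge.las, so a back-of-the-envelope division (plain shell arithmetic, nothing HINGE-specific) gives roughly 90 bytes per alignment record on disk. If Reads_filter keeps all of these resident at once, peak memory will be at least the same order of magnitude as the file itself, which is beyond the RAM of most single nodes and consistent with the process being killed.

    echo '138 * 10^9 / 1527933550' | bc -l    # ~90 bytes per alignment record on disk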
Hi Govinda, it seems like it worked; I'll let you know after I have a more thorough evaluation. By the way, what I also realised is the huge number of .las files that are produced during
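(For anyone checking the same thing: the per-block .las files that the HPC.daligner-generated script leaves behind before the LAmerge step can be counted and sized with standard tools. The hinge.*.las glob below is an assumption based on the block naming used in the pipeline above:)

    ls hinge.*.las | wc -l            # how many block-level .las files exist
    du -ch hinge.*.las | tail -n 1    # their combined size on disk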