Software program for checking sample matching for NGS data
-
VCF files in
vcf_dir
:python ncm.py -V -d vcf_dir -O output_Dir -N outputfileName -bed bed_file
-
VCF lists in
bam_list_file
:python ncm.py -V -l vcf_list_file -O output_Dir -N outputfileName -bed bed_file
Change your configruration.
Please open ncm.py
, find run_mpileup()
function, modifying below lists as your configurations:
SAMTOOLS="/NAS/nas100-5/tools/samtools-0.1.19/samtools"
BCFTOOLS="/NAS/nas100-5/tools/samtools-0.1.19/bcftools/bcftools"
REF="/NAS/nas33-2/mpileup/hg19.fasta" or REF="/NAS/nas33-2/mpileup/GRCh37-lite.fa"
-
BAM files in
bam_dir
:python ncm.py -B -d bam_dir -O output_Dir -N outputfileName -bed bed_file
-
Bam files in
bam_list_file
:python ncm.py -B -d bam_list_file -O output_Dir -N outputfileName -bed bed_file
Usage : ./ngscheckmate_fastq <options> -1 fastqfile1 [-2 fastqfile2] patternfile(.pt) > output.vaf
Input arguments (required)
patternfile : a binary file containing sequences flanking representative snv sites, along with markers indicating the snv index and whether the sequence represents reference or alternative allele.
fastqfile1 : see below 'Options'.
Options
-1, --fastq1 <fastq_file_1> : fastq file for SE data or first fastq file for a PE data. (required)
-2, --fastq2 <fastq_file_2> : second fastq file for a PE data.
-s, --ss <subsampling_rate> : subsampling rate (default 1.0)
-d, --depth <desired_depth> : as an alternative to a user-defined subsampling rate, let the program compute the subsampling rate given a user-defined desired_depth and the data.
-R, --reference_length <reference_length> : The reference length (default : 3E9) to be used for computing subsampling rate. If the data is NOT WGS from human, and if you're using the -d option, it is highly recommended to specify the reference length. For instance, if your data is human RNA-seq, the total reference length could be about 3% of the human genome, which can be set as 1E8.
-L, --pattern_length <pattern_length> : The length of the flanking sequences being used to identify SNV sites. Default is 21bp. It is recommended not to change this value, unless you have created your own pattern file with a different pattern length.
-p, --maxthread <number_of_threads> : number of threads to use (default : 1 )
-j, --nodeptherror : in case estimated subsampling rate is larger than 1, do not stop but reset it to 1 and continue.
Usage: python vaf_ncm.py -f -I <input_directory> -O <output_directory> -N output
-I : input directory that contains the output (vaf) files of ngscheckmate_fastq.
-O : output directory
-N : output_filename_tag
This set of scripts generates the .pt file used by the fastq module, given a bed file containing a set of SNP positions. It assumes a file containing a whole genome sequence and the bowtie alignment program.
This script with a set of xgmml templates is used for generating a graph representing matching files as connected nodes. The output format is in .xgmml, which can be read by Cytoscape.
source("graph/ngscheckmate2xgmml.R")
create.xgmml.from.ngscheckmateout(label.file,ngscheckmateoutput.file,output.xgmml)
- Label file : a tab-delimited text file containing a bam file name (1st column), an individual identifier (2nd column) and optionally, a file identifier (3rd column) for each line. An individual identifier must be unique to a subject (e.g. both tumor and normal samples from the same individual must have the same individual identifier). A file identifier must be unique to a file name.
- ngscheckmateoutput.file : the output text file of NGSCheckMate. It is a tab-delimited text file containing two bam file names (1st and 2nd columns), VAF correlation (3rd column) and average depth (4th column). It may contain either all pairs or matched pairs, depending on the option used to run NGSCheckMate. Either type works.
- Sample label file (sample.label.txt) and ngscheckmateouput.file (sample.input.txt) can be found in the subdirectory graph/.
Software programs : Alice Lee, Sejoon Lee & Soo Lee
The logos ( &
) have been created by Fritz Lekschas. They are composed of the following great icons:
- DNA created by Irene Hoffman (CC BY 3.0 US)
- King created by Yuri Mazursky (CC BY 3.0 US)
- Queen created by Yuri Mazursky (CC BY 3.0 US)