Workflow Type | Applicable Kingdom | Last Known Changes | Command-line Compatibility | Workflow Level |
---|---|---|---|---|
mycosnp_variants | Fungi | v1.5 | Yes | Sample-level |
mycosnp_tree | Fungi | v1.5 | Yes | Set-level |

WDL wrappers of CDCGov/mycosnp-nf designed for Terra.bio integration. These workflows perform Candidozyma (Candida) auris variant calling and subsequent single nucleotide polymorphism (SNP) phylogenetic tree reconstruction.
mycosnp_variants calls variants for input reads against the C. auris B11205 assembly (accession GCA_016772135) by default. Users can optionally supply a different C. auris reference as a prebuilt clade directory, a FASTA, or a gzipped tarchive, as described below.
Note that mycosnp_tree requires at least 4 genomes that were called against the same reference in mycosnp_variants.
- reference optionally takes a presupplied reference clade directory (see the directory tree below). The default is GCA_016772135.
- ref_fasta optionally takes a reference FASTA (requires suffix .fa) that will be indexed with BWA to generate a reference directory.
- ref_tar optionally takes a gzipped tarchive (.tar.gz) with the same directory structure as the provided reference clades:
data/reference
├── B11221 # Prebuilt clade directory
├── Clade1
│ ├── bwa
│ │ └── bwa # BWA index for alignment
│ │ │ ├── reference.amb
│ │ │ ├── reference.ann
│ │ │ ├── reference.bwt
│ │ │ ├── reference.pac
│ │ │ └── reference.sa
│ ├── dict
│ │ └── reference.dict # Picard dictionary
│ ├── fai
│ │ └── reference.fa.fai # FASTA index file
│ ├── masked
│ │ └── reference.fa # Masked reference sequence
│ └── Clade1.fasta
├── Clade2
├── Clade3
├── Clade4
├── Clade5
└── GCA_016772135 # Default reference
- strain optionally sets the strain name used for VCF gene-name annotation. MycoSNP currently annotates only against the default strain, "B11205", so changing this option bypasses VCF annotation.
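
The ref_tar layout described above can be checked locally before uploading an archive. The following is a minimal sketch, not part of the workflow itself: the helper names `missing_reference_files` and `check_ref_tar` are hypothetical, and it assumes the archive contains exactly one top-level clade directory. The expected file list is copied from the directory tree above (BWA index files carry the suffixes .amb, .ann, .bwt, .pac, and .sa).

```python
import tarfile

# Paths expected inside one clade directory, taken from the tree above.
# BWA writes five index files: .amb, .ann, .bwt, .pac and .sa.
EXPECTED = (
    "bwa/bwa/reference.amb",
    "bwa/bwa/reference.ann",
    "bwa/bwa/reference.bwt",
    "bwa/bwa/reference.pac",
    "bwa/bwa/reference.sa",
    "dict/reference.dict",
    "fai/reference.fa.fai",
    "masked/reference.fa",
)


def missing_reference_files(member_names):
    """Return the expected paths that are absent from a list of archive members.

    Assumption (hypothetical): the archive holds exactly one top-level
    clade directory, e.g. ``MyClade/bwa/bwa/reference.amb``.
    """
    cleaned = set()
    for name in member_names:
        if name.startswith("./"):
            name = name[2:]  # tolerate archives created with a leading "./"
        name = name.strip("/")
        if name and name != ".":
            cleaned.add(name)
    top_levels = {name.split("/", 1)[0] for name in cleaned}
    if len(top_levels) != 1:
        raise ValueError(
            f"expected one top-level directory, found {sorted(top_levels)}"
        )
    top = top_levels.pop()
    return [path for path in EXPECTED if f"{top}/{path}" not in cleaned]


def check_ref_tar(path):
    """Open a gzipped tar archive and report which expected files are missing."""
    with tarfile.open(path, "r:gz") as archive:
        return missing_reference_files(archive.getnames())
```

An empty list from `check_ref_tar("my_reference.tar.gz")` (a placeholder file name) would indicate that every file shown in the tree is present; it does not verify that the files themselves are valid indexes.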
Terra Task Name | Variable | Type | Description | Default Value | Terra Status |
---|---|---|---|---|---|
mycosnp_variants | read1 | File | Illumina forward read file in FASTQ format (compression optional) | | Required |
mycosnp_variants | read2 | File | Illumina reverse read file in FASTQ format (compression optional) | | Required |
mycosnp_variants | samplename | String | Name of sample to be analyzed | | Required |
mycosnp | coverage | Int | Coverage is used to calculate a down-sampling rate that results in the specified coverage. For example, if coverage is 70, then FASTQ files are down-sampled such that, when aligned to the reference, the result is approximately 70x coverage | 0 | Optional |
mycosnp | cpu | Int | Number of CPUs to allocate to the task | 8 | Optional |
mycosnp | debug | Boolean | If true, keeps .nextflow/ and work/ directories | false | Optional |
mycosnp | disk_size | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional |
mycosnp | docker | String | The Docker container to use for the task | "us-docker.pkg.dev/general-theiagen/theiagen/mycosnp:1.5" | Optional |
mycosnp | memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 64 | Optional |
mycosnp | min_depth | Int | Min depth for a base to be called as the consensus sequence, otherwise it will be called as an N; set to 0 to disable | 10 | Optional |
mycosnp | reference | String | Reference clade | "GCA_016772135" | Optional |
mycosnp | sample_ploidy | Int | Ploidy of sample (GATK) | 1 | Optional |
mycosnp | strain | String | Reference strain | "B11205" | Optional |
mycosnp_variants | ref_fasta | File | Reference FASTA file | | Optional |
mycosnp_variants | ref_tar | File | Gzipped reference tarchive (.tar.gz) | | Optional |
version_capture | timezone | String | Alternative timezone | | Optional |

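
The coverage input in the table above can be read as simple arithmetic. The sketch below is an illustration under stated assumptions rather than mycosnp-nf's own code: it assumes the rate is the ratio of target coverage to estimated coverage, capped at 1.0, and that the default of 0 means no down-sampling. The function name, the read counts, and the approximate 12.4 Mb reference length are hypothetical example values.

```python
def downsample_rate(target_coverage, total_bases, reference_length):
    """Fraction of reads to keep so the expected depth is about target_coverage.

    Assumptions (not confirmed by the source): a target of 0 disables
    down-sampling, and a sample already below the target is kept whole.
    """
    if target_coverage <= 0:
        return 1.0
    estimated_coverage = total_bases / reference_length
    return min(1.0, target_coverage / estimated_coverage)


# Hypothetical example: 10 million read pairs of 2 x 150 bp against a
# reference of roughly 12.4 Mb (illustrative figure only).
total_bases = 2 * 150 * 10_000_000
rate = downsample_rate(70, total_bases, 12_400_000)
print(round(rate, 3))  # 0.289, i.e. keep about 29% of reads for ~70x
```

Capping at 1.0 reflects that reads cannot be added: a sample sequenced below the requested depth would simply pass through unchanged under these assumptions.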
Variable | Type | Description |
---|---|---|
analysis_date | String | Date of the analysis |
assembly_size | Int | Size of the assembly |
average_q_score_after_trimming | Float | Average quality score after trimming |
average_q_score_before_trimming | Float | Average quality score before trimming |
consensus_n_variant_min_depth | Int | Minimum depth for consensus N variant |
full_results | File | Full results file |
gc_after_trimming | Float | GC content after trimming |
gc_before_trimming | Float | GC content before trimming |
mean_coverage_depth | Float | Mean coverage depth |
multiqc | File | MultiQC report |
myco_bam | File | BAM file |
myco_bam_bai | File | BAM index file |
mycosnp_docker | String | Docker image used for MycoSNP |
mycosnp_variants_analysis_date | String | Date of the MycoSNP variants analysis |
mycosnp_variants_version | String | Version of the MycoSNP variants |
mycosnp_version | String | Version of MycoSNP |
number_n | Int | Number of N bases |
paired_reads_after_trimming | Int | Number of paired reads after trimming |
paired_reads_after_trimming_percent | String | Percentage of paired reads after trimming |
percent_reference_coverage | Float | Percentage of reference coverage |
reads_after_trimming | Int | Number of reads after trimming |
reads_after_trimming_percent | String | Percentage of reads after trimming |
reads_before_trimming | Int | Number of reads before trimming |
reads_mapped | Int | Number of reads mapped |
reference_length_coverage_after_trimming | Float | Reference length coverage after trimming |
reference_length_coverage_before_trimming | Float | Reference length coverage before trimming |
reference_name | String | Name of the reference genome used |
reference_strain | String | Reference strain used |
unpaired_reads_after_trimming | Int | Number of unpaired reads after trimming |
unpaired_reads_after_trimming_percent | String | Percentage of unpaired reads after trimming |
vcf | File | Compressed variant call format (VCF) file depicting SNPs |
vcf_index | File | Compressed index file for the VCF |

mycosnp_tree reconstructs an IQ-TREE SNP phylogenetic tree that incorporates representative genomes of C. auris Clade1-Clade5. VCF data generated by wf_mycosnp_variants.wdl are used as inputs.
NOTE: At least four samples, including the reference, are required.
- reference optionally takes a presupplied reference clade directory delineated here.
- ref_fasta optionally takes a reference FASTA (requires suffix .fa) that will be indexed with BWA to generate a reference directory.
- strain is passed to the output but does not change workflow function.
Terra Task Name | Variable | Type | Description | Default Value | Terra Status |
---|---|---|---|---|---|
mycosnp_tree | vcf | Array[File] | VCF files (.vcf.gz) containing SNP data for phylogenetic analysis. These files can be generated from wf_mycosnp_variants.wdl | | Required |
mycosnp_tree | vcf_index | Array[File] | Index files for the VCF files | | Required |
mycosnp_tree | ref_fasta | File | Reference FASTA input | | Optional |
mycosnptree | cpu | Int | Number of CPUs to allocate to the task | 8 | Optional |
mycosnptree | disk_size | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional |
mycosnptree | docker | String | The Docker container to use for the task | "us-docker.pkg.dev/general-theiagen/theiagen/mycosnp:1.5" | Optional |
mycosnptree | memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 64 | Optional |
mycosnptree | reference | String | Preexisting reference directory | "GCA_016772135" | Optional |
mycosnptree | strain | String | mycosnp-nf reference strain name | "B11205" | Optional |
version_capture | timezone | String | Alternative timezone | | Optional |

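
For command-line runs, the required mycosnp_tree inputs can be assembled into a WDL inputs JSON. The following is a hedged sketch: the `mycosnp_tree.` key prefix assumes the workflow is named as in the table above, `build_tree_inputs` is a hypothetical helper, and the four-sample minimum is applied to the number of VCFs as the conservative reading of the note above.

```python
import json

# Conservative reading of "at least four samples": require four VCFs.
MIN_SAMPLES = 4


def build_tree_inputs(vcfs, vcf_indexes):
    """Return a dict suitable for dumping to a WDL inputs JSON file.

    Raises ValueError when the lists differ in length or hold too few samples.
    """
    if len(vcfs) != len(vcf_indexes):
        raise ValueError("each VCF needs exactly one index file")
    if len(vcfs) < MIN_SAMPLES:
        raise ValueError(
            f"mycosnp_tree needs at least {MIN_SAMPLES} samples, got {len(vcfs)}"
        )
    return {
        # Key names assume the workflow is called 'mycosnp_tree'.
        "mycosnp_tree.vcf": list(vcfs),
        "mycosnp_tree.vcf_index": list(vcf_indexes),
    }


def write_tree_inputs(vcfs, vcf_indexes, out_path):
    """Validate the inputs and write them as pretty-printed JSON."""
    with open(out_path, "w", encoding="utf-8") as handle:
        json.dump(build_tree_inputs(vcfs, vcf_indexes), handle, indent=2)
```

Failing early on the sample count avoids launching a run that cannot produce a tree. All VCFs passed in should come from mycosnp_variants runs that used the same reference.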
Variable | Type | Description |
---|---|---|
mycosnp_alignment | File | Concatenated SNP alignment file |
mycosnp_docker | String | Docker image used for MycoSNP |
mycosnp_fastree_tree | File | Phylogenetic tree inferred using FastTree (heuristic maximum likelihood) |
mycosnp_iqtree_tree | File | Phylogenetic tree inferred using IQ-TREE (high quality maximum likelihood) |
mycosnp_rapidnj_tree | File | Phylogenetic tree inferred using RapidNJ (neighbor-joining method) |
mycosnp_tree_analysis_date | String | Date of the analysis |
mycosnp_tree_full_results | File | Full results file |
mycosnp_tree_vcf_csv | File | SNP variants formatted as a CSV table |
mycosnp_tree_version | String | Version of the mycosnp_tree WDL workflow |
mycosnp_version | String | Version of MycoSNP |
mycosnptree_snpdists | File | SNP distances file |
reference_name | String | Name of the reference |
reference_strain | String | Reference strain used |
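
The SNP distances output can be inspected with a few lines of Python. This is a minimal sketch that assumes the file is a tab-separated square matrix with sample names in the first row and first column, in the same order, as snp-dists typically writes it. The function name `close_pairs` and any threshold passed to it are hypothetical and carry no epidemiological meaning.

```python
import csv
import io


def close_pairs(matrix_text, max_snps):
    """List sample pairs whose SNP distance is at most max_snps.

    Assumes a tab-separated square matrix: a header row of sample names
    (first cell ignored) and one row per sample in the same order.
    """
    rows = [r for r in csv.reader(io.StringIO(matrix_text), delimiter="\t") if r]
    column_names = rows[0][1:]
    pairs = []
    for i, row in enumerate(rows[1:]):
        row_name, values = row[0], row[1:]
        # Only the upper triangle is read, so each pair is reported once.
        for j in range(i + 1, len(column_names)):
            distance = int(values[j])
            if distance <= max_snps:
                pairs.append((row_name, column_names[j], distance))
    return pairs
```

To apply it to a downloaded file, read the file's text first, for example `close_pairs(open(path).read(), 10)`, where `path` and the cutoff of 10 are placeholders.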