A POC for automating certain processes that use the Illumina Connected Analytics (ICA) CLI.

SBIMB/ica-v2-poc

ica-v2-poc

Introduction

This is a simple proof of concept (POC) for automating certain processes that use the Illumina Connected Analytics (ICA) CLI. The main processes that we wish to automate are:

  • upload files for analysis
  • run Nextflow pipelines for analysis
  • trigger the download of the output file(s)
  • delete the output file(s) after the download succeeds

We can use a combination of both the API and the CLI. However, we will almost exclusively use the CLI.

Before we can begin, we need to have an existing project or create a new project. For the rest of this README, we'll be referring to the existing project, SGDP.

Authentication

Authentication is required to use either the API or the CLI. After logging in to the UI, an API key needs to be created. Instructions for generating an API key can be found here.

There are two ways to authenticate in order to make use of the API:

  1. API key + JWT for the entire API, except for the POST /tokens endpoint.
  2. API key + Basic Authentication (username/email and password) for the POST /tokens endpoint.
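
As a minimal sketch of what these options look like as HTTP request headers, the helpers below build the header dictionaries without sending any request. The `X-API-Key` header name is assumed from the `x-api-key` prompt of the CLI, and passing the JWT as a Bearer token is also an assumption; the function names are hypothetical. Only the Basic Authentication encoding follows a fixed standard (RFC 7617).

```python
import base64


def api_key_headers(api_key: str) -> dict[str, str]:
    """Header carrying the API key generated in the ICA UI.

    Assumption: the header name mirrors the CLI's "x-api-key" prompt.
    """
    return {"X-API-Key": api_key}


def basic_auth_headers(username: str, password: str) -> dict[str, str]:
    """Standard HTTP Basic Authentication header (RFC 7617)."""
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return {"Authorization": f"Basic {token}"}


def bearer_headers(jwt: str) -> dict[str, str]:
    """Header for endpoints that accept a JWT.

    Assumption: the JWT is sent as a Bearer token.
    """
    return {"Authorization": f"Bearer {jwt}"}
```

These dictionaries could be passed to any HTTP client. Since this README relies almost exclusively on the CLI, they are shown for orientation only.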

When using the CLI, authentication takes place when running the command:

icav2 config set

There will be prompts. Press Enter or Return to accept a default. When prompted for the API key, provide the value that was generated in the UI.

icav2 config set
Creating $HOME/.icav2/config.yaml
Initialize configuration settings [default]
server-url [ica.illumina.com]: 
x-api-key : myAPIKey
output-format (allowed values table,yaml,json defaults to table) : 
colormode (allowed values none,dark,light defaults to none) :

To change the default settings, edit the $HOME/.icav2/config.yaml file. In our case, the output format is set to JSON.
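
With JSON as the output format, CLI output can be consumed by a script instead of being read by eye. The sketch below is one way to do that, assuming the command prints a single JSON document to standard output. The helper name `run_json` is hypothetical, and the demo uses a stand-in command rather than a real `icav2` call so that it runs without the CLI installed.

```python
import json
import subprocess
import sys


def run_json(command: list[str]) -> object:
    """Run a command and parse its standard output as JSON.

    Raises subprocess.CalledProcessError if the command exits non-zero.
    """
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return json.loads(result.stdout)


# Stand-in for an icav2 invocation: a tiny Python process that prints JSON.
demo_command = [
    sys.executable,
    "-c",
    "import json; print(json.dumps({'id': 'fil.123'}))",
]
print(run_json(demo_command)["id"])
```

In practice the `demo_command` list would be replaced by the `icav2` command and its arguments.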

Our goal is to create a process that achieves the following:

  • upload data to ICA
  • start a pipeline run (or analysis) in ICA
  • periodically check the status of the ongoing analysis
  • download results when analysis is complete
  • delete results and the uploaded files

The diagram below illustrates the upload-analysis-download-delete process for a single file:
[Figure: Upload-Download ICA Bash Process]
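
The "periodically check the status" step can be sketched as a polling loop. This is a minimal illustration, not the repository's implementation. The status names in `SUCCESS_STATES` and `FAILURE_STATES` are assumptions and should be checked against what the analysis actually reports. The `get_status` callable is a hypothetical hook that would wrap whichever CLI or API call returns the current status.

```python
import time
from typing import Callable

# Assumed status names; verify them against the real analysis output.
SUCCESS_STATES = {"SUCCEEDED"}
FAILURE_STATES = {"FAILED", "FAILED_FINAL", "ABORTED"}


def wait_for_analysis(
    get_status: Callable[[], str],
    interval_seconds: float = 60.0,
    max_checks: int = 1000,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_status() until the analysis succeeds, fails, or we give up."""
    for _ in range(max_checks):
        status = get_status()
        if status in SUCCESS_STATES:
            return status
        if status in FAILURE_STATES:
            raise RuntimeError(f"analysis ended with status {status}")
        # Any other status is treated as "still running": wait, then retry.
        sleep(interval_seconds)
    raise TimeoutError(f"analysis not finished after {max_checks} checks")
```

Passing `sleep` as a parameter keeps the loop testable without real waiting. Capping the number of checks means a stuck analysis cannot block the workflow forever.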

DRAGEN Pipeline for Pair of FASTQ Files

When running the DRAGEN Germline Whole Genome 4-3-6 pipeline on a pair of .fastq files, we also need to provide a .csv file containing some metadata about the .fastq file pair. For example, if the .fastq files for a given analysis are named mysample_R1_001.fastq and mysample_R2_001.fastq, the .csv file will contain the following data:

RGID,RGSM,RGLB,Lane,Read1File,Read2File
mysample,mysample,RGLB,1,mysample_R1_001.fastq,mysample_R2_001.fastq
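
The .csv content above can be generated from the sample name alone. The following is a small sketch of that idea. It assumes the `_R1_001.fastq` / `_R2_001.fastq` naming used in the example, and the function name and defaults are hypothetical.

```python
import csv
import io

HEADER = ["RGID", "RGSM", "RGLB", "Lane", "Read1File", "Read2File"]


def fastq_list_csv(sample_id: str, lane: int = 1, library: str = "RGLB") -> str:
    """Return the metadata .csv text for one pair of .fastq files."""
    read1 = f"{sample_id}_R1_001.fastq"
    read2 = f"{sample_id}_R2_001.fastq"
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(HEADER)
    # The sample name doubles as read group ID (RGID) and sample name (RGSM).
    writer.writerow([sample_id, sample_id, library, lane, read1, read2])
    return buffer.getvalue()


print(fastq_list_csv("mysample"), end="")
```

Calling `fastq_list_csv("mysample")` reproduces the two lines shown above.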

The reference file used is chm13_v2-cnv.hla.methylation_combined.rna_v4.tar.gz. Reference files for the DRAGEN pipelines are always tar archives; this one is gzip-compressed, hence the .tar.gz extension.

The Nextflow pipeline (or workflow) responsible for uploading the files, triggering the DRAGEN analysis, polling the analysis status, downloading the output, and then deleting the analysis output and uploaded files can be found here.

[Figure: FASTQ-Upload-Analyse-Download-Delete Workflow]

The workflow passes data from process to process in a .txt file called data.txt. All necessary data (such as IDs) is written to this file as the workflow runs through its processes. This is an example of what the data.txt file looks like:

sampleId:ERR1019050
read1:fil.85255ad2588d4e5fe75a08dcaabcc45f
read2:fil.6bcfeca6252941dde75b08dcaabcc45f
ref_tar:fil.2e3fd8d802ee4963da2208dc484ea8f0
read1Name:ERR1019050_R1_001.fastq
read2Name:ERR1019050_R2_001.fastq
analysisId:9aa57a35-7e66-4d4e-9c05-729767ff0290
analysisRef:regan_dragen_germline_whole_genome_test_05-DRAGEN Germline Whole Genome 4-3-6-a7f59145-3f93-4579-9129-c2b726dc4414
outputFolderId:fol.7cdafdb7363062eef75b08edbbcdd56a
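
The `key:value` layout of data.txt is simple to read and extend. The sketch below shows one possible way to parse and append entries. It is an illustration only, and all function names are hypothetical. Each line is split on the first colon, so a value may itself contain colons.

```python
from pathlib import Path
from typing import Iterable


def parse_data_lines(lines: Iterable[str]) -> dict[str, str]:
    """Parse key:value lines into a dictionary, skipping blank lines."""
    data: dict[str, str] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        key, separator, value = line.partition(":")
        if not separator:
            raise ValueError(f"expected 'key:value', got: {line!r}")
        data[key] = value
    return data


def read_data_file(path: Path) -> dict[str, str]:
    """Read a data.txt file into a dictionary."""
    return parse_data_lines(path.read_text(encoding="utf-8").splitlines())


def append_entry(path: Path, key: str, value: str) -> None:
    """Append one key:value line, as a process would after finishing its step."""
    with path.open("a", encoding="utf-8") as handle:
        handle.write(f"{key}:{value}\n")
```

For example, a process that starts the analysis could call `append_entry(Path("data.txt"), "analysisId", ...)`, and the polling process could later read the value back with `read_data_file`.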

If the files are already in ICA and don't need to be uploaded, a shorter workflow can be used: fastq_single_pair_dragen_analysis_no_upload. This workflow skips the upload process, so the file IDs of the already uploaded files must be provided in params.json.

[Figure: FASTQ-Analyse-Download-Delete Workflow]

Finally, an even shorter workflow exists: fastq_single_pair_dragen_analysis_download. It is intended for cases where the files have already been uploaded and the DRAGEN analysis has already run to completion, so only the download and delete processes are required.

[Figure: FASTQ-Download-Delete Workflow]
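
The ordering constraint in the download and delete steps (delete only after the download has succeeded) can be expressed as a small guard. This is a sketch under assumptions. The `download` and `delete` callables are hypothetical hooks that would wrap the actual CLI calls, and `download` is assumed to return True on success.

```python
from typing import Callable


def download_then_delete(
    download: Callable[[], bool],
    delete: Callable[[], None],
) -> bool:
    """Run delete() only if download() reports success.

    Returns True when both steps ran, False when the download failed
    and the remote data was therefore left untouched.
    """
    if not download():
        return False
    delete()
    return True
```

Keeping the delete behind this check means a failed or interrupted download never leads to loss of the only copy of the analysis output.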

To run any of these workflows, change into the directory that contains the main.nf and params.json files, and then run:

nextflow main.nf -params-file params.json

DRAGEN Pipeline for BAM Files

When running the DRAGEN Germline Whole Genome 4-3-6 pipeline with a .bam file as input, the .bam index (.bai) is required when realignment is disabled. Since all the .bam files are stored in a single directory, the sampleId must be provided in params.json so that the workflow knows which file to upload and analyse. The corresponding index file (.bam.bai) is uploaded as well. The workflow that uploads the files, triggers the DRAGEN analysis, polls the analysis status, downloads the analysis output, and deletes the output and uploaded files can be found here.
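
Resolving the two files to upload from a sampleId can be sketched as follows. It assumes the naming shown in this README (`<sampleId>.bam` and `<sampleId>.bam.bai` in the same directory). The function names are hypothetical.

```python
from pathlib import Path


def bam_paths(bam_dir: str, sample_id: str) -> tuple[Path, Path]:
    """Return the expected .bam and .bam.bai paths for a sample."""
    bam = Path(bam_dir) / f"{sample_id}.bam"
    bai = bam.with_name(bam.name + ".bai")
    return bam, bai


def require_files(*paths: Path) -> None:
    """Fail early if any expected input file is missing."""
    missing = [str(path) for path in paths if not path.is_file()]
    if missing:
        raise FileNotFoundError(f"missing input file(s): {', '.join(missing)}")
```

Checking both files before the upload starts avoids launching an analysis that would fail later because the index is absent.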

As in the workflow for the .fastq files, data is written to a .txt file called data.txt. This is an example of what the data.txt file looks like:

sampleId:myBamFile
bam:fil.85255ad2588d4e5fe75a08dcaabcc45f
bai:fil.6bcfeca6252941dde75b08dcaabcc45f
ref_tar:fil.2e3fd8d802ee4963da2208dc484ea8f0
bamFileName:myBamFile.bam
baiFileName:myBamFile.bam.bai
analysisId:9aa57a35-7e66-4d4e-9c05-729767ff0290
analysisRef:regan_dragen_germline_whole_genome_test_05-DRAGEN Germline Whole Genome 4-3-6-a7f59145-3f93-4579-9129-c2b726dc4414
outputFolderId:fol.7cdafdb7363062eef75b08edbbcdd56a
