- How to use this file
- Introduction
- Copying the pipeline
- Directories
- Prerequisites
- Start the pipeline
- More Reading
This README file gives a global introduction on how to work on the server, cloning the pipeline repository and how to run the pipeline. We tried to be as complete and precise as possible.
If you have any question, comment, complain or suggestion or if you encounter any conflicts or errors in this document or the pipeline, please contact your Bioinformatics Unit ([email protected]) or open an Issue
!
Text written in boxes is code, which usually can be executed in your Linux terminal. You can just copy/paste it.
Sometimes it is "special" code for R or any other language. If this is the case, it will be explicitly mentioned in the instructions.
Text in a red box indicates directory or file names
Text in brackets "<>" indicates that you have to replace it and the brackets by your own appropiate text.
epiGBSia a reduced representation bisulfite sequencing method for cost-effective exploration and comparative analysis of DNA methylation and genetic variation in hundreds of samples de novo. This method uses genotyping by sequencing of bisulfite-converted DNA followed by reliable de novo reference construction, mapping, variant calling, and distinction of single-nucleotide polymorphisms (SNPs) versus methylation variation.
This pipeline is cloned from https://github.com/thomasvangurp/epiGBS and adapted to run in a conda environment on the NIOO-bioinformatics servers.
The original reference is accessible here.
To start a new analysis project based on this pipeline, follow the following steps:
- Clone and rename the pipeline-skeleton from our GitLab server by typing in the terminal. Replace by your NIOO login-name. Cloning will only work, if you have logged in to gitlab at least once before:
https://github.com/nioo-knaw/epiGBS-pipeline.git
- Enter
epiGBS
and download epiGBS code
cd epiGBS
git submodule init
git submodule update
This file contains this content with general information about how to run this pipeline.
Place your epiGBS read files here.
Contains the code of the epiGBS pipeline.
Here, you can store all other files, like metadata etc.
This is the directory, where you will analyse your data and all output will be generated.
The following system dependencies are required:
- git
- zlib1g
- zlib1g-dev
- bzip2
- python-pip
- python-dev
- pigz
- libfreetype6-dev
- pkg-config
- gfortran
- liblapack-dev
Next to that, you have to install different dependencies using conda before starting the pipeline for the first time:
conda env create -f environment.yaml
source activate epiGBS
if you already have created the environment epiGBS
before, than activate it by
source activate epiGBS
You can deactivate the environment after pipeline execution with
source deactivate
(see Prerequisites for detailed instructions).
source activate epiGBS
see Directories
The barcode file is tab-delimited and contains at least the following columns: Flowcell
, Lane
, Barcode_R1
, Barcode_R2
, Barcode_R2
, Sample
, ENZ_R1
, ENZ_R2
, Wobble_R1
, Wobble_R2
. All other fields are mandatory.
# barcodes.tsv
Flowcell Lane Barcode_R1 Barcode_R2 Sample history Country PlateName Row Column ENZ_R1 ENZ_R2 Wobble_R1 Wobble_R2 Species
H53KHCCXY 5 AACT CCAG BUXTON_178 C BUXTON BUXTON_WUR_AseI_NsiI_final_run1 1 2 AseI NsiI 3 3 Scabiosa columbaria
H53KHCCXY 5 CCTA CCAG WUR_178 C WUR BUXTON_WUR_AseI_NsiI_final_run1 2 2 AseI NsiI 3 3 Scabiosa columbaria
H53KHCCXY 5 TTAC CCAG BUXTON_169 C BUXTON BUXTON_WUR_AseI_NsiI_final_run1 3 2 AseI NsiI 3 3 Scabiosa columbaria
H53KHCCXY 5 AGGC CCAG WUR_169 C WUR BUXTON_WUR_AseI_NsiI_final_run1 4 2 AseI NsiI 3 3 Scabiosa columbaria
H53KHCCXY 5 GAAGA CCAG BUXTON_175 SD BUXTON BUXTON_WUR_AseI_NsiI_final_run1 5 2 AseI NsiI 3 3 Scabiosa columbaria
H53KHCCXY 5 CCTTC CCAG WUR_175 SD WUR BUXTON_WUR_AseI_NsiI_final_run1 6 2 AseI NsiI 3 3 Scabiosa columbaria
Execute the following commands:
# make some links to python files
bash links.sh
# change directory to analysis
cd analysis
# open the demultiplex.sh script
nano demultiplex.sh
# adjust the --r1, --r2, --barcodes flags accordingle to your input names. Choose a name for --output_dir. Close nano with Ctrl+x
# execute the bash script
bash demultiplex.sh
# if you have a large fastq input file, demultiplexing will take long. Use the Snakefile instead.
nano make_reference.sh
# adjust all paths accordingly to your choices from the previous step
bash make_reference.sh
nano mapping_variant_calling.sh
# adjust all paths accordingly to your choices from the previous step, except path to --tmpdir
bash mapping_variant_calling.sh