This README file gives a general introduction to working on the server, cloning the pipeline repository and running the pipeline. We tried to be as complete and precise as possible. If you have any questions, comments, complaints or suggestions, or if you encounter any conflicts or errors in this document or the pipeline, please contact your Bioinformatics Unit ([email protected]) or open an Issue!

Enjoy your analysis and happy results!

Text written in boxes is code, which can usually be executed in your Linux terminal. You can just copy/paste it.
Sometimes it is "special" code for R or another language. If this is the case, it is explicitly mentioned in the instructions.

Text in a red box indicates directory or file names.

Text in angle brackets "<>" indicates that you have to replace it, brackets included, with your own appropriate text.

Introduction

epiGBS is a reduced representation bisulfite sequencing method for the cost-effective exploration and comparative analysis of DNA methylation and genetic variation in hundreds of samples de novo. The method uses genotyping by sequencing of bisulfite-converted DNA, followed by reliable de novo reference construction, mapping, variant calling, and the distinction of single-nucleotide polymorphisms (SNPs) from methylation variation.

This pipeline is cloned from https://github.com/thomasvangurp/epiGBS and adapted to run in a conda environment on the NIOO-bioinformatics servers.

The original reference is accessible here.

Copying the pipeline

To start a new analysis project based on this pipeline, follow these steps:

Clone and rename the pipeline skeleton from our GitLab server by typing the following in the terminal. Replace by your NIOO login-name. Cloning will only work if you have logged in to GitLab at least once before:

git clone https://github.com/nioo-knaw/epiGBS-pipeline.git epiGBS

Enter the `epiGBS` directory and download the epiGBS code, which is included as a Git submodule:

cd epiGBS
git submodule init
git submodule update
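If you are unsure whether the submodule step worked, a small check like the one below can help. It is a sketch, not part of the pipeline: the function name `check_submodules` is hypothetical, and it relies only on the fact that `git submodule status` marks submodules that have not been initialized with a leading `-`. Git also accepts `git clone --recurse-submodules`, which clones the repository and initializes its submodules in one step.

```shell
# Hypothetical helper (not part of the pipeline): read `git submodule status`
# output on stdin, print every submodule that is not initialized yet
# (lines starting with "-") and return 1 if there is at least one.
check_submodules() {
    awk '/^-/ { sub(/^-/, ""); print "not initialized: " $2; bad = 1 }
         END { exit bad }'
}

# Usage, inside the epiGBS directory:
#   git submodule status | check_submodules || git submodule update --init
```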

Directories

The toplevel `README` file

This file contains general information about how to run this pipeline.

The `data` directory

Place your epiGBS read files here.

The `src` directory

Contains the code of the epiGBS pipeline.

The `docs` directory

Here, you can store all other files, like metadata etc.

The `analysis` directory

This is the directory where you will analyse your data and where all output will be generated.

Prerequisites

The following system dependencies are required:

git
zlib1g
zlib1g-dev
bzip2
python-pip
python-dev
pigz
libfreetype6-dev
pkg-config
gfortran
liblapack-dev
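These are Debian/Ubuntu package names. The sketch below shows one way to check whether the command-line tools among them are available; the function name `check_tools` is hypothetical. It only checks git, bzip2, pigz, pkg-config and gfortran, whose commands carry the package name; the remaining entries are libraries or Python packages and are not covered. On Debian-based systems, a missing package is typically installed with `sudo apt-get install <package>`, assuming you have administrator rights on the server.

```shell
# Hypothetical helper: print every command given as argument that is not
# found on PATH, and return 1 if at least one is missing.
check_tools() {
    local tool missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Commands provided by the git, bzip2, pigz, pkg-config and gfortran packages.
check_tools git bzip2 pigz pkg-config gfortran && echo "all checked tools found"
```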

In addition, you have to install several dependencies with conda before starting the pipeline for the first time:

conda env create -f environment.yaml
source activate epiGBS

If you have already created the epiGBS environment, simply activate it with:

source activate epiGBS

You can deactivate the environment after the pipeline has finished with:

source deactivate
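The create-once-then-activate logic above can be wrapped in a small function. This is a sketch under assumptions: the helper names are hypothetical, `environment.yaml` is assumed to be in the current directory and to define an environment called `epiGBS`, and your conda is assumed to be recent enough to provide `conda activate` and `conda shell.bash hook`. With such versions, `conda activate epiGBS` and `conda deactivate` replace `source activate epiGBS` and `source deactivate`.

```shell
# Hypothetical helper: succeed if an environment named $1 appears in the
# `conda env list` output read from stdin (first column = environment name).
env_listed() {
    awk -v name="$1" '$1 == name { found = 1 } END { exit !found }'
}

# Create the epiGBS environment on first use only, then activate it.
# Source the file holding this function (do not execute it), so that the
# activation affects your current shell.
setup_epigbs_env() {
    if ! conda env list | env_listed epiGBS; then
        conda env create -f environment.yaml || return 1
    fi
    eval "$(conda shell.bash hook)"
    conda activate epiGBS
}
```

Run `setup_epigbs_env` from the repository root, where `environment.yaml` is located.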

Start the pipeline

Activate the epiGBS environment

(see Prerequisites for detailed instructions).

source activate epiGBS

Add read files to the `data` directory.

see Directories
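Because demultiplexing time grows with the size of the input, it can be useful to know how many reads each file holds before you start. The sketch below assumes gzip-compressed FASTQ files named `*.fq.gz` or `*.fastq.gz` (the naming is an assumption) and relies on each FASTQ record spanning exactly four lines. It uses pigz, one of the prerequisites, when available and falls back to gzip otherwise. The function name `count_fastq_reads` is hypothetical.

```shell
# Hypothetical helper: count the reads in a gzip-compressed FASTQ file.
# Assumes standard 4-line FASTQ records (header, sequence, "+", qualities).
count_fastq_reads() {
    local decompress="gzip -dc"
    if command -v pigz >/dev/null 2>&1; then
        decompress="pigz -dc"   # parallel decompression, if installed
    fi
    $decompress "$1" | awk 'END { print NR / 4 }'
}

# Report the read count of every compressed FASTQ file in the data directory.
for f in data/*.fq.gz data/*.fastq.gz; do
    [ -e "$f" ] || continue   # skip glob patterns that matched nothing
    echo "$f: $(count_fastq_reads "$f") reads"
done
```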

Make a barcode file and add it to the `data` directory

The barcode file is tab-delimited and contains at least the following columns: Flowcell, Lane, Barcode_R1, Barcode_R2, Sample, ENZ_R1, ENZ_R2, Wobble_R1, Wobble_R2. All other fields are optional.

# barcodes.tsv
Flowcell        Lane    Barcode_R1      Barcode_R2      Sample  history Country PlateName       Row     Column  ENZ_R1  ENZ_R2  Wobble_R1       Wobble_R2       Species
H53KHCCXY       5       AACT    CCAG    BUXTON_178      C       BUXTON  BUXTON_WUR_AseI_NsiI_final_run1 1       2       AseI    NsiI    3       3       Scabiosa columbaria
H53KHCCXY       5       CCTA    CCAG    WUR_178 C       WUR     BUXTON_WUR_AseI_NsiI_final_run1 2       2       AseI    NsiI    3       3       Scabiosa columbaria
H53KHCCXY       5       TTAC    CCAG    BUXTON_169      C       BUXTON  BUXTON_WUR_AseI_NsiI_final_run1 3       2       AseI    NsiI    3       3       Scabiosa columbaria
H53KHCCXY       5       AGGC    CCAG    WUR_169 C       WUR     BUXTON_WUR_AseI_NsiI_final_run1 4       2       AseI    NsiI    3       3       Scabiosa columbaria
H53KHCCXY       5       GAAGA   CCAG    BUXTON_175      SD      BUXTON  BUXTON_WUR_AseI_NsiI_final_run1 5       2       AseI    NsiI    3       3       Scabiosa columbaria
H53KHCCXY       5       CCTTC   CCAG    WUR_175 SD      WUR     BUXTON_WUR_AseI_NsiI_final_run1 6       2       AseI    NsiI    3       3       Scabiosa columbaria
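A malformed barcode file (for example spaces instead of tabs, or a missing column) is easy to overlook. The sketch below checks a barcode file before you run the pipeline: the header must contain the required columns named above, and every non-empty row must have as many tab-separated fields as the header. It is an illustration, not part of the pipeline; the function name `check_barcode_file` is hypothetical, and it does not validate the values themselves (barcode sequences, enzyme names, wobble lengths).

```shell
# Hypothetical helper: validate the header and row lengths of a
# tab-delimited barcode file. Prints problems and returns 1 if any are found.
check_barcode_file() {
    awk -F '\t' -v req="Flowcell Lane Barcode_R1 Barcode_R2 Sample ENZ_R1 ENZ_R2 Wobble_R1 Wobble_R2" '
        { sub(/\r$/, "") }    # tolerate Windows (CRLF) line endings
        NF == 0 { next }      # skip blank lines
        !ncol {               # the first non-blank line is the header
            ncol = NF
            for (i = 1; i <= NF; i++) have[$i] = 1
            n = split(req, cols, " ")
            for (j = 1; j <= n; j++)
                if (!(cols[j] in have)) { print "missing column: " cols[j]; bad = 1 }
            next
        }
        NF != ncol { print "line " NR ": expected " ncol " fields, found " NF; bad = 1 }
        END {
            if (!ncol) { print "no header found"; exit 1 }
            exit bad
        }
    ' "$1"
}
```

For example, `check_barcode_file data/barcodes.tsv && echo "barcode file looks OK"` prints nothing but the final message when the file passes both checks.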

Execute the pipeline

Execute the following commands:

# make some links to python files
bash links.sh

# change directory to analysis
cd analysis

# open the demultiplex.sh script
nano demultiplex.sh
# adjust the --r1, --r2 and --barcodes flags according to your input file names, and choose a name for --output_dir. Close nano with Ctrl+X and confirm saving your changes
# execute the bash script
bash demultiplex.sh

# if you have a large fastq input file, demultiplexing will take a long time. Use the Snakefile instead.

nano make_reference.sh
# adjust all paths according to your choices in the previous step
bash make_reference.sh

nano mapping_variant_calling.sh
# adjust all paths according to your choices in the previous step, except the path given to --tmpdir
bash mapping_variant_calling.sh
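Since each step uses the output of the previous one, the three scripts can also be run one after another from a single function that stops at the first failing step, so that a later step never starts on incomplete output. This is a sketch under assumptions: the function name `run_epigbs_steps` is hypothetical, and it assumes the scripts live in the `analysis` directory and have already been adjusted with nano as described above.

```shell
# Hypothetical helper: run the three analysis scripts in order from the
# given directory (default: analysis) and stop at the first failure.
run_epigbs_steps() {
    local dir="${1:-analysis}" step
    for step in demultiplex.sh make_reference.sh mapping_variant_calling.sh; do
        echo "== running $step"
        # the subshell keeps the cd from changing the caller's directory
        if ! (cd "$dir" && bash "$step"); then
            echo "step failed: $step" >&2
            return 1
        fi
    done
    echo "all steps finished"
}

# Usage, from the repository root:
#   run_epigbs_steps analysis
```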
