
LukaLmelias/viral-metagenomics-snakemake-pipeline


viral_metagenomics_pipeline

Workflow DAG

Summary of the snakemake jobs (example with toy dataset)


Installation

Requirements:

  • conda: see the Miniconda installation instructions
  • Clone the repository and move into it:
$ git clone https://git.wur.nl/lenarotoluka.lmelias/viral_metagenomics_pipeline.git
$ cd viral_metagenomics_pipeline
  • Set up the snakemake conda environment:

Create a snakemake conda environment matching the one used to develop the pipeline, to avoid issues that may arise with other versions of Snakemake.

(If you already have a conda environment named snakemake, pass a different name to -n and activate that name instead.)

$ conda env create -n snakemake --file ./tools/yamls/snakemake.yml

Activate the snakemake conda environment and cd into the scripts directory; all analyses are run from there.

$ conda activate snakemake; cd scripts

Usage

I recommend running the pipeline step by step, which makes it easier to fix the errors you will likely encounter :)

Tool parameters can be changed as necessary in the config file.

Change SLURM parameters (e.g. your email address) in slurm_defaults.yaml.
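Because each step is run separately, a dry run (`snakemake -n`) before submitting jobs can catch missing inputs or config mistakes early. A minimal sketch of a helper for this; `run_step` is a hypothetical name, and it only reuses the flags from the commands below:

```bash
# Hypothetical helper: preview a step with a dry run (-n), then submit it for real.
# Flags mirror the commands in this README; adjust -j and the profile as needed.
run_step() {
    local rule="$1"
    # Dry run: print the jobs and shell commands without executing anything.
    snakemake -n -p --profile slurmProfile --until "$rule" || return 1
    # Real run through the SLURM profile.
    snakemake -p -j 36 --profile slurmProfile --until "$rule"
}
```

For example, `run_step fastp` previews and then runs step 1; if the dry run fails, nothing is submitted.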

1. Raw read quality checks: fastp

$ snakemake -p -j 36  --profile slurmProfile --until fastp

Merge fastp reports into one report with multiqc

$ snakemake -p -j 36  --profile slurmProfile --until fastp_multiqc

2. Pre-assembly classification: Kraken2

$ snakemake -p -j 36  --profile slurmProfile --until kraken2

Merge kraken2 reports into one report using multiqc

$ snakemake -p -j 36  --profile slurmProfile --until kraken2_multiqc
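Before host exclusion it can be useful to see how much of each sample Kraken2 placed under viruses. A sketch, assuming the standard six-column Kraken2 report (percentage, clade reads, direct reads, rank code, NCBI taxid, name) written without `--report-minimizer-data`, which adds extra columns; 10239 is the NCBI taxid for Viruses:

```bash
# For each Kraken2 report, print "<report><TAB><% of reads in the Viruses clade>".
# Column 1 is the clade percentage and column 5 the NCBI taxid (10239 = Viruses).
viral_fraction() {
    local report
    for report in "$@"; do
        awk -F'\t' -v r="$report" '
            $5 == 10239 { pct = $1 + 0; found = 1 }   # "+ 0" strips the padding spaces
            END { print r "\t" (found ? pct : 0) }     # 0 when no viral line is present
        ' "$report"
    done
}
```

For example, `viral_fraction sample1.kreport sample2.kreport` (hypothetical file names) prints one line per report.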

3. Host exclusion: KrakenTools

$ snakemake -p -j 36  --profile slurmProfile --until krakentools
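KrakenTools excludes host reads based on their Kraken2 assignment. The core idea can be sketched on Kraken2's per-read output (classified flag, read ID, taxid, length, k-mer LCA list), assuming it was written without `--use-names`. This is a simplification, not the pipeline's rule: it drops only reads assigned exactly to the host taxid, whereas KrakenTools' `extract_kraken_reads.py` with `--exclude --include-children` also drops reads assigned to taxa below the host.

```bash
# List the IDs of reads NOT assigned to the host taxid (unclassified reads are kept).
# Usage: non_host_ids <kraken2_output> <host_taxid>
# Simplification: reads assigned to descendants of the host taxid are NOT removed here.
non_host_ids() {
    awk -F'\t' -v host="$2" '$3 != host { print $2 }' "$1"
}
```

For a human host this would be `non_host_ids sample.kraken 9606 > keep.txt`; the taxid and file names are only examples.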

4. Metagenome assembly: metaSPAdes

$ snakemake -p -j 36  --profile slurmProfile --until metaspades
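metaSPAdes names its contigs `NODE_<n>_length_<len>_cov_<coverage>`, so a quick sanity check of each assembly can be read from the headers alone. A minimal sketch; the 1000 bp cutoff is an arbitrary example, not a pipeline parameter:

```bash
# Summarise a metaSPAdes contigs.fasta from its headers (NODE_<n>_length_<len>_cov_<cov>).
# Prints the number of contigs, their total length, and how many are >= min_len (default 1000).
contig_summary() {
    awk -F'_' -v min="${2:-1000}" '
        /^>/ { n++; len = $4; total += len; if (len >= min) long++ }   # field 4 is the length
        END { printf "contigs=%d total_bp=%d ge_%d=%d\n", n, total, min, long }
    ' "$1"
}
```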

5. Assembly quality checks: CheckV

Before running this step, change the params config so that the CheckV database is downloaded first.

$ snakemake -p -j 36  --profile slurmProfile --until checkv
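CheckV writes a `quality_summary.tsv` whose `checkv_quality` column assigns each contig a tier (Complete, High-quality, Medium-quality, Low-quality, Not-determined). A quick tally per sample can help before choosing a threshold in step 6; this sketch locates the column by its header name rather than its position:

```bash
# Count contigs per CheckV quality tier in a quality_summary.tsv.
# The column is found by its header name (checkv_quality), not by position.
checkv_tiers() {
    awk -F'\t' '
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == "checkv_quality") col = i; next }
        col { count[$col]++ }
        END { for (q in count) print q "\t" count[q] }
    ' "$1" | sort
}
```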

6. Select good-quality contigs: checkv_select_nodes & filterContigs_on_checkv_quality

Contigs can be filtered on CheckV completeness. Define your completeness threshold in the CheckV params and run the rules below.

Skip to step 7 if you do not wish to filter.

checkv_select_nodes

$ snakemake -p -j 36  --profile slurmProfile --until checkv_select_nodes
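Selecting nodes by completeness comes down to a threshold on the `completeness` column of CheckV's `quality_summary.tsv`. A minimal sketch of that idea, not the rule's actual code; the default of 50 is only an example, and contigs whose completeness is `NA` are skipped:

```bash
# Print IDs of contigs whose CheckV completeness is >= a threshold (default 50).
# Columns are located by header name; rows with completeness "NA" are skipped.
select_nodes() {
    awk -F'\t' -v min="${2:-50}" '
        NR == 1 {
            for (i = 1; i <= NF; i++) {
                if ($i == "contig_id") id = i
                if ($i == "completeness") c = i
            }
            next
        }
        $c != "NA" && $c + 0 >= min { print $id }
    ' "$1"
}
```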

filterContigs_on_checkv_quality

$ snakemake -p -j 36  --profile slurmProfile --until filterContigs_on_checkv_quality
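Filtering the contigs then means keeping only the FASTA records whose IDs were selected. A minimal awk sketch, assuming one contig ID per line in the ID file and IDs that match the header text up to the first whitespace:

```bash
# Keep only FASTA records whose ID (header text up to the first whitespace)
# appears in an ID list with one ID per line.
# Usage: filter_fasta <ids.txt> <contigs.fasta> > filtered.fasta
filter_fasta() {
    awk 'FILENAME == ARGV[1] { keep[$1] = 1; next }      # first file: load the IDs
         /^>/ { id = substr($1, 2); out = (id in keep) } # header: decide for this record
         out' "$1" "$2"                                  # print header and sequence lines if kept
}
```

Comparing `FILENAME` with `ARGV[1]` (rather than the common `NR == FNR` idiom) keeps the function correct when the ID list is empty.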

CheckV analysis notebook: analyse_checkv.ipynb

7. Viral contig prediction: geNomad

$ snakemake -p -j 36  --profile slurmProfile --until genomad
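geNomad's `<prefix>_virus_summary.tsv` lists the predicted viral sequences together with a `virus_score`. If a stricter cut than geNomad's defaults is wanted, a sketch like this keeps the higher-scoring hits; the columns are found by header name and the 0.7 cutoff is just an example:

```bash
# Print "<seq_name><TAB><virus_score>" for rows of a geNomad virus summary
# whose virus_score is >= a cutoff (default 0.7, an arbitrary example).
genomad_hits() {
    awk -F'\t' -v min="${2:-0.7}" '
        NR == 1 {
            for (i = 1; i <= NF; i++) {
                if ($i == "seq_name") s = i
                if ($i == "virus_score") v = i
            }
            next
        }
        $v + 0 >= min { print $s "\t" $v }
    ' "$1"
}
```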

8. Viral contig identification: BLASTn

$ snakemake -p -j 36  --profile slurmProfile --until blastn
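BLASTn usually returns several hits per contig. A minimal sketch for keeping only the best hit per query, assuming the default `-outfmt 6` columns (query ID in column 1, bitscore in column 12); if the pipeline's blastn rule uses a custom `-outfmt`, the column numbers need adjusting:

```bash
# Keep the best hit (highest bitscore) per query from BLAST tabular output.
# Assumes default -outfmt 6: column 1 = query ID, column 12 = bitscore. Ties keep the first hit.
best_hits() {
    awk -F'\t' '
        !($1 in best) || $12 + 0 > best[$1] + 0 { best[$1] = $12; line[$1] = $0 }
        END { for (q in line) print line[q] }
    ' "$1" | sort -k1,1
}
```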

geNomad & BLASTn analysis notebook: analyse_geNomad_and_blast.ipynb

9. Metagenome binning: Metabat2

$ snakemake -p -j 36  --profile slurmProfile --until metabat2
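MetaBAT 2 writes each bin as a separate FASTA file (e.g. `bin.1.fa`). To compare its bins with VAMB's, or to join them to the CheckV and geNomad tables, a contig-to-bin table is handy. A sketch, assuming the bins use the `.fa` extension:

```bash
# Build a "<contig><TAB><bin>" table from a directory of bin FASTA files (e.g. bin.1.fa).
# The bin name is the file name without its .fa extension.
contig_to_bin() {
    local f
    for f in "$1"/*.fa; do
        [ -e "$f" ] || continue   # no bins: the glob matched nothing
        awk -v bin="$(basename "$f" .fa)" '/^>/ { print substr($1, 2) "\t" bin }' "$f"
    done
}
```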

9. Metagenome binning: VAMB

VAMB involves several steps:

  1. Concatenate all the contigs from metaSPAdes into a single file: concatenate_contigs
$ snakemake -p -j 36  --profile slurmProfile --until concatenate_contigs
  2. Create a minimap2 index of the concatenated contigs: minimap2Index
$ snakemake -p -j 36  --profile slurmProfile --until minimap2Index
  3. Map each sample's reads back to the concatenated contigs: minimap2
$ snakemake -p -j 36  --profile slurmProfile --until minimap2
  4. Generate sorted BAM files: samtools view & sort
$ snakemake -p -j 36  --profile slurmProfile --until samtools
  5. Finally, run VAMB
$ snakemake -p -j 36  --profile slurmProfile --until vamb
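The concatenation step matters because metaSPAdes restarts its `NODE_1`, `NODE_2`, ... numbering in every sample, so contig names collide once samples are combined. A minimal sketch of the idea, prefixing each header with the sample name and a separator. The sample names and the `C` separator are assumptions (VAMB can split bins per sample on such a separator via its `-o` option), and the pipeline's concatenate_contigs rule may do this differently:

```bash
# Concatenate per-sample contig files into one FASTA, prefixing every header with
# "<sample><sep>" so contig names stay unique across samples.
# Usage: concat_contigs <sep> <sample1>=<contigs1.fasta> [<sample2>=<contigs2.fasta> ...]
concat_contigs() {
    local sep="$1" pair sample file
    shift
    for pair in "$@"; do
        sample="${pair%%=*}"   # text before the first "="
        file="${pair#*=}"      # text after the first "="
        # ">NODE_1_length_..." becomes ">S1CNODE_1_length_..."; anything after the first space is dropped.
        awk -v prefix="${sample}${sep}" '/^>/ { print ">" prefix substr($1, 2); next } { print }' "$file"
    done
}
```

For example, `concat_contigs C S1=S1/contigs.fasta S2=S2/contigs.fasta > all_contigs.fasta` (hypothetical paths). The separator should not occur in the sample names, otherwise per-sample splitting becomes ambiguous.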

VAMB & Metabat2 bins analysis notebook: analyse_binning.ipynb

10. Taxonomic profiling: Sourmash

Sourmash also involves several steps.

Indexing and creating signatures may take a while, depending on the size of your data.

Sourmash parameters can be changed in the config as necessary.

  1. Generate signatures for the NCBI database
$ snakemake -p -j 36  --profile slurmProfile --until sourmash_sketch_ncbi
  2. Index the NCBI signatures to create an on-disk SBT
$ snakemake -p -j 36  --profile slurmProfile --until sourmash_index_ncbi
  3. Create signatures of the query sequences/contigs/reads
$ snakemake -p -j 36  --profile slurmProfile --until sourmash_sketch_dna
  4. Query the NCBI signatures/SBT using sourmash gather
$ snakemake -p -j 36  --profile slurmProfile --until sourmash_gather
  5. Generate a taxonomic report (Krona/Kraken) using sourmash tax metagenome
$ snakemake -p -j 36  --profile slurmProfile --until tax_metagenome
  6. Annotate the taxonomic report
$ snakemake -p -j 36  --profile slurmProfile --until tax_annotate
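Since each Sourmash rule consumes the output of the previous one, the six steps can also be chained in order once they work individually. A minimal sketch; `run_sourmash_steps` is a hypothetical helper that reuses the flags from the commands above and stops at the first failing rule:

```bash
# Hypothetical helper: run the Sourmash rules in order, stopping at the first failure.
run_sourmash_steps() {
    local rule
    for rule in sourmash_sketch_ncbi sourmash_index_ncbi sourmash_sketch_dna \
                sourmash_gather tax_metagenome tax_annotate; do
        echo "== $rule" >&2
        snakemake -p -j 36 --profile slurmProfile --until "$rule" || {
            echo "step $rule failed; fix it and rerun from this step" >&2
            return 1
        }
    done
}
```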

Sourmash analysis notebook: analyse_sourmash.ipynb
