J0bbie/VariantAnnotation_VEP

Workflow - Annotation using VEP


Introduction

Workflow for annotating (somatic) variants with the Ensembl Variant Effect Predictor (VEP) on GRCh37 or GRCh38.

Briefly, it will perform the following annotations:

  • Standard annotations (using custom GTF for GRCh37)
  • gnomAD Allele Frequencies - Genome and Exome (v2.1.1)
  • ClinVar
  • SingleLetterAA

The scripts within this workflow will generate a folder (cache) containing all the annotation files required for VEP and additional downstream analysis.

Installation of VEP

Install VEP and the required CPAN modules and plugins by following the VEP authors' instructions.

In addition, the following tools are also required:

  • bcftools
  • bgzip
  • tabix
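Before running the installer, it can help to confirm that these tools are available on the PATH. The following is a minimal sketch; `missing_tools` is a hypothetical helper and not part of this workflow:

```shell
# Print the name of every given tool that cannot be found on PATH.
# (Hypothetical helper for illustration; not shipped with this repository.)
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Check the three tools listed above and warn on stderr if any are absent.
missing=$(missing_tools bcftools bgzip tabix)
if [ -n "$missing" ]; then
  printf 'Missing required tools:\n%s\n' "$missing" >&2
fi
```

If nothing is printed, all three tools were found.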

Once all requirements are met, the following commands install the VEP cache for each genome build:

# Install cache for GRCh37
perl INSTALL.pl --NO_TEST --NO_HTSLIB --AUTO alcf --PLUGINS SingleLetterAA --CACHEDIR /mnt/onco0002/repository/general/annotation/VEP/GRCh37 --PLUGINSDIR /mnt/onco0002/repository/software/ensembl-vep/Plugins/GRCh37/ --CONVERT --SPECIES homo_sapiens_vep_104_GRCh37

# Install cache for GRCh38, this needs to be a separate folder.
perl INSTALL.pl --NO_TEST --NO_HTSLIB --AUTO alcf --PLUGINS SingleLetterAA --CACHEDIR /mnt/onco0002/repository/general/annotation/VEP/GRCh38 --PLUGINSDIR /mnt/onco0002/repository/software/ensembl-vep/Plugins/GRCh38/ --CONVERT --SPECIES homo_sapiens_vep_104_GRCh38

If the SingleLetterAA plugin is not found, download it into the Ensembl VEP folder with this command:

wget https://raw.github.com/Ensembl/VEP_plugins/release/104/SingleLetterAA.pm

Configuration of additional annotations

Additional annotation files are downloaded and processed by scripts/generateCache.R. The script writes all required files to a user-defined folder (the cache), which is then used during annotation.

Currently, the following (non-standard) annotations are added:

  • GENCODE
    • The default GENCODE version used by VEP for GRCh37 is fixed at v19. Hence, we use a VEP-friendly custom GTF (latest version of GENCODE) to overwrite the default annotations in case of discrepancy.
  • gnomAD
    • To add the gnomAD Allele Frequencies, the gnomAD VCFs (SNVs and InDels) are processed to greatly reduce their size. Only the genomic positions, REF, ALT and AF are retained.
  • ClinVar
    • Requires the ClinVar VCF files.
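To illustrate the kind of size reduction described for the gnomAD VCFs, here is a minimal sketch that keeps only the `AF` entry of the INFO column. It is an assumption about the general approach, not the actual logic of scripts/generateCache.R; `strip_to_af` is a hypothetical helper that reads an uncompressed VCF on standard input:

```shell
# Keep header lines unchanged; for each variant line, keep the fixed columns
# (CHROM, POS, ID, REF, ALT, QUAL, FILTER) and reduce INFO to its AF entry.
# INFO becomes "." when a record carries no AF value.
# (Hypothetical helper for illustration; not the workflow's own code.)
strip_to_af() {
  awk 'BEGIN { FS = OFS = "\t" }
       /^#/ { print; next }
       {
         af = "."
         n = split($8, info, ";")
         for (i = 1; i <= n; i++) if (info[i] ~ /^AF=/) af = info[i]
         print $1, $2, $3, $4, $5, $6, $7, af
       }'
}
```

A possible use with placeholder file names would be `bgzip -dc gnomad.vcf.gz | strip_to_af | bgzip > gnomad.AF.vcf.gz`, followed by `tabix -p vcf gnomad.AF.vcf.gz`. Note that this sketch leaves the header untouched, so it still declares INFO fields that no longer occur in the records.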

Generate the VEP command(s)

Using scripts/performVEP.R, we can generate the corresponding VEP command for either GRCh37 or GRCh38 with the respective (additional) annotations.

Example command (GRCh37):

Rscript --vanilla scripts/performVEP.R -b GRCh37 -i ~/test/CPCT02010257T.purple.somatic.vcf.gz -x /mnt/onco0002/repository/software/ensembl-vep/vep -g /mnt/onco0002/repository/software/ensembl-vep/Plugins/GRCh37/noChrPrefix_gencode.v38lift37.annotation.gtf.bgz -p /mnt/onco0002/repository/software/ensembl-vep/Plugins/ -c /mnt/onco0002/repository/general/annotation/VEP/ -f /mnt/onco0002/repository/general/annotation/VEP/GRCh37/homo_sapiens/104_GRCh37/Homo_sapiens.GRCh37.75.dna.primary_assembly.fa.gz

Example command (GRCh38):

Rscript --vanilla scripts/performVEP.R -b GRCh38 -i asd.vcf -x /mnt/onco0002/repository/software/ensembl-vep/vep -g /mnt/onco0002/repository/software/ensembl-vep/Plugins/GRCh38/noChrPrefix_gencode.v38.annotation.gff3.bgz -p /mnt/onco0002/repository/software/ensembl-vep/Plugins/ -c /mnt/onco0002/repository/general/annotation/VEP/ -f /mnt/onco0002/repository/general/annotation/VEP/GRCh38/homo_sapiens/104_GRCh38/Homo_sapiens.GRCh38.dna.toplevel.fa.gz
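Because the two example commands differ only in a few paths, the invocation can be assembled from variables. The sketch below is an illustration only: `perform_vep_cmd`, `VEP_ROOT` and `VEP_CACHE` are hypothetical names, the default paths are placeholders, and the flag meanings are inferred from the examples above rather than from the script's documentation.

```shell
# Print the performVEP.R invocation for one genome build.
# Flag meanings are inferred from the example commands:
#   -b build, -i input VCF, -x VEP executable, -g GENCODE annotation,
#   -p plugin folder, -c cache folder, -f reference FASTA.
# (Hypothetical helper; paths containing spaces are not handled.)
perform_vep_cmd() {
  build=$1   # GRCh37 or GRCh38
  input=$2   # input VCF
  gtf=$3     # custom GENCODE annotation file
  fasta=$4   # reference genome FASTA
  vep_root=${VEP_ROOT:-/path/to/ensembl-vep}
  cache_root=${VEP_CACHE:-/path/to/annotation/VEP}
  printf 'Rscript --vanilla scripts/performVEP.R -b %s -i %s -x %s -g %s -p %s -c %s -f %s\n' \
    "$build" "$input" "$vep_root/vep" "$gtf" "$vep_root/Plugins/" "$cache_root/" "$fasta"
}
```

The function only prints the command, so it can be inspected before it is run.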
