A pipeline for analyzing Oxford Nanopore 16S rRNA sequencing data from clinical samples.
- Free software: MIT license
MABA16S processes 16S rRNA sequencing data to classify reads at the genus and species levels, providing detailed taxonomic identification for clinical microbiology.
The pipeline uses the following tools:

- Filtlong: minimum read length of 1200 bp
- Kraken2: only genera with at least 50 reads are processed further
- extract_kraken_reads.py from KrakenTools
- Minimap2: alignment to SILVA reference sequences of the selected genera
- Samtools consensus
- BLASTn
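The 50-read genus threshold can be illustrated with a short Python sketch. It is not taken from MABA16S. The function name `genera_above_threshold` is hypothetical, and the sketch assumes two things: the standard tab-separated Kraken2 report layout (rank code third from last, indented name last), and that the threshold applies to clade read counts (second column) rather than to directly assigned reads.

```python
# Hypothetical sketch, not MABA16S code: select genera from a Kraken2 report.
from typing import Dict, Iterable

MIN_READS = 50  # genus threshold stated in the README


def genera_above_threshold(
    report_lines: Iterable[str], min_reads: int = MIN_READS
) -> Dict[str, int]:
    """Return {genus: clade read count} for genus rows with >= min_reads reads."""
    selected = {}
    for line in report_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        clade_reads = int(fields[1])  # reads in the clade rooted at this taxon
        rank = fields[-3]  # rank code; third from last with or without minimizer columns
        name = fields[-1].strip()  # names are indented with leading spaces
        if rank == "G" and clade_reads >= min_reads:
            selected[name] = clade_reads
    return selected
```

Only the exact rank code `G` is kept, so intermediate ranks such as `G1` are ignored.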
The workflow is as follows:

- Reads are classified at the genus level with Kraken2 and the SILVA database.
- The reads of each genus are extracted.
- Each genus read set is mapped to the first species of that genus in the SILVA database.
- A consensus sequence is generated and searched against the SILVA database with BLASTn to obtain a species-level identification.
- The results are compiled and written to a spreadsheet for each sample.
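The mapping step needs one reference sequence per genus. The sketch below shows one way to pick the first species of a genus from a SILVA FASTA file. It assumes SILVA-style headers (`>ACCESSION Domain;Phylum;...;Genus;Genus species`) in which the second-to-last rank is the genus. `read_fasta` and `first_reference_for_genus` are hypothetical names, not functions from the pipeline. Because SILVA distributes RNA sequences, the sketch also converts U to T so the reference matches DNA reads. Whether MABA16S does this is an assumption.

```python
# Hypothetical sketch, not MABA16S code: pick the first SILVA record of a genus.
# Assumes headers like ">ACCESSION Domain;Phylum;...;Genus;Genus species".
from typing import Iterable, Iterator, Optional, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header without '>', sequence) pairs from FASTA text lines."""
    header, seq = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)
            header, seq = line[1:], []
        else:
            seq.append(line)
    if header is not None:
        yield header, "".join(seq)


def first_reference_for_genus(
    lines: Iterable[str], genus: str
) -> Optional[Tuple[str, str, str]]:
    """Return (accession, species, DNA sequence) of the first record for `genus`."""
    for header, seq in read_fasta(lines):
        accession, _, taxonomy = header.partition(" ")
        ranks = [rank.strip() for rank in taxonomy.split(";")]
        # Assumption: the last rank is the species, the one before it the genus.
        if len(ranks) >= 2 and ranks[-2] == genus:
            # SILVA FASTA files store RNA; convert U to T for DNA read alignment.
            return accession, ranks[-1], seq.upper().replace("U", "T")
    return None
```

With a large SILVA file, pass an open file handle (for example `open(path)`) so that records are streamed instead of loaded all at once.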
To use this pipeline, you need Python 3.6 or higher, the conda environment manager, and Snakemake.
Usage
-----

.. code-block:: bash

    git clone https://github.com/MUMC-MEDMIC/MABA16S
    cd MABA16S/maba16s
    python cli.py snakemake -i folders_containing_nanopore16s_reads -o my_output_directory --cores 1
    # -i: the directories holding your Nanopore reads; output files are named after these directories
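Output names are derived from the input directory names, so two input directories with the same base name would map to the same sample. The following hypothetical pre-flight check in Python illustrates this. It is not part of MABA16S and assumes that the sample name is simply the directory's base name.

```python
# Hypothetical pre-flight check, not part of MABA16S: derive sample names the
# way the README describes (from input directory names) and reject clashes.
from pathlib import Path
from typing import Dict, Iterable


def sample_names(input_dirs: Iterable[str]) -> Dict[str, Path]:
    """Map each sample name (directory base name) to its input directory."""
    names = {}
    for d in input_dirs:
        path = Path(d)
        name = path.name  # Path drops trailing slashes, so "runs/s1/" -> "s1"
        if name in names:
            raise ValueError(
                "duplicate sample name {!r}: {} and {}".format(name, names[name], path)
            )
        names[name] = path
    return names
```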
The output directory has the following structure::

    my_output_directory/
    ├── kraken2/
    │   └── {sample}/
    │       ├── krakenreport_filtered.txt      # Filtered Kraken2 report
    │       ├── output.txt                     # Full Kraken2 classification output
    │       └── reads/                         # Genus-specific reads (FASTQ files)
    ├── kraken2consensus/
    │   └── {sample}/
    │       ├── reference_fastas/              # Reference FASTA files used for alignment
    │       ├── aligned_reads/                 # BAM files of aligned reads
    │       └── consensus_fastas/              # Consensus FASTA files
    ├── BLAST/
    │   └── {sample}/
    │       └── *_BLASTn.txt                   # BLASTn results for consensus sequences
    ├── QC/
    │   └── {sample}/
    │       ├── {sample}_qcPreprocessing.txt   # Preprocessing QC metrics
    │       └── {sample}_qcPostAnalysis.txt    # Extended QC metrics
    ├── reports/
    │   └── {sample}.xlsx                      # Comprehensive report for each sample
    └── sankeys/
        └── {sample}_sankey.html               # Interactive Sankey diagrams
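The per-sample `*_BLASTn.txt` files can be summarized by keeping the highest-scoring hit for each consensus sequence. This sketch assumes the files are BLAST tabular output (`-outfmt 6` with the default 12 columns, bit score last). The format MABA16S actually writes is not documented here, and `best_hits` is a hypothetical helper.

```python
# Hypothetical sketch, not MABA16S code: best BLASTn hit per consensus sequence.
# Assumes -outfmt 6 default columns:
# qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
from typing import Dict, Iterable, Tuple


def best_hits(blast_lines: Iterable[str]) -> Dict[str, Tuple[str, float, float]]:
    """Map each query ID to (subject ID, percent identity, bit score) of its top hit."""
    best = {}
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        cols = line.rstrip("\n").split("\t")
        query, subject = cols[0], cols[1]
        pident, bitscore = float(cols[2]), float(cols[11])
        # On equal bit scores the earlier row wins; BLAST lists hits best-first.
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, pident, bitscore)
    return best
```

The subject ID is a SILVA accession. To turn it into a species name, look it up in the SILVA FASTA headers.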
This package was created with Cookiecutter_ and the `audreyr/cookiecutter-pypackage`_ project template.

.. _Cookiecutter: https://github.com/audreyr/cookiecutter
.. _`audreyr/cookiecutter-pypackage`: https://github.com/audreyr/cookiecutter-pypackage