
bcgsc/AMPSeek


AMPSeek

Anti-microbial property, protein structure and toxicity prediction from peptide sequences

Introduction

Background and Rationale

As microbes develop resistance to current drugs, Antimicrobial Resistance (AMR) has become a global threat: the World Health Organization counts it among the biggest public health and development problems [1]. In 2019, almost 5 million deaths worldwide were associated with bacterial AMR [2]. The issue also extends beyond health and significantly affects the global economy. The World Bank estimates that global Gross Domestic Product could fall by 1.1% by 2050 if AMR is not brought under control [3], an effect comparable to that of the Great Recession [3]. Proteomics and bioinformatics can help here, as the scientific community searches for reliable alternatives to current drugs.

One promising alternative is Antimicrobial Peptides (AMPs), short peptides produced by both eukaryotic and prokaryotic organisms [4]. AMPs are active against a range of microbes, including bacteria, fungi, yeasts, and viruses [4]. Their main mode of action is lysis of the target organism [4]: they engage its cell membrane based on their charge, hydrophobicity, and 3D structure [5]. Because they act by engaging the cell membrane of the target organism, the likelihood of the target developing resistance is low [5]. Their broad activity against different organisms and low propensity to induce resistance make AMPs valid candidates for drug research.

Historically, in vivo and in vitro methods, including Nuclear Magnetic Resonance (NMR) spectroscopy, have been used to discover and evaluate AMPs [6]. However, these methods are both time-consuming and expensive [7]. Thanks to technological advances, the structure, activity, and toxicity of peptides can now be predicted in silico [8,9,10]. In silico prediction enables inexpensive, convenient, fast, and accurate analyses in AMR and AMP research. Identifying the properties of candidate peptides this way helps to detect and characterize possible AMP drugs and paves the way for new software that builds on these predictions.

Purpose and Main Software

AMPSeek is a pipeline that predicts the antimicrobial activity, 3D structure, and toxicity of given peptide sequences. It uses AMPlify [8] for activity prediction, LocalColabFold [9] for 3D structure prediction, and tAMPer [10] for toxicity prediction.

Pipeline

This pipeline is built on the Nextflow workflow manager [11] and requires git, conda, and Singularity (or Docker) to run. Installation instructions are available on each tool's website: Nextflow, git, conda, Singularity.

Pipeline Steps

The pipeline consists of the following steps:

  1. PREP: Downloads the peptide sequence FASTA file from the URL given with --download_from <url>.
  2. RUNAMPLIFY: Runs AMPlify on the input FASTA file and saves the AMP activity predictions.
  3. RUNCOLABFOLD: Runs LocalColabFold on the input FASTA file and saves the 3D structure predictions.
  4. RUNTAMPER: Runs tAMPer on the input FASTA file and the zipped structure files, and saves the resulting toxicity report.
  5. COMPILERESULTS: Compiles the results of steps 2, 3, and 4 into an interactive .html file.

[Figure: Pipeline diagram]

Main Images/Environments:

  1. AMPlify: "quay.io/biocontainers/amplify:1.1.0--hdfd78af_0" container image has been used. This contains AMPlify v1.1.0 and its dependencies.
  2. LocalColabFold: "biohpc/localcolabfold:1.5" container image has been used. This contains LocalColabFold v1.5.0 and its dependencies.
  3. tAMPer: "itsberkeucar/tamper:latest" container image has been used. This mainly contains tAMPer and its requirements.
  4. Visualization: "itsberkeucar/ampseek-visualization:latest" container image has been used. This mainly contains Jinja2, matplotlib, and pandas.

Pipeline Input

The pipeline takes a single input: a FASTA file containing the peptide sequences whose antimicrobial activity, structure, and toxicity are to be predicted.

The default input is the example file AMPSeek/data/AMPSeek_data_10.fasta, which contains 10 peptides taken from the AMPlify test files (AMPlify/data). Three more datasets, with 20, 100, and 200 peptides, are provided in the AMPSeek/data/alternative_data subfolder.
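Before running the pipeline, it can help to sanity-check the input FASTA file. The following is a minimal sketch, not part of AMPSeek: the helpers count_fasta_records and has_nonstandard_residues are hypothetical, and the sketch assumes sequences are written in uppercase using only the 20 standard one-letter amino acid codes. AMPlify and tAMPer may apply their own, different input restrictions.

```shell
#!/bin/sh
# Hypothetical helpers (not part of AMPSeek) for checking a peptide FASTA file.
# Assumption: sequences use only the 20 standard uppercase amino acid codes.

# count_fasta_records FILE: print the number of header lines ('>') in FILE.
count_fasta_records() {
  grep -c '^>' "$1"
}

# has_nonstandard_residues FILE: succeed (exit 0) if any sequence line contains
# a character outside the standard alphabet (lowercase letters, digits, '*',
# stray carriage returns, ...).
has_nonstandard_residues() {
  grep -v '^>' "$1" | grep -q '[^ACDEFGHIKLMNPQRSTVWY]'
}

# Example, from the repository root:
# count_fasta_records data/AMPSeek_data_10.fasta
# has_nonstandard_residues data/AMPSeek_data_10.fasta && echo "check the sequences"
```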

Users can download their input file from the internet with the --download_from flag:

nextflow AMPSeek.nf --download_from <url>

Alternatively, users can manually place the FASTA file they want to analyze in the AMPSeek/data folder. That folder must contain only the file to be run and nothing else.
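Because the data folder must hold only the input file, a quick pre-flight check can catch leftovers such as the example input. This is a minimal sketch with a hypothetical helper, require_single_input; it counts only regular files directly inside the folder, on the assumption that subfolders such as data/alternative_data (which ships with the repository) are ignored by the pipeline. Hidden files such as .DS_Store are counted.

```shell
#!/bin/sh
# Hypothetical helper (not part of AMPSeek): verify that the input folder holds
# exactly one regular file before launching the pipeline.
# Subdirectories are not counted; hidden files are.

# require_single_input DIR: return 0 only if DIR contains exactly one regular file.
require_single_input() {
  # tr strips the padding some wc implementations add to the count
  n=$(find "$1" -maxdepth 1 -type f | wc -l | tr -d ' ')
  if [ "$n" -ne 1 ]; then
    echo "expected exactly one input file in $1, found $n" >&2
    return 1
  fi
  return 0
}

# Example, from the repository root:
# require_single_input data && nextflow AMPSeek.nf -profile singularity
```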

The pipeline accepts peptide sequences of varying lengths and reports their activity, 3D structure, charge, and toxicity.

Intermediate Result Files and Folders:

The prediction and zipping stages generate several intermediate files. The intermediate prediction files for activity, 3D structure, and toxicity are stored in the AMPSeek/output folder, which contains:

  1. foldings folder: The output of the peptide structure prediction.
  2. *.tsv file: The AMPlify report for the run. The pipeline uses this file to build the AMP activity section of the final report.
  3. foldings/*.csv file: The tAMPer toxicity report. The pipeline uses this file to build the toxicity section of the final report.

The second and third files are deleted once the final result has been compiled, since it contains all of their relevant information.

Output File:

The final output of the pipeline is output/results.html, unless a custom output path or file name is given. The file contains the following information:

  1. Peptides properties: ID, Sequence, Length, and Charge
  2. Antimicrobial activity predictions: AMPlify prediction and score
  3. Toxicity predictions: tAMPer prediction and score
  4. Summary statistics and plots:
    • Pie charts of the bioactivity and toxicity distributions across the whole peptide batch.
    • AMPlify vs. tAMPer score scatter plot for peptides.
    • AMPlify attention distribution across amino acid residues.
    • Interactive tertiary structure prediction plots.
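The report lists each peptide's length and charge. The README does not say how the charge is computed, so the sketch below only illustrates one common simplification of net charge at neutral pH: +1 for each lysine (K) and arginine (R), -1 for each aspartate (D) and glutamate (E), ignoring histidine and the termini. AMPSeek's own calculation may differ, and peptide_stats is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper (not part of AMPSeek): print the ID, length, and an
# approximate net charge for every record in a peptide FASTA file.
# Assumption: charge = (#K + #R) - (#D + #E), a simplified neutral-pH estimate
# that ignores histidine and the N/C termini. Sequences are assumed uppercase;
# records spanning several lines are joined before counting.

# peptide_stats FILE: print "id<TAB>length<TAB>charge", one line per record.
peptide_stats() {
  awk '
    # Emit the stats for the record collected so far (if any).
    function report() {
      if (id == "") return
      pos = gsub(/[KR]/, "&", seq)   # gsub returns the number of matches
      neg = gsub(/[DE]/, "&", seq)
      printf "%s\t%d\t%d\n", id, length(seq), pos - neg
    }
    /^>/ { report(); id = substr($1, 2); seq = ""; next }
    { seq = seq $0 }
    END { report() }
  ' "$1"
}

# Example: peptide_stats data/AMPSeek_data_10.fasta
```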

Installation and Default Run

First, clone this repository to your local machine:

git clone https://github.com/berkeucar/AMPSeek.git

Next, change your directory to the project directory:

cd AMPSeek

Check if Singularity or Docker is installed properly:

singularity --version
docker --version

Check if Nextflow is installed properly:

nextflow -h
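To check all prerequisites at once, a small loop over the POSIX command -v builtin is enough. This is a minimal sketch with a hypothetical helper, missing_tools, which is not part of AMPSeek; pass singularity or docker depending on which profile you plan to use.

```shell
#!/bin/sh
# Hypothetical helper (not part of AMPSeek): report which prerequisites are
# not on PATH. Returns 0 only if every named tool was found.

# missing_tools TOOL...: print "missing: TOOL" for each tool that is not found.
missing_tools() {
  status=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      status=1
    fi
  done
  return "$status"
}

# Example (replace singularity with docker for the Docker profile):
# missing_tools nextflow git conda singularity
```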

Now you are ready to run the pipeline with the default inputs, using either the Singularity or the Docker profile:

nextflow AMPSeek.nf -profile singularity
nextflow AMPSeek.nf -profile docker

Alternatively, run the pipeline with --download_from and the URL your data should be downloaded from:

nextflow AMPSeek.nf -profile singularity --download_from <url> 

The input directory that the pipeline reads from can be changed with the --data_path flag. You can also set the output file name and location with the --output_file and --output_path flags, respectively.

The number of threads, the timeout, and the maximum memory can be set with the --threads, --time, and --mem flags, respectively.
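The flags above can be combined in a single run. The sketch below is a hypothetical wrapper, run_ampseek, not part of AMPSeek; the paths, file name, and thread count in the example are placeholders. It leaves out --time and --mem because the README does not specify the format their values should take. Setting the hypothetical DRY_RUN=1 variable prints the assembled command instead of running it.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of AMPSeek) combining the documented
# --data_path, --output_path, --output_file, and --threads flags.
# --time and --mem are omitted: their expected value format is not documented here.

# run_ampseek PROFILE DATA_DIR OUTPUT_DIR OUTPUT_FILE THREADS
run_ampseek() {
  profile=$1 data_dir=$2 out_dir=$3 out_file=$4 threads=$5
  set -- nextflow AMPSeek.nf -profile "$profile" \
    --data_path "$data_dir" --output_path "$out_dir" \
    --output_file "$out_file" --threads "$threads"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"   # print the assembled command line only
  else
    "$@"        # run Nextflow; call this from the repository root
  fi
}

# Example with placeholder values:
# run_ampseek singularity data/my_peptides results my_report.html 8
```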

Note: If you want to use your own data, you can place it in the AMPSeek/data folder manually, but you must first delete the example input (AMPSeek/data/AMPSeek_data_10.fasta).

References

[1] World Health Organization. (2023, November 21). Antimicrobial Resistance. https://www.who.int/news-room/fact-sheets/detail/antimicrobial-resistance

[2] Antimicrobial Resistance Collaborators (2022). Global burden of bacterial antimicrobial resistance in 2019: a systematic analysis. Lancet (London, England), 399(10325), 629–655. https://doi.org/10.1016/S0140-6736(21)02724-0

[3] World Bank Group. (2017). Drug-Resistant Infections: A Threat to Our Economic Future. https://documents1.worldbank.org/curated/en/323311493396993758/pdf/final-report.pdf

[4] Reddy, K. V. R., Yedery, R. D., & Aranha, C. (2004). Antimicrobial peptides: Premises and promises. International Journal of Antimicrobial Agents, 24(6), 536–547. https://doi.org/10.1016/j.ijantimicag.2004.09.005

[5] Lei, J., Sun, L., Huang, S., Zhu, C., Li, P., He, J., Mackey, V., Coy, D. H., & He, Q. (2019). The antimicrobial peptides and their potential clinical applications. American Journal of Translational Research, 11(7), 3919–3931.

[6] Porcelli, F., Ramamoorthy, A., Barany, G., & Veglia, G. (2013). On the Role of NMR Spectroscopy for Characterization of Antimicrobial Peptides. In G. Ghirlanda & A. Senes (Eds.), Membrane Proteins: Folding, Association, and Design (pp. 159–180). Humana Press. https://doi.org/10.1007/978-1-62703-583-5_9

[7] Wishart, D. S. (2019). NMR metabolomics: A look ahead. Journal of Magnetic Resonance, 306, 155–161. https://doi.org/10.1016/j.jmr.2019.07.013

[8] Li, C., Sutherland, D., Hammond, S. A., Yang, C., Taho, F., Bergman, L., … Birol, I. (2022). AMPlify: attentive deep learning model for discovery of novel antimicrobial peptides effective against WHO priority pathogens. BMC Genomics, 23(1), 77. https://doi.org/10.1186/s12864-022-08310-4

[9] Mirdita, M., Schütze, K., Moriwaki, Y., Heo, L., Ovchinnikov, S., & Steinegger, M. (2022). ColabFold: Making protein folding accessible to all. Nature Methods, 19(6), 679–682. https://doi.org/10.1038/s41592-022-01488-1

[10] bcgsc/tAMPer: tAMPer: antimicrobial peptides toxicity prediction. (n.d.). Retrieved December 14, 2023, from https://github.com/bcgsc/tAMPer

[11] Di Tommaso, P., Chatzou, M., Floden, E. W., Barja, P. P., Palumbo, E., & Notredame, C. (2017). Nextflow enables reproducible computational workflows. Nature Biotechnology, 35(4), 316–319. https://doi.org/10.1038/nbt.3820
