Workflow Overview

These documents describe a standardized Nextflow workflow that runs the Carafe tool to generate experiment-specific in silico spectral libraries for DIA data analysis. The source code for the workflow can be found at: https://github.com/mriffle/nf-carafe-ai-ms.

How to Run

This workflow uses the Nextflow standardized workflow platform. The Nextflow platform emphasizes ease of use, workflow portability, and containerization of the individual steps. To run this workflow, you do not need to install any of its software components yourself. There are no software libraries to install, no version incompatibilities to resolve, and no complex or fickle software to compile.

To run the workflow, you need only install Nextflow, which is relatively simple. To run the individual steps of the workflow on your own computer, you will also need to install Docker. Once both are installed, edit the pipeline configuration file to supply the locations of your data, then execute a simple Nextflow command, such as:

nextflow run -resume -r main mriffle/nf-carafe-ai-ms -c pipeline.config

The entire workflow will run automatically, downloading Docker images as necessary, and will write its output to the results directory. See How to Install the Workflow for more details on how to install Nextflow and Docker. See How to Run the Workflow for more details on how to run the workflow. And see Output & Results for more details on how to retrieve the results.
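
Because the only local prerequisites are Nextflow and Docker, it can be convenient to confirm that both are on your PATH before launching. The following is a minimal sketch, not part of the workflow itself; the function name check_tools is hypothetical and is introduced here only for illustration.

```shell
# Hypothetical helper (an assumption, not shipped with the workflow):
# report any required command that cannot be found on PATH.
check_tools() {
    missing=0
    for tool in "$@"; do
        # command -v exits non-zero when the command is not found
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    # Return 0 only when every requested tool was found
    return "$missing"
}
```

One possible use is to chain the check with the run command shown above, for example: check_tools nextflow docker && nextflow run -resume -r main mriffle/nf-carafe-ai-ms -c pipeline.config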

Workflow Components

The workflow is made up of the following software components, each of which may be run multiple times for different tasks.

  • PanoramaWeb (https://panoramaweb.org/home/project-begin.view)

    Users may optionally use WebDAV URLs as locations for input data files in PanoramaWeb. The workflow will automatically download files as necessary.

  • msconvert (https://proteowizard.sourceforge.io/)

    If users supply Thermo RAW files (.raw) as input, they will be converted to mzML using msconvert.

  • unzip (Bruker data extraction)

    If users supply Bruker .d.zip files as input, they will be unzipped to .d directories for processing. Bruker .d directories may also be supplied directly as input (local paths only; PanoramaWeb requires .d.zip files).

  • DIA-NN (https://github.com/vdemichev/DiaNN)

    DIA-NN (1.8.1) is used to generate data as input to Carafe. Newer versions (2.x) may be used by building a custom Docker image. See Using a Custom DIA-NN Version for instructions.

  • Carafe (https://github.com/Noble-Lab/Carafe)

    Carafe uses AI to generate an enhanced spectral library for the supplied FASTA.
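
The input handling described above for Thermo and Bruker files amounts to choosing a preparation step from the file name. The sketch below illustrates that decision only; it is an assumption about the logic, not the workflow's actual implementation, and the function name input_step is hypothetical. Matching is case-sensitive and follows the extensions named in this document.

```shell
# Hypothetical helper (illustration only): print the preparation step
# an input would need, judged purely by its file name.
input_step() {
    case "$1" in
        *.d.zip) echo "unzip" ;;      # zipped Bruker data: extract to a .d directory
        *.raw)   echo "msconvert" ;;  # Thermo RAW file: convert to mzML
        *.d)     echo "none" ;;       # Bruker .d directory: used directly
        *)       echo "unknown" ;;    # anything else is not covered by this sketch
    esac
}
```

For example, input_step sample01.raw prints msconvert, and input_step run01.d.zip prints unzip.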