Workflow Parameters

The workflow parameters should be included in a configuration file, an example of which can be found at https://raw.githubusercontent.com/mriffle/nf-carafe-ai-ms/main/resources/pipeline.config

The parameters in this file should be changed to indicate the locations of your data, the options you’d like to use for the software included in the workflow, and the capabilities and configuration for the system on which you are running the workflow steps.

The configuration file is roughly organized as:

params {
...
}

profiles {
...
}

mail {
...
}
  • The params section includes locations of data and configuration options for a specific run of the workflow.

  • The profiles section includes parameters that describe the capabilities of the systems that run the steps of the workflow. For example, if running on your local system, this will include how many cores and how much RAM may be used by the steps of the workflow. This will not need to be changed for each run of the workflow.

  • The mail section includes configuration options for sending email. This is optional and only necessary if you wish to send emails when the workflow completes. This will not need to be changed for each run of the workflow.
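
As a minimal sketch of what a filled-in params section might look like for a run on local data, consider the following. The parameter names are the ones documented on this page; the paths are hypothetical placeholders and must be replaced with the locations of your own files.

```groovy
params {
    // Hypothetical paths; replace with the locations of your own data.
    carafe_fasta_file = '/path/to/proteome.fasta'

    // Process every Thermo RAW file found in this directory.
    spectra_dir      = '/path/to/raw_files'
    spectra_dir_glob = '*.raw'
}
```

The profiles and mail sections can usually be copied from the example configuration file and adjusted once per system.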

Below is a complete description of all parameters that may be included in these sections.

Note

This workflow can process files stored in PanoramaWeb. When specifying directories or file locations, any paths that begin with https:// will be interpreted as being PanoramaWeb locations.

For example, to process a single raw file stored in PanoramaWeb, you would have the following in your pipeline.config file:

spectra_file = 'https://panoramaweb.org/_webdav/path/to/@files/RawFiles/my_file.raw'

To process multiple files from a PanoramaWeb directory:

spectra_dir = 'https://panoramaweb.org/_webdav/path/to/@files/RawFiles'
spectra_dir_glob = '*.raw'

In both cases, the URL is the WebDAV URL of the file or directory on the Panorama server.

Bruker data on PanoramaWeb: Only .d.zip files can be downloaded from PanoramaWeb (not .d directories). Use a glob pattern like *.d.zip when processing Bruker data from PanoramaWeb.
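
For instance, a sketch of the settings for zipped Bruker data might look like this. It reuses the placeholder directory URL from the example above, which is not a real location.

```groovy
// Placeholder WebDAV URL; substitute the folder that holds your .d.zip files.
spectra_dir = 'https://panoramaweb.org/_webdav/path/to/@files/RawFiles'

// Select only zipped Bruker runs; .d directories cannot be downloaded.
spectra_dir_glob = '*.d.zip'
```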

The params Section

Parameters for the params section

| Req? | Parameter Name | Description |
|------|----------------|-------------|
| | carafe_fasta_file | FASTA file used by Carafe to generate the final spectral library. |
| * | spectra_file | Path to a single spectra file or directory to process. Supported types: Thermo RAW (.raw), mzML (.mzML), Bruker raw directory (.d), or Bruker zipped raw (.d.zip). May be a local path, S3 URI, or PanoramaWeb URL. Note: Bruker .d directories cannot be downloaded from PanoramaWeb; use .d.zip files instead. Mutually exclusive with spectra_dir. |
| * | spectra_dir | Path to a directory containing spectra files (local path or PanoramaWeb WebDAV URL). Supported file types: .raw, .mzML, .d, or .d.zip. Note: Bruker .d directories cannot be downloaded from PanoramaWeb; use .d.zip files instead. Mutually exclusive with spectra_file. Use with spectra_dir_glob to select which files to process. |
| | spectra_dir_glob | Glob pattern to select files from spectra_dir. All matched files must be the same type (.raw, .mzML, .d, or .d.zip). Default: '*.raw' |
| | output_format | The final output format of the generated spectral library. Must be one of 'diann' or 'encyclopedia'. Default: 'diann' |
| | cli_options | Command line options to pass to Carafe. The default includes sensible settings for most general DIA searches. Do not set the -mode, -varMod, -maxVar, -ms, -db, -i, -se, -lf_type, or -device parameters; these are handled by the workflow. See https://github.com/Noble-Lab/Carafe for more details. |
| | include_phosphorylation | Set to true to include phosphorylation (STY) as a variable modification in the Carafe spectral library. Default: false. |
| | include_oxidized_methionine | Set to true to include oxidized methionine (M) as a variable modification in the Carafe spectral library. Default: false. |
| | max_mod_option | The maximum number of variable modifications allowed per peptide, specified as a Carafe CLI argument. Ignored if no variable modifications are enabled. Default: '-maxVar 1'. |
| | diann_fasta_file | The FASTA file used by DIA-NN. If not set, carafe_fasta_file will be used. Default: not set. |
| | diann_params | The command line parameters passed to DIA-NN. Default: '--unimod4 --qvalue 0.01 --cut \'K*,R*,!*P\' --reanalyse --smart-profiling' |
| | peptide_results_file | The path to a .tsv or .parquet file output by DIA-NN containing peptide identifications. If this parameter is set, the DIA-NN search is skipped and this file is used instead. Default: none (run DIA-NN). |
| | msconvert.do_demultiplex | If starting with raw files, this is the value used by msconvert for the do_demultiplex parameter. Default: true. |
| | msconvert.do_simasspectra | If starting with raw files, this is the value used by msconvert for the do_simasspectra parameter. Default: true. |
| | email | The email address to which a notification should be sent upon workflow completion. If no email is specified, no email will be sent. To send email, you must configure mail server settings (see below). |
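
To illustrate how several of these options combine, here is a sketch of a params section that overrides a few defaults. All paths are hypothetical, and the value '-maxVar 2' is an assumption modeled on the form of the documented default; the dotted msconvert assignment assumes the workflow reads the option from params.msconvert.

```groovy
params {
    // Hypothetical local inputs; substitute your own files.
    carafe_fasta_file = '/path/to/proteome.fasta'
    spectra_file      = '/path/to/my_file.raw'

    // Write the library in EncyclopeDIA format instead of the default 'diann'.
    output_format = 'encyclopedia'

    // Enable both variable modifications and allow up to two per peptide.
    // '-maxVar 2' is an assumed value that mirrors the default '-maxVar 1'.
    include_phosphorylation     = true
    include_oxidized_methionine = true
    max_mod_option              = '-maxVar 2'

    // Have DIA-NN search a different FASTA than the one given to Carafe.
    diann_fasta_file = '/path/to/search_database.fasta'

    // Disable demultiplexing in msconvert (only relevant for raw input).
    msconvert.do_demultiplex = false
}
```

Because spectra_file and spectra_dir are mutually exclusive, only one of them appears in the block.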

The profiles Section

The example configuration file includes this profiles section:

profiles {

    // "standard" is the profile used when the steps of the workflow are run
    // locally on your computer. These parameters should be changed to match
    // your system resources (that you are willing to devote to running
    // workflow jobs).
    standard {
        params.max_memory = '8.GB'
        params.max_cpus = 4
        params.max_time = '240.h'

        params.mzml_cache_directory = '/data/mass_spec/nextflow/nf-carafe-ai-ms/mzml_cache'
        params.panorama_cache_directory = '/data/mass_spec/nextflow/panorama/raw_cache'
    }
}

These parameters describe the capability of your local computer for running the steps of the workflow. Below is a description of each parameter:

Parameters for the profiles/standard section

| Req? | Parameter Name | Description |
|------|----------------|-------------|
| | params.max_memory | The maximum amount of RAM that may be used by steps of the workflow. Default: 8 gigabytes. |
| | params.max_cpus | The number of cores that may be used by the workflow. Default: 4 cores. |
| | params.max_time | The maximum amount of time a step in the workflow may run before it is stopped and an error is generated. Default: 240 hours. |
| | params.mzml_cache_directory | When msconvert converts a RAW file to mzML, the mzML file is cached for future use. This specifies the directory in which the cached mzML files are stored. |
| | params.panorama_cache_directory | If the RAW files to be processed are in PanoramaWeb, the RAW files will be downloaded to and cached in this directory for future use. |
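
As a sketch of how these values might be adapted, the following assumes a workstation with 32 cores and 128 GB of RAM where some capacity is left free for other work. The machine size, the chosen limits, and the cache paths are all illustrative assumptions rather than recommendations from the workflow authors.

```groovy
profiles {
    standard {
        // Assumed machine: 32 cores, 128 GB RAM; leave headroom for the OS.
        params.max_memory = '96.GB'
        params.max_cpus   = 24
        params.max_time   = '48.h'

        // Hypothetical cache locations on a disk with ample free space.
        params.mzml_cache_directory     = '/scratch/nf-carafe-ai-ms/mzml_cache'
        params.panorama_cache_directory = '/scratch/panorama/raw_cache'
    }
}
```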

The mail Section

This is a more advanced and entirely optional set of parameters. When the workflow completes, it can optionally send an email to the address specified above in the params section. For this to work, the following parameters must be changed to match the settings of your email server. You may need to contact your IT department to obtain the appropriate settings.

The example configuration file includes this mail section:

mail {
    from = 'address@host.com'
    smtp.host = 'smtp.host.com'
    smtp.port = 587
    smtp.user = 'smtp_user'
    smtp.password = 'smtp_password'
    smtp.auth = true
    smtp.starttls.enable = true
    smtp.starttls.required = false
    smtp.ssl.protocols = 'TLSv1.2'
}

Below is a description of each parameter:

Parameters for the mail section

| Req? | Parameter Name | Description |
|------|----------------|-------------|
| | from | The email address from which the email should be sent. |
| | smtp.host | The internet address (host name or IP address) of the SMTP server. |
| | smtp.port | The port on the host to connect to. Most likely 587. |
| | smtp.user | If authentication is required, this is the username. |
| | smtp.password | If authentication is required, this is the password. |
| | smtp.auth | Whether (true or false) authentication is required. |
| | smtp.starttls.enable | Whether or not to enable TLS support. |
| | smtp.starttls.required | Whether or not TLS is required. |
| | smtp.ssl.protocols | SSL/TLS protocol to use for sending SMTP messages, for example 'TLSv1.2'. |
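
Since a notification is only sent when both the email parameter and the mail server settings are present, the two pieces might be combined as in the following sketch. The addresses, host name, and credentials are hypothetical placeholders, and requiring TLS is an assumption that your server may not support.

```groovy
params {
    // Hypothetical recipient of the completion notice.
    email = 'you@example.org'
}

mail {
    // Hypothetical sender and server; obtain real values from your IT department.
    from          = 'workflow@example.org'
    smtp.host     = 'smtp.example.org'
    smtp.port     = 587

    // Credentials are needed only if the server requires authentication.
    smtp.auth     = true
    smtp.user     = 'smtp_user'
    smtp.password = 'smtp_password'

    // Upgrade the connection with STARTTLS and refuse to send without it.
    smtp.starttls.enable   = true
    smtp.starttls.required = true
}
```

Leaving email unset disables notifications regardless of what the mail section contains.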