QuantMS pipeline (DIANN, nf-core updates, documentation, benchmarking)
pipelines - ZS Copenhagen and online
QuantMS pipeline
The goal is to take stock of the current state of the pipeline: compatibility with nf-core, and preparing local modules for nf-core/modules (which can still be patched for local updates). We, as the organizing team, are interested in running DIA-NN through the workflow, which can be done either via QuantMS or via the nf-core/dia_proteomics_analysis subworkflow.
Additionally, the new tools nf-docs and nf-metro can be explored during the hackathon
for documentation and deployment instructions.
Lastly, for those who would rather explore datasets, benchmarking of microbial datasets can be done.
DIA-NN updates
- document how to use a DIA-NN image using a private registry
- DIA-NN updates: https://github.com/bigbio/quantms/issues/663
- try to use ext.args, see PR660
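The private-registry image and the ext.args items above can be sketched together in one custom configuration file. This is a minimal sketch, not the pipeline's actual configuration. It assumes the targeted process reads task.ext.args (the usual nf-core module convention). The process selector, registry host, image name, tag and the example DIA-NN flag are placeholders.

```nextflow
// custom.config: pass it with `nextflow run <pipeline> -c custom.config`
// ASSUMPTIONS (not taken from quantms): the process selector, the registry
// host, the image name/tag and the example flag are placeholders.

process {
    // The selector matches the process name (or alias) used in the workflow.
    withName: 'DIANN_PRELIMINARYANALYSIS' {
        // Extra command-line options; they only take effect if the module
        // interpolates task.ext.args into its script block.
        ext.args  = '--qvalue 0.01'

        // Fully qualified image name pointing at a private registry.
        container = 'registry.example.org/proteomics/diann:TAG'
    }
}

docker {
    enabled = true
}
```

Authentication against the private registry happens outside Nextflow, for example with `docker login registry.example.org` on the machine that pulls the image.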
Subworkflows
Process names are not aligned, but I mapped them one to one in order.
- DIANN subworkflow in the nf-core/modules repo at subworkflows/nf-core/dia_proteomics_analysis:

    include { QUANTMSUTILS_DIANNCFG } from '../../../modules/nf-core/quantmsutils/dianncfg/main'
    include { QUANTMSUTILS_MZMLSTATISTICS } from '../../../modules/nf-core/quantmsutils/mzmlstatistics/main'
    include { QUANTMSUTILS_DIANN2MZTAB } from '../../../modules/nf-core/quantmsutils/diann2mztab/main'
    include { DIANN as DIANN_INSILICOLIBRARYGENERATION } from '../../../modules/nf-core/diann/main'
    include { DIANN as DIANN_PRELIMINARYANALYSIS } from '../../../modules/nf-core/diann/main'
    include { DIANN as DIANN_ASSEMBLEEMPIRICALLIBRARY } from '../../../modules/nf-core/diann/main'
    include { DIANN as DIANN_INDIVIDUALANALYSIS } from '../../../modules/nf-core/diann/main'
    include { DIANN as DIANN_FINALQUANTIFICATION } from '../../../modules/nf-core/diann/main'

- DIANN under local modules in bigbio/quantms. Process names are not aligned, but I mapped them one to one in order, so the files can be compared to the ones in the nf-core/modules repo:

    include { GENERATE_CFG } from '../modules/local/diann/generate_cfg/main'
    include { MSSTATS_LFQ } from '../modules/local/msstats/msstats_lfq/main'
    include { CONVERT_RESULTS } from '../modules/local/diann/convert_results/main'
    include { INSILICO_LIBRARY_GENERATION } from '../modules/local/diann/insilico_library_generation/main'
    include { PRELIMINARY_ANALYSIS } from '../modules/local/diann/preliminary_analysis/main'
    include { ASSEMBLE_EMPIRICAL_LIBRARY } from '../modules/local/diann/assemble_empirical_library/main'
    include { INDIVIDUAL_ANALYSIS } from '../modules/local/diann/individual_analysis/main'
    include { FINAL_QUANTIFICATION } from '../modules/local/diann/final_quantification/main'
Compare with, and take inspiration from, Jonathan Manning's way of writing modules and subworkflows?
Updates to nf-core
To get familiar with nf-core templates and requirements, one could try to move
some tools to the nf-core/modules repo for others to use. Candidates are any in
subworkflows/local, modules/local, or modules/bigbio.
One could also switch to modules that have a local version here but are maintained by others
in the nf-core/modules repo. For example:
- Update ThermoRawFileParser (C#) to use the nf-core/modules/thermorawfileparser version instead of modules/bigbio/thermorawfileparser
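Once the module from nf-core/modules is installed, the switch mostly amounts to changing one include statement. The sketch below assumes that the nf-core module follows the usual convention of naming the process after the module (THERMORAWFILEPARSER) and that the including file sits one directory below the repository root. The name of the local process being replaced is not stated above, so it is left elided.

```nextflow
// Hypothetical excerpt from a workflow file one level below the repo root.
// ASSUMPTION: process name and relative path follow nf-core conventions;
// check both against the installed module before relying on them.

// Before: locally maintained copy
// include { ... } from '../modules/bigbio/thermorawfileparser/main'

// After: shared module from nf-core/modules
include { THERMORAWFILEPARSER } from '../modules/nf-core/thermorawfileparser/main'
```

The input and output channels of the two versions may differ, so the call site probably needs to be adapted as well.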
Exercise: Add a module to nf-core/modules
- if the process is based on a Python or conda package, Wave allows easy containerization
- nf-test tests need to be added
Useful hints.
List of candidates (tbc)
- pmultiqc (Python)
- msstats (R)
nf-core lint
The .nf-core.yaml file deactivates some lint checks. Check which ones and how.
Run

    # in quantms repo
    nf-core pipelines lint -d .

nf-docs
Add or use ewels/nf-docs
nf-metro
Add a new metro-map based on a configuration file: pinin4fjords/nf-metro
- maybe add to deployment instructions (manual updates or actions)
Comparisons using PRIDE: DIA datasets
Run experiments and compare the outputs to the results provided on PRIDE, to become familiar with running quantms. This should be supplemented with in-house data, which now consists entirely of DIA experiments on Bruker instruments.
- PXD054415 - comparing DDA and DIA on a metaproteomics dataset with known compositions
- could use a subset of samples
- SDRF
- PXD049262
- growth experiment
- photosynthetic metabolism of purple sulphur bacteria Halorhodospira halophila
- cultivated with various sulphur compounds
- SDRF
Included benchmark dataset in quantms
Mentioned as an example for DIA
Performance benchmarking
- running quantms on a single machine, on a single VM, on Azure Batch, and on HPC clusters with Apptainer:
- runtime, costs, etc.
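To keep runtimes comparable across these environments, Nextflow's built-in trace, report and timeline outputs can be enabled with the same configuration on every platform. The sketch below uses placeholder output paths, and the selection of trace fields is only a suggestion. Pipelines built on the nf-core template typically already write similar files to a pipeline_info directory.

```nextflow
// benchmark.config: pass it with `-c benchmark.config` on every platform
// ASSUMPTION: the 'benchmark/' output paths are placeholders.

trace {
    enabled   = true
    overwrite = true
    file      = 'benchmark/trace.txt'
    // per-task wall-clock time, CPU usage and peak memory
    fields    = 'name,status,exit,realtime,%cpu,peak_rss'
}

report {
    enabled   = true
    overwrite = true
    file      = 'benchmark/report.html'
}

timeline {
    enabled   = true
    overwrite = true
    file      = 'benchmark/timeline.html'
}
```

Costs are not recorded in these files, so they would have to be derived separately, for example from the billing information of the cloud provider.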
DIANN docker files
- update the Docker container to the latest version of DIA-NN and test it with Apptainer: bigbio/quantms-containers