Workflow

Overview

digraph flow {
  rankdir=TB
  compound=true

  subgraph processes {
    node[shape=square]
    buildfastadatabase[label="Build\nFasta\nDatabase"]
    comet[label="Comet"]
    comet_and_tandem[label="Combine"]
    tandem
    diaumpire[label="Generate\n Pseudospectra"]
    spectrast
    find_swath_windows[label="Finding\nSwath\nwindows"]
    create_transitions[label="Create\nTransitions"]
    osw
    pyprophet_legacy[label="Pyprophet"]
    feature_alignment[label="Feature\nAlignment"]
    swath2stats
  }

  subgraph data {
    node[shape=circle]
    subgraph inputs {
      rank=same
      fasta[label="Fasta\nFiles"]
      dda
      dia
    }
  }

  subgraph cluster_spectra {
    label="Spectra"
    style=dashed
    diaumpire
    dda
    choose_spectra[shape=triangle,label="or"]
  }

  subgraph cluster_spectral_libgen {
    label="Creating The Spectral Library";
    style=dashed;
    // searching_inputs[style=invis,shape=point,fixedsize=true,label="",size=0,shape=point,height=0,width=0]
    subgraph cluster_searching {
      label="Searching"
      style=dashed
      comet
      tandem
      {comet tandem}->comet_and_tandem
    }
    spectrast
  }
  // searching_inputs -> {tandem comet}[ltail=cluster_searching]

  subgraph cluster_diapart {
    // edge[constraint=false]
    style=dashed
    label="DIA Analysis"
    create_transitions
    osw
    find_swath_windows
    pyprophet_legacy
    feature_alignment
    swath2stats
  }
  // choose_spectra->searching_inputs[lhead=cluster_searching ,ltail=cluster_spectra]
  // buildfastadatabase->searching_inputs[lhead=cluster_searching]

  // these are the per-sample steps
  {
    edge[style="dashed,bold"]
    { choose_spectra } -> {comet tandem}
    { dia -> diaumpire }
    diaumpire->choose_spectra
    dda->choose_spectra
    dia ->osw
  }

  // these are the all-samples-combined-steps
  {
    spectrast->create_transitions
    dia->find_swath_windows
    comet_and_tandem -> spectrast
    buildfastadatabase -> spectrast
    fasta->buildfastadatabase
    osw -> pyprophet_legacy -> feature_alignment -> swath2stats
  }

  // these are steps that are not related to samples
  {
    { buildfastadatabase -> {comet tandem} }
    find_swath_windows->{osw,create_transitions}
    create_transitions -> osw
  }
}
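To make the step ordering in the graph above easier to follow, the sketch below restates its edges as a Python dependency table and derives one valid execution order. It is only an illustration: the node names are copied from the graph, but the table ignores the per-sample versus all-samples distinction, and the function name `execution_order` is hypothetical and not part of the workflow.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each key is a step; its value is the set of steps or inputs it depends on.
# The edges mirror the arrows of the graph above.
DEPENDENCIES = {
    "buildfastadatabase": {"fasta"},
    "diaumpire": {"dia"},
    "choose_spectra": {"diaumpire", "dda"},
    "comet": {"choose_spectra", "buildfastadatabase"},
    "tandem": {"choose_spectra", "buildfastadatabase"},
    "comet_and_tandem": {"comet", "tandem"},
    "spectrast": {"comet_and_tandem", "buildfastadatabase"},
    "find_swath_windows": {"dia"},
    "create_transitions": {"spectrast", "find_swath_windows"},
    "osw": {"dia", "find_swath_windows", "create_transitions"},
    "pyprophet_legacy": {"osw"},
    "feature_alignment": {"pyprophet_legacy"},
    "swath2stats": {"feature_alignment"},
}


def execution_order(dependencies):
    """Return one valid order in which the steps could run."""
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    for step in execution_order(DEPENDENCIES):
        print(step)
```

Several orders are valid, since independent branches (for example Comet and X! Tandem) can run in either order or in parallel; the sorter simply returns one of them.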

Software dependencies

  • Nextflow 25.10.4 (>= 22.10)

  • Java 17 (>= 17, <= 25)
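The version ranges above can be checked mechanically. The following is a minimal sketch that assumes plain dotted numeric version strings and treats an upper bound such as `25` as covering every `25.x` release; that reading of the bound is an assumption, and the helper names `parse_version` and `satisfies` are hypothetical.

```python
def parse_version(text):
    """Turn a dotted version such as '25.10.4' into the tuple (25, 10, 4)."""
    return tuple(int(part) for part in text.split("."))


def satisfies(version, minimum=None, maximum=None):
    """Check minimum <= version <= maximum, with both bounds optional.

    The upper bound is compared on its own precision only, so a maximum
    of '25' accepts '25.0.1' (an assumed interpretation of '<= 25').
    """
    parsed = parse_version(version)
    if minimum is not None and parsed < parse_version(minimum):
        return False
    if maximum is not None:
        upper = parse_version(maximum)
        if parsed[:len(upper)] > upper:
            return False
    return True


if __name__ == "__main__":
    print(satisfies("25.10.4", minimum="22.10"))           # Nextflow range
    print(satisfies("17.0.9", minimum="17", maximum="25"))  # Java range
```

How the installed versions are obtained is left out on purpose; the strings passed in here are illustrative.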

Experimental spectra in proprietary formats must be converted to .mzML or .mzXML before they are passed to the workflow, for example with ProteoWizard msconvert 3.0.21354.
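As a rough sketch of that conversion step, the helper below builds an msconvert command line and runs it. It assumes `msconvert` is on the `PATH` and uses its `--mzML`, `--mzXML` and `-o` options; the file and directory names are hypothetical, as are the function names.

```python
import subprocess
from pathlib import Path


def build_msconvert_command(raw_file, out_dir, fmt="mzML"):
    """Build the msconvert argument list for one vendor file."""
    if fmt not in ("mzML", "mzXML"):
        raise ValueError(f"unsupported output format: {fmt}")
    # --mzML / --mzXML select the output format, -o sets the output directory.
    return ["msconvert", str(raw_file), f"--{fmt}", "-o", str(out_dir)]


def convert(raw_file, out_dir, fmt="mzML"):
    """Create the output directory and run msconvert, raising on failure."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(build_msconvert_command(raw_file, out_dir, fmt), check=True)


if __name__ == "__main__":
    # Hypothetical input file; print the command instead of running it.
    print(" ".join(build_msconvert_command("sample.raw", "converted")))
```

Passing the arguments as a list rather than a shell string avoids quoting problems with file names that contain spaces.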

Software components

The following software packages are used internally:

  • Biopython SeqIO – for reading and writing of .fasta files.

  • OpenMS 3.4.1

  • ProteoWizard msconvert 3.0.22088 – performs conversion between .mgf, .mzML and .mzXML.

  • Trans-Proteomic Pipeline (TPP) 6.1.9

    • Comet 2022.01.0

    • X! Tandem 2017.02.01.4

    • InteractParser – combines multiple files produced by Comet and X! Tandem

    • InterProphetParser – combines results from Comet and X! Tandem

    • SpectraST

    • Comet and X! Tandem predict spectra from a database of peptide sequences and try to match experimental MS/MS spectra to them. The search results (PSMs) from both tools are combined and passed to SpectraST, which searches the experimental spectra again to validate the search results and build a spectral database.

  • DIA-Umpire SE 2.2.8 – performs deconvolution of raw spectra into pseudospectra.

  • Mayu – determines protein and peptide identification false discovery rates.

  • spectrast2tsv.py from msproteomicstools/msproteomicstools – converts the spectral library into a .tsv file accepted by the OpenSWATH Workflow.

  • OpenSWATH Workflow

  • SWATH2stats 1.31.0 – transforms extracted SWATH/DIA data into a format directly usable by statistics packages.

  • PyProphet 2.2.5 and 0.24.1 – performs statistical validation of the results; the version used is selected by the pyprophet_use_legacy switch.

  • Python 3.9.9

  • Java OpenJDK 11.0.15