Configuration#
The main configuration file needs to be created by the user. An example is available below. Alternatively, all the switches can be passed on the command line.
Main configuration file#
Template#
Example configuration that can be saved as gladiator-config.nf:
params {
fragment_mass_tolerance = 0.02 // fragment mass tolerance in Dalton, 0.02 is a sensible default
precursor_mass_tolerance = 10 // in ppm (parts per million)
protFDR = 0.01 // passed to mayu as cutoffrate for finding the peptide probability
max_missed_cleavages = 1
pyprophet_subsample_ratio = null // when set to null, it resolves to subsample ratio of 1 / {number of samples}
// DIA-Umpire, Comet and X! Tandem configuration can be customized by making copies
// of 'config/diaumpire.params', 'config/comet.params' or 'config/xtandem.xml'.
//
// The custom files are not used until their location is specified, for example:
//
// diaumpireconfig = "./diaumpire.params"
// comet_template = "./comet.params"
// xtandem_template = "./xtandem.xml"
}
The configuration file is passed to nextflow explicitly:
nextflow -c gladiator-config.nf ...
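The null default of pyprophet_subsample_ratio in the template above resolves to 1 / number of samples. A minimal Python sketch of that rule follows. The function name resolve_subsample_ratio is hypothetical and not part of glaDIAtor-nf, which may implement the rule differently.

```python
def resolve_subsample_ratio(configured, sample_count):
    """Return the effective subsample ratio (hypothetical helper).

    configured: a number, or None to mirror `null` in the Nextflow config.
    sample_count: number of samples in the run.
    """
    if sample_count < 1:
        raise ValueError("sample_count must be at least 1")
    if configured is None:
        # null translates to 1 / number of samples
        return 1.0 / sample_count
    return float(configured)


print(resolve_subsample_ratio(None, 4))  # ratio derived from the sample count
print(resolve_subsample_ratio(0.5, 4))   # explicit value is used as given
```

With four samples, leaving the parameter at null gives a ratio of 0.25, while an explicit value such as 0.5 is passed through unchanged.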
Options#
fragment_mass_tolerance
    passed to Comet as fragment_bin_tol
    passed to X! Tandem as "spectrum, fragment monoisotopic mass error"
    default: 0.02
precursor_mass_tolerance
    passed to Comet as peptide_mass_tolerance
    passed to X! Tandem as "spectrum, parent monoisotopic mass error {plus,minus}"
    default: 10
protFDR
    passed to Mayu as cutoffrate
    default: 0.01
irt_traml_file
    (optional) a URL to a .TraML file containing retention times (ftp://..., https://...)
use_irt
    enables use of irt_traml_file
    default: false
max_missed_cleavages
    passed to Comet as allowed_missed_cleavage
    passed to X! Tandem as "scoring, maximum missed cleavage sites"
    default: 1
libgen_method
    selects the software package used for generation of the peptide library
    enum: diaumpire, dda, custom
    default: diaumpire
pyprophet_use_legacy
    enables the older PyProphet 0.24.1; otherwise PyProphet 2.2.5 is used
    default: false
pyprophet_fixed_seed
    passed to pyprophet score, for more reproducible results
    default: false
pyprophet_subsample_ratio
    subsample ratio, a number or null
    null is a special value that translates to 1 / number of samples
    default: null
fastafiles
    path to the input files containing protein sequences
    default: fasta/*.fasta
sdrf
    path to a .sdrf metadata file
    The following configuration parameters of glaDIAtor-nf might be overridden by what is found in .sdrf:
        fragment_mass_tolerance, based on comment[fragment mass tolerance]
            Fragment mass tolerance needs to be given in Da and be consistent across samples. A missing unit is interpreted as Da.
        precursor_mass_tolerance, based on comment[precursor mass tolerance]
            Precursor mass tolerance needs to be given in ppm and be consistent across samples. A missing unit is interpreted as ppm.
        diafiles, based on comment[file uri]
            DIA files collected from .sdrf are added to the list of diafiles.
    default: null
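The unit rules for the .sdrf overrides can be illustrated with a short Python sketch. It is an illustration under stated assumptions, not the parser used by glaDIAtor-nf: the helper names parse_tolerance and consistent_tolerance are hypothetical, and values are assumed to look like `0.02 Da` or `10 ppm` (a number followed by an optional unit).

```python
import re


def parse_tolerance(value, expected_unit):
    """Parse e.g. '10 ppm' or '0.02'; a missing unit is taken as expected_unit."""
    match = re.fullmatch(r"\s*([0-9]*\.?[0-9]+)\s*([A-Za-z]*)\s*", value)
    if match is None:
        raise ValueError(f"cannot parse tolerance: {value!r}")
    number = float(match.group(1))
    unit = match.group(2) or expected_unit
    if unit.lower() != expected_unit.lower():
        raise ValueError(f"expected {expected_unit}, got {unit}")
    return number


def consistent_tolerance(values, expected_unit):
    """Return the single tolerance shared by all samples, or raise ValueError."""
    parsed = {parse_tolerance(v, expected_unit) for v in values}
    if len(parsed) != 1:
        raise ValueError(f"tolerance differs across samples: {sorted(parsed)}")
    return parsed.pop()


print(consistent_tolerance(["0.02 Da", "0.02"], "Da"))
print(consistent_tolerance(["10 ppm", "10"], "ppm"))
```

A value carrying the wrong unit (for example `10 Da` where ppm is required) or differing between samples raises an error in this sketch, mirroring the requirement that tolerances be consistent across samples.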
Fine-tuning#
Make a copy of config/diaumpire.params, config/comet.params or config/xtandem.xml to customize the configuration of related components.
It is also necessary to pass their locations when calling nextflow, using the --diaumpireconfig="./diaumpire.params", --comet_template="./comet.params" and --xtandem_template="./xtandem.xml" parameters.
DIA-Umpire#
Our default configuration file
config/diaumpire.params
Thread = @PROCESS_THREAD_COUNT@
#Precursor-fragments grouping parameters
RPmax = 25
RFmax = 300
CorrThreshold = 0.2
DeltaApex = 0.6
RTOverlap = 0.3
#Fragment intensity adjustments
# change BoostComplementaryIon if later using database search results to build libraries for Skyline/OpenSWATH
## [2023-05-30 Tue]
## what did the original gladiator author mean by this?
## he forgot.
## in dia-umpire repo example BoostComplementaryIon is True.
AdjustFragIntensity = true
BoostComplementaryIon = true
#Export detected MS1 features (output feature file can be loaded and mapped to RAW data in BatMass)
ExportPrecursorPeak = false
#Signal extraction: mass accuracy and resolution
# resolution parameter matters only for data generated in profile mode
SE.MS1PPM = 15
SE.MS2PPM = 25
SE.Resolution = 60000
#Signal extraction: signal to noise filter
SE.SN = 1.1
SE.MS2SN = 1.1
#Signal extraction: minimum signal intensity filter
# for Thermo data, filtering is usually not necessary. Set SE.EstimateBG to false and SE.MinMSIntensity and SE.MinMSMSIntensity to a low value, e.g. 1
# for older Q Exactive data, or when too many MS1 features are extracted, set SE.EstimateBG to yes (or apply SE.MinMSIntensity and SE.MinMSMSIntensity values based on BatMass visualization)
SE.EstimateBG = true
SE.MinMSIntensity = 1
SE.MinMSMSIntensity = 1
#Signal extraction: peak curve detection and isotope grouping
# for older Q Exactive data, or when too many MS1 features are extracted, set SE.NoMissedScan to 1
SE.NoMissedScan = 2
SE.MaxCurveRTRange = 2
SE.RemoveGroupedPeaks = true
SE.RemoveGroupedPeaksRTOverlap = 0.3
SE.RemoveGroupedPeaksCorr = 0.3
SE.MinNoPeakCluster = 2
SE.MaxNoPeakCluster = 4
#Signal extraction: filtering of MS1 features
# if interested in modified peptides, increase MassDefectOffset parameter, or set SE.MassDefectFilter to false
SE.IsoPattern = 0.3
SE.MassDefectFilter = true
SE.MassDefectOffset = 0.1
#Signal extraction: other
SE.StartCharge = 1
SE.EndCharge = 5
SE.MS2StartCharge = 2
SE.MS2EndCharge = 5
SE.MinFrag=10
SE.StartRT = 0
SE.EndRT = 9999
SE.MinMZ = 200
SE.MinPrecursorMass = 600
SE.MaxPrecursorMass = 5000
#Isolation window setting
#The current version supports the following window type: SWATH (fixed window size), V_SWATH (variable SWATH window), MSX, MSE, pSMART
WindowType=SWATH
#Fix window size (For SWATH)
# for Thermo data, this will be determined from raw data automatically
#WindowSize=15
#Variable SWATH window setting (start m/z, end m/z, separated by Tab)
# for Thermo data, this will be determined from raw data automatically
#==window setting begin
#==window setting end
Compared to the example found at Nesvilab/DIA-Umpire, the following modifications are present:
- Thread = 6
+ Thread = @PROCESS_THREAD_COUNT@
ExportPrecursorPeak = false
-ExportFragmentPeak = false
-SE.MS1PPM = 30
-SE.MS2PPM = 40
-SE.SN = 2
-SE.MS2SN = 2
-SE.MinMSIntensity = 10
-SE.MinMSMSIntensity = 10
-SE.MaxCurveRTRange = 1
-SE.Resolution = 17000
-SE.StartCharge = 2
-SE.EndCharge = 4
+SE.MS1PPM = 15
+SE.MS2PPM = 25
+SE.SN = 1.1
+SE.MS2SN = 1.1
+SE.MinMSIntensity = 1
+SE.MinMSMSIntensity = 1
+SE.MaxCurveRTRange = 2
+SE.Resolution = 60000
+SE.StartCharge = 1
+SE.EndCharge = 5
-SE.MS2EndCharge = 4
-SE.NoMissedScan = 1
+SE.MS2EndCharge = 5
+SE.NoMissedScan = 2
+SE.RemoveGroupedPeaks = true
+SE.RemoveGroupedPeaksRTOverlap = 0.3
+SE.RemoveGroupedPeaksCorr = 0.3
-SE.MinPrecursorMass = 700
+SE.MinPrecursorMass = 600
-WindowSize=25
+#WindowSize=15
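A customized diaumpire.params can be compared against another copy in the same spirit as the diff above. The following Python sketch assumes the simple `key = value` layout with `#` comment lines seen in config/diaumpire.params. The function names parse_params and diff_params are hypothetical, and the two short inputs are made-up excerpts for illustration.

```python
def parse_params(text):
    """Parse 'key = value' lines, skipping blank lines and '#' comment lines."""
    params = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        params[key.strip()] = value.strip()
    return params


def diff_params(old_text, new_text):
    """Return {key: (old, new)} for keys whose values differ; None means absent."""
    old, new = parse_params(old_text), parse_params(new_text)
    return {
        key: (old.get(key), new.get(key))
        for key in sorted(old.keys() | new.keys())
        if old.get(key) != new.get(key)
    }


upstream = "SE.MS1PPM = 30\nSE.SN = 2\n#WindowSize=15\nExportFragmentPeak = false\n"
ours = "SE.MS1PPM = 15\nSE.SN = 2\nSE.RemoveGroupedPeaks = true\n"
for key, (before, after) in diff_params(upstream, ours).items():
    print(f"{key}: {before} -> {after}")
```

Commented-out entries such as `#WindowSize=15` are treated as absent, so a parameter that is only commented out shows up as removed rather than changed.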
Pending: The above changes require further explanation and justification.
Pending: We could provide some practical hints to the users.
Pending: The configuration file could use cleanup of comments and reordering of entries to match the example from DIA-Umpire.
Comet#
glaDIAtor-nf uses Comet 2022.01.
Our default configuration file
config/comet.params
# comet_version 2022.01 rev. 0
# Comet MS/MS search engine parameters file.
# Everything following the '#' symbol is treated as a comment.
database_name = @DDA_DB_FILE@
decoy_search = 0 # 0=no (default), 1=concatenated search, 2=separate search
peff_format = 0 # 0=no (normal fasta, default), 1=PEFF PSI-MOD, 2=PEFF Unimod
peff_obo = # path to PSI Mod or Unimod OBO file
num_threads = @PROCESS_THREAD_COUNT@ # 0=poll CPU to set num threads; else specify num threads directly (max 128)
#
# masses
#
peptide_mass_tolerance = @PRECURSOR_MASS_TOLERANCE@
peptide_mass_units = 2 # 0=amu, 1=mmu, 2=ppm
mass_type_parent = 1 # 0=average masses, 1=monoisotopic masses
mass_type_fragment = 1 # 0=average masses, 1=monoisotopic masses
precursor_tolerance_type = 1 # 0=MH+ (default), 1=precursor m/z; only valid for amu/mmu tolerances
isotope_error = 3 # 0=off, 1=0/1 (C13 error), 2=0/1/2, 3=0/1/2/3, 4=-8/-4/0/4/8 (for +4/+8 labeling)
#
# search enzyme
#
search_enzyme_number = 1 # choose from list at end of this params file
search_enzyme2_number = 0 # second enzyme; set to 0 if no second enzyme
num_enzyme_termini = 2 # 1 (semi-digested), 2 (fully digested, default), 8 C-term unspecific , 9 N-term unspecific
allowed_missed_cleavage = @MAX_MISSED_CLEAVAGES@ # maximum value is 5; for enzyme search
#
# Up to 9 variable modifications are supported
# format: <mass> <residues> <0=variable/else binary> <max_mods_per_peptide> <term_distance> <n/c-term> <required> <neutral_loss>
# e.g. 79.966331 STY 0 3 -1 0 0 97.976896
#
variable_mod01 = 15.9949 M 0 3 -1 0 0 0.0
variable_mod02 = 0.0 X 0 3 -1 0 0 0.0
variable_mod03 = 0.0 X 0 3 -1 0 0 0.0
variable_mod04 = 0.0 X 0 3 -1 0 0 0.0
variable_mod05 = 0.0 X 0 3 -1 0 0 0.0
variable_mod06 = 0.0 X 0 3 -1 0 0 0.0
variable_mod07 = 0.0 X 0 3 -1 0 0 0.0
variable_mod08 = 0.0 X 0 3 -1 0 0 0.0
variable_mod09 = 0.0 X 0 3 -1 0 0 0.0
max_variable_mods_in_peptide = 5
require_variable_mod = 0
#
# fragment ions
#
# ion trap ms/ms: 1.0005 tolerance, 0.4 offset (mono masses), theoretical_fragment_ions = 1
# high res ms/ms: 0.02 tolerance, 0.0 offset (mono masses), theoretical_fragment_ions = 0, spectrum_batch_size = 15000
#
fragment_bin_tol = @FRAGMENT_MASS_TOLERANCE@ # binning to use on fragment ions
fragment_bin_offset = 0.0 # offset position to start the binning (0.0 to 1.0)
theoretical_fragment_ions = 1 # 0=use flanking peaks, 1=M peak only
use_A_ions = 0
use_B_ions = 1
use_C_ions = 0
use_X_ions = 0
use_Y_ions = 1
use_Z_ions = 0
use_Z1_ions = 0
use_NL_ions = 0 # 0=no, 1=yes to consider NH3/H2O neutral loss peaks
#
# output
#
output_sqtfile = 0 # 0=no, 1=yes write sqt file
output_txtfile = 0 # 0=no, 1=yes write tab-delimited txt file
output_pepxmlfile = 1 # 0=no, 1=yes write pepXML file
output_mzidentmlfile = 0 # 0=no, 1=yes write mzIdentML file
output_percolatorfile = 1 # 0=no, 1=yes write Percolator pin file
print_expect_score = 1 # 0=no, 1=yes to replace Sp with expect in out & sqt
num_output_lines = 5 # num peptide results to show
sample_enzyme_number = 1 # Sample enzyme which is possibly different than the one applied to the search.
# Used to calculate NTT & NMC in pepXML output (default=1 for trypsin).
#
# mzXML parameters
#
scan_range = 0 0 # start and end scan range to search; either entry can be set independently
precursor_charge = 0 0 # precursor charge range to analyze; does not override any existing charge; 0 as 1st entry ignores parameter
override_charge = 0 # 0=no, 1=override precursor charge states, 2=ignore precursor charges outside precursor_charge range, 3=see online
ms_level = 2 # MS level to analyze, valid are levels 2 (default) or 3
activation_method = HCD # activation method; used if activation method set; allowed ALL, CID, ECD, ETD, ETD+SA, PQD, HCD, IRMPD, SID
#
# misc parameters
#
digest_mass_range = 600.0 5000.0 # MH+ peptide mass range to analyze
peptide_length_range = 5 63 # minimum and maximum peptide length to analyze (default 1 63; max length 63)
num_results = 100 # number of search hits to store internally
max_duplicate_proteins = 20 # maximum number of additional duplicate protein names to report for each peptide ID; -1 reports all duplicates
max_fragment_charge = 3 # set maximum fragment charge state to analyze (allowed max 5)
max_precursor_charge = 6 # set maximum precursor charge state to analyze (allowed max 9)
nucleotide_reading_frame = 0 # 0=proteinDB, 1-6, 7=forward three, 8=reverse three, 9=all six
clip_nterm_methionine = 0 # 0=leave protein sequences as-is; 1=also consider sequence w/o N-term methionine
spectrum_batch_size = 15000 # max. # of spectra to search at a time; 0 to search the entire scan range in one loop
decoy_prefix = DECOY_ # decoy entries are denoted by this string which is pre-pended to each protein accession
equal_I_and_L = 1 # 0=treat I and L as different; 1=treat I and L as same
output_suffix = # add a suffix to output base names i.e. suffix "-C" generates base-C.pep.xml from base.mzXML input
mass_offsets = # one or more mass offsets to search (values substracted from deconvoluted precursor mass)
precursor_NL_ions = # one or more precursor neutral loss masses, will be added to xcorr analysis
#
# spectral processing
#
minimum_peaks = 10 # required minimum number of peaks in spectrum to search (default 10)
minimum_intensity = 0 # minimum intensity value to read in
remove_precursor_peak = 0 # 0=no, 1=yes, 2=all charge reduced precursor peaks (for ETD), 3=phosphate neutral loss peaks
remove_precursor_tolerance = 1.5 # +- Da tolerance for precursor removal
clear_mz_range = 0.0 0.0 # for iTRAQ/TMT type data; will clear out all peaks in the specified m/z range
#
# additional modifications
#
add_Cterm_peptide = 0.0
add_Nterm_peptide = 0.0
add_Cterm_protein = 0.0
add_Nterm_protein = 0.0
add_G_glycine = 0.0000 # added to G - avg. 57.0513, mono. 57.02146
add_A_alanine = 0.0000 # added to A - avg. 71.0779, mono. 71.03711
add_S_serine = 0.0000 # added to S - avg. 87.0773, mono. 87.03203
add_P_proline = 0.0000 # added to P - avg. 97.1152, mono. 97.05276
add_V_valine = 0.0000 # added to V - avg. 99.1311, mono. 99.06841
add_T_threonine = 0.0000 # added to T - avg. 101.1038, mono. 101.04768
add_C_cysteine = 57.021464 # added to C - avg. 103.1429, mono. 103.00918
add_L_leucine = 0.0000 # added to L - avg. 113.1576, mono. 113.08406
add_I_isoleucine = 0.0000 # added to I - avg. 113.1576, mono. 113.08406
add_N_asparagine = 0.0000 # added to N - avg. 114.1026, mono. 114.04293
add_D_aspartic_acid = 0.0000 # added to D - avg. 115.0874, mono. 115.02694
add_Q_glutamine = 0.0000 # added to Q - avg. 128.1292, mono. 128.05858
add_K_lysine = 0.0000 # added to K - avg. 128.1723, mono. 128.09496
add_E_glutamic_acid = 0.0000 # added to E - avg. 129.1140, mono. 129.04259
add_M_methionine = 0.0000 # added to M - avg. 131.1961, mono. 131.04048
add_H_histidine = 0.0000 # added to H - avg. 137.1393, mono. 137.05891
add_F_phenylalanine = 0.0000 # added to F - avg. 147.1739, mono. 147.06841
add_U_selenocysteine = 0.0000 # added to U - avg. 150.0379, mono. 150.95363
add_R_arginine = 0.0000 # added to R - avg. 156.1857, mono. 156.10111
add_Y_tyrosine = 0.0000 # added to Y - avg. 163.0633, mono. 163.06333
add_W_tryptophan = 0.0000 # added to W - avg. 186.0793, mono. 186.07931
add_O_pyrrolysine = 0.0000 # added to O - avg. 237.2982, mono 237.14773
add_B_user_amino_acid = 0.0000 # added to B - avg. 0.0000, mono. 0.00000
add_J_user_amino_acid = 0.0000 # added to J - avg. 0.0000, mono. 0.00000
add_X_user_amino_acid = 0.0000 # added to X - avg. 0.0000, mono. 0.00000
add_Z_user_amino_acid = 0.0000 # added to Z - avg. 0.0000, mono. 0.00000
#
# COMET_ENZYME_INFO _must_ be at the end of this parameters file
#
[COMET_ENZYME_INFO]
0. Cut_everywhere 0 - -
1. Trypsin 1 KR P
2. Trypsin/P 1 KR -
3. Lys_C 1 K P
4. Lys_N 0 K -
5. Arg_C 1 R P
6. Asp_N 0 D -
7. CNBr 1 M -
8. Glu_C 1 DE P
9. PepsinA 1 FL P
10. Chymotrypsin 1 FWYL P
11. No_cut 1 @ @
Three example configuration files can be found in the Comet 2022.01 documentation:
comet.params.low-low for low res MS1 and low res MS2, e.g. ion trap
comet.params.high-low for high res MS1 and low res MS2, e.g. Velos-Orbitrap
comet.params.high-high for high res MS1 and high res MS2, e.g. Q Exactive or Q-Tof
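The difference is in a small number of parameters. For the fragment ion settings, the comments in comet.params above give fragment_bin_tol = 1.0005, fragment_bin_offset = 0.4 and theoretical_fragment_ions = 1 for ion trap (low res) MS/MS, and fragment_bin_tol = 0.02, fragment_bin_offset = 0.0, theoretical_fragment_ions = 0 and spectrum_batch_size = 15000 for high res MS/MS.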
Our configuration file is based on the high-high variant, with the following changes:
- database_name = /some/path/db.fasta
+ database_name = @DDA_DB_FILE@
# 0=poll CPU to set num threads; else specify num threads directly (max 128)
- num_threads = 0
+ num_threads = @PROCESS_THREAD_COUNT@
- peptide_mass_tolerance = 20.00
+ peptide_mass_tolerance = @PRECURSOR_MASS_TOLERANCE@
# 0=amu, 1=mmu, 2=ppm
peptide_mass_units = 2
# maximum value is 5; for enzyme search
- allowed_missed_cleavage = 2
+ allowed_missed_cleavage = @MAX_MISSED_CLEAVAGES@
# binning to use on fragment ions
- fragment_bin_tol = 1.0005
+ fragment_bin_tol = @FRAGMENT_MASS_TOLERANCE@
# 0=no, 1=yes write Percolator pin file
- output_percolatorfile = 0
+ output_percolatorfile = 1
# activation method; used if activation method set; allowed ALL, CID, ECD, ETD, ETD+SA, PQD, HCD, IRMPD, SID
- activation_method = ALL
+ activation_method = HCD
@PRECURSOR_MASS_TOLERANCE@, @MAX_MISSED_CLEAVAGES@ and @FRAGMENT_MASS_TOLERANCE@ come from the main configuration file:
...
fragment_mass_tolerance=0.02
precursor_mass_tolerance=10
max_missed_cleavages=1
...
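How the @...@ placeholders could be filled from the main configuration can be sketched in Python. This is an illustration only: the render_template function and the mapping from lower-case parameter names to upper-case placeholder names are assumptions based on the names shown above, and the actual substitution is performed inside glaDIAtor-nf.

```python
import re


def render_template(template, params):
    """Replace @NAME@ tokens with params['name'] (hypothetical helper).

    Raises KeyError if the template uses a placeholder that params lacks.
    """
    def substitute(match):
        return str(params[match.group(1).lower()])

    return re.sub(r"@([A-Z_]+)@", substitute, template)


params = {
    "fragment_mass_tolerance": 0.02,
    "precursor_mass_tolerance": 10,
    "max_missed_cleavages": 1,
}
template = (
    "peptide_mass_tolerance = @PRECURSOR_MASS_TOLERANCE@\n"
    "allowed_missed_cleavage = @MAX_MISSED_CLEAVAGES@\n"
    "fragment_bin_tol = @FRAGMENT_MASS_TOLERANCE@\n"
)
print(render_template(template, params))
```

The pattern only matches upper-case names enclosed in `@`, so the `@ @` entry in the COMET_ENZYME_INFO block is left untouched by this sketch.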
Based on the example configurations of Comet, one might wish to manually adjust the following parameters in comet.params:
isotope_error = 0 for low resolution MS1
fragment_bin_offset = 0.4 and theoretical_fragment_ions = 1 for low resolution MS2
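Such manual adjustments can also be scripted. The Python sketch below rewrites the value of one parameter in a copy of comet.params while keeping the trailing comment. set_param is a hypothetical helper, and it assumes the `name = value # comment` layout shown in the file above.

```python
import re


def set_param(text, name, value):
    """Set `name = value` in Comet-style params text, preserving any '#' comment."""
    pattern = re.compile(
        rf"^(?P<head>{re.escape(name)}[ \t]*=[ \t]*)"
        r"(?P<old>[^#\n]*?)(?P<tail>[ \t]*(?:#.*)?)$",
        re.MULTILINE,
    )
    new_text, count = pattern.subn(
        lambda m: f"{m.group('head')}{value}{m.group('tail')}", text
    )
    if count != 1:
        raise ValueError(f"expected exactly one '{name}' line, found {count}")
    return new_text


original = (
    "isotope_error = 3 # 0=off, 1=0/1 (C13 error)\n"
    "fragment_bin_offset = 0.0 # offset position to start the binning\n"
)
adjusted = set_param(original, "isotope_error", 0)
adjusted = set_param(adjusted, "fragment_bin_offset", 0.4)
print(adjusted)
```

Raising an error when the parameter is missing or duplicated is a deliberate choice in this sketch, so that a misspelled name does not silently leave the file unchanged.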
All parameters offered by Comet 2022.01 are described at uwpr.github.io/Comet.
X! Tandem#
glaDIAtor-nf uses X! Tandem 2017.02.01.4.
Our default configuration file
config/xtandem.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="tandem-input-style.xsl"?>
<bioml>
<note>list path parameters</note>
<note>spectrum parameters</note>
<note type="input" label="spectrum, fragment monoisotopic mass error">@FRAGMENT_MASS_TOLERANCE@</note>
<note type="input" label="spectrum, parent monoisotopic mass error plus">@PRECURSOR_MASS_TOLERANCE@</note>
<note type="input" label="spectrum, parent monoisotopic mass error minus">@PRECURSOR_MASS_TOLERANCE@</note>
<note type="input" label="spectrum, parent monoisotopic mass isotope error">yes</note>
<note type="input" label="spectrum, fragment monoisotopic mass error units">Daltons</note>
<note>The value for this parameter may be 'Daltons' or 'ppm': all other values are ignored</note>
<note type="input" label="spectrum, parent monoisotopic mass error units">ppm</note>
<note>The value for this parameter may be 'Daltons' or 'ppm': all other values are ignored</note>
<note type="input" label="spectrum, fragment mass type">monoisotopic</note>
<note>values are monoisotopic|average </note>
<note>spectrum conditioning parameters</note>
<note type="input" label="spectrum, dynamic range">100.0</note>
<note>The peaks read in are normalized so that the most intense peak
is set to the dynamic range value. All peaks with values of less that
1, using this normalization, are not used. This normalization has the
overall effect of setting a threshold value for peak intensities.</note>
<note type="input" label="spectrum, total peaks">50</note>
<note>If this value is 0, it is ignored. If it is greater than zero (lets say 50),
then the number of peaks in the spectrum with be limited to the 50 most intense
peaks in the spectrum. X! tandem does not do any peak finding: it only
limits the peaks used by this parameter, and the dynamic range parameter.</note>
<note type="input" label="spectrum, maximum parent charge">4</note>
<note type="input" label="spectrum, use noise suppression">yes</note>
<note type="input" label="spectrum, minimum parent m+h">500.0</note>
<note type="input" label="spectrum, minimum fragment mz">150.0</note>
<note type="input" label="spectrum, minimum peaks">15</note>
<note type="input" label="spectrum, threads">@PROCESS_THREAD_COUNT@</note>
<note type="input" label="spectrum, sequence batch size">1000</note>
<note>residue modification parameters</note>
<note type="input" label="residue, modification mass">57.022@C</note>
<note>The format of this parameter is m@X, where m is the modfication
mass in Daltons and X is the appropriate residue to modify. Lists of
modifications are separated by commas. For example, to modify M and C
with the addition of 16.0 Daltons, the parameter line would be
+16.0@M,+16.0@C
Positive and negative values are allowed.
</note>
<note type="input" label="residue, potential modification mass">16@M</note>
<note>The format of this parameter is the same as the format
for residue, modification mass (see above).</note>
<note type="input" label="residue, potential modification motif"></note>
<note>The format of this parameter is similar to residue, modification mass,
with the addition of a modified PROSITE notation sequence motif specification.
For example, a value of 80@[ST!]PX[KR] indicates a modification
of either S or T when followed by P, and residue and the a K or an R.
A value of 204@N!{P}[ST]{P} indicates a modification of N by 204, if it
is NOT followed by a P, then either an S or a T, NOT followed by a P.
Positive and negative values are allowed.
</note>
<note>protein parameters</note>
<note type="input" label="protein, taxon">other mammals</note>
<note>This value is interpreted using the information in taxonomy.xml.</note>
<note type="input" label="protein, cleavage site">[RK]|{P}</note>
<note>this setting corresponds to the enzyme trypsin. The first characters
in brackets represent residues N-terminal to the bond - the '|' pipe -
and the second set of characters represent residues C-terminal to the
bond. The characters must be in square brackets (denoting that only
these residues are allowed for a cleavage) or french brackets (denoting
that these residues cannot be in that position). Use UPPERCASE characters.
To denote cleavage at any residue, use [X]|[X] and reset the
scoring, maximum missed cleavage site parameter (see below) to something like 50.
</note>
<note type="input" label="protein, modified residue mass file"></note>
<note type="input" label="protein, cleavage C-terminal mass change">+17.002735</note>
<note type="input" label="protein, cleavage N-terminal mass change">+1.007825</note>
<note type="input" label="protein, N-terminal residue modification mass">0.0</note>
<note type="input" label="protein, C-terminal residue modification mass">0.0</note>
<note type="input" label="protein, homolog management">no</note>
<note>if yes, an upper limit is set on the number of homologues kept for a particular spectrum</note>
<note type="input" label="protein, quick acetyl">no</note>
<note type="input" label="protein, quick pyrolidone">no</note>
<note>model refinement parameters</note>
<note type="input" label="refine">yes</note>
<note type="input" label="refine, modification mass"></note>
<note type="input" label="refine, sequence path"></note>
<note type="input" label="refine, tic percent">20</note>
<note type="input" label="refine, spectrum synthesis">yes</note>
<note type="input" label="refine, maximum valid expectation value">0.1</note>
<note type="input" label="refine, potential N-terminus modifications">+42.010565@[</note>
<note type="input" label="refine, potential C-terminus modifications"></note>
<note type="input" label="refine, unanticipated cleavage">yes</note>
<note type="input" label="refine, potential modification mass"></note>
<note type="input" label="refine, point mutations">no</note>
<note type="input" label="refine, use potential modifications for full refinement">no</note>
<note type="input" label="refine, point mutations">no</note>
<note type="input" label="refine, potential modification motif"></note>
<note>The format of this parameter is similar to residue, modification mass,
with the addition of a modified PROSITE notation sequence motif specification.
For example, a value of 80@[ST!]PX[KR] indicates a modification
of either S or T when followed by P, and residue and the a K or an R.
A value of 204@N!{P}[ST]{P} indicates a modification of N by 204, if it
is NOT followed by a P, then either an S or a T, NOT followed by a P.
Positive and negative values are allowed.
</note>
<note>scoring parameters</note>
<note type="input" label="scoring, minimum ion count">4</note>
<note type="input" label="scoring, maximum missed cleavage sites">@MAX_MISSED_CLEAVAGES@</note>
<note type="input" label="scoring, x ions">no</note>
<note type="input" label="scoring, y ions">yes</note>
<note type="input" label="scoring, z ions">no</note>
<note type="input" label="scoring, a ions">no</note>
<note type="input" label="scoring, b ions">yes</note>
<note type="input" label="scoring, c ions">no</note>
<note type="input" label="scoring, cyclic permutation">no</note>
<note>if yes, cyclic peptide sequence permutation is used to pad the scoring histograms</note>
<note type="input" label="scoring, include reverse">no</note>
<note>if yes, then reversed sequences are searched at the same time as forward sequences</note>
<note type="input" label="scoring, cyclic permutation">no</note>
<note type="input" label="scoring, include reverse">no</note>
<note>output parameters</note>
<note type="input" label="output, log path"></note>
<note type="input" label="output, message">testing 1 2 3</note>
<note type="input" label="output, one sequence copy">no</note>
<note type="input" label="output, sequence path"></note>
<note type="input" label="output, path">output.xml</note>
<note type="input" label="output, sort results by">protein</note>
<note>values = protein|spectrum (spectrum is the default)</note>
<note type="input" label="output, path hashing">no</note>
<note>values = yes|no</note>
<note type="input" label="output, xsl path">tandem-style.xsl</note>
<note type="input" label="output, parameters">yes</note>
<note>values = yes|no</note>
<note type="input" label="output, performance">yes</note>
<note>values = yes|no</note>
<note type="input" label="output, spectra">yes</note>
<note>values = yes|no</note>
<note type="input" label="output, histograms">yes</note>
<note>values = yes|no</note>
<note type="input" label="output, proteins">yes</note>
<note>values = yes|no</note>
<note type="input" label="output, sequences">yes</note>
<note>values = yes|no</note>
<note type="input" label="output, one sequence copy">no</note>
<note>values = yes|no, set to yes to produce only one copy of each protein sequence in the output xml</note>
<note type="input" label="output, results">valid</note>
<note>values = all|valid|stochastic</note>
<note type="input" label="output, maximum valid expectation value">0.1</note>
<note>value is used in the valid|stochastic setting of output, results</note>
<note type="input" label="output, histogram column width">30</note>
<note>values any integer greater than 0. Setting this to '1' makes cutting and pasting histograms
into spread sheet programs easier.</note>
<note type="description">ADDITIONAL EXPLANATIONS</note>
<note type="description">Each one of the parameters for X! tandem is entered as a labeled note
node. In the current version of X!, keep those note nodes
on a single line.
</note>
<note type="description">The presence of the type 'input' is necessary if a note is to be considered
an input parameter.
</note>
<note type="description">Any of the parameters that are paths to files may require alteration for a
particular installation. Full path names usually cause the least trouble,
but there is no reason not to use relative path names, if that is the
most convenient.
</note>
<note type="description">Any parameter values set in the 'list path, default parameters' file are
reset by entries in the normal input file, if they are present. Otherwise,
the default set is used.
</note>
<note type="description">The 'list path, taxonomy information' file must exist.
</note>
<note type="description">The directory containing the 'output, path' file must exist: it will not be created.
</note>
<note type="description">The 'output, xsl path' is optional: it is only of use if a good XSLT style sheet exists.
</note>
</bioml>
It is based on the example default_input.xml found in ftp://ftp.thegpm.org/projects/tandem/source/tandem-linux-17-02-01-4.zip.
The following changes were applied:
- <note type="input" label="list path, default parameters">default_input.xml</note>
- <note>This value is ignored when it is present in the default parameter
- list path.</note>
- <note type="input" label="list path, taxonomy information">taxonomy.xml</note>
- <note type="input" label="spectrum, fragment monoisotopic mass error">0.4</note>
- <note type="input" label="spectrum, parent monoisotopic mass error plus">100</note>
- <note type="input" label="spectrum, parent monoisotopic mass error minus">100</note>
+ <note type="input" label="spectrum, fragment monoisotopic mass error">@FRAGMENT_MASS_TOLERANCE@</note>
+ <note type="input" label="spectrum, parent monoisotopic mass error plus">@PRECURSOR_MASS_TOLERANCE@</note>
+ <note type="input" label="spectrum, parent monoisotopic mass error minus">@PRECURSOR_MASS_TOLERANCE@</note>
- <note type="input" label="spectrum, threads">1</note>
+ <note type="input" label="spectrum, threads">@PROCESS_THREAD_COUNT@</note>
- <note type="input" label="residue, potential modification mass"></note>
+ <note type="input" label="residue, potential modification mass">16@M</note>
+ <note type="input" label="protein, quick acetyl">no</note>
+ <note type="input" label="protein, quick pyrolidone">no</note>
- <note type="input" label="refine, potential N-terminus modifications"></note>
+ <note type="input" label="refine, potential N-terminus modifications">+42.010565@[</note>
- <note type="input" label="scoring, maximum missed cleavage sites">1</note>
+ <note type="input" label="scoring, maximum missed cleavage sites">@MAX_MISSED_CLEAVAGES@</note>
- <note type="input" label="output, path hashing">yes</note>
+ <note type="input" label="output, path hashing">no</note>
@FRAGMENT_MASS_TOLERANCE@, @PRECURSOR_MASS_TOLERANCE@ and @MAX_MISSED_CLEAVAGES@ come from the main configuration file:
...
fragment_mass_tolerance=0.02
precursor_mass_tolerance=10
max_missed_cleavages=1
...
All configuration options of X! Tandem are described at https://www.thegpm.org/TANDEM/api/index.html.
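To review what a customized xtandem.xml actually sets, the labeled input notes can be listed with Python's standard xml.etree.ElementTree module. This is a sketch under the assumption that the file follows the bioml layout shown above. read_input_notes is a hypothetical helper name, and the short XML string is a made-up excerpt.

```python
import xml.etree.ElementTree as ET


def read_input_notes(xml_text):
    """Return {label: value} for every <note type="input" label="..."> element.

    If a label occurs more than once, the last occurrence wins in this sketch.
    """
    root = ET.fromstring(xml_text)
    notes = {}
    for note in root.iter("note"):
        if note.get("type") == "input" and note.get("label"):
            notes[note.get("label")] = (note.text or "").strip()
    return notes


example = """<bioml>
<note>scoring parameters</note>
<note type="input" label="scoring, minimum ion count">4</note>
<note type="input" label="output, path hashing">no</note>
<note type="input" label="output, log path"></note>
</bioml>"""
for label, value in read_input_notes(example).items():
    print(f"{label} = {value!r}")
```

For a file on disk, `ET.parse(path).getroot()` can be used in place of `ET.fromstring`. Notes without `type="input"` are skipped, in line with the description in the file that only such notes are treated as input parameters.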