Subset preparation
The scripts collected here were used to generate small mzML files for end-to-end (e2e) testing of glaDIAtor-nf.
Procedure:
- Carry out the Tutorial analysis with raw files `210820_Grad090_LFQ_A_01.raw` and `210820_Grad090_LFQ_B_01.raw`.
  - aux script: `ConvertAndPeakPick.R`
- Parse and filter the peptide quant matrix output `DIA-peptide-matrix.tsv`
  - peptide intensity \(\geqslant\) 3rd quantile (\(\log_2(\text{exp}) \geqslant 22\))
  - \(\log_2 FC\) between the A and B samples is \(\leqslant\) 1st quantile (\(\leqslant 0.038208\))
  - only consider `HUMAN_` proteins
  - only consider proteotypic hits ^1/… …
  - ignore peptides with post-translational modifications
  - script: `Collect_peptide_examples.R`
  - output: list of peptides to locate in the `mzML` files
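
The filters in this step can be sketched in Python as follows. This is an illustration only, not a port of `Collect_peptide_examples.R`. The row keys (`peptide`, `protein`, `A`, `B`) are hypothetical, as is the use of the mean A/B intensity and of the absolute \(\log_2 FC\). Reading the "3rd" and "1st" quantiles as the 75th and 25th percentiles is an assumption, and so is detecting modifications by looking for brackets in the peptide string.

```python
import math
import re
import statistics


def select_peptides(rows):
    """Return peptides that pass the intensity, fold-change and identity filters.

    rows: list of dicts with hypothetical keys 'peptide', 'protein',
    'A' and 'B' (raw intensities of the two samples).
    """
    # Log-transforms need strictly positive intensities in both samples.
    rows = [r for r in rows if r["A"] > 0 and r["B"] > 0]
    if len(rows) < 2:
        return []

    # Assumption: intensity is the mean of A and B, fold change is absolute.
    log_int = [math.log2((r["A"] + r["B"]) / 2) for r in rows]
    abs_fc = [abs(math.log2(r["A"] / r["B"])) for r in rows]

    # Assumption: "3rd"/"1st" quantile means the 75th/25th percentile.
    int_q3 = statistics.quantiles(log_int, n=4, method="inclusive")[2]
    fc_q1 = statistics.quantiles(abs_fc, n=4, method="inclusive")[0]

    picked = []
    for row, intensity, fold_change in zip(rows, log_int, abs_fc):
        if intensity < int_q3:
            continue  # not intense enough
        if fold_change > fc_q1:
            continue  # differs too much between A and B
        if "HUMAN_" not in row["protein"]:
            continue  # not a human protein
        if not re.match(r"^1/", row["protein"]):
            continue  # not proteotypic (maps to more than one protein)
        if re.search(r"[\[(]", row["peptide"]):
            continue  # assumed modification annotation
        picked.append(row["peptide"])
    return picked
```

Computing the cut-offs from the data rather than hard-coding 22 and 0.038208 keeps the sketch usable on other matrices; the fixed values above are specific to the Tutorial data set.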
- Carry out DIA → peptide search with MSFragger (v4.3)
  - input files:
    - `210820_Grad090_LFQ_A_01.PickPeak.mzML`
    - `210820_Grad090_LFQ_B_01.PickPeak.mzML`
    - decoyed `210820_Human_Ref_Swiss_Can.fasta`
    - `dia.params2` MSFragger config file
  - aux script: `Create_decoyed_human.R`: create reference protein sequences with reversed `DECOY_` records
  - script: `Drive_MSFragger_Tutorial.R`: perform the `mzML` vs protein search with MSFragger
  - output: `.PickPeak_rank1.pepXML` search result files
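
The decoy step can be illustrated with a short Python sketch that appends a reversed copy of every protein record. It is not the code of `Create_decoyed_human.R`: the function name is hypothetical, and placing the `DECOY_` prefix at the very start of the header while keeping the target records first is an assumption about the layout.

```python
def add_reversed_decoys(fasta_text, prefix="DECOY_"):
    """Return FASTA text holding every target record plus a reversed decoy."""
    records = []
    header, chunks = None, []
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))

    out = []
    # Targets first, unchanged apart from joining wrapped sequence lines.
    for name, sequence in records:
        out.append(f">{name}\n{sequence}")
    # Decoys: same header with the prefix, sequence reversed.
    for name, sequence in records:
        out.append(f">{prefix}{name}\n{sequence[::-1]}")
    return "\n".join(out) + "\n"
```

Whatever prefix is used has to agree with the decoy prefix configured in the MSFragger parameter file, otherwise decoy hits are not recognised as such.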
- `mzML` subsetting
  - from the `pepXML` files we take the scanIDs of the spectra related to the selected peptides (MS2 scan pick)
    - the peptide → scanID assignment can be one → many: take the single best hit, ranked by the `num_matched_ions`/`tot_num_ions` ratio (the higher the better)
  - extend the picked MS2 scans to complete MS1-MS2-MS2-MS2…MS2 duty cycles
  - mark each selected cycle together with its \(\pm 1\) neighbouring cycles (\(3\) in total)
  - make the selection non-redundant (if the same cycle was marked multiple times, include it only once)
  - using `mzR`, extract the marked cycles
    > [!WARNING]
    > `mzR` ruins the `mzML` format; you can fix that later manually with the OpenMS `FileConverter`: mzML (bad) → mzXML → mzML (good) conversion
  - script: `SubSetMZML.R`
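
The selection logic of the subsetting step (best hit per peptide, expansion to whole duty cycles, \(\pm 1\) flanking cycles, de-duplication) can be sketched in Python as below. This is a hedged illustration rather than the content of `SubSetMZML.R`: both function names are hypothetical, scans are addressed by their 0-based position in acquisition order instead of real scanIDs, and a cycle is assumed to start at every MS1 scan.

```python
def best_scan_per_peptide(hits):
    """Pick one scan per peptide by the highest matched/total ion ratio.

    hits: iterable of (peptide, scan, num_matched_ions, tot_num_ions).
    """
    best = {}
    for peptide, scan, matched, total in hits:
        if total <= 0:
            continue  # the ratio is undefined
        ratio = matched / total
        if peptide not in best or ratio > best[peptide][0]:
            best[peptide] = (ratio, scan)
    return {peptide: scan for peptide, (_, scan) in best.items()}


def scans_to_keep(ms_levels, picked_scans, flank=1):
    """Return sorted scan positions of all marked duty cycles.

    ms_levels: MS level (1 or 2) of every scan in acquisition order.
    picked_scans: 0-based positions of the selected MS2 scans.
    """
    # Assign a cycle index to every scan; a new cycle starts at each MS1.
    cycle_of = []
    cycle = -1
    for level in ms_levels:
        if level == 1:
            cycle += 1
        # MS2 scans preceding the first MS1 are folded into cycle 0.
        cycle_of.append(max(cycle, 0))
    n_cycles = cycle_of[-1] + 1 if cycle_of else 0

    # A set keeps the selection non-redundant when cycles are marked twice.
    marked = set()
    for scan in picked_scans:
        centre = cycle_of[scan]
        for neighbour in range(centre - flank, centre + flank + 1):
            if 0 <= neighbour < n_cycles:
                marked.add(neighbour)

    return [i for i, c in enumerate(cycle_of) if c in marked]
```

For example, with five cycles of one MS1 and two MS2 scans each, picking a scan in the third cycle keeps the second, third and fourth cycles; a pick in the first cycle keeps only two cycles because there is no earlier neighbour.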
- Test glaDIAtor-nf with the smaller-scale `mzML` files
  - results: the test run completes in \(\approx 8\) minutes.