How do I run proteoQ?

Conceptually, the proteoQ framework can be divided into two units: data normalization and informatic analysis.

Data normalization

The data normalization modules need to be run in order, starting with loading the metadata with load_expts and ending with compiling the protein table with standPrn.

[Figure: data normalization workflow. Preprocessing (extract_psm_raws, extract_raws; fasta and entrez files from UniProt or RefSeq via Uni2Entrez or Ref2Entrez) and metadata (expt_smry.xlsx with columns such as Sample_ID, TMT_Set, LCMS_Injection, TMT_Channel, Reference; frac_smry.xlsx with columns such as RAW_File, PSM_File) feed the normalization chain load_expts → normPSM → purgePSM → PSM2Pep → mergePep → standPep → purgePep → Pep2Prn → standPrn, followed by the hypothesis tests pepSig and prnSig. Annotated arguments include filter_ = ..., slice_ = ..., width, height, and contrasts such as grp_1 = ~ Term[...], grp_2 = ~ Term_2[...].]

Instead of putting all components into one large unit, the data normalization utilities are divided into smaller pieces to allow flexible workflows and data mining. For example, filter_ conditions can be tailored at individual steps to remove unwanted entries. As another example, standPep and standPrn can be executed repeatedly with different slice_ conditions to achieve mixed-bed normalization of the data.1

The modules purgePSM and purgePep can be viewed as optional because they do not add columns to their respective input data.
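The ordering rule above can be made concrete with a small base-R check. This is a hypothetical helper, not part of proteoQ: it lists the modules in the order shown in the workflow figure, flags purgePSM and purgePep as optional, and tests whether a planned sequence of calls is valid. The order comparison is non-strict, so that standPep or standPrn may be repeated, for example with different slice_ conditions.

```r
# Hypothetical table of the normalization modules, in the order shown in the
# workflow figure; purgePSM and purgePep are flagged as optional.
norm_steps <- data.frame(
  module   = c("load_expts", "normPSM", "purgePSM", "PSM2Pep", "mergePep",
               "standPep", "purgePep", "Pep2Prn", "standPrn"),
  optional = c(FALSE, FALSE, TRUE, FALSE, FALSE,
               FALSE, TRUE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

# TRUE if `calls` contains every required module and never goes backwards.
# Non-strict ordering allows consecutive repeats such as standPep, standPep.
is_valid_run <- function(calls) {
  pos <- match(calls, norm_steps$module)
  if (anyNA(pos)) return(FALSE)                        # unknown module name
  required <- norm_steps$module[!norm_steps$optional]
  all(required %in% calls) && !is.unsorted(pos, strictly = FALSE)
}

is_valid_run(norm_steps$module)                        # TRUE: full chain
is_valid_run(norm_steps$module[!norm_steps$optional])  # TRUE: purges skipped
is_valid_run(c("load_expts", "PSM2Pep", "normPSM"))    # FALSE: out of order
```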

After peptide and protein normalization, pepSig and prnSig can be applied to calculate significance p-values. Both modules use methods based on the long-run frequency (frequentist) interpretation of probability.

The modules pepSig and prnSig are arguably part of the normalization steps. One benefit of running them early is that data can be subset by p-value in downstream analyses; for instance, the entries with the lowest p-values can be selected for heat map visualization. In addition, prnSig is required by modules that use protein p-values, such as prnGSPA. Because of this quasi-essential role, both modules are assigned to the normalization unit.
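The "lowest p-values" subsetting mentioned above amounts to ordering rows by a p-value column and keeping the top n. The base-R sketch below shows the idea on a toy table; the column names gene, log2FC and pVal and the helper top_by_pval are hypothetical and do not reflect proteoQ's actual column naming. In proteoQ itself, such subsetting is expressed through the filter_ varargs described under Informatic analysis.

```r
# Toy protein table; the column names are hypothetical, not proteoQ's own.
prn <- data.frame(
  gene   = c("A", "B", "C", "D", "E"),
  log2FC = c(1.2, -0.3, 2.1, 0.05, -1.7),
  pVal   = c(0.003, 0.40, 0.0001, 0.90, 0.02),
  stringsAsFactors = FALSE
)

# Keep the n rows with the lowest p-values, e.g. as input to a heat map.
top_by_pval <- function(df, n, pcol = "pVal") {
  ord <- order(df[[pcol]])              # ascending: most significant first
  df[head(ord, n), , drop = FALSE]
}

top_by_pval(prn, 3)                     # genes C, A, E
```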

NB: whenever samples are voided from expt_smry.xlsx, all data normalization units need to be re-executed to update result files such as Peptide.txt and Protein.txt. A cleaner alternative is to start the analysis from scratch as a new task.

Informatic analysis

The informatic analysis modules in proteoQ are more independent and can be executed in any order. For instance, we may run GSEA before a heat map or vice versa. Utilities under the same module, however, are best run together and sometimes in a fixed order. Protein Trend analysis, for example, comprises an analysis part, anal_prnTrend, and a visualization part, plot_prnTrend; anal_prnTrend needs to be executed first, followed by plot_prnTrend.
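The analysis-then-plot dependency can be checked mechanically. In the sketch below, the lookup pairing each plotting utility with the analysis whose output it reads is inferred from the secondary (df2) files listed in the Variable arguments table and from the text; it is a hypothetical helper, not a proteoQ function. out_of_order() returns any call that appears before its prerequisite.

```r
# Hypothetical lookup, inferred from the df2 files in the vararg table:
# each plotting utility reads results written by the listed analysis utility.
needs <- c(
  plot_prnTrend   = "anal_prnTrend",
  plot_pepNMFCon  = "anal_pepNMF",
  plot_pepNMFCoef = "anal_pepNMF",
  plot_prnNMFCon  = "anal_prnNMF",
  plot_prnNMFCoef = "anal_prnNMF",
  prnGSPAHM       = "prnGSPA",
  gspaMap         = "prnGSPA"
)

# Return the calls that appear before their prerequisite analysis.
out_of_order <- function(calls) {
  bad <- character(0)
  for (i in seq_along(calls)) {
    req <- needs[calls[i]]              # NA when the call has no prerequisite
    if (!is.na(req) && !(req %in% calls[seq_len(i - 1)])) {
      bad <- c(bad, calls[i])
    }
  }
  bad
}

out_of_order(c("prnHM", "plot_prnTrend", "anal_prnTrend", "prnGSPA", "gspaMap"))
# "plot_prnTrend"
```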

[Figure: informatic analysis modules. Peptide and Protein data flow into Histogram, Heat map, MDS, PCA, LDA, Correlation, Volcano plot and NMF; Protein data also flow into Trend, GSEA, GSVA and GSPA. Each analysis branches into its utilities (pepHist, prnHist, pepHM, prnHM, pepMDS, prnMDS, pepPCA, prnPCA, pepLDA, prnLDA, pepCorr_logFC, pepCorr_logInt, prnCorr_logFC, prnCorr_logInt, pepVol, prnVol, anal_pepNMF, plot_pepNMFCoef, plot_pepNMFCon, anal_prnNMF, plot_prnNMFCoef, plot_prnNMFCon, plot_metaNMF, anal_prnTrend, plot_prnTrend, prnGSEA, prnGSVA, prnGSPA, prnGSPAHM, gspaMap), which in turn link to the varargs they accept (filter_, filter2_, arrange_) and to the graphics package they use (ggplot2 or pheatmap).]

Variable arguments (varargs) are used widely in proteoQ, typically for flexible subsetting and ordering of data rows. The data files produced by data normalization are considered primary and are termed df. The varargs filter_ and arrange_ can be applied to utilities that read df. For example, prnHM imports the protein table, say Protein.txt, and accepts filter_ and arrange_ varargs for row filtration and ordering, respectively.
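The base-R sketch below mimics this convention. It assumes, as the trailing underscore in filter_ and arrange_ suggests, that any argument whose name starts with one of these prefixes is treated as a vararg, and that each one holds a list of unevaluated expressions evaluated against the columns of df. The helper apply_varargs and the toy columns are hypothetical; proteoQ's own implementation and naming rules may differ, so consult its help pages.

```r
# Hypothetical sketch of prefix-named varargs; not proteoQ's internal code.
apply_varargs <- function(df, ...) {
  env  <- parent.frame()                # caller's environment for free variables
  dots <- list(...)
  nms  <- names(dots)
  if (is.null(nms)) nms <- character(length(dots))

  # filter_*: lists of conditions; a row is kept only if every condition is TRUE
  for (conds in dots[startsWith(nms, "filter_")]) {
    for (cond in conds) {
      keep <- eval(cond, df, env)
      df <- df[!is.na(keep) & keep, , drop = FALSE]
    }
  }

  # arrange_*: lists of sort keys applied in order (use -x for descending)
  for (keys in dots[startsWith(nms, "arrange_")]) {
    vals <- lapply(keys, function(k) eval(k, df, env))
    df <- df[do.call(order, unname(vals)), , drop = FALSE]
  }
  df
}

prn <- data.frame(
  gene       = c("A", "B", "C", "D"),
  species    = c("human", "mouse", "human", "human"),
  prot_n_pep = c(5, 8, 1, 3),
  stringsAsFactors = FALSE
)

apply_varargs(
  prn,
  filter_by_sp    = alist(species == "human"),
  filter_by_npep  = alist(prot_n_pep >= 2),
  arrange_by_npep = alist(-prot_n_pep),
  width           = 6                   # not a vararg: ignored by the helper
)
```

Here, base R's alist() holds the unevaluated conditions so the sketch has no package dependencies.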

There are also utilities that read data files from informatic analysis. These files are termed secondary data files, df2. One example is plot_prnNMFCon, which imports the consensus results from anal_prnNMF for heat map visualization. The corresponding vararg for data filtration is filter2_.

gspaMap is distinctive in that it processes both df, the protein table from data normalization, and df2, the gene set results from prnGSPA. Both filter_ and filter2_ can therefore be applied when calling it.2
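To illustrate the df/df2 split, the sketch below routes filter_ conditions to a primary table and filter2_ conditions to a secondary table, as a utility like gspaMap does conceptually. Note that the name filter2_by_p does not start with "filter_", so the two prefixes do not collide. The helpers route_filters and filter_by_prefix and all column names are hypothetical stand-ins, not proteoQ code.

```r
# Hypothetical helper: apply every vararg whose name starts with `prefix`.
filter_by_prefix <- function(data, dots, prefix, env) {
  nms <- names(dots)
  if (is.null(nms)) nms <- character(length(dots))
  for (conds in dots[startsWith(nms, prefix)]) {
    for (cond in conds) {
      keep <- eval(cond, data, env)
      data <- data[!is.na(keep) & keep, , drop = FALSE]
    }
  }
  data
}

# filter_* acts on the primary table (df); filter2_* on the secondary (df2).
route_filters <- function(df, df2, ...) {
  dots <- list(...)
  env  <- parent.frame()
  list(
    df  = filter_by_prefix(df,  dots, "filter_",  env),
    df2 = filter_by_prefix(df2, dots, "filter2_", env)
  )
}

prn   <- data.frame(gene = c("A", "B", "C"), prot_n_pep = c(4, 1, 6),
                    stringsAsFactors = FALSE)
gsets <- data.frame(term = c("set1", "set2", "set3"), pVal = c(0.01, 0.2, 0.001),
                    stringsAsFactors = FALSE)

out <- route_filters(prn, gsets,
                     filter_by_npep = alist(prot_n_pep >= 2),
                     filter2_by_p   = alist(pVal <= 0.05))
out$df$gene    # "A" "C"
out$df2$term   # "set1" "set3"
```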

Graphics are typically produced with ggplot2 or pheatmap, the latter being used exclusively for heat maps. Parameters can generally be passed on to these packages via dot-dot-dot (...). To make full use of ggplot2, results are also exported where possible. In the following example, we pass the findings from prnMDS to ggplot2 for external visualization:

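library(ggplot2)
library(proteoQ)
res <- prnMDS()      # besides plotting, prnMDS() returns its MDS results
str(res)             # inspect the exported columns before mapping them in aes()
p <- ggplot(res)     # + geom_point(aes(...)) and any other ggplot2 layers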
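Dot-dot-dot passing can also be illustrated in base R. The sketch assumes that a utility separates its varargs (filter_*, arrange_*) from the remaining arguments and forwards the latter, such as the width and height shown in the workflow figure, to a graphics call. Both my_plot_util and fake_save are hypothetical; fake_save only stands in for a real call such as ggplot2::ggsave() and simply echoes its arguments.

```r
# Hypothetical: split `...` into varargs and arguments to pass through.
split_dots <- function(...) {
  dots <- list(...)
  nms  <- names(dots)
  if (is.null(nms)) nms <- character(length(dots))
  is_vararg <- startsWith(nms, "filter_") | startsWith(nms, "arrange_")
  list(varargs = dots[is_vararg], passthrough = dots[!is_vararg])
}

# Stand-in for a graphics call such as ggplot2::ggsave(); echoes its inputs.
fake_save <- function(filename, width = 7, height = 7, ...) {
  list(filename = filename, width = width, height = height)
}

# Hypothetical utility: parts$varargs would be applied to the data rows
# (omitted here); all other arguments are forwarded to the graphics call.
my_plot_util <- function(...) {
  parts <- split_dots(...)
  do.call(fake_save, c(list(filename = "plot.png"), parts$passthrough))
}

str(my_plot_util(filter_by = alist(x > 0), width = 4, height = 3))
```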

Variable arguments

| Utility | Vararg (df) | df (primary data file) | Vararg (df2) | df2 (secondary data file) |
|---|---|---|---|---|
| normPSM | filter_ | Mascot: F[…].csv; MaxQuant: msms[…].txt; SM: PSMexport[…].ssv | NA | NA |
| PSM2Pep | NA | NA | NA | NA |
| mergePep | filter_ | TMTset1_LCMSinj1_Peptide_N.txt | NA | NA |
| standPep | slice_ | Peptide.txt | NA | NA |
| Pep2Prn | filter_ | Peptide.txt | NA | NA |
| standPrn | slice_ | Protein.txt | NA | NA |
| pepHist | filter_ | Peptide.txt | NA | NA |
| prnHist | filter_ | Protein.txt | NA | NA |
| pepSig | filter_ | Peptide[_impNA].txt | NA | NA |
| prnSig | filter_ | Protein[_impNA].txt | NA | NA |
| pepMDS | filter_ | Peptide[_impNA][_pVal].txt | NA | NA |
| prnMDS | filter_ | Protein[_impNA][_pVal].txt | NA | NA |
| pepPCA | filter_ | Peptide[_impNA][_pVal].txt | NA | NA |
| prnPCA | filter_ | Protein[_impNA][_pVal].txt | NA | NA |
| pepEucDist | filter_ | Peptide[_impNA][_pVal].txt | NA | NA |
| prnEucDist | filter_ | Protein[_impNA][_pVal].txt | NA | NA |
| pepCorr_logFC | filter_ | Peptide[_impNA][_pVal].txt | NA | NA |
| prnCorr_logFC | filter_ | Protein[_impNA][_pVal].txt | NA | NA |
| pepHM | filter_, arrange_ | Peptide[_impNA][_pVal].txt | NA | NA |
| prnHM | filter_, arrange_ | Protein[_impNA][_pVal].txt | NA | NA |
| anal_prnTrend | filter_ | Protein[_impNA][_pVal].txt | NA | NA |
| plot_prnTrend | NA | NA | filter2_ | […]Protein_Trend_{NZ}[_impNA][…].txt |
| anal_pepNMF | filter_ | Peptide[_impNA][_pVal].txt | NA | NA |
| anal_prnNMF | filter_ | Protein[_impNA][_pVal].txt | NA | NA |
| plot_pepNMFCon | NA | NA | filter2_ | […]Peptide_NMF[…]_consensus.txt |
| plot_prnNMFCon | NA | NA | filter2_ | […]Protein_NMF[…]_consensus.txt |
| plot_pepNMFCoef | NA | NA | filter2_ | […]Peptide_NMF[…]_coef.txt |
| plot_prnNMFCoef | NA | NA | filter2_ | […]Protein_NMF[…]_coef.txt |
| plot_metaNMF | filter_, arrange_ | Protein[_impNA][_pVal].txt | NA | NA |
| prnGSPA | filter_ | Protein[_impNA]_pVals.txt | NA | NA |
| prnGSPAHM | NA | NA | filter2_ | […]Protein_GSPA_{NZ}[_impNA]_essmap.txt |
| gspaMap | filter_ | Protein[_impNA]_pVal.txt | filter2_ | […]Protein_GSPA_{NZ}[_impNA].txt |
| anal_prnString | filter_ | Protein[_impNA][_pVals].txt | NA | NA |

  1. See also the help documents in proteoQ and the online README.

  2. See also ?proteoQ::gspaMap for details.

Qiang Zhang
Instructor of Medicine

My research interests include mass spectrometry-based proteomics and automation in data analysis.