How do I run proteoQ

The framework of proteoQ can be divided conceptually into the units of data normalization and informatic analysis.

Data normalization

The modules in data normalization need to be run in order, starting with the upload of metadata by load_expts and ending with the compilation of the protein table by standPrn.
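The required order can be sketched as a single wrapper. The function name run_normalization and its arguments dat_dir and fasta are assumptions made for illustration; each module is called with its defaults apart from the FASTA path, and the help page of each module remains the reference for the real options.

```r
# Hypothetical wrapper (not part of proteoQ) that runs the normalization
# modules in the order described above.
run_normalization <- function(dat_dir, fasta) {
  # dat_dir: folder assumed to hold expt_smry.xlsx and the PSM exports
  proteoQ::load_expts(dat_dir = dat_dir)

  # PSM -> peptide -> protein; each step reads the output of the previous one
  proteoQ::normPSM(fasta = fasta)
  proteoQ::PSM2Pep()
  proteoQ::mergePep()
  proteoQ::standPep()
  proteoQ::Pep2Prn()
  proteoQ::standPrn()

  invisible(dat_dir)
}
```

Nothing is executed until the wrapper is called, for example with run_normalization("~/my_project", "~/my_db.fasta"), where both paths are placeholders.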

Instead of putting all components into one large unit, the utilities in data normalization were divided into smaller pieces to allow flexible workflows and data mining. One benefit is that we can tailor filter_ conditions at individual steps to remove inappropriate entries. Another is that we can execute standPep and standPrn repeatedly, with different slice_ conditions, to achieve a mixed-bed normalization of data [1].
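A minimal sketch of both ideas follows. The helper name restandardize_peptides, the argument names after the filter_ and slice_ prefixes, and the column names pep_len and pep_n_psm are assumptions for illustration and should be checked against the actual peptide tables.

```r
# Hypothetical helper (not part of proteoQ): peptide steps with row subsetting.
# pep_len and pep_n_psm are assumed to be columns of the peptide tables.
restandardize_peptides <- function() {
  # filter_: rows that fail the condition are removed when merging peptides
  proteoQ::mergePep(filter_peps_by = rlang::exprs(pep_len <= 50))

  # slice_: a first pass over all rows, then a second pass restricted to
  # the rows that meet the slice_ condition
  proteoQ::standPep()
  proteoQ::standPep(slice_peps_by = rlang::exprs(pep_n_psm >= 10))
}
```

The same pattern would apply to standPrn with protein-level columns.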

The purgePSM and purgePep modules may be viewed as optional, in that they do not add columns to their respective input data.

After peptide and protein normalization, we may apply pepSig and prnSig to calculate significance p-values. Both modules use frequentist methods, that is, methods based on the long-run frequency interpretation of probability.

The pepSig and prnSig modules are arguably part of the normalization steps. One benefit of having them early in the procedure is that downstream analyses can subset data by p-value. For instance, we may subset data by the lowest p-values for heat map visualization. In addition, prnSig is required for modules that use protein p-values, such as prnGSPA. Because they are quasi-essential, both modules are assigned to the normalization unit.

NB: whenever samples are removed from expt_smry.xlsx, all the units in data normalization need to be re-executed to update the result files such as Peptide.txt and Protein.txt. A cleaner alternative is to carry out the analysis from scratch as a brand new task.

Informatic analysis

The proteoQ modules in informatics are more independent and can be executed in no particular order. For instance, we may run GSEA before the heat map or vice versa. Utilities that belong to the same module, by contrast, are run together. Protein trend analysis, for example, comprises the analysis part, anal_prnTrend, and the visualization part, plot_prnTrend. We need to execute anal_prnTrend first and plot_prnTrend second.
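The ordering within a module can be sketched as follows. The wrapper name run_protein_trend and the choice of n_clust = c(5:6) are assumptions for illustration.

```r
# Hypothetical wrapper (not part of proteoQ): analysis first, plotting second.
run_protein_trend <- function() {
  # Step 1: cluster the protein data and write the trend results to file
  proteoQ::anal_prnTrend(n_clust = c(5:6))

  # Step 2: read the results written in step 1 and visualize them
  proteoQ::plot_prnTrend()
}
```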

Variable arguments (varargs) are broadly employed in proteoQ, typically for flexible subsetting and ordering of data rows. The data files coming out of Data normalization are considered primary and termed df. We can then apply filter_ and arrange_ varargs to utilities that read df. For example, prnHM imports the protein table, say Protein.txt, and can take filter_ and arrange_ varargs for row filtration and ordering, respectively.
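As a hedged illustration, the call below combines one filter_ and one arrange_ vararg in prnHM. The wrapper name plot_human_heatmap, the vararg names filter_sp and arrange_gn, the output file name, and the column names species and gene are assumptions; row clustering is switched off so that the arranged row order is what gets displayed.

```r
# Hypothetical wrapper (not part of proteoQ): heat map of a row subset of df.
plot_human_heatmap <- function() {
  proteoQ::prnHM(
    # filter_: keep only the rows of the protein table that meet the condition
    filter_sp = rlang::exprs(species == "human"),
    # arrange_: order the remaining rows by the given column(s)
    arrange_gn = rlang::exprs(gene),
    # passed on to pheatmap; clustering would otherwise reorder the rows
    cluster_rows = FALSE,
    filename = "human_proteins.png"
  )
}
```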

There are also utilities that read data files produced by informatic analysis. These files are termed secondary data files, df2. One example is plot_prnNMFCon, which imports the consensus findings from anal_prnNMF for heat map visualization. The corresponding vararg for data filtration is filter2_.

gspaMap is unusual in that it processes both df, the protein table from Data normalization, and df2, the gene set results from prnGSPA. Thus, it is possible to apply both filter_ and filter2_ when calling the module [2].
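A sketch of a call that uses both kinds of varargs is given below. The wrapper name map_gene_sets, the vararg names after the prefixes, and the column names prot_n_pep (in df) and term (in df2) are assumptions and should be verified against the actual files. The sketch also presumes that prnSig and prnGSPA have already been run.

```r
# Hypothetical wrapper (not part of proteoQ): filters applied to df and df2.
map_gene_sets <- function() {
  proteoQ::gspaMap(
    # filter_ acts on df, the protein table from data normalization
    filter_prots_by = rlang::exprs(prot_n_pep >= 2),
    # filter2_ acts on df2, the gene set results from prnGSPA
    filter2_terms_by = rlang::exprs(grepl("^hs", term))
  )
}
```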

Graphic visualization is typically handled by ggplot2 or pheatmap, with the latter being used exclusively for heat maps. Parameter passing via dot-dot-dot is generally applicable. To take advantage of the full-fledged ggplot2, results are exported when possible. In the following example, we pass the findings from prnMDS to ggplot2 for external visualization:

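library(proteoQ)
library(ggplot2)
# prnMDS returns its findings, which can then be handed to ggplot2
res <- prnMDS()
# "..." is a placeholder for the aesthetics and layers of your choice
p <- ggplot(res) + ...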

Variable arguments

| Utility | Vararg_ | df | Vararg2_ | df2 |
|---|---|---|---|---|
| normPSM | filter_ | Mascot, F[…].csv; MaxQuant, msms[…].txt; SM, PSMexport[…].ssv | NA | NA |
| PSM2Pep | NA | NA | NA | NA |
| mergePep | filter_ | TMTset1_LCMSinj1_Peptide_N.txt | NA | NA |
| standPep | slice_ | Peptide.txt | NA | NA |
| Pep2Prn | filter_ | Peptide.txt | NA | NA |
| standPrn | slice_ | Protein.txt | NA | NA |
| pepHist | filter_ | Peptide.txt | NA | NA |
| prnHist | filter_ | Protein.txt | NA | NA |
| pepSig | filter_ | Peptide[_impNA].txt | NA | NA |
| prnSig | filter_ | Protein[_impNA].txt | NA | NA |
| pepMDS | filter_ | Peptide[_impNA][_pVal].txt | NA | NA |
| prnMDS | filter_ | Protein[_impNA][_pVal].txt | NA | NA |
| pepPCA | filter_ | Peptide[_impNA][_pVal].txt | NA | NA |
| prnPCA | filter_ | Protein[_impNA][_pVal].txt | NA | NA |
| pepEucDist | filter_ | Peptide[_impNA][_pVal].txt | NA | NA |
| prnEucDist | filter_ | Protein[_impNA][_pVal].txt | NA | NA |
| pepCorr_logFC | filter_ | Peptide[_impNA][_pVal].txt | NA | NA |
| prnCorr_logFC | filter_ | Protein[_impNA][_pVal].txt | NA | NA |
| pepHM | filter_, arrange_ | Peptide[_impNA][_pVal].txt | NA | NA |
| prnHM | filter_, arrange_ | Protein[_impNA][_pVal].txt | NA | NA |
| anal_prnTrend | filter_ | Protein[_impNA][_pVal].txt | NA | NA |
| plot_prnTrend | NA | NA | filter2_ | […]Protein_Trend_{NZ}[_impNA][…].txt |
| anal_pepNMF | filter_ | Peptide[_impNA][_pVal].txt | NA | NA |
| anal_prnNMF | filter_ | Protein[_impNA][_pVal].txt | NA | NA |
| plot_pepNMFCon | NA | NA | filter2_ | […]Peptide_NMF[…]_consensus.txt |
| plot_prnNMFCon | NA | NA | filter2_ | […]Protein_NMF[…]_consensus.txt |
| plot_pepNMFCoef | NA | NA | filter2_ | […]Peptide_NMF[…]_coef.txt |
| plot_prnNMFCoef | NA | NA | filter2_ | […]Protein_NMF[…]_coef.txt |
| plot_metaNMF | filter_, arrange_ | Protein[_impNA][_pVal].txt | NA | NA |
| prnGSPA | filter_ | Protein[_impNA]_pVals.txt | NA | NA |
| prnGSPAHM | NA | NA | filter2_ | […]Protein_GSPA_{NZ}[_impNA]_essmap.txt |
| gspaMap | filter_ | Protein[_impNA]_pVal.txt | filter2_ | […]Protein_GSPA_{NZ}[_impNA].txt |
| anal_prnString | filter_ | Protein[_impNA][_pVals].txt | NA | NA |

  1. See also the help documents in proteoQ and the online README.

  2. See also ?proteoQ::gspaMap for details.

Qiang Zhang
Instructor of Medicine

My research interests include mass spectrometry-based proteomics and automation in data analysis.