How do I run proteoQ

Last updated on Apr 15, 2020

The framework of proteoQ can be divided conceptually into two units: data normalization and informatic analysis.

Data normalization

The modules in data normalization need to be run in order, starting with the upload of metadata by load_expts and ending with the compilation of the protein table by standPrn.
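The order described above can be sketched as follows. This is a minimal outline rather than a complete script: the module names are those listed in the table of variable arguments, dat_dir is a hypothetical project folder, and every call is shown without the arguments that a real run would need (the FASTA files for normPSM, for instance); see the help page of each module.

```r
library(proteoQ)

# Hypothetical project folder holding the PSM exports and expt_smry.xlsx
dat_dir <- "~/proteoQ/my_project"

# Upload the metadata first
load_expts(dat_dir = dat_dir)

# PSMs to peptides (arguments omitted; see ?normPSM etc.)
normPSM()
PSM2Pep()
mergePep()
standPep()

# Peptides to proteins
Pep2Prn()
standPrn()
```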

Instead of putting all components into one large unit, the utilities in data normalization are divided into smaller pieces to allow flexible workflows and data mining. One benefit is that we may tailor filter_ conditions at individual steps to remove inappropriate entries. Another is that we may run standPep and standPrn repeatedly with different slice_ conditions to achieve a mixed-bed normalization of data.¹
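As a hedged illustration of the filter_ and slice_ varargs: arguments whose names start with filter_ or slice_ carry row conditions wrapped in exprs(), and the suffix after the prefix appears to be a free-form label. The suffixes used here (_psms_at, _peps_by), the column names pep_expect and pep_n_psm, and the cut-offs are all assumptions; substitute columns that exist in your own tables. Other arguments are omitted.

```r
library(proteoQ)

# Drop low-confidence PSMs while the PSM tables are processed
# (pep_expect is assumed to be a column in the PSM exports)
normPSM(
  filter_psms_at = rlang::exprs(pep_expect <= 0.1)
)
PSM2Pep()
mergePep()

# First pass: normalize against all peptide entries
standPep()

# Second pass: re-derive the normalization from a slice of the rows
# (pep_n_psm is assumed to be a column in Peptide.txt)
standPep(
  slice_peps_by = rlang::exprs(pep_n_psm >= 10)
)
```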

The modules purgePSM and purgePep may be viewed as optional in that they do not add columns to their respective input data.

After peptide and protein normalization, we may apply pepSig and prnSig to calculate significance p-values. Both modules use frequentist methods, that is, methods based on the long-run frequency interpretation of probability.

The modules pepSig and prnSig are arguably part of the normalization steps. One benefit of running them early in the procedure is that downstream analyses can subset data by p-values. For instance, we may subset data by the lowest p-values for heat map visualization. In addition, prnSig is required for modules such as prnGSPA that use protein p-values. Because they are quasi-essential, they are assigned to the normalization unit.

NB: every time samples are voided from expt_smry.xlsx, all the units in data normalization need to be re-executed to update the result files such as Peptide.txt and Protein.txt. A cleaner alternative is to carry out the analysis from scratch as a brand new task.

Informatic analysis

The proteoQ modules in informatics can be executed more independently, in no specific order. For instance, we may run from GSEA to heat map or vice versa. Utilities under the same module, on the other hand, are best run side by side. Protein trend analysis, for example, comprises the analysis part, anal_prnTrend, and the visualization part, plot_prnTrend. We need to execute anal_prnTrend first and then plot_prnTrend.
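For the trend example, a minimal sketch of the two-step order might read as follows. Both calls are shown with default settings only, on the assumption that load_expts and the data normalization modules have already been run in the same session.

```r
library(proteoQ)

# Step 1: analysis; the findings are saved as secondary data files
anal_prnTrend()

# Step 2: visualization; reads the findings produced in step 1
plot_prnTrend()
```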

Variable arguments (varargs) are broadly employed in proteoQ, typically for flexible subsetting and ordering of data rows. The data files coming out of Data normalization are considered primary and termed df. We can then apply the filter_ and arrange_ varargs to utilities that read df. For example, prnHM imports the protein table, say Protein.txt, and can take filter_ and arrange_ varargs for row filtration and ordering, respectively.
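As an illustration of filter_ and arrange_ with prnHM, the sketch below keeps proteins identified by at least two peptides and orders the rows by gene name. The vararg suffixes and the column names prot_n_pep and gene are assumptions about the protein table; check the column headers of your own Protein.txt before reusing them. Row clustering is switched off on the assumption that it would otherwise override the requested row order.

```r
library(proteoQ)

prnHM(
  # keep proteins with two or more identifying peptides (assumed column)
  filter_prots_by_npep = rlang::exprs(prot_n_pep >= 2),
  # order the remaining rows by gene name (assumed column)
  arrange_prots_by = rlang::exprs(gene),
  # pheatmap argument passed through dot-dot-dot
  cluster_rows = FALSE
)
```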

There are also utilities that read data files from informatic analysis. These files are termed secondary data files, df2. One example is plot_prnNMFCon, which imports the consensus findings from anal_prnNMF for heat map visualization. The corresponding vararg for data filtration is filter2_.

gspaMap is unusual in that it processes both df, the protein table from Data normalization, and df2, the gene set results from prnGSPA. Thus, it is possible to apply both filter_ and filter2_ when calling the module.²

Graphic visualization is typically handled by ggplot2 or pheatmap, with the latter being used exclusively for heat maps. Additional parameters can generally be passed through dot-dot-dot (...). To take advantage of the full-fledged ggplot2, results are exported when possible. In the following example, we pass the findings from prnMDS for external ggplot2 visualization:

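library(ggplot2)
# keep the findings returned by prnMDS for plotting outside proteoQ
res <- prnMDS()
# add the ggplot2 layers of choice in place of ...
p <- ggplot(res) + ...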
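Since the layers in the snippet above are left open, here is one self-contained way the hand-off to ggplot2 could look. The data frame is a made-up stand-in for the output of prnMDS; its values and column names (Sample_ID, Group, Coordinate.1, Coordinate.2) are assumptions, so inspect the real object with str(res) before adapting the aesthetics.

```r
library(ggplot2)

# Made-up stand-in for `res <- prnMDS()`; the real columns may differ
res <- data.frame(
  Sample_ID    = paste0("S", 1:6),
  Group        = rep(c("Control", "Treated"), each = 3),
  Coordinate.1 = c(-1.2, -0.9, -1.1, 1.0, 1.3, 0.9),
  Coordinate.2 = c(0.3, -0.2, 0.1, -0.4, 0.2, 0.1)
)

# Scatter plot of the two MDS dimensions, colored by group
p <- ggplot(res, aes(x = Coordinate.1, y = Coordinate.2, color = Group)) +
  geom_point(size = 3) +
  geom_text(aes(label = Sample_ID), vjust = -1, show.legend = FALSE) +
  labs(x = "Coordinate 1", y = "Coordinate 2") +
  theme_bw()

print(p)
```

Working on the returned data frame keeps the full ggplot2 grammar available, for example faceting or custom themes that the built-in plot may not expose.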

Variable arguments

Utility	Vararg_	df	Vararg2_	df2
normPSM	filter_	Mascot, F[…].csv; MaxQuant, msms[…].txt; SM, PSMexport[…].ssv	NA	NA
PSM2Pep	NA	NA	NA	NA
mergePep	filter_	TMTset1_LCMSinj1_Peptide_N.txt	NA	NA
standPep	slice_	Peptide.txt	NA	NA
Pep2Prn	filter_	Peptide.txt	NA	NA
standPrn	slice_	Protein.txt	NA	NA
pepHist	filter_	Peptide.txt	NA	NA
prnHist	filter_	Protein.txt	NA	NA
pepSig	filter_	Peptide[_impNA].txt	NA	NA
prnSig	filter_	Protein[_impNA].txt	NA	NA
pepMDS	filter_	Peptide[_impNA][_pVals].txt	NA	NA
prnMDS	filter_	Protein[_impNA][_pVals].txt	NA	NA
pepPCA	filter_	Peptide[_impNA][_pVals].txt	NA	NA
prnPCA	filter_	Protein[_impNA][_pVals].txt	NA	NA
pepEucDist	filter_	Peptide[_impNA][_pVals].txt	NA	NA
prnEucDist	filter_	Protein[_impNA][_pVals].txt	NA	NA
pepCorr_logFC	filter_	Peptide[_impNA][_pVals].txt	NA	NA
prnCorr_logFC	filter_	Protein[_impNA][_pVals].txt	NA	NA
pepHM	filter_, arrange_	Peptide[_impNA][_pVals].txt	NA	NA
prnHM	filter_, arrange_	Protein[_impNA][_pVals].txt	NA	NA
anal_prnTrend	filter_	Protein[_impNA][_pVals].txt	NA	NA
plot_prnTrend	NA	NA	filter2_	[…]Protein_Trend_{NZ}[_impNA][…].txt
anal_pepNMF	filter_	Peptide[_impNA][_pVals].txt	NA	NA
anal_prnNMF	filter_	Protein[_impNA][_pVals].txt	NA	NA
plot_pepNMFCon	NA	NA	filter2_	[…]Peptide_NMF[…]_consensus.txt
plot_prnNMFCon	NA	NA	filter2_	[…]Protein_NMF[…]_consensus.txt
plot_pepNMFCoef	NA	NA	filter2_	[…]Peptide_NMF[…]_coef.txt
plot_prnNMFCoef	NA	NA	filter2_	[…]Protein_NMF[…]_coef.txt
plot_metaNMF	filter_, arrange_	Protein[_impNA][_pVals].txt	NA	NA
prnGSPA	filter_	Protein[_impNA]_pVals.txt	NA	NA
prnGSPAHM	NA	NA	filter2_	[…]Protein_GSPA_{NZ}[_impNA]_essmap.txt
gspaMap	filter_	Protein[_impNA]_pVals.txt	filter2_	[…]Protein_GSPA_{NZ}[_impNA].txt
anal_prnString	filter_	Protein[_impNA][_pVals].txt	NA	NA

¹ See also the help documents in proteoQ and the online README.
² See also ?proteoQ::gspaMap for details.

Qiang Zhang

Instructor of Medicine