How do I run proteoQ?
The framework of proteoQ can be conceptually divided into two units: data normalization and informatic analysis.
Data normalization
The modules in data normalization need to be run in order, starting with the upload of metadata by `load_expts` and ending with the compilation of the protein table by `standPrn`.
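To make the ordering concrete, the sketch below encodes the required sequence as a character vector and checks whether a list of completed steps follows it. It assumes the order in which the data-normalization utilities appear in the Variable arguments table, preceded by `load_expts`, and it leaves out the optional `purgePSM` and `purgePep`. `norm_steps` and `is_valid_progress()` are hypothetical names, not part of proteoQ.

```r
# Assumed order: load_expts() first, then the data-normalization utilities
# as listed in the Variable arguments table. The optional purgePSM() and
# purgePep() steps are deliberately left out of this sketch.
norm_steps <- c("load_expts", "normPSM", "PSM2Pep", "mergePep",
                "standPep", "Pep2Prn", "standPrn")

# Hypothetical helper: TRUE when `done` matches the beginning of `steps`,
# i.e. no step has been skipped or run out of sequence.
is_valid_progress <- function(done, steps = norm_steps) {
  length(done) <= length(steps) &&
    identical(done, steps[seq_along(done)])
}

is_valid_progress(c("load_expts", "normPSM", "PSM2Pep"))  # TRUE
is_valid_progress(c("load_expts", "PSM2Pep"))             # FALSE: normPSM skipped
```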
Instead of putting all components into one large unit, the data-normalization utilities are divided into smaller pieces for flexible workflows and data mining. One benefit is that we can tailor `filter_` conditions in individual steps to remove inappropriate entries. Another is that we can execute `standPep` and `standPrn` repeatedly with different `slice_` conditions to achieve mixed-bed normalization of the data.¹
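The idea behind `slice_` can be pictured as estimating normalization parameters from a subset of rows and then applying them to every row. The sketch below is a simplification under that assumption: it median-centers hypothetical log2-ratio columns using only the rows selected by a slice expression, whereas `standPep` and `standPrn` use their own alignment methods. The function `center_by_slice()` and the example columns are hypothetical.

```r
# Hypothetical sketch: median-center selected columns, with the centering
# offsets estimated only from rows where `slice_expr` evaluates to TRUE.
center_by_slice <- function(df, cols, slice_expr) {
  keep <- eval(slice_expr, df, parent.frame())
  keep <- !is.na(keep) & keep
  # per-column medians from the sliced rows only
  offsets <- vapply(df[keep, cols, drop = FALSE], median, numeric(1),
                    na.rm = TRUE)
  # subtract the same offsets from all rows, sliced or not
  df[cols] <- Map(`-`, df[cols], offsets)
  df
}

peps <- data.frame(
  species = c("human", "human", "mouse", "mouse"),
  log2R_A = c(0.5, 0.7, 2.0, 2.2),
  log2R_B = c(-0.2, 0.0, 1.0, 1.2),
  stringsAsFactors = FALSE
)
res <- center_by_slice(peps, c("log2R_A", "log2R_B"),
                       quote(species == "human"))
```

Here `res$log2R_A` becomes approximately -0.1, 0.1, 1.4 and 1.6: the two human rows set an offset of 0.6, which is then removed from the mouse rows as well.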
The modules `purgePSM` and `purgePep` may be viewed as optional in that they do not add columns to their respective input data.
After peptide and protein normalization, we may apply `pepSig` and `prnSig` to calculate significance p-values. These two modules use methods based on the long-run frequency interpretation of probability, i.e. frequentist methods.
Arguably, `pepSig` and `prnSig` are part of the normalization steps. One benefit of running them early is that downstream analyses can subset data against p-values; for instance, we may keep the entries with the lowest p-values for heat map visualization. In addition, `prnSig` is required by modules such as `prnGSPA` that use protein p-values. Given this quasi-essential role, the two modules are assigned to the normalization unit.
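Subsetting by the lowest p-values, as in the heat map example, amounts to ordering rows by p-value and keeping the first few. The sketch assumes a single hypothetical p-value column named `pVal`; the column names written by `pepSig` and `prnSig` may differ, so adapt them to the actual files. `top_by_pval()` is a hypothetical helper.

```r
# Hypothetical helper: keep the `n` rows with the smallest p-values,
# dropping rows whose p-value is missing.
top_by_pval <- function(df, pcol, n) {
  ord <- order(df[[pcol]], na.last = NA)  # na.last = NA removes NAs
  df[head(ord, n), , drop = FALSE]
}

prns <- data.frame(
  gene = c("A", "B", "C", "D"),
  pVal = c(0.2, 0.001, NA, 0.04),
  stringsAsFactors = FALSE
)
top_by_pval(prns, "pVal", 2)$gene  # "B" "D"
```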
NB: whenever samples are voided from `expt_smry.xlsx`, all the units in data normalization need to be re-executed to update result files such as `Peptide.txt` and `Protein.txt`. A cleaner alternative is to carry out the analysis from scratch as a brand-new task.
Informatic analysis
The proteoQ modules in informatic analysis can be executed more independently, without a specific order. For instance, we may run GSEA before Heat map, or vice versa. Utilities under the same module, however, may need to run in sequence. For example, protein Trend analysis comprises an analysis part, `anal_prnTrend`, and a visualization part, `plot_prnTrend`; we need to execute `anal_prnTrend` first and then `plot_prnTrend`.
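The order matters because, per the Variable arguments table, `plot_prnTrend` reads a secondary `Protein_Trend` file rather than the primary protein table. The sketch below mimics that split with two hypothetical functions, `anal_trend_demo()` and `plot_trend_demo()`, which stand in for `anal_prnTrend` and `plot_prnTrend` only conceptually; the toy grouping rule and the tab-delimited output are assumptions, not proteoQ's trend method.

```r
# Hypothetical analysis step: assign each row to a toy "trend" group and
# write the result to a secondary, tab-delimited file.
anal_trend_demo <- function(df, out_file) {
  # toy rule: group 1 if the row mean of the ratio columns is >= 0, else 2
  df$cluster <- ifelse(rowMeans(df[, -1, drop = FALSE]) >= 0, 1L, 2L)
  write.table(df, out_file, sep = "\t", row.names = FALSE, quote = FALSE)
  invisible(out_file)
}

# Hypothetical visualization step: it only reads the secondary file, so it
# fails if the analysis step has not been run first.
plot_trend_demo <- function(in_file) {
  if (!file.exists(in_file)) {
    stop("Secondary file not found; run the analysis step first.")
  }
  res <- read.delim(in_file)
  # a count per group stands in for an actual plot
  table(res$cluster)
}

f <- tempfile(fileext = ".txt")
ratios <- data.frame(gene = c("A", "B", "C"), x = c(1, -1, 2), y = c(1, -3, 0))
anal_trend_demo(ratios, f)
plot_trend_demo(f)  # group 1: 2 rows, group 2: 1 row
```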
Variable arguments (varargs) are used broadly in proteoQ, typically for flexible subsetting and ordering of data rows. The data files coming out of Data normalization are considered primary and are termed `df`. We can apply the varargs `filter_` and `arrange_` to utilities that read `df`. For example, `prnHM` imports the protein table, say `Protein.txt`, and accepts `filter_` and `arrange_` varargs for row filtration and ordering, respectively.
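How a utility might pick `filter_` and `arrange_` varargs out of `...` can be sketched in base R. This is a minimal illustration under assumptions, not proteoQ's implementation: each vararg is a quoted expression (base R `quote()` keeps the sketch dependency-free), its name starts with the relevant prefix, filters are applied in turn, and arrange keys are combined in a single `order()` call. The function `subset_by_varargs()` and the example columns are hypothetical.

```r
# Hypothetical sketch: route named varargs by prefix. Names starting with
# "filter_" subset rows; names starting with "arrange_" sort rows.
subset_by_varargs <- function(df, ...) {
  env <- parent.frame()             # where the caller's variables live
  dots <- list(...)
  nms <- names(dots)
  if (is.null(nms)) nms <- rep("", length(dots))

  # apply every filter_ expression in turn; NA counts as FALSE
  for (ex in dots[startsWith(nms, "filter_")]) {
    keep <- eval(ex, df, env)
    df <- df[!is.na(keep) & keep, , drop = FALSE]
  }

  # combine all arrange_ expressions into one multi-key order()
  arr <- dots[startsWith(nms, "arrange_")]
  if (length(arr) > 0) {
    keys <- lapply(arr, function(ex) eval(ex, df, env))
    df <- df[do.call(order, unname(keys)), , drop = FALSE]
  }
  df
}

prn <- data.frame(
  gene = c("A", "B", "C", "D"),
  species = c("human", "mouse", "human", "human"),
  prot_n_pep = c(3, 5, 1, 8),
  stringsAsFactors = FALSE
)
out <- subset_by_varargs(prn,
                         filter_sp = quote(species == "human"),
                         arrange_n = quote(-prot_n_pep))
out$gene  # "D" "A" "C"
```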
There are also utilities that read data files from informatic analysis. These files are termed secondary data files, `df2`. One example is `plot_prnNMFCon`, which imports the consensus findings from `anal_prnNMF` for heat map visualization. The corresponding vararg for data filtration is `filter2_`.
`gspaMap` is unusual in that it processes both `df`, the protein table from Data normalization, and `df2`, the gene set results from `prnGSPA`. Thus, both `filter_` and `filter2_` can be applied when calling the module.²
Graphic visualization is typically performed with `ggplot2` or `pheatmap`, the latter being used exclusively for heat maps. Additional parameters can generally be passed through dot-dot-dot (`...`). To take full advantage of `ggplot2`, results are exported where possible. In the following example, we pass the findings from `prnMDS` to `ggplot2` for external visualization:
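library(ggplot2)
# prnMDS() exports its results so that they can be re-plotted outside proteoQ
res <- prnMDS()
# "..." is a placeholder for the ggplot2 layers of your choice
p <- ggplot(res) + ...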
Variable arguments
Utility | Vararg_ | df | Vararg2_ | df2 |
---|---|---|---|---|
normPSM | filter_ | Mascot, F[…].csv; MaxQuant, msms[…].txt; SM, PSMexport[…].ssv | NA | NA |
PSM2Pep | NA | NA | NA | NA |
mergePep | filter_ | TMTset1_LCMSinj1_Peptide_N.txt | NA | NA |
standPep | slice_ | Peptide.txt | NA | NA |
Pep2Prn | filter_ | Peptide.txt | NA | NA |
standPrn | slice_ | Protein.txt | NA | NA |
pepHist | filter_ | Peptide.txt | NA | NA |
prnHist | filter_ | Protein.txt | NA | NA |
pepSig | filter_ | Peptide[_impNA].txt | NA | NA |
prnSig | filter_ | Protein[_impNA].txt | NA | NA |
pepMDS | filter_ | Peptide[_impNA][_pVal].txt | NA | NA |
prnMDS | filter_ | Protein[_impNA][_pVal].txt | NA | NA |
pepPCA | filter_ | Peptide[_impNA][_pVal].txt | NA | NA |
prnPCA | filter_ | Protein[_impNA][_pVal].txt | NA | NA |
pepEucDist | filter_ | Peptide[_impNA][_pVal].txt | NA | NA |
prnEucDist | filter_ | Protein[_impNA][_pVal].txt | NA | NA |
pepCorr_logFC | filter_ | Peptide[_impNA][_pVal].txt | NA | NA |
prnCorr_logFC | filter_ | Protein[_impNA][_pVal].txt | NA | NA |
pepHM | filter_, arrange_ | Peptide[_impNA][_pVal].txt | NA | NA |
prnHM | filter_, arrange_ | Protein[_impNA][_pVal].txt | NA | NA |
anal_prnTrend | filter_ | Protein[_impNA][_pVal].txt | NA | NA |
plot_prnTrend | NA | NA | filter2_ | […]Protein_Trend_{NZ}[_impNA][…].txt |
anal_pepNMF | filter_ | Peptide[_impNA][_pVal].txt | NA | NA |
anal_prnNMF | filter_ | Protein[_impNA][_pVal].txt | NA | NA |
plot_pepNMFCon | NA | NA | filter2_ | […]Peptide_NMF[…]_consensus.txt |
plot_prnNMFCon | NA | NA | filter2_ | […]Protein_NMF[…]_consensus.txt |
plot_pepNMFCoef | NA | NA | filter2_ | […]Peptide_NMF[…]_coef.txt |
plot_prnNMFCoef | NA | NA | filter2_ | […]Protein_NMF[…]_coef.txt |
plot_metaNMF | filter_, arrange_ | Protein[_impNA][_pVal].txt | NA | NA |
prnGSPA | filter_ | Protein[_impNA]_pVals.txt | NA | NA |
prnGSPAHM | NA | NA | filter2_ | […]Protein_GSPA_{NZ}[_impNA]_essmap.txt |
gspaMap | filter_ | Protein[_impNA]_pVal.txt | filter2_ | […]Protein_GSPA_{NZ}[_impNA].txt |
anal_prnString | filter_ | Protein[_impNA][_pVals].txt | NA | NA |
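In the df and df2 columns above, square brackets appear to mark optional file-name segments, for instance `[_impNA]`, while `[…]` and `{NZ}` stand for prefixes and placeholders that are not spelled out. Under that reading, which is an assumption, a pattern without `[…]` or `{NZ}` can be expanded into every file name it covers. `expand_optional()` is a hypothetical helper.

```r
# Hypothetical helper: expand every "[...]" segment in a file-name pattern
# into two variants, one without and one with the bracketed text.
expand_optional <- function(pattern) {
  hit <- regexpr("\\[[^]]*\\]", pattern)
  if (hit == -1) return(pattern)                    # no optional segment left
  len  <- attr(hit, "match.length")
  pre  <- substr(pattern, 1, hit - 1)
  opt  <- substr(pattern, hit + 1, hit + len - 2)   # text inside brackets
  post <- substr(pattern, hit + len, nchar(pattern))
  tails <- expand_optional(post)                    # expand remaining segments
  c(paste0(pre, tails), paste0(pre, opt, tails))
}

expand_optional("Protein[_impNA][_pVal].txt")
# "Protein.txt" "Protein_pVal.txt" "Protein_impNA.txt" "Protein_impNA_pVal.txt"
```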