Posts

Understanding the Benjamini–Hochberg procedure

The prnSig and pepSig utilities in proteoQ perform significance analysis for protein and peptide data, respectively. In what follows the step of p-value assessment, multiple test corrections using the stats::p.

Interfaces to search engines

I have previously shown some unique aspects of exporting PSMs from Mascot for use with proteoQ. In this post, I will do the same for more search engines, e.

Exporting Mascot PSMs

Redundancy handling: When exporting Mascot PSMs, one choice is to apply the principle of parsimony up front by excluding both same-set and sub-set proteins (2011) whose presence may not be unambiguously determined.

Metadata files for LFQ

The LFQ workflows in proteoQ take the same metadata files as those with TMT procedures; namely, expt_smry.xlsx and frac_smry.xlsx by default. No prefractionation: For experiments without peptide prefractionation, we only need to prepare the expt_smry.

LDA in proteoQ

Linear discriminant analysis (LDA) is popular for both the classification and the dimension reduction of data in two or more categories.

Wrapping PCA into proteoQ

Data preprocessing

In a recent post, I described the components in proteoQ for data normalization and informatic analysis. Most of the time, I get my job done by invoking modules in the program.

How do I run proteoQ

The framework of proteoQ can be divided conceptually into two units: data normalization and informatic analysis. Data normalization: The modules in data normalization need to be run in order, starting with the upload of metadata by load_expts and ending with the compilation of the protein table by standPrn.

Merge data at different TMT plexes

The standard Tandem Mass Tag (TMT) kits are capable of relative quantitation of up to 11 samples in one experiment, by adding a chemical structure at a monoisotopic mass of 229.

Sample exclusions

With proteoQ, removing sample(s) from analysis is a simple matter of deleting the corresponding entries under the Sample_ID column in a metadata file. In the example shown below, we choose to exclude the two samples at TMT channels 129N and 130C from the experiment at TMT_Set 1: