%\VignetteIndexEntry{LPE test for microarray data with small number of replicates} %\VignetteKeywords{Local pooled error, replicates} %\VignetteDepends{LPE} %\VignettePackage{LPE} \documentclass[12pt]{article} \usepackage{amsmath} \usepackage{hyperref} \usepackage{url} \usepackage{isorot} \usepackage{fullpage} % standard 1 inch margins \newcommand{\Robject}[1]{{\texttt{#1}}} \newcommand{\Rfunction}[1]{{\texttt{#1}}} \newcommand{\Rpackage}[1]{{\textit{#1}}} \newcommand{\Rclass}[1]{{\textit{#1}}} \newcommand{\Rmethod}[1]{{\textit{#1}}} \newcommand{\code}[1]{\texttt{#1}} \newcommand{\email}[1]{\texttt{#1}} %%% Hyperlinks for ``PDF Latex'' : \ifx\pdfoutput\undefined%%--- usual ``latex'' : %% Stuff w/out hyperref \else%%---------------------- `` pdflatex '' : -- still gives funny errors \RequirePackage{hyperref} %% The following is R's share/texmf/hyperref.cfg : %% Stuff __with__ hyperref : \hypersetup{% %default: hyperindex,% colorlinks,% %default: pagebackref,% linktocpage,% %%plainpages=false,% linkcolor=Green,% citecolor=Blue,% urlcolor=Red,% pdfstartview=Fit,% pdfview={XYZ null null null}% } \RequirePackage{color} \definecolor{Blue}{rgb}{0,0,0.8} \definecolor{Green}{rgb}{0.1,0.75,0.1} \definecolor{Red}{rgb}{0.7,0,0} %% ESS JCGS v2 : %%\hypersetup{backref,colorlinks=true,pagebackref=true,hyperindex=true} %%\hypersetup{backref,colorlinks=false,pagebackref=true,hyperindex=true} \fi \usepackage{Sweave} \begin{document} \title{LPE test for microarray data with a small number of replicates} \author{ Nitin Jain \email{}, \\ Michael O Connell \email{},\\ and Jae K. Lee \email{} } \maketitle \tableofcontents \section{Introduction} The \Rpackage{LPE} package describes local-pooled-error (LPE) test for identifying significant differentially expressed genes in microarray experiments. Local pooled error test is especially useful when the number of replicates is low (2-3) \cite{Jain:2003}. LPE estimation is based on pooling errors within genes and between replicate arrays for genes in which expression values are similar. This is motivated by the observation that errors between duplicates vary as a function of the average gene expression intensity and by the fact that many gene expression studies are implemented with a limited number of replicated arrays \cite{Chen:1997}. LPE library is primarily used for analyzing data between two conditions. To use it for paired data, see \Rpackage{LPEP} library. For using LPE in multiple conditions, use \Rpackage{HEM} library. \subsection{Mouse Immune Response Study dataset} \label{sec:analysis} Step by step analysis is presented in Section \ref{sec:analysis} using data from a 6-chip (Affymetrix munie GeneChip, MG-U74Av2) oligonucleotide microarray study of a mouse immune response study. Three replicates of Affymetrix oligonucleotide chips per condition (Naive and Activated) were used. Mouse immune response study was conducted by \href{http://bme.virginia.edu/ley/lab/}{Dr. Klaus Ley}, Univeristy of Virginia. Details of methodology and application of Local Pooled Error (LPE) test can be obtained from the LPE paper, published in Bioinformatics \cite{Jain:2003}, and detailed description of \code{Rank-invariant resampling based FDR method} can be obtained from FDR paper, published in BMC Bioinformatics \cite{Jain:2005}. \section{Analyzing data set using \code{LPE} library} \subsection{Check installed version of \Rpackage{LPE}} First, make sure that the version of LPE library you are using is at least 1.6.0 <>= packageDescription("LPE") @ If you have an older version, download the latest version from Bioconductor. \subsection{Load the library} First, set the seed to 0 (to have reproducible results as shown below): <>= set.seed(0) @ Load the LPE library: <>= library(LPE) @ \subsection{Load the data set} Load the data set `Ley' (built in LPE package), check its dimensions and see the dataset. For illustration purposes, we will use only a small subset of Ley data (1000 rows) in the subsequent examples. Replicates of Naive condition are named as c1, c2, c3 and those of Activated condition are named as t1, t2 and t3 respectively. <>= data(Ley) dim(Ley) head(Ley) Ley.subset <- Ley[seq(1000),] @ \subsection{Normalization of data} Do the pre-processing (or normalization) of the data. Since the data obtained here is in MAS5 format, we will use data.type=MAS5. (Note that LPE does not require users to normalize the gene-expression data using the \code{preprocess} function. Users can very well use other methods, such as \code{RMA} or any other method of their choice.) The \code{preprocess} function does IQR normalization (so that inter-quartile ranges on all chips are set to their widest range), thresholding (making the intensity values lower than 1.0 to 1.0), log based 2 transformation and LOWESS normalization (if LOWESS is set to TRUE). Note that this preprocess is a simple constant-scale and location-normalization step. <>= Ley.normalized <- Ley.subset Ley.normalized[,2:7] <- preprocess(Ley.subset[,2:7], data.type = "MAS5") Ley.normalized[1:3,] @ \noindent Remove the Affymetrix control spots (whose ID begins with `AFFX'): <>= Ley.final <- Ley.normalized[substring(Ley.normalized$ID,1,4) !="AFFX",] dim(Ley.final) Ley.final[1:3,] @ \subsection{Obtain baseline error distribution} \noindent Calculate the baseline error distribution of Naive condition, which returns a data.frame of A vs M for selected number of bins (= 1/q), where q = quantile. <>= var.Naive <- baseOlig.error(Ley.final[,2:4],q=0.01) dim(var.Naive) var.Naive[1:3,] @ \noindent Similarly calculate the base-line distribution of Activated condition: <>= var.Activated <- baseOlig.error(Ley.final[,5:7], q=0.01) dim(var.Activated) var.Activated[1:3,] @ \subsection{Calculate \code{z-statistics} for each gene} \label{z-stats} Calculate the lpe variance estimates as described above. The function \code{lpe} takes the first two arguments as the replicated data, next two arguments as the baseline distribution of the replicates calculated from the \code{baseOlig.error} function, and Gene IDs as probe.set.name. <>= lpe.val <- data.frame(lpe(Ley.final[,5:7], Ley.final[,2:4], var.Activated, var.Naive, probe.set.name=Ley.final$ID) ) lpe.val <- round(lpe.val, digits=2) dim (lpe.val) lpe.val[1:3,] @ \subsection{FDR correction} Various FDR correction methods are supported in LPE: \code{BH} (Benjamini-Hochberg), \code{BY} (Benjamini-Yekutieli), \code{mix.all} (does FDR adjustment similar to that of SAM) or \code{resamp} (Rank invariant resampling based FDR correction - recommended method). For the sake of completion, there is an option ``Bonferroni'' to get Bonferroni adjusted p-values also. Note that BH and BY methods are adopted from \code{multtest} package. <>= fdr.BH <- fdr.adjust(lpe.val, adjp="BH") dim(fdr.BH) round(fdr.BH[1:4, ],2) @ \noindent Resampling based FDR adjustment takes a while to run, and returns the critical z-values and corresponding FDR. Users can decide from the table that which z.critical values to select (from the lpe results, here in lpe.val) to obtain the target fdr. <>= fdr.resamp <- fdr.adjust(lpe.val, adjp="resamp", iterations=2) fdr.resamp @ \noindent Note that above table may differ slightly due to generation of `NULL distribution' by resampling. For each target.fdr, we can note critical z-value, above which all genes are considered significant. For example, in the above table, to obtain all the genes with FDR less than or equal to 5\%, identify the \code{z.critical} corresponding to 5\%, (=1.77), and subselect all the genes obtained from LPE-method (section \ref{z-stats}) for which absolute value of \code{z.statistics} is greater than 1.77. Finally, here is an example of Bonferroni correction (sorted in order of significance): <>= Bonferroni.adjp <- fdr.adjust(lpe.val, adjp="Bonferroni") head(Bonferroni.adjp) @ \section{Discussion} \label{sec:discussion} Using our LPE approach, the sensitivity of detecting subtle expression changes can be dramatically increased and differential gene expression patterns can be identified with both small false-positive and small false-negative error rates. This is because, in contrast to the individual gene's error variance, the local pooled error variance can be estimated very accurately. \vspace{0.25 in} \textbf{Acknowledgments}. We wish to acknowledge the following colleagues: P. Aboyoun, J. Betcher, D Clarkson, J. Gibson, A. Hoering, S. Kaluzny, L. Kannapel, D. Kinsey, P. McKinnis, D. Stanford, S. Vega and H. Yan. \begin{thebibliography}{10} \expandafter\ifx\csname natexlab\endcsname\relax\def\natexlab#1{#1}\fi \expandafter\ifx\csname url\endcsname\relax \def\url#1{{\tt #1}}\fi \bibitem{Jain:2003} Jain et.\ al. \newblock {Local-pooled-error test for identifying differentially expressed genes with a small number of replicated microarrays}, \newblock {\em Bioinformatics}, 2003, Vol 19, No. 15, pp: 1945-1951. \bibitem{Jain:2005} Jain et.\ al.\ \newblock{Rank-invariant resampling based estimation of false discovery rate for analysis of small sample microarray data}, \newblock {\em BMC Bioinformatics}, 2005, Vol 6, 187. \bibitem{Chen:1997} Chen et.\ al.\ \newblock {Ratio-based decisions and the quantitative analysis of cDNA microarray images}, \newblock {\em Biomedical Optics}, 1997, Vol 2, pp: 364-374. \end{thebibliography} \end{document}