%\VignetteIndexEntry{LPE test for microarray data with small number of replicates}
%\VignetteKeywords{Local pooled error, replicates}
%\VignetteDepends{LPE}
%\VignettePackage{LPE}

\documentclass[12pt]{article}
\usepackage{amsmath}
\usepackage{hyperref}
\usepackage{url}
\usepackage{isorot}
\usepackage{fullpage} % standard 1 inch margins 

\newcommand{\Robject}[1]{{\texttt{#1}}}
\newcommand{\Rfunction}[1]{{\texttt{#1}}}
\newcommand{\Rpackage}[1]{{\textit{#1}}}
\newcommand{\Rclass}[1]{{\textit{#1}}}
\newcommand{\Rmethod}[1]{{\textit{#1}}}
\newcommand{\code}[1]{\texttt{#1}}
\newcommand{\email}[1]{\texttt{#1}}

%%% Hyperlinks for ``PDF Latex'' :
\ifx\pdfoutput\undefined%%--- usual ``latex'' :
  %% Stuff w/out hyperref
\else%%---------------------- `` pdflatex '' : -- still gives funny errors
  \RequirePackage{hyperref}
  %% The following is R's share/texmf/hyperref.cfg :
  %% Stuff __with__ hyperref :
  \hypersetup{%
    %default: hyperindex,%
    colorlinks,%
    %default: pagebackref,%
    linktocpage,%
    %%plainpages=false,%
    linkcolor=Green,%
    citecolor=Blue,%
    urlcolor=Red,%
    pdfstartview=Fit,%
    pdfview={XYZ null null null}%
    }
  \RequirePackage{color}
  \definecolor{Blue}{rgb}{0,0,0.8}
  \definecolor{Green}{rgb}{0.1,0.75,0.1}
  \definecolor{Red}{rgb}{0.7,0,0}
  %% ESS JCGS v2 :
  %%\hypersetup{backref,colorlinks=true,pagebackref=true,hyperindex=true}
  %%\hypersetup{backref,colorlinks=false,pagebackref=true,hyperindex=true}
\fi

\usepackage{Sweave}
\begin{document}
\title{LPE test for microarray data with a small number of replicates}
\author{
        Nitin Jain
        \email{<nitin.jain@pfizer.com>}, 
        \\
        Michael O Connell
        \email{<moconnell@insightful.com>},\\
         and Jae K. Lee
        \email{<jaeklee@virginia.edu>}
        }

\maketitle
\tableofcontents


\section{Introduction}
The \Rpackage{LPE} package describes local-pooled-error (LPE) 
test for identifying significant differentially expressed 
genes in microarray experiments. Local pooled error test is
 especially useful when the number of replicates is low (2-3)
\cite{Jain:2003}.
 LPE estimation is based on pooling errors within
genes and between replicate arrays for genes in which expression
values are similar. This is motivated by the observation that
errors between duplicates vary as a function of the average gene
expression intensity and by the fact that many gene expression
studies are implemented with a limited number of replicated arrays
\cite{Chen:1997}.

LPE library is primarily used for analyzing data between two
conditions. To use it for paired data, see \Rpackage{LPEP} library.
For using LPE in multiple conditions, use \Rpackage{HEM} library.


\subsection{Mouse Immune Response Study dataset}
\label{sec:analysis}

Step by step analysis is presented in Section \ref{sec:analysis}
using data from a 6-chip (Affymetrix munie GeneChip, MG-U74Av2)
oligonucleotide microarray study of a mouse immune response study.
Three replicates of Affymetrix oligonucleotide chips per condition
(Naive and Activated) were used.  Mouse immune response study was
conducted by \href{http://bme.virginia.edu/ley/lab/}{Dr.  Klaus
 Ley}, Univeristy of Virginia.  

Details of methodology and application of Local Pooled Error (LPE)
test can be obtained from the LPE paper, published in Bioinformatics
\cite{Jain:2003}, and detailed description of \code{Rank-invariant
  resampling based FDR method} can be obtained from FDR paper,
published in BMC Bioinformatics \cite{Jain:2005}.


\section{Analyzing data set using \code{LPE} library}

\subsection{Check installed version of \Rpackage{LPE}}
First, make sure that the version of LPE library you are using is
at least 1.6.0
<<echo=TRUE, eval=FALSE>>=
packageDescription("LPE")
@ 
If you have an older version, download the latest version from Bioconductor.

\subsection{Load the library}

First, set the seed to 0 (to have reproducible results as shown
below):

<<echo=TRUE, eval=TRUE>>=
set.seed(0) 
@ 

Load the LPE library:
<<echo=TRUE, eval=TRUE>>=
library(LPE) 
@ 

\subsection{Load the data set}
Load the data set `Ley' (built in LPE package), check its dimensions
and see the dataset.  For illustration purposes, we will use only a
small subset of Ley data (1000 rows) in the subsequent examples.
Replicates of Naive condition are named as c1, c2, c3 and those of
Activated condition are named as t1, t2 and t3 respectively.

<<echo=TRUE, eval=TRUE>>=
data(Ley)
dim(Ley)
head(Ley)
Ley.subset <- Ley[seq(1000),]
@

\subsection{Normalization of data} 
Do the pre-processing (or normalization) of the data. Since the data
obtained here is in MAS5 format, we will use data.type=MAS5. (Note
that LPE does not require users to normalize the gene-expression data
using the \code{preprocess} function. Users can very well use other
methods, such as \code{RMA} or any other method of their choice.)

The \code{preprocess} function does IQR normalization (so that
inter-quartile ranges on all chips are set to their widest range),
thresholding (making the intensity values lower than 1.0 to 1.0),
log based 2 transformation and LOWESS normalization (if LOWESS is
set to TRUE).  Note that this preprocess is a simple constant-scale
and location-normalization step.

<<echo=TRUE, eval=TRUE>>=
Ley.normalized <- Ley.subset
Ley.normalized[,2:7] <- preprocess(Ley.subset[,2:7], data.type = "MAS5")
Ley.normalized[1:3,]
@

\noindent Remove the Affymetrix control spots (whose ID begins with `AFFX'):

<<echo=TRUE, eval=TRUE>>=
Ley.final <- Ley.normalized[substring(Ley.normalized$ID,1,4) !="AFFX",] 
dim(Ley.final)
Ley.final[1:3,]
@

\subsection{Obtain baseline error distribution}
\noindent Calculate the baseline error distribution of Naive condition,
which returns a data.frame of A vs M for selected number of bins (= 1/q), 
where q = quantile.


<<echo=TRUE, eval=TRUE>>=
var.Naive <- baseOlig.error(Ley.final[,2:4],q=0.01)
dim(var.Naive)
var.Naive[1:3,]
@

\noindent Similarly calculate the base-line distribution of Activated condition:


<<echo=TRUE, eval=TRUE>>=
var.Activated <- baseOlig.error(Ley.final[,5:7], q=0.01)
dim(var.Activated)
var.Activated[1:3,]
@

\subsection{Calculate \code{z-statistics} for each gene}
\label{z-stats}
Calculate the lpe variance estimates as described above. The
function \code{lpe} takes the first two arguments as the replicated
data, next two arguments as the baseline distribution of the
replicates calculated from the \code{baseOlig.error} function,
and Gene IDs as probe.set.name.

<<echo=TRUE, eval=TRUE>>=
lpe.val <- data.frame(lpe(Ley.final[,5:7], Ley.final[,2:4], 
                          var.Activated, var.Naive,
                          probe.set.name=Ley.final$ID)
                      )

lpe.val <- round(lpe.val, digits=2)
dim (lpe.val)
lpe.val[1:3,]
@

\subsection{FDR correction}

Various FDR correction methods are supported in LPE: \code{BH}
(Benjamini-Hochberg), \code{BY} (Benjamini-Yekutieli),
\code{mix.all} (does FDR adjustment similar to that of SAM) or
\code{resamp} (Rank invariant resampling based FDR correction -
recommended method). For the sake of completion, there is an option
``Bonferroni'' to get Bonferroni adjusted p-values also. Note that
BH and BY methods are adopted from \code{multtest} package.

<<echo=TRUE, eval=TRUE>>=
fdr.BH <- fdr.adjust(lpe.val, adjp="BH")
dim(fdr.BH)
round(fdr.BH[1:4, ],2)
@

\noindent Resampling based FDR adjustment takes a while to run, and
returns the critical z-values and corresponding FDR. Users can
decide from the table that which z.critical values to select (from
the lpe results, here in lpe.val) to obtain the target fdr.


<<echo=TRUE, eval=TRUE>>=
fdr.resamp <- fdr.adjust(lpe.val, adjp="resamp", iterations=2)
fdr.resamp
@


\noindent Note that above table may differ slightly due to generation 
of `NULL distribution' by resampling. For each target.fdr, we can
note critical z-value, above which all genes are considered
significant. For example, in the above table, to obtain all the
genes with FDR less than or equal to 5\%, identify the
\code{z.critical} corresponding to 5\%, (=1.77), and subselect all
the genes obtained from LPE-method (section \ref{z-stats}) for which
absolute value of \code{z.statistics} is greater than 1.77.

Finally, here is an example of Bonferroni correction (sorted in order
of significance):
<<echo=TRUE, eval=TRUE>>=
Bonferroni.adjp <- fdr.adjust(lpe.val, adjp="Bonferroni")
head(Bonferroni.adjp)
@


\section{Discussion}
\label{sec:discussion}
Using our LPE approach, the sensitivity of detecting subtle
expression changes can be dramatically increased and differential
gene expression patterns can be identified with both small
false-positive and small false-negative error rates. This is
because, in contrast to the individual gene's error variance, the
local pooled error variance can be estimated very accurately.
\vspace{0.25 in}

\textbf{Acknowledgments}. We wish to acknowledge the following
colleagues: P. Aboyoun, J. Betcher, D Clarkson, J. Gibson, A.
Hoering, S. Kaluzny, L. Kannapel, D. Kinsey, P. McKinnis,
D. Stanford, S. Vega and H. Yan.


\begin{thebibliography}{10}
\expandafter\ifx\csname natexlab\endcsname\relax\def\natexlab#1{#1}\fi
\expandafter\ifx\csname url\endcsname\relax
  \def\url#1{{\tt #1}}\fi


\bibitem{Jain:2003}
Jain et.\ al.
\newblock {Local-pooled-error test for identifying 
differentially expressed genes with a small number of replicated
microarrays},
\newblock {\em Bioinformatics}, 2003, Vol 19, No. 15,  pp: 1945-1951.


\bibitem{Jain:2005}
Jain et.\ al.\ 
\newblock{Rank-invariant resampling based estimation of false
    discovery rate for analysis of small sample microarray data},
\newblock {\em  BMC Bioinformatics}, 2005, Vol 6, 187.

\bibitem{Chen:1997}
Chen et.\ al.\
\newblock {Ratio-based
decisions and the quantitative analysis of cDNA microarray images},
\newblock {\em Biomedical Optics}, 1997, Vol 2, pp: 364-374.

\end{thebibliography}

\end{document}