Statistics for Microarrays: Design, Analysis and Inference is the first book to present a coherent and systematic overview of statistical methods for every stage of microarray data analysis, from getting good data to obtaining meaningful results.
- Provides an overview of statistics for microarrays, including experimental design, data preparation, image analysis, normalization, quality control, and statistical inference.
- Features many examples throughout using real data from microarray experiments.
- Integrates computational techniques into the text.
- Takes a very practical approach, suitable for statistically–minded biologists.
- Supported by a Website featuring colour images, software, and data sets.
1.1 Using the R Computing Environment.
1.1.1 Installing smida.
1.1.2 Loading smida.
1.2 Data Sets from Biological Experiments.
1.2.1 Arabidopsis experiment: Anna Amtmann.
1.2.2 Skin cancer experiment: Nighean Barr.
1.2.3 Breast cancer experiment: John Bartlett.
1.2.4 Mammary gland experiment: Gusterson group.
1.2.5 Tuberculosis experiment: BµG@S group.
I Getting Good Data.
2 Set–up of a Microarray Experiment.
2.1 Nucleic Acids: DNA and RNA.
2.2 Simple cDNA Spotted Microarray Experiment.
2.2.1 Growing experimental material.
2.2.2 Obtaining RNA.
2.2.3 Adding spiking RNA and poly–T primer.
2.2.4 Preparing the enzyme environment.
2.2.5 Obtaining labelled cDNA.
2.2.6 Preparing cDNA mixture for hybridization.
2.2.7 Slide hybridization.
3 Statistical Design of Microarrays.
3.1 Sources of Variation.
3.2.1 Biological and technical replication.
3.2.2 How many replicates?
3.2.3 Pooling samples.
3.3 Design Principles.
3.3.1 Blocking, crossing and randomization.
3.3.2 Design and normalization.
3.4 Single–channel Microarray Design.
3.4.1 Design issues.
3.4.2 Design layout.
3.4.3 Dealing with technical replicates.
3.5 Two–channel Microarray Designs.
3.5.1 Optimal design of dual–channel arrays.
3.5.2 Several practical two–channel designs.
4.1 Image Analysis.
4.2 Introduction to Normalization.
4.2.1 Scale of gene expression data.
4.2.2 Using control spots for normalization.
4.2.3 Missing data.
4.3 Normalization for Dual–channel Arrays.
4.3.1 Order for the normalizations.
4.3.2 Spatial correction.
4.3.3 Background correction.
4.3.4 Dye effect normalization.
4.3.5 Normalization within and across conditions.
4.4 Normalization of Single–channel Arrays.
4.4.1 Affymetrix data structure.
4.4.2 Normalization of Affymetrix data.
5 Quality Assessment.
5.1 Using MIAME in Quality Assessment.
5.1.1 Components of MIAME.
5.2 Comparing Multivariate Data.
5.2.1 Measurement scale.
5.2.2 Dissimilarity and distance measures.
5.2.3 Representing multivariate data.
5.3 Detecting Data Problems.
5.3.1 Clerical errors.
5.3.2 Normalization problems.
5.3.3 Hybridization problems.
5.3.4 Array mishandling.
5.4 Consequences of Quality Assessment Checks.
6 Microarray Myths: Data.
6.1.1 Single– versus dual–channel designs?
6.1.2 Dye–swap experiments.
6.2.1 Myth: microarray data is Gaussian.
6.2.2 Myth: microarray data is not Gaussian.
6.2.3 Confounding spatial and dye effect.
6.2.4 Myth: non–negative background subtraction.
II Getting Good Answers.
7 Microarray Discoveries.
7.1 Discovering Sample Classes.
7.1.1 Why cluster samples?
7.1.2 Sample dissimilarity measures.
7.1.3 Clustering methods for samples.
7.2 Exploratory Supervised Learning.
7.2.1 Labelled dendrograms.
7.2.2 Labelled PAM–type clusterings.
7.3 Discovering Gene Clusters.
7.3.1 Similarity measures for expression profiles.
7.3.2 Gene clustering methods.
8 Differential Expression.
8.1.1 Classical versus Bayesian hypothesis testing.
8.1.2 Multiple testing problem.
8.2 Classical Hypothesis Testing.
8.2.1 What is a hypothesis test?
8.2.2 Hypothesis tests for two conditions.
8.2.3 Decision rules.
8.2.4 Results from skin cancer experiment.
8.3 Bayesian Hypothesis Testing.
8.3.1 A general testing procedure.
8.3.2 Bayesian t–test.
9 Predicting Outcomes with Gene Expression Profiles.
9.1.1 Probabilistic classification theory.
9.1.2 Modelling and predicting continuous variables.
9.2 Curse of Dimensionality: Gene Filtering.
9.2.1 Use only significantly expressed genes.
9.2.2 PCA and gene clustering.
9.2.3 Penalized methods.
9.2.4 Biological selection.
9.3 Predicting Class Memberships.
9.3.1 Variance–bias trade–off in prediction.
9.3.2 Linear discriminant analysis.
9.3.3 k–nearest neighbour classification.
9.4 Predicting Continuous Responses.
9.4.1 Penalized regression: LASSO.
9.4.2 k–nearest neighbour regression.
10 Microarray Myths: Inference.
10.1 Differential Expression.
10.1.1 Myth: Bonferroni is too conservative.
10.1.2 FPR and collective multiple testing.
10.1.3 Misinterpreting FDR.
10.2 Prediction and Learning.