Preface |
|
xv | |
1 Microarrays in Gene Expression Studies |
|
1 | (30) |
|
|
1 | (1) |
|
|
2 | (3) |
|
1.2.1 Genome, Genotype, and Gene Expression |
|
|
2 | (1) |
|
1.2.2 Of Wild-Types and Other Alleles |
|
|
3 | (1) |
|
1.2.3 Aspects of Underlying Biology and Physiochemistry |
|
|
4 | (1) |
|
1.3 Polymerase Chain Reaction |
|
|
5 | (1) |
|
|
6 | (1) |
|
1.4.1 Expressed Sequence Tag |
|
|
6 | (1) |
|
1.5 Microarray Technology and Application |
|
|
7 | (17) |
|
1.5.1 History of Microarray Development |
|
|
8 | (2) |
|
1.5.2 Tools of Microarray Technology |
|
|
10 | (8) |
|
1.5.3 Limitations of Microarray Technology |
|
|
18 | (2) |
|
1.5.4 Oligonucleotides versus cDNA Arrays |
|
|
20 | (3) |
|
1.5.5 SAGE: Another Method for Detecting and Measuring Gene Expression Levels |
|
|
23 | (1) |
|
1.5.6 Emerging Technologies |
|
|
24 | (1) |
|
1.6 Sampling of Relevant Research Entities and Public Resources |
|
|
24 | (7) |
2 Cleaning and Normalization |
|
31 | (30) |
|
|
31 | (1) |
|
|
32 | (6) |
|
2.2.1 Image Processing to Extract Information |
|
|
32 | (4) |
|
2.2.2 Missing Value Estimation |
|
|
36 | (2) |
|
2.2.3 Sources of Nonlinearity |
|
|
38 | (1) |
|
2.3 Normalization and Plotting Procedures for Oligonucleotide Arrays |
|
|
38 | (6) |
|
2.3.1 Global Approaches for Oligonucleotide Array Data |
|
|
38 | (1) |
|
2.3.2 Spiked Standard Approaches |
|
|
39 | (2) |
|
2.3.3 Geometric Mean and Linear Regression Normalization for Multiple Arrays |
|
|
41 | (1) |
|
2.3.4 Nonlinear Normalization for Multiple Arrays Using Smooth Curves |
|
|
42 | (2) |
|
2.4 Normalization Methods for cDNA Microarray Data |
|
|
44 | (8) |
|
2.4.1 Single-Array Normalization |
|
|
46 | (2) |
|
2.4.2 Multiple Slides Normalization |
|
|
48 | (1) |
|
2.4.3 ANOVA and Related Methods for Normalization |
|
|
49 | (1) |
|
2.4.4 Mixed-Model Method for Normalization |
|
|
50 | (1) |
|
|
51 | (1) |
|
2.5 Transformations and Replication |
|
|
52 | (4) |
|
2.5.1 Importance of Replication |
|
|
52 | (1) |
|
|
53 | (3) |
|
2.6 Analysis of the Alon Data Set |
|
|
56 | (1) |
|
2.7 Comparison of Normalization Strategies and Discussion |
|
|
56 | (5) |
3 Some Cluster Analysis Methods |
|
61 | (38) |
|
|
61 | (1) |
|
3.2 Reduction in the Dimension of the Feature Space |
|
|
62 | (1) |
|
|
63 | (1) |
|
3.4 Some Hierarchical Agglomerative Techniques |
|
|
64 | (4) |
|
3.5 k-Means Clustering |
|
|
68 | (1) |
|
3.6 Cluster Analysis with No A Priori Metric |
|
|
69 | (1) |
|
3.7 Clustering via Finite Mixture Models |
|
|
69 | (3) |
|
|
69 | (2) |
|
3.7.2 Advantages of Model-Based Clustering |
|
|
71 | (1) |
|
3.8 Fitting Mixture Models via the EM Algorithm |
|
|
72 | (3) |
|
|
73 | (1) |
|
|
74 | (1) |
|
3.8.3 Choice of Starting Values for the EM Algorithm |
|
|
75 | (1) |
|
3.9 Clustering via Normal Mixtures |
|
|
75 | (22) |
|
3.9.1 Heteroscedastic Components |
|
|
75 | (1) |
|
3.9.2 Homoscedastic Components |
|
|
76 | (1) |
|
3.9.3 Spherical Components |
|
|
76 | (1) |
|
|
77 | (1) |
|
|
77 | (1) |
|
3.10 Mixtures of t Distributions |
|
|
78 | (1) |
|
3.11 Mixtures of Factor Analyzers |
|
|
78 | (2) |
|
3.12 Choice of Clustering Solution |
|
|
80 | (1) |
|
3.13 Classification ML Approach |
|
|
81 | (1) |
|
3.14 Mixture Models for Clinical and Microarray Data |
|
|
82 | (2) |
|
3.14.1 Unconditional Approach |
|
|
83 | (1) |
|
3.14.2 Conditional Approach |
|
|
84 | (1) |
|
3.15 Choice of the Number of Components in a Mixture Model |
|
|
84 | (2) |
|
3.15.1 Order of a Mixture Model |
|
|
84 | (1) |
|
3.15.2 Approaches for Assessing Mixture Order |
|
|
84 | (1) |
|
3.15.3 Bayesian Information Criterion |
|
|
85 | (1) |
|
3.15.4 Integrated Classification Likelihood Criterion |
|
|
85 | (1) |
|
|
86 | (1) |
|
3.17 Other Resampling Approaches for Number of Clusters |
|
|
87 | (1) |
|
|
87 | (1) |
|
3.17.2 The Clest Method for the Number of Clusters |
|
|
88 | (1) |
|
3.18 Simulation Results for Two Resampling Approaches |
|
|
88 | (3) |
|
3.19 Principal Component Analysis |
|
|
91 | (3) |
|
|
91 | (2) |
|
3.19.2 Singular Value Decomposition |
|
|
93 | (1) |
|
3.19.3 Some Other Multivariate Exploratory Methods |
|
|
94 | (1) |
|
3.20 Canonical Variate Analysis |
|
|
94 | (3) |
|
3.20.1 Linear Projections with Group Structure |
|
|
94 | (1) |
|
3.20.2 Canonical Variates |
|
|
95 | (2) |
|
3.21 Partial Least Squares |
|
|
97 | (2) |
4 Clustering of Tissue Samples |
|
99 | (34) |
|
|
99 | (1) |
|
|
100 | (1) |
|
4.3 Two Clustering Problems |
|
|
101 | (1) |
|
4.4 Principal Component Analysis |
|
|
102 | (1) |
|
4.5 The EMMIX-GENE Clustering Procedure |
|
|
103 | (1) |
|
4.6 Step 1: Screening of Genes |
|
|
104 | (1) |
|
4.7 Step 2: Clustering of Genes: Formation of Metagenes |
|
|
105 | (2) |
|
4.8 Step 3: Clustering of Tissues |
|
|
107 | (1) |
|
|
108 | (1) |
|
4.10 Example: Clustering of Alon Data |
|
|
108 | (4) |
|
4.10.1 Clustering on Basis of 446 Genes |
|
|
108 | (1) |
|
4.10.2 Clustering on Basis of Gene Groups |
|
|
109 | (3) |
|
4.10.3 Clustering on Basis of Metagenes |
|
|
112 | (1) |
|
4.11 Example: Clustering of van't Veer Data |
|
|
112 | (12) |
|
4.11.1 Screening and Clustering of Genes |
|
|
113 | (2) |
|
4.11.2 Usefulness of the Selected Genes |
|
|
115 | (6) |
|
4.11.3 Clustering of Tissues |
|
|
121 | (2) |
|
4.11.4 Use of Underlying Signatures with Clinical Data |
|
|
123 | (1) |
|
4.12 Choosing the Number of Clusters in Microarray Data |
|
|
124 | (1) |
|
4.12.1 Some Previous Attempts |
|
|
124 | (1) |
|
4.13 Likelihood Ratio Test Applied to Microarray Data |
|
|
125 | (3) |
|
|
125 | (1) |
|
|
126 | (1) |
|
|
127 | (1) |
|
|
127 | (1) |
|
4.14 Effect of Selection Bias on the Number of Clusters |
|
|
128 | (1) |
|
4.15 Clustering on Microarray and Clinical Data |
|
|
128 | (2) |
|
|
130 | (3) |
5 Screening and Clustering of Genes |
|
133 | (52) |
|
5.1 Detection of Differentially Expressed Genes |
|
|
133 | (4) |
|
|
133 | (1) |
|
|
134 | (1) |
|
5.1.3 Multiplicity Problem |
|
|
134 | (1) |
|
5.1.4 Overview of Literature |
|
|
135 | (2) |
|
5.2 Test of a Single Hypothesis |
|
|
137 | (1) |
|
|
138 | (1) |
|
5.3.1 Calculation of Interactions via ANOVA Models |
|
|
138 | (1) |
|
5.3.2 Two-Sample t-Statistics |
|
|
139 | (1) |
|
5.4 Multiple Hypothesis Testing |
|
|
139 | (5) |
|
5.4.1 Outcomes with Multiple Hypotheses |
|
|
140 | (1) |
|
5.4.2 Controlling the FWER |
|
|
140 | (1) |
|
5.4.3 False Discovery Rate (FDR) |
|
|
141 | (1) |
|
5.4.4 Benjamini-Hochberg Procedure |
|
|
142 | (1) |
|
5.4.5 False Nondiscovery Rate (FNR) |
|
|
143 | (1) |
|
|
143 | (1) |
|
|
143 | (1) |
|
5.4.8 Linking False Rates with Posterior Probabilities |
|
|
143 | (1) |
|
5.5 Null Distribution of Test Statistic |
|
|
144 | (4) |
|
|
144 | (1) |
|
5.5.2 Null Replications of the Test Statistic |
|
|
145 | (1) |
|
|
146 | (1) |
|
5.5.4 Application of SAM Method to Alon Data |
|
|
146 | (2) |
|
5.6 Recent Approaches for Strong Control of the FDR |
|
|
148 | (6) |
|
|
148 | (1) |
|
5.6.2 Technical Definition of q-Value |
|
|
149 | (1) |
|
5.6.3 Controlling FDR Strongly |
|
|
150 | (1) |
|
5.6.4 Selecting Genes via the q-Value |
|
|
151 | (1) |
|
5.6.5 Application to Hedenfalk Data |
|
|
152 | (2) |
|
5.7 Two-Component Mixture Model Framework |
|
|
154 | (4) |
|
5.7.1 Definition of Model |
|
|
154 | (1) |
|
|
155 | (1) |
|
|
155 | (1) |
|
5.7.4 Bayes Risk in Terms of Estimated FDR and FNR |
|
|
156 | (2) |
|
5.8 Nonparametric Empirical Bayes Approach |
|
|
158 | (2) |
|
5.8.1 Method of Efron et al. (2001) |
|
|
158 | (1) |
|
5.8.2 Mixture Model Method (MMM) |
|
|
158 | (1) |
|
5.8.3 Nonparametric Bayesian Approach |
|
|
159 | (1) |
|
5.8.4 Application of Empirical Bayes Methods to Alon Data |
|
|
159 | (1) |
|
5.9 Parametric Mixture Models for Differential Gene Expression |
|
|
160 | (6) |
|
5.9.1 Parametric Empirical Bayes Methods |
|
|
160 | (4) |
|
5.9.2 Finding Clusters of Differentially Expressed Genes |
|
|
164 | (1) |
|
5.9.3 Example: Fitting Normal Mixtures to t-Statistic Values |
|
|
165 | (1) |
|
5.10 Use of the P-Value as a Summary Statistic |
|
|
166 | (5) |
|
5.10.1 Beta Mixture for Distribution of P-Values |
|
|
168 | (1) |
|
5.10.2 Example: Fitting Beta Mixtures to P-Values |
|
|
169 | (2) |
|
|
171 | (2) |
|
5.12 Finding Correlated Genes |
|
|
173 | (1) |
|
5.13 Clustering of Genes via Full Expression Profiles |
|
|
173 | (1) |
|
5.14 Clustering of Genes via PCA of Expression Profiles |
|
|
174 | (1) |
|
5.15 Clustering of Genes with Repeated Measurements |
|
|
175 | (2) |
|
5.15.1 A Mixture Model for Technical Replicates |
|
|
175 | (1) |
|
5.15.2 Application of EM Algorithm |
|
|
176 | (1) |
|
|
176 | (1) |
|
|
177 | (8) |
|
|
177 | (1) |
|
5.16.2 Methodology and Implementation |
|
|
177 | (1) |
|
5.16.3 Optimal Cluster Size via the Gap Statistic |
|
|
178 | (1) |
|
5.16.4 Supervised Gene Shaving |
|
|
179 | (1) |
|
|
179 | (1) |
|
|
180 | (5) |
6 Discriminant Analysis |
|
185 | (36) |
|
|
185 | (1) |
|
|
185 | (2) |
|
|
187 | (1) |
|
6.4 Decision-Theoretic Approach |
|
|
187 | (2) |
|
|
189 | (1) |
|
6.6 Different Types of Error Rates |
|
|
190 | (1) |
|
6.7 Sample-Based Discriminant Rules |
|
|
191 | (1) |
|
6.8 Parametric Discriminant Rules |
|
|
192 | (1) |
|
6.9 Discrimination via Normal Models |
|
|
193 | (6) |
|
6.9.1 Heteroscedastic Normal Model |
|
|
193 | (1) |
|
6.9.2 Plug-in Sample NQDR |
|
|
194 | (1) |
|
6.9.3 Homoscedastic Normal Model |
|
|
195 | (2) |
|
6.9.4 Optimal Error Rates |
|
|
197 | (1) |
|
6.9.5 Plug-in Sample NLDR |
|
|
197 | (1) |
|
6.9.6 Normal Mixture Model |
|
|
198 | (1) |
|
6.10 Fisher's Linear Discriminant Function |
|
|
199 | (2) |
|
6.10.1 Separation Approach |
|
|
199 | (1) |
|
6.10.2 Regression Approach |
|
|
199 | (2) |
|
6.11 Logistic Discrimination |
|
|
201 | (1) |
|
6.12 Nearest-Centroid Rule |
|
|
202 | (1) |
|
6.13 Support Vector Machines |
|
|
203 | (4) |
|
|
203 | (1) |
|
6.13.2 Selection of Feature Variables |
|
|
204 | (1) |
|
|
205 | (1) |
|
|
206 | (1) |
|
6.14 Variants of Support Vector Machines |
|
|
207 | (1) |
|
|
207 | (1) |
|
6.16 Nearest-Neighbor Rules |
|
|
208 | (2) |
|
|
208 | (1) |
|
6.16.2 Definition of a k-NN Rule |
|
|
209 | (1) |
|
6.17 Classification Trees |
|
|
210 | (1) |
|
6.18 Error-Rate Estimation |
|
|
211 | (2) |
|
6.18.1 Apparent Error Rate |
|
|
211 | (2) |
|
6.18.2 Bias Correction of the Apparent Error Rate |
|
|
213 | (1) |
|
|
213 | (1) |
|
6.19.1 Leave-One-Out (LOO) Estimator |
|
|
213 | (1) |
|
6.19.2 q-Fold Cross-Validation |
|
|
214 | (1) |
|
6.20 Error-Rate Estimation via the Bootstrap |
|
|
214 | (2) |
|
6.20.1 The 0.632 Estimator |
|
|
214 | (1) |
|
6.20.2 Mean Squared Error of the Estimated Error Rate |
|
|
215 | (1) |
|
6.21 Selection of Feature Variables |
|
|
216 | (2) |
|
6.22 Error-Rate Estimation with Selection Bias |
|
|
218 | (3) |
|
|
218 | (1) |
|
6.22.2 External Cross-Validation |
|
|
218 | (1) |
|
6.22.3 The 0.632+ Estimator |
|
|
219 | (2) |
7 Supervised Classification of Tissue Samples |
|
221 | (32) |
|
|
221 | (1) |
|
7.2 Reducing the Dimension of the Feature Space of Genes |
|
|
222 | (2) |
|
7.2.1 Principal Components |
|
|
223 | (1) |
|
7.2.2 Partial Least Squares |
|
|
223 | (1) |
|
|
223 | (1) |
|
|
224 | (1) |
|
7.3 SVM with Recursive Feature Elimination (RFE) |
|
|
224 | (2) |
|
7.4 Selection Bias: SVM with RFE |
|
|
226 | (2) |
|
7.5 Selection Bias: Fisher's Rule with Forward Selection |
|
|
228 | (2) |
|
7.6 Selection Bias: Noninformative Data |
|
|
230 | (2) |
|
7.7 Discussion of Selection Bias |
|
|
232 | (1) |
|
7.8 Selection of Marker Genes with SVM |
|
|
233 | (3) |
|
7.8.1 Description of van de Vijver Breast Cancer Data |
|
|
233 | (1) |
|
7.8.2 Application of SVM with RFE |
|
|
234 | (2) |
|
7.9 Nearest-Shrunken Centroids |
|
|
236 | (3) |
|
|
236 | (3) |
|
7.10 Comparison of Nearest-Shrunken Centroids with SVM |
|
|
239 | (6) |
|
|
239 | (1) |
|
7.10.2 van de Vijver Data |
|
|
239 | (6) |
|
7.11 Selection Bias Working with the Top 70 Genes |
|
|
245 | (4) |
|
7.11.1 Bias in Error Rates |
|
|
245 | (1) |
|
7.11.2 Bias in Comparative Studies of Error Rates |
|
|
246 | (2) |
|
|
248 | (1) |
|
7.12 Discriminant Rules via Initial Grouping of Genes |
|
|
249 | (4) |
|
7.12.1 Supervised Version of EMMIX-GENE |
|
|
249 | (1) |
|
7.12.2 Bayesian Tree Classification |
|
|
249 | (1) |
|
|
249 | (1) |
|
|
250 | (1) |
|
7.12.5 Grouping of Genes via Supervised Procedures |
|
|
250 | (3) |
8 Linking Microarray Data with Survival Analysis |
|
253 | (14) |
|
|
253 | (1) |
|
8.2 Four Lung Cancer Data Sets |
|
|
254 | (1) |
|
8.3 Statistical Analysis of Two Data Sets |
|
|
255 | (1) |
|
|
256 | (5) |
|
|
256 | (3) |
|
|
259 | (1) |
|
8.4.3 Discriminant Analysis |
|
|
260 | (1) |
|
|
261 | (5) |
|
8.5.1 Cluster Analysis of AC Tumors |
|
|
262 | (1) |
|
|
263 | (3) |
|
8.5.3 Discriminant Analysis |
|
|
266 | (1) |
|
|
266 | (1) |
References |
|
267 | (30) |
Author Index |
|
297 | (16) |
Subject Index |
|
313 | |