Preface  vii

1  Introduction  1
|
2  Overview of Supervised Learning  9
  Introduction  9
  Variable Types and Terminology  9
  Two Simple Approaches to Prediction: Least Squares and Nearest Neighbors  11
    Linear Models and Least Squares  11
    Nearest-Neighbor Methods  14
    From Least Squares to Nearest Neighbors  16
  Statistical Decision Theory  18
  Local Methods in High Dimensions  22
  Statistical Models, Supervised Learning and Function Approximation  28
    A Statistical Model for the Joint Distribution Pr(X, Y)  28
    Supervised Learning  29
    Function Approximation  29
  Structured Regression Models  32
    Difficulty of the Problem  32
  Classes of Restricted Estimators  33
    Roughness Penalty and Bayesian Methods  34
    Kernel Methods and Local Regression  34
    Basis Functions and Dictionary Methods  35
  Model Selection and the Bias-Variance Tradeoff  37
  Bibliographic Notes  39
  Exercises  39
|
3  Linear Methods for Regression  41
  Introduction  41
  Linear Regression Models and Least Squares  42
    Example: Prostate Cancer  47
    The Gauss-Markov Theorem  49
  Multiple Regression from Simple Univariate Regression  50
    Multiple Outputs  54
  Subset Selection and Coefficient Shrinkage  55
    Subset Selection  55
    Prostate Cancer Data Example (Continued)  57
    Shrinkage Methods  59
    Methods Using Derived Input Directions  66
    Discussion: A Comparison of the Selection and Shrinkage Methods  68
    Multiple Outcome Shrinkage and Selection  73
  Computational Considerations  75
  Bibliographic Notes  75
  Exercises  75
|
4  Linear Methods for Classification  79
  Introduction  79
  Linear Regression of an Indicator Matrix  81
  Linear Discriminant Analysis  84
    Regularized Discriminant Analysis  90
    Computations for LDA  91
    Reduced-Rank Linear Discriminant Analysis  91
  Logistic Regression  95
    Fitting Logistic Regression Models  98
    Example: South African Heart Disease  100
    Quadratic Approximations and Inference  102
    Logistic Regression or LDA?  103
  Separating Hyperplanes  105
    Rosenblatt's Perceptron Learning Algorithm  107
    Optimal Separating Hyperplanes  108
  Bibliographic Notes  111
  Exercises  111
|
5  Basis Expansions and Regularization  115
  Introduction  115
  Piecewise Polynomials and Splines  117
    Natural Cubic Splines  120
    Example: South African Heart Disease (Continued)  122
    Example: Phoneme Recognition  124
  Filtering and Feature Extraction  126
  Smoothing Splines  127
    Degrees of Freedom and Smoother Matrices  129
  Automatic Selection of the Smoothing Parameters  134
    Fixing the Degrees of Freedom  134
    The Bias-Variance Tradeoff  134
  Nonparametric Logistic Regression  137
  Multidimensional Splines  138
  Regularization and Reproducing Kernel Hilbert Spaces  144
    Spaces of Functions Generated by Kernels  144
    Examples of RKHS  146
  Wavelet Smoothing  148
    Wavelet Bases and the Wavelet Transform  150
    Adaptive Wavelet Filtering  153
  Bibliographic Notes  155
  Exercises  155
  Appendix: Computational Considerations for Splines  160
    B-splines  160
  Appendix: Computations for Smoothing Splines  163
|
|
6  Kernel Methods  165
  One-Dimensional Kernel Smoothers  165
    Local Linear Regression  168
    Local Polynomial Regression  171
  Selecting the Width of the Kernel  172
  Local Regression in IR^p  174
  Structured Local Regression Models in IR^p  175
    Structured Kernels  177
    Structured Regression Functions  177
  Local Likelihood and Other Models  179
  Kernel Density Estimation and Classification  182
    Kernel Density Estimation  182
    Kernel Density Classification  184
    The Naive Bayes Classifier  184
  Radial Basis Functions and Kernels  186
  Mixture Models for Density Estimation and Classification  188
  Computational Considerations  190
  Bibliographic Notes  190
  Exercises  190
|
7  Model Assessment and Selection  193
  Introduction  193
  Bias, Variance and Model Complexity  193
  The Bias-Variance Decomposition  196
    Example: Bias-Variance Tradeoff  198
  Optimism of the Training Error Rate  200
  Estimates of In-Sample Prediction Error  203
  The Effective Number of Parameters  205
  The Bayesian Approach and BIC  206
  Minimum Description Length  208
  Vapnik-Chervonenkis Dimension  210
    Example (Continued)  212
  Cross-Validation  214
  Bootstrap Methods  217
    Example (Continued)  220
  Bibliographic Notes  222
  Exercises  222
|
8  Model Inference and Averaging  225
  Introduction  225
  The Bootstrap and Maximum Likelihood Methods  225
    A Smoothing Example  225
    Maximum Likelihood Inference  229
    Bootstrap versus Maximum Likelihood  231
  Bayesian Methods  231
  Relationship Between the Bootstrap and Bayesian Inference  235
  The EM Algorithm  236
    Two-Component Mixture Model  236
    The EM Algorithm in General  240
    EM as a Maximization-Maximization Procedure  241
  MCMC for Sampling from the Posterior  243
  Bagging  246
    Example: Trees with Simulated Data  247
  Model Averaging and Stacking  250
  Stochastic Search: Bumping  253
  Bibliographic Notes  254
  Exercises  255
|
9  Additive Models, Trees, and Related Methods  257
  Generalized Additive Models  257
    Fitting Additive Models  259
    Example: Additive Logistic Regression  261
    Summary  266
  Tree-Based Methods  266
    Background  266
    Regression Trees  267
    Classification Trees  270
    Other Issues  272
    Spam Example (Continued)  275
  PRIM: Bump Hunting  279
    Spam Example (Continued)  282
  MARS: Multivariate Adaptive Regression Splines  283
    Spam Example (Continued)  287
    Example (Simulated Data)  288
    Other Issues  289
  Hierarchical Mixtures of Experts  290
  Missing Data  293
  Computational Considerations  295
  Bibliographic Notes  295
  Exercises  296
|
10  Boosting and Additive Trees  299
  Boosting Methods  299
    Outline of this Chapter  302
  Boosting Fits an Additive Model  303
  Forward Stagewise Additive Modeling  304
  Exponential Loss and AdaBoost  305
  Why Exponential Loss?  306
  Loss Functions and Robustness  308
  "Off-the-Shelf" Procedures for Data Mining  312
  Example: Spam Data  314
  Boosting Trees  316
  Numerical Optimization  319
    Steepest Descent  320
    Gradient Boosting  320
    Implementations of Gradient Boosting  322
  Right-Sized Trees for Boosting  323
  Regularization  324
    Shrinkage  326
    Penalized Regression  328
    Virtues of the L1 Penalty (Lasso) over L2  330
  Interpretation  331
    Relative Importance of Predictor Variables  331
    Partial Dependence Plots  333
  Illustrations  335
    California Housing  335
    Demographics Data  339
  Bibliographic Notes  340
  Exercises  344
|
|
11  Neural Networks  347
  Introduction  347
  Projection Pursuit Regression  347
  Neural Networks  350
  Fitting Neural Networks  353
  Some Issues in Training Neural Networks  355
    Starting Values  355
    Overfitting  356
    Scaling of the Inputs  358
    Number of Hidden Units and Layers  358
    Multiple Minima  359
  Example: Simulated Data  359
  Example: ZIP Code Data  362
  Discussion  366
  Computational Considerations  367
  Bibliographic Notes  367
  Exercises  369
|
12  Support Vector Machines and Flexible Discriminants  371
  Introduction  371
  The Support Vector Classifier  371
    Computing the Support Vector Classifier  373
    Mixture Example (Continued)  375
  Support Vector Machines  377
    Computing the SVM for Classification  377
    The SVM as a Penalization Method  380
    Function Estimation and Reproducing Kernels  381
    SVMs and the Curse of Dimensionality  384
    Support Vector Machines for Regression  385
    Regression and Kernels  387
    Discussion  389
  Generalizing Linear Discriminant Analysis  390
  Flexible Discriminant Analysis  391
    Computing the FDA Estimates  394
  Penalized Discriminant Analysis  397
  Mixture Discriminant Analysis  399
    Example: Waveform Data  402
  Bibliographic Notes  406
  Exercises  406
|
13  Prototype Methods and Nearest-Neighbors  411
  Introduction  411
  Prototype Methods  411
    K-means Clustering  412
    Learning Vector Quantization  414
    Gaussian Mixtures  415
  k-Nearest-Neighbor Classifiers  415
    Example: A Comparative Study  420
    Example: k-Nearest-Neighbors and Image Scene Classification  422
    Invariant Metrics and Tangent Distance  423
  Adaptive Nearest-Neighbor Methods  427
    Example  430
    Global Dimension Reduction for Nearest-Neighbors  431
  Computational Considerations  432
  Bibliographic Notes  433
  Exercises  433
|
|
14  Unsupervised Learning  437
  Introduction  437
  Association Rules  439
    Market Basket Analysis  440
    The Apriori Algorithm  441
    Example: Market Basket Analysis  444
    Unsupervised as Supervised Learning  447
    Generalized Association Rules  449
    Choice of Supervised Learning Method  451
    Example: Market Basket Analysis (Continued)  451
  Cluster Analysis  453
    Proximity Matrices  455
    Dissimilarities Based on Attributes  455
    Object Dissimilarity  457
    Clustering Algorithms  459
    Combinatorial Algorithms  460
    K-means  461
    Gaussian Mixtures as Soft K-means Clustering  463
    Example: Human Tumor Microarray Data  463
    Vector Quantization  466
    K-medoids  468
    Practical Issues  470
    Hierarchical Clustering  472
  Self-Organizing Maps  480
  Principal Components, Curves and Surfaces  485
    Principal Components  485
    Principal Curves and Surfaces  491
  Independent Component Analysis and Exploratory Projection Pursuit  494
    Latent Variables and Factor Analysis  494
    Independent Component Analysis  496
    Exploratory Projection Pursuit  500
    A Different Approach to ICA  500
  Multidimensional Scaling  502
  Bibliographic Notes  503
  Exercises  504
References  509
Author Index  523
Index  527