Stat 991, Spring 2010
Multivariate Analysis, Dimensionality Reduction, and Spectral Methods
Syllabus:
Modern statistical approaches to large datasets must directly analyze
and manipulate data in matrix or vector form. This course
will focus on the statistical theory and practice of manipulating such
data. The topics covered will be: multivariate analysis,
dimensionality reduction, convexity issues of working with matrices,
and spectral methods.
With regard to dimensionality reduction, we will cover PCA, CCA, and
random projections (e.g. Johnson-Lindenstrauss) and examine potential
applications. With regard to convexity issues, the course will
examine the basic question of how accurate the SVD of a random
matrix is (we will examine a generalization of the Chernoff method to
matrices). Other potential topics include matrix completion
(filling in the missing entries of a partially observed matrix), subspace
identification (e.g. learning time series models like Kalman filters
based on a multivariate covariance analysis), locality sensitive
hashing (randomly projecting data for efficient storage and recall),
matrix-based regularization methods (and related convexity issues),
and kernel methods/Gaussian process regression.
The major topics discussed in the course will include the following:
- Dimensionality reduction, including SVD and random projections.
- Accuracy of SVD: How accurate is the subspace found? We will
cover a recent concentration result for random matrices.
- Implications for learning: We will examine projecting the data to
low-dimensional spaces and then learning in these lower-dimensional
spaces. We will cover both regression and clustering here.
- CCA, Subspace ID, and Time Series: how to provably learn state
space models, including Kalman filters and HMMs.
- Matrix Completion: how to fill in the missing entries of a matrix.
Prerequisites:
The course is appropriate for a graduate student with some background in
statistics and machine learning. The course will assume a basic level of mathematical maturity, so
please contact the instructor if you have concerns.
Requirements:
As this is an advanced graduate course, the emphasis is on learning
the material on your own. For requirements, I'd like you
to read the notes, send me corrections if you find any, and
write a short informal (typed) summary of any subset of the
following: 1) questions/insights about the course notes, 2) possible
research directions, 3) how the ideas might relate to your work, 4)
a different derivation you like, 5) a related paper you read, or
6) points that were unclear. You don't need to write more than a page.
Also, I'd like you to find papers related to the material we cover
that overlap with your research interests.
Instructor:
Time and location:
Time: MW 1:30 - 3:00
Location: JMHH F88
Schedule and notes:
- Weds 1/13/10
- The Singular Value Decomposition and the best fitting subspace
- lecture notes pdf
- further reading:
Spectral Algorithms, Ch 1. pdf
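For a concrete feel of the best-fit property, here is a minimal numpy sketch (the data matrix and rank are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 20))  # hypothetical data: 100 points in R^20
k = 3

# Thin SVD: A = U diag(S) V^T, singular values in decreasing order.
U, S, Vt = np.linalg.svd(A, full_matrices=False)

# The rows of Vt[:k] span the best-fitting k-dimensional subspace, and
# A_k is the best rank-k approximation (Eckart-Young).
A_k = (U[:, :k] * S[:k]) @ Vt[:k]

# The Frobenius error equals the root of the discarded squared
# singular values -- no rank-k matrix does better.
assert np.isclose(np.linalg.norm(A - A_k, 'fro'), np.sqrt(np.sum(S[k:]**2)))
```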
- Weds 1/20/10
- Applications of SVD
- lecture notes pdf
- further reading:
Latent Semantic Analysis (Wikipedia page)
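As a toy illustration of how LSA applies the SVD (the term-document counts below are invented):

```python
import numpy as np

# Toy term-document count matrix (rows = terms, columns = documents).
A = np.array([[2., 1., 0., 0.],
              [1., 2., 0., 1.],
              [0., 0., 3., 1.],
              [0., 1., 2., 2.]])

# LSA: keep only the top-k singular directions and compare documents
# in the resulting low-rank "latent semantic" space.
k = 2
U, S, Vt = np.linalg.svd(A, full_matrices=False)
docs = (S[:k, None] * Vt[:k]).T   # one k-dimensional row per document

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

# Documents 0 and 1 share terms, so they end up close in latent space.
print(cosine(docs[0], docs[1]), cosine(docs[0], docs[2]))
```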
- Mon 1/25/10
- Random projections and the Johnson-Lindenstrauss lemma
- lecture notes pdf
- further reading:
S. Dasgupta and A. Gupta. An elementary proof of a theorem of Johnson and Lindenstrauss. pdf
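A minimal sketch of the Gaussian random projection behind the lemma (all dimensions chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, k = 50, 1000, 200   # n points in R^d, projected down to R^k
X = rng.standard_normal((n, d))

# Gaussian random projection, scaled so squared lengths are preserved
# in expectation; JL guarantees all pairwise distances are within a
# (1 +/- eps) factor w.h.p. once k = O(log(n)/eps^2).
R = rng.standard_normal((d, k)) / np.sqrt(k)
Y = X @ R

# Spot-check one pairwise distance before and after projection.
print(np.linalg.norm(X[0] - X[1]), np.linalg.norm(Y[0] - Y[1]))
```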
- Weds & Mon, 1/27/10 & 2/1/10
- Recall and Locality Sensitive Hashing
- See the survey paper here.
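For concreteness, a sketch of one standard scheme covered by the survey, random-hyperplane LSH for cosine similarity (all sizes invented):

```python
import numpy as np

rng = np.random.default_rng(2)
d, m = 100, 16   # data dimension, bits per hash signature

# Random-hyperplane LSH: each hyperplane contributes one sign bit, and
# two vectors agree on a bit with probability 1 - angle(x, y)/pi.
H = rng.standard_normal((m, d))

def lsh_key(x):
    return tuple((H @ x > 0).astype(int))   # m-bit signature

x = rng.standard_normal(d)
y = x + 0.01 * rng.standard_normal(d)   # near-duplicate of x
z = rng.standard_normal(d)              # unrelated vector

# The near-duplicate typically lands in the same bucket; the unrelated
# vector almost never does.
print(lsh_key(x) == lsh_key(y), lsh_key(x) == lsh_key(z))
```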
- Weds 2/3/10
- Ridge Regression and Dimensionality Reduction
- lecture notes pdf
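One way to see the connection, sketched in numpy (problem sizes and penalty invented): through the SVD, ridge regression shrinks each singular direction by s^2/(s^2 + lambda), a soft form of dimensionality reduction.

```python
import numpy as np

rng = np.random.default_rng(3)
n, d, lam = 200, 50, 1.0
X = rng.standard_normal((n, d))
y = X @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

# Ridge estimator: w = (X^T X + lam I)^{-1} X^T y.
w = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# SVD view: with X = U diag(S) V^T, the fit along the i-th singular
# direction is shrunk by S_i^2 / (S_i^2 + lam), so directions with
# small singular values are effectively discarded.
U, S, Vt = np.linalg.svd(X, full_matrices=False)
w_svd = Vt.T @ ((S / (S**2 + lam)) * (U.T @ y))
assert np.allclose(w, w_svd)
```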
- Mon 2/8/10
- Margin-Based Classification and Dimensionality Reduction
- lecture notes pdf
- further reading:
N. Balcan, A. Blum, S. Vempala. On Kernels, Margins and Low-dimensional Mappings. pdf
R. Arriaga and S. Vempala. An algorithmic theory of learning: Robust Concepts and Random Projection. pdf
- Mon & Weds, 2/15/10 & 2/17/10
- Clustering in High Dimensions
- lecture notes pdf
- further reading:
Single Linkage Clustering (Wikipedia page)
- Mon 2/22/10
- Clustering and PCA
- further reading:
Spectral Algorithms, Ch 2. pdf
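A minimal sketch of the project-then-cluster idea from the chapter (the mixture below is invented, and a trivial sign split stands in for k-means):

```python
import numpy as np

rng = np.random.default_rng(4)
# A made-up mixture: two spherical Gaussians in R^50, separated along
# one coordinate.
d, n = 50, 200
mu = np.zeros(d); mu[0] = 6.0
X = np.vstack([rng.standard_normal((n, d)),
               rng.standard_normal((n, d)) + mu])

# Project onto the top-k principal components. The component means lie
# (nearly) in this subspace, so projecting keeps the separation while
# shedding most of the noise -- the key point of the chapter.
k = 2
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
Z = Xc @ Vt[:k].T

# In the projected space even a trivial split on the top component
# recovers the two clusters.
labels = (Z[:, 0] > 0).astype(int)
print(labels[:n].mean(), labels[n:].mean())   # ~0 and ~1 (or flipped)
```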
- Weds 2/24/10
- Clustering and Non-Isotropic Mixtures
- further reading:
D. Achlioptas and F. McSherry. On Spectral Learning of Mixtures of Distributions. pdf
- Mon 3/1/10
- Clustering, Random Projections, and Non-Isotropic PCA
- further reading:
S. Dasgupta. Learning mixtures of Gaussians. pdf
N. Srebro, G. Shakhnarovich, S. Roweis. An Investigation of Computational and Informational Limits in Gaussian Mixture Clustering. pdf
S. Brubaker and S. Vempala. Isotropic PCA and Affine-Invariant Clustering. pdf
- Weds 3/3/10
- Isotropic PCA, Probabilistic Latent Semantic Indexing, and Non-negative Matrix Factorization
- further reading:
Probabilistic LSI (Wikipedia page)
T. Hofmann. Probabilistic Latent Semantic Indexing. pdf
Non-negative Matrix Factorization (Wikipedia page)
D. Donoho & V. Stodden. When Does Non-Negative Matrix Factorization Give a Correct Decomposition into Parts? pdf
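For concreteness, a sketch of the classic Lee-Seung multiplicative updates for NMF (this illustrates the factorization itself, not the Donoho-Stodden analysis; data and sizes are invented):

```python
import numpy as np

rng = np.random.default_rng(5)
# Synthetic nonnegative data with exact rank-3 nonnegative structure.
m, n, k = 20, 30, 3
A = rng.random((m, k)) @ rng.random((k, n))

# Lee-Seung multiplicative updates for A ~ W H under squared loss;
# factors stay elementwise nonnegative because the updates only
# multiply by nonnegative ratios.
W, H = rng.random((m, k)), rng.random((k, n))
eps = 1e-12
for _ in range(500):
    H *= (W.T @ A) / (W.T @ W @ H + eps)
    W *= (A @ H.T) / (W @ H @ H.T + eps)

print(np.linalg.norm(A - W @ H) / np.linalg.norm(A))   # small residual
```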
- Mon 3/15/10
- Graph Laplacians and Spectral Clustering
- further reading:
Graph Laplacians (Wikipedia page)
U. von Luxburg. A Tutorial on Spectral Clustering. pdf
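A minimal sketch of the simplest spectral bipartition from the tutorial (the similarity matrix is an invented planted partition):

```python
import numpy as np

rng = np.random.default_rng(6)
# A made-up similarity matrix with two planted blocks.
n = 60
W = 0.1 * rng.random((n, n))
W[:n//2, :n//2] += 0.5
W[n//2:, n//2:] += 0.5
W = (W + W.T) / 2
np.fill_diagonal(W, 0)

# Unnormalized graph Laplacian L = D - W; its low eigenvectors encode
# the cluster structure (the tutorial also covers normalized variants).
L = np.diag(W.sum(axis=1)) - W
eigvals, eigvecs = np.linalg.eigh(L)   # ascending eigenvalues

# The simplest spectral bipartition: split on the sign of the second
# eigenvector (the Fiedler vector).
labels = (eigvecs[:, 1] > 0).astype(int)
print(labels[:n//2].mean(), labels[n//2:].mean())   # ~0 and ~1 (or flipped)
```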
- Weds 3/17/10
- Graph Laplacians and Semi-Supervised Learning
- further reading:
X. Zhu, J. Lafferty, and Z. Ghahramani. Semi-Supervised Learning: From Gaussian Fields to Gaussian Processes. pdf
M. Belkin, V. Sindhwani, and P. Niyogi. Manifold Regularization: A Geometric Framework for Learning from Examples. pdf
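A sketch of the harmonic-function solution from the Zhu-Lafferty-Ghahramani line of work (the graph and labels are invented):

```python
import numpy as np

rng = np.random.default_rng(7)
# A small made-up similarity graph; vertices 0 and 1 carry labels.
n = 10
W = rng.random((n, n)); W = (W + W.T) / 2
np.fill_diagonal(W, 0)
L = np.diag(W.sum(axis=1)) - W

labeled, y = np.array([0, 1]), np.array([0.0, 1.0])
unlabeled = np.arange(2, n)

# Harmonic solution: minimize f^T L f subject to the observed labels;
# the unlabeled part solves L_uu f_u = -L_ul y.
f_u = np.linalg.solve(L[np.ix_(unlabeled, unlabeled)],
                      -L[np.ix_(unlabeled, labeled)] @ y)
print(f_u)   # soft labels, each between 0 and 1 by the maximum principle
```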
- Mon 3/22/10
- Multidimensional Scaling and SMACOF
- further reading:
The first reference provides a nice introduction to classical scaling
and its background, along with a nice extension.
L. Cayton and S. Dasgupta. Robust Euclidean embedding. pdf
Multidimensional Scaling (Wikipedia page)
Stress Majorization (Wikipedia page)
J. de Leeuw & P. Mair. Multidimensional Scaling Using Majorization: SMACOF in R. pdf
Ch 14, The Elements of Statistical Learning: Data Mining, Inference, and Prediction. pdf
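A minimal sketch of classical (Torgerson) scaling, the starting point of the first reference (the configuration is invented):

```python
import numpy as np

def classical_mds(D2, k):
    """Classical (Torgerson) scaling: embed n points in R^k from an
    n-by-n matrix D2 of squared Euclidean distances."""
    n = D2.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    B = -0.5 * J @ D2 @ J                      # recovered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(B)
    idx = np.argsort(eigvals)[::-1][:k]        # top-k eigenpairs
    return eigvecs[:, idx] * np.sqrt(np.maximum(eigvals[idx], 0))

# Sanity check: squared distances from a random 2-d configuration are
# reproduced exactly (the embedding is unique up to rotation/shift).
rng = np.random.default_rng(8)
X = rng.standard_normal((10, 2))
D2 = ((X[:, None] - X[None, :])**2).sum(-1)
Y = classical_mds(D2, 2)
assert np.allclose(D2, ((Y[:, None] - Y[None, :])**2).sum(-1))
```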
- Weds & Mon, 3/24/10 & 3/29/10
- Multidimensional Scaling and Non-Linear Dimensionality Reduction
- further reading:
Ch 14, The Elements of Statistical Learning: Data Mining, Inference, and Prediction. pdf
- Weds 3/31/10
- CCA and Multi-View Learning
- further reading:
CCA (Wikipedia page)
D. Foster, S. M. Kakade, T. Zhang. Multi-View Dimensionality Reduction via Canonical Correlation Analysis. pdf
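A sketch of CCA computed as an SVD of the whitened cross-covariance (the two views and the small ridge term are invented for illustration):

```python
import numpy as np

def cca(X, Y, k, reg=1e-8):
    """Top-k canonical correlations via an SVD of the whitened
    cross-covariance; the small ridge `reg` keeps whitening stable."""
    X = X - X.mean(0); Y = Y - Y.mean(0)
    n = X.shape[0]
    Cxx = X.T @ X / n + reg * np.eye(X.shape[1])
    Cyy = Y.T @ Y / n + reg * np.eye(Y.shape[1])
    Cxy = X.T @ Y / n
    Wx = np.linalg.inv(np.linalg.cholesky(Cxx))   # whitener for view 1
    Wy = np.linalg.inv(np.linalg.cholesky(Cyy))   # whitener for view 2
    U, S, Vt = np.linalg.svd(Wx @ Cxy @ Wy.T)
    return S[:k], Wx.T @ U[:, :k], Wy.T @ Vt[:k].T

# Two views sharing a one-dimensional latent signal z: CCA finds the
# correlated directions, and the top correlation approaches 1.
rng = np.random.default_rng(9)
z = rng.standard_normal(500)
X = np.outer(z, rng.standard_normal(5)) + 0.1 * rng.standard_normal((500, 5))
Y = np.outer(z, rng.standard_normal(4)) + 0.1 * rng.standard_normal((500, 4))
corrs, A, B = cca(X, Y, k=1)
print(corrs)
```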
- Mon 4/5/10
- Spectral methods for learning time series.
- lecture notes pdf
- Weds 4/7/10
- Spectral methods for learning Kalman filters.
- lecture notes pdf
- further reading:
P. Van Overschee and B. De Moor. Subspace Identification for Linear Systems. ps.gz
- Mon 4/12/10
- Matrix representations for HMMs.
- lecture notes pdf
- further reading: these papers also make use of such matrix-based representations:
M. Littman, R. Sutton, & S. Singh. Predictive Representations of State. pdf
H. Jaeger. Observable Operator Models for Discrete Stochastic Time Series. pdf
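A minimal sketch of the observable operator representation (the HMM parameters are invented): the matrices A_x = T diag(O_x) compute sequence probabilities by matrix multiplication alone.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(10)
# A small made-up HMM: m hidden states, o observation symbols.
m, o = 3, 4
T = rng.random((m, m)); T /= T.sum(axis=0)   # T[i,j] = P(h'=i | h=j)
O = rng.random((o, m)); O /= O.sum(axis=0)   # O[x,j] = P(x | h=j)
pi = np.full(m, 1.0 / m)                     # initial state distribution

# Observable operator for symbol x: A_x = T diag(O[x, :]); then
# P(x_1, ..., x_t) = 1^T A_{x_t} ... A_{x_1} pi.
A = [T @ np.diag(O[x]) for x in range(o)]

def prob(seq):
    v = pi
    for x in seq:
        v = A[x] @ v
    return v.sum()

# Sanity check: probabilities of all length-2 sequences sum to one.
assert np.isclose(sum(prob(s) for s in product(range(o), repeat=2)), 1.0)
```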
- Weds 4/14/10
- Spectral methods for learning HMMs. Relaxing the rank conditions.
- lecture notes pdf
- further reading:
D. Hsu, S. M. Kakade, T. Zhang. A Spectral Algorithm for Learning Hidden Markov Models. pdf
- Mon 4/19/10
- Concentration results for random matrices.
- lecture notes:
Basic statements. pdf
Derivations. pdf
- further reading: these papers have "Bernstein"-like bounds for variance control.
B. Recht. A Simpler Approach to Matrix Completion. pdf
R. Vershynin's notes on the Ahlswede-Winter method and the Golden-Thompson proof. website
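An empirical illustration of the phenomenon (not a proof; sizes invented): the empirical covariance of n isotropic vectors concentrates around the identity in operator norm at rate roughly sqrt(d/n).

```python
import numpy as np

rng = np.random.default_rng(11)
d = 20

def deviation(n):
    """Operator-norm distance between the empirical covariance of n
    isotropic Gaussian vectors in R^d and its mean, the identity."""
    X = rng.standard_normal((n, d))
    return np.linalg.norm(X.T @ X / n - np.eye(d), 2)

# Matrix concentration bounds predict a deviation of order sqrt(d/n):
# quadrupling n should roughly halve the error.
for n in [100, 400, 1600, 6400]:
    print(n, deviation(n))
```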
- Weds 4/21/10 & Mon 4/26/10
- Matrix norms and matrix regularization.
- further reading:
S. M. Kakade, S. Shalev-Shwartz, A. Tewari. Applications of strong convexity--strong smoothness duality to learning with matrices. pdf
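To make the matrix-regularization theme concrete, here is a sketch of singular value soft-thresholding, the proximal step of the trace norm that appears throughout this literature (the matrix and threshold are invented):

```python
import numpy as np

def trace_norm_prox(A, t):
    """Proximal operator of t * (trace norm): soft-threshold the
    singular values, shrinking each by t and clipping at zero."""
    U, S, Vt = np.linalg.svd(A, full_matrices=False)
    return (U * np.maximum(S - t, 0.0)) @ Vt

# Soft-thresholding a noisy low-rank matrix suppresses the noise
# directions while (mostly) keeping the dominant singular subspace.
rng = np.random.default_rng(12)
low_rank = rng.standard_normal((30, 2)) @ rng.standard_normal((2, 30))
A = low_rank + 0.1 * rng.standard_normal((30, 30))
A_hat = trace_norm_prox(A, 2.0)
print(np.linalg.matrix_rank(A_hat))   # typically 2, the planted rank
```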