Stat 991, Spring 2010
Multivariate Analysis, Dimensionality Reduction, and Spectral Methods
Syllabus:
Modern statistical approaches to large datasets must directly analyze
and manipulate data in matrix or vector form. This course
will focus on the statistical theory and practice of manipulating such
data. The topics covered will be: multivariate analysis,
dimensionality reduction, convexity issues of working with matrices,
and spectral methods.
With regard to dimensionality reduction, we will cover PCA, CCA, and
random projections (e.g. Johnson-Lindenstrauss) and examine potential
applications. With regard to convexity issues, the course will
examine the basic question of how accurate the SVD of a random
matrix is (we will examine a generalization of the Chernoff method to
matrices). Other potential topics include matrix completion
(filling in the missing entries of a partially observed matrix), subspace
identification (e.g. learning time series models like Kalman filters
based on a multivariate covariance analysis), locality sensitive
hashing (randomly projecting data for efficient storage and recall),
matrix-based regularization methods (and related convexity issues),
and kernel methods/Gaussian process regression.
The major topics discussed in the course will include the following:
- Dimensionality reduction, including SVD and random projections.
- Accuracy of SVD: How accurate is the subspace found? We will
cover a recent concentration result for random matrices.
- Implications for learning: We will examine projecting the data to
low-dimensional spaces and then learning in these lower-dimensional
spaces. We will cover both regression and clustering here.
- CCA, Subspace ID, and Time Series: how to provably learn state
space models, including Kalman filters and HMMs.
- Matrix Completion: how to fill in the missing entries of a matrix.
Prerequisites:
The course is appropriate for a graduate student with some background in
statistics and machine learning. The course will assume a basic level of mathematical maturity, so
please contact the instructor if you have concerns.
Requirements:
As this is an advanced graduate course, the emphasis is on learning
the material on your own. For requirements, I'd like you
to read the notes, send me corrections if you find any, and
write a short informal (typed) summary of any subset of the
following: 1) questions/insights about the course notes, 2) possible
research directions, 3) how the ideas might relate to your work, 4)
a different derivation you like, 5) a related paper you read, or
6) points that were unclear. You don't need to write more than a page.
Also, I'd like you to find papers related to the material we cover
that overlap with your research interests.
Instructor:
Time and location:
Time: MW 1:30 - 3:00
Location: JMHH F88
Schedule and notes:
- Weds 1/13/10
- The Singular Value Decomposition and the best fitting subspace
- lecture notes pdf
- further reading:
Spectral Algorithms, Ch 1. pdf
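For a concrete feel of the best-fit property, here is a minimal numpy sketch (the data matrix and rank are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 20))  # hypothetical data: 100 points in R^20
k = 3

# Thin SVD: A = U diag(S) V^T, singular values in decreasing order.
U, S, Vt = np.linalg.svd(A, full_matrices=False)

# The rows of Vt[:k] span the best-fitting k-dimensional subspace, and
# A_k is the best rank-k approximation (Eckart-Young).
A_k = (U[:, :k] * S[:k]) @ Vt[:k]

# The Frobenius error equals the root of the discarded squared
# singular values -- no rank-k matrix does better.
assert np.isclose(np.linalg.norm(A - A_k, 'fro'), np.sqrt(np.sum(S[k:]**2)))
```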
- Weds 1/20/10
- Applications of SVD
- lecture notes pdf
- further reading:
Latent Semantic Analysis (Wikipedia page)
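As a toy illustration of how LSA applies the SVD (the term-document counts below are invented):

```python
import numpy as np

# Toy term-document count matrix (rows = terms, columns = documents).
A = np.array([[2., 1., 0., 0.],
              [1., 2., 0., 1.],
              [0., 0., 3., 1.],
              [0., 1., 2., 2.]])

# LSA: keep only the top-k singular directions and compare documents
# in the resulting low-rank "latent semantic" space.
k = 2
U, S, Vt = np.linalg.svd(A, full_matrices=False)
docs = (S[:k, None] * Vt[:k]).T   # one k-dimensional row per document

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

# Documents 0 and 1 share terms, so they end up close in latent space.
print(cosine(docs[0], docs[1]), cosine(docs[0], docs[2]))
```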
- Mon 1/25/10
- Random projections and the Johnson-Lindenstrauss lemma
- lecture notes pdf
- further reading:
S. Dasgupta and A. Gupta. An elementary proof of a theorem of Johnson and Lindenstrauss. pdf
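A minimal sketch of the Gaussian random projection behind the lemma (all dimensions chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, k = 50, 1000, 200   # n points in R^d, projected down to R^k
X = rng.standard_normal((n, d))

# Gaussian random projection, scaled so squared lengths are preserved
# in expectation; JL guarantees all pairwise distances are within a
# (1 +/- eps) factor w.h.p. once k = O(log(n)/eps^2).
R = rng.standard_normal((d, k)) / np.sqrt(k)
Y = X @ R

# Spot-check one pairwise distance before and after projection.
print(np.linalg.norm(X[0] - X[1]), np.linalg.norm(Y[0] - Y[1]))
```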
- Weds & Mon, 1/27/10 & 2/1/10
- Recall and Locality Sensitive Hashing
- See the survey paper here.
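For concreteness, a sketch of one standard scheme covered by the survey, random-hyperplane LSH for cosine similarity (all sizes invented):

```python
import numpy as np

rng = np.random.default_rng(2)
d, m = 100, 16   # data dimension, bits per hash signature

# Random-hyperplane LSH: each hyperplane contributes one sign bit, and
# two vectors agree on a bit with probability 1 - angle(x, y)/pi.
H = rng.standard_normal((m, d))

def lsh_key(x):
    return tuple((H @ x > 0).astype(int))   # m-bit signature

x = rng.standard_normal(d)
y = x + 0.01 * rng.standard_normal(d)   # near-duplicate of x
z = rng.standard_normal(d)              # unrelated vector

# The near-duplicate typically lands in the same bucket; the unrelated
# vector almost never does.
print(lsh_key(x) == lsh_key(y), lsh_key(x) == lsh_key(z))
```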
- Weds 2/3/10
- Ridge Regression and Dimensionality Reduction
- lecture notes pdf
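One way to see the connection, sketched in numpy (problem sizes and penalty invented): through the SVD, ridge regression shrinks each singular direction by s^2/(s^2 + lambda), a soft form of dimensionality reduction.

```python
import numpy as np

rng = np.random.default_rng(3)
n, d, lam = 200, 50, 1.0
X = rng.standard_normal((n, d))
y = X @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

# Ridge estimator: w = (X^T X + lam I)^{-1} X^T y.
w = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# SVD view: with X = U diag(S) V^T, the fit along the i-th singular
# direction is shrunk by S_i^2 / (S_i^2 + lam), so directions with
# small singular values are effectively discarded.
U, S, Vt = np.linalg.svd(X, full_matrices=False)
w_svd = Vt.T @ ((S / (S**2 + lam)) * (U.T @ y))
assert np.allclose(w, w_svd)
```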
- Mon 2/8/10
- Margin-Based Classification and Dimensionality Reduction
- lecture notes pdf
- further reading:
N. Balcan, A. Blum, S. Vempala. On Kernels, Margins and Low-dimensional Mappings. pdf
R. Arriaga and S. Vempala. An algorithmic theory of learning: Robust Concepts and Random Projection. pdf
- Mon & Weds, 2/15/10 & 2/17/10
- Clustering in High Dimensions
- lecture notes pdf
- further reading:
Single Linkage Clustering (Wikipedia page)
- Mon 2/22/10
- Clustering and PCA
- further reading:
Spectral Algorithms, Ch 2. pdf
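A minimal sketch of the project-then-cluster idea from the chapter (the mixture below is invented, and a trivial sign split stands in for k-means):

```python
import numpy as np

rng = np.random.default_rng(4)
# A made-up mixture: two spherical Gaussians in R^50, separated along
# one coordinate.
d, n = 50, 200
mu = np.zeros(d); mu[0] = 6.0
X = np.vstack([rng.standard_normal((n, d)),
               rng.standard_normal((n, d)) + mu])

# Project onto the top-k principal components. The component means lie
# (nearly) in this subspace, so projecting keeps the separation while
# shedding most of the noise -- the key point of the chapter.
k = 2
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
Z = Xc @ Vt[:k].T

# In the projected space even a trivial split on the top component
# recovers the two clusters.
labels = (Z[:, 0] > 0).astype(int)
print(labels[:n].mean(), labels[n:].mean())   # ~0 and ~1 (or flipped)
```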
- Weds 2/24/10
- Clustering and Non-Isotropic Mixtures
- further reading:
D. Achlioptas and F. McSherry. On Spectral Learning of Mixtures of Distributions. pdf
- Mon 3/1/10
- Clustering, Random Projections, and Non-Isotropic PCA
- further reading:
S. Dasgupta. Learning mixtures of Gaussians. pdf
N. Srebro, G. Shakhnarovich, S. Roweis. An Investigation of Computational and Informational Limits in Gaussian Mixture Clustering. pdf
S. Brubaker and S. Vempala. Isotropic PCA and Affine-Invariant Clustering. pdf
- Weds 3/3/10
- Isotropic PCA, Probabilistic Latent Semantic Indexing, and Non-negative Matrix Factorization
- further reading:
Probabilistic LSI (Wikipedia page)
T. Hofmann. Probabilistic Latent Semantic Indexing. pdf
Non-negative Matrix Factorization (Wikipedia page)
D. Donoho & V. Stodden. When Does Non-Negative Matrix Factorization Give a Correct Decomposition into Parts? pdf
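For concreteness, a sketch of the classic Lee-Seung multiplicative updates for NMF (this illustrates the factorization itself, not the Donoho-Stodden analysis; data and sizes are invented):

```python
import numpy as np

rng = np.random.default_rng(5)
# Synthetic nonnegative data with exact rank-3 nonnegative structure.
m, n, k = 20, 30, 3
A = rng.random((m, k)) @ rng.random((k, n))

# Lee-Seung multiplicative updates for A ~ W H under squared loss;
# factors stay elementwise nonnegative because the updates only
# multiply by nonnegative ratios.
W, H = rng.random((m, k)), rng.random((k, n))
eps = 1e-12
for _ in range(500):
    H *= (W.T @ A) / (W.T @ W @ H + eps)
    W *= (A @ H.T) / (W @ H @ H.T + eps)

print(np.linalg.norm(A - W @ H) / np.linalg.norm(A))   # small residual
```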
- Mon 3/15/10
- Graph Laplacians and Spectral Clustering
- further reading:
Graph Laplacians (Wikipedia page)
U. von Luxburg. A Tutorial on Spectral Clustering. pdf
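A minimal sketch of the simplest spectral bipartition from the tutorial (the similarity matrix is an invented planted partition):

```python
import numpy as np

rng = np.random.default_rng(6)
# A made-up similarity matrix with two planted blocks.
n = 60
W = 0.1 * rng.random((n, n))
W[:n//2, :n//2] += 0.5
W[n//2:, n//2:] += 0.5
W = (W + W.T) / 2
np.fill_diagonal(W, 0)

# Unnormalized graph Laplacian L = D - W; its low eigenvectors encode
# the cluster structure (the tutorial also covers normalized variants).
L = np.diag(W.sum(axis=1)) - W
eigvals, eigvecs = np.linalg.eigh(L)   # ascending eigenvalues

# The simplest spectral bipartition: split on the sign of the second
# eigenvector (the Fiedler vector).
labels = (eigvecs[:, 1] > 0).astype(int)
print(labels[:n//2].mean(), labels[n//2:].mean())   # ~0 and ~1 (or flipped)
```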
- Weds 3/17/10
- Graph Laplacians and Semi-Supervised Learning
- further reading:
X. Zhu, J. Lafferty, and Z. Ghahramani. Semi-Supervised Learning: From Gaussian Fields to Gaussian Processes. pdf
M. Belkin, V. Sindhwani, and P. Niyogi. Manifold Regularization: A Geometric Framework for Learning from Examples. pdf
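A sketch of the harmonic-function solution from the Zhu-Lafferty-Ghahramani line of work (the graph and labels are invented):

```python
import numpy as np

rng = np.random.default_rng(7)
# A small made-up similarity graph; vertices 0 and 1 carry labels.
n = 10
W = rng.random((n, n)); W = (W + W.T) / 2
np.fill_diagonal(W, 0)
L = np.diag(W.sum(axis=1)) - W

labeled, y = np.array([0, 1]), np.array([0.0, 1.0])
unlabeled = np.arange(2, n)

# Harmonic solution: minimize f^T L f subject to the observed labels;
# the unlabeled part solves L_uu f_u = -L_ul y.
f_u = np.linalg.solve(L[np.ix_(unlabeled, unlabeled)],
                      -L[np.ix_(unlabeled, labeled)] @ y)
print(f_u)   # soft labels, each between 0 and 1 by the maximum principle
```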
- Mon 3/22/10
- Multidimensional Scaling and SMACOF
- further reading:
The first reference provides a nice introduction to classical scaling
and its background, along with a nice extension.
L. Cayton and S. Dasgupta. Robust Euclidean embedding. pdf
Multidimensional Scaling (Wikipedia page)
Stress Majorization (Wikipedia page)
J. de Leeuw & P. Mair. Multidimensional Scaling Using Majorization: SMACOF in R. pdf
Ch 14, The Elements of Statistical Learning: Data Mining, Inference, and Prediction. pdf
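A minimal sketch of classical (Torgerson) scaling, the starting point of the first reference (the configuration is invented):

```python
import numpy as np

def classical_mds(D2, k):
    """Classical (Torgerson) scaling: embed n points in R^k from an
    n-by-n matrix D2 of squared Euclidean distances."""
    n = D2.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    B = -0.5 * J @ D2 @ J                      # recovered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(B)
    idx = np.argsort(eigvals)[::-1][:k]        # top-k eigenpairs
    return eigvecs[:, idx] * np.sqrt(np.maximum(eigvals[idx], 0))

# Sanity check: squared distances from a random 2-d configuration are
# reproduced exactly (the embedding is unique up to rotation/shift).
rng = np.random.default_rng(8)
X = rng.standard_normal((10, 2))
D2 = ((X[:, None] - X[None, :])**2).sum(-1)
Y = classical_mds(D2, 2)
assert np.allclose(D2, ((Y[:, None] - Y[None, :])**2).sum(-1))
```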
- Weds & Mon, 3/24/10 & 3/29/10
- Multidimensional Scaling and Non-Linear Dimensionality Reduction
- further reading:
Ch 14, The Elements of Statistical Learning: Data Mining, Inference, and Prediction. pdf
- Weds 3/31/10
- CCA and Multi-View Learning
- further reading:
CCA (Wikipedia page)
D. Foster, S. M. Kakade, T. Zhang. Multi-View Dimensionality Reduction via Canonical Correlation Analysis. pdf
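A sketch of CCA computed as an SVD of the whitened cross-covariance (the two views and the small ridge term are invented for illustration):

```python
import numpy as np

def cca(X, Y, k, reg=1e-8):
    """Top-k canonical correlations via an SVD of the whitened
    cross-covariance; the small ridge `reg` keeps whitening stable."""
    X = X - X.mean(0); Y = Y - Y.mean(0)
    n = X.shape[0]
    Cxx = X.T @ X / n + reg * np.eye(X.shape[1])
    Cyy = Y.T @ Y / n + reg * np.eye(Y.shape[1])
    Cxy = X.T @ Y / n
    Wx = np.linalg.inv(np.linalg.cholesky(Cxx))   # whitener for view 1
    Wy = np.linalg.inv(np.linalg.cholesky(Cyy))   # whitener for view 2
    U, S, Vt = np.linalg.svd(Wx @ Cxy @ Wy.T)
    return S[:k], Wx.T @ U[:, :k], Wy.T @ Vt[:k].T

# Two views sharing a one-dimensional latent signal z: CCA finds the
# correlated directions, and the top correlation approaches 1.
rng = np.random.default_rng(9)
z = rng.standard_normal(500)
X = np.outer(z, rng.standard_normal(5)) + 0.1 * rng.standard_normal((500, 5))
Y = np.outer(z, rng.standard_normal(4)) + 0.1 * rng.standard_normal((500, 4))
corrs, A, B = cca(X, Y, k=1)
print(corrs)
```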
- Mon 4/5/10
- Spectral methods for learning time series.
- lecture notes pdf
- Weds 4/7/10
- Spectral methods for learning Kalman filters.
- lecture notes pdf
- further reading:
P. Van Overschee and B. De Moor. Subspace Identification for Linear Systems. ps.gz
- Mon 4/12/10
- Matrix representations for HMMs.
- lecture notes pdf
- further reading: these papers also make use of such matrix-based representations:
M. Littman, R. Sutton, & S. Singh. Predictive Representations of State. pdf
H. Jaeger. Observable Operator Models for Discrete Stochastic Time Series. pdf
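A minimal sketch of the observable operator representation (the HMM parameters are invented): the matrices A_x = T diag(O_x) compute sequence probabilities by matrix multiplication alone.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(10)
# A small made-up HMM: m hidden states, o observation symbols.
m, o = 3, 4
T = rng.random((m, m)); T /= T.sum(axis=0)   # T[i,j] = P(h'=i | h=j)
O = rng.random((o, m)); O /= O.sum(axis=0)   # O[x,j] = P(x | h=j)
pi = np.full(m, 1.0 / m)                     # initial state distribution

# Observable operator for symbol x: A_x = T diag(O[x, :]); then
# P(x_1, ..., x_t) = 1^T A_{x_t} ... A_{x_1} pi.
A = [T @ np.diag(O[x]) for x in range(o)]

def prob(seq):
    v = pi
    for x in seq:
        v = A[x] @ v
    return v.sum()

# Sanity check: probabilities of all length-2 sequences sum to one.
assert np.isclose(sum(prob(s) for s in product(range(o), repeat=2)), 1.0)
```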
- Weds 4/14/10
- Spectral methods for learning HMMs. Relaxing the rank conditions.
- lecture notes pdf
- further reading:
D. Hsu, S. M. Kakade, T. Zhang. A Spectral Algorithm for Learning Hidden Markov Models. pdf
- Mon 4/19/10
- Concentration results for random matrices.
- lecture notes:
Basic statements. pdf
Derivations. pdf
- further reading: these papers have "Bernstein"-like bounds for variance control.
B. Recht. A Simpler Approach to Matrix Completion. pdf
R. Vershynin's notes on the Ahlswede-Winter method and the Golden-Thompson proof. website
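An empirical illustration of the phenomenon (not a proof; sizes invented): the empirical covariance of n isotropic vectors concentrates around the identity in operator norm at rate roughly sqrt(d/n).

```python
import numpy as np

rng = np.random.default_rng(11)
d = 20

def deviation(n):
    """Operator-norm distance between the empirical covariance of n
    isotropic Gaussian vectors in R^d and its mean, the identity."""
    X = rng.standard_normal((n, d))
    return np.linalg.norm(X.T @ X / n - np.eye(d), 2)

# Matrix concentration bounds predict a deviation of order sqrt(d/n):
# quadrupling n should roughly halve the error.
for n in [100, 400, 1600, 6400]:
    print(n, deviation(n))
```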
- Weds 4/21/10 & Mon 4/26/10
- Matrix norms and matrix regularization.
- further reading:
S. M. Kakade, S. Shalev-Shwartz, A. Tewari. Applications of strong convexity--strong smoothness duality to learning with matrices. pdf
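To make the matrix-regularization theme concrete, here is a sketch of singular value soft-thresholding, the proximal step of the trace norm that appears throughout this literature (the matrix and threshold are invented):

```python
import numpy as np

def trace_norm_prox(A, t):
    """Proximal operator of t * (trace norm): soft-threshold the
    singular values, shrinking each by t and clipping at zero."""
    U, S, Vt = np.linalg.svd(A, full_matrices=False)
    return (U * np.maximum(S - t, 0.0)) @ Vt

# Soft-thresholding a noisy low-rank matrix suppresses the noise
# directions while (mostly) keeping the dominant singular subspace.
rng = np.random.default_rng(12)
low_rank = rng.standard_normal((30, 2)) @ rng.standard_normal((2, 30))
A = low_rank + 0.1 * rng.standard_normal((30, 30))
A_hat = trace_norm_prox(A, 2.0)
print(np.linalg.matrix_rank(A_hat))   # typically 2, the planted rank
```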