What is the difference between singular value decomposition and principal component analysis?… by professional stude
I'm already aware of the Wikipedia pages.
Best Answer:
Singular value decomposition and principal component analysis
Michael E. Wall, Andreas Rechtsteiner, Luis M. Rocha
Modeling, Algorithms, and Informatics Group (CCS-3)
Los Alamos National Laboratory, MS B256
Los Alamos, New Mexico 87545, USA
Citation: Wall, Michael E., Andreas Rechtsteiner, Luis M. Rocha. "Singular value decomposition and principal component analysis". In A Practical Approach to Microarray Data Analysis. D.P. Berrar, W. Dubitzky, M. Granzow, eds. pp. 91-109, Kluwer: Norwell, MA (2003). LANL LA-UR-02-4001.
Also available in the arXiv.org e-Print archive and in Adobe Acrobat (.pdf) format.
Abstract.
This chapter describes gene expression analysis by Singular Value Decomposition (SVD), emphasizing initial characterization of the data. We describe SVD methods for visualization of gene expression data, representation of the data using a smaller number of variables, and detection of patterns in noisy gene expression data. In addition, we describe the precise relation between SVD analysis and Principal Component Analysis (PCA) when PCA is calculated using the covariance matrix, enabling our descriptions to apply equally well to either method. Our aim is to provide definitions, interpretations, examples, and references that will serve as resources for understanding and extending the application of SVD and PCA to gene expression analysis.
Keywords: bioinformatics, computational biology, linear algebra, data mining, singular value decomposition, principal component analysis, gene expression analysis, SVD, PCA, microarray analysis techniques.
1. Introduction
One of the challenges of bioinformatics is to develop effective ways to analyze global gene expression data. A rigorous approach to gene expression analysis must involve an up-front characterization of the structure of the data. In addition to a broader utility in analysis methods, singular value decomposition (SVD) and principal component analysis (PCA) can be valuable tools in obtaining such a characterization. SVD and PCA are common techniques for analysis of multivariate data, and gene expression data are well suited to analysis using SVD/PCA. A single microarray experiment can generate measurements for thousands, or even tens of thousands, of genes. (For simplicity, we use the term microarray to refer to all varieties of global gene expression technologies.) Present experiments typically consist of fewer than ten assays, but can consist of hundreds (Hughes et al., 2000). Gene expression data are currently rather noisy, and SVD can detect and extract small signals from noisy data.
The goal of this chapter is to provide precise explanations of the use of SVD and PCA for gene expression analysis, illustrating methods using simple examples. We describe SVD methods for visualization of gene expression data, representation of the data using a smaller number of variables, and detection of patterns in noisy gene expression data. In addition, we describe the mathematical relation between SVD analysis and Principal Component Analysis (PCA) when PCA is calculated using the covariance matrix, enabling our descriptions to apply equally well to either method. Our aims are 1) to provide descriptions and examples of the application of SVD methods and interpretation of their results; 2) to establish a foundation for understanding previous applications of SVD to gene expression analysis; and 3) to provide interpretations and references to related work that may inspire new advances.
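The relation between SVD and covariance-based PCA mentioned above can be checked numerically. The following is a minimal sketch, not the chapter's own procedure: it assumes NumPy, synthetic random data, and the convention that rows are observations and columns are the variables being centered. All variable names are illustrative.

```python
import numpy as np

# Synthetic stand-in for an expression matrix (assumption: 50 rows, 4 columns).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))

# Center each column so that Xc.T @ Xc is proportional to the covariance matrix.
Xc = X - X.mean(axis=0)
n_obs = Xc.shape[0]

# PCA route: eigendecomposition of the covariance matrix of the columns.
cov = Xc.T @ Xc / (n_obs - 1)
eigvals, eigvecs = np.linalg.eigh(cov)      # eigh returns ascending eigenvalues
order = np.argsort(eigvals)[::-1]           # reorder to descending
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# SVD route: thin SVD of the centered data matrix.
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

# The squared singular values, rescaled, are the covariance eigenvalues,
# and the right singular vectors are the principal axes (up to sign).
svd_eigvals = s**2 / (n_obs - 1)
same_eigvals = np.allclose(eigvals, svd_eigvals)
same_axes = np.allclose(np.abs(eigvecs), np.abs(Vt.T), atol=1e-6)
print(same_eigvals, same_axes)
```

The comparison uses absolute values because each singular vector or eigenvector is only defined up to a sign flip. Without the centering step the two routes would generally disagree.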
In section 1, the SVD is defined, with associations to other methods described. A summary of previous applications is presented in order to suggest directions for SVD analysis of gene expression data. In section 2 we discuss applications of SVD to gene expression analysis, including specific methods for SVD-based visualization of gene expression data, and use of SVD in detection of weak expression patterns. Some examples are given of previous applications of SVD to analysis of gene expression data. Our discussion in section 3 gives some general advice on the use of SVD analysis on gene expression data, and includes references to specific published SVD-based methods for gene expression analysis. Finally, in section 4, we provide information on some available resources and further reading.
1.1 Mathematical definition of the SVD
(Complete understanding of the material in this chapter requires a basic understanding of linear algebra. We find mathematical definitions to be the only antidote to the many confusions that can arise in discussion of SVD and PCA.)
Let $X$ denote an $m \times n$ matrix of real-valued data and rank $r$, where without loss of generality $m \ge n$, and therefore $r \le n$. (The rank of a matrix is the number of linearly independent rows or columns.) In the case of microarray data, $x_{ij}$ is the expression level of the $i$th gene in the $j$th assay. The elements of the $i$th row of $X$ form the $n$-dimensional vector $g_i$, which we refer to as the transcriptional response of the $i$th gene. Alternatively, the elements of the $j$th column of $X$ form the $m$-dimensional vector $a_j$, which we refer to as the expression profile of the $j$th assay.
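To make the layout of the data matrix concrete, here is a small sketch assuming NumPy and a made-up 5 x 3 matrix (5 genes, 3 assays). The numbers are purely illustrative and are not taken from any real experiment.

```python
import numpy as np

# Hypothetical toy data: m = 5 genes (rows), n = 3 assays (columns).
# Row 3 is deliberately twice row 1, so the rows are not all independent.
X = np.array([
    [1.0, 2.0, 0.5],
    [0.2, 0.1, 0.3],
    [2.0, 4.0, 1.0],
    [0.0, 1.0, 1.5],
    [1.1, 0.9, 1.0],
])

m, n = X.shape                   # m >= n, as assumed in the text
r = np.linalg.matrix_rank(X)     # number of linearly independent rows or columns

g_1 = X[0, :]   # transcriptional response of the first gene (length n)
a_1 = X[:, 0]   # expression profile of the first assay (length m)

print(m, n, r, g_1.shape, a_1.shape)
```

Note that NumPy indexes from zero, so the first gene is row 0 here, whereas the text counts genes and assays from 1.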
The equation for singular value decomposition of X is the following:
$$X = U S V^{T} \tag{5.1}$$
where $U$ is an $m \times n$ matrix, $S$ is an $n \times n$ diagonal matrix, and $V^{T}$ is also an $n \times n$ matrix. The columns of $U$ are called the left singular vectors, $\{u_k\}$, and form an orthonormal basis for the assay expression profiles, so that ui
