Why should you prefer SVD over EIG while performing Linear Principal Component Analysis?
Let's deliberate!
Simply put, among its many applications, the main purpose of Principal Component Analysis (PCA) is to identify patterns in data and reduce the dimensionality of a dataset with minimal loss of information. PCA is usually explained through an eigendecomposition of the covariance matrix. Eigendecomposition is the factorization of a matrix into a canonical form, whereby the matrix is represented in terms of its eigenvalues and eigenvectors (to learn more about how PCA works, read Implementing a Principal Component Analysis (PCA)).
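To make this route concrete, here is a minimal NumPy sketch of PCA via an eigendecomposition of the covariance matrix. The toy data and the choice of two components are my own illustrative assumptions, not any particular library's implementation:

```python
import numpy as np

# Toy data: 100 samples, 3 features (illustrative only)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))

Xc = X - X.mean(axis=0)                 # center the data
C = np.cov(Xc, rowvar=False)            # 3 x 3 covariance matrix

eigvals, eigvecs = np.linalg.eigh(C)    # eigh is for symmetric matrices
order = np.argsort(eigvals)[::-1]       # sort eigenvalues in descending order
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

X_reduced = Xc @ eigvecs[:, :2]         # project onto the top 2 principal components
```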
However, in the real world, we are often advised to obtain eigenvalues and eigenvectors using SVD. Before we get to the why of it, let me briefly put forth the concept of SVD.
Singular value decomposition takes a rectangular matrix A, where A is an n x p matrix. The SVD theorem states:

A = U S V^T

where U^T U = I (the n x n identity) and V^T V = I (the p x p identity), i.e., U and V are orthogonal.
Here, the columns of U are the left singular vectors; S (the same dimensions as A) is diagonal and holds the singular values; and the rows of V^T are the right singular vectors. The SVD represents an expansion of the original data in a coordinate system where the covariance matrix is diagonal.
Calculating the SVD consists of finding the eigenvalues and eigenvectors of AA^T and A^TA. The eigenvectors of A^TA make up the columns of V, and the eigenvectors of AA^T make up the columns of U. The singular values in S are the square roots of the (shared, nonzero) eigenvalues of AA^T and A^TA; they form the diagonal entries of S and are arranged in descending order. (To understand this in detail with the help of an example, watch Singular Value Decomposition by Professor Gilbert Strang.) This is how all the elements of the SVD are obtained from eigenvalues and eigenvectors.
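As a quick sanity check of these relations, the following sketch (with a random matrix of my own choosing) verifies that the squared singular values of A match the eigenvalues of A^TA:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 3))                      # a rectangular n x p matrix

# np.linalg.svd returns the singular values in descending order
U, s, Vt = np.linalg.svd(A)

# Eigenvalues of A^T A, reversed to descending order (eigvalsh returns ascending)
eigvals_AtA = np.linalg.eigvalsh(A.T @ A)[::-1]

print(np.allclose(s**2, eigvals_AtA))            # True: s_i = sqrt(lambda_i)
```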
Now, the real question!
Why are we advised to use SVD and not EIG to perform PCA? Why does sklearn.decomposition.PCA use the SVD of the data to project it to a lower-dimensional space? Let's understand why!
Mathematically, there is no difference whether you calculate PCA on the data matrix directly (SVD) or on its covariance matrix (EIG). (Read the proof: Relationship between PCA and SVD by Bastian Rieck.)
The difference is purely one of numerical precision and complexity. While both decompositions cost O(n³) in general, the SVD is typically faster in practice. More importantly, the SVD route does not involve explicitly calculating the covariance matrix, a step that squares the condition number of the data and can therefore lose precision in floating-point arithmetic; avoiding it makes the SVD numerically more stable.
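The sketch below illustrates this equivalence on made-up data: running PCA through np.linalg.eigh on the covariance matrix and through np.linalg.svd on the centered data matrix recovers the same principal directions, up to sign (eigenvectors are only defined up to sign):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
Xc = X - X.mean(axis=0)                            # PCA assumes centered data

# EIG route: eigendecomposition of the covariance matrix
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
V_eig = eigvecs[:, np.argsort(eigvals)[::-1]]      # descending eigenvalue order

# SVD route: decompose the centered data matrix directly
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
V_svd = Vt.T                                       # right singular vectors as columns

# The two sets of principal directions agree up to a sign flip per component
print(np.allclose(np.abs(V_eig), np.abs(V_svd)))   # True
```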