Principal component analysis (PCA) is a technique used to emphasize variation and bring out strong patterns in a dataset. It's often used to make data easy to explore and visualize.
2D example
First, consider a dataset in only two dimensions, like (height, weight). This dataset can be plotted as points in a plane. But if we want to tease out variation, PCA finds a new coordinate system in which every point has a new (x, y) value. The axes don't actually mean anything physical; they're combinations of height and weight called "principal components" that are chosen to give one axis lots of variation. Drag the points around in the following visualization to see how the PC coordinate system adjusts.
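One common way to compute such a coordinate system is to take the eigenvectors of the data's covariance matrix. The following is a minimal sketch of that approach, not necessarily what the visualization does internally. The (height, weight) values are synthetic, and the helper name `principal_axes` is made up for this illustration.

```python
import numpy as np

# Synthetic (height, weight) data, generated only for illustration.
rng = np.random.default_rng(0)
height = rng.normal(170.0, 10.0, size=200)                                # cm
weight = 0.9 * (height - 170.0) + 70.0 + rng.normal(0.0, 5.0, size=200)  # kg
data = np.column_stack([height, weight])


def principal_axes(points):
    """Return (variances, components), sorted by decreasing variance.

    Each row of `components` is one principal direction, expressed as a
    combination of the original axes.
    """
    centered = points - points.mean(axis=0)
    cov = np.cov(centered, rowvar=False)             # 2x2 covariance matrix
    eigenvalues, eigenvectors = np.linalg.eigh(cov)  # eigh returns ascending order
    order = np.argsort(eigenvalues)[::-1]            # re-sort to descending
    return eigenvalues[order], eigenvectors[:, order].T


variances, components = principal_axes(data)

# New coordinates of every point in the PC1/PC2 system.
scores = (data - data.mean(axis=0)) @ components.T

print("variance along PC1, PC2:", variances)
print("PC1 direction (height, weight):", components[0])
```

The sign of each direction is arbitrary, so PC1 may point "up and to the right" or the opposite way; the variances are the same either way.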
PCA is useful for eliminating dimensions. Below, we've plotted the data along a pair of lines: one composed of the x-values and another of the y-values.
If we're going to see the data along only one dimension, though, it might be better to make that dimension the principal component with the most variation. We don't lose much by dropping PC2, since it contributes the least to the variation in the data set.
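To make "we don't lose much" concrete, here is a small sketch that keeps only PC1 and measures what is lost. It uses the singular value decomposition of the centered data, which yields the same principal directions as the covariance approach. The data are synthetic, and all variable names are chosen just for this example.

```python
import numpy as np

# Synthetic correlated 2D data, for illustration only.
rng = np.random.default_rng(1)
x = rng.normal(0.0, 3.0, size=300)
y = 0.5 * x + rng.normal(0.0, 1.0, size=300)
data = np.column_stack([x, y])

mean = data.mean(axis=0)
centered = data - mean

# Rows of vt are the principal directions; singular values come back
# in descending order, so vt[0] is PC1 and vt[1] is PC2.
_, singular_values, vt = np.linalg.svd(centered, full_matrices=False)

# Share of the total variation carried by each component.
explained_ratio = singular_values**2 / np.sum(singular_values**2)

# One-dimensional view of the data: coordinates along PC1 only.
pc1_scores = centered @ vt[0]

# Map the 1D view back into the plane to see what dropping PC2 costs.
reconstruction = np.outer(pc1_scores, vt[0]) + mean
mean_squared_loss = np.mean(np.sum((data - reconstruction) ** 2, axis=1))

print("share of variation (PC1, PC2):", explained_ratio)
print("variance along x only:", data[:, 0].var())
print("variance along PC1:   ", pc1_scores.var())
print("mean squared loss from dropping PC2:", mean_squared_loss)
```

The variance along PC1 is never smaller than the variance along either original axis, and the loss from dropping PC2 equals exactly the variance that PC2 carried.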
3D example
With three dimensions, PCA is more useful, because it's hard to see through a cloud of data. In the example below, the original data are plotted in 3D, but you can project the data into 2D through a transformation no different than finding a camera angle: rotate the axes to find the best angle. To see the "official" PCA transformation, click the "Show PCA" button. The PCA transformation ensures that the horizontal axis PC1 has the most variation, the vertical axis PC2 the second-most, and a third axis PC3 the least. Obviously, PC3 is the one we drop.
Eating in the UK (a 17D example)
Original example from Mark Richardson's class notes, "Principal Component Analysis"
What if our data have way more than 3 dimensions? Like, 17 dimensions?! The table shows the average consumption of 17 types of food, in grams per person per week, for every country in the UK.
The table shows some interesting variations across different food types, but overall differences aren't so notable. Let's see if PCA can eliminate dimensions to emphasize how countries differ.
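As a rough sketch of what eliminating dimensions looks like for a table of this shape, the code below reduces a 4 x 17 array (four UK countries, 17 food types) to two principal components. The numbers are random placeholders, not the real consumption figures, and the variable names are assumptions made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n_countries, n_foods = 4, 17

# Placeholder values only: these are NOT the real consumption figures.
consumption = rng.uniform(50.0, 1500.0, size=(n_countries, n_foods))

# Center each food column so PCA measures differences between countries.
centered = consumption - consumption.mean(axis=0)

# Rows of vt are principal directions in the 17D "food space".
_, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
explained_ratio = singular_values**2 / np.sum(singular_values**2)

# Each country becomes a single point in the PC1/PC2 plane.
scores_2d = centered @ vt[:2].T

print("share of variation per component:", np.round(explained_ratio, 3))
print("2D coordinates per country:")
print(scores_2d)
```

With only four rows, the centered data can vary along at most three independent directions, so the fourth component carries essentially no variation no matter how many food columns there are.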