Introduction

  • LPCA is a visualization procedure, implemented by Alice Barbara Tumpach, that computes 2D positions for labelled datapoints from a high-dimensional dataset. It performs a global Principal Component Analysis (PCA) followed by a local PCA in each class. The low-dimensional approximation of each class is then glued into the plane spanned by the first two eigenvectors of the global PCA in such a way that:

    • 1. the distance from the global mean (placed at the origin) to the mean of each class is preserved;
    • 2. the direction from the global mean to the mean of each class is preserved;
    • 3. the angle between the main eigenvector of each class and the main eigenvector of the whole dataset is preserved.
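
    The three conditions above can be illustrated with a short NumPy sketch. This is one possible reading of the description, not the author's implementation. The function names `top2_axes` and `lpca_sketch` are hypothetical. The way the direction is obtained (projecting the class offset onto the global plane and rescaling it to the original distance) and the way the sign of the angle is chosen are assumptions. The sketch also assumes every class has at least two points and the data has at least two dimensions.

```python
import numpy as np


def top2_axes(X):
    """Return the mean of X and its two leading principal axes (as rows)."""
    mean = X.mean(axis=0)
    # SVD of the centered data: rows of vt are principal axes,
    # ordered by decreasing variance.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:2]


def lpca_sketch(X, labels):
    """Hypothetical sketch: place each class's local 2D PCA in the global PCA plane."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    g_mean, g_axes = top2_axes(X)  # global PCA
    Y = np.zeros((X.shape[0], 2))
    for c in np.unique(labels):
        idx = labels == c
        c_mean, c_axes = top2_axes(X[idx])  # local PCA of class c
        offset = c_mean - g_mean
        # Condition 1: keep the high-dimensional distance to the global mean.
        dist = np.linalg.norm(offset)
        # Condition 2 (assumed reading): keep the direction of the offset
        # as seen in the global PCA plane.
        proj = g_axes @ offset
        norm = np.linalg.norm(proj)
        direction = proj / norm if norm > 1e-12 else np.array([1.0, 0.0])
        center = dist * direction
        # Condition 3: angle between the class's main eigenvector and the
        # global main eigenvector, which is drawn along the x-axis.
        theta = np.arccos(np.clip(c_axes[0] @ g_axes[0], -1.0, 1.0))
        u1 = np.array([np.cos(theta), np.sin(theta)])
        u2 = np.array([-np.sin(theta), np.cos(theta)])
        # Local 2D coordinates of the class, glued at its center.
        local = (X[idx] - c_mean) @ c_axes.T
        Y[idx] = center + local[:, :1] * u1 + local[:, 1:2] * u2
    return Y
```

    Calling `lpca_sketch(X, labels)` with an `(n, d)` data array and a length-`n` label array returns an `(n, 2)` array of plane coordinates. Because each class only needs its own small PCA, the cost grows roughly linearly with the number of points, which is consistent with the short running times reported in the demonstrations.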

    Code

    Demonstration of the program

    On 6000 fashion items from the Fashion MNIST dataset:

    On the whole Fashion MNIST dataset (LPCA takes 3 seconds on a Mac M1, t-SNE takes much longer, and mdscale does not converge in a reasonable amount of time):

    On 6000 digits from the MNIST dataset:

    On 50000 digits from the MNIST dataset (mdscale takes too long to run):

Get In Touch