Single-Cell Data Analysis Using MMD Variational Autoencoder (preprint)

Abstract

The Variational Autoencoder (VAE) is a generative model that originated in the computer vision community; it learns a latent representation of images and generates new images in an unsupervised way. Recently, the vanilla VAE has been applied to single-cell datasets, in the hope of harnessing the representational power of the latent space to evade the ‘curse of dimensionality’ of the original data. However, some research points out that the vanilla VAE suffers from a less informative latent space, which calls into question how reliably its latent space represents high-dimensional single-cell data. This study examines the issue from a bioinformatics perspective. It confirms the problem by comparing the vanilla VAE with MMD-VAE, a VAE variant that overcomes it, across a series of mass cytometry and single-cell RNA-seq datasets. The results show that MMD-VAE is superior to the vanilla VAE in retaining information not only in the latent space but also in the reconstruction space, which suggests that MMD-VAE is a better option for single-cell data analysis.
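
For readers unfamiliar with the "MMD" in MMD-VAE: the variant is generally described as replacing the KL-divergence term of the vanilla VAE objective with a Maximum Mean Discrepancy (MMD) penalty between encoded samples and samples from the prior. The snippet below is only a minimal sketch of that penalty, not the manuscript's implementation. The Gaussian kernel, the bandwidth of 1.0, the 2-dimensional latent space, and all function names (`gaussian_kernel`, `mean_kernel`, `mmd_squared`, `sample_gaussian`) are assumptions made for illustration.

```python
import math
import random


def gaussian_kernel(x, y, bandwidth=1.0):
    """RBF kernel between two equal-length vectors (kernel choice is an assumption)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mean_kernel(xs, ys, bandwidth=1.0):
    """Average kernel value over all pairs (x, y) with x in xs and y in ys."""
    total = sum(gaussian_kernel(x, y, bandwidth) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased estimate of the squared MMD between two samples."""
    return (mean_kernel(xs, xs, bandwidth)
            + mean_kernel(ys, ys, bandwidth)
            - 2.0 * mean_kernel(xs, ys, bandwidth))


def sample_gaussian(n, dim, mean, rng):
    """Draw n points from an isotropic unit-variance Gaussian centred at `mean`."""
    return [[rng.gauss(mean, 1.0) for _ in range(dim)] for _ in range(n)]


# Hypothetical stand-ins: `prior` plays the role of samples from the latent prior,
# the two `codes_*` lists play the role of encoder outputs.
rng = random.Random(0)
prior = sample_gaussian(100, 2, 0.0, rng)
codes_matched = sample_gaussian(100, 2, 0.0, rng)   # same distribution as the prior
codes_shifted = sample_gaussian(100, 2, 3.0, rng)   # far from the prior

mmd_matched = mmd_squared(codes_matched, prior)
mmd_shifted = mmd_squared(codes_shifted, prior)
print(f"matched: {mmd_matched:.4f}  shifted: {mmd_shifted:.4f}")
```

In this toy setting the penalty is small when the codes follow the prior and clearly larger when they are shifted away from it. Because the penalty compares the whole set of codes with the prior rather than each cell's posterior individually, it is usually credited with leaving more room for the latent space to stay informative.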

Publication
In bioRxiv

A draft of the manuscript is available on bioRxiv and is being prepared for submission to a journal. Any constructive feedback is welcome.