High-dimensional cluster analysis with the masked EM algorithm
Kadir SN, Goodman DFM, Harris KD
Neural Computation
(2014) 26:11
Abstract
Cluster analysis faces two problems in high dimensions: the "curse of
dimensionality" that can lead to overfitting and poor generalization
performance, and the sheer time taken for conventional algorithms to
process large amounts of high-dimensional data. We describe a solution
to these problems, designed for the application of spike sorting for
next-generation, high-channel-count neural probes. In this problem, only
a small subset of features provides information about the cluster
membership of any one data vector, but this informative feature subset
is not the same for all data points, rendering classical feature
selection ineffective. We introduce a "masked EM" algorithm that allows
accurate and time-efficient clustering of up to millions of points in
thousands of dimensions. We demonstrate its applicability to synthetic
data and to real-world high-channel-count spike sorting data.
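
The point about per-point informative features can be illustrated with a small sketch. This is not the masked EM algorithm from the paper, only a toy example of why a single global feature subset fails when each data vector is informative on different features. The data, the threshold rule used to build the masks, and the helper name masked_sq_distance are all assumptions made for illustration.

```python
import numpy as np

def masked_sq_distance(x, mask, mean):
    """Squared distance from x to mean, using only features where mask is True."""
    diff = (x - mean)[mask]
    return float(diff @ diff)

# Toy data (hypothetical): 4 points in 6 dimensions.
# Each point has large values on only 2 features; the rest is near-zero noise.
X = np.array([
    [5.0, 4.0, 0.1, 0.0, 0.2, 0.1],
    [0.1, 0.0, 3.0, 6.0, 0.0, 0.2],
    [0.0, 0.2, 0.1, 0.0, 7.0, 2.0],
    [4.0, 5.0, 0.0, 0.1, 0.1, 0.0],
])

# Assumed masking rule: a feature is informative for a point if it exceeds 1.0.
masks = X > 1.0

per_point = masks.sum(axis=0 + 1)     # informative features per point
union = int(masks.any(axis=0).sum())  # features informative for at least one point

print("informative features per point:", per_point.tolist())
print("features needed by a global selection:", union)

# Compare points 0 and 3 using only the features informative for point 0.
d = masked_sq_distance(X[0], masks[0], X[3])
print("masked squared distance between points 0 and 3:", d)
```

Here every point needs only 2 features, yet a global selection would have to keep all 6 to avoid discarding information for some point. Restricting each computation to a point's own mask keeps the per-point cost proportional to the number of informative features rather than to the full dimensionality.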
Related software
Spike sorting.