Speaker : Professor Fushing Hsieh(University of California, Davis)
Title : 1. Computing ultrametric topology for data cloud geometry: Why are hierarchical clustering trees misleading?
Time : 2012-07-18 (Wed) 10:00 - 12:00
Place : Seminar Room 722, Institute of Mathematics (NTU Campus)
Abstract: We propose a new algorithm to derive the geometric structure of a data cloud. This algorithm constructs an ultrametric space on the data starting with the knowledge of an empirical distance measure and derives an ultrametric tree on this space. It proceeds as follows. The empirical relational measure is transformed into a temperature-regulated potential defined on the links between the nodes. Based on this potential, we extract at very low temperature a collection of motifs, which become building blocks for growing clusters via data-driven merging dynamics as temperature is being raised slowly. A series of phase transitions on this merging dynamics is identified at a series of critical temperatures. These temperatures are then taken as energy barrier heights to define an ultrametric topology onto the data cloud. This topology provides measurable and natural distances between clusters. We call the hierarchy of clustering configurations a data cloud geometry (DCG), which can be represented by an ultrametric tree or a Parisi matrix. We have compared the trees generated with this new algorithm to equivalent trees derived with the Hierarchical Clustering (HC) method on simulated as well as real data clouds from fMRI brain connectivity studies, cancer genomics, and giraffe's social network analyses. In each case, we have shown that the DCG trees are more robust and provide a better quantification of the multi-scale geometric structures of the data. Discussion on DCG tree’s applications. Several applications of DCG-tree within domain of combinatorial optimization are discussed. Examples including: Traveling Salesman Problem and high-dimensional repression problems.