Areas
Data mining/Statistical learning
Clustering and classification methods based on mixture models for
high-dimensional data and for non-vector data such as discrete distributions
with non-fixed supports (sets of unordered and weighted vectors).
- Multilayer mixture models
- Two-way mixture models for discrete and continuous data
- D2-clustering for discrete distributions (sets of unordered and weighted vectors) under the Kantorovich-Wasserstein metric
- Generalized mixture modeling for a metric (but not vector) space
- Clustering via mode association
Applications explored: document retrieval/classification,
image annotation/retrieval/segmentation/compression, social networks,
information visualization, genomics, etc.
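To make the Kantorovich-Wasserstein metric mentioned above concrete, the sketch below computes it for two discrete distributions whose supports differ in size. It is only an illustration under a strong simplifying assumption: the support points are scalars, so the distance reduces to the area between the two cumulative distribution functions. D2-clustering works with weighted vectors, where the distance generally requires solving a transportation problem. The function name `wasserstein_1d` and the toy inputs are hypothetical.

```python
def wasserstein_1d(xs, ws, ys, vs):
    """First-order Wasserstein distance between two weighted point sets on the real line.

    Assumption: each weight list is non-negative and sums to 1.
    """
    # Merge both supports; mass is positive for the first distribution
    # and negative for the second, so a running sum tracks F(t) - G(t).
    events = sorted(
        [(x, w) for x, w in zip(xs, ws)] + [(y, -v) for y, v in zip(ys, vs)]
    )
    distance = 0.0
    cdf_gap = 0.0
    for (pos, mass), (next_pos, _) in zip(events, events[1:]):
        cdf_gap += mass
        # Area between the two CDFs on the interval [pos, next_pos)
        distance += abs(cdf_gap) * (next_pos - pos)
    return distance


# Two support points versus one: the supports need not match in size.
print(wasserstein_1d([0.0, 1.0], [0.5, 0.5], [0.5], [1.0]))  # 0.5
```

In the toy call, half of the mass moves from 0 to 0.5 and the other half from 1 to 0.5, giving a total transport cost of 0.5.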
Sample talk | Free software | Basics on data mining & learning

Stochastic modeling
Spatial stochastic models attempt to characterize the
inherent dependence among image pixels. The dependence can then be exploited
for various tasks, such as segmentation, compression, and classification.
We have developed the 2-D Hidden Markov model (i.e., Spatial HMM) with
extensions to a multiresolution model (MHMM) and to a 3-D model for volume data.
Applications explored: general-purpose photographs,
satellite images, Chinese classical paintings, Van Gogh paintings, etc.
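As a simplified illustration of how a hidden Markov model scores observations, the sketch below runs the standard forward algorithm on a one-dimensional chain. It is not the 2-D HMM or the MHMM described above, which model dependence in both image directions; it only shows the basic recursion that those models build on. The function name `forward_likelihood` and all probabilities are made-up assumptions.

```python
def forward_likelihood(init, trans, emit, observations):
    """Probability of an observation sequence under a 1-D HMM (forward algorithm).

    init[s]      -- probability of starting in state s
    trans[r][s]  -- probability of moving from state r to state s
    emit[s][o]   -- probability of observing symbol o in state s
    """
    n_states = len(init)
    # alpha[s] = P(observations so far, current state = s)
    alpha = [init[s] * emit[s][observations[0]] for s in range(n_states)]
    for obs in observations[1:]:
        alpha = [
            sum(alpha[prev] * trans[prev][s] for prev in range(n_states))
            * emit[s][obs]
            for s in range(n_states)
        ]
    return sum(alpha)


# Hypothetical two-state model with two observation symbols.
init = [0.6, 0.4]
trans = [[0.7, 0.3], [0.4, 0.6]]
emit = [[0.9, 0.1], [0.2, 0.8]]
print(forward_likelihood(init, trans, emit, [0, 1]))
```

Summing the result over every possible observation sequence of a fixed length gives 1, which is a quick sanity check on the parameters.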
Sample talk | Tutorial on HMM

Image annotation
Image annotation means automatically tagging pictures with words using only
pixel information. We have developed ALIPR, a real-time computerized image
annotation system. The work is rooted in the ALIP system developed in 2002.
Relevant methodologies: 2-D MHMM, D2-clustering, generalized mixture
modeling.
Sample talk | alipr.com | In the news: MIT Tech Review ...
Example annotations: flower, holiday, garden | ocean, lighthouse, beach | medicine, seed, science

Image retrieval
Content-based image retrieval systems search for similar pictures using only
pixel information. We have developed the SIMPLIcity retrieval system, which
has been deployed at several real-world Web sites, e.g., airliners.net,
mindat.org, and terragalleria.com, and has been requested for educational
purposes by dozens of universities. We continue to work on image retrieval,
bringing in new aspects such as aesthetics, semantics learning, and story
picturing.
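The idea of searching for similar pictures from pixel information alone can be sketched with a very simple baseline: describe each image by a gray-level histogram and rank database images by histogram distance to the query. This is a generic illustration, not the method used by SIMPLIcity; the function names, the four-bin histogram, and the toy "images" (flat lists of 0-255 gray values) are all assumptions.

```python
def gray_histogram(pixels, bins=4):
    """Normalized histogram of gray values in the range 0..255."""
    counts = [0] * bins
    for p in pixels:
        counts[min(p * bins // 256, bins - 1)] += 1
    total = len(pixels)
    return [c / total for c in counts]


def l1_distance(h1, h2):
    """Sum of absolute differences between two histograms."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def retrieve(query_pixels, database, top_k=2):
    """Return the names of the top_k database images closest to the query."""
    query_hist = gray_histogram(query_pixels)
    ranked = sorted(
        database.items(),
        key=lambda item: l1_distance(query_hist, gray_histogram(item[1])),
    )
    return [name for name, _ in ranked[:top_k]]


# Toy database: each "image" is a short list of gray values.
database = {
    "dark": [10, 20, 30, 40],
    "bright": [200, 220, 240, 250],
    "mixed": [10, 200, 30, 240],
}
print(retrieve([15, 25, 35, 45], database))  # ['dark', 'mixed']
```

A practical system would replace the histogram with richer features and an index structure, but the ranking step keeps the same shape.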
Sample talk | Demo | Slashdot news

Social networks
Statistical modeling and learning techniques are used to discover
E-communities and to study academic collaboration networks, with
applications to CiteSeer.

Comparative genomics
Data mining and statistical modeling methods are used to study the evolution
and functions of DNA segments based on aligned DNA sequences of multiple
species.
Sample talk

Data compression/Source coding theory
High-bit-rate asymptotics of vector quantizers under perceptually
based distortion measures.
Sample talk