Jeffrey A. Bilmes: Software and Data
Software
The Graphical Models Toolkit (GMTK).
The Vocal Joystick software for Windows.
The PhiPAC automatically tuning matrix-matrix multiply library (and
the first autotuning matrix-multiply dense linear algebra library).

The Measure Propagation code and web page. Measure propagation is a
semi-supervised learning algorithm, based on a graph-Laplacian manifold
approximation, that uses an objective function based on KL-divergence
(the code also includes a fast parallel C++ implementation).

The Buried Markov Model (BMM) code (includes mixtures of sparse linear
conditional multi-time Gaussian models). Sorry, no documentation.

Multiparty meeting scheduling with simple
preference aggregation rules.

Extensions to the old Berkeley parallel make software (also known as
pmake). The original pmake utility is described here.
We have made a number of significant extensions to pmake, including a full
GNU autoconf configuration, many new resource constraints (including dynamic
ones), and other features; we have also removed some old features that were
no longer needed. The complete source code is at pmake3.0alpha. Note that
this is an alpha release: it is basically working, but there are no plans
for additional work on it (at least by me or my group), nor can I answer
any further questions about it (see the source code).
Data
Corpus definitions and baseline systems for both the SVitchboard-II and
FiSVer-I datasets can be found at this link. The paper describing them is
here.

Cooperative Cut image data, a set of difficult-to-segment images (with
elongated or narrow structures, and contrast gradients) along with
ground-truth labellings, which were used in the following paper.

Vocal Joystick Vowel Corpus.

A small amount of hand-aligned French/English data, useful for statistical
machine translation systems, done by Karim Filali.

The COSINE multi-channel, real-world, in-situ noisy speech corpus
(now available for download).

The Semi-Supervised Switchboard Transcription (S3TP) project and its data.
In the 1990s, the Switchboard Transcription Project gave us 1.5 hours of
frame-by-frame phonetically transcribed Switchboard conversational
speech data. Here, we have used a modern semi-supervised learning
algorithm to phonetically label, at the frame level, the remaining 250
hours of SWB-I, and we call this the Semi-Supervised Switchboard
Transcription Project (or S3TP). The data and algorithms are available
here.