Jeff A. Bilmes's Publications

Eliminating redundancy among protein sequences using submodular optimization

Maxwell W Libbrecht, Jeffrey A Bilmes, and William Stafford Noble. Eliminating redundancy among protein sequences using submodular optimization. bioRxiv, Cold Spring Harbor Labs Journals, 2016.

Abstract

Submodular optimization, a discrete analogue to continuous convex optimization, has been used with great success in many fields but is not yet widely used in biology. We demonstrate how submodular optimization can be applied to the problem of removing redundancy in protein sequence data sets, a common step in many bioinformatics and structural biology workflows. We show that an approach based on submodular optimization results in representative protein sequence subsets with greater functional diversity than sets chosen with existing methods. In particular, we compare to a widely used, heuristic algorithm implemented in software tools such as CD-HIT, using as a gold standard the SCOPe library of protein domain structures. In this setting, submodular optimization consistently yields protein sequence subsets that include more SCOPe domain families than sets of the same size selected by the heuristic approach. This framework is theoretically optimal under some assumptions, and it is flexible and intuitive because it applies generic methods to optimize one of a variety of objective functions. This application serves as a model for how submodular optimization can be applied to other discrete problems in biology.
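The abstract does not say which objective function or optimizer is used, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes a facility-location objective over a nonnegative pairwise similarity matrix and the standard greedy algorithm for monotone submodular maximization under a cardinality constraint. The function names `facility_location` and `greedy_select` and the toy matrix `sim` are hypothetical and are not taken from the paper or from any existing tool.

```python
from typing import List, Sequence


def facility_location(sim: Sequence[Sequence[float]], chosen: List[int]) -> float:
    """f(S) = sum over all items i of max_{j in S} sim[i][j], with f(empty) = 0.

    Assumes sim is nonnegative; this objective is monotone and submodular.
    """
    if not chosen:
        return 0.0
    return sum(max(row[j] for j in chosen) for row in sim)


def greedy_select(sim: Sequence[Sequence[float]], k: int) -> List[int]:
    """Pick up to k representatives by repeatedly adding the item with the
    largest marginal gain in the facility-location objective."""
    n = len(sim)
    chosen: List[int] = []
    # best_cover[i] is the highest similarity of item i to any chosen item.
    best_cover = [0.0] * n
    for _ in range(min(k, n)):
        best_gain, best_j = -1.0, -1
        for j in range(n):
            if j in chosen:
                continue
            # Marginal gain of adding j: total improvement in coverage.
            gain = sum(max(sim[i][j] - best_cover[i], 0.0) for i in range(n))
            if gain > best_gain:
                best_gain, best_j = gain, j
        chosen.append(best_j)
        best_cover = [max(best_cover[i], sim[i][best_j]) for i in range(n)]
    return chosen


# Toy similarity matrix (made-up values): items 0-2 form one group of
# near-duplicates, items 3-4 form another.
sim = [
    [1.0, 0.8, 0.7, 0.1, 0.0],
    [0.8, 1.0, 0.9, 0.1, 0.0],
    [0.7, 0.9, 1.0, 0.0, 0.0],
    [0.1, 0.1, 0.0, 1.0, 0.6],
    [0.0, 0.0, 0.0, 0.6, 1.0],
]
chosen = greedy_select(sim, 2)
print(chosen, facility_location(sim, chosen))
```

On this toy input the greedy procedure takes one representative from each group rather than two near-duplicates from the larger group, which is the behavior the abstract describes as reducing redundancy. For monotone submodular objectives, greedy selection under a cardinality constraint is known to achieve at least a (1 - 1/e) fraction of the optimal value; whether that is the specific guarantee the abstract refers to is not stated in the text.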

BibTeX

@article {max-elim-redundancy-bioarxiv-2016,
  author = {Libbrecht, Maxwell W and Bilmes, Jeffrey A and Noble, William Stafford},
  title = {Eliminating redundancy among protein sequences using submodular optimization},
  year = {2016},
  doi = {10.1101/051201},
  publisher = {Cold Spring Harbor Labs Journals},
  abstract = {Submodular optimization, a discrete analogue to continuous convex optimization, has been used with great success in many fields but is not yet widely used in biology. We demonstrate how submodular optimization can be applied to the problem of removing redundancy in protein sequence data sets, a common step in many bioinformatics and structural biology workflows. We show that an approach based on submodular optimization results in representative protein sequence subsets with greater functional diversity than sets chosen with existing methods. In particular, we compare to a widely used, heuristic algorithm implemented in software tools such as CD-HIT, using as a gold standard the SCOPe library of protein domain structures. In this setting, submodular optimization consistently yields protein sequence subsets that include more SCOPe domain families than sets of the same size selected by the heuristic approach. This framework is theoretically optimal under some assumptions, and it is flexible and intuitive because it applies generic methods to optimize one of a variety of objective functions. This application serves as a model for how submodular optimization can be applied to other discrete problems in biology.},
  URL = {http://biorxiv.org/content/early/2016/05/02/051201},
  eprint = {http://biorxiv.org/content/early/2016/05/02/051201.full.pdf},
  journal = {bioRxiv},
}

