A general sequence processing and analysis program for protein engineering

J Chem Inf Model. 2014 Oct 27;54(10):3020-32. doi: 10.1021/ci500362s. Epub 2014 Oct 3.

Abstract

Protein engineering projects often amass numerous raw DNA sequences, but no readily available software combines the sequence processing and activity correlation required for efficient lead identification. XLibraryDisplay is an open source program integrated into Microsoft Excel for Windows that automates batch sequence processing via a simple, step-by-step, menu-driven graphical user interface. XLibraryDisplay accepts any DNA template, which serves as the basis for trimming, filtering, translating, and aligning hundreds to thousands of sequences (raw, FASTA, or Phred PHD file formats). Key steps from library characterization through lead discovery are available, including library composition analysis, filtering by experimental data, graphing and correlating to experimental data, alignment to structural data extracted from PDB files, and generation of PyMOL visualization scripts. Though larger data sets can be handled, the program is best suited for analyzing approximately 10 000 or fewer leads or naïve clones that have been characterized using Sanger sequencing and other experimental approaches. XLibraryDisplay can be downloaded for free from sourceforge.net/projects/xlibrarydisplay/.
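The trimming, filtering, and translating steps named in the abstract can be illustrated with a minimal Python sketch. This is not XLibraryDisplay's code or its actual algorithm. It assumes, purely for illustration, that the template region is bounded by two fixed flanking sequences found by exact string match, that reads are translated in the first frame with the standard genetic code, and that reads with a missing flank, a frameshift, a stop codon, or an ambiguous base are discarded. All function names, flank sequences, and clone names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def trim_to_template(read, left_flank, right_flank):
    """Return the bases between the two flanks, or None if a flank is missing.

    Assumption: flanks are located by exact match, which a real tool
    would likely relax to tolerate sequencing errors.
    """
    read = read.upper()
    start = read.find(left_flank)
    if start == -1:
        return None
    start += len(left_flank)
    end = read.find(right_flank, start)
    if end == -1:
        return None
    return read[start:end]


def translate(dna):
    """Translate in frame 1; codons with ambiguous bases become 'X'."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3))


def process(reads, left_flank, right_flank):
    """Trim, filter, and translate a dict of {clone name: raw read}."""
    proteins = {}
    for name, read in reads.items():
        insert = trim_to_template(read, left_flank, right_flank)
        if insert is None or len(insert) % 3 != 0:
            continue  # missing flank or frameshift
        protein = translate(insert)
        if "*" in protein or "X" in protein:
            continue  # premature stop or ambiguous base call
        proteins[name] = protein
    return proteins


# Hypothetical reads; the flanks here are arbitrary example sequences.
reads = {
    "clone1": "NNNNGGATCCATGGCTAAAGAATTCNN",  # clean insert
    "clone2": "GGATCCATGGCAAAGAATTC",         # 8-base insert, frameshift
    "clone3": "ATGGCTAAA",                    # no flanks
    "clone4": "GGATCCATGTAAAAAGAATTC",        # in-frame stop codon
}
proteins = process(reads, "GGATCC", "GAATTC")
```

Only `clone1` passes every filter in this example. Keeping the result keyed by clone name leaves the translated sequences ready to be joined to per-clone experimental data, which is the kind of correlation the abstract describes.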

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Base Sequence
  • Electronic Data Processing
  • Gene Library
  • Humans
  • Internet
  • Molecular Sequence Data
  • Protein Engineering / instrumentation*
  • Protein Engineering / methods
  • Sequence Alignment
  • Sequence Analysis, DNA / methods*
  • Sequence Analysis, DNA / statistics & numerical data
  • User-Computer Interface*