MIT OpenCourseWare


Computational Biology Tools and Resources

BLAST Sequence alignments provide a powerful way to compare novel sequences with previously characterized genes. Both functional and evolutionary information can be inferred from well-designed queries and alignments. The Basic Local Alignment Search Tool (BLAST) provides a method for rapid searching of nucleotide and protein databases. It is a sequence comparison algorithm, optimized for speed, that searches sequence databases for optimal local alignments to a query. BLAST Altschul, S. F., W. Gish, W. Miller, E. W. Myers, and D. J. Lipman. "Basic Local Alignment Search Tool." J. Mol. Biol. 215 (1990): 403-410.
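As a rough illustration of what a "local alignment" score is, the sketch below fills a dynamic-programming table using the Smith-Waterman recurrence. It is not BLAST's own procedure: BLAST relies on a faster seed-and-extend heuristic rather than filling the full table. The function name and the match, mismatch, and gap values are assumptions chosen for the example.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of a and b (Smith-Waterman, linear gap penalty).

    The scoring values are illustrative assumptions, not BLAST defaults.
    """
    prev = [0] * (len(b) + 1)  # previous row of the DP table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, so scores never drop below 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(local_alignment_score("TTACGTTT", "GGACGGG"))
```

Only the shared stretch of the two sequences contributes to the score; the unrelated flanking residues are ignored, which is what distinguishes a local alignment from a global one.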
BIND The Biomolecular Interaction Network Database (BIND) is a collection of records documenting molecular interactions. The contents of BIND include high-throughput data submissions and hand-curated information gathered from the scientific literature. BIND Bader, G. D., D. Betel, and C. W. V. Hogue. "BIND: the Biomolecular Interaction Network Database." Nucleic Acids Res. 31 (2003): 248-50.
Biology WorkBench The Biology WorkBench is a web-based tool for biologists developed by the San Diego Supercomputer Center at the University of California San Diego. The WorkBench allows biologists to search many popular protein and nucleic acid sequence databases. Database searching is integrated with access to a wide variety of analysis and modeling tools, all within a point-and-click interface that eliminates file format compatibility problems. Biology WorkBench
ClustalW ClustalW is a general purpose multiple sequence alignment program for DNA or proteins. It produces biologically meaningful multiple sequence alignments of divergent sequences. ClustalW Thompson, J. D., D. G. Higgins, and T. J. Gibson. "CLUSTAL W: Improving the Sensitivity of Progressive Multiple Sequence Alignment through Sequence Weighting, Position-specific Gap Penalties and Weight Matrix Choice." Nucleic Acids Res. 22 (1994): 4673-4680.
DALI DALI stands for Distance mAtrix aLIgnment. The Dali server is an automatic service for comparing protein structures in 3D. You send the coordinates of a query structure and receive a multiple structure alignment in return. You can submit your coordinates either by electronic mail or interactively from the World Wide Web. DALI Holm, L., and C. Sander. "Mapping the Protein Universe." Science 273 (1996): 595-602.
Deep View Swiss-PdbViewer The Deep View Swiss-PdbViewer is a software application with a user-friendly interface that allows one to analyze several proteins at the same time. The proteins can be superimposed in order to deduce structural alignments and compare their active sites or any other relevant parts. Amino acid mutations, H-bonds, angles, and distances between atoms are easy to obtain thanks to the intuitive graphic and menu interface. Deep View Swiss-PdbViewer

Tutorial For Deep View (Swiss-PdbViewer)
Guex, N., and M. C. Peitsch. "SWISS-MODEL and the Swiss-PdbViewer: An Environment for Comparative Protein Modeling." Electrophoresis 18 (1997): 2714-2723.
DIP™ DIP™ stands for Database of Interacting Proteins. The DIP™ database catalogs experimentally determined interactions between proteins. It combines information from a variety of sources to create a single, consistent set of protein-protein interactions. DIP™ Salwinski, L., C. S. Miller, A. J. Smith, F. K. Pettit, J. U. Bowie, and D. Eisenberg. "The Database of Interacting Proteins: 2004 update." Nucleic Acids Res. 32 (2004): Database issue:D449-51.
Dot Matrix Dot or matrix plots provide an easy and powerful means of sequence analysis for searching out regions of similarity in two sequences and repeats within a single sequence. Nucleic Acid Dot Plots Maizel, J. V., and R. P. Lenk. "Enhanced Graphic Matrix Analysis of Nucleic Acid and Protein Sequences." Proc. Natl. Acad. Sci. USA. 78 (1981): 7665.

Pustell, J., and F. C. Kafatos. "A High Speed, High Capacity Homology Matrix: Zooming through SV40 and Polyoma." Nucleic Acids Res. 10 (1982): 4765.

Quigley, G. J., L. Gehrke, D. A. Roth, and P. E. Auron. "Computer-aided Nucleic Acid Secondary Structure Modeling Incorporating Enzymatic Digestion Data." Nucleic Acids Res. 12 (1984): 347.
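A dot plot can be sketched in a few lines: place one sequence along each axis and mark every cell where the two sequences agree. The version below is a minimal sketch, not taken from the references above; the function names and the exact-match `window` filter are assumptions for illustration.

```python
def dot_plot(seq_x, seq_y, window=1):
    """Return a matrix of booleans: cell [i][j] is True when the
    window-length words starting at seq_y[i] and seq_x[j] are identical."""
    rows = len(seq_y) - window + 1
    cols = len(seq_x) - window + 1
    return [
        [seq_y[i:i + window] == seq_x[j:j + window] for j in range(cols)]
        for i in range(rows)
    ]


def render(matrix):
    """Draw the matrix as text, with '*' for a match and '.' otherwise."""
    return "\n".join(
        "".join("*" if cell else "." for cell in row) for row in matrix
    )


if __name__ == "__main__":
    # Comparing a sequence with itself: the main diagonal is always filled,
    # and any off-diagonal runs indicate internal repeats.
    print(render(dot_plot("ATATGCGC", "ATATGCGC", window=2)))
```

Regions of similarity show up as diagonal runs of marks. Raising `window` suppresses isolated chance matches at the cost of missing short or imperfect matches.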
Entrez Entrez is a retrieval system designed for searching several linked databases at NCBI (National Center for Biotechnology Information). Entrez

GENSCAN GENSCAN predicts the locations and exon-intron structures of genes in genomic sequences from a variety of organisms. GENSCAN was developed by Prof. Chris Burge while he was in the research group of Samuel Karlin, Department of Mathematics, Stanford University. GENSCAN Burge, C., and S. Karlin. "Prediction of complete gene structures in human genomic DNA." J. Mol. Biol. 268 (1997): 78-94.
Gibbs Motif Sampler The Gibbs Motif Sampler identifies motifs (conserved regions) in DNA or protein sequences. Gibbs Motif Sampler This software was developed by Eric C. Rouchka and Bill Thompson, based on work by C. E. Lawrence, J. S. Liu, A. F. Neuwald, and others (References), as part of the Bayesian Bioinformatics Program at the Biometrics Laboratory of Wadsworth Center.
MEME Discover motifs (highly conserved regions) in groups of related DNA or protein sequences using MEME. MEME Bailey, Timothy L., and Charles Elkan. "Fitting a Mixture Model by Expectation Maximization to Discover Motifs in Biopolymers." Proceedings of the Second International Conference on Intelligent Systems for Molecular Biology. Menlo Park, California: AAAI Press, 1994, pp. 28-36.
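Motif finders such as MEME and the Gibbs Motif Sampler describe a motif by how often each letter occurs at each position. The sketch below builds that per-position count table and a consensus string from sites that are already aligned. It does not perform the expectation maximization or sampling that locates the sites in the first place; the function names and example sites are assumptions for illustration.

```python
from collections import Counter


def position_counts(sites):
    """Count letters at each position of equal-length, pre-aligned sites."""
    if len({len(site) for site in sites}) > 1:
        raise ValueError("all sites must have the same length")
    # zip(*sites) walks the sites column by column.
    return [Counter(column) for column in zip(*sites)]


def consensus(sites):
    """Most frequent letter at each position."""
    return "".join(
        counts.most_common(1)[0][0] for counts in position_counts(sites)
    )


if __name__ == "__main__":
    example_sites = ["TATAAT", "TATGAT", "TACAAT", "TATAAT"]  # made-up sites
    print(consensus(example_sites))
    print(position_counts(example_sites)[2])
```

Dividing each count by the number of sites turns the table into per-position frequencies, which is the form usually used to score new candidate sites.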
MODBASE MODBASE is a database of annotated comparative protein structure models and associated resources. MODBASE Pieper, U., N. Eswar, H. Braberg, M. S. Madhusudhan, F. P. Davis, A. C. Stuart, N. Mirkovic, A. Rossi, M. A. Marti-Renom, A. Fiser, B. Webb, D. Greenblatt, C. C. Huang, T. E. Ferrin, and A. Sali. "MODBASE, a database of annotated comparative protein structure models, and associated resources." Nucleic Acids Res. 32 (2004): Database issue:D217-22.
PDB Database The Protein Data Bank (PDB) is the single worldwide repository for the processing and distribution of 3-D biological macromolecular structure data. The Protein Data Bank (PDB)

(PDB Advisory Notice on using materials available in the archive)
Berman, H. M., J. Westbrook, Z. Feng, G. Gilliland, T. N. Bhat, H. Weissig, I. N. Shindyalov, and P. E. Bourne. "The Protein Data Bank." Nucleic Acids Res. 28 (2000): 235-242.
PHYLIP PHYLIP (the PHYLogeny Inference Package) is a package of programs for inferring phylogenies (evolutionary trees). It is available free over the Internet, and written to work on as many different kinds of computer systems as possible. PHYLIP
NCBI Established in 1988 as a national resource for molecular biology information, NCBI (National Center for Biotechnology Information) creates public databases, conducts research in computational biology, develops software tools for analyzing genome data, and disseminates biomedical information -- all for the better understanding of molecular processes affecting human health and disease. This web site provides access to a myriad of biological sequence databases, structural databases, bioinformatics tools, and literature search tools. NCBI 
Python Scripting Language A scripting language widely used, along with Perl, for bioinformatics and computational biology. Python
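As a small taste of the kind of sequence handling Python is used for, the sketch below computes a reverse complement and a GC fraction using only the standard library. The function names are assumptions for illustration, and letters other than A, C, G, and T are passed through unchanged.

```python
# Translation table mapping each base to its complement (both cases).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna):
    """Complement every base, then reverse the string."""
    return dna.translate(COMPLEMENT)[::-1]


def gc_fraction(dna):
    """Fraction of bases that are G or C; 0.0 for an empty sequence."""
    dna = dna.upper()
    if not dna:
        return 0.0
    return (dna.count("G") + dna.count("C")) / len(dna)


if __name__ == "__main__":
    seq = "ATGCGTTA"
    print(reverse_complement(seq))
    print(gc_fraction(seq))
```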
RasMol RasMol is a molecular graphics program intended for the visualization of proteins, nucleic acids and small molecules. The program reads in molecular co-ordinate files and interactively displays the molecule on the screen in a variety of representations and colour schemes. 


Sayle, R., and E. J. Milner-White. "RasMol: Biomolecular graphics for all." Trends in Biochemical Sciences (TIBS) 20, no. 9 (September 1995): 374.
Scansite Scansite searches for motifs within proteins that are likely to be phosphorylated by specific protein kinases or bind to domains such as SH2 domains, 14-3-3 domains or PDZ domains. Scansite Songyang, Z., S. Blechner, N. Hoagland, M. F. Hoekstra, H. Piwnica-Worms, and L. C. Cantley. "Use of an Oriented Peptide Library to Determine the Optimal Substrates of Protein Kinases." Curr Biol. 4, no. 11 (1 Nov 1994): 973-82.

Yaffe, M. B., G. G. Leparc, J. Lai, T. Obata, S. Volinia, and L. C. Cantley. "A Motif-based Profile Scanning Approach for Genome-wide Prediction of Signaling Pathways." Nat. Biotechnol. 19, no. 4 (Apr 2001): 348-53.

Obenauer, J. C., L. C. Cantley, and M. B. Yaffe. "Scansite 2.0: Proteome-wide Prediction of Cell Signaling Interactions using Short Sequence Motifs." Nucleic Acids Res. 31, no. 13 (1 Jul 2003): 3635-41.
SCOP Nearly all proteins have structural similarities with other proteins and, in some of these cases, share a common evolutionary origin. The Structural Classification Of Proteins (SCOP) database, created by manual inspection and abetted by a battery of automated methods, aims to provide a detailed and comprehensive description of the structural and evolutionary relationships between all proteins whose structure is known. As such, it provides a broad survey of all known protein folds, detailed information about the close relatives of any particular protein, and a framework for future research and classification. SCOP Murzin, A. G., S. E. Brenner, T. Hubbard, and C. Chothia. "SCOP: A Structural Classification of Proteins Database for the Investigation of Sequences and Structures." J. Mol. Biol. 247 (1995): 536-540.
TMHMM This software tool was developed by the Center for Biological Sequence Analysis at the Technical University of Denmark and is used to predict transmembrane helices in protein sequences.


Krogh, A., B. Larsson, G. von Heijne, and E. L. Sonnhammer. "Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes." J. Mol. Biol. 305 (2001): 567-80.