Problem Set #5: Using BLAST Sequence Comparisons to Learn about the Structure of an Unknown Protein

Due October 21st at the beginning of class



1.  You have isolated an interesting protein from your unsuspecting roommate, and you would like to identify it.  By mass spectrometry, you demonstrate that a section of the sequence is ADITNLCFEPANGQMVKCDPGHGKY. 


Although you could jump into the lab and do some experiments on the protein, your first idea is to use your computer to learn something about it.  You therefore decide to do a BLAST search, which compares your sequence to the sequences in the major biological databases and looks for proteins that match it fully or partially.  A free BLAST search engine is available through the National Institutes of Health “Entrez” site.


Using the FAQs at left, read about what kinds of BLAST searches are available.  Which is best suited to the purpose of finding similar protein sequences?



What databases are available to search?



What does a “good” match look like? (i.e. what is the E-value, and what are you looking for?)
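As a rough aid for thinking about E-values, the sketch below uses the standard bit-score relation E = m * n * 2^(-S'), where m is the query length, n is the database length in residues, and S' is the bit score.  The function name and the database size are assumptions made for illustration; the BLAST server applies effective-length corrections, so its reported values will differ somewhat.

```python
def expected_hits(bit_score: float, query_len: int, db_len: int) -> float:
    """Approximate E-value from a bit score: E = m * n * 2**(-S').

    query_len (m) and db_len (n) are raw lengths in residues; real BLAST
    uses corrected "effective" lengths, so treat this as an estimate only.
    """
    if query_len <= 0 or db_len <= 0:
        raise ValueError("lengths must be positive")
    return query_len * db_len * 2.0 ** (-bit_score)


# Hypothetical numbers: the 25-residue peptide from question 1 searched
# against an assumed database of one billion residues.
for score in (30.0, 50.0, 80.0):
    print(f"bit score {score:5.1f} -> E = {expected_hits(score, 25, 10**9):.3g}")
```

Each additional bit of score halves the number of hits expected by chance, which is why strong matches have E-values very close to zero.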



Having decided on a BLAST search type and on “nr” or “SwissProt” as a database, you can type or paste your sequence into the search box and ask it to “BLAST” your sequence.  
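If you would rather paste the sequence in FASTA format than as a bare string, a minimal sketch like the following will prepare it.  The function name and the header text "roommate_fragment" are made up for illustration, and this sketch accepts only the 20 standard one-letter amino acid codes.

```python
STANDARD_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")


def to_fasta(name: str, sequence: str, width: int = 60) -> str:
    """Return a FASTA record: a '>' header line, then the sequence in rows."""
    # Remove any whitespace picked up while copying, and normalize the case.
    seq = "".join(sequence.split()).upper()
    if not seq:
        raise ValueError("sequence is empty")
    bad = sorted(set(seq) - STANDARD_RESIDUES)
    if bad:
        raise ValueError(f"unexpected characters in sequence: {bad}")
    rows = [seq[i:i + width] for i in range(0, len(seq), width)]
    return f">{name}\n" + "\n".join(rows) + "\n"


print(to_fasta("roommate_fragment", "ADITNLCFEPANGQMVKCDPGHGKY"), end="")
```

Checking the letters before you search catches typing mistakes that would otherwise quietly change your results.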


In this particular case, the server will fairly quickly find a “putative conserved domain”.  To what protein(s) does this domain belong?  What does the server tell you about the basic function of these proteins?




Go back to the “formatting BLAST” screen and click the big FORMAT button.  After a short delay, many matches should appear, most of which appear to be related.  What are the matching proteins called?



Scroll down to see more information on the best matches.  For the two best matches, how similar are they to your sequence?  What are the E-values, and what do they mean? (i.e., do you think this is your protein?)  What organism(s) are they from?




Are there any gaps or mutations in your roommate’s protein compared to these?
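To check your reading of an alignment, you can tabulate identities, substitutions, and gaps column by column.  The sketch below assumes you have copied the two aligned rows (with "-" for gaps) out of the BLAST report.  The function name is hypothetical, and the subject row in the example is invented, not a real database hit.

```python
def compare_aligned(query: str, subject: str):
    """Compare two aligned rows of equal length ('-' marks a gap).

    Returns (percent identity over all alignment columns,
             list of (column, query residue, subject residue) substitutions,
             list of (column, row containing the gap) gaps).
    Columns are numbered from 1 along the alignment.
    """
    if len(query) != len(subject) or not query:
        raise ValueError("aligned rows must be non-empty and of equal length")
    identities = 0
    substitutions = []
    gaps = []
    for col, (q, s) in enumerate(zip(query.upper(), subject.upper()), start=1):
        if q == "-" or s == "-":
            gaps.append((col, "query" if q == "-" else "subject"))
        elif q == s:
            identities += 1
        else:
            substitutions.append((col, q, s))
    return 100.0 * identities / len(query), substitutions, gaps


# Invented example: the first ten residues of the peptide against a
# made-up subject row with one substitution and one gap.
percent, subs, gaps = compare_aligned("ADITNLCFEP", "ADLTN-CFEP")
print(percent, subs, gaps)
```

Percent identity here is taken over the full alignment length, gap columns included, which is the same denominator BLAST uses on its "Identities" line.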




Click on the blue L square next to the best match.  This will connect you to an information screen called “LocusLink” which has links to all kinds of information about the protein match.  If you click on the rainbow-colored buttons, you will get all sorts of information about the protein, some of which is in hopeless biology-speak, but some of which is interesting.  What kind of information do you find?  For example, what chromosome is this protein located on?   Are there known homologous proteins in other organisms?






Now that you have at least a bare minimum idea of what your roommate’s mystery protein probably is, you would like to know what its structure looks like.  Although you could go directly to the PDB and look for tubulin proteins, this search engine allows you to BLAST search for structures to match your sequence.  Go back to the BLAST search page and this time choose PDB as your database.



How many structures are found?  What does this tell you about the relative quantity of DNA sequence information versus protein structure information?





What structures are found? What are their PDB filenames?




Though Entrez will give you some structural information, it is no match for our trusty Protein Data Bank!  Go to the PDB and type in the PDB filename for the best matching structure.  Look at the summary information and download the file to examine in RasMol or a program of your choice.


What do the proteins look like? (how many domains, alpha helices, beta strands, overall shape, cofactors, etc.). 
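Before opening the structure in a viewer, you can get a quick count of the annotated secondary structure from the file itself.  This sketch assumes a legacy PDB-format text file, in which each HELIX record describes one helix, each SHEET record describes one strand, and HETATM records carry the residue name in columns 18-20.  The function name is hypothetical and the sample records are invented and truncated, so they describe no real entry.

```python
def summarize_pdb(text: str) -> dict:
    """Count HELIX and SHEET records and list non-water hetero groups."""
    helices = 0
    strands = 0
    hetero = set()
    for line in text.splitlines():
        record = line[:6].strip()       # record name occupies columns 1-6
        if record == "HELIX":
            helices += 1
        elif record == "SHEET":
            strands += 1
        elif record == "HETATM":
            residue = line[17:20].strip()   # residue name, columns 18-20
            if residue and residue != "HOH":  # skip water molecules
                hetero.add(residue)
    return {"helices": helices, "strands": strands,
            "hetero_groups": sorted(hetero)}


# Invented, truncated records for illustration only.
SAMPLE = """\
HELIX    1   1 GLY A   10  GLY A   29  1
HELIX    2   2 ASP A   47  THR A   51  1
SHEET    1   A 2 VAL A   4  ILE A   9  0
HETATM 3401  PG  GTP A 500
HETATM 3402  O   HOH A 601
"""
print(summarize_pdb(SAMPLE))
```

To use it on your downloaded file, read the file into a string and pass it to the function.  The counts reflect only what the depositors annotated, so compare them with what you see in RasMol.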
















How does the quaternary structure, observed to some extent in the crystal packing unit, relate to the function of the protein?




Draw a topology diagram for the protein.








2.  Proteins can be classified into groups based upon their structures, which can be helpful in predicting the structure of a protein.  For example, if a BLAST search identifies weak sequence homology to several proteins belonging to a single fold family, you could hypothesize that your protein has a similar fold.


There are a couple of major classifications of protein folds in use.  The first is SCOP, the Structural Classification of Proteins.  How does SCOP classify proteins (i.e., into which groups and subgroups, and what are these groupings called)?











Where do tubulins belong in this classification?



A second classification is CATH.  How does CATH classify proteins?  What does CATH stand for?  What are the groupings?












Where do tubulins belong in this classification?


Answer the following on another piece of paper:


3.  What is the closest protein match for the sequence “ELVISLIVES”? (hint: use the BLAST search for small protein sequences)


4.  You isolate a crude nuclear fraction containing a DNA-binding protein you are interested in.  Describe two methods by which you might purify it, and two by which you could sequence it.


5.  As it turns out, the biochemistry on #%%$^Bob’s planet is different from ours in yet another way.  Due to a bizarre astrophysical anomaly in the formation of his solar system, deuterium is the most common isotope of hydrogen on his planet.  What would the proton NMR spectrum of #%%$^Bob’s cytochrome c look like?  Why?


Describe in detail what would happen to the NMR spectrum after the addition of 1H2O to a sample of #%%$^Bob’s cytochrome c.  (Be specific: which peaks, with what chemical shifts, would change; in what way would they change, and on what timescale; to which chemical groups, secondary structures, tertiary structures, and biophysical events would these peaks correspond; and how could the behavior in 1H2O be perturbed?)


6.  In a particularly famous set of experiments, an NMR structure of metallothionein (a Zn-binding protein) was published that drastically contradicted the backbone conformation in a published X-ray structure; the NMR structure proved to be correct.*  


One major problem in these determinations was described by the crystallographers as follows:  “At the same time, a survey of 80 compounds for heavy atom derivatives was unsuccessful.  Soaked crystals were evaluated by precession photography.  The survey included sulfhydryl-specific compounds PtCl4 2-, Hg(CN)2, CdCl2, which destroyed the diffraction pattern, and inert complexes and ions, such as Pt(CN)4 2-, Sm3+, and UO2 2+, which had no effect, even at high concentration.  One compound, (NH4)WS4, stained the crystals yellow and introduced significant isomorphous intensity differences.  A complete oscillation camera data set was collected…but a consistent solution to the isomorphous difference Patterson map was not found…crystallization experiments with metallothionein reconstituted or substituted with other metals have not as yet yielded suitable isomorphous crystals.”  What were the investigators trying to do?  What was the essential problem, and why would it subsequently contribute to a major error in their structure?


*If you are interested, the relevant references are Furey et al., Science (1986) v. 231, p. 704; Schultze et al. and Wüthrich, J. Mol. Biol. (1988) v. 203, p. 251; and Robbins et al., J. Mol. Biol. (1991) v. 221, p. 1269.


7.  Use simple thermodynamic arguments to explain why proteins denature at elevated temperatures.  What spectroscopic technique might you use to monitor this denaturation process?  Some organisms, called hyperthermophiles, live at extremely high temperatures in environments such as hot springs and the volcanic vents at the bottom of the oceans.  Explain in simple thermodynamic terms how their proteins must differ from ours for this to be possible.