Chemistry 333: Protein Structure and Function


Problem Set #1: Introduction to the Protein Data Bank and RasMol

Due September 16th at the beginning of class



1.  To begin, go to the class web page at   Use the tabs at the top of the page to go to the page called “Links” and there click on the link to the Protein Data Bank, or PDB. 


The PDB is a repository for structural information about proteins gathered from all over the world.  To date, there are about 22,000 different structural files contained in the PDB, and the number is growing every day.  Look under “Current Holdings” on the left side of the page.  How many structures are in the PDB today?    



There are a few different ways to find and access information in the PDB through its search engine, which we will play with today.  Let’s start by looking for the structure of a protein called myoglobin.  First, type the name of the protein in the search box at the middle of the page and click “Find a structure.”  How many structures match that name?



Now, scroll down the page and look at the results of the search.  In general, what kind of information is given to you on this page? 






The code at the left (4 characters, a mix of digits and letters) is a filename unique to each structure.  Generally, files whose codes start with zero are text-only and contain no 3-dimensional structural information.  Files with similar codes are often related structures, that is, published in the same research paper, or revisions done by the same research group.


Since this search gave us far too many results, we will have to be more specific in our search.  Go back to the center of the main PDB page and now choose “SearchLite.”  Here you may be more specific in your search of the PDB by using Boolean operators (and, or, not…) and attributes.  Let’s say that you are particularly interested in the structure of horse myoglobin.  Following the examples on the SearchLite page, find the PDB files for horse myoglobin.  How many files are there now, and how many structures are available with a resolution better than 2.00 Å?




For the moment, let us look at file 1DWR.  Click on it and you will get a summary page of information about the structure, including the authors and the paper in which it was published, the method (X-ray crystallography or NMR), and the organism.  (We will discuss the space groups, R-value, and resolution in a couple of weeks.)  In the left hand purple column, choose “View Structure” and you’ll be given a few choices of nice colorful cartoon pictures.  Does this protein contain more alpha helices or beta strands, or an even mixture of both?  What else is in the structure besides proteinaceous material?




Now, click on “Download/Display File” in the left hand purple column, which should give you a few choices of formats.  Let’s start by looking at the “header only.”  This is the top section of the PDB file itself, and it generally contains an awful lot of valuable information.  This includes “journal” and “remarks” with the references, interesting facts about the crystallization conditions, mutations, the NMR refinements, etc.;  “seqres” with the complete protein sequence in 3-letter code for each of the independent protein strands (labeled A,B,C…); “het” with a list of  non-peptide heteroatoms, such as waters, phosphates, metal ions,  and cofactors;  and a list of all of the secondary structural units, labeled “helix” or “sheet”.


When was this structure deposited into the PDB?


How many peptide chains and/or proteins are packed into this unit cell?


How many helices are there?  How many beta strands?


What are the heteroatoms?


What inhibitor is bound to the active site?
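
If you would like to cross-check your answers to the header questions above, the short Python sketch below tallies a few of the record types just described.  It is only an illustration, not part of the assignment: the function name summarize_header is made up for this handout, and it assumes the standard fixed-column PDB text layout (record name in columns 1-6, deposition date in columns 51-59 of the HEADER line, chain identifier in column 12 of each SEQRES line, one HELIX line per helix, one SHEET line per strand, one HET line per hetero group).

```python
from collections import Counter


def summarize_header(lines):
    """Tally a few header record types from the lines of a PDB-format text file.

    Hypothetical helper for illustration; assumes the fixed-column PDB layout.
    """
    counts = Counter()
    chains = set()
    deposited = None
    for line in lines:
        record = line[:6].strip()            # record name: columns 1-6
        if record == "HEADER":
            deposited = line[50:59].strip()  # deposition date: columns 51-59
        elif record == "SEQRES":
            chains.add(line[11:12])          # chain identifier: column 12
        elif record in ("HELIX", "SHEET", "HET"):
            counts[record] += 1              # one line per helix / strand / group
    return {
        "deposited": deposited,
        "chains": sorted(chains),
        "helices": counts["HELIX"],
        "strands": counts["SHEET"],
        "het_groups": counts["HET"],
    }
```

Once you have a copy of the file on disk, something like summarize_header(open("1DWR.pdb")) should return a small dictionary that you can compare against what you counted by eye.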



Now, click on “Download/Display File” in the left hand purple column, and choose the text file complete with coordinates.  The text file will start with the same header, but it will be a lot longer.  Scroll down and take a look!  The three-dimensional location of each and every atom in the protein is listed, along with its identity (what strand does it belong to, what amino acid does it belong to, what kind of atom is it, where does it belong within the amino acid).  This latter information might be dizzying to you, but it will make a lot of sense to the computer graphics programs we will use this semester.


At the top of the page, choose “Save Full Entry onto Disk” and place the file (which will be named 1DWR.pdb) on the desktop.  Note that all pdb filenames follow this format.
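
To see why the coordinate section makes sense to a computer, here is a minimal Python sketch that reads the atom lines of a saved file.  It is an illustration only: the names Atom and parse_atoms are invented for this handout, and the column positions are those of the standard fixed-column PDB layout (atom serial number in columns 7-11, atom name in 13-16, residue name in 18-20, chain in 22, residue number in 23-26, and x, y, z in 31-38, 39-46, and 47-54).

```python
from typing import NamedTuple


class Atom(NamedTuple):
    """One ATOM or HETATM line (hypothetical container for illustration)."""
    serial: int
    name: str
    res_name: str
    chain: str
    res_seq: int
    x: float
    y: float
    z: float
    hetero: bool


def parse_atoms(lines):
    """Return an Atom for every ATOM/HETATM line; all other lines are skipped."""
    atoms = []
    for line in lines:
        record = line[:6]
        if record not in ("ATOM  ", "HETATM"):
            continue
        atoms.append(Atom(
            serial=int(line[6:11]),        # columns 7-11
            name=line[12:16].strip(),      # columns 13-16, e.g. CA for alpha carbon
            res_name=line[17:20].strip(),  # columns 18-20, 3-letter residue code
            chain=line[21],                # column 22
            res_seq=int(line[22:26]),      # columns 23-26
            x=float(line[30:38]),          # columns 31-38, in angstroms
            y=float(line[38:46]),          # columns 39-46
            z=float(line[46:54]),          # columns 47-54
            hetero=(record == "HETATM"),   # True for non-protein atoms
        ))
    return atoms
```

With the file saved on the desktop, atoms = parse_atoms(open("1DWR.pdb")) should give one entry per atom; filtering on the hetero field separates the heteroatoms from the protein itself.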



2.  Now, we are going to use a program called RasMol to look at the structure we have downloaded.  There are more powerful and friendly programs out there that we will use in later weeks, but for the moment, RasMol is a very basic way to take a look at a structure and make a pretty picture.  RasMol is available for free on the web at the following address, which also has help and tutorials:  There is a link to this page from the class Links webpage.  You may want to install it on your personal computer, if you have one.  RasMol has also already been downloaded onto the computers in this lab, so we can skip that part.


Open file 1DWR.pdb with RasMol.  Because this is an old file, you may have to highlight the file on the desktop, then go to File -> Open with -> RasMol.   When it opens, you should get a white command-line screen and a black structure screen.  First, let’s bring the black structure screen to the front and play with that.


You can drag the protein around to different orientations using the mouse and the scroll bar.            

You can change the representation of the protein using the Display menu (CPK, backbone only, ribbons, etc.), and use the Colours menu to change the colors of the protein.


In your opinion, which Display/Colour choices show the secondary structure most clearly?




Which Display choices show the heme porphyrin cofactor and inhibitor (the “heteroatoms”), and which do not?



If you choose a Display choice where the cofactor is not visible, you can put the heme porphyrin molecules back in.  Type ‘select hetero’ in the white window, then choose a display style to make them appear in the black window.  If you type ‘select protein’ you can change its display style independently of the porphyrin; ‘select all’ lets you change the whole image style at once.

This is nice because you can mix-and-match display styles and colors within the same picture—it can be part CPK, and part ribbon drawing, or whatever you like.


Now, click on some atom on your structure and then look at the command line window.   RasMol will tell you exactly which atom that is, and where it belongs in the sequence.


In the command line window, we can also do other things, like zooming in or out.  What happens when you type ‘zoom 400’ ?   ‘zoom 10’?


It is possible to do a lot of different things in the command line window.  You can look these up in the help file or in the online tutorial if you would like, but you do not need to learn them.


Finally, choose a style and an image orientation and zoom factor that you think looks cool, whether it be a ribbon drawing, a close-up of the porphyrin-inhibitor, a space-filling model, or whatever.  Use the Export menu to save an image and print it out (black and white is fine) and turn it in with this assignment.

3. Now go back to the PDB, and in SearchLite, search for files of human myoglobin.  What is interesting about the results?  What warning should you take away from this?




Take a look at file 2HHB.pdb, and compare it to file 1DWR.pdb.  What is similar, and what is different?






4.   Which nine protein groups (seven side chains and….) are ionizable at biologically achievable pH values?   Draw their neutral and ionized structures and indicate what fraction of each form would be present at pH 6.8.
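
The fractions asked for in question 4 follow from the Henderson-Hasselbalch relationship, pH = pKa + log([deprotonated]/[protonated]).  The Python sketch below shows the arithmetic only; the function name is made up for this handout, and the pKa values in the comments are placeholders chosen to make the numbers easy to check, not the values for any particular group.  Look up the real pKa values yourself.

```python
def fraction_deprotonated(pka, ph):
    """Fraction of an ionizable group in its deprotonated form at a given pH.

    Rearranged from Henderson-Hasselbalch:
        [deprotonated] / [protonated] = 10 ** (ph - pka)
    so the deprotonated fraction is 1 / (1 + 10 ** (pka - ph)).
    """
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


# Placeholder pKa values, for illustration only:
#   a group with pKa equal to the pH is exactly half deprotonated
half = fraction_deprotonated(6.8, 6.8)
#   a group with pKa one unit below the pH is 10/11 deprotonated
mostly = fraction_deprotonated(5.8, 6.8)
```

The protonated fraction is simply one minus the value returned.  Whether the deprotonated form is the neutral one or the charged one depends on the group, so think about that before labeling your drawings.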

5. An alien from another planet, whom we will call “#%%$^Bob,” comes to Earth to study biological chemistry and is aghast to find that our proteins use only L-amino acids, because on his planet all of the amino acids are D-amino acids.  (Needless to say, #%%$^Bob can only eat food brought from his home planet, because our proteins give him terrible gas.  But that is another story.)  Draw a Ramachandran plot showing the preferred conformations of alanine on his planet.