MASS SPECTROMETRY & PROTEOMICS
LAB COMPUTER EXERCISES INTRODUCTION


MASS SPECTROMETERS

COMPUTER EXERCISE #1: Observe how one type of mass spectrometer works.

Select this link to observe how an instrument works
Electrospray source, Ion Trap analyzer:
LCQ and ESI Ion Trap System

INFORMATION ABOUT PROTEINS

COMPUTER EXERCISE #2: Find the theoretical MW of a known protein.  Compare the theoretical MW to the observed MW.  Browse information and view the structure.

(a)  Open an internet browser and go to the ExPASy Proteomics Server:  http://us.expasy.org/

Select the SWISS-PROT and TrEMBL databases

Type: alkaline phosphatase ecoli
This returns information about the protein precursor (P00634)
Explore the information (Feature Table, amino acid sequence)

Select Compute pI/Mw
Select residues 22-471 (the mature protein, without its signal peptide)
This will give the protein's theoretical average MW
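As a rough cross-check on the Compute pI/Mw result, the average MW of a chain is the sum of its average residue masses plus one water. This is a minimal sketch, assuming a standard table of average residue masses typed in here (not taken from ExPASy) and unmodified residues, so its result may differ from Compute pI/Mw in the last decimal places. The helper names average_mw and mature_chain are hypothetical.

```python
# Average residue masses in Da (assumed standard values; unmodified residues only).
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_AVERAGE = 18.01528  # average mass of the H2O that completes the free termini


def mature_chain(sequence: str, start: int, end: int) -> str:
    """Return residues start..end (1-based, inclusive), e.g. 22-471."""
    return sequence[start - 1:end]


def average_mw(sequence: str) -> float:
    """Average molecular weight of an unmodified polypeptide chain."""
    residues = "".join(sequence.split()).upper()  # drop spaces/newlines from pasted text
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in residues) + WATER_AVERAGE
```

To use it, paste the full P00634 sequence from the entry into a string and call average_mw(mature_chain(sequence, 22, 471)).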

Compare the mass spectrum (below) to the theoretical average MW.  Calculate the m/z difference and % error between the observed and theoretical masses.
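The comparison is simple arithmetic. A minimal sketch, assuming the main MALDI peak is the singly protonated ion [M+H]+, so its m/z should sit about one proton mass (1.007276 Da) above the neutral average MW. The function names are hypothetical, and no values from Figure 1 are filled in.

```python
PROTON_MASS = 1.007276  # Da


def expected_mz(neutral_mass: float, charge: int = 1) -> float:
    """m/z of the [M+zH]z+ ion for a neutral mass M."""
    return (neutral_mass + charge * PROTON_MASS) / charge


def mass_error(observed: float, theoretical: float) -> tuple[float, float]:
    """Return (difference in Da, percent error) of observed vs theoretical."""
    difference = observed - theoretical
    percent_error = difference / theoretical * 100.0
    return difference, percent_error
```

Read the observed m/z from Figure 1, compute expected_mz(theoretical_mw), and pass both values to mass_error. For a protein of tens of kDa, the one-proton correction amounts to only a few thousandths of a percent.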


Figure 1.  MALDI-TOF spectrum of alkaline phosphatase

(b) To view the structure of this protein, open a new browser page, go to the Protein Data Bank http://www.rcsb.org/pdb/

Search the archive for E. coli alkaline phosphatase: type in the accession number (P00634) and do a full-text search
Select Search
Select any of the structures listed
Under DISPLAY OPTIONS select a viewer
Explore the information provided by this page
Try other proteins: use the "search" option on the left side of the main page, select browse database, and select any section, then any protein.

2-D GEL ELECTROPHORESIS

COMPUTER EXERCISE #3: Explore information on a 2-Dimensional gel.

Go to http://us.expasy.org/ and select SWISS-2DPAGE (under databases).
Choose access to SWISS-2DPAGE by accession number
Search for P00359 (Glyceraldehyde 3-phosphate dehydrogenase from yeast)
Enlarge the gel and notice where the spot is observed.

Return to SWISS-2DPAGE and choose access by clicking on a spot.
Find the gel for yeast (Saccharomyces cerevisiae) and try to find the same protein.


PROTEIN IDENTIFICATION BY PEPTIDE MASS MAPPING


COMPUTER EXERCISE #4: Investigate peptide mass mapping used for protein identification



Figure 2.  MALDI-TOF spectrum of tryptic peptides from Unknown protein #116.

(a) Identify a protein from measured peptide masses

Open the Excel spreadsheet provided in the "excel data" link.  Use only the tabs labelled 66, 166, 55, and 36.
Copy and paste the m/z values into the search described below.
Delete the m/z values 904.4681 and 2465.199 (these are internal calibrants)
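If you prefer to clean the peak list before pasting it, this step can be scripted. A sketch under stated assumptions: the values are pasted with the m/z in the first column, one peak per line (as copied from a spreadsheet column), and a calibrant is matched within a hypothetical 0.01 Da window, which you may need to widen to match the precision of the spreadsheet.

```python
CALIBRANTS = (904.4681, 2465.199)  # internal calibrants named in the exercise


def parse_peak_list(text: str) -> list[float]:
    """Read the first number on each line; skip blank and non-numeric lines (headers)."""
    peaks = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        try:
            peaks.append(float(fields[0]))
        except ValueError:
            continue  # header or other text
    return peaks


def remove_calibrants(peaks, calibrants=CALIBRANTS, tol_da=0.01):
    """Drop every peak within tol_da of a calibrant m/z."""
    return [mz for mz in peaks if all(abs(mz - c) > tol_da for c in calibrants)]


def format_for_paste(peaks) -> str:
    """One m/z per line, ready to paste into the MS-Fit data area."""
    return "\n".join(f"{mz:.4f}" for mz in peaks)
```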

Identify unknown protein #66 using the Excel data.
Identify the other unknown proteins (#166, 55, 36) as time permits.

Open an internet browser and go to http://prospector.ucsf.edu/

Select the MS-Fit program

At the bottom of the screen is a data paste area filled with masses.
Delete these and copy/paste the mass list from the excel spreadsheet.
Search these masses, varying the options below.
Options to vary:  database (Swiss-Prot, Owl, NCBInr) and mass tolerance (50, 70, 100, 120 ppm)
Look for proteins with high MOWSE scores and low mass errors.
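To see what the tolerance settings mean in daltons, note that a ppm tolerance scales with mass: the allowed window is m x ppm / 10^6 on either side of the peak. This sketch only illustrates that arithmetic. It is not how MS-Fit implements its matching, and the function names are hypothetical.

```python
def ppm_window(mz: float, ppm: float) -> tuple[float, float]:
    """Lower and upper m/z bounds for a +/- ppm tolerance."""
    delta = mz * ppm / 1e6
    return mz - delta, mz + delta


def ppm_error(observed: float, theoretical: float) -> float:
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def within_ppm(observed: float, theoretical: float, ppm: float) -> bool:
    return abs(ppm_error(observed, theoretical)) <= ppm


# Window widths for the tolerances suggested above, at m/z 2000.
for tol in (50, 70, 100, 120):
    low, high = ppm_window(2000.0, tol)
    print(f"{tol:>3} ppm at m/z 2000: {low:.3f} to {high:.3f}")
```

A 50 ppm tolerance allows 0.05 Da either side at m/z 1000 but 0.15 Da at m/z 3000, so widening the tolerance admits more chance matches for the larger peptides.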

(b) Find a protein's theoretical peptides to confirm the identity

Open an internet browser and go to http://prospector.ucsf.edu/

Select the MS-Digest program

Options to choose:
Retrieve entry by accession number (enter the accession number of the protein identified in #4a)
Select the database where your protein was found
The digestion enzyme is Trypsin, with 0 missed cleavages
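MS-Digest does this for you, but the underlying rule is short enough to sketch. This assumes the common trypsin rule (cleave after K or R, but not before P), no missed cleavages, unmodified residues (no cysteine alkylation or methionine oxidation), and monoisotopic [M+H]+ values. MS-Digest's own settings may differ, so treat its output as the reference.

```python
# Monoisotopic residue masses in Da (unmodified).
MONO_RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER_MONO = 18.010565
PROTON_MASS = 1.007276


def tryptic_digest(sequence: str) -> list[str]:
    """Cleave C-terminal to K/R unless the next residue is P (0 missed cleavages)."""
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        next_is_pro = i + 1 < len(sequence) and sequence[i + 1] == "P"
        if aa in "KR" and not next_is_pro:
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal peptide not ending in K/R
    return peptides


def mh_plus(peptide: str) -> float:
    """Monoisotopic [M+H]+ of an unmodified peptide."""
    return sum(MONO_RESIDUE_MASS[aa] for aa in peptide) + WATER_MONO + PROTON_MASS


for pep in tryptic_digest("AKPGRLLK"):  # toy sequence, not a real protein
    print(pep, round(mh_plus(pep), 4))
```

Comparing these [M+H]+ values against the masses MS-Fit matched shows which parts of the sequence your spectrum actually covers.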


PROTEIN IDENTIFICATION BY MS/MS


COMPUTER EXERCISE #5: Investigate the use of MS-MS data for peptide sequence information -- and ultimately protein identification.

An unknown protein was digested with trypsin and MS/MS spectra were acquired.
A list of fragments from one of these peptides is included in the excel spreadsheet (fragments 316-318).

Go to Protein Prospector and select the MS-Tag program.
Copy/paste the mass list, overwriting the default masses in the program.
The first entry on the list should be the mass of the selected (precursor) peptide; this may need to be calculated.
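If only the precursor m/z and charge state are known, the peptide mass follows from M = z(m/z - 1.007276), with [M+H]+ = M + 1.007276. A minimal sketch. Before entering the value, check whether MS-Tag expects the precursor as an m/z plus charge or as [M+H]+. The example precursor is illustrative only.

```python
PROTON_MASS = 1.007276  # Da


def neutral_mass(mz: float, charge: int) -> float:
    """Neutral monoisotopic mass M from the m/z of an [M+zH]z+ ion."""
    return charge * (mz - PROTON_MASS)


def mh_plus_from_mz(mz: float, charge: int) -> float:
    """Singly protonated mass [M+H]+ from the m/z of an [M+zH]z+ ion."""
    return neutral_mass(mz, charge) + PROTON_MASS


print(round(mh_plus_from_mz(500.5, 2), 4))  # illustrative 2+ precursor; prints 999.9927
```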

Search for possible peptides/proteins.


BLAST SEARCHING

COMPUTER EXERCISE #6: Investigate the use of unannotated sequence data contained in databases of genome sequencing information. Perform BLAST searching to find homologous proteins.

A protein was subjected to nanoLC-MS/MS, and two peptides were identified by a Sequest search against a database of rice genome sequence data (which is not publicly available). The protein header is listed simply as “unknown”. See the tab in the exercise spreadsheet for details of the peptide sequences.  Perform a BLAST search to find similar proteins and get some idea of what the function of this unknown protein might be.

Open a browser and go to www.ncbi.nlm.nih.gov/BLAST
Choose the protein-protein (Blastp) option

Copy and paste the two peptide sequences into the “search” window as a single string of text. Make sure to remove the dots and any unnecessary spaces.
Check that the database searched is set to nr (i.e., the non-redundant database)
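Joining and cleaning the peptide strings can also be scripted. A small sketch, assuming the peptides are copied as plain text from the spreadsheet: it keeps only letters, which removes dots, spaces, and line breaks, and upper-cases the result. The function name blast_query and the example strings are hypothetical.

```python
def blast_query(*peptides: str) -> str:
    """Concatenate peptide sequences into one BLAST query string, letters only."""
    return "".join(ch for pep in peptides for ch in pep.upper() if ch.isalpha())


print(blast_query("PEP.TIDE", " seq uence "))  # hypothetical input strings
```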

When the new page appears indicating that your search has been submitted, press the format button. Your results (when ready) will open in a new window.
Look at the results, ignoring the “unknown” proteins, and see whether there is a common functional theme among the proteins reported as having decent homology (not necessarily the first one).  Look at some of the individual matches and see how closely (or not) they resemble the sequence you input.

Now go to www.arabidopsis.org and do it again. Find BLAST under Analysis tools and use blastp against the PlantProtein database or the AGI proteins database (Arabidopsis genome initiative).


MORE PROTEINS TO IDENTIFY (MS/MS)

COMPUTER EXERCISE #7: Hard copies of MS/MS spectra will be provided.  


    Determine the amino acid sequence of a peptide using the method shown in #5.  The results will identify the protein.
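When reading a printed spectrum by hand, it helps to know the expected fragment ladders and to match gaps between neighbouring peaks to residue masses. A sketch assuming singly charged b and y ions, monoisotopic masses, and unmodified residues (b_n = sum of the first n residues + 1.007276; y_n = sum of the last n residues + 18.010565 + 1.007276). The 0.02 Da gap tolerance is a hypothetical default and should be set to the instrument's accuracy. At a wider tolerance, K and Q (which differ by about 0.036 Da) cannot be told apart, and L and I never can.

```python
# Monoisotopic residue masses in Da (unmodified).
MONO_RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER_MONO = 18.010565
PROTON_MASS = 1.007276


def b_ions(peptide: str) -> list[float]:
    """Singly charged b1..b(n-1), built from the N-terminus."""
    ions, total = [], PROTON_MASS
    for aa in peptide[:-1]:
        total += MONO_RESIDUE_MASS[aa]
        ions.append(total)
    return ions


def y_ions(peptide: str) -> list[float]:
    """Singly charged y1..y(n-1), built from the C-terminus."""
    ions, total = [], WATER_MONO + PROTON_MASS
    for aa in reversed(peptide[1:]):
        total += MONO_RESIDUE_MASS[aa]
        ions.append(total)
    return ions


def residues_for_gap(delta: float, tol: float = 0.02) -> list[str]:
    """Residues whose mass matches the m/z gap between two adjacent ladder peaks."""
    return [aa for aa, m in MONO_RESIDUE_MASS.items() if abs(m - delta) <= tol]
```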


X TANDEM (MS/MS)

COMPUTER EXERCISE #8: Raw data files in electronic form of MS/MS spectra will be provided.


Identify the peptide(s) and protein(s) using the X Tandem program.  The program has been installed on each computer.

Program location:  C:\program files\tandem
Data location:  C:\data (*.dta files)
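The *.dta files are plain text. In the common SEQUEST-style layout, which these files are assumed to follow, the first line holds the singly protonated precursor mass [M+H]+ and the charge state, and each following line holds one fragment m/z and its intensity. A minimal reader for inspecting a file before searching it (function names hypothetical):

```python
from pathlib import Path

PROTON_MASS = 1.007276  # Da


def parse_dta(text: str):
    """Return (precursor [M+H]+, charge, [(m/z, intensity), ...]) from .dta text."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    mh_plus, charge = float(rows[0][0]), int(float(rows[0][1]))
    peaks = [(float(r[0]), float(r[1])) for r in rows[1:]]
    return mh_plus, charge, peaks


def read_dta(path):
    """Read and parse a .dta file from disk."""
    return parse_dta(Path(path).read_text())


def precursor_mz(mh_plus: float, charge: int) -> float:
    """m/z of the [M+zH]z+ precursor implied by the [M+H]+ in the header."""
    return (mh_plus + (charge - 1) * PROTON_MASS) / charge
```

For example, read_dta(r"C:\data\example.dta"), where example.dta stands in for any file in the data folder.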

Open the tandem\bin folder (C:\program files\tandem\bin). Four "input.xml" files have been prepared to search the data.  Open one of these files in Notepad and note the location of the original data and the destination of the output (Data location:  C:\data).
    Four sample data sets have been prepared:
    single spectrum data:  inputJK04_1560.xml   inputJK03_1471.xml
    multiple spectra data:  inputJK03.xml   inputLuci.xml

SEARCH A DATA FILE
Open a command prompt window and cd to c:\program files\tandem\bin (as a typing shortcut, you can type cd <space> and then drag and drop the folder icon from an open window)
Type tandem inputJK04_1560.xml and press <enter>
If it says the command is not recognized, retype it as tandem.exe inputJK04_1560.xml and press <enter>
It should report ‘loading spectra’, spectra matching criteria = xx, and computing models, then print a line of letters and numbers as progress output while it works.  A command prompt will appear when the search is complete.

VIEW THE OUTPUT
Go to the www.thegpm.org website, click on the ‘genomes’ link at top left, then click on the ‘view saved xml data’ link at top left. You can then browse to select the XML file you just created (the *output.xml files in C:\data) and click the ‘view models’ link. This opens a well-formatted HTML page of results.

Search the four prepared data files and view the results.

Make your own input files as follows:
Go to the tandem\bin directory (C:\program files\tandem\bin).  Open an input file and edit it to change the file names of the input and output files, using the *.dta files available in the C:\data folder.  Save your edited file under a new name: choose "Save As" and change the file type to "All Files" so that Notepad saves it as an .xml file rather than adding a .txt extension.  Repeat the Search and View steps described above.
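Editing the copies by hand in Notepad works; the same edit can also be scripted. A sketch that assumes each prepared input file follows the usual X! Tandem layout, with note elements labelled "spectrum, path" and "output, path" (confirm this by opening one of the prepared files). The helper names and any new file names you pass in are hypothetical. XML comments in the template, if any, are not carried over.

```python
import xml.etree.ElementTree as ET


def retarget_input_xml(xml_text: str, spectrum_path: str, output_path: str) -> str:
    """Return a copy of an X! Tandem input.xml with new spectrum and output paths."""
    root = ET.fromstring(xml_text)
    targets = {"spectrum, path": spectrum_path, "output, path": output_path}
    found = set()
    for note in root.iter("note"):
        label = note.get("label")
        if label in targets:
            note.text = targets[label]
            found.add(label)
    missing = set(targets) - found
    if missing:
        raise ValueError(f"input file has no note labelled: {sorted(missing)}")
    return ET.tostring(root, encoding="unicode")


def make_input_file(template: str, new_file: str, spectrum_path: str, output_path: str) -> None:
    """Read a prepared input file, retarget it, and save it under a new name."""
    with open(template, encoding="utf-8") as fh:
        text = fh.read()
    with open(new_file, "w", encoding="utf-8") as fh:
        fh.write('<?xml version="1.0"?>\n')
        fh.write(retarget_input_xml(text, spectrum_path, output_path))
```

Because the script writes the file directly with an .xml name, the Notepad "All Files" step is not needed when you use it.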