MASS SPECTROMETRY & PROTEOMICS
LAB COMPUTER EXERCISES INTRODUCTION
COMPUTER EXERCISE #1: Observe how one type of mass spectrometer works.
Select this link to observe how an instrument works
Electrospray source, Ion Trap analyzer:
LCQ and ESI Ion Trap System
INFORMATION ABOUT PROTEINS
COMPUTER EXERCISE #2: Find the theoretical MW of a known protein.
Compare theoretical to observed. Browse information and view
(a) Open an internet browser, go to the Expasy Proteomics
Select SWISS-PROT and TrEMBL database
Type: alkaline phosphatase ecoli
Will result in information about protein precursor (P00634)
Explore the information (Feature Table, amino acid sequence)
Select Compute pI/Mw
Select residues 22-471 (this is the protein without the signal peptide)
This will provide information about the protein's average MW
Compare the mass spectrum (below) to the theoretical average MW. Calculate
the mass difference and % error between the observed and theoretical values.
Figure 1. MALDI-TOF spectrum of alkaline phosphatase
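The comparison asked for in this exercise is simple arithmetic. A minimal Python sketch is given here; the two masses in the usage lines are placeholder values, not the answer to the exercise. Substitute the theoretical average MW reported by Compute pI/Mw and the peak value read from Figure 1.

```python
def mass_error(observed, theoretical):
    """Return (difference in Da, percent error) of observed vs. theoretical mass."""
    difference = observed - theoretical
    percent_error = 100.0 * difference / theoretical
    return difference, percent_error


# Placeholder values for illustration only (not the exercise answer).
diff, pct = mass_error(observed=50050.0, theoretical=50000.0)
print(f"difference = {diff:.1f} Da, error = {pct:.3f} %")
```

A positive result means the observed mass is higher than the theoretical one.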
(b) To view the structure of this protein, open a new
browser page, go to the Protein Data Bank
Search the archive for alkaline phosphatase E. coli:
type in the ID number (P00634) and do a full text search
Select any of the structures listed
Under DISPLAY OPTIONS select a viewer
Explore the information provided by this page
Try other proteins: use the "search" option on the left side of the main page, select "browse database", then select any section and any protein.
2-D GEL ELECTROPHORESIS
COMPUTER EXERCISE #3: Explore information on a 2-Dimensional
Go to http://us.expasy.org/
and select SWISS-2DPAGE (under databases).
Choose access to SWISS-2DPAGE by accession number
Search for P00359 (Glyceraldehyde 3-phosphate dehydrogenase from
Enlarge the gel and notice where the spot is observed.
Return to SWISS-2DPAGE, and choose access by clicking on a spot
Find the gel for yeast (Saccharomyces cerevisiae) and try to find
the same protein.
PROTEIN IDENTIFICATION BY PEPTIDE MASS MAPPING
COMPUTER EXERCISE #4: Investigate peptide mass mapping used
for protein identification
Figure 2. MALDI-TOF spectrum of tryptic peptides from
Unknown protein #116.
(a) Identify a protein from measured peptide masses
Open the Excel spreadsheet provided in the link.
Use only the tabs labelled 66, 166, 55, 36.
Copy and paste the m/z values into the search described below.
Delete the m/z values 904.4681 and 2465.199 (these are internal calibrants)
Identify unknown protein #66 using the excel data.
Identify protein from other unknown proteins (#166, 55, 36) as time permits.
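If you prefer to clean the mass list before pasting it, the calibrant removal can be sketched in Python as follows. The peak list shown is hypothetical, and the 0.01 matching window is an assumption chosen for illustration; only the two calibrant m/z values come from this handout.

```python
# Internal calibrant m/z values named in the handout.
CALIBRANTS = (904.4681, 2465.199)


def drop_calibrants(mz_values, calibrants=CALIBRANTS, window=0.01):
    """Return mz_values without entries within `window` of any calibrant.

    `window` is an assumed matching width, not a value from the handout.
    """
    return [mz for mz in mz_values
            if all(abs(mz - c) > window for c in calibrants)]


# Hypothetical peak list for illustration only.
peaks = [842.5100, 904.4681, 1045.5642, 2211.1046, 2465.199]
print(drop_calibrants(peaks))
```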
Open an internet browser, go to Protein Prospector
Select MS-Fit program
At the bottom of the screen is a data paste area filled with masses.
Delete these and copy/paste the mass list from the excel spreadsheet.
Search these masses while changing different options.
Options to vary: Database (Swiss-Prot, Owl, NCBI), Mass
tolerance (50, 70, 100, 120 ppm)
Look for proteins with strong MOWSE scores with low mass errors.
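To see what a mass tolerance in ppm means for a single peptide, here is a small Python sketch. The observed and theoretical masses are hypothetical; the four tolerance values are the ones listed in the options above.

```python
def ppm_error(observed, theoretical):
    """Mass error in parts per million (ppm)."""
    return 1e6 * (observed - theoretical) / theoretical


def within_tolerance(observed, theoretical, tolerance_ppm):
    """True if the absolute ppm error does not exceed the tolerance."""
    return abs(ppm_error(observed, theoretical)) <= tolerance_ppm


# Hypothetical peptide: observed 1000.08, theoretical 1000.00 (about 80 ppm).
observed, theoretical = 1000.08, 1000.00
for tolerance in (50, 70, 100, 120):
    print(tolerance, within_tolerance(observed, theoretical, tolerance))
```

A peptide that is off by about 80 ppm is rejected at 50 and 70 ppm but accepted at 100 and 120 ppm, which is why the list of matched proteins changes as you vary this option.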
(b) Find a protein's theoretical peptides
to confirm the identity
Open an internet browser, go to Protein Prospector
Select the MS-Digest program
Options to choose:
Retrieve entry by accession number (enter the number for the protein identified)
Select the database where your protein was found
The digestion enzyme is Trypsin, with 0 missed cleavages
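For orientation, a theoretical tryptic digest with 0 missed cleavages can be sketched in Python as follows. This is an illustration, not the MS-Digest implementation: it assumes the common rule that trypsin cleaves after K or R except when the next residue is P, it uses standard monoisotopic residue masses, and the example sequence is hypothetical.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # monoisotopic mass of H2O
PROTON = 1.00728   # mass of a proton


def tryptic_peptides(sequence):
    """Cleave after K or R unless followed by P; no missed cleavages."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal peptide
    return peptides


def mh_plus(peptide):
    """Monoisotopic mass of the singly protonated peptide, [M+H]+."""
    return sum(MONO[residue] for residue in peptide) + WATER + PROTON


# Hypothetical sequence for illustration only.
for peptide in tryptic_peptides("MKRPAAKG"):
    print(peptide, round(mh_plus(peptide), 4))
```

Comparing a list like this with the measured m/z values is what confirms (or rules out) the protein found with MS-Fit.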
PROTEIN IDENTIFICATION BY MS/MS
COMPUTER EXERCISE #5: Investigate the use of MS-MS
data for peptide sequence information -- and ultimately protein identification.
An unknown protein was digested with trypsin and
MS-MS spectra were acquired.
A list of fragments from one of these peptides is included in the excel
spreadsheet (fragments 316-318).
Go to Protein Prospector, select the MS-Tag program.
Copy/paste the mass list, overwriting the default masses in the program.
First on the list should be the mass of the selected peptide (this may
need to be calculated)
Search for possible peptides/proteins.
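The handout does not say which calculation may be needed for the peptide mass. One common case, assumed here, is converting an observed precursor m/z at a known charge state into the neutral mass or the singly protonated mass. The m/z and charge in the usage lines are hypothetical.

```python
PROTON = 1.00728  # mass of a proton (Da)


def neutral_mass(mz, charge):
    """Neutral peptide mass M from an observed m/z and charge state."""
    return charge * (mz - PROTON)


def singly_protonated_mass(mz, charge):
    """Equivalent [M+H]+ mass for an ion observed at the given charge."""
    return neutral_mass(mz, charge) + PROTON


# Hypothetical precursor: m/z 500.0 observed as a doubly charged ion.
print(round(neutral_mass(500.0, 2), 5))
print(round(singly_protonated_mass(500.0, 2), 5))
```

Check which form of the mass (and which charge) the search form expects before entering the value.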
COMPUTER EXERCISE #6: Investigate the use of unannotated
sequence data contained in databases of genome sequencing information. Perform
BLAST searching to find homologous proteins.
A protein was subjected to nanoLC-MS/MS and
two peptides were identified by Sequest searching against a database of rice
genome sequence data (which is not publicly available). The protein header
is listed simply as “unknown”. See tab in exercise spreadsheet for details
of the peptide sequences. Perform a BLAST search to find similar
proteins, and get some idea of what the function of this unknown protein might be.
Open a browser, go to www.ncbi.nlm.nih.gov/BLAST
Choose the protein-protein (Blastp) option
Copy and paste the two peptide sequences into the “search” window as
a single string of text. Make sure to remove the dots and any unnecessary
Check that the database searched is set to nr (i.e. the nonredundant database)
When the new page appears that indicates your search has been submitted,
press the format button. Your results (when ready) will open in a
Look at the results, ignoring the “unknown” proteins and see if there
is a common functional theme among the various proteins reported as having
decent homology. (Not necessarily the first one.) Look
at some of the individual matches and see how closely (or not) they resemble
the sequence you input.
Now go to www.arabidopsis.org
and do it again. Find BLAST under Analysis tools and use blastp
against the PlantProtein database or the AGI proteins
database (Arabidopsis genome initiative).
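Preparing the query string for either BLAST search can be sketched in Python as follows. The two peptides shown are hypothetical stand-ins for those in the exercise spreadsheet, and the sketch only drops every character that is not a letter (dots, spaces, and so on). Whether any letters should also be trimmed depends on how the peptides are written in the spreadsheet, which is not assumed here.

```python
import re


def blast_query(peptides):
    """Join peptides into one upper-case string, keeping letters only."""
    return "".join(re.sub(r"[^A-Za-z]", "", p).upper() for p in peptides)


# Hypothetical peptides for illustration only.
print(blast_query(["K.AVLDEGR.T", "R.SLEWNK.A"]))
```

The resulting single string is what gets pasted into the search window.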
MORE PROTEINS TO IDENTIFY
COMPUTER EXERCISE #7: Hard copies of MS/MS spectra will be provided.
Identify the amino acid sequence of a peptide using the
method shown in #5. The results will identify the protein.
X TANDEM (MS/MS)
COMPUTER EXERCISE #8: Raw data files in electronic form of MS/MS spectra
will be provided.
Identify the peptide(s)
and protein(s) using the X Tandem program. The program has been installed
on each computer.
Program location: C:\program files\tandem
Data location: C:\data (*.dta files)
Open the tandem\bin folder (C:\program files\tandem\bin). Four "input.xml"
files have been prepared to search the data. Open one of these files
in notepad and note the location of the original data and the destination
of the output (Data location: C:\data)
Four sample data files have been prepared:
single spectrum data: inputJK04_1560.xml
multiple spectra data: inputJK03.xml inputLuci
SEARCH A DATA FILE
Open a command prompt window, cd to c:\program files\tandem\bin (you can
type cd <space> then drag and drop the folder icon from an open window
as a typing shortcut)
Type tandem inputJK04_1560.xml and <enter>
If it says not recognized, retype the command as tandem.exe inputJK04_1560.xml
It should say ‘loading spectra’, spectra matching criteria = xx, computing
models, then spit out a line of random letters and numbers to indicate it's
thinking. A command prompt will appear when the search is complete.
VIEW THE OUTPUT
Go to the www.thegpm.org website, click on the ‘genomes’ link at top left,
then click on the ‘view saved xml data’ link at top left. You can then browse
to select the XML file you just created (C:\data, *output.xml files) and click
the ‘view models’ link. This opens up a very nice html page of results.
Search the four prepared data files and view the results.
Make your own input files as follows:
Go to the tandem\bin directory (C:\program files\tandem\bin). Open an
input file and edit it to change the file names of the input and output files,
using the *.dta files available in the C:\data folder. Save your edited
file under a new name: select "save as" and change the file type
to "all files" so that it is saved as an xml file. Repeat the Search and View
as described above.
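The same edit can be made with a short script instead of Notepad. The Python sketch below assumes the input file stores its paths in note elements labelled "spectrum, path" and "output, path", as X! Tandem input files normally do; open one of the prepared files to confirm. The template and file names here are hypothetical and much shorter than a real input file.

```python
import xml.etree.ElementTree as ET


def retarget_input(xml_text, spectrum_path, output_path):
    """Return input XML with new spectrum and output paths.

    All other notes in the file are left untouched.
    """
    root = ET.fromstring(xml_text)
    for note in root.iter("note"):
        label = note.get("label")
        if label == "spectrum, path":
            note.text = spectrum_path
        elif label == "output, path":
            note.text = output_path
    return ET.tostring(root, encoding="unicode")


# Hypothetical, shortened template for illustration only.
TEMPLATE = r"""<bioml>
  <note type="input" label="spectrum, path">C:\data\old.dta</note>
  <note type="input" label="output, path">C:\data\old_output.xml</note>
</bioml>"""

new_xml = retarget_input(TEMPLATE, r"C:\data\new.dta", r"C:\data\new_output.xml")
print(new_xml)
```

Write the returned text to a new .xml file in the tandem\bin folder, then run the search on that file as before.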