orbplus - Structural Feature Prediction Tool

orbplus
Brian Sykes Lab

Version: 1.1.2 - Jul / 2011

Purpose: Structural Feature Prediction Tool

Introduction
Using the Software
Miscellaneous
- Prediction Calculation
- Troubleshooting

Overview and References

This software summarizes and predicts the properties of an input protein using only the assigned chemical shift information of a target protein. Examples of such properties include:

How open or closed the input protein is
Interhelical angles for each residue in the input protein
Binding affinity

The target protein may be a single protein selected from a database of proteins where the property in question is known or estimated. The target protein may also be an averaged protein derived from the selection of several proteins from the database.

First, the amino acid chemical shifts of the input protein are correlated and ranked against the target. Then, those amino acids with the highest correlation criteria are selected to predict an overall property value for the protein.

Although the software can do this calculation instantaneously, the usefulness of the software lies in the presentation of visual tools (spectral plots, chemical shift histograms, pymol modelling) which allow the user to confidently and quickly evaluate prediction results.

Authors and References: Robertson, IM, Boyko, RF, and Sykes, BD (2011) Visualizing the principal component of 1H,15N-HSQC NMR spectral changes that reflect structural or functional properties of a protein: Application to troponin C. J. Biomol. NMR (in press).

Download and Installation

This software is a self-contained Tcl/Tk starkit, which should make installation straightforward. See the Release notes for the latest updates. Download the software appropriate for your machine:
- MacOSX (5.6 MB) - Mac (intel and ppc)
- Linux x86 (4.6 MB) - Intel x86 32 bit
- Linux x86_64 (4.6 MB) - Intel x86 64 bit
Double click the download file. You should now get a new file on the desktop (or your download directory) represented by the orbplus green ball icon. For macosx, you may want to drag the orbplus icon into /Applications or keep it on your desktop.
Now let's make sure the software is working. Download data and examples (1.1MB) and double click (or un-tar) this file to create the orbplus-1.1-x-data folder (see the examples.readme file for a quick summary of each example provided).
To run an example, double click on the orbplus.command file.

It is difficult for the programmer to test all the various operating system environments for this software. If you have an installation that did not work as simply as stated above, then please feel free to email robert.boyko@ualberta.ca and indicate the type of system you have and the errors that occurred.

Set Up Your Data

Create a working directory.

> cd 
> mkdir orbplus_test

Copy the proteins used for comparison purposes to your working directory. This may include assigned (or partially assigned) proteins in one of the following formats:

xpk - nmrview peaklist file
ppm.out (nv5) - nmrview chemical shift file
ppm.out (nvj) - nmrviewj chemical shift file
bmrb - biomagresbank deposit format

Currently the software does not parse input files to determine the format. You must specify the correct file suffix when entering data. If you forget to do this, then orbplus will complain that it cannot figure out how to read your input.
It is not necessary to have all atoms assigned; the software only looks for the assigned atoms. The default atom names are "HN", "H", and "N" (case does not matter). See the Startup section if you are planning to work with other atoms or atom names.
Copy your input data (the protein you wish to analyze) to the working directory. Formats and rules for input data are the same as specified above.

Create a DB Index file. Here is an example of how that file may look:

# DB Index file for use in orbplus
MODEL:      Interhelical Angle
# key       short_name  file_name               residue_range:score_range ... 
REFERENCE:  Ca          nhsqc_Ca.xpk            90-125:111-119 126-160:113.5-118.25
PROTEIN:    cRp40       nhsqc_cRp40.xpk         90-125:79-87 126-160:83-91
PROTEIN:    EGCg        nhsqc_EGCg.xpk          90-125:104-114 126-160:106-118
PROTEIN:    EMD         nhsqc_EMD.xpk           90-125:90-98 126-160:107-119


MODEL: is an arbitrary name or title for the run.
REFERENCE: contains the reference chemical shifts for comparison. Typically the user will pick the most closed protein to be the reference protein, but other criteria are certainly possible.
PROTEIN: proteins for comparison.


The short_name field identifies the protein for labelling purposes on plots.
The file_name field can be a full pathname; however, this is not necessary if the file is in the same directory as the DB Index file.
The residue_range field specifies a continuous group of residues and score_range is the associated score for those residues. Typically scores represent interhelical angles or percent open values.

Users need to decide where the interesting regions are and which scores are appropriate. There is no limit to the number of residue_range:score_range terms; terms are separated by whitespace. Blank lines and lines which start with "#" are ignored. Other examples of DB Index files include Cardiac.db/orbplus_oc.index and Cardiac.db/orbplus_iha.index.
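The file layout described above can be read with a few lines of code. The following Python sketch is only an illustration of the format rules stated here (keys ending in ":", whitespace-separated fields, "#" comments, blank lines ignored); it is not the parser used by orbplus. The names Entry, parse_range and parse_db_index are hypothetical, and the sketch assumes that residue numbers and scores are non-negative.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    key: str          # "REFERENCE" or "PROTEIN"
    short_name: str
    file_name: str
    # Each term is (first_residue, last_residue, score_low, score_high).
    terms: list = field(default_factory=list)


def parse_range(text):
    """Split 'a-b' into two floats (assumes non-negative values)."""
    low, high = text.split("-")
    return float(low), float(high)


def parse_db_index(text):
    """Return (model_name, entries) parsed from DB Index file contents."""
    model = None
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comment lines are ignored
        key, _, rest = line.partition(":")
        if key == "MODEL":
            model = rest.strip()
            continue
        fields = rest.split()
        entry = Entry(key, fields[0], fields[1])
        for term in fields[2:]:
            residues, scores = term.split(":")
            first, last = parse_range(residues)
            entry.terms.append((int(first), int(last), *parse_range(scores)))
        entries.append(entry)
    return model, entries


example = """\
# DB Index file for use in orbplus
MODEL:      Interhelical Angle
REFERENCE:  Ca          nhsqc_Ca.xpk            90-125:111-119 126-160:113.5-118.25
PROTEIN:    cRp40       nhsqc_cRp40.xpk         90-125:79-87 126-160:83-91
"""

model, entries = parse_db_index(example)
print(model)
for e in entries:
    print(e.key, e.short_name, e.file_name, e.terms)
```

Running a new index file through a check like this before starting orbplus may help catch a missing colon or a misplaced field early.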

Optional: Copy a representative protein structure to your working directory.
Optional: Set up software preferences and/or startup script.

Startup, Preferences, and Output

The simplest solution is not to worry about settings: just double click on the green ball orbplus icon or execute a startup script from a terminal window. But if you want the software to automatically find the data, tailor windows, colors, and fonts for your screen, set plotting attributes, etc, then you should understand the options for doing this.

Invoke orbplus via clicking the icon:

".orbplus.defaults"

".orbplus.output"

Invoke orbplus via the command line:
The following command line arguments are available for orbplus:

orbplus [-defaults defaultsFile] [-db dbIndexFile] [-data inputFile] [-pymol pdbFile] [-output outputDir]

defaultsFile: application defaults file
dbIndexFile:  database index file
inputFile:    input data to analyze
pdbFile:      input structure file for pymol to display
outputDir:    where to direct all output from this software

After downloading, you may want to make an alias for orbplus, for example:
(linux)  alias orbplus $HOME/Desktop/orbplus/orbplus
(macosx) alias orbplus $HOME/Desktop/orbplus.app/Contents/MacOS/orbplus
Precedence for reading and setting preferences is in the following order (highest to lowest):
1. the argument given to the -defaults flag on the command line
2. search the current directory for the file "orbplus.defaults" (command line only)
3. search the user home directory for the hidden file ".orbplus.defaults"
4. search the orbplus installation directory for "lib/orbplus.defaults"
The following orbplus.defaults example shows how to control window size and placement, colors, input, output, and other variables for your particular needs.
Precedence for setting the orbplus output directory is in the following order (highest to lowest):
1. the argument given to the -output flag on the command line
2. specifying the write_dir parameter in one of the above preferences files.
3. the hidden directory ".orbplus.output" in the user home directory
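The search order for the preferences file listed above can be sketched as follows. This Python fragment is only an illustration of the documented precedence, not code from orbplus. The function name find_defaults_file and its parameters are hypothetical, and falling through to the next location when a file is missing is an assumption.

```python
import os
import tempfile


def find_defaults_file(cli_defaults=None, cwd=".", home=None,
                       install_dir=None, from_command_line=True):
    """Return the first existing preferences file, highest precedence first."""
    home = home or os.path.expanduser("~")
    candidates = []
    if cli_defaults:                      # 1. argument of the -defaults flag
        candidates.append(cli_defaults)
    if from_command_line:                 # 2. current directory (command line only)
        candidates.append(os.path.join(cwd, "orbplus.defaults"))
    candidates.append(os.path.join(home, ".orbplus.defaults"))   # 3. home directory
    if install_dir:                       # 4. installation directory
        candidates.append(os.path.join(install_dir, "lib", "orbplus.defaults"))
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None


# Demonstration in a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    home = os.path.join(tmp, "home")
    work = os.path.join(tmp, "work")
    os.makedirs(home)
    os.makedirs(work)
    open(os.path.join(home, ".orbplus.defaults"), "w").close()
    found_first = find_defaults_file(cwd=work, home=home)
    open(os.path.join(work, "orbplus.defaults"), "w").close()
    found_second = find_defaults_file(cwd=work, home=home)

print(os.path.basename(found_first))    # .orbplus.defaults
print(os.path.basename(found_second))   # orbplus.defaults
```

In the demonstration, the hidden file in the home directory is used until a file named orbplus.defaults appears in the working directory, which then takes precedence.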

If orbplus has any problems starting up see the orbplus.log file. A table of defined color names can be found at http://www.tcl.tk/man/tcl/TkCmd/colors.htm and a starting point for font selection is searching "xlsfonts" in your web browser.

Input Panel

Enter the input protein and database index fields as shown in the example below. Use the Browse button to search for these files on your computer if necessary. In the example below, the database index file is found in Cardiac.db/orbplus_oc.index and input protein chemical shifts are in Cardiac.db/l48q.xpk .


 Figure 1

The software prints the number of assigned peaks read, as a notice to the user that the input file was correctly parsed. To check whether the chemical shift files in the database were read correctly, view the orbplus.log file via the Output->Show log file menu item.

The Target Protein

The Next, Previous and Averaged buttons allow the user to cycle through the database to see how the input protein compares with various target proteins. The figure below shows how the Residue Tableau is updated when W7 (a specific protein in our database) is selected to be the target protein.


 Figure 2

The Residue Tableau Panel

The residue tableau indicates those residues which are currently being used in the prediction calculation.

The following figure indicates the top 10 residues for a given target protein. How the software ranks the residues is explained in the prediction calculation section.


 Figure 3

The user typically clicks (or drags) the Top Residues scrollbar to increase/decrease the number of residues selected. To select residues manually, first click the Manual Select radio button and then click the desired residue boxes. All calculations and spectral plots are updated on the fly.

Calculation Results Window

As residues are selected/unselected, users will see a summary of how each residue contributes to the overall prediction score (eg, Interhelical Angle).


 Figure 4

A full explanation of the columns in this table is given in the prediction calculation section.

Spectral Plots

Clicking on any residue in the residue tableau updates the residue-all and residue-expand spectral windows for that residue.

The residue-all window shows all the chemical shifts for all the residues in the target protein. The black circle marks the position of the current residue; the red circle marks the chemical shifts of the reference protein for the current residue.

The residue-expand window shows all the chemical shifts for one residue only. Again, the red circle marks the chemical shifts of the reference protein and the black circle marks the chemical shifts of the target protein. The gray circles are the other possible target proteins and the plus sign marks the chemical shifts of the input protein. Any pink circles (eg, f77w and apo) are the chemical shifts of proteins which have been temporarily excluded from the database (see Database Customization section).


 Figure 5

The multiple-res-plot combines, in one plot, the residue-expand information for all the residues highlighted in the Residue tableau. It is a way to view the chemical shifts of several residues all at once. Because screen space is likely at a premium, the user may want to keep this window in icon form.


 Figure 6

The Chemical Shift Profile Histogram

The profile histogram is an interesting way to summarize the change in chemical shift for all residues. Note that calculating changes in chemical shift includes the multiplicative factor which equalizes the importance of the chemical shift atoms (eg, HN and N).

Each residue consists of a black/white bar contained within a red bar contained within a blue bar. Residues which do not have all the required shift information are not included.

The blue bar indicates the magnitude of the chemical shift change of the target protein with respect to the reference protein. The red bar is the magnitude of the chemical shift change of the input protein with respect to the reference protein. The white or black bar found within each histogram bar signifies the direction of the input protein chemical shift change with respect to that of the target protein. White is a positive correlation, black is negative.
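The three quantities behind each histogram bar can be illustrated with a small Python sketch. It is not taken from orbplus. The scaling factor N_SCALE is a hypothetical stand-in for the multiplicative factor mentioned above (its actual value is not given here), and using the sign of the dot product to choose white or black is an assumption about how the direction is judged.

```python
import math

# Hypothetical factor applied to 15N changes so that 1H and 15N changes
# contribute comparably; the value orbplus uses is not stated in this text.
N_SCALE = 0.2


def shift_vector(shifts, reference):
    """Scaled (dH, dN) change of one residue relative to the reference."""
    return (shifts[0] - reference[0], N_SCALE * (shifts[1] - reference[1]))


def histogram_bars(reference, target, input_):
    """Return (blue, red, inner) for one residue of the profile histogram."""
    t = shift_vector(target, reference)
    i = shift_vector(input_, reference)
    blue = math.hypot(*t)    # target change relative to the reference
    red = math.hypot(*i)     # input change relative to the reference
    dot = t[0] * i[0] + t[1] * i[1]
    inner = "white" if dot > 0 else "black"   # positive vs negative correlation
    return blue, red, inner


# Made-up (1H, 15N) shifts in ppm for a single residue.
bars = histogram_bars(reference=(8.00, 120.0),
                      target=(8.30, 121.0),
                      input_=(8.15, 120.5))
print(bars)
```

Here the input protein moves in the same direction as the target but only half as far, so the red value is half the blue value and the inner bar is white.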

In the example below, one can see regions of large chemical shift changes with high correlation (eg, residues 27-35) but, perhaps not surprisingly, there are also regions of anti-correlation (eg, 38-51) which may have statistical and structural significance. It is interesting to note the general ebb and flow of chemical shift change throughout various regions of this particular input protein.


 Figure 7

The chemical shift profile as depicted in the histogram can provide interesting trends and insights as the user cycles through the database of target proteins. For example, the profile of the l48q input with the f77w-v82a database protein is strikingly different from that of the Averaged or other target proteins.


 Figure 8

Display Residues in pymol

The default location for the structural display software is /usr/local/bin/pymol. A user who prefers pymol to be in a different location should use a custom defaults file like l48q-pymol/orbplus.defaults.

Press the Structural Display button to see a spatial representation of the selected residues on a pdb structure. Unless you have copied the pdb file to your current working directory, the software lets you browse the computer to find the file. If you have a pdb structure, now is a good time to analyze the relevance of the residues selected in the prediction calculation.

Figure 9

Database Customization

Database

orbplus


 Figure 10

f77w

apo

Prediction Calculation

orbplus will attempt to estimate a property (eg, open/close value) of an input protein by comparing the corresponding chemical shifts from a subset of residues in the target protein. First, the software attempts to rank the residues from best to worst predictive value to aid in the residue selection process. The user will then decide how many top residues to include or perhaps use the information provided to manually make a different selection. Once the residues are selected as displayed in the Calculation Results window, the final prediction (eg, interhelical angle) is the average of the Model Predict column.


 Figure 11

The residues are ranked from best to worst using each of the following criteria:

The magnitude of the chemical shift change from the reference protein to the input protein. In the Input Distance column of the Calculation Results table below, we can see that residue 28 is ranked number 1 in terms of having the highest absolute chemical shift change.
The direction of the chemical shift change of the input protein relative to that of the target protein. In the Theta column of the table below, we can see that the chemical shift change in residue 64 has the second highest directional correlation of all the residues.
The magnitude of the chemical shift change from the reference protein to the target protein. In the Database Distance column of the table below, we note that residue 64 of the target protein has the greatest absolute chemical shift change.
The variance of all the chemical shift changes when the target protein is an ensemble of proteins. In the Variance column of the table below, we note that residue 62 has the 3rd lowest chemical shift change variance (calculation based on 10 points). Note that comparing the variance of populations of unequal sample size involves multiplying by the appropriate t Distribution value.

For those residues which have reference chemical shifts, we can define corresponding chemical shift vectors for both the input and target proteins. Trigonometry can then be used to calculate the component of the input vector projected onto the target vector. The Model Predict value is the reference score (eg, interhelical angle) plus a correction term, where the correction term is the difference between the reference and target scores multiplied by the calculated component.
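The projection and the Model Predict value described above can be sketched in Python. This is only an illustration under stated assumptions, not the orbplus source: the component is taken as the projection of the input vector onto the target vector expressed as a fraction of the target vector length, and the difference is taken as target score minus reference score. The function names and all numbers are hypothetical.

```python
import statistics


def projected_component(input_vec, target_vec):
    """Projection of input_vec onto target_vec, as a fraction of |target_vec|."""
    dot = input_vec[0] * target_vec[0] + input_vec[1] * target_vec[1]
    norm_sq = target_vec[0] ** 2 + target_vec[1] ** 2
    return dot / norm_sq


def model_predict(ref_score, target_score, component):
    """Reference score plus the scaled difference between target and reference."""
    return ref_score + (target_score - ref_score) * component


# Made-up shift-change vectors (relative to the reference) for one residue.
component = projected_component(input_vec=(0.15, 0.10), target_vec=(0.30, 0.20))
prediction = model_predict(ref_score=100.0, target_score=80.0, component=component)
print(component, prediction)

# The final prediction is the average of the Model Predict column
# over the selected residues (hypothetical per-residue values).
overall = statistics.mean([prediction, 88.0, 92.0])
print(overall)
```

With these definitions an input vector identical to the target vector gives a component of 1 and reproduces the target score, while an input with no change from the reference gives the reference score.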

The overall Rank of a residue (see column 2) is calculated as a weighted average of the above criteria. Unless explicitly set in the defaults file, each criterion carries equal weighting. For example, residue 64 is ranked #1 because 26+2+1+6=35 and residue 37 is next at 3+16+2+33=54.
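The arithmetic in this example can be reproduced with a short Python sketch. The per-criterion ranks are the ones quoted above; their column order (Input Distance, Theta, Database Distance, Variance) is inferred from the text, and the function name combined_score is hypothetical. With equal weights, a weighted sum orders the residues the same way as a weighted average.

```python
# Per-criterion ranks quoted in the text, assumed to be in the order
# (Input Distance, Theta, Database Distance, Variance).
ranks = {64: (26, 2, 1, 6), 37: (3, 16, 2, 33)}


def combined_score(criterion_ranks, weights=None):
    """Weighted sum of the per-criterion ranks (equal weights by default)."""
    weights = weights or [1.0] * len(criterion_ranks)
    return sum(w * r for w, r in zip(weights, criterion_ranks))


scores = {residue: combined_score(r) for residue, r in ranks.items()}
ordering = sorted(scores, key=scores.get)   # lowest combined score first

print(scores)     # {64: 35.0, 37: 54.0}
print(ordering)   # [64, 37]
```

Passing a different weights list, for example one that emphasizes the Theta rank, changes the combined scores and may change the ordering.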

Is the variance that orbplus uses to rank the database entries simply determined by summing the standard deviations in the x and y coordinates over all entries, with residues then ranked by lowest overall standard deviation? Yes, except that all entries are scaled first so that the averaged vector has the same length for each residue. Without this scaling, the results are biased toward residues that are not undergoing any change. Scaling does magnify the measurement error, so residues with smaller shifts are negatively biased.

How is the Averaged chemical shift vector calculated? The averaged chemical shift vector for a particular residue is the average of all the N and HN shift values for all the available and active data for that residue in the database.
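The two answers above can be illustrated with a small Python sketch. It is not the orbplus implementation. The function names are hypothetical, the vectors are made-up shift changes for one residue across several database proteins, and the use of the sample standard deviation is an assumption.

```python
import math
import statistics


def averaged_vector(vectors):
    """Average of the (x, y) shift-change vectors for one residue."""
    xs = [v[0] for v in vectors]
    ys = [v[1] for v in vectors]
    return statistics.fmean(xs), statistics.fmean(ys)


def scaled_spread(vectors):
    """Sum of the x and y standard deviations after scaling the vectors
    so that their averaged vector has unit length."""
    mean_x, mean_y = averaged_vector(vectors)
    length = math.hypot(mean_x, mean_y)   # must be non-zero
    xs = [v[0] / length for v in vectors]
    ys = [v[1] / length for v in vectors]
    return statistics.stdev(xs) + statistics.stdev(ys)


large = [(0.30, 0.20), (0.40, 0.20), (0.20, 0.20)]
small = [(x / 10, y / 10) for x, y in large]            # same pattern, 10x smaller
scattered = [(0.30, 0.20), (0.10, 0.40), (0.50, 0.00)]  # same average, less consistent

print(scaled_spread(large), scaled_spread(small))   # equal up to rounding
print(scaled_spread(scattered))                     # larger, so ranked lower
```

Because of the scaling, a residue is not favoured merely for having small shift changes; only the consistency of the changes relative to their average matters.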

Discussion

Prediction calculation at this stage in orbplus is a simplification and is basically a starting point for the user to explore chemical shift patterns that are correlated/anti-correlated with structure. For this reason, calculating the error in the prediction would be virtually meaningless. orbplus is a visualization tool and hopefully the presentation of basic but reasonable criteria combined with residue locality and user knowledge leads to effective insights.

For a discussion of how orbplus and PLS regression analysis compare, please see the orbplus publication.

Appendix 1: Troubleshooting

Note: If orbplus has any problems starting up or you want to check that all chemical shift files were read correctly, see the orbplus.log file in the output directory. If orbplus started then the log file is also viewable by clicking 'Show log file' in the 'Output' menu item.
**Warning: Cannot set vector for residue=n
This message occurs if the residue is not assigned in the reference protein.
Note: If you click the green ball in a window title to expand the window to full size, the window is not automatically raised to the top. A simple click on the expanded window raises it to the top.
When contacting the authors with problems, please send us the orbplus.log file.
