orbplus
Brian Sykes Lab

Version: 1.1.2 - Jul / 2011

Purpose: Structural Feature Prediction Tool

Table of Contents

  1. Introduction

  2. Using the Software

  3. Miscellaneous

Overview and References


Download and Installation

  1. This software is a self-contained Tcl/Tk starkit, which should make installation fairly trivial. See the Release notes for the latest updates. Download the software appropriate for your machine:

  2. Double-click the downloaded file. A new file, represented by the green orbplus ball icon, should appear on your desktop (or in your download directory). On Mac OS X, you may want to drag the orbplus icon into /Applications or keep it on your desktop.

  3. Next, make sure the software is working. Download the data and examples (1.1MB) and double-click (or un-tar) the file to create the orbplus-1.1-x-data folder. See the examples.readme file for a quick summary of each example provided.

  4. To run an example, double-click the orbplus.command file.


Setup your Data

  1. Create a working directory.
    > cd 
    > mkdir orbplus_test 
    

  2. Copy the proteins used for comparison purposes to your working directory. This may include assigned (or partially assigned) proteins in one of the following formats:

    Currently the software does not parse input files to determine the format. You must specify the correct file suffix when entering data. If you forget to do this, then orbplus will complain that it cannot figure out how to read your input.

    Not all atoms need to be assigned; the software only uses the assigned atoms. The default atom names are "HN", "H", and "N" (case does not matter). See the Startup section if you plan to work with other atoms or atom names.

  3. Copy your input data (the protein you wish to analyze) to the working directory. Formats and rules for input data are the same as specified above.

  4. Create a DB Index file. Here is an example of how that file may look:
    # DB Index file for use in orbplus
    MODEL:      Interhelical Angle
    # key       short_name  file_name               residue_range:score_range ... 
    REFERENCE:  Ca          nhsqc_Ca.xpk            90-125:111-119 126-160:113.5-118.25
    PROTEIN:    cRp40       nhsqc_cRp40.xpk         90-125:79-87 126-160:83-91
    PROTEIN:    EGCg        nhsqc_EGCg.xpk          90-125:104-114 126-160:106-118
    PROTEIN:    EMD         nhsqc_EMD.xpk           90-125:90-98 126-160:107-119
    
    • MODEL: an arbitrary name or title for the run.
    • REFERENCE: contains the reference chemical shifts for comparison. Typically the user picks the most closed protein as the reference, but other criteria are certainly possible.
    • PROTEIN: a protein used for comparison.
    • short_name: identifies the protein for labelling on plots.
    • file_name: can be a full pathname, but this is not necessary if the file is in the same directory as the DB Index file.
    • residue_range: specifies a contiguous group of residues.
    • score_range: the score range associated with those residues. Typically scores represent interhelical angles or percent open values.

    Users need to decide which regions are of interest and what scores are appropriate. There is no limit to the number of residue_range:score_range terms; terms are separated by whitespace. Blank lines and lines that start with "#" are ignored. Other examples of DB Index files include Cardiac.db/orbplus_oc.index and Cardiac.db/orbplus_iha.index.

  5. Optional: Copy a representative protein structure to your working directory.

  6. Optional: Set up software preferences and/or startup script.
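The DB Index format from step 4 is simple enough to read with a few lines of code. The Python sketch below is not part of orbplus (which is a Tcl/Tk starkit); it only illustrates the rules described in step 4: MODEL/REFERENCE/PROTEIN keys, whitespace-separated fields, residue_range:score_range terms, and skipped blank or "#" lines. The names `Entry` and `parse_db_index` are hypothetical, and the sketch assumes file names contain no spaces and all numbers are non-negative (since "-" separates the ends of a range).

```python
from dataclasses import dataclass, field

@dataclass
class Entry:
    """One REFERENCE or PROTEIN line of a DB Index file (hypothetical model)."""
    key: str          # "REFERENCE" or "PROTEIN"
    short_name: str   # label used on plots
    file_name: str    # peak-list file, e.g. nhsqc_Ca.xpk
    # [((first_residue, last_residue), (score_low, score_high)), ...]
    ranges: list = field(default_factory=list)

def parse_db_index(lines):
    """Return (model_title, reference_entry, protein_entries) from DB Index lines."""
    model, reference, proteins = None, None, []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):  # blank lines and comments are ignored
            continue
        key, _, rest = line.partition(":")    # split off the key at the first colon
        key = key.strip().upper()
        if key == "MODEL":
            model = rest.strip()              # free-form title of the run
            continue
        fields = rest.split()                 # whitespace-separated fields
        entry = Entry(key, fields[0], fields[1])
        for term in fields[2:]:               # any number of residue_range:score_range terms
            res_txt, score_txt = term.split(":")
            r_lo, r_hi = res_txt.split("-")
            s_lo, s_hi = score_txt.split("-")
            entry.ranges.append(((int(r_lo), int(r_hi)), (float(s_lo), float(s_hi))))
        if key == "REFERENCE":
            reference = entry
        elif key == "PROTEIN":
            proteins.append(entry)
        else:
            raise ValueError(f"unknown key: {key}")
    return model, reference, proteins
```

Passing a list of lines (for example `open(path).read().splitlines()`) rather than a path keeps the parser easy to test.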

Startup, Preferences, and Output


Input Panel


The Target Protein


The Residue Tableau Panel


Calculation Results Window


Spectral Plots


The Chemical Shift Profile Histogram


Display Residues in pymol


Database Customization


Prediction Calculation

orbplus attempts to estimate a property (e.g., an open/closed value) of an input protein by comparing its chemical shifts with the corresponding chemical shifts of the target protein for a subset of residues. First, the software ranks the residues from best to worst predictive value to aid the residue selection process. The user then decides how many of the top residues to include, or uses the information provided to make a different selection manually. Once the residues are selected, as displayed in the Calculation Results window, the final prediction (e.g., interhelical angle) is the average of the Model Predict column.

 Figure 11 

The residues are ranked from best to worst using each of the following criteria:

  1. The magnitude of the chemical shift change from the reference protein to the input protein. In the Input Distance column of the Calculation Results table (Figure 11), residue 28 is ranked number 1, having the largest absolute chemical shift change.

  2. The directional correlation between the chemical shift changes of the input and target proteins. In the Theta column of the same table, the chemical shift change of residue 64 has the second-highest directional correlation of all the residues.

  3. The magnitude of the chemical shift change from the reference protein to the target protein. In the Database Distance column, residue 64 of the target protein has the largest absolute chemical shift change.

  4. The variance of the chemical shift changes when the target protein is an ensemble of proteins. In the Variance column, residue 62 has the third-lowest chemical shift change variance (calculated from 10 points). Note that comparing the variances of populations with unequal sample sizes involves multiplying by the appropriate t-distribution value.
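Criteria 1 to 3 can be illustrated with a small Python sketch. It assumes each residue's shifts are (HN, N) pairs in ppm, that change vectors are measured from the reference protein, and that Theta is the angle between the input and target change vectors. `n_weight` is a hypothetical scaling factor for the N dimension, since this page does not say whether orbplus weights N and HN shifts differently. The variance criterion and its t-distribution correction are not modelled here.

```python
import math

def change_vector(shift, ref_shift, n_weight=1.0):
    """(HN, N) chemical shift change relative to the reference protein.
    n_weight is a hypothetical scale factor for the N dimension."""
    return (shift[0] - ref_shift[0], n_weight * (shift[1] - ref_shift[1]))

def criteria(ref_shift, input_shift, target_shift, n_weight=1.0):
    """Return (input_distance, theta_degrees, database_distance) for one residue."""
    v_in = change_vector(input_shift, ref_shift, n_weight)
    v_tgt = change_vector(target_shift, ref_shift, n_weight)
    input_distance = math.hypot(*v_in)      # criterion 1: |reference -> input|
    database_distance = math.hypot(*v_tgt)  # criterion 3: |reference -> target|
    if input_distance == 0 or database_distance == 0:
        theta = float("nan")                # direction is undefined for a zero vector
    else:
        cross = v_in[0] * v_tgt[1] - v_in[1] * v_tgt[0]
        dot = v_in[0] * v_tgt[0] + v_in[1] * v_tgt[1]
        # criterion 2: angle between the two change vectors (0 = same direction);
        # atan2 stays numerically stable for nearly parallel vectors
        theta = math.degrees(math.atan2(abs(cross), dot))
    return input_distance, theta, database_distance
```

For example, `criteria((8.0, 120.0), (8.5, 120.0), (8.0, 121.0))` gives an input distance of 0.5, a database distance of 1.0 and a Theta of 90 degrees, because the input change is purely in HN and the target change purely in N.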

For residues that have reference chemical shifts, corresponding chemical shift vectors can be defined for both the input and target proteins. Trigonometry is then used to calculate the component of the input vector projected onto the target vector. The Model Predict value is the reference score (e.g., interhelical angle) plus the difference between the reference and target scores multiplied by the calculated component.
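One way to read the Model Predict calculation is as a vector projection. The Python sketch below assumes the "calculated component" is the projection of the input change vector `v_in` onto the target change vector `v_tgt`, expressed as a fraction of the target vector's length, so an input change identical to the target change gives a component of 1 and reproduces the target score. It also assumes each score has already been reduced to a single number (for example, the midpoint of a score_range). The function names are hypothetical. The final prediction is the mean of the Model Predict values over the selected residues, as described at the start of this section.

```python
from statistics import mean

def projection_component(v_in, v_tgt):
    """Projection of v_in onto v_tgt, as a fraction of |v_tgt| (assumed normalization)."""
    norm_sq = v_tgt[0] ** 2 + v_tgt[1] ** 2
    if norm_sq == 0:
        raise ValueError("target change vector has zero length")
    return (v_in[0] * v_tgt[0] + v_in[1] * v_tgt[1]) / norm_sq

def model_predict(v_in, v_tgt, ref_score, target_score):
    """Reference score plus the reference-to-target score difference scaled by the component."""
    c = projection_component(v_in, v_tgt)
    return ref_score + c * (target_score - ref_score)

def final_prediction(selected):
    """Average Model Predict value over the selected residues.
    selected: iterable of (v_in, v_tgt, ref_score, target_score) tuples."""
    return mean(model_predict(*row) for row in selected)
```

For example, with `v_in = (1.0, 2.0)` and `v_tgt = (2.0, 4.0)` the component is 0.5, so a reference score of 100 and a target score of 80 give a Model Predict value of 90.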

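The overall Rank of a residue (column 2) is calculated as a weighted average of the above criteria. Unless weights are explicitly set in the defaults file, each criterion carries equal weight; with equal weights, ordering by the sum of the criterion ranks gives the same result as ordering by their average. For example, residue 64 is ranked #1 because its criterion ranks sum to 26+2+1+6=35, and residue 37 is next at 3+16+2+33=54.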
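The rank combination can be sketched in Python as follows. Each criterion is first converted to rank positions (1 = best), and the positions are then combined with per-criterion weights, equal by default. The function names, and the choice of which direction counts as "better" for each criterion, are assumptions for illustration; the per-criterion ranks in the usage lines are the ones quoted above for residues 64 and 37.

```python
def rank_positions(values, higher_is_better=True):
    """Map each residue to its 1-based rank position for one criterion.
    values: {residue_number: criterion_value}"""
    ordered = sorted(values, key=values.get, reverse=higher_is_better)
    return {res: i + 1 for i, res in enumerate(ordered)}

def overall_rank(criterion_ranks, weights=None):
    """Combine per-criterion rank positions into an overall ordering.
    criterion_ranks: list of {residue: rank} dicts, one per criterion.
    Returns (residues sorted best first, {residue: weighted rank sum})."""
    weights = weights or [1.0] * len(criterion_ranks)  # equal weighting by default
    residues = criterion_ranks[0].keys()
    totals = {
        r: sum(w * ranks[r] for w, ranks in zip(weights, criterion_ranks))
        for r in residues
    }
    return sorted(totals, key=totals.get), totals  # lower weighted sum ranks higher

# Criterion ranks quoted in the text: input distance, theta, database distance, variance.
ranks = [{64: 26, 37: 3}, {64: 2, 37: 16}, {64: 1, 37: 2}, {64: 6, 37: 33}]
order, totals = overall_rank(ranks)
print(order, totals)  # → [64, 37] {64: 35.0, 37: 54.0}
```

A weighted sum is used rather than a weighted average because dividing every total by the same sum of weights would not change the ordering.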

Is the variance that orbplus uses for ranking simply the sum of the standard deviations in the x and y coordinates across all database entries, with residues then ranked by lowest overall standard deviation? Yes, except that all entries are first scaled so that the averaged vector has the same length for each residue. Without this scaling, the results are biased toward residues that are not undergoing any change. Scaling does magnify the measurement error, however, so residues with small shifts are negatively biased.

How is the averaged chemical shift vector calculated? For a particular residue, it is the average of the N and HN shift values over all available and active database entries for that residue.
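The scaled variance described in the two answers above can be sketched in Python. Assumptions: the vectors for a residue are the (HN, N) vectors of all active database entries, the scale factor is the length of their averaged vector (so the averaged vector has unit length after scaling), and the sample standard deviation is used. orbplus may differ in these details, and the function names are hypothetical.

```python
import math
from statistics import mean, stdev

def averaged_vector(vectors):
    """Average (HN, N) vector over all active database entries for one residue."""
    return (mean([v[0] for v in vectors]), mean([v[1] for v in vectors]))

def scaled_spread(vectors):
    """Sum of the x and y sample standard deviations after dividing every entry
    by the length of the averaged vector, so each residue's averaged vector
    has length 1 before the spread is measured."""
    if len(vectors) < 2:
        raise ValueError("need at least two entries to estimate a spread")
    length = math.hypot(*averaged_vector(vectors))
    if length == 0:
        raise ValueError("averaged vector has zero length; cannot scale")
    scaled = [(x / length, y / length) for x, y in vectors]
    return stdev([x for x, _ in scaled]) + stdev([y for _, y in scaled])
```

Because the entries are scaled first, doubling every vector for a residue leaves its spread unchanged, so a residue is not penalized merely for the size of its shifts. Residues would then be ranked from lowest to highest spread.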

Discussion

At this stage, the prediction calculation in orbplus is a simplification, essentially a starting point for the user to explore chemical shift patterns that are correlated or anti-correlated with structure. For this reason, calculating an error for the prediction would be virtually meaningless. orbplus is a visualization tool; the hope is that presenting basic but reasonable criteria, combined with residue locality and user knowledge, leads to useful insights.

For a discussion of how orbplus and PLS regression analysis compare, please see the orbplus publication.


Appendix 1: Troubleshooting

  1. Note: If orbplus has any problems starting up, or you want to check that all chemical shift files were read correctly, see the orbplus.log file in the output directory. If orbplus has started, the log file can also be viewed by clicking 'Show log file' in the 'Output' menu.

  2. "**Warning: Cannot set vector for residue=n": This message occurs if residue n is not assigned in the reference protein.

  3. Note: If you click the green ball in a window title to expand the window to full size, the window is not automatically raised to the top. A simple click on the expanded window raises it to the top.

  4. When contacting the authors with problems, please send us the orbplus.log file.