orbplus
Brian Sykes Lab

Version: 1.0.6 - Aug 24 / 2010
Download and Installation

Purpose: Structural Feature Prediction Tool

Table of Contents

  1. Introduction

  2. Data

  3. Using the Software

  4. Calculation Details and Discussion

Overview


Authors and References


Download and Installation

This software, written in Tcl/Tk, is built as a self-contained starkit, which should make installation straightforward.

  1. First download the software appropriate for your machine. We support:

  2. You should also download our set of data and examples (1.1 MB) in order to understand and test the software.

  3. If your browser did not automatically un-tar the download file, you can use a command line like:
    	gunzip orbplus-1.0-x.tar.gz
    	tar xf orbplus-1.0-x.tar
    
    or try double clicking the *.gz file on your desktop and see if the OS knows what to do.

  4. Make sure the software starts up. For Mac OS X, you should be able to start the program by double-clicking the orbplus icon now shown on your desktop (or in your downloads directory). I can drag the orbplus icon into /Applications and have it work, but I am not sure whether this is universally true.

    For Linux, click on the orbplus icon in the downloaded folder.

    If orbplus does not start up, make sure you have the correct download for your machine. Once it starts, quit the program and proceed to the Data Setup section.


Data Setup

    There are several ways to configure data for this software. New users should download and untar the data and examples directory to follow the examples presented here.

  1. Create a working directory and a directory for your database. We have arbitrarily chosen to call the working directory "l48q-1" and the database "Cardiac.db".
    > mkdir l48q-1 
    > mkdir Cardiac.db 
    

  2. Copy your input data (the protein you wish to analyze) to the working directory. In our example, this file is l48q.xpk. orbplus will accept the following input data formats:

    Currently the software does not parse input files to determine the format. You must specify the correct file suffix when entering data. If you forget to do this, then orbplus will complain that it cannot figure out how to read your input.

    It is not necessary to have all atoms assigned; the software only looks for the assigned atoms. The default atom names are "HN", "H", and "N" (case does not matter). See the Startup section if you are planning to work with other atoms or atom names.

  3. Copy the proteins used for comparison purposes to the database directory. These may include assigned (or partially assigned) proteins with residues in various global or regional states. Here is a description of the various states of the proteins in the Cardiac.db from data and examples.

    orbplus will accept the same input formats as the input data described above. In the next section, we describe how we quantify the state change.

  4. Within the database directory, create an index file which indicates the relationship between various protein states in the database. Here is an example of how that file may look:
    	# Title for modelling this activity change
    	MODEL Percent Open
    	
    	# database entries (only 2) 
    	
    	PROTEIN     w7
    	STATE1      closed.xpk      1-45:0 46-90:50
    	STATE2      w7.xpk          1-45:50 46-90:90
    	END
    	
    	PROTEIN     bep
    	STATE1      closed.xpk      1-45:0 46-90:50
    	STATE2      bep.xpk         1-45:100 46-90:100
    	END
    
    
    In the example above, we have chemical shifts for two different proteins in closed and open states. It is believed that residues 46-90 of the closed state wild-type protein (closed.xpk) are somewhat more open than the 1-45 residue region. W7 is more open, and Bep (bepridil) is completely open. It is the job of the user to define the interesting regions within a protein and to quantify the differences between STATE1 and STATE2. In the above example, a completely closed structure is scored 0 and a completely open structure is scored 100.

    Defining and scoring regions requires a fair amount of expertise and insight from the user. Other examples to examine include:

    Here is a more formal explanation of the lines in this file:

    	MODEL  title
    	PROTEIN  name
    	STATE1 file_name residue_range:score residue_range:score ...
    	STATE2 file_name residue_range:score residue_range:score ...
    	END  
    

    MODEL is an arbitrary name for the experiment type, and PROTEIN is the name of the protein. Blank lines are ignored as well as lines which start with "#".

  5. Optional: Copy to your working directory a structural model in the form of a pdb file which best exemplifies your input protein (e.g., a closed state protein like Cardiac.db/closed.pdb). The PyMOL software can then be used to highlight residues chosen by the software on a structural model.
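
The index file format described in step 4 is simple enough to check with a short script before starting orbplus. The following is a minimal Python sketch, not the parser orbplus itself uses (orbplus is written in Tcl/Tk). The names `State`, `Protein`, and `parse_index` are hypothetical, and the handling of a single-residue range such as `45:50` is an assumption, since the manual only shows ranges of the form `first-last:score`.

```python
from dataclasses import dataclass, field


@dataclass
class State:
    file_name: str
    # Maps an inclusive (first, last) residue range to its score.
    scores: dict = field(default_factory=dict)


@dataclass
class Protein:
    name: str
    # Maps "STATE1" / "STATE2" to a State.
    states: dict = field(default_factory=dict)


def parse_index(text):
    """Parse index-file text into (model_title, list_of_proteins)."""
    model = None
    proteins = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and "#" comment lines are ignored
        keyword, *rest = line.split()
        if keyword == "MODEL":
            model = " ".join(rest)
        elif keyword == "PROTEIN":
            current = Protein(name=" ".join(rest))
        elif keyword in ("STATE1", "STATE2"):
            if current is None:
                raise ValueError(f"{keyword} found outside a PROTEIN block")
            state = State(file_name=rest[0])
            for item in rest[1:]:
                residues, score = item.split(":")
                # Assumption: a bare residue number means a one-residue range.
                first, _, last = residues.partition("-")
                state.scores[(int(first), int(last or first))] = float(score)
            current.states[keyword] = state
        elif keyword == "END":
            if current is None:
                raise ValueError("END found without a matching PROTEIN")
            proteins.append(current)
            current = None
        else:
            raise ValueError(f"unrecognized keyword: {keyword}")
    return model, proteins


example = """\
# Title for modelling this activity change
MODEL Percent Open

PROTEIN     w7
STATE1      closed.xpk      1-45:0 46-90:50
STATE2      w7.xpk          1-45:50 46-90:90
END
"""
model, proteins = parse_index(example)
print(model, proteins[0].name, proteins[0].states["STATE2"].scores)
```

A script like this is only useful as a sanity check on the file layout; orbplus may accept or reject inputs that this sketch treats differently.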

Startup


Input


The Residue Tableau


Calculation Results


Spectral Plots


The Target Protein


The Chemical Shift Profile Histogram


Display Residues in pymol


Prediction Calculation

orbplus will attempt to estimate the close/open property of an input protein by comparing the corresponding chemical shifts from a subset of residues in the target protein. First, the software attempts to rank the residues from best to worst predictive value to aid in the residue selection process. The user will then decide how many top residues to include or perhaps use the information provided to manually make a different selection.

Residues are ranked on the following criteria.

  1. The magnitude of the chemical shift change from the closed to open state of the input protein. This assumes it is reasonable to use the closed state chemical shift data as a reference starting point for the input protein.

  2. The directional chemical shift change between the input and target proteins. Again, this assumes it is reasonable to use the closed state chemical shift data as a reference starting point for the input protein.

  3. The magnitude of the chemical shift change from the closed to open state of the target protein.

  4. The variance of all the chemical shift changes from the closed to open state of an Averaged target protein.

The residues are ranked from best to worst on each criterion, and a residue selection table similar to the one below is shown. Columns 4 - 7 respectively reflect the criteria mentioned above. The overall rank of a residue (see column 2) is calculated as a weighted average of the individual ranks of columns 4 - 7. Unless explicitly set in the defaults file, each criterion carries equal weight. Column 3 is an estimate of the percent open score, calculated as the component of the input close/open vector projected onto the target vector. Column 8 is the number of chemical shift data points available in the target protein for computing the variance.


 Figure 10 

As further explanation, here is an example of how one can analyze the above table. Residue 29 is ranked #1 because 3+2+9+5=19, and residue 28 is next at 1+12+1+31=45. Although residue 28 has the greatest chemical shift change, it is somewhat penalized for the variance (31/82) of its chemical shifts in the Averaged target protein. Also of note is residue 27, which has only two assigned chemical shifts in the Averaged target protein, and they are not close together (66/82). Note that comparing the variance of populations of unequal sample size involves multiplying by the appropriate t-distribution value.
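
The arithmetic in this example can be reproduced with a few lines of Python. This is a sketch under stated assumptions rather than the orbplus implementation: the function name `overall_score` and the `weights` parameter are hypothetical, and the sketch uses a weighted sum, which gives the same ordering as a weighted average because every residue has the same number of criteria.

```python
def overall_score(ranks, weights=None):
    """Combine per-criterion ranks (columns 4-7) into one score; lower is better."""
    if weights is None:
        # Equal weighting is the documented default.
        weights = [1.0] * len(ranks)
    return sum(w * r for w, r in zip(weights, ranks))


# Per-criterion ranks quoted in the text for residues 29 and 28.
criterion_ranks = {29: (3, 2, 9, 5), 28: (1, 12, 1, 31)}

scores = {res: overall_score(r) for res, r in criterion_ranks.items()}
ordering = sorted(scores, key=scores.get)

print(scores)    # {29: 19.0, 28: 45.0}
print(ordering)  # [29, 28]
```

Passing a different `weights` list, for example one that emphasizes the variance criterion, illustrates how unequal weights in the defaults file could change the ordering.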

The overall Percent Open value as shown on the main panel (see figure 2) is a simple unweighted average of column 3 in the residue selection table.
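
As an illustration of the projection and averaging steps, here is a minimal Python sketch. It assumes that each residue is represented by a pair of chemical shifts (x, y), that the closed state is scored 0 and the target's open state is scored 100, and that no per-axis scaling is applied; none of these details are confirmed by this manual, and the function names and shift values are invented for the example.

```python
def percent_open(closed, input_shift, target_shift):
    """Component of the input change along the target change, as a percentage."""
    in_vec = (input_shift[0] - closed[0], input_shift[1] - closed[1])
    tg_vec = (target_shift[0] - closed[0], target_shift[1] - closed[1])
    tg_len_sq = tg_vec[0] ** 2 + tg_vec[1] ** 2
    if tg_len_sq == 0:
        raise ValueError("target shows no chemical shift change for this residue")
    dot = in_vec[0] * tg_vec[0] + in_vec[1] * tg_vec[1]
    return 100.0 * dot / tg_len_sq


def overall_percent_open(per_residue_values):
    """Simple unweighted average of the per-residue estimates (column 3)."""
    return sum(per_residue_values) / len(per_residue_values)


# Invented shifts for two selected residues: (closed, input, target).
residues = [
    ((8.0, 120.0), (8.25, 121.0), (8.0, 122.0)),
    ((7.5, 115.0), (7.5, 116.5), (7.5, 117.0)),
]
values = [percent_open(c, i, t) for c, i, t in residues]
print(values)                        # [50.0, 75.0]
print(overall_percent_open(values))  # 62.5
```

Note that an input change pointing away from the target direction yields a negative value in this sketch, and a change longer than the target's yields a value above 100.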


Future Development and Conclusions

  1. A more complex algorithm for inferring structural properties from chemical shifts alone is likely not warranted at this time. It can be argued that the software could do a better job of prediction if we gave higher weights to higher ranked residues, weighted the database proteins unequally, or devised formulas that work with scores rather than ranks. However, the main goal of the software is to present a simple model which a user can understand.

  2. It is also postulated that a comprehensive pairwise comparison of all database entries would allow the software to make reasonable estimations for each protein close/open state, once a reference protein is established. For example, structural models of protein A and protein B may lead one to conclude that both are completely open, yet the chemical shift movement in key residues may follow a discernible pattern which suggests that one is less open than the other.

  3. Although the initial impetus for this software was to predict close/open states of proteins with assigned chemical shift data, other interesting structural questions involving transitional states (e.g., binding/non-binding) may also be readily addressed with this software.


Questions

  1. Is the variance that orbplus uses to rank the database entries simply determined by summing the standard deviations in the x and y coordinates between all entries, with residues then ranked by lowest overall standard deviation?

    Yes, except that all entries are scaled first so that the averaged vector is the same length for each residue. Without this scaling, the results would be biased toward residues that are not undergoing any change. Scaling does have the effect of magnifying the measurement error, so residues with smaller shifts are negatively biased. Given the current use of orbplus, there is no impetus to reconcile this bias.

  2. How is the average vector calculated? What angle is associated with the vector?

    The software assumes that all input and database data have a common closed state (as identified in the database index file). Given this assumption, the average vector for a particular residue is the average of the x and y coordinates of all the available and active data for that residue in the database. Once the average vector has been calculated, finding its angle with the input data vector is standard trigonometry.
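
The averaging and angle steps described in the second answer can be sketched in Python as follows. This is an illustration only: the function names are hypothetical, the vectors are assumed to be closed-to-open changes already expressed relative to the common closed state, and the numbers are invented.

```python
import math


def average_vector(vectors):
    """Mean of the (x, y) change vectors available for one residue."""
    n = len(vectors)
    return (sum(v[0] for v in vectors) / n, sum(v[1] for v in vectors) / n)


def angle_between(u, v):
    """Unsigned angle in degrees between two 2-D vectors."""
    cross = u[0] * v[1] - u[1] * v[0]
    dot = u[0] * v[0] + u[1] * v[1]
    return math.degrees(math.atan2(abs(cross), dot))


# Invented closed-to-open changes for one residue in two database entries.
database_changes = [(1.0, 0.0), (1.0, 2.0)]
avg = average_vector(database_changes)

# Invented closed-to-input change for the same residue.
input_change = (1.0, 0.0)

print(avg)                               # (1.0, 1.0)
print(angle_between(input_change, avg))  # approximately 45 degrees
```

Using `atan2` on the cross and dot products avoids the rounding problems that `acos` can have when the two vectors are nearly parallel.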