orbplus - Structural Feature Prediction Tool

orbplus
Brian Sykes Lab

Version: 1.0.6 - Aug 24, 2010
Download and Installation

Purpose: Structural Feature Prediction Tool

Introduction
Data
- Data Setup
Using the Software
Calculation Details and Discussion

Overview

- Estimate how open or closed the input protein is
- Estimate and analyze interhelical angles for each residue in the input protein
- Binding affinity

First, the amino acid chemical shifts of the input protein are correlated and ranked against the target. Then, those amino acids with the highest correlation criteria are selected to predict an overall property value for the protein.

Although the software can do this calculation instantaneously, the usefulness of the software lies in the presentation of visual tools (spectral plots, chemical shift histograms, pymol modelling) which allow the user to confidently and quickly evaluate prediction results.

Authors and References

Authors:

Download and Installation

This software is written in tcl/tk and built as a self-contained starkit, which should make installation straightforward.

First download the software appropriate for your machine. We support:
- MacOSX (5.6 MB) - Mac (intel and ppc)
- Linux x86 (4.6 MB) - Intel x86 32 bit
- Linux x86_64 (4.6 MB) - Intel x86 64 bit
You should also download our set of data and examples (1.1 MB) in order to understand and test the software.
If your browser did not automatically un-tar the download file, you can use a command line like:
```
	gunzip orbplus-1.0-x.tar.gz
	tar xf orbplus-1.0-x.tar
```
or try double clicking the *.gz file on your desktop and see if the OS knows what to do.
Make sure the software starts up. On MacOSX, you should be able to start the program by double clicking the orbplus icon now shown on your desktop (or in your downloads directory). Dragging the orbplus icon into /Applications and starting it from there has also worked for us, but we have not confirmed that this works on every system.
For linux, click on the orbplus icon in the downloaded folder.
If orbplus does not start up, make sure you have the correct download for your machine. Once it starts, quit the program and proceed to the Data Setup section.

Data Setup

There are several ways to configure data for this software. New users should download and untar the data and examples.

Create a working directory and a directory for your database. We have arbitrarily chosen to call the working directory "l48q-1" and the database "Cardiac.db".
```
> mkdir l48q-1 
> mkdir Cardiac.db 
```
Copy your input data (the protein you wish to analyze) to the working directory. In our example, this file is l48q.xpk. orbplus will accept the following input data formats:
- xpk - nmrview peaklist file
- ppm.out (nv5) - nmrview chemical shift file
- ppm.out (nvj) - nmrviewj chemical shift file
- bmrb - biomagresbank deposit format.
Currently the software does not parse input files to determine the format. You must specify the correct file suffix when entering data. If you forget to do this, then orbplus will complain that it cannot figure out how to read your input.
It is not necessary to have all atoms assigned; the software only looks for the assigned atoms. The default atom names are "HN", "H", and "N" (case does not matter). See the Startup section if you are planning to work with other atoms or atom names.
Copy the proteins used for comparison purposes to the database directory. This may include assigned (or partially assigned) proteins with residues in various global or regional states. Here is a description of the various states of the proteins in the Cardiac.db directory from the data and examples download:
- proteins which are believed to be in a closed state, eg closed.xpk
- proteins which are believed to be in a partially open state, eg cSp.xpk
- proteins which are believed to be in the most open state, eg bep.xpk
- proteins where a particular region has a distinct change in open/close state, eg w7.xpk
Database files may be in any of the input data formats described above. In the next section, we describe how we quantify the state change.
Within the database directory, create an index file which indicates the relationship between various protein states in the database. Here is an example of how that file may look:
```
	# Title for modelling this activity change
	MODEL Percent Open
	
	# database entries (only 2) 
	
	PROTEIN     w7
	STATE1      closed.xpk      1-45:0 46-90:50
	STATE2      w7.xpk          1-45:50 46-90:90
	END
	
	PROTEIN     bep
	STATE1      closed.xpk      1-45:0 46-90:50
	STATE2      bep.xpk         1-45:100 46-90:100
	END
```
In the example above we have chemical shifts for 2 different proteins in closed and open states. It is believed that residues 46-90 of the closed state wild type protein (closed.xpk) are somewhat more open than the 1-45 residue region. W7 is more open and Bep (bepridil) is completely open. It is the job of the user to define the interesting regions within a protein and to quantify the differences between STATE1 and STATE2. In the above example, a completely closed structure is scored 0 and completely open is scored 100.
Defining and scoring regions requires a fair amount of expertise and insight from the user. Other examples to examine include:
- Cardiac.db/Index.oc - the above example using more proteins in the database
- Cardiac.db/Index.oc.format - demonstrates other input formats and no regions
- Cardiac.db/Index.iha - a more data oriented approach to open/close states using interhelical angles
- Cardiac.db/Index.iha.simple - interhelical angles without defined regions
Here is a more formal explanation of the lines in this file:
```
	MODEL  title
	PROTEIN  name
	STATE1 file_name residue_range:score residue_range:score ...
	STATE2 file_name residue_range:score residue_range:score ...
	END  
```
MODEL is an arbitrary name for the experiment type, and PROTEIN is the name of the protein. Each STATE1 or STATE2 line names a chemical shift file, optionally followed by residue_range:score pairs. Blank lines and lines which start with "#" are ignored.
Optional: Copy to your working directory a structural model in the form of a pdb file which best exemplifies your input protein (eg, a closed state protein like Cardiac.db/closed.pdb). The pymol software can then be used to highlight the residues chosen by the software on a structural model.
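The index file format described above is simple enough to read with a few lines of code. The following is a minimal Python sketch of such a reader, not the parser used by orbplus (which is written in tcl/tk). The names `State`, `score_for` and `parse_index` are hypothetical, and the sketch assumes that residue ranges are always written as `first-last` and that scores are plain numbers.

```python
from dataclasses import dataclass, field


@dataclass
class State:
    file_name: str
    # (first_residue, last_residue, score) tuples; empty when no regions are given
    regions: list = field(default_factory=list)

    def score_for(self, residue):
        """Return the score of the region containing `residue`, or None."""
        for first, last, score in self.regions:
            if first <= residue <= last:
                return score
        return None


def parse_index(text):
    """Parse index-file text into (model_title, {protein: {"STATE1": State, ...}})."""
    model = None
    proteins = {}
    current = None  # name of the PROTEIN block currently being read
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comment lines are ignored
        parts = line.split(None, 1)
        keyword = parts[0]
        rest = parts[1].strip() if len(parts) > 1 else ""
        if keyword == "MODEL":
            model = rest
        elif keyword == "PROTEIN":
            current = rest
            proteins[current] = {}
        elif keyword in ("STATE1", "STATE2"):
            if current is None:
                raise ValueError(f"{keyword} found outside a PROTEIN block")
            file_name, *tokens = rest.split()
            regions = []
            for token in tokens:
                span, score = token.split(":")
                first, last = span.split("-")
                regions.append((int(first), int(last), float(score)))
            proteins[current][keyword] = State(file_name, regions)
        elif keyword == "END":
            current = None
        else:
            raise ValueError(f"unrecognized line: {line!r}")
    return model, proteins


example = """
# Title for modelling this activity change
MODEL Percent Open

PROTEIN     w7
STATE1      closed.xpk      1-45:0 46-90:50
STATE2      w7.xpk          1-45:50 46-90:90
END
"""
model, proteins = parse_index(example)
print(model, proteins["w7"]["STATE2"].score_for(60))
```

Here residue 60 of w7.xpk falls in the 46-90 region, so the lookup returns that region's score. A residue outside every listed region returns None, which also covers index files that define no regions at all.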

Startup

As mentioned in the installation, the user can click on the orbplus icon or invoke orbplus from a terminal window.

If you invoke orbplus via the command line then you have the choice of entering command line arguments:


orbplus [-defaults defaultsFile] [-db dbIndexFile] [-data inputFile]

	defaultsFile: custom program preferences file. 
	dbIndexFile: database index file as described in data setup section
	inputFile: input data to analyze
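When orbplus is started from a script rather than typed by hand, the argument list can be assembled programmatically. The Python sketch below only builds and prints the command line from the three options listed above. The helper name `orbplus_command` is hypothetical, the file paths are illustrative locations inside the data and examples download, and the sketch assumes that `orbplus` is on your PATH.

```python
import shlex


def orbplus_command(defaults=None, db=None, data=None, program="orbplus"):
    """Build the orbplus argument list; every option may be omitted."""
    args = [program]
    if defaults is not None:
        args += ["-defaults", defaults]
    if db is not None:
        args += ["-db", db]
    if data is not None:
        args += ["-data", data]
    return args


cmd = orbplus_command(
    defaults="orbplus.data/l48q-2/defaults",   # assumed location of the example defaults file
    db="orbplus.data/Cardiac.db/Index.oc",
    data="orbplus.data/l48q-2/l48q.xpk",
)
print(shlex.join(cmd))
# To actually launch the program: subprocess.run(cmd, check=True)
```

Keeping the arguments as a list avoids quoting problems when a path contains spaces.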

The advantage of the command line is that you can specify the inputs and customize the software for specific environments. Here is a commented version from

l48q-2/defaults

startup script

orbplus

defaults file

lib/defaults

http://www.tcl.tk/man/tcl/TkCmd/colors.htm

Input

This section is required if you have not told orbplus where to find the input data via the defaults file as described above.

Enter the input protein and database index fields as shown in the example below. Use the Browse button to search for these files on your computer if necessary. In the example below, the database index file is found in orbplus.data/Cardiac.db/Index.oc and input protein chemical shifts are in orbplus.data/l48q-2/l48q.xpk.


 Figure 1

The Residue Tableau

After the software successfully reads in the input, it places all the residues in the target protein in a tableau and highlights (in red) those residues which are determined to be the best for prediction purposes. See the Prediction Calculation section.


 Figure 2

Top Residues

prediction calculation

Manual Select

Calculation Results

As residues are selected/unselected, users will see a summary of how each residue contributes to the overall prediction score.


 Figure 3

A full explanation of this table is given in the Prediction Calculation section.

Spectral Plots

Averaged


 Figure 4

The Target Protein

Often it is insightful to change which proteins in the database are included in the formation of the target protein. For example, in the spectral plot above a user may wonder if the f77w-v82a protein unduly biases the calculated Averaged chemical shift. Under the Database menu in the main orbplus window, the user can checkmark the proteins to use in the database.


 Figure 5

The diagram below shows the new target protein for residue 28 with the unselected f77w-v82a protein in pink and the new value of the Averaged chemical shift (compare with figure 4).


 Figure 6

Users may also be interested in the role an individual target protein plays in the residue selection of the prediction calculation. The following set of buttons allows the user to cycle through the database (eg, Next, Previous or return to Averaged) to gain more insight into the residue ranking process. All calculations and windows are updated automatically.


 Figure 7

Averaged

The Chemical Shift Profile Histogram

The histogram attempts to summarize the information contained in each expanded spectral plot for all residues.

The colors blue and red indicate the magnitude of the chemical shift change of the target and input protein respectively. The white and black lines found within each histogram bar signify the direction of the input protein chemical shift with respect to the target protein. White is a positive correlation, black is negative. In the example below, one can see regions of large chemical shifts with high correlation (ie, residues 27-35) but perhaps not surprisingly there are regions of anti-correlation (ie, 38-51) which may also have statistical and structural significance.


 Figure 8

The chemical shift profile as depicted in the histogram can reveal interesting trends as the user cycles through the database of target proteins. For example, the profile of the l48q input with the f77w-v82a database protein is strikingly different from the Averaged or other target proteins (compare figures 8 and 9).


 Figure 9

Display Residues in pymol

The default location for the structural display software is in /usr/local/bin/pymol . A user that prefers pymol to be in a different location should use a custom defaults file like

l48q-2/defaults

lib/defaults

orbplus

Press the Structural Display button to see a spatial representation of the selected residues on a pdb structure. Unless you have copied the pdb file to your current working directory, the software allows you to browse the computer to find the file. If you have a pdb structure, now is a good time to analyze the relevance of the residues selected in the prediction calculation.

Prediction Calculation

orbplus will attempt to estimate the close/open property of an input protein by comparing the corresponding chemical shifts from a subset of residues in the target protein. First, the software attempts to rank the residues from best to worst predictive value to aid in the residue selection process. The user will then decide how many top residues to include or perhaps use the information provided to manually make a different selection.

Residues are ranked on the following criteria:
- The magnitude of the chemical shift change from the closed to open state of the input protein. This assumes it is reasonable to use the closed state chemical shift data as a reference starting point for the input protein.
- The direction of the chemical shift change of the input protein relative to the target protein. Again, this assumes it is reasonable to use the closed state chemical shift data as a reference starting point for the input protein.
- The magnitude of the chemical shift change from the closed to open state of the target protein.
- The variance of all the chemical shift changes from the closed to open state of an Averaged target protein.

The residues are ranked from best to worst on each criterion and a residue selection table similar to the one below is shown. Columns 4 - 7 respectively reflect the criteria mentioned above. The overall rank of a residue (see column 2) is calculated as a weighted average of the individual ranks of columns 4 - 7. Unless explicitly set in the defaults file, each criterion carries equal weight. Column 3 is an estimate of the percent open score, calculated as the component of the input close/open vector projected onto the target vector. Column 8 is the number of chemical shift data points available in the target protein for computing the variance.


 Figure 10

As further explanation, here is an example of how one can analyze the above table. Residue 29 is ranked #1 because 3+2+9+5=19 and residue 28 is next at 1+12+1+31=45. Although residue 28 has the greatest chemical shift change, it is somewhat penalized for the variance (31/82) of its chemical shifts in the Averaged target protein. Also of note is residue 27 which only has two assigned chemical shifts in the Averaged target protein and they are not close together (66/82). Note that comparing the variance of populations of unequal sample size involves multiplying by the appropriate t Distribution value.
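The rank arithmetic in this example can be reproduced with a short Python sketch. It is an illustration of a weighted average of ranks, not the orbplus source. The function name `overall_ranking` is hypothetical, the four ranks are assumed to be listed in column order (4 - 7), and how orbplus breaks ties is not described here, so none is attempted.

```python
def overall_ranking(criterion_ranks, weights=(1.0, 1.0, 1.0, 1.0)):
    """Order residues by the weighted average of their per-criterion ranks.

    criterion_ranks maps residue number -> ranks for columns 4, 5, 6 and 7.
    Returns a list of (residue, weighted_average_rank), best residue first.
    """
    total_weight = sum(weights)
    scored = []
    for residue, ranks in criterion_ranks.items():
        average = sum(w * r for w, r in zip(weights, ranks)) / total_weight
        scored.append((residue, average))
    return sorted(scored, key=lambda item: item[1])


ranks = {
    28: (1, 12, 1, 31),  # rank sum 45
    29: (3, 2, 9, 5),    # rank sum 19
}
for residue, average in overall_ranking(ranks):
    print(residue, average)
# 29 4.75
# 28 11.25
```

With equal weights, ordering by the average rank is the same as ordering by the rank sums 19 and 45 quoted above. Passing different weights mimics overriding the weighting in the defaults file.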

The overall Percent Open value as shown on the main panel (see figure 2) is a simple unweighted average of column 3 in the residue selection table.
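As a rough illustration of column 3 and of this average, the Python sketch below projects an input shift-change vector onto a target shift-change vector and averages the per-residue results. It rests on several assumptions that are not stated in this document: each vector is a two-component change measured from the common closed state, the target vector spans the score range from `closed_score` to `open_score` (0 and 100 by default), and the projected fraction maps linearly onto that range. The function names are hypothetical.

```python
def percent_open(input_vec, target_vec, closed_score=0.0, open_score=100.0):
    """Project the input shift change onto the target shift change and
    map the projected fraction linearly onto the score range."""
    ix, iy = input_vec
    tx, ty = target_vec
    target_sq = tx * tx + ty * ty
    if target_sq == 0.0:
        raise ValueError("target vector has zero length")
    fraction = (ix * tx + iy * ty) / target_sq
    return closed_score + fraction * (open_score - closed_score)


def overall_percent_open(selected):
    """Unweighted mean of the per-residue estimates.

    selected is a list of (input_vec, target_vec) pairs, one per selected residue.
    """
    values = [percent_open(i, t) for i, t in selected]
    return sum(values) / len(values)


selected = [
    ((0.5, 0.0), (1.0, 0.0)),  # input moves half as far as the target
    ((0.0, 2.0), (0.0, 2.0)),  # input moves exactly as far as the target
]
print(overall_percent_open(selected))
# 75.0
```

A projection can fall below 0 or above 100 when the input moves against the target direction or farther than the target. This sketch does not clamp such values.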

Future Development and Conclusions

A more complex algorithm for inferring structural properties based on chemical shift only is likely not warranted at this time. It can be argued that the software can do a better job of prediction if we give higher weights to higher ranked residues, or weight the database proteins unequally, or devise formulas that work with scores and not ranks. However the main goal of the software is to present a simple model which a user can understand.
It is also postulated that a comprehensive pairwise comparison of all database entries would allow the software to make reasonable estimates of the close/open state of each protein, once a reference protein is established. For example, structural models of protein A and protein B may lead one to conclude that both are completely open, yet the chemical shift movement in key residues may follow a discernible pattern which suggests that one is less open than the other.
Although the initial impetus for this software was to predict close/open states of proteins with assigned chemical shift data, other interesting structural questions involving transitional states (eg, binding/non-binding) may also be readily addressed with this software.

Questions

Is the variance that orbplus uses to rank the database entries simply determined by summing the standard deviations in the x and y coordinates across all entries, with residues then ranked by lowest overall standard deviation? Yes, except that all entries are scaled first so that the averaged vector is the same length for each residue. If you don't do this, then you bias your results toward residues that are not undergoing any change. Scaling does have the effect of magnifying the measurement error, so residues with smaller shifts are negatively biased. Given the current use of orbplus, there is no impetus to reconcile this bias.
How is the average vector calculated? What angle is associated with the vector? The software assumes that all input and database data have a common closed state (as identified in the database index file). Given this assumption, the average vector for a particular residue is the average of the x and y coordinates of all the available and active data for that residue in the database. Once you have calculated the average vector, finding the angle with the input data vector is standard trigonometry.
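The two answers above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the orbplus implementation: each database entry contributes one two-component shift-change vector per residue measured from the common closed state, the sample standard deviation is used (orbplus may use a different estimator), and the target length of 1.0 is arbitrary. The function names are hypothetical.

```python
import math
import statistics


def average_vector(vectors):
    """Mean of the x and y components of the database shift-change vectors."""
    xs = [v[0] for v in vectors]
    ys = [v[1] for v in vectors]
    return (sum(xs) / len(xs), sum(ys) / len(ys))


def angle_between(a, b):
    """Unsigned angle in degrees between two 2-D vectors."""
    dot = a[0] * b[0] + a[1] * b[1]
    cross = a[0] * b[1] - a[1] * b[0]
    return math.degrees(math.atan2(abs(cross), dot))


def scaled_spread(vectors, length=1.0):
    """Sum of the x and y standard deviations after rescaling all entries
    so that their average vector has the given length.
    Needs at least two entries for the standard deviation."""
    ax, ay = average_vector(vectors)
    norm = math.hypot(ax, ay)
    if norm == 0.0:
        raise ValueError("average vector has zero length")
    factor = length / norm
    xs = [v[0] * factor for v in vectors]
    ys = [v[1] * factor for v in vectors]
    return statistics.stdev(xs) + statistics.stdev(ys)


database = [(1.0, 0.0), (0.0, 1.0)]      # two database entries for one residue
target = average_vector(database)        # (0.5, 0.5)
print(target, angle_between((1.0, 0.0), target))

small = [(1.0, 0.2), (1.2, 0.0)]
large = [(10.0, 2.0), (12.0, 0.0)]       # same pattern, ten times the shift
print(scaled_spread(small), scaled_spread(large))
```

The last line prints two values that agree up to floating-point rounding: after scaling, a residue is judged on the relative scatter of its entries and not on the absolute size of its shifts, which is the effect described in the first answer.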
