xcrvfit - curve fitting tool

xcrvfit
Brian Sykes Lab

Version: 5.0.3 - Jun 2012

Purpose: A graphical X-windows program for binding curve studies and NMR spectroscopic analysis.

Latest News
Overview and References
Download and Installation
Basic Use and Tutorial
Fitting Functions
Input Files
Output Files
Multiple plots
Data Overlays
Data Exclusion
Parameter Grid Fitting
Settings and Preferences
Customizing your Plots
Appendix: History of xcrvfit

Overview and References

The power of xcrvfit lies in how conveniently it processes multiple datasets in various formats, lets you experiment with fitting parameter scenarios, and lets you customize the graphics. The program runs quickly on any platform that supports X-windows. Output from the program includes a graph of the function through the data, the root mean square error (rmse) of the fit, a measure of the sensitivity of each fitting function parameter, and a table showing how each datapoint fits the function. The current version allows users to overlay multiple fits on one plot or to arrange multiple plots in one plotting window.

Authors: Robert Boyko, Brian Sykes
This software is in-house and currently unpublished. For now, you can reference it as follows:

xcrvfit: a graphical X-windows program for binding curve studies and NMR
spectroscopic analysis, developed by Boyko, R. and Sykes, B.D.  (University of Alberta).
website: http://www.bionmr.ualberta.ca/bds/software/xcrvfit

Download and Installation

This software is a self-contained Tcl/Tk starkit, which should make installation straightforward. See the Release notes for the latest updates. Download the software appropriate for your machine:
- MacOSX (13 MB) - Mac (intel)
- MacOSX (13 MB) - Mac (ppc)
- Linux x86 (11 MB) - Intel x86
- Linux x86_64 (11 MB) - Intel x86 64 bit
Mac OS X - Double click the downloaded file. A new file, represented by the xcrvfit graph icon, should appear on the desktop (or in your download directory). You may want to drag the xcrvfit icon into a more permanent directory like /Applications, or keep it on your desktop. Double click on the icon to run the program.
Linux - Same as the Mac OS X procedure, but you may have to explicitly drag the directory from the opened window to the desktop. If you do not get an xcrvfit folder, then un-tar the downloaded file (e.g., tar xvf xcrvfit*.tar) and move it to your Desktop.
See install.notes for other ways to install the software and other technical notes.

Basic Usage and Examples for New Users

(Optional) Download our examples. It is very helpful to see how xcrvfit works with various examples before trying it out on your own data. Please download our data and examples (2.7MB) file and double click (or un-tar) this file to create the xcrvfit-5.0-x-examples folder. You can run an example by double clicking on the xcrvfit.command file packaged within each example directory. Note: On Linux, you may get a dialog asking what to do next; just click on the "Run" button.
Create a directory and place your datafile in it.
Start the program.
Double click the xcrvfit icon or see these notes for command line usage. If you do not get a graphical window, re-visit the installation section of this manual.
Enter your datafile.
Use the entry box provided or use the "Browse" button to find it on your file system. Here is example data that you can use. For an explanation of the various data formats that xcrvfit recognizes see the Input Files section. Use the view button to verify your input data file.
Select the fitting function.
Click and hold the button currently labelled "Line" and select from our current collection of functions. For a more detailed explanation of these functions see Fitting Functions . Once the function has been selected, the program displays the appropriate parameter fields.
Enter each starting parameter value in the box provided and then press the "Show Fit" button.
A graph of the fitting function is displayed along with a root mean square error value. If you do not see the fitting function, your starting parameters may be too far from values that suit the data.
Press the "Best Fit" button to calculate the best fit of your function to the data.
Some functions will converge to the best fit regardless of the starting parameters. Others are very sensitive and require a close approximation first. If you have difficulty getting convergence, try holding one or more parameters constant.
Click the Results button.
Here you can see how every datapoint corresponds to the fitting function. Also, you can see how sensitive each fitting parameter is by looking at the StdDev field.
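
To make the difference between "Show Fit" and "Best Fit" concrete, here is a minimal sketch for the "Line" function. It is not xcrvfit's own code: the manual does not describe the optimizer, so this example assumes an ordinary least-squares fit, which has a closed-form solution for a straight line. The names rmse and best_fit_line and the data values are made up for illustration.

```python
import math

def line(x, slope, intercept):
    """The straight-line fitting function y = slope * x + intercept."""
    return slope * x + intercept

def rmse(xs, ys, slope, intercept):
    """Root mean square error of the line against the data (divides by N)."""
    squares = [(y - line(x, slope, intercept)) ** 2 for x, y in zip(xs, ys)]
    return math.sqrt(sum(squares) / len(squares))

def best_fit_line(xs, ys):
    """Closed-form least-squares slope and intercept (assumed method)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up data for illustration only.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.1]

# "Show Fit": evaluate the error at the starting parameter values.
start_rmse = rmse(xs, ys, 1.0, 0.0)

# "Best Fit": find the parameters that minimize the error.
slope, intercept = best_fit_line(xs, ys)
fit_rmse = rmse(xs, ys, slope, intercept)

print(f"start rmse = {start_rmse:.3f}, best-fit rmse = {fit_rmse:.3f}")
```

Nonlinear functions have no closed-form solution and need an iterative search, which is why the starting parameter values matter for them.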

Fitting Functions

Here is a summary description of all the fitting functions in this version of xcrvfit.

However, you will likely find that the Details button next to the Fit Function on the main panel is the best way to figure out the xcrvfit functions. Here we may show you multiple test cases for each specific function, where each test case includes the readme files, input data, output files and plots, and application defaults files (which allow you to understand how we got the results we did). All the examples presented here are those found in our examples download.

Input Files

The xcrvfit program can read data files which are in one of the following formats.

crvfit

Example 1 : In the simplest case, the text file contains only one dataset. The x-values are in column 1 and y-values in column 2 with at least one white space character separating each number.
Example 2 : Some functions require the input of a second independent variable. For example, the XY2 function, which models titration data, takes into account the measurements of protein dilution. Any secondary x-values are entered in the 3rd column.
Example 3 : Multiple datasets can be placed in one file; however, an identifying label must be found on the line preceding each dataset. The alphanumeric label cannot contain spaces (white space).
Example 4 : It is also possible to specify measurement errors for any or all data points. Following on the same line as the data point, the following four data columns are specified:

the value to subtract from x to get a measured lower bound of x.
the value to add to x to get a measured upper bound of x.
the value to subtract from y to get a measured lower bound of y.
the value to add to y to get a measured upper bound of y.
Currently the specification of measurement errors does not affect any of the algorithms used in best fitting. It is simply a visual aid that can help a user to evaluate the legitimacy of proposed fits. This visual aid can be turned off/on via the settings menu.
Example 5 : Older versions of xcrvfit sometimes had issues reading data files created on MS Windows because of the Control-M (carriage return) characters at the end of each line. This issue should now be resolved.
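
For readers who want to check a crvfit file outside the program, the layout described above can be sketched as a small reader. This is an illustration under assumptions, not xcrvfit's parser: the function name parse_crvfit and the fallback label "data" are hypothetical, a line holding a single token is taken to be a dataset label, and any columns after x and y are kept as plain numbers without interpreting them.

```python
def parse_crvfit(text):
    """Return {label: [row, ...]}, where each row is a list of floats.

    Assumptions (not confirmed by the manual): a line with exactly one
    token is a dataset label, and every other non-blank line is a row of
    whitespace-separated numbers starting with x and y.
    """
    datasets = {}
    label = "data"  # hypothetical label for files with a single unlabeled dataset
    for raw in text.splitlines():  # splitlines() also copes with Windows line endings
        fields = raw.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) == 1:
            label = fields[0]
            datasets.setdefault(label, [])
            continue
        row = [float(field) for field in fields]
        datasets.setdefault(label, []).append(row)
    return datasets

# Made-up sample with Windows line endings and a third (secondary x) column.
sample = "res_12\r\n0.0 1.0\r\n1.0 2.5\r\nres_13\r\n0.0 0.9 5.0\r\n"
parsed = parse_crvfit(sample)
print(parsed)
```

Use the view button in xcrvfit itself to confirm how the program reads your file; the sketch above is only a quick sanity check.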
kay
In this format the x-values are placed on the first line and each subsequent line contains an identifier (likely the amino acid number) and the corresponding y-values. Any entry preceded with a "#" is treated as a missing value in the table.
Example 6 : note that identifier (first column) can be either numeric or a label
Example 7 : with missing data values
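
The kay layout can be sketched the same way. This is a rough illustration rather than the program's reader: the name parse_kay is hypothetical, the first line is assumed to hold only x-values, and a missing value is assumed to be written as a token that starts with "#" (for example #7.95).

```python
def parse_kay(text):
    """Return (xs, table) where table maps each identifier to its y-values.

    Entries whose token starts with "#" are stored as None (missing).
    """
    lines = [ln.split() for ln in text.splitlines() if ln.strip()]
    xs = [float(tok) for tok in lines[0]]  # assumed: first line is x-values only
    table = {}
    for fields in lines[1:]:
        ident, entries = fields[0], fields[1:]
        if len(entries) != len(xs):
            raise ValueError(f"{ident}: expected {len(xs)} y-values, got {len(entries)}")
        table[ident] = [None if tok.startswith("#") else float(tok) for tok in entries]
    return xs, table

# Made-up sample: one numeric identifier, one label, one missing value.
sample = """\
0.0 0.5 1.0
12 8.10 8.15 8.21
ALA13 7.90 #7.95 8.02
"""
xs, table = parse_kay(sample)
print(xs)
print(table)
```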
jfit or fitspec
This format refers to Varian's vnmr 'fitspec.data' file format. The first entry is the number of points, the second is start of plot, the third is width of plot. The following lines contain each y-value, one per line.
Example 8
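
As a rough sketch of the fitspec layout described above, the following reader takes the first three numbers as the header and the rest as y-values. The name parse_fitspec is hypothetical, and the sketch deliberately does not compute x-values because the manual does not say how start and width map onto the points.

```python
def parse_fitspec(text):
    """Return (n_points, start, width, ys) from fitspec-style text.

    Assumption: the three header entries and the y-values are separated by
    any white space (spaces or newlines).
    """
    tokens = text.split()
    n_points = int(float(tokens[0]))
    start = float(tokens[1])
    width = float(tokens[2])
    ys = [float(tok) for tok in tokens[3:]]
    if len(ys) != n_points:
        raise ValueError(f"header says {n_points} points, found {len(ys)}")
    return n_points, start, width, ys

# Made-up sample for illustration only.
sample = "4 10.0 2.0\n0.1\n0.8\n0.7\n0.2\n"
n_points, start, width, ys = parse_fitspec(sample)
print(n_points, start, width, ys)
```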
fp
Example 9 : This format refers to Varian's vnmr 'fp.out' file format.

Note:

crvfit

Output Files

First, xcrvfit tries to figure out the name of the directory which contains all the output files. If the user provides no information about this, then the default place is $HOME/Desktop/xcrvfit_out . It is usually good practice to specify the output directory in the xcrvfit.defaults file.

Below is an example results file that is generated when the user hits the Best Fit button and convergence is achieved. Here we can see the total root mean square error of the data to the fitting function as well as the values/error estimates for the parameters. The "Point by Point Analysis" indicates the difference between the calculated and measured y-values. Click the Save button to write the results to a file you specify.
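
The quantities in a results file can be reproduced by hand. The sketch below is only an illustration: the model function and data are made up, the difference is taken as calculated minus measured, and the rmse divides by the number of points. xcrvfit may use a different sign convention or denominator.

```python
import math

def point_by_point(xs, ys, model):
    """Return (rows, rmse); each row is (x, y_measured, y_calculated, difference)."""
    rows = []
    for x, y in zip(xs, ys):
        y_calc = model(x)
        rows.append((x, y, y_calc, y_calc - y))
    rmse = math.sqrt(sum(row[3] ** 2 for row in rows) / len(rows))
    return rows, rmse

def model(x):
    """Hypothetical fitted function, y = 100 * exp(-0.1 * x)."""
    return 100.0 * math.exp(-0.1 * x)

# Made-up measurements for illustration only.
xs = [0.0, 10.0, 20.0]
ys = [101.0, 36.0, 14.0]

rows, rmse = point_by_point(xs, ys, model)
for x, y, y_calc, diff in rows:
    print(f"{x:8.2f} {y:10.3f} {y_calc:10.3f} {diff:10.3f}")
print(f"rmse = {rmse:.3f}")
```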

Settings and Preferences

It takes a fair amount of work to set up xcrvfit. You have to tell the software where the data is, choose the fitting function and reasonable starting values, and perhaps change plotting variables to get the nicest looking plot possible. Once you have a setup you like, you do not want to build it again.

Click on File->Save Current Settings and the software asks you where to save these defaults. The most logical place to save your settings is in the same directory as your data input. Do this and call it xcrvfit.defaults.

If you want to start xcrvfit by clicking on an icon, say in /Applications, then how will the software know where to read in your saved defaults? In this case, take a close look at the example data that comes with the software. Copy xcrvfit.command to the directory that contains your data and run this command (either in a terminal or by clicking on the icon). This is the best way to organize your multiple xcrvfit runs.

However, if you insist on taking a more short-sighted view, you can save xcrvfit preferences to $HOME/.xcrvfit.defaults. Now when you click the xcrvfit icon, it will read your preferences.

Here are some of the most common edits people may wish to make on the xcrvfit.defaults file:

# where is your data file  (can be full pathname)
*data_file:     data.fp

# where is your output directory (can be full pathname)
*output_dir:    xcrvfit_out

# Selected function and starting parameter values
*func_key:      EXPON
*func_parms:    100.0:-0.1:-8

# customize graphing parameters to get a nicer looking fit/data plot (plot 0)
*y_min_0:       -20.0
*y_max_0:       120
*y_tics_0:      8
*x_min_0:       0.0
*x_max_0:       70.0
*x_tics_0:      8
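
If you want to inspect or generate such a file from a script, the lines shown above can be read with a few lines of Python. This is a sketch based only on the excerpt above: it assumes every setting has the form "*key: value", that lines starting with "#" are comments, and that func_parms is a colon-separated list of numbers. The name parse_defaults is hypothetical, and a real xcrvfit.defaults file may contain other kinds of lines.

```python
def parse_defaults(text):
    """Return {key: value} for lines of the assumed form '*key: value'."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition(":")  # split at the first colon only
        if not sep or not key.startswith("*"):
            raise ValueError(f"unrecognized line: {raw!r}")
        settings[key[1:].strip()] = value.strip()
    return settings

sample = """\
# Selected function and starting parameter values
*func_key:      EXPON
*func_parms:    100.0:-0.1:-8
*y_min_0:       -20.0
"""
settings = parse_defaults(sample)
start_values = [float(v) for v in settings["func_parms"].split(":")]
print(settings["func_key"], start_values)
```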

Customizing Your Plots

Plot-Settings

Most of the items are self-descriptive. Here are the ones that may require some explanation:

Apply Changes to: Current Data vs Current Plot vs All Plots
Show Data Error Bars
Graph Label - This is the title of the graph
Data Label
Left/Right/Top/Bottom Margins
xFormat / yFormat
Fitting Points
Plot Window Geometry

Multiple Plots

First, read in your multi-dataset datafile, set the fitting function and decide which fitting parameters should be held constant. Create a single plot of the first dataset you wish to fit.
Then select the Options->Multiple Plots menu item which brings up a multiple-plots window. Because we will make frequent clicks in this window, move it to an uncluttered section of your screen.
Proceed by clicking the Add a New Plot button. Immediately the next dataset is displayed in a new plot on the xcrvfit-plot window. Back to the main panel, you may need to use the Next and Previous buttons or Label entry box to set the desired dataset. In this case we have settled on dataset res_13. Click the Best Fit button to see the best way to fit this data.
Continue this process until you have the number of plots you wish to display. Resize the plotting window to a size that looks good.
Use the Plot-Settings->xcrvfit-plot menu item to make any changes. In this case, we can see there are too many tic marks especially on the x-axis. We also decide to lower the margin values for a more compact look and change the background to white. After making these changes, we use Save Plot to save our image in a jpeg, tiff or ps format.

The Next and Prev buttons are used to select the current plot. You can tell which plot is current because the data flashes in the plot whenever the button is pressed. Once you have selected a current plot, you can Remove Current Plot or you can click the Plot-settings -> xcrvfit-plot menu item to make plotting parameter changes which apply only for this plot.

For small numbers of plots, you can use the Plot Layout menu item to manually set the dimension for the plots in the plotting window. The default Plot Layout is to increase the number of columns by one whenever the row and column values are equal. If we need another plot and we have more columns than rows, then we draw the next plot in a new row.
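
The default layout rule just described can be written as a short function. This is a sketch of the rule as stated, assuming the layout starts from a single 1 x 1 plot; the name plot_layout is hypothetical.

```python
def plot_layout(n_plots):
    """Return (rows, columns) after growing the grid to hold n_plots plots."""
    rows, cols = 1, 1  # assumed starting layout: a single plot
    while rows * cols < n_plots:
        if rows == cols:
            cols += 1  # square grid: add a column
        else:
            rows += 1  # more columns than rows: add a row
    return rows, cols

for n in (1, 2, 3, 5, 7, 10):
    print(n, plot_layout(n))
```

With this rule the grid never differs by more than one between rows and columns, so the plots stay close to a square arrangement as you add more.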

Data Overlays

First, read in your multi-dataset datafile, set the fitting function and decide which fitting parameters should be held constant. Create a single plot of the first dataset you wish to fit.
Select the Data Overlays item from the Options menu and drag this window to an open space on your screen.
Proceed by clicking the New Overlay button. Immediately the next dataset is displayed on the plotting window. If this is not the dataset you want, go back to the main xcrvfit panel and use the Next and Previous buttons to select it.
Now is a good time to set the data symbol, colors and any other attribute for this dataset. Click the Settings->xcrvfit-plot menu item. If you want to duplicate this example, set the data symbol to oval and fitting function to green.
Back to the xcrvfit main panel, set the function and parameters and use Show Fit or Best Fit to display the fitting function.
Repeat steps 3 through 5 for any additional datasets.
To tidy up the graph, I set the Apply Changes to item to All Plots and fix any global plot attributes (e.g., xMax). Finally, I may decide to add a graph title, resize the window, and click and drag all the labels to their final spots before hitting the Save Plot button. These options can also be set ahead of time in the xcrvfit.defaults file, as depicted in the T2_RELAX->test1 example case.

What happens if I make a mistake and want to change some attribute involving a previous dataset? Use the Data Overlays->Set Current Data button to select this dataset. You can tell which dataset is current because the data flashes in the plot whenever the button is pressed.

Data Exclusion

The purpose of the Options -> Data: Select Range menu item is to allow the user to change which y-values are included in or excluded from the curve fitting process. As shown in the plot above, this could mean cutting the tails of noisy data, or it could mean eliminating data points that are suspected of being in error.

To understand how to mimic the plot above, start by bringing up the data-range window.

First we eliminate all the data points by clicking the All radio button followed by the Exclude button. Then we click the Range button and use the mouse to move both sliders to mark the area of the curve that we want to fit. Press the Include button, and all points used in the fitting calculation are now displayed in black. Finally, click the Best Fit button to see how this graph looks when compared to fitting all the data.

To eliminate specific points from the fitting calculation, the user selects Range, positions the sliders, but this time presses Exclude. Here is an example of how a fitting curve may look before and after we have excluded the possibly erroneous data point.
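
The include/exclude steps above amount to maintaining an on/off flag for each data point. Here is a minimal sketch of that bookkeeping, assuming the sliders select a range of x-values and that both ends of the range are included; the names set_all and set_range and the data are hypothetical.

```python
def set_all(mask, included):
    """'All' + Include/Exclude: set every point to the same state."""
    return [included] * len(mask)

def set_range(xs, mask, low, high, included):
    """'Range' + Include/Exclude: change only points with low <= x <= high."""
    return [included if low <= x <= high else m for x, m in zip(xs, mask)]

# Made-up x-values; every point starts out included.
xs = [0, 1, 2, 3, 4, 5, 6, 7]
mask = [True] * len(xs)

mask = set_all(mask, False)              # exclude everything
mask = set_range(xs, mask, 2, 6, True)   # include the region we want to fit
mask = set_range(xs, mask, 4, 4, False)  # exclude one suspect point

fit_xs = [x for x, m in zip(xs, mask) if m]
print(fit_xs)
```

Only the points whose flag is still on would then be passed to the fitting calculation.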

Parameter Grid Fitting

What happens when the user has several valid datasets that essentially model the same process? The xcrvfit software has a special way to model one or two parameters which are held constant at various levels on all the data, and then report the value of the parameter(s) which had the least total error over all the datasets.

First, read in your multi-dataset datafile and set the fitting function. Click on the Multiple Data Best Fit button and you are presented with the first window below. For this particular test, we are interested in the values of kd1 and kd2 that best fit all the data we selected.

As shown below, we have decided to only consider the first 7 datasets (highlighted in pink). Note that the multfit-select window is drawn to help the user visualize how the current dataset values compare to the others. Since the data is in kay format, every dataset shares the same x-values (that is why all the data values are stacked in distinct vertical lines). This is a relatively effective and easy way for the user to evaluate the credibility of each dataset in the test suite.

Then we give initial values to S_LM1 and S_LM2 and by specifying "float" we allow the fitting algorithm to choose these parameters to minimize the error in the fitting function for each particular dataset. Meanwhile, we are interested in not only holding kd1 constant, but that constant will start at a lower bound and will be stepped up by an increment until the upper bound is reached. The same is done for kd2. Press the Go button to enable the calculations.

The window presented below is the output from the calculations, where we can see the best fitting parameters for the chosen datasets. The top set of numbers indicates the overall lowest sum of squares of error for all our selected residues. This is achieved somewhere in the vicinity of kd1=2.1e-5 and kd2=0.009 (remember that we are taking relatively large steps between values of kd1 and kd2).
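
The idea behind the grid search can be sketched in a few lines. This is not xcrvfit's implementation: it uses a made-up one-site model y = S * x / (kd + x) with a single held parameter kd, whereas the example above steps two parameters (which would simply add a second nested loop). S floats for each dataset and is solved in closed form because the model is linear in S. All names and data here are hypothetical.

```python
def sse_with_floating_scale(xs, ys, kd):
    """Sum of squared errors for one dataset with kd held and S floating."""
    basis = [x / (kd + x) for x in xs]
    # Least-squares value of S for this dataset at this kd.
    scale = sum(b * y for b, y in zip(basis, ys)) / sum(b * b for b in basis)
    return sum((y - scale * b) ** 2 for b, y in zip(basis, ys))

def grid_fit(datasets, kd_low, kd_high, kd_step):
    """Step kd from kd_low to kd_high; return (best, all) as (total_sse, kd) pairs."""
    n_steps = int(round((kd_high - kd_low) / kd_step))
    results = []
    for i in range(n_steps + 1):
        kd = kd_low + i * kd_step
        total = sum(sse_with_floating_scale(xs, ys, kd) for xs, ys in datasets)
        results.append((total, kd))
    return min(results), results

# Synthetic datasets generated with kd = 2.0 and different scale factors.
xs = [0.5, 1.0, 2.0, 4.0, 8.0]
datasets = [
    (xs, [3.0 * x / (2.0 + x) for x in xs]),
    (xs, [5.0 * x / (2.0 + x) for x in xs]),
]

best, results = grid_fit(datasets, 0.5, 4.0, 0.5)
print(f"lowest total error at kd = {best[1]}")
```

As in the real tool, the reported minimum is only as precise as the step size, so a second pass with a finer grid around the best value is a natural follow-up.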

At this point it is interesting to see a plot of total error at each kd1,kd2 pair. In order to do this, the xcrvfit software has been written to use the scientific plotting package plplot . The plot is shown when the user clicks Plot Sum of Square Errors. Here we can get a sense of the trough in the surface plot where values of kd1 and kd2 give essentially the same solution to the fitting problem.

The plplot software is capable of producing many types of 3D plots; however, we have only written a module to incorporate surface plots. Also, plplot may be complex to install for some users, so we have included an executable version (5.9.9) within the xcrvfit application.

The xcrvfit software presents a limited number of settings for configuring the 3D plot under Plot-settings -> multfit-3d. Contour plots are currently not supported. To change the point of view for the surface plot, try changing the values for Surface Plot Alt and Surface Plot Azimuth.

Finally users will be interested to see what kinds of plots are generated for the individual datasets using the overall best values of kd1 and kd2. Press the Show Fit for each Dataset button to get the window below. In this particular case, all the datasets can be fit quite nicely without any outliers. Note that you will probably want to change some of the Plot-Settings -> multfit-plot options and save the changes with File -> Save Current Settings . At this point, click on the Show Fit for each Dataset button and now we can effectively analyze how our global best fit at kd1=.0001, kd2=.009 works with each individual dataset. Do some datasets not fit well and why might that be?

Appendix: History of xcrvfit

Crvfit

Fast Crvfit

CrvfitS

2 sites, Metal 2 displaces Metal 1, slow exchange
2 sites, Metal 1 binds apo, slow exchange
2 sites, Metal 1 binds apo, fast exchange

Crvfit_nmrdata

Dimerfit

Best Fit Multiple Data

Jfit

Questions to: bionmrwebmaster@biochem.ualberta.ca