Smartnotebook Developer Notes

Here are my answers to questions about the technical aspects of this project. Although these are developer notes, most users will find interesting tidbits here as well.

As far as programming projects go, this one is on the edge of being manageable. Smartnotebook is a big project running on several platforms using several different programming languages and third party software. You will want to consider all these notes before you embark on your own programming projects involving smartnotebook.

Please keep me up to date on your progress. If you can determine that snb does not have the capability to support your interesting project/idea, then let me know what you need and I may be able to do something for the next release.

I need some hints in understanding how I go about adding my standalone software to smartnotebook.

Starting with a simple example, look at the file "tcl/init.tcl" and scan for the sys_run_makeshift string. This is the routine which generates shift constraints. You can see in the "tcl/sys.tcl" file how to call the executable with the appropriate arguments. In the directory "src/makeshift" a developer can test and make sure the software works correctly. I typically put all my test cases in test.run and execute one or all of them.

The second example, involving orb, demonstrates how you can construct your tcl based software package such that it continues to have its own development life and yet is still part of snb. Start in "tcl/init.tcl" and see how we call the orb_main routine, which takes an installation directory and a workspace directory as parameters. Go to "tcl/orb.tcl" and look at the way we construct the main program; its construction is fairly typical and not hard to follow. However, orb is standalone software, so to accommodate this the top level program that calls orb looks something like this for most unix machines and like this for macosx aqua. When parameter files, documentation or examples are needed, they are found in "lib/orb" of the installation directory. When executables are needed, they are found in "bin" of the installation directory. Again, this is fairly common practice in software development and should not present too many difficulties for developers.

You will also need to make modifications in snb to incorporate your software. For purposes of compatibility, you want to make as few changes as possible in the current *.tcl files. Your goal is to make minimal changes to "tcl/init.tcl" or "tcl/panel.tcl" (eg, define a menu button that runs your software) and place the rest in your own named tcl file. Typical examples of 3rd party tcl software being added to snb are orb.tcl and xalign.tcl.

I am more interested in improving the current product as opposed to adding existing software to it.

If you are writing routines strictly for smartnotebook then this can be problematic as you likely want to use or modify existing data structures or routines. Smartnotebook is not a mature product and it is still too early to say what parts can be set in stone. My feeling on this issue is that you should notify me of your intentions and then we would agree on a time frame where you have control of the latest software. It will be your responsibility to return the software in good shape, properly testing and documenting the changes.

Like most people involved in nmr software, I only work on this project part-time and don't want to stifle good ideas. If we could progress the software in a semi-systematic fashion, then this will mean less work in the long run. I am sure only a couple people will be interested in pursuing this product, so that is why this simple type of collaboration could work.

Good examples to look at include "tcl/folded.tcl" and "tcl/submit.tcl". These modules are basically separate except that they will use the common smartnotebook data structures and perhaps call a few standard procedures.

I am not able to compile xalign on my machine. Why not?

There may be issues with releasing source code for xalign. Please notify us if the included executables are not sufficient.

What can you tell me about the tcl code that will help me understand what is going on?

Briefly, here are some of my coding habits that should help you understand the source code. As you can see, I do not use a tcl builder to build the gui.

Global Variables: The following globals are accessed throughout the software; their names start with "my". The global variables snbVar and snbConst are special in that ALL subroutines should know about them.

	mySeq - data structure based on accessing sequence information
	myConnect - data structure for all the connectivity information
	mySpectra - variables which relate to defining the current experiment
	myChain - data structure of user defined chains

	snbVar - generic global variables, all tcl routines should include
	snbConst - generic global constants, all tcl routines should include.
	   Please define global constants in "tcl/snb.tcl".

Here is more information about snbVar. If the first argument is:

fn, then it refers to a file name
help, then it refers to a help file
exp, then it refers to the list of experiments
parm, then it refers to a parameter value
panel, then it refers to a gui panel variable
windowname, then it refers to variables associated with that window
CAPITALIZED, then these variables are similar to constants

You may want to define your own category for snbVar.

You will need to take some time to understand the various relationships between mySeq, myConnect, mySpectra and myChain. These are the data structures that allow you to extract and manipulate information to present to the user. The best way to gain this knowledge is to scan the various read data routines in data.tcl to see how these data structures are created and used. As snb has evolved, I cannot be certain that the data structures reflect the most efficient/convenient way to access information.

Static Globals: These are globals with the intention of only having meaning within one tcl file. Again it allows the programmer to know at a glance which variables have no repercussions outside the current tcl file. I have decided to prefix these variables with the characters "var" and the following table associates a variable with a tcl file.

	varOption         options.tcl
	varNv             nvdraw.tcl
	varGui            panel.tcl
	varFold           folded.tcl
	varSubmit         submit.tcl
	varOrb            orb.tcl
	varXalign         xalign.tcl

Subroutine Names: When I call subroutines which are defined in a different tcl file, I almost always incorporate the filename as a prefix for the subroutine name (eg, data_read_folded is found in data.tcl, usnb_prompt_user is in usnb.tcl). Routines which do not have a prefix matching a tcl file are local procedures. Without a system like this, it is easy to forget where procedures are located and who calls them.

Tcl Files: Again I find it essential to group all tcl routines which share common dependencies or function.

assign.tcl - procedures dealing with assigning chains
chains.tcl - editing of entire chains
connect.tcl - editing of connections in a chain
data.tcl - general purpose reading of input data and writing output
fit.tcl - procedures dealing with fitting chemical shifts to the sequence
folded.tcl - procedures dealing with managing folded peaks
init.tcl - smartnotebook initialization process
nvdraw.tcl - nmrview dependency procedures
options.tcl - handles functions in the options menu
orb.tcl - prediction of sequence constraints from various input data
panel.tcl - main snb window creation
prefs.tcl - saving snb user preferences
search.tcl - handle search button queries in the snb window
sequence.tcl - handle sequence panel events in the snb window
submit.tcl - user created connectivity records
suggest.tcl - build chains from assignments in peaklists
sys.tcl - unix dependency procedures
usnb.tcl - simple utilities specifically for snb
util.tcl - generic utility tcl scripts
xalign.tcl - needed by orb for aligning homologous sequences
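
The file list, combined with the subroutine naming convention described earlier, means that a procedure name alone should tell you which file to open. Here is a minimal sketch of that lookup as a small standalone Python helper. It is not part of snb: the function name owning_file and the local procedure name "redraw" are made up for illustration, and the sketch assumes the prefix is everything before the first underscore.

```python
from typing import Iterable, Optional

# File stems taken from the "Tcl Files" list above.
TCL_MODULES = (
    "assign", "chains", "connect", "data", "fit", "folded", "init",
    "nvdraw", "options", "orb", "panel", "prefs", "search", "sequence",
    "submit", "suggest", "sys", "usnb", "util", "xalign",
)


def owning_file(proc_name: str, modules: Iterable[str] = TCL_MODULES) -> Optional[str]:
    """Return the tcl file a procedure should live in, or None if it looks local.

    Assumption: the file prefix is the text before the first underscore.
    """
    prefix, sep, rest = proc_name.partition("_")
    if sep and rest and prefix in set(modules):
        return prefix + ".tcl"
    return None


if __name__ == "__main__":
    # "redraw" is a hypothetical local procedure with no file prefix.
    for name in ("data_read_folded", "usnb_prompt_user", "sys_run_makeshift", "redraw"):
        print(name, "->", owning_file(name))
```

A name without a matching prefix returns None, which under the convention means the procedure is local to whichever file defines it.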

Can you offer me some hints on your development environment?

In the devtools directory you will see 4 small scripts which allow me to continue to develop snb with a reasonable degree of confidence. You can use these scripts too, but you'll have to customize them for your own world.

pack-compile: Run this script to compile software and create executables for the various platforms. Likely this is not an issue for you.
pack-examples: Run this script to create the examples tarfile that users can download, then place it on the download site.
pack-html: I use this script to update the documentation on the website and the copy that comes with the snb package.
pack-snb: This script bundles smartnotebook for all supported platforms and places the snb-v5.x.build.tar file on a download site.

Lib Files: Anytime your software requires parameter files, snb will expect to find these in "lib" and that is where you should put them. Place your html documentation in the devtools/docs directory.

Because example data sets are so big, it makes sense to have a separate downloadable tar file. Do not put big datasets in lib; let the installation software or system administrator decide what to do with them.

Source Files: If you are interested in sharing your compilable source code, make a directory in src and place whatever you want to share there. You may also want to include your examples and test scripts if people are interested in testing your module separately from smartnotebook.

Here are some ideas for future smartnotebook projects.

Adding auto-assignment like software to suggest chains
Writing rules for other experiment sets
Upgrade the connection generating software (peakcon)
Don't treat peakpicking as a black box, take a more active role.
Better help facilities for the user, balloon help, better documentation
Ideas for improving the user interface, making it more clear
Figuring out how to use smartnotebook with nmrviewJ
Decide on standards which snb will adhere to, let developers know

I want to write my own rules files for generating connectivities for snb

This is not an easy task given my simple database so I am discouraging it only because you will probably be frustrated. The software I use to generate connectivities is called peakcon and you can read all about it here. I also have ideas that will greatly improve peakcon to allow more complex queries. I will eventually implement those ideas if no developer presents a better alternative to generating connections from peaklists.

That being said, snb is now more capable of handling the plethora of false positive connectivities that peakcon can generate because of filtering at the chain building stage. In the directory lib/experiments/triple-res/*.rules, you will see the "peakcon" scripts which generate particular types of connectivities. These scripts use the corresponding rules.* input files. The rules files are all slightly different based on whether we are expecting to find Cb or not, or if weak peaks are present or not.

If your peakpick file is not picked relatively cleanly, the simple peakcon software will not be able to handle it.

I have a better way to generate a set of connectivities rather than using peakcon.

Super, even though you can define the connectivity record format (see lib/experiments/triple-res/hsqc-1 for an example), I believe you will find it easier to just create records in my format, at least initially. See further on in this document for that description.

The quickest way to import your connectivity files into snb is to name them con.* and place them in the snb.out directory. If you label your connectivity records differently (eg, field 3: axx instead of dxx) then you will be able to directly compare your records to mine in the connections panel. Once you are convinced that your method is better you will want to replace my "sys_run_connect" subroutine with a call to your software (instead of my peakcon program). You may also want to change snb code so that it is responsible for gathering the files and parameters needed to call your software.
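
For a standalone tool that produces its own connectivities, the import step can be as small as the following Python sketch. It is only an illustration under stated assumptions: the helper names relabel and write_connectivities and the file suffix "mine" are invented here, the records are assumed to already follow the whitespace-delimited format described later in this document, and the label is changed by swapping its first character (dxx becomes axx), which is one reading of the example above.

```python
from pathlib import Path
from typing import Iterable, List, Sequence


def relabel(label: str, prefix: str = "a") -> str:
    """Swap the first character of a record label, e.g. dxx -> axx."""
    if not label:
        raise ValueError("empty label")
    return prefix + label[1:]


def write_connectivities(out_dir: Path, suffix: str,
                         records: Iterable[Sequence[object]]) -> Path:
    """Write whitespace-delimited records to <out_dir>/con.<suffix>."""
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / ("con." + suffix)
    lines: List[str] = ["# written by an external tool (hypothetical example)"]
    for rec in records:
        fields = [str(value) for value in rec]
        if len(fields) < 3:
            raise ValueError("a record needs at least the two peak ids and a label")
        fields[2] = relabel(fields[2])  # field 3 holds the record label
        lines.append(" ".join(fields))
    path.write_text("\n".join(lines) + "\n")
    return path
```

Calling write_connectivities(Path("snb.out"), "mine", records) would produce snb.out/con.mine. Real records should carry every field of the connectivity format, not just the first few.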

I do feel fairly strongly that new connection creation software should be able to work for arbitrary experiments.

What do the fields mean in the connectivity file format?

The following is the example which I have followed so far for the triple resonance experiment sets. Enter one connectivity per line and use white space as a delimiter. Comment lines are preceded with a "#".

Field 1:  Reference Peak Id (i-1)
Field 2:  Reference Peak Id (i)
Field 3:  Label which identifies this connectivity record type (eg, dxx, dgx)
Field 4:  Probability score of this connectivity
Field 5:  Peak Id for Ca(i-1) (hncacb)
Field 6:  Peak Id for Cb(i-1) (hncacb)
Field 7:  Peak Id for Ca(i) (hncacb)
Field 8:  Peak Id for Cb(i) (hncacb)
Field 9:  Peak Id for Ca (cbcaconh)
Field 10: Peak Id for Cb (cbcaconh)
Field 11: Not used
Field 12: Not used
Field 13: Not used
Field 14: Not used
Field 15: Intensity value of Ca peak (hncacb)
Field 16: Intensity value of Cb peak (hncacb)
Field 17: Shift value Ca(i-1)
Field 18: Shift value Cb(i-1)
Field 19: Shift value N(i-1)
Field 20: Shift value Hn(i-1) 
Field 21: Shift value Ca(i)
Field 22: Shift value Cb(i)
Field 23: Shift value N(i)
Field 24: Shift value Hn(i)
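
If you want to read these records from your own standalone software, the field list above translates into a short parser. The following Python sketch is one possible reading of the format, not code from snb: the class and function names are invented, peak ids are assumed to be integers, blank lines are assumed to be skippable, and only the first four fields plus the shift values (fields 17 to 24) are interpreted.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, List

# 1-based field numbers of the shift values, as listed above.
SHIFT_FIELDS = {
    "Ca(i-1)": 17, "Cb(i-1)": 18, "N(i-1)": 19, "Hn(i-1)": 20,
    "Ca(i)": 21, "Cb(i)": 22, "N(i)": 23, "Hn(i)": 24,
}


@dataclass
class Connectivity:
    ref_prev: int      # Field 1: reference peak id (i-1)
    ref_cur: int       # Field 2: reference peak id (i)
    label: str         # Field 3: record type, e.g. dxx
    score: float       # Field 4: probability score
    fields: List[str]  # every field of the line, unparsed

    def shifts(self) -> Dict[str, float]:
        """Return the eight shift values keyed by atom name."""
        if len(self.fields) < 24:
            raise ValueError("record has fewer than 24 fields")
        return {name: float(self.fields[n - 1]) for name, n in SHIFT_FIELDS.items()}


def parse_connectivities(lines: Iterable[str]) -> Iterator[Connectivity]:
    """Yield one Connectivity per data line, skipping comments and blank lines."""
    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split()  # white space is the delimiter
        if len(parts) < 4:
            raise ValueError(f"line {number}: expected at least 4 fields")
        yield Connectivity(int(parts[0]), int(parts[1]), parts[2],
                           float(parts[3]), parts)
```

Keeping the raw fields alongside the parsed ones means the unused fields (11 to 14) pass through untouched if you later write the record back out.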

When a connectivity is displayed, I want to display additional spectral strips. Of the three spectral configurations you define, I have a 4th that would be more informative. How do I do this?

Your best bet is to study how the "cctocsy" example is configured. You will want to look at lib/experiments/triple-res/hsqc-2 and compare that configuration file to hsqc-1 in the same directory. Diff the files and read the associated documentation; it is fairly straightforward to understand how the configuration file for cctocsy was created.

Adding a new view with existing nv and xpk files is somewhat easier and for that you can compare hsqc-1 and hsqc-3.

When you have a spectral view that you think others would appreciate, email me your configuration file and it will probably be included in the next software release.

I am writing some auto-assign software, I am more interested in importing entire (assignable) chains into snb rather than just connections. What are my options here?

You have two choices. The first is to set up your initial reference peaklist file with your software's suggested assignments. Smartnotebook should read these in and automatically generate the suggested chains. You should look at the example which comes with the software (hsqc-suggest). Note that it is possible to have sequential assignments without a corresponding connectivity record. I think these are exceptionally interesting cases; most of the time it will be a missing peakpick, but maybe it will be a software error.

Your second choice is to create the "snb.out/chains" file. Then, when you start up snb and go through the initialization process, tell snb to read in this file (screen 6 of 6). Another possibility to make debugging easier is to quit snb, copy your chains to snb.out/chains and then restart snb and see if snb can figure out what you did.

Please feel free to consult with me on this issue. Until someone actually tries to do it, I can only imagine how it would work and I design the software accordingly. I think the first time through will be a bit of a struggle.

What do the fields mean in your chains file format?

It is easiest to explain the format of the chains file in terms of an example:

1 (the last chain user worked on, specify "1" if you are building this file)
# Rule  RefId(i-1) RefId(i)  Record-Identifier
87 3 87:DKDA
 85  39  dxx 0.44    97  33 365 176
 39  22  dxx 0.72   365 176 243 287
 22 140  dxx 0.53   243 287  13 128
-1    4 15-20-59-50-60
 15  20  dxx 0.39    45  58 404  89
 20  59  dxg 0.84   404  89 384 -99
 59  50  dgx 0.81   384 -99 312 232
 50  60  dxx 0.40   312 232   8 113

In this example there are 2 chains. The first chain is assigned at sequence position 87, the second chain is not assigned (-1). The numbers 3 and 4 indicate the number of connectivity records in each respective chain. The fields "87:DKDA" and "15-20-59-50-60" are simply informative labels that identify this chain in the chains menu of snb.

What follows next are the connectivity records that make up each chain (see the previous section on the connectivity record format). You will notice that the connectivity records are shortened to the first 8 fields that uniquely identify the connectivity. If you have more than 8 fields (eg, the entire connectivity record) then those fields are ignored. Any line starting with "#" is a comment line and ignored. White space is used as a delimiter for all the fields.
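
For anyone generating or checking a chains file from outside snb, the layout described above can be read with a few lines of code. The Python sketch below is my interpretation of the example, not snb's own reader: the names Chain and parse_chains are invented, the first non-comment line is assumed to begin with the last-chain number, and each chain header is assumed to be "position count label" followed by exactly count records.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Chain:
    position: int   # sequence position, or -1 if the chain is not assigned
    label: str      # informative label shown in the chains menu
    records: List[List[str]] = field(default_factory=list)


def parse_chains(text: str) -> Tuple[int, List[Chain]]:
    """Return (last chain worked on, list of chains) from chains file text."""
    # Drop blank lines and comment lines, then split on white space.
    rows = [ln.split() for ln in text.splitlines()
            if ln.strip() and not ln.lstrip().startswith("#")]
    if not rows:
        raise ValueError("empty chains file")
    last_chain = int(rows[0][0])
    chains: List[Chain] = []
    i = 1
    while i < len(rows):
        header = rows[i]
        if len(header) < 2:
            raise ValueError("chain header needs a position and a record count")
        position, count = int(header[0]), int(header[1])
        label = " ".join(header[2:])
        body = rows[i + 1:i + 1 + count]
        if len(body) != count:
            raise ValueError(f"chain '{label}' announces {count} records, found {len(body)}")
        # Only the first 8 fields identify a connectivity; the rest are ignored.
        chains.append(Chain(position, label, [rec[:8] for rec in body]))
        i += 1 + count
    return last_chain, chains
```

Run against the example above, this gives two chains: one at position 87 with 3 records and one unassigned chain with 4 records.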

Why is reference ID searching with an empty chain so slow?

Note that searching can sometimes take a few seconds to calculate for an empty chain. The reason for this is that every possible position in the sequence must be tested for a fit. Also, if the reference spectra ID you are examining has many connectivity variations due to overlap, then there are lots of calculations. Future versions of this software will need a method for the user to break the calculation before it finishes. Or perhaps the programming algorithm can be improved upon. The best solution to this issue does not seem obvious to the author, and suggestions are welcomed.

What are your standards for code revision?

Future versions of smartnotebook will follow a 3 number system x.y.z where x is a major revision, y is a minor revision and z is mostly for bug fixes.

You do not have to make any changes to your input/output when the upgrade is at the 3rd level. For example, if you upgrade from v4.0.0 to v4.0.1 you install the new software and need not worry about incompatibilities with your current environment.

Upgrades at the first or second level probably mean that you should start a brand new smartnotebook session. Do not remove the old session just yet (ie, the files in snb.out) as it is very probable that the chains and folded peaks files are still the same format.
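
The upgrade rule above is easy to check mechanically, for example in an installation script. This Python sketch is only an illustration of the x.y.z rule as stated here; the function names are invented and it assumes version strings of the form "v4.0.1" or "4.0.1".

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, int, int]:
    """Split 'v4.0.1' or '4.0.1' into (major, minor, bugfix)."""
    parts = text.strip().lstrip("vV").split(".")
    if len(parts) != 3:
        raise ValueError(f"expected x.y.z, got {text!r}")
    major, minor, bugfix = (int(p) for p in parts)
    return major, minor, bugfix


def needs_new_session(old: str, new: str) -> bool:
    """True when the major or minor number changes between two versions."""
    return parse_version(old)[:2] != parse_version(new)[:2]
```

With this rule, needs_new_session("v4.0.0", "v4.0.1") is False, while a change in the first or second number returns True.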

Tip: Keep a README file of any customization changes. That way you can be confident that any problems in an upgrade of smartnotebook are not traceable to a customization you made and later forgot about.

How difficult would it be to use a different graphics interface instead of nmrview?

Probably harder than it looks. Even though nvdraw.tcl is only 260 lines of tcl code, there is still the experiment dependent nmrview code for initially defining the spectral environment (another 200 lines). For examples of this code see lib/experiments/triple-res/*.nmrview.

However, nmrview is fairly popular and Bruce Johnson is furthering the software, so is there a reason to use different spectral graphics?

What about merging snb into nmrview?

Smartnotebook is destined to be open software whether it gets used or not. If people want snb integrated into nmrview, I doubt I'll object. But I'm pretty certain I'm committed to the open product.

Smartnotebook will be impossible to maintain, especially if several people decide to add/change software. This is a recipe for disaster.

As far as I can see only two things can happen and neither is bad. The first is that developers are not interested or unable to succeed in adopting the software. The snb authors continue to further this software on their own timescale.

The second is that multiple flavors of the software emerge each containing an innovative array of software tools and experiment sets. Yes, this would be a maintenance mess, but with every mess there is incentive to clean-up. I believe that clean-up would be in the form of a re-designed smartnotebook model once it becomes clear what the responsibilities of the various software should be and the interested parties. This would be a good time to re-incarnate smartnotebook as "OpenNMR" or the "OpenNMR Consortium" or something similar. I think the new name could be justified given the openness of the project and the more broad view it would take. But for now, this is just dreaming...

Questions to: bionmrwebmaster@biochem.ualberta.ca