HOW TO USE THIS MANUAL

    The SEQSEE manual (Version 1.2) you are now reading is composed 
of 15 parts.  The first seven sections are intended to serve as a general 
introduction and describe such aspects as system requirements, installation 
procedures and menu operations.  Sections VIII and IX offer more detailed 
descriptions on how to run SEQSEE with section IX offering a fully 
documented tutorial to assist first-time users in developing an 
understanding of SEQSEE.  Section X of this manual offers explicit examples of 
input and output for each of the SEQSEE menu options.  Sections XI, XII and 
XIII describe SEQSEE file structures, libraries, databases, help facilities and 
general UNIX commands.  Section XV offers a "Q & A" tutorial, suggestions for 
troubleshooting and other potentially helpful pieces of advice.  A list of 
recommended readings and references is also included at the end of this 
manual along with two appendices.  Appendix 1 is a copy of the SEQSEE 
control file (seqsee.parms) while Appendix 2 provides a detailed explanation 
of the "STATS" output.
      We DO NOT recommend that users read through this manual 
cover-to-cover.  Instead, we suggest that all first-time users make an effort 
to read through the first 30 or 40 pages of this document, including the 
tutorial (although sections III and IV may be ignored).  Those who wish to 
learn more about the program or who are having difficulty in understanding 
the I/O operations for any one of the SEQSEE functions are invited to browse 
through sections X, XI and XII.  If a user wishes to modify any one of the 
databases or to change the default parameters in the SEQSEE control file, we 
suggest that the user carefully read section XIII and/or Appendix 1.  If a 
user wishes to learn more about the actual operation of any one of the 
algorithms, this may be done by following up on the recommended readings 
or by directing their inquiries to bionmr@biochem.ualberta.ca.


I. INTRODUCTION

    The past few years have seen an explosion in the use of computers in 
molecular biology.  This is in no small part due to the vast quantity of raw 
protein and nucleic acid sequence data that has been generated in the last 
decade.  For example, since 1980 the number of sequences in the PIR 
databank alone has increased from fewer than 2000 to almost 50,000 separate 
entries today.  Without the aid of a computer it would simply be impossible 
to attempt to analyze or categorize this huge reservoir of biological 
information or to manage the rapid influx of new sequence data which is 
being generated every single day.
    Fortunately, through the establishment of publicly funded databanks 
such as GENBANK, SWISS-PROT, NBRF-PIR and EMBL, most of us have been 
spared the headache of keeping up with this information explosion.  Now, 
much of this sequence data is readily available, in computer readable format, 
to any scientist who wishes to subscribe to it.  While the development of 
these centralized databases has certainly helped in the rapid dissemination 
of sequence information, it has not necessarily solved the problem of its 
rapid "assimilation".  As a consequence, a great deal of privately funded 
effort has been directed towards the development of software (and 
hardware) which would permit molecular biologists to quickly compare, 
analyze and otherwise dissect sequence data in a useful or informative 
manner.  Programs or programming suites such as Intelligenetics' IG suite, 
Wisconsin's GCG suite, IBI's MacVector, and others are now widely available 
for this purpose and are typically designed to run on computers of all sizes 
and shapes including IBM PCs, Macintoshes, VAXes, and SUN workstations.  
Many of these larger and more costly programs permit flexible sequence 
manipulations such as database searching, aligning, comparing and matching 
-- all at the touch of a button.
    It is a result of the widespread implementation of these software 
packages (in conjunction with their accompanying databases) that a number 
of extremely important and very useful "discoveries" have been made.  
These include, just to mention a few, the identification of a number of new 
and important receptor families, the identification of numerous repetitive or 
recurrent folding "modules", the identification of various oncogenic 
products and the establishment of evolutionary relatedness between 
hundreds of previously unidentified or poorly understood protein products 
(see Doolittle, 1990 and references therein).
    The success that molecular biologists have had at performing simple, 
yet important "computer experiments" has induced others, including X-ray 
crystallographers, NMR spectroscopists, protein engineers and evolutionary 
biologists to begin using or adapting these same software packages to help 
answer specific questions of their own.  In particular, it is now becoming 
increasingly common to see some of the more advanced sequence analysis 
programs (involving multiple sequence alignments and advanced structure 
prediction algorithms) being used to predict the tertiary structure of 
previously uncharacterized proteins (Schulz, 1988).  Likewise, the push to 
develop more efficient methods for identifying the potential function or 
active site location of newly isolated proteins is leading to the development 
of methods which are, in effect, redefining the meaning of "homology" or 
"sequence similarity" (Gribskov et al., 1987).  In addition, new databases 
containing secondary structural information, phi and psi angles, torsion angle 
restraints, NMR chemical shifts, sequence motifs and the like, are continually 
being added to the current software arsenal to permit even more diverse 
inquiries and analyses.
    Quite clearly the development of sequence analysis packages is 
entering into a phase of very rapid expansion with many new and 
unforeseen applications being proposed for an increasingly diverse and 
substantially larger investigative population.  Of course with this rapid 
expansion come the usual problems of limited availability, restricted 
usability and increased costs of most sequence analysis software products.  
As a result, a rather problematic "software stratification" is developing in 
the field with the most powerful (and most useful) packages becoming more and 
more expensive while the freely available, unintegrated "shareware" 
products are becoming less and less useful.  In response both to this program 
diversification and this software stratification we have endeavored to 
develop a publicly available software package (called SEQSEE - SEQuence 
SEEker) which offers the program diversity of the expensive packages at the 
"cost" of the freely available shareware products.
    Specifically SEQSEE is a multi-purpose menu-driven suite of programs 
designed to provide a fully integrated, state-of-the-art package for the 
analysis and display of protein sequences and protein databases.  It has been 
designed with considerable flexibility in mind so as to permit the addition of 
new features and new algorithms when they are developed or as they are 
reported in the literature.  It contains many of the features available in some 
of the most comprehensive commercially available programs, such as rapid 
database searching, flexible pattern matching and multiple sequence 
alignment.  It also contains a large number of structural analysis and 
prediction programs which have been enhanced through the incorporation of 
several unique databases.  In this regard, SEQSEE has been developed 
expressly from the point of view of those protein chemists who are 
interested in questions pertaining to both structure and function.  As a 
result, we believe SEQSEE offers a number of important enhancements and 
many unique advantages over what is typically found in other commercially 
available software packages.  SEQSEE has already been used in the analyses 
of fibroma/myxoma viral products (Upton et al., 1992, 1993), cystic fibrosis 
gene products, fish anti-freeze proteins (Sonnichsen et al., 1993) and a 
variety of growth factors.  In many respects SEQSEE has performed beyond 
our expectations and it is as a consequence of its consistent (and sometimes 
unexpected) success that we believe it should be made freely available to all 
members of the scientific community.  We hope you will find SEQSEE 
as useful in your work as we have found it in ours.


II. BEFORE YOU BEGIN

1) Please make sure you have read the USER NOTICE.

2) Try to read the first 30 pages of the manual (including the tutorial) to 
gain some familiarity with the package and its overall operation and design.

3) If you have just received SEQSEE (either by tape or anonymous FTP) and 
are wanting to install it on your computer system please proceed to Sections 
III and IV to learn more about the system requirements and installation 
procedures.  On the other hand, if SEQSEE has already been installed on 
your system, ask your system administrator where the "seqsee" directory is 
located.  On a SUN workstation this will typically be "/home/local/seqsee".  
You must have full access to this directory in order to run the program.

4) If SEQSEE is to be operated in a UNIX environment, try to learn a little 
about the "vi" editor.  Knowing how to scroll through a file, how to quit and 
how to make rudimentary editing changes with this editor will help out a 
lot.  A short summary of "vi" commands with some brief explanations can be 
found in Section XIII.

5) It is recommended that SEQSEE be run in a window which is at least 75 to 
80 columns wide.  Narrower windows may lead to some character strings 
vanishing off the right edge of the screen or being wrapped around to 
produce difficult-to-read output.

6) For its complete and proper operation, SEQSEE requires a control file.  The 
SEQSEE control file contains all of the program's default parameters and file 
pathways.  When SEQSEE is first started, it will typically check to see if the 
file "seqsee.parms" (the control file) exists in the current directory. If it is
not there, then a check for a pre-designated location for a file with that 
name will be performed.  Note that if you wish to make changes to the 
default parameters, you must have a copy of "seqsee.parms" in your current 
directory.  The control file may be edited either before running SEQSEE or 
while you are in SEQSEE using the "File Viewer" menu option.  A complete 
explanation of the SEQSEE control file is provided in Appendix 1, located at 
the end of this manual.

7) To kill any current or unwanted operation in SEQSEE, simply type "^c".  
This will immediately halt the job and return you to the main menu.  (The 
notation "^" or "CNTL" is often used to designate the control key on the 
computer keyboard).
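
The control file lookup described in item 6 can be sketched as follows.  This 
is an illustration in Python only, not SEQSEE's own C code.  The function 
name "find_control_file" and the "fallback_dir" argument are hypothetical; 
the real pre-designated location is chosen when SEQSEE is installed.

```python
import os
import tempfile

CONTROL_FILE = "seqsee.parms"  # control file name given in this manual


def find_control_file(current_dir, fallback_dir):
    """Return the control file path to use, or None if none is found.

    The current directory is checked first, then a pre-designated
    fallback directory (a hypothetical argument for this sketch).
    """
    for directory in (current_dir, fallback_dir):
        candidate = os.path.join(directory, CONTROL_FILE)
        if os.path.isfile(candidate):
            return candidate
    return None


if __name__ == "__main__":
    # Demonstration using two temporary directories.
    with tempfile.TemporaryDirectory() as work, \
            tempfile.TemporaryDirectory() as lib:
        open(os.path.join(lib, CONTROL_FILE), "w").close()
        print(find_control_file(work, lib))  # falls back to the library copy
        open(os.path.join(work, CONTROL_FILE), "w").close()
        print(find_control_file(work, lib))  # local copy now takes precedence
```

Because the current directory is searched first, a private copy of 
"seqsee.parms" overrides the installed defaults without altering them.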


III. PROGRAM COMPATIBILITY & COMPUTER REQUIREMENTS

    The current version of SEQSEE (version 1.2) is configured to run on 
most kinds of UNIX workstations.  SEQSEE is written in the C programming 
language (consistent with both ANSI and standard C) and is compatible with 
the UNIX BSD operating system implemented on many current SUN 3, SUN 4 
and SUN Sparcstation computers, the IRIX operating system available on 
most Silicon Graphics workstations and the MACH operating system found 
on all NeXT workstations.  Portability to other computers (IBM, VAX, 
Macintosh) is possible although this has not yet been implemented.  
Considerable effort has been made to make the programs and I/O operations 
as machine-independent as possible.  Consequently SEQSEE does not 
offer any machine-specific graphics capability or machine-dependent 
windowing capacity. These enhancements (using X windows) may appear in 
later versions of the program.
    Taken together, the various programs and subroutines in SEQSEE 
amount to over 10,000 lines of source code.  If the accompanying databases, 
libraries, manuals, installation routines and compiled versions of the main 
programs are included, the whole SEQSEE suite occupies some 7 megabytes of 
disk space.  The NBRF Protein Information Resource (PIR version 34.0) and the 
SWISS-PROT protein sequence database (version 23.0) take up an additional 
115 megabytes of disk space.  Currently the only major sequence databanks 
compatible with SEQSEE are the NBRF-PIR database (both PIR and 
Intelligenetics format) and the SWISS-PROT database (both SWISS-PROT 
and Intelligenetics format).  While we acknowledge that there are some 
differences between the SWISS-PROT and the PIR databanks, this 
discrepancy should not be of any concern to most users.  Future versions of 
the program are expected to be compatible with all four major databases 
(PIR, SWISS-PROT, EMBL and GENBANK) in a number of different formats 
(Standard, IG, GCG etc.).  Computers running the full suite of SEQSEE 
programs require at least 4 megabytes of RAM (although 8-16 MB is 
recommended) and at least 150 megabytes of additional hard disk space 
to accommodate both the programs and the relevant databases.  It is also 
important to note that SEQSEE requires at least 16 MB of "swap-space" when 
running on most UNIX-based machines.
    Additional copies or updates of SEQSEE and its accompanying 
databases may be obtained through our website at:

            http://www.bionmr.ualberta.ca/ bds/software

IV. INSTALLATION

    If you have received SEQSEE from our anonymous FTP site it is in a 
compressed format and therefore must be "uncompressed" and "untarred" 
before it can be compiled and installed.  Versions of SEQSEE received on 
magnetic tape are in a regular format and need not be unpacked.  Magnetic 
tapes may be read using one of the following commands:

            1) SGI computers:        tar -xvf /dev/tape
            2) SUN computers:        tar -xvf /dev/rst0

"Tarring" the tape will read all of the files contained on the tape directly into 
a directory called "seqsee" within your current directory.  Reading the entire 
tape will typically take about 5-10 minutes.
    With your copy of SEQSEE you will find a total of more than 30 files 
and directories containing all of the required routines, databases and 
libraries needed to run the SEQSEE suite.  A complete listing (using the UNIX 
"ls" command) of these files should look something like this:

        COPYRIGHT    alexis/       init.c        sb_align/
        Makefile     browse/       install/      seqed/
        README       calc.c        lib/          seqhelp/
        VERSION      databases/    libc/         seqret/
        a_cfas/      docs/         main.c        seqsearch/
        a_gor/       dotplot/      moment/       seqsee.h
        a_homol/     fast_align/   mult_align/   seqsee.parms
        a_membrane/  fleqsee/      nw_align/     sequences/
        a_moment/    hsearch/      psearch/      stats/
        a_motif/     hydro/        refscan/

In the third column of this list, you will notice the directory called "install/".  
This particular directory contains an installation script (or macro) known 
as a csh program as well as several other programs of note.  The 
installation script is:

                       install

Before you install SEQSEE throughout your whole system you should install
it only for yourself.  The installation script allows you to do either.
If you install it only for yourself, it will allow you to 
experiment with the program and to investigate how well it works on your 
own computer environment.  We INSIST that you do this before deciding if 
SEQSEE should be placed on your full system.  SEQSEE should only be
installed system wide when a decision has been made to make SEQSEE 
available to all system users.

    The other 4 files in the "install/" directory include:  

    1) README :          A copy of the instructions you are now reading
    2) seqdb.fnames:     Contains pathnames for sequence database
    3) refdb.fnames:     Contains pathnames for references database 
    4) xparms:           Csh program - Called by install to 
                         build "seqsee.parms"

                    **********************

Before beginning the installation process it is important to note that SEQSEE 
can use four types of sequence databases.  These are described below:

    1) SWISS-PROT: This is publicly available from a number of 
    anonymous FTP sites (for example:  ncbi.nlm.nih.gov).

    2) SWISS-PROT_IG Intelligenetics format: This is available through the 
    Intelligenetics Corporation only.  It contains the same information as 
    the standard SWISS-PROT database, but in a different format.

    3) NBRF-PIR: This is publicly available from a number of anonymous 
    FTP sites (for example:  ftp.bchs.uh.edu)

    4) PIR_IG Intelligenetics format: This is available through the 
    Intelligenetics Corporation only.  It contains the same information as 
    the standard PIR database, but in a different format.

If you wish to run SEQSEE you must get one of the above databases either 
through an anonymous FTP site (addresses given above) or through the 
appropriate database vendor (Intelligenetics or NBRF).  Typically, the 
standard NBRF-PIR and SWISS-PROT databases have both the sequences 
and the references located together in the same file(s).  The Intelligenetics 
versions of the PIR and SWISS-PROT databases actually have the sequence 
data located in a separate file (or set of files) which is distinct from the 
reference and bibliographic data.  These format differences affect the way 
that you install SEQSEE, so please take note.  Remember that during the 
installation process (see below), you must set up SEQSEE with one of the 
above databases -- and only one of the above databases.

                    ***********************

You need to be in the "install" directory to run the
installation script.
 
The installation script does the following things:
 
    1) Asks where to put the library files that SEQSEE uses
       then copies all of the library files to that place.
    2) Asks where to put the executable files
    3) Asks if you have set up the 'seqdb.fnames' and 'refdb.fnames' files.
    4) Sets up the seqsee.parms file (default parameters file)
    5) Sets up the io.lib.c file, which is needed to compile SEQSEE
    6) Sets the default editor
    7) Sets the compiler and compiler options
    8) Compiles the SEQSEE programs
    9) Moves the SEQSEE programs to where you said the executables should go
 
Please look through the README file in the "install" directory 
for additional information on SEQSEE installation.

***************************************************************************
 
SEQSEE INSTALLATION FOR A SINGLE USER:
 
You do NOT need to be root to do this.  This example assumes that a
single user is installing SEQSEE from within their own account.  For
illustration purposes let's say the user's name is ``fido'' and their
home directory is ``/somemachine/home/fido''.
 
(1)
At the prompt:
 
   >>  Give the full path name of where the SEQSEE Library files should
       be installed.  These include the help files, the tables,
       the documentation (manual),  and the enclosed database files.
       Press <RET> for the default (/usr/local/lib/seqsee).
 
 
Give the name of a directory in the current user's account.  For example,
    /somemachine/home/fido/lib/seqsee
 
(2)
At the prompt:
 
    >>  Where should SEQSEE executables exist on the system?
        Enter the directory or press <RET> for the default
        (/usr/local/bin).
 
Give the name of the current user's bin directory.  For example,
    /somemachine/home/fido/bin
 
(3)
The other questions are self-explanatory.  When the installation
program has completed, the user will be able to run the program by just
typing:
    seqsee
from within their bin directory (or if the PATH is set up correctly,
the user can type ``seqsee'' from any directory).
 
***************************************************************************
 
SEQSEE INSTALLATION SYSTEM WIDE:
 
You MUST be root to do this.  The installation is identical to the above,
but instead of giving paths into the user's account, you will be installing
SEQSEE where it is accessible to everyone.
 
(1)
At the prompt:
 
   >>  Give the full path name of where the SEQSEE Library files should
       be installed.  These include the help files, the tables,
       the documentation (manual),  and the enclosed database files.
       Press <RET> for the default (/usr/local/lib/seqsee).
 
Give the name of where the library files should be located.  This must
be accessible to everyone.  The default should work.
 
(2)
At the prompt:
 
    >>  Where should SEQSEE executables exist on the system?
        Enter the directory or press <RET> for the default
        (/usr/local/bin).
 
Give the name of where the executables should reside so that everyone
can run them.  This should be a directory which is in everyone's
PATH.
 
(3)
The other questions are self-explanatory.  When the installation
program has completed, any user will be able to run the program by just
typing:
    seqsee
from any location.

***************************************************************************
 

SEQSEE is organized such that each module is an independent entity, distinct 
from the "main driver".  The purpose of the main driver is, simply, to call the 
appropriate program and to display or save the results.  Each module in 
SEQSEE is written in standard C code (although we cannot guarantee that 
differences will not exist between some compilers).  SEQSEE should be easily 
portable to almost any UNIX machine.  If you are porting SEQSEE to a 
different (i.e., non-UNIX) system, you will have to make changes to the driver 
and the "unix.lib.c" file in the "libc" directory.  This should not prove to be 
too difficult as most systems have comparable command structures.


V. SEQSEE -- GENERAL DESCRIPTION

    Following is a brief description of the general functions that have been 
implemented in the current version (Version 1.2) of SEQSEE:

1) SEQUENCE ENTRY & EDITING - New sequences and sequence files may be 
created, entered and/or edited using a computer-directed protocol found in 
SEQED.  The program permits the flexible entry and storage of sequence 
information in both upper and lower case -- with or without spacing.

2) STRUCTURAL ANALYSIS - Sequences may be analyzed statistically or 
predictively for the extent and location of secondary structure, active-site 
motifs, sequence signatures, membrane spanning regions, flexibility, 
hydrophobicity, hydrophobic moments and many other features.  The 
structural analysis routines are designed to help the user in determining 
important aspects of structure and function in those cases where very little 
is known about the protein of interest.

3) SEQUENCE COMPARISON & ALIGNMENT - Sequences may be compared 
against a database, against themselves or alternatively individual sequences 
may be aligned in a pair-wise or multi-layered fashion depending on the 
choice of program or program parameters.  Sequence alignments are marked 
explicitly to distinguish between exact matches and similar matches.  
Consensus sequences are generated for all multiply-aligned sequences.  
Choices of scoring matrices and gap penalties are possible.  Sequence 
alignment and comparison are two excellent methods for discerning protein 
function and evolutionary relatedness.

4) FLEXIBLE PATTERN MATCHING - Pattern matching to a database 
(PIR, SWISS-PROT, SEQBANK, etc.) or to individual sequences may be done 
using a flexible query language (for exact matches) or a homology-based 
matching protocol (using a scoring matrix).  Flexible pattern matching 
is ideal for the identification and location of suspected sequence motifs.

5) SEQUENCE LOCATION, RETRIEVAL & SCANNING - Sequences, names of 
sequences, accession numbers and bibliographic information may be 
scanned, retrieved or precisely located in either the PIR or SWISS-PROT 
databases (or in other user-specified databases) using a number of browsing, 
database scanning or pattern matching programs.  These routines are ideal 
for interactive identification and retrieval of database sequences.


VI. THE SEQSEE SUITE


    This section provides a more detailed description of the functions and 
subroutines currently available in SEQSEE.  Note that each subroutine 
description is presented in the same order that it appears in the main SEQSEE 
menu.  As well as providing a more complete description of the SEQSEE 
functions, we hope this section will also be of some interest to those wishing 
to understand the character of sequence analysis in general.  (Note that 
program names appear in upper case letters).


1) HELP 
HELP contains an abridged version of the SEQSEE manual for online  
consultation.  A menu is provided with a selection of various topics and 
accompanying descriptions.  Online help does not offer the same detailed 
information as the hardcopy SEQSEE manual, hence detailed inquiries should 
be directed to the manual.


2) ENTER / EDIT A SEQUENCE
The program known as SEQED is used for the entry and editing of new (or 
old) sequence files.  The program first queries the user as to whether he or 
she wishes to:

    1) Enter a new sequence.
    2) Edit an old sequence.

If one chooses to enter a new sequence the program queries the user for the 
name of the sequence file (sequence filename), the name of the sequence 
(sequence name) and finally, the actual sequence (using the standard single 
letter amino acid code).  Sequences may be entered using either lower case 
letters, upper case letters or an arbitrary combination of both.  In other 
words, sequence entry is case independent.  The program also ignores blank 
characters so sequence entries may have as many blank spaces as desired.  
A "sequence ruler" is presented at the top of each sequence file entry line to 
permit quick identification of residue positions as they are typed.  After each 
group of 50 characters has been entered, the user is expected to press 
<return> so that a new sequence ruler can appear.  Upon completion of the 
sequence entry, the user must enter the '$' character to indicate to the 
computer that the typing process has finished.
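
The entry rules above (case independence, ignored blanks, '$' as the 
terminator) can be illustrated with a short Python sketch.  This is not the 
SEQED source; the function name "parse_sequence_entry" is hypothetical, and 
folding the input to upper case is an assumption made here for convenience.

```python
def parse_sequence_entry(lines):
    """Join typed lines into a single sequence string.

    Blank characters are skipped, letters are folded to upper case
    (an assumption of this sketch), and reading stops at the first '$'.
    """
    residues = []
    for line in lines:
        for char in line:
            if char == "$":
                return "".join(residues)
            if char.isspace():
                continue
            residues.append(char.upper())
    raise ValueError("sequence entry was not terminated with '$'")


if __name__ == "__main__":
    typed = ["mk tay IAK", "qrq $"]
    print(parse_sequence_entry(typed))
```

Anything typed after the '$' character is ignored by this sketch; how SEQED 
itself treats such trailing input is not specified here.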

3) RETRIEVE SEQUENCE FROM DATABASE
The program SEQRET is designed to allow the user to retrieve complete 
sequences or groups of sequences from the PIR database using either the PIR 
accession number or protein name (or portion thereof).  Thus one may seek 
and select only a single sequence for a specific purpose, or entire protein 
families to create special user-specified databases.  The sequences may be 
saved and/or edited for further analysis (as in the preparation of files for 
multiple sequence alignments).  All sequences are saved in a SEQFILE format 
and, therefore, are ready to be analyzed by any of the other SEQSEE  
functions.


4) SEQUENCE STATISTICS
The STATS program carries out a simple statistical analysis of any given 
protein sequence.  It calculates and displays the molecular weight, the amino 
acid composition, average hydropathy (Kyte and Doolittle, 1982), total 
charge, predicted iso-electric point, expected quantity of exposed and 
interior surface area (Chothia, 1976; Richards, 1977; Miller et al., 1987), 
expected packing volume (Richards, 1977; Janin, 1979), predicted specific 
volume (Zamyatnin, 1972), aggregation potential (Fisher, 1964), estimated 
solvation free energy of folding (Chiche et al., 1990) and a host of other 
values that may be of structural or statistical interest (See Appendix 2).  
Note that STATS can only be used on sequence files in the SEQFILE format.
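
Two of the simpler quantities listed above, the amino acid composition and 
the average hydropathy, can be sketched in Python as shown below.  This is 
an illustration rather than the STATS code itself: the function names are 
hypothetical, and the table holds the published Kyte and Doolittle (1982) 
hydropathy index values.

```python
from collections import Counter

# Kyte and Doolittle (1982) hydropathy index for the 20 standard residues.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def composition(sequence):
    """Return the percentage of each residue type in the sequence."""
    if not sequence:
        raise ValueError("empty sequence")
    counts = Counter(sequence.upper())
    total = sum(counts.values())
    return {res: 100.0 * n / total for res, n in sorted(counts.items())}


def average_hydropathy(sequence):
    """Return the mean Kyte-Doolittle hydropathy over all residues."""
    if not sequence:
        raise ValueError("empty sequence")
    values = [KYTE_DOOLITTLE[res] for res in sequence.upper()]
    return sum(values) / len(values)


if __name__ == "__main__":
    seq = "AAGV"
    print(composition(seq))
    print(average_hydropathy(seq))
```

A positive average indicates a predominantly hydrophobic sequence on this 
scale, a negative average a predominantly hydrophilic one.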


5) STRUCTURE PREDICTION
ALEXIS is a comprehensive structural analysis program which has been 
developed expressly for the SEQSEE software suite.  ALEXIS performs 
calculations on the extent and location of potential membrane spanning 
regions, the identification of short sequence folding motifs, the prediction of 
the protein folding class (Chou and Zhang, 1992) and the prediction of 
secondary structure using the cumulative results of five different and well-
tested methods.  Detailed descriptions of the techniques and their respective 
enhancements are given below:

    a) MEMBRANE SPANNING REGIONS
    This calculation uses the central point maxima technique first 
    described by Klein et al. (1985).   This has been shown to be the most 
    accurate method for membrane spanning identification through 
    independent tests performed by Fasman & Gilbert (1990).  The 
    method uses a linear discriminant model to test the probability that 
    any given sequence is membrane spanning.  The hydrophobicity scale 
    (and hence the discriminant equation) has been adapted 
    specifically for the Kyte-Doolittle parameters.  Some modifications 
    have been introduced to this scale to permit better discrimination of 
    the membrane spanning regions.  The program is designed to 
    determine, first, if there are membrane spanning regions and, 
    second, where they are located.

    b) CHOU-FASMAN SECONDARY STRUCTURE PREDICTION
    This procedure predicts the secondary structure for any given protein 
    sequence through a modified Chou and Fasman (1974, 1978) 
    algorithm.  The Chou-Fasman algorithm is based on statistically 
    observed propensities of all 20 amino acids to occur in various protein 
    secondary structures.  Despite its widespread use and general 
    popularity, it is a technique not without its shortcomings.  In an 
    attempt to improve both its accuracy and its general utility, a number 
    of modifications to the original algorithm have been made.  Some of 
    these changes include the adoption of the simplified rules of Williams 
    et al. (1987) and the use of updated Chou-Fasman parameters as 
    derived from SEQBANK.  With these new modifications, this technique 
    can predict secondary structures with a 59.8% level of accuracy.  A 
    random three-state prediction, on the other hand, is expected to be 
    only 33.6% correct (based on the disposition of secondary structures in 
    SEQBANK).
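
    The accuracy figures quoted in this section are percentages of correctly 
    predicted residues.  The exact scoring procedure is not given here; the 
    Python sketch below assumes the common per-residue, three-state measure 
    (helix "H", strand "E", coil "C").  The function name is hypothetical.

```python
def three_state_accuracy(predicted, observed):
    """Percentage of positions where predicted and observed states agree.

    Both arguments are strings of per-residue states, e.g. 'H', 'E', 'C'.
    """
    if len(predicted) != len(observed):
        raise ValueError("state strings must have the same length")
    if not observed:
        raise ValueError("empty state strings")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


if __name__ == "__main__":
    predicted = "HHHHCCEEEE"
    observed = "HHHCCCEEEC"
    print(three_state_accuracy(predicted, observed))  # 8 of 10 positions agree
```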

    c) HYDROPHOBIC MOMENT SECONDARY STRUCTURE PREDICTION
    This procedure determines the secondary structure for any given 
    protein sequence on the basis of hydrophobic periodicities.  It has its 
    origins with the Fourier analysis of hydrophobicity profiles as first 
    proposed by Eisenberg et al. (1984).  In contrast to the statistical 
    techniques of Chou and Fasman, it is an approach that is based on well 
    established physico-chemical principles.  According to Eisenberg, 
    stretches of residues with hydrophobic periodicities in the range of 90 
    to 120 degrees (corresponding to a hydrophobic residue every three 
    to four residues) are typically found in alpha-helices, while stretches 
    of amino acids with hydrophobic periodicities of 160 to 180 degrees 
    (corresponding to alternating hydrophobic and hydrophilic residues) 
    are typically in beta strands.  By introducing a number of 
    modifications to Eisenberg's original proposal, including the use of 
    optimized hydrophobicity parameters and the introduction of Chou-
    Fasman conformational probabilities, the level of prediction accuracy 
    can reach 64.5% (This value was calculated using the structural 
    assignments available in SEQBANK).
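
    The periodicity idea can be illustrated with the standard hydrophobic 
    moment: the magnitude of the vector sum of residue hydrophobicities, 
    each rotated by a fixed angle per residue.  The Python sketch below is 
    not the SEQSEE implementation (which uses optimized parameters and 
    Chou-Fasman probabilities not reproduced here); the function name is 
    hypothetical and the input values are arbitrary test numbers.

```python
import math


def hydrophobic_moment(hydrophobicities, angle_degrees):
    """Magnitude of the hydrophobic moment at a given rotation angle.

    Residue n contributes its hydrophobicity along the direction
    n * angle, so a large moment indicates periodicity at that angle.
    """
    angle = math.radians(angle_degrees)
    sin_sum = sum(h * math.sin(n * angle)
                  for n, h in enumerate(hydrophobicities))
    cos_sum = sum(h * math.cos(n * angle)
                  for n, h in enumerate(hydrophobicities))
    return math.hypot(sin_sum, cos_sum)


if __name__ == "__main__":
    # Strictly alternating hydrophobic/hydrophilic values (strand-like).
    alternating = [1.0, -1.0] * 4
    for angle in (100, 180):
        print(angle, round(hydrophobic_moment(alternating, angle), 3))
```

    For the alternating pattern the moment is much larger at 180 degrees 
    than at 100 degrees, consistent with the beta-strand periodicity 
    described above.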

    d) GARNIER, OSGUTHORPE, ROBSON SECONDARY STRUCTURE PREDICTION
    Commonly called the GOR method (after the three authors' initials), this 
    procedure predicts the secondary structure on the basis of parameters 
    obtained through information theory.  It is based on a series of 
    proposals originally put forward by these investigators in the 1970's 
    (Garnier et al., 1978).  It is very much a statistical technique, not 
    unlike the Chou-Fasman approach, except that it takes into account the 
    positional preferences of amino acids within helices, beta-strands and 
    coils.  Despite its high level of parameterization, the procedure is 
    extremely fast (when computerized) and is consistently rated among 
    the most accurate of known methods.  With recent modifications in 
    place, including some degree of re-parameterization of the previously 
    published values found in Gibrat et al. (1987), the method attains a 
    64.6% level of accuracy (this value was calculated using the structural 
    assignments available in SEQBANK).

    e) HOMOLOGY-BASED SECONDARY STRUCTURE PREDICTION
    This procedure determines the secondary structure for any given 
    protein sequence by searching for short stretches of homologous 
    sequences and comparing them to known protein structures.  It is 
    based on a number of related proposals simultaneously offered by 
    several authors in 1986 (Nishikawa and Ooi, 1986; Sweet, 1986 and 
    Levin et al., 1986). The most recent implementation of this procedure, 
    as described by Levin and Garnier (1988), has been adopted for use in 
    SEQSEE.  In this version, SEQBANK is used as the database of known 
    structures from which sequence homologies are sought.  This method 
    is the most accurate secondary structure prediction scheme presently 
    known.  For proteins sharing greater than 25% sequence similarity 
    with any protein in SEQBANK, the method approaches a level of 
    accuracy of 87%.  For proteins possessing no significant homology, the 
    prediction is 66.0% correct.  SEQSEE uses a specially optimized amino 
    acid exchange matrix in order to achieve these high scores.

    f) MOTIF-BASED SECONDARY STRUCTURE PREDICTION
    This procedure predicts secondary structure based on primary 
    sequence patterns contained in the files SEQMOTIF1 and SEQMOTIF2.  
    It is an extension of the methods first proposed by Rooman and 
    Wodak (1988, 1991) for identifying and incorporating well established 
    sequence/structure patterns in secondary structure prediction 
    schemes.  As currently implemented, the procedure can (on average) 
    assign a structure to fewer than 20% of the residues in any given 
    sequence.  However, for those regions that are predicted, the 
    confidence level is often very high (> 80%).

    g) CONSENSUS SECONDARY STRUCTURE PREDICTION
    This procedure determines a consensus secondary structure based on 
    the cumulative scores of the five methods described above.  The 
    residue specific scores for each method are weighted according to its 
    expected prediction accuracy. The homology-based technique has the 
    strongest weighting and the Chou-Fasman technique has the weakest.  
    The consensus method is generally found to improve overall 
    prediction accuracy by one to two percent and, furthermore, it can 
    greatly simplify the interpretation of the other six predictions.  We 
    recommend that the consensus prediction be used when only a single 
    answer or a single method is desired.
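
The weighting idea behind the consensus prediction can be illustrated with a short sketch.  This is only a simplified per-residue weighted vote, not the SEQSEE implementation: the method names, the weights and the one-letter state codes (H = helix, B = beta, C = coil, "-" = no prediction) are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical weights chosen for illustration only; they are NOT the
# values used by SEQSEE.  The only property taken from the manual is that
# the homology-based method weighs most and Chou-Fasman weighs least.
WEIGHTS = {
    "homology": 3.0,
    "motif": 2.5,
    "gor": 2.0,
    "moment": 2.0,
    "chou_fasman": 1.0,
}


def consensus(predictions, weights):
    """Return a per-residue weighted-vote consensus string.

    predictions maps a method name to a string of states, one character
    per residue: H (helix), B (beta), C (coil) or '-' (no prediction).
    """
    length = len(next(iter(predictions.values())))
    result = []
    for i in range(length):
        votes = defaultdict(float)
        for method, states in predictions.items():
            if states[i] in "HBC":          # skip residues left unpredicted
                votes[states[i]] += weights[method]
        # Ties go to the first state in H, B, C order.
        result.append(max("HBC", key=lambda state: votes[state]))
    return "".join(result)


example = {
    "homology":    "HHHHCC",
    "motif":       "--BB--",
    "gor":         "HHBBCC",
    "moment":      "HBBBCC",
    "chou_fasman": "CBBBBC",
}
print(consensus(example, WEIGHTS))
```

A real implementation would combine the residue-specific scores of each method rather than simple one-state votes, as described above.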


6) SEQSITE PATTERN SEARCH
The SEQSITE procedure allows the user to search any given sequence for 
active sites, binding sites, signature sequences, sequence motifs, 
phosphorylation sites and potential antigenic sites.  A library of more than 
1000 signature sequence patterns, 50 phosphorylation sites and 20 
generalized antigenic regions can be scanned when this function is invoked.  
All sites are identified by residue location, matched template pattern and at 
least one current reference.  This type of "function search" is extremely 
useful for determining the properties and features of newly sequenced or 
poorly characterized proteins.


7) FLEXIBILITY
The program named FLEQSEE predicts the flexibility and mobility of various 
regions in a protein based on sequence information alone.  Flexibility is 
calculated on the basis of the Karplus algorithm (Karplus and Schulz, 1985).  
This procedure determines main-chain mobility by using smoothed averages 
of X-ray thermal B factors taken from approximately 30 highly resolved 
structures.  In SEQSEE, flexibility may be used to determine the position and 
length of coil regions by locating all "significant" maxima (those maxima 
which exceed a minimum threshold) in the flexibility plot.  Flexibility plots 
may also be used to identify surface-seeking elements or to locate strongly 
antigenic regions of any given sequence.
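
Locating "significant" maxima in a flexibility plot can be sketched as follows.  The profile values and the threshold are invented for illustration; they are not Karplus-Schulz parameters, and the function name is our own.

```python
def significant_maxima(profile, threshold):
    """Return 1-based positions of local maxima that exceed the threshold."""
    peaks = []
    for i in range(1, len(profile) - 1):
        value = profile[i]
        # A maximum rises above its left neighbour and does not fall
        # below its right neighbour.
        if value > threshold and profile[i - 1] < value >= profile[i + 1]:
            peaks.append(i + 1)
    return peaks


# Made-up smoothed flexibility values, one per residue.
profile = [0.95, 0.98, 1.04, 1.01, 0.97, 0.99, 1.00, 0.96, 1.06, 1.08, 1.02]
print(significant_maxima(profile, threshold=1.0))
```

Each reported position would then be a candidate centre of a coil region.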


8) HYDROPHOBIC MOMENT
MOMENT calculates the hydrophobic moment of a sequence using the 
Cornette et al. (1987) scale of hydrophobicity and the Fourier analysis 
technique of Eisenberg et al. (1984).  Calculations are performed over a set 
"sequence window" of predefined length using a range of values specific to 
helical periodicities (90 to 120 degrees), exterior beta strand periodicities 
(160 to 180 degrees) and interior beta strand periodicities (0 degrees).  The 
values for helix and beta strand may be compared with one another and to a 
minimum cutoff value (usually around 5) to identify amphipathic helices or 
beta strands.  This method has some utility in identifying potential T-cell 
epitopes (amphipathic helices) and other biologically important structures.
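
The Fourier-style calculation described above can be sketched in a few lines.  Note the assumptions: the sketch substitutes the Kyte-Doolittle scale for the Cornette scale used by MOMENT, the window length of 11 is arbitrary, and the function names are our own.

```python
import math

# Kyte-Doolittle hydropathy values (Kyte and Doolittle, 1982), used here
# in place of the Cornette scale that MOMENT actually uses.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_moment(window, angle_deg, scale=KYTE_DOOLITTLE):
    """Magnitude of the hydrophobic moment of one window at a given
    periodicity (in degrees per residue)."""
    delta = math.radians(angle_deg)
    sin_sum = sum(scale[aa] * math.sin(n * delta) for n, aa in enumerate(window))
    cos_sum = sum(scale[aa] * math.cos(n * delta) for n, aa in enumerate(window))
    return math.hypot(sin_sum, cos_sum)


def moment_profile(sequence, angle_deg, window=11):
    """Hydrophobic moment of every full window along the sequence."""
    seq = sequence.upper()
    return [hydrophobic_moment(seq[i:i + window], angle_deg)
            for i in range(len(seq) - window + 1)]


# An alternating hydrophobic/hydrophilic stretch scores far higher at a
# beta-strand periodicity (180 degrees) than at 0 degrees.
print(hydrophobic_moment("ININ", 180), hydrophobic_moment("ININ", 0))
```

Comparing the profile computed at a helical angle (for example 100 degrees) with the one computed at a beta-strand angle (for example 170 degrees) gives the kind of helix versus strand comparison described above.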


9) HYDROPHOBICITY
HYDRO calculates the smoothed hydrophobicity (over a window of pre-
defined length) of any given sequence using a choice of several 
hydrophobicity scales.  The operator may choose (using the control file) from 
the Eisenberg consensus scale (Eisenberg et al., 1984), the Kyte-Doolittle 
scale (Kyte and Doolittle, 1982), the Cornette scale (Cornette et al., 1987) or 
the Parker-HPLC scale (Parker et al., 1986).  The Hopp-Woods antigenicity 
scale (Hopp and Woods, 1981) is also available for antigenicity 
determination.  Hydrophobicity plots may additionally be used to locate 
membrane spanning regions in some types of proteins (hydrophobic regions 
of 20 or more residues).  A choice of both "raw" and "scaled" values is 
offered.
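
A smoothed hydrophobicity profile of the kind described above can be sketched as a sliding-window average.  The example uses the Kyte-Doolittle scale; the default window length of 7 and the function name are assumptions, and the handling of "raw" versus "scaled" values is not reproduced.

```python
# Kyte-Doolittle hydropathy values (Kyte and Doolittle, 1982).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def smoothed_hydrophobicity(sequence, window=7, scale=KYTE_DOOLITTLE):
    """Average hydrophobicity over an odd-length window centred on each
    residue that has a full window around it."""
    if window % 2 == 0:
        raise ValueError("window length must be odd")
    seq = sequence.upper()
    half = window // 2
    profile = []
    for centre in range(half, len(seq) - half):
        segment = seq[centre - half:centre + half + 1]
        profile.append(sum(scale[aa] for aa in segment) / window)
    return profile


profile = smoothed_hydrophobicity("MSDKLIHITDDSFDTDVIKADGAILVDFWAEWCGPCKMIAPILDELADEY")
print(len(profile), round(max(profile), 2))
```

A long run of consecutive positions (20 or more) with a high average would be the signature of a possible membrane spanning region.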


10) FAST ALIGNMENT SEARCH
FAST_ALIGN is a k-tuple fast alignment algorithm modelled loosely on 
the speed-up protocols incorporated in Lipman and Pearson's FASTA (1988) 
and Altschul et al.'s BLAST (1990).  First, a table of homologous 3-tuples is 
generated for the query sequence using a modified scoring matrix.  Second, a 
look-up table of these 3-tuples and their respective locations is prepared 
from the query sequence.  Third, a look-up table is prepared of 3-tuples for 
each sequence in the database.  The two look-up tables (one from the query 
and the second from the database) are then compared and matches are 
identified.  The result is a one-dimensional "spectrogram" of homologies 
characterized by low level noise (poor matches) and the occasional sharp 
peak (a string of matches).  Database sequences with sufficiently high peaks 
are then pulled out and rigorously aligned using the Needleman-Wunsch 
program to determine the significance of the alignment.  The program is 
capable of searching the complete PIR database and then ordering and 
aligning 50 homologous matches of a 100 residue query sequence in less 
than 90 seconds.  This is an extremely powerful technique to accomplish 
quick inquiries regarding protein relatedness and identification.  
FAST_ALIGN may be used to align sequences against the PIR, SWISS-PROT, 
SEQBANK or a user-specified database with a SEQFILE format.  Several 
choices of scoring matrices are possible and these include: the Unity matrix, 
the Dayhoff PAM 250 matrix (Dayhoff et al., 1983), the McLachlan matrix 
(McLachlan, 1971) and the RBO matrix (unpublished).  The RBO matrix is the 
default scoring matrix.
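
The look-up table comparison can be illustrated with a small sketch.  It is a simplification: only exact 3-tuple matches are counted (FAST_ALIGN also accepts "homologous" 3-tuples scored with a matrix), no Needleman-Wunsch step follows, and the function names are our own.

```python
from collections import Counter, defaultdict


def tuple_table(sequence, k=3):
    """Map every k-tuple in the sequence to the list of its start positions."""
    table = defaultdict(list)
    for i in range(len(sequence) - k + 1):
        table[sequence[i:i + k]].append(i)
    return table


def diagonal_spectrum(query, target, k=3):
    """Count k-tuple matches per diagonal (target offset minus query offset).

    A sharp peak on one diagonal corresponds to a string of matches;
    scattered single counts are the low-level noise.
    """
    query_table = tuple_table(query, k)
    target_table = tuple_table(target, k)
    spectrum = Counter()
    for word, query_positions in query_table.items():
        for t in target_table.get(word, ()):
            for q in query_positions:
                spectrum[t - q] += 1
    return spectrum


spectrum = diagonal_spectrum("WCGPCKMIA", "AAAWCGPCKMIAPP")
print(spectrum.most_common(1))   # [(3, 7)]: seven matching 3-tuples on diagonal 3
```

Database sequences whose highest peak exceeds some cutoff would then be passed on for rigorous alignment.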


11) EXHAUSTIVE ALIGNMENT SEARCH
NW_ALIGN is a program which carries out an exhaustive pair-wise  
alignment of any given query sequence to all other sequences in a given 
database.  Only those sequences with scores above a certain user-defined 
threshold are retained.  The algorithm used for this procedure is based on 
the Needleman-Wunsch (1970) approach for pair-wise alignment.  This 
dynamic programming method is guaranteed to find the optimal alignment 
between any two sequences for any given scoring matrix and gap penalty.  
Alignments can either be done against the PIR database, SWISS-PROT, 
SEQBANK or a user defined database in the SEQFILE format.  If alignments 
are done against SEQBANK, knowledge of the secondary structure is included 
to determine the location and length of gaps (Lesk et al., 1986).  A choice of 
scoring matrices and gap penalties is available.  The scoring matrices include: 
the Unity matrix, the Dayhoff PAM 250 matrix (Dayhoff et al., 1983), the 
McLachlan matrix (McLachlan, 1971) and the RBO matrix (unpublished).  The 
RBO matrix is the default scoring matrix.  Scores are rigorously calculated on 
the basis of comparisons to randomized sequence alignments as suggested by 
Dayhoff et al. (1983).  The program is extremely time consuming with a 
query sequence of 100 residues typically taking 4 hours to complete on 
a SUN Sparcstation.  However, the improvement in overall alignment 
accuracy and the possibility of identifying very remote and previously 
unidentified relationships may well be worth the wait.  NW_ALIGN also 
incorporates another program called SB_ALIGN which is capable of 
performing structure-based alignments using the approach of Lesk et al. 
(1986).  SB_ALIGN is only called when conducting alignments against the 
SEQBANK database.  If the user wishes to place an exhaustive alignment run 
into the background (to prevent the computer from being tied up for long 
periods of time) this can be done as follows:

    1) Press the "control" and "z" keys simultaneously to 
    temporarily stop the job.

    2) Type "bg" and press the "return" key to restart the 
    program in the background.

The results can be viewed at any time by re-opening the SEQSEE window and 
inspecting the *.tmp files that are automatically created and updated during 
the alignment run.
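
For readers unfamiliar with the Needleman-Wunsch approach used by NW_ALIGN, a minimal sketch of the dynamic programming step is given here.  It assumes a Unity-style score (1 for identical residues, 0 otherwise) and a simple linear gap penalty of -1; these values are illustrative, and the structure-based gap placement and significance scoring described above are not included.

```python
def needleman_wunsch(a, b, match=1, mismatch=0, gap=-1):
    """Return (score, aligned_a, aligned_b) for one optimal global alignment."""

    def pair_score(x, y):
        return match if x == y else mismatch

    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    # Fill the matrix: each cell holds the best score of aligning a[:i] with b[:j].
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]),
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if (i > 0 and j > 0
                and score[i][j] == score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1])):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[len(a)][len(b)], "".join(reversed(out_a)), "".join(reversed(out_b))


print(needleman_wunsch("WCGPCK", "WCGCK"))
```

Because every cell of the matrix must be filled for every database sequence, the running time grows with the product of the two sequence lengths, which is why an exhaustive search takes so long.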


12) ALIGN 2 OR MORE SEQUENCES
The program MULT_ALIGN uses a modification of the pair-wise Needleman-
Wunsch protocol to align two or more protein sequences.  The method is 
closely related to the progressive alignment procedure first described by 
Barton and Sternberg (1987), which permits rapid and accurate multiple 
alignments for up to several hundred proteins.  A consensus sequence is also 
generated for each pair-wise or multiple alignment.  A choice of scoring 
matrices and gap penalties is available.  Sequences which are to be aligned 
must be contained in SEQFILE formats, either in the form of databases (for 
multiple alignments) or singly (for pair-wise alignments).  The procedure for 
aligning more than two sequences (like the fast alignment search described 
in section 10) is fundamentally heuristic in nature and so it cannot be 
proven that the resulting alignments are mathematically optimal.


13) PATTERN SEARCH
This procedure can search the SEQBANK, SWISS-PROT or PIR databases or, 
alternatively, a sequence of your own choosing to find exact pattern matches 
according to the following rules (note that sequence patterns are case 
INSENSITIVE):

    a) X         Match exact residue specified where X = any amino acid
    b) !X        Match any residue EXCEPT X 
    c) *         Wild card character--matches any amino acid
    d) [XYZ]     "OR" brackets--match X "or" Y "or" Z.  
    e) X&Y       "AND" character--match X "and" Y no matter what the 
                 separation
    f) X{2,8}Y   Match X and Y if separation is between 2 and 8 residues.  
                 "Range" braces--allow a range of wild card characters. i.e. 
                 {2,8} = 2 to 8 "*"
    g) $**X      Match X if located 2 residues from N terminus -- 
                 "Termination" characters are used to mark either the 
                 beginning (N terminus) or end (C terminus) of a     
                 sequence

Pattern Search (PSEARCH) is constructed to allow the user to enter several 
patterns at once, both on a single line (using the "&" feature) or on separate 
lines.  Patterns appearing on separate lines are treated as "independent" 
patterns (meaning they don't have to appear in the same protein sequence) 
while patterns with "&" characters are viewed as "dependent" patterns 
(meaning they do have to appear in the same protein sequence).  Some 
examples of sequence pattern searches are given below: 

    AA***K      Find all occurrences of 2 alanines together followed 
                by any 3 residues followed by a single lysine
    
    AA!P!P!PK   Find all occurrences of 2 alanines together followed 
                by any 3 residues (as long as they're NOT prolines) 
                followed by a single lysine. (ie. look for AA***K 
                except AAP**K, AA*P*K, AA**PK, AAPP*K, AAP*PK, 
                AA*PPK and AAPPPK)

    [AG][AG]*[KR]    Find all occurrences of 2 alanines or 2 glycines or 
                any combination of the two followed by any 
                residue followed by a lysine or an arginine. (ie. 
                look for AA*K, AG*K, GA*K, GG*K, AA*R, AG*R, GA*R 
                and GG*R)

    AA*K&I**R   Find all occurrences of 2 alanines together 
                followed by any amino acid followed by a single 
                lysine, AND if that pattern is found, then find all 
                occurrences of a single isoleucine followed by any 
                two amino acids followed by a single arginine IN 
                THE SAME PROTEIN SEQUENCE. (ie. look for AA*K 
                and I**R within a sequence)

    AA{2,5}[KR] Find all occurrences of 2 alanines together followed 
                by at least two but no more than 5 amino acids 
                (any type) followed by either a lysine or an     
                arginine. (ie. look for AA**[KR], AA***[KR],     
                AA****[KR] and AA*****[KR])

    ${3,5}M     Find all occurrences of methionine that are     
                between 3 and 5 residues from the N terminus.  
                (ie. look for $***M, $****M and $*****M)

Of course any combination of the above queries could be used in a PSEARCH 
pattern search.  Other examples of PSEARCH queries may be found by 
browsing through the SEQSITE database.
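
The pattern rules listed above map quite naturally onto regular expressions.  The following sketch shows one possible translation; it is not the PSEARCH source code, the function names are our own, and combinations not shown in the examples above (such as "!" followed by brackets) are not handled.

```python
import re


def psearch_to_regex(pattern):
    """Translate one PSEARCH-style pattern (without '&') into a Python regex."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "*":                       # wild card: any residue
            out.append(".")
        elif ch == "!":                     # !X: any residue except X
            i += 1
            out.append("[^" + pattern[i].upper() + "]")
        elif ch == "[":                     # [XYZ]: X or Y or Z
            j = pattern.index("]", i)
            out.append("[" + pattern[i + 1:j].upper() + "]")
            i = j
        elif ch == "{":                     # {2,8}: 2 to 8 wild cards
            j = pattern.index("}", i)
            out.append(".{" + pattern[i + 1:j] + "}")
            i = j
        elif ch == "$":                     # N terminus if first, else C terminus
            out.append("^" if i == 0 else "$")
        else:                               # exact residue
            out.append(re.escape(ch.upper()))
        i += 1
    return "".join(out)


def psearch(pattern, sequence):
    """Return (1-based position, matched text) pairs for a pattern.

    Sub-patterns joined by '&' are dependent: if any one of them is
    absent from the sequence, nothing is reported.
    """
    seq = sequence.upper()
    hits = []
    for part in pattern.split("&"):
        # The lookahead lets overlapping occurrences be reported as well.
        regex = re.compile("(?=(" + psearch_to_regex(part) + "))")
        found = [(m.start() + 1, m.group(1)) for m in regex.finditer(seq)]
        if not found:
            return []
        hits.extend(found)
    return hits


seq = "MSDKLIHITDDSFDTDVIKADGAILVDFWAEWCGPCKMIAPILDELADEYQGKLTVAKLN"
print(psearch_to_regex("AA{2,5}[KR]"))   # AA.{2,5}[KR]
print(psearch("c**c", seq))              # [(33, 'CGPC')]
```

Patterns entered on separate lines would simply be passed to the search one at a time, since they are independent of one another.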


14) HOMOLOGY SEARCH
The HSEARCH program searches either the PIR database, SWISS-PROT, 
SEQBANK or a compatible user-defined database to find the "nearest" or most 
homologous matches to any given sequence.   Homologies are determined 
according to any one of four user-defined scoring matrices (described 
earlier) with the default being the RBO scoring matrix.  Presently, gap 
penalties are not yet incorporated into the homology search routine.  The 
homology search is a useful complement to other pattern search routines, 
especially when attempting to locate distantly related or difficult-to-identify 
sequence motifs. 


15) DOT PLOT
DOTPLOT is an extremely flexible program developed to produce character 
representations of standard dot plots (Lipman and Pearson, 1985).  The low 
resolution of most character-defined screens prevents the incorporation of a 
useful graphic representation of dot plot results and hence a character 
representation with a user defined "threshold" has been incorporated to 
overcome this problem.  DOTPLOT may be used to compare a sequence with 
itself (to identify internal repeats), with another sequence (for pair-wise 
alignments), with a SEQFILE compatible database or with the PIR or SWISS-
PROT databases (for medium speed alignments).  By using DOTPLOT in 
conjunction with a database it is possible to look for homologies between any 
shared regions in a group of sequences.  Such an option has proven to be 
quite useful in identifying previously unrecognized motifs or unexpected 
similarities in a number of proteins. 
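
A character dot plot with a threshold can be sketched as shown here.  The window length, the identity-count threshold and the plotting characters are assumptions made for illustration and may differ from those used by DOTPLOT.

```python
def dot_plot(seq_a, seq_b, window=3, threshold=3):
    """Return one text row per window of seq_a.

    A '*' is printed wherever the windows starting at (row, column)
    share at least `threshold` identical residues; '.' otherwise.
    """
    rows = []
    for i in range(len(seq_a) - window + 1):
        row = []
        for j in range(len(seq_b) - window + 1):
            identities = sum(seq_a[i + k] == seq_b[j + k] for k in range(window))
            row.append("*" if identities >= threshold else ".")
        rows.append("".join(row))
    return rows


# Comparing a sequence with itself: the main diagonal is always marked,
# and the off-diagonal marks reveal the internal repeat "ACD".
for line in dot_plot("ACDACD", "ACDACD"):
    print(line)
```

Lowering the threshold below the window length admits partial matches, at the cost of more background noise in the plot.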


16) PROTEIN DATABASE REFERENCE SEARCH
The program REFSCAN is designed to allow the user to locate and retrieve 
specific sequence references from the PIR or SWISS-PROT databases using 
either the accession number, the name (or portion thereof) or a 
bibliographic/functional reference.  This feature allows the user to quickly 
access important information about many newly sequenced proteins  
pertaining to their function, structure or relationship with other proteins in 
the database.

17) FILE VIEWER
BROWSE permits the user to edit or view a variety of database files while 
still in the SEQSEE environment.  Abbreviated versions of the SWISS-PROT 
and PIR databases (which provide sequence name, source and accession code 
only) may be viewed directly with this command.  Likewise, the complete 
SEQBANK database may also be displayed and scrolled through at leisure.  
BROWSE also permits the user to interactively edit the SEQSEE control 
file (SEQSEE.PARMS).  This allows the user to customize SEQSEE program 
parameters in almost any manner desired.  Standard UNIX commands may 
be used for scrolling through or locating particular character strings in 
any of the files.

0) EXIT SEQSEE
Closes all current files and returns the user to the general operating system.  
The program may be restarted by typing "seqsee".  If the program crashes or 
hangs up for any reason, simply type "^c" (i.e. press the "control" and "c" keys
simultaneously).  This will stop all processes and return the user to the main 
menu.


VII. GETTING STARTED

Although it is possible to run SEQSEE simply by signing on and typing

                     seqsee

(regardless of what directory you are in), we recommend that all regular 
SEQSEE users should try to do the following:

1) Create your own directory for SEQSEE and make this your current  
directory (ie. make the directory by typing "mkdir seqsee" and then type "cd 
seqsee" to get into this directory).  Having your own SEQSEE directory will 
help you better organize your input files and results.

2) Copy the control file "seqsee.parms" into your SEQSEE directory.  Your 
system administrator should be able to tell you where it can be found.  The 
command for this might typically be:
  
            cp /usr/local/seqsee/seqsee.parms .

Please note the period at the end of this command -- it stands for "current 
directory" (i.e. the directory you are already in).  Having a copy of the 
"seqsee.parms" file will permit you to change almost any of the default 
parameters.  This can allow you to "customize" SEQSEE to suit your own 
special needs.

3) If you already have sequence files somewhere in your computer, copy 
them into your SEQSEE directory.  Try to ensure that these files are in the 
proper SEQFILE format.  Of course you can always use SEQSEE to create or 
retrieve your own sequence files which will automatically conform to the 
SEQFILE format.

4) Although there are no absolute windowing requirements for SEQSEE, we 
do recommend that your screen or window be at least 80 characters wide 
and at least 25 lines in length.  (The 25 line window length is 
actually the upper limit for VT100 or Mac/PC terminal emulators).  Choosing 
a window this size (or larger) will permit easy viewing of your output, help 
files and menus.  If your terminal or terminal emulator permits, we also 
recommend having more than one window on the screen since this can make 
the file manipulations and the viewing of intermediate results much more 
convenient.

If you have done all of the above operations and are satisfied with 
your current status, simply type:

                    seqsee  

The following menu should appear (see below):


**********************************************************************
* Package...:                SEQSEE  Version 1.2 (c)                 *
* Authors...:       Robert Boyko / Leigh Willard / David Wishart     *
*                        Fred Richards / Brian Sykes                 *
* Location..:                 University of Alberta                  *
*               Protein Engineering Network of Centres of Excellence * 
**********************************************************************

      *** Preliminaries ***                *** Alignments ***
   1) Help                              10) Fast Alignment Search
   2) Enter/Edit a Sequence             11) Exhaustive Alignment Search
   3) Retrieve Sequence from Database   12) Align 2 or more sequences

      *** Structural Analysis ***            *** Scanning ***
   4) Sequence Statistics               13) Pattern Search
   5) Structure Prediction              14) Homology Search
   6) SEQSITE Pattern Search            15) Dot Plot
   7) Flexibility                       16) Database Reference Search
   8) Hydrophobic Moment                17) File Viewer
   9) Hydrophobicity                     0) EXIT SEQSEE

   Enter the number of the desired function

>>

To continue with the program, simply type in any one of the above menu 
numbers and press "<return>".  SEQSEE will automatically prompt the user for 
a variety of input or output names (such as input filenames, output 
filenames, accession numbers etc.).  It is important to note that SEQSEE 
requires protein sequences for input in 10 of its 18 functions, hence it is 
important to have at least one sequence file (a SEQFILE prepared using 
either menu item #2 or #3) stored in a known location.  This way you only 
have to type in the sequence filename -- and not the whole sequence -- each 
time you make a function call.
    SEQSEE has been specifically designed to be a self-guiding interactive 
tool so it is hoped that all computer queries will easily lead the uninitiated 
user through the program without much difficulty or confusion.  For those 
wishing a more complete introduction to the SEQSEE suite, we recommend 
that they carefully study the tutorial presented in the next section.
            

VIII. TUTORIAL   "SO YOU THINK YOU'VE FOUND SOMETHING"

Let us suppose that you and a collaborator have succeeded in isolating a 
small protein from Bacillus subtilis which appears to act as an oxidizing 
cofactor for certain cellular processes.  After many weeks of amino acid 
analysis and peptide sequencing, your collaborator provides you with the N-
terminal sequence of the first 60 amino acids of this new protein.  You are 
requested to find out anything you can about this partial sequence, and to 
report to your colleague as soon as possible.  Sounds like a job for SEQSEE!
    Let's demonstrate how you might go about analyzing this sequence 
using just a few of the options available in SEQSEE.  Note that in this example 
we will first show how a new sequence is entered.  Then we will 
demonstrate how the sequence can be analyzed statistically.  We will also 
show how to check this sequence for sequence motifs and how to compare 
(and align) the query sequence against the PIR database.  Finally we will 
demonstrate how to search SEQBANK to locate those proteins which might be 
evolutionarily related to the query sequence.  
So here goes...


1) Sign on to a computer
2) Type "seqsee" (the following menu should appear)


**********************************************************************
* Package...:                SEQSEE  Version 1.2 (c)                 *
* Authors...:       Robert Boyko / Leigh Willard / David Wishart     *
*                        Fred Richards / Brian Sykes                 *
* Location..:                 University of Alberta                  *
*               Protein Engineering Network of Centres of Excellence * 
**********************************************************************

      *** Preliminaries ***                *** Alignments ***
   1) Help                              10) Fast Alignment Search
   2) Enter/Edit a Sequence             11) Exhaustive Alignment Search
   3) Retrieve Sequence from Database   12) Align 2 or more sequences

      *** Structural Analysis ***            *** Scanning ***
   4) Sequence Statistics               13) Pattern Search
   5) Structure Prediction              14) Homology Search
   6) SEQSITE Pattern Search            15) Dot Plot
   7) Flexibility                       16) Database Reference Search
   8) Hydrophobic Moment                17) File Viewer
   9) Hydrophobicity                     0) EXIT SEQSEE

   Enter the number of the desired function

>>


3) Type "2" (and press <return>) so that you can input your new sequence.  
This puts you into the program SEQED.  When in this program you are, 
first, required to choose an option for entering or editing your sequence.  
Since we wish to enter a new sequence we will select option "1". Second, you 
are asked to provide a name for your sequence.  In this case we'll call it 
"bacillus_redoxase".  The sequence name is required for record keeping 
purposes only.


>> 2 


What would you like to do?

  1) Enter new sequence.
  2) Edit old sequence
  0) Exit

Enter a number (then press return).

>> 1


Seqed (Version 1.2)

Enter name for sequence. Use underscores instead
of blanks to separate words. (eg. thioredoxin_human)

>> bacillus_redoxase


Enter each amino acid (one letter code).
You may enter up to 50 amino acids on one line.
Press <return> to get a new prompt line.
When you are done enter $ and press <return>.


         1         2         3         4         5 
12345678901234567890123456789012345678901234567890
         |         |         |         |         |
msdklihitddsfdtdvikadgailvdfwaewcgpckmiapildeladey


         1         2         3         4         5 
12345678901234567890123456789012345678901234567890
         |         |         |         |         |
qgkltvakln$


4) After typing in the above 60 residues, press "$" and then <return>.  The 
newly prepared sequence will then be "echoed" to the screen in precisely 
the same format in which it will be stored.  This is done so that you may 
inspect your sequence for errors and make any required corrections.  Note 
that you have been placed in the "vi" editor and so it is essential to have 
some rudimentary knowledge of how this editor actually works.  Changes to 
the sequence can be either upper or lower case -- it is not necessary to keep 
all entries or corrections in upper case.


>Title: bacillus_redoxase
MSDKLIHITDDSFDTDVIKADGAILVDFWAEWCGPCKMIAPILDELADEY
QGKLTVAKLN
~
~
~
~
~
~


5) If no corrections are necessary, type ":q" to exit the editor.  If you 
do make corrections, type ":wq" and this will save the corrections you have 
made.  After exiting the "vi" editor, you are asked whether to save the file.  
Upon replying (usually with a 'y') you are then asked to provide a name for 
the sequence file (we'll call it redoxase.seq).  After responding you are 
immediately returned to the main SEQSEE menu.  Note that you have now 
created a SEQFILE called "redoxase.seq" containing the amino acid sequence 
of "bacillus redoxase".


:q


Save this file? (Y/N)

>> y


Enter sequence filename.

>> redoxase.seq


**********************************************************************
* Package...:                SEQSEE  Version 1.2 (c)                 *
* Authors...:       Robert Boyko / Leigh Willard / David Wishart     *
*                        Fred Richards / Brian Sykes                 *
* Location..:                 University of Alberta                  *
*               Protein Engineering Network of Centres of Excellence * 
**********************************************************************

      *** Preliminaries ***                *** Alignments ***
   1) Help                              10) Fast Alignment Search
   2) Enter/Edit a Sequence             11) Exhaustive Alignment Search
   3) Retrieve Sequence from Database   12) Align 2 or more sequences

      *** Structural Analysis ***            *** Scanning ***
   4) Sequence Statistics               13) Pattern Search
   5) Structure Prediction              14) Homology Search
   6) SEQSITE Pattern Search            15) Dot Plot
   7) Flexibility                       16) Database Reference Search
   8) Hydrophobic Moment                17) File Viewer
   9) Hydrophobicity                     0) EXIT SEQSEE

   Enter the number of the desired function

>>


6) Type in "4" to choose the Sequence Statistics option.  This function 
performs a quick and useful statistical analysis of the sequence to help you 
identify any peculiar trends in the sequence that might not be obvious on 
first inspection.
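
To give a feel for the simpler figures in the statistics report, here is a minimal sketch of the composition part of the analysis.  It is not the STATS program itself: the hydrophobic grouping (ACFGHILMVWY) is the one listed in the STATS output, while counting D and E as acidic and K and R as basic is our assumption, inferred from the totals reported for the tutorial sequence.

```python
from collections import Counter

HYDROPHOBIC = set("ACFGHILMVWY")   # grouping listed in the STATS output
ACIDIC = set("DE")                 # assumed grouping
BASIC = set("KR")                  # assumed grouping (histidine not counted)


def composition_summary(sequence):
    """Return residue counts and a few simple composition statistics."""
    seq = sequence.upper()
    counts = Counter(seq)
    n = len(seq)
    pct_hydrophobic = 100.0 * sum(counts[aa] for aa in HYDROPHOBIC) / n
    return {
        "length": n,
        "percent": {aa: round(100.0 * c / n, 2) for aa, c in sorted(counts.items())},
        "pct_hydrophobic": round(pct_hydrophobic, 2),
        "pct_hydrophilic": round(100.0 - pct_hydrophobic, 2),
        "acidic": sum(counts[aa] for aa in ACIDIC),
        "basic": sum(counts[aa] for aa in BASIC),
    }


summary = composition_summary(
    "MSDKLIHITDDSFDTDVIKADGAILVDFWAEWCGPCKMIAPILDELADEYQGKLTVAKLN")
print(summary["length"], summary["pct_hydrophobic"], summary["acidic"], summary["basic"])
```

For the tutorial sequence these figures can be compared directly with the corresponding lines of the report shown under step 8.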


>> 4


Sequence Statistics (Version 1.2)


Your amino acid sequence is now required:

  1) Read sequence from an input file
  2) Sequence to be entered via keyboard
  3) I do not have my sequence ready

Enter a number (then press return).

>>


7) Type in "1" and <return> since you already have a UNIX file (a SEQFILE 
called "redoxase.seq") containing your sequence.  You will then be queried 
for the name of this sequence file.  Enter it and press <return> as usual.


>> 1


Enter input sequence filename.

>> redoxase.seq


8) The statistical analysis will automatically be written to the screen in the 
format shown below.  You may scroll through the file and make any changes 
you wish.


**************************************************************

    Program......: stats (version 1.2)
    Description..: Statistical Analysis of a Sequence
    Date.........: Tue Feb 2 09:57:14 1993

    Sequence Name: bacillus_redoxase
    1 MSDKLIHITD DSFDTDVIKA DGAILVDFWA EWCGPCKMIA PILDELADEY
   51 QGKLTVAKLN

**************************************************************

Molecular Weight......:    6684.86
Amino acids...........:    60
Mean residue weight...:    111.41


                *** Amino Acid Composition ***

  Amino   Freq       Freq    E(Freq)     Weight  E(weight)
  Acid (total)  (percent)  (percent)  (percent)  (percent)
    A        6      10.00     <8.84>       6.40     <5.73>
    C        2       3.33     <2.09>       3.09     <1.97>
    D        9      15.00     <5.89>      15.54     <6.17>
    E        3       5.00     <5.90>       5.81     <6.94>
    F        2       3.33     <3.70>       4.42     <4.96>
    G        3       5.00     <8.29>       2.57     <4.31>
    H        1       1.67     <2.12>       2.06     <2.64>
    I        6      10.00     <5.40>      10.18     <5.57>
    K        5       8.33     <6.22>       9.01     <7.27>
    L        6      10.00     <7.93>      10.18     <8.18>
    M        2       3.33     <1.97>       3.94     <2.35>
    N        1       1.67     <4.59>       1.71     <4.78>
    P        2       3.33     <4.51>       2.91     <3.99>
    Q        1       1.67     <3.75>       1.92     <4.37>
    R        0       0.00     <4.21>       0.00     <6.00>
    S        2       3.33     <6.59>       2.61     <5.23>
    T        3       5.00     <5.96>       4.55     <5.49>
    V        3       5.00     <7.12>       4.46     <6.43>
    W        2       3.33     <1.37>       5.59     <2.32>
    Y        1       1.67     <3.56>       2.45     <5.30>


Note: E(x) are expected values based on average
    amino acid content of soluble proteins.

**************************************************************

Hydrophobicity Parameters: /canopus/rbo/seqsee/lib/kyte.parms

Average Hydrophobicity (ah)...................:    0.78
Notes: ah = -2.67 --> Average Protein
     ah >  0.10 --> Hydrophobic Protein
     ah < -6.00 --> Hydrophilic Protein

Ratio of Hydrophilicity to Hydrophobicity (rh):    0.95
Notes: rh =  1.22 --> Average Protein
     rh >  1.90 --> Non-folding Protein
     rh <  0.85 --> Insoluble Protein

Percentage of Hydrophobic residues............:   56.67
Notes: Average percentage is 52.44
     Hydrophobic Amino Acids are ACFGHILMVWY

Percentage of Hydrophilic residues............:   43.33
Notes: Average percentage is 47.56
     Hydrophilic Amino Acids are DEKNPQRST

Ratio of %Hydrophilic to %Hydrophobic.........:    0.76
Notes: rhp = 0.91  --> Average Protein
     rhp > 1.43  --> Non-folding Protein
     rhp < 0.77  --> Insoluble Protein

**************************************************************

Number of  Basic amino acids:         5
Number of Acidic amino acids:        12
Estimated pI for protein....:        4.60

pH:          3      4      5      6      7      8      9     10     11
Charge:    7.1    3.7   -2.6   -5.0   -5.9   -7.0   -9.0  -11.9  -14.4

Total linear charge density.:        0.32

**************************************************************

Polar Area of Extended Chain...............:    3666.20 Angs**2
Non-Polar Area of Extended Chain...........:    6923.10 Angs**2
Total Area of Extended Chain ..............:   10359.60 Angs**2


Polar ASA of Folded Protein................:    1117.84 Angs**2
Non-Polar ASA of Folded Protein............:    2839.88 Angs**2
ASA of folded protein .....................:    3957.72 Angs**2

Ratio of Folded to Extended Area...........:            0.40

*************************************************************

Buried Polar Area of Folded Protein........:    2096.61 Angs**2
Buried Non-polar Area of Folded Protein....:    3654.08 Angs**2
Buried Charge Area of Folded Protein.......:     239.61 Angs**2
Total Buried Surface.......................:    5990.30 Angs**2

Expected Number and Fraction of Residues 95% Buried

A:    1 (0.166)    C:    1 (0.284)    D:    0 (0.038)    E:    0 (0.022)
F:    1 (0.291)    G:    0 (0.127)    H:    0 (0.127)    I:    2 (0.317)
K:    0 (0.004)    L:    2 (0.284)    M:    1 (0.304)    N:    0 (0.041)
P:    0 (0.056)    Q:    0 (0.038)    R:    0 (0.013)    S:    0 (0.069)
T:    0 (0.079)    V:    1 (0.271)    W:    0 (0.218)    Y:    0 (0.085)

Number of buried Amino Acids...............:        7

*************************************************************

Packing Volume (estimate)..................:    8300.24 Angs**3
Packing Volume (actual)....................:    8149.90 Angs**3
Interior Volume of Protein.................:    4056.60 Angs**3
Exterior Volume of Protein.................:    4093.40 Angs**3

Partial Specific Volume....................:       0.73 ml/g

Fisher Volume Ratio (actual)...............:       1.01
Fisher Volume Ratio (idealized)............:       1.50

 >>> Molecule likely forms dimer or multimer (aggregates). <<<

Protein Solubility.........................:       1.47
Notes: solubility = 1.6 --> Average Protein
     solubility < 1.1 --> Insoluble Protein

 >>> Protein is likely water soluble. <<<

*************************************************************

Radius of Protein..........................:    15.17 Angs
RMS end to end distance of Ext. chain......:    81.24 Angs
Radius of Gyration of Extended chain.......:    33.17 Angs

*************************************************************

Solvation Free Energy of Folding...........:    -43.38 kcal/mol
~
~
~
~
~


9) After checking through the stats file, you may exit it simply by typing ":q" 
as before.  The computer then asks you whether it should save the file or 
not.  As usual we will respond with "y" and give our results file the name 
"redoxase.stat".  After entering the name, the main SEQSEE menu appears on 
the screen once again.


:q


Save this file? (Y/N)

>> y


Enter output filename.

>> redoxase.stat


**********************************************************************
* Package...:                SEQSEE  Version 1.2 (c)                 *
* Authors...:       Robert Boyko / Leigh Willard / David Wishart     *
*                        Fred Richards / Brian Sykes                 *
* Location..:                 University of Alberta                  *
*               Protein Engineering Network of Centres of Excellence * 
**********************************************************************

      *** Preliminaries ***                *** Alignments ***
   1) Help                              10) Fast Alignment Search
   2) Enter/Edit a Sequence             11) Exhaustive Alignment Search
   3) Retrieve Sequence from Database   12) Align 2 or more sequences

      *** Structural Analysis ***            *** Scanning ***
   4) Sequence Statistics               13) Pattern Search
   5) Structure Prediction              14) Homology Search
   6) SEQSITE Pattern Search            15) Dot Plot
   7) Flexibility                       16) Database Reference Search
   8) Hydrophobic Moment                17) File Viewer
   9) Hydrophobicity                     0) EXIT SEQSEE

   Enter the number of the desired function

>>


10) Let's see if we can uncover more information about the sequence 
by checking if it contains any unusual sequence motifs or sequence patterns.  
This might give us an idea of what it does or what it looks like.  To do so, 
let's choose option "6" and initiate SEQSITE. 


>> 6


Seqsite Pattern Search (Version 1.2)

Please select a sequence motif database

  1) SEQSITE.db    (general sequence motifs)
  2) PHOSITE.db    (general phosphorylation sites)
  3) EPISITE.db    (antigenic sites)

Enter a number (then press return).

>> 1


11) We have three database options to choose from in the SEQSITE 
program.  Since we are primarily interested in determining whether this 
protein contains any known sequence motifs we will choose "SEQSITE.db"  
(option 1) which contains information on more than 1000 general sequence 
motifs.  After typing "1" and pressing <return> the following output appears.


Your amino acid sequence is now required:

  1) Read sequence from an input file
  2) Sequence to be entered via keyboard
  3) I do not have my sequence ready

Enter a number (then press return).

>>


12) Type in "1" and press <return>, as before, since you already have 
a UNIX file containing your sequence.  As usual you are required to give the 
name of your sequence file (or SEQFILE) containing the sequence so we will 
type in "redoxase.seq".


>> 1

Enter input sequence filename.

>> redoxase.seq


13) After pressing <return> the SEQSITE analysis will automatically be 
written to the screen in the format shown below.  You may scroll through the 
file and make any changes you wish.


****************************************************************

    Program......: seqsite (version 1.2)
    Description..: Search for Interesting Motifs
    Date.........: Thu Feb 16 13:02:21 1993

    Sequence Name: bacillus_redoxase
    1 MSDKLIHITD DSFDTDVIKA DGAILVDFWA EWCGPCKMIA PILDELADEY
   51 QGKLTVAKLN

    Database.....: /sirius/local/seqsee/databases/seqsite.db

****************************************************************

**********(1)*********

Motif Matched...: *[STA]*[WG]C[AVG][PH]C*
Sequence Matched: WAEWCGPCK
Amino Acids.....: 29-37

 GLEASON, F.R. ET AL., FEMS MICRO REV. 54:271-297(1988)
 ACTIVE SITE FOR PROKARYOTIC/EUKARYOTIC THIOREDOXIN-LIKE MOLECULES


 Number of motifs found..:    1
 Number of motifs scanned: 1110
~
~
~
~
~
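
The match reported above can be reproduced outside SEQSEE with an ordinary 
regular expression.  The sketch below is only an illustration and is not the 
SEQSITE implementation.  It assumes that "*" in a SEQSITE motif stands for any 
single residue (an inference from the nine-residue match shown above) and that 
a bracketed group lists the residues allowed at one position.  The function 
name find_motif is hypothetical.

```python
import re

# Sequence from redoxase.seq (60 residues).
SEQ = ("MSDKLIHITDDSFDTDVIKADGAILVDFWAEWCGPCKMIAPILDELADEY"
       "QGKLTVAKLN")


def find_motif(motif, sequence):
    """Return (matched_text, first_residue, last_residue) tuples, 1-based.

    Assumption: '*' means "any one residue".  Bracketed groups are passed
    through unchanged because regular expressions use the same notation.
    """
    pattern = re.compile(motif.replace("*", "."))
    found = []
    for m in pattern.finditer(sequence):
        # m.start() is 0-based; m.end() is exclusive, so it already equals
        # the 1-based position of the last matched residue.
        found.append((m.group(0), m.start() + 1, m.end()))
    return found


hits = find_motif("*[STA]*[WG]C[AVG][PH]C*", SEQ)
for text, first, last in hits:
    print(f"Sequence Matched: {text}   Amino Acids: {first}-{last}")
```

Under these assumptions the script reports WAEWCGPCK at residues 29-37, the 
same span SEQSITE lists.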


14) Well, well... It looks like we've found something.  It appears that bacillus 
redoxase contains the active site for a certain class of molecules called 
thioredoxins.  Before leaping to any conclusions, though, we should check to 
see whether bacillus redoxase shares other similarities to thioredoxins or 
whether this shared sequence pattern is simply an accident of evolution.  To 
answer this question we need to get back to SEQSEE's main menu.  To do this 
we type ":q" as before and save the file by replying with a "y" and giving this 
file a name like "redoxase.site".


:q


Save this file? (Y/N)

>> y


Enter output filename.

>> redoxase.site


**********************************************************************
* Package...:                SEQSEE  Version 1.2 (c)                 *
* Authors...:       Robert Boyko / Leigh Willard / David Wishart     *
*                        Fred Richards / Brian Sykes                 *
* Location..:                 University of Alberta                  *
*               Protein Engineering Network of Centres of Excellence * 
**********************************************************************

      *** Preliminaries ***                *** Alignments ***
   1) Help                              10) Fast Alignment Search
   2) Enter/Edit a Sequence             11) Exhaustive Alignment Search
   3) Retrieve Sequence from Database   12) Align 2 or more sequences

      *** Structural Analysis ***            *** Scanning ***
   4) Sequence Statistics               13) Pattern Search
   5) Structure Prediction              14) Homology Search
   6) SEQSITE Pattern Search            15) Dot Plot
   7) Flexibility                       16) Database Reference Search
   8) Hydrophobic Moment                17) File Viewer
   9) Hydrophobicity                     0) EXIT SEQSEE

   Enter the number of the desired function

>>


15) The best method to determine the evolutionary relatedness of any one 
protein with another is to perform a database alignment. Such a database 
alignment is offered by both the Fast Alignment Search and the Exhaustive 
Alignment Search options in SEQSEE (numbers 10 and 11 on the menu).  
Since we want a really quick (but not absolutely accurate) answer to our 
question, let's choose the Fast Alignment Search by typing "10".


>> 10

Fast Alignment (Version 1.2)

Your amino acid sequence is now required:

  1) Read sequence from an input file
  2) Sequence to be entered via keyboard
  3) I do not have my sequence ready

Enter a number (then press return).

>>


16) Type in "1" and press <return>, as before, since you already have 
a UNIX file containing your sequence.  As usual you are required to give the 
name of your sequence file (or SEQFILE) containing the sequence 
(redoxase.seq).  For this type of program you are also required to indicate 
how many of the best alignments you want to keep -- we'll choose the top 
200.


>> 1

Enter sequence input filename:

>> redoxase.seq

This program keeps track of the top 'x' alignments.
Enter a value for 'x' where 0 < x < 500.

>> 200
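
Keeping only the best 'x' alignments while scanning tens of thousands of 
database entries does not require storing every score.  One common way to do 
this is a bounded min-heap, sketched below.  This is an assumption about a 
suitable technique, not a description of how the Fast Alignment program is 
written.  The function name keep_top and the sample entries are hypothetical.

```python
import heapq


def keep_top(scored_items, x):
    """Return the x highest-scoring (score, name) pairs, best first.

    A min-heap of at most x entries holds the current best candidates;
    its smallest entry is replaced whenever a better score arrives.
    """
    if not 0 < x < 500:  # same limits as the prompt shown above
        raise ValueError("x must satisfy 0 < x < 500")
    heap = []
    for score, name in scored_items:
        if len(heap) < x:
            heapq.heappush(heap, (score, name))
        elif score > heap[0][0]:
            heapq.heapreplace(heap, (score, name))
    return sorted(heap, reverse=True)


# Hypothetical entries, for illustration only.
sample = [(495, "P1"), (6624, "P2"), (1147, "P3"), (210, "P4")]
best = keep_top(sample, 2)
print(best)
```

Memory use stays proportional to x rather than to the size of the database, 
which is why this pattern is popular for "top x" searches.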


17) For a search of this magnitude, the computer will take about 60 
to 90 seconds.  While doing the search, the program will indicate what it's 
doing and how many sequences it has scanned.  At the end of the search the 
top 200 alignments are printed out in descending order, with the best 
alignment at the top of the file.  Following is a sample of the output you 
should expect to see.


Initializing lookup table...

Reading database file: /sirius/seqsee/databases/pir.IG/*

Proteins: 1000    BestScore: 6624    GroupScore: 6624
Proteins: 2000    BestScore: 6624    GroupScore:  495
Proteins: 3000    BestScore: 6624    GroupScore: 1147
~
~
~
~
~
***************************************************************

    Program......: fast_align (version 1.2)
    Description..: Fast Alignment on database
    Date.........: Thu Feb 16 13:16:00 1993

    Sequence Name: bacillus_redoxase
    Amino Acids..: 60

    Database.....: PIR (Intelligenetics Version)

    Scoring Mat..: /sirius/local/seqsee/lib/wt.align
    Gap Penalty..: 20
    Gap Size Pen.: 5
    Tuple Cut-off: 48

***************************************************************


Number of proteins tested.: 44890
Number of alignments found:   200


***********(1)**********
Title....: Thioredoxin precursor -- Escherichia coli
Id.......: TXEC
FastScore: 6624
NW Score.: 1224
Matches..: 56

Query Seq..:                  MSDKLIHITDDSFDTDVIKADGAILVDFWAEW
Matching...:                  ||||*||*|||||||||*||||||||||||||
Database...:MLHQQRNQHARLIPVELYMSDKIIHLTDDSFDTDVLKADGAILVDFWAEW

Query Seq..:CGPCKMIAPILDELADEYQGKLTVAKLN
Matching...:|||||||||||||*||||||||||||||
Database...:CGPCKMIAPILDEIADEYQGKLTVAKLNIDQNPGTAPKYGIRGIPTLLLF

Query Seq..:
Matching...:
Database...:KNGEVAATKVGALSKGQLKEFLDANLA
~
~
~
~
~
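
The "Matches" count in this listing can be checked by hand or with a few lines 
of code.  The sketch below slides the query along the database sequence 
without gaps and counts identical residues.  It is a deliberately simple 
stand-in, not the scoring scheme used by the Fast Alignment program, and it 
assumes that "|" marks identical residues and "*" marks mismatches, as the 
listing suggests.  The function names are hypothetical.

```python
QUERY = ("MSDKLIHITDDSFDTDVIKADGAILVDFWAEWCGPCKMIAPILDELADEY"
         "QGKLTVAKLN")

# Database sequence for TXEC as printed in the alignment above.
TXEC = ("MLHQQRNQHARLIPVELYMSDKIIHLTDDSFDTDVLKADGAILVDFWAEW"
        "CGPCKMIAPILDEIADEYQGKLTVAKLNIDQNPGTAPKYGIRGIPTLLLF"
        "KNGEVAATKVGALSKGQLKEFLDANLA")


def best_ungapped_offset(query, target):
    """Slide query along target; return (offset, identities) of the best fit."""
    best = (0, -1)
    for offset in range(len(target) - len(query) + 1):
        window = target[offset:offset + len(query)]
        identities = sum(q == t for q, t in zip(query, window))
        if identities > best[1]:
            best = (offset, identities)
    return best


def match_line(query, window):
    """Build the marker row: '|' for identity, '*' for a mismatch."""
    return "".join("|" if q == t else "*" for q, t in zip(query, window))


offset, identities = best_ungapped_offset(QUERY, TXEC)
marks = match_line(QUERY, TXEC[offset:offset + len(QUERY)])
print("Offset:", offset, " Matches:", identities)
print(marks)
```

With these two sequences the best placement starts after the 18-residue leader 
of the database entry and gives 56 identities out of 60, in agreement with the 
listing.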


18) It looks like we've got a hit!  Clearly bacillus redoxase is very closely 
related to E. coli thioredoxin.  Indeed, given the level of similarity between 
the two, we can be quite certain that bacillus redoxase is actually Bacillus 
subtilis thioredoxin.  A quick check through the full alignment file will 
reveal that thioredoxins are actually very common proteins that seem to be 
ubiquitous in just about every creature presently known.  Obviously we 
would like to know more about thioredoxins so that we may find out what 
they do and how they function.  Of course we could run to the library 
and look up a few references, but we could also save ourselves some time by 
finding out immediately if some thioredoxins have already had their 
structures investigated by crystallography or NMR spectroscopy.  To do so 
we need to get back to SEQSEE's main menu.  As usual we type ":q" and save 
the file using the name "redoxase.align".


:q


Save this file? (Y/N): 

>> y


Enter output filename: 

>> redoxase.align


**********************************************************************
* Package...:                SEQSEE  Version 1.2 (c)                 *
* Authors...:       Robert Boyko / Leigh Willard / David Wishart     *
*                        Fred Richards / Brian Sykes                 *
* Location..:                 University of Alberta                  *
*               Protein Engineering Network of Centres of Excellence * 
**********************************************************************

      *** Preliminaries ***                *** Alignments ***
   1) Help                              10) Fast Alignment Search
   2) Enter/Edit a Sequence             11) Exhaustive Alignment Search
   3) Retrieve Sequence from Database   12) Align 2 or more sequences

      *** Structural Analysis ***            *** Scanning ***
   4) Sequence Statistics               13) Pattern Search
   5) Structure Prediction              14) Homology Search
   6) SEQSITE Pattern Search            15) Dot Plot
   7) Flexibility                       16) Database Reference Search
   8) Hydrophobic Moment                17) File Viewer
   9) Hydrophobicity                     0) EXIT SEQSEE

   Enter the number of the desired function

>>


19) SEQSEE contains a file called SEQBANK which is a compilation of the 
sequences and secondary structures of all proteins which have had their 
structures reported in the literature.  We can access this databank through 
the File Viewer option (number 17 on the menu) and search for any 
occurrences of thioredoxin in this databank.  So let's type "17" and see what 
happens.


>> 17

File Viewer (Version 1.2)

What would you like to browse?

  1) User specified file
  2) PIRSEE database
  3) SWISSEE database
  4) SEQBANK database
  5) SEQSEE Control File
  0) Exit

Enter a number (then press return).

>>


20) Enter "4" and <return> since we want to inspect the SEQBANK database.  
Once this is done we should see the following file:


>> 4


#                    SEQBANK
#                 (REVISED DEC. 1992)
#
#                COPYRIGHT APRIL, 1992
#                  DAVID S. WISHART
#
#               DEPARTMENT OF BIOCHEMISTRY
#                UNIVERSITY OF ALBERTA
#                  EDMONTON, ALBERTA
#                     CANADA
#                     T6G 2H7
#
#    SEQBANK is a compilation of sequences and "consensus" secondary
#structure assignments of soluble proteins and peptides which have had
#their.....
~
~
~

>ACTIN (RABBIT SKELETAL)
#REFERENCE : KABSCH, W. ET AL., NATURE 347:37-44 (1990)
#REFERENCE : FLAHERTY, K.M. ET AL., PNAS 88:5041-5045 (1991)
#SEQBANK ID: 1
#BRKHAVN ID:
#PIR-NBR ID: ATRB
#SWISPRO ID: ACTS$RABIT
#RESOLUTION: 2.8
#R FACTOR  : 23.8
#FOLD CLASS: M
#NUM RESIDU: 375

DEDETTALVC DNGSGLVKAG FAGDDAPRAV FPSIVGRPRH QGVMVGMGQK
CCCCCCBBBB BBBCCBBBBB BBCCCCCCBB BBCCBBBBCC CCCCCCCCCC

DSYVGDEAQS KRGILTLKYP IEHGIITNWD DMEKIWHHTF YNELRVAPEE
CBBBCHHHHH HCCBBBBBCC BBBCBBBCCH HHHHHHHHHH HCCCCCCCCC

HPTLLTEAPL NPKANREKTM QIMFETFNVP AMYVAIQAVL SLYASGRTTG
CCBBBBBCHH HHHHHHHHHH HHHHHCCCCC BBBBBBCHHH HHHHCCCCBB

IVLDSGDGVT HNVPIYEGYA LPHAIMRLDL AGRDLTDYLM KILTERGYSF
BBBBCCCCBB BBBBBBCCBB BCCBBBBBCC CHHHHHHHHH HHHHHHCCCC

VTTAEREIVR DIKEKLCYVA LDFENAMATA ASSSSLEKSY ELPDGQVITI
CCHHHHHHHH HHHHHHCCCC CHHHHHHHHH HCCCCCCBBB BBCCCCBBBB

GNERFRCPET LFQPSFIGME SAGIHETTYN SIMKCDIDIR KDLYANNVMS
CCHHHHHHHH HHHCCCCCCC CCHHHHHHHH HHHHCCCHHH HHHHCCBBBB

GGTTMYPGIA DRMQKEITAL APSTMKIKII APPERKYSVW IGGSILASLS
CCCCCCCCHH HHHHHHHHHH HCCCCCBBBB CCHHHHHHHH HHHHHHHHCC

TFQQMWITKQ EYDEAGPSIV HRKCF
HHHHHCCCCH HHHHHCCHHH HHHCC
~
~


21) To check if thioredoxin is in this databank we use one of the "vi" editor 
commands for character string searches.  This is done by typing  
/THIOREDOXIN/ followed by <return> (note that this search is NOT 
case-sensitive).  Once the command is entered, the file is scrolled through to 
locate the word "THIOREDOXIN".  Alternatively, we could just scroll through 
the database and look for the word "THIOREDOXIN" in the sequence file header.  
This is easily done since SEQBANK is arranged alphabetically.  Regardless of 
how we choose to do it, we find that we are indeed fortunate, for it appears 
that E. coli thioredoxin has already had its crystal structure solved.  This will 
eventually permit us to accurately model the Bacillus subtilis molecule and 
might also help us explain its apparently unusual redox activities.  To leave 
the File Viewer option, simply type ":q".


>THIOREDOXIN (E. COLI)
#REFERENCE : HOLMGREN, A. ET AL., PNAS (USA) 72:2305-2309 (1975)
#REFERENCE : DYSON, H.J. et al., BIOCHEMISTRY 28:7074-7087 (1989)
#REFERENCE : KATTI, S.K. et al., J. MOL. BIOL. 212:167-184 (1990)
#SEQBANK ID: 242
#BRKHAVN ID: 2TRX
#PIR-NBR ID: TXEC
#SWISPRO ID: THIO$ECOLI
#RESOLUTION: 1.7
#R FACTOR  : 16.5
#FOLD CLASS: M
#NUM RESIDU: 108

SDKIIHLTDD DFDTDLVKAD GAILVDFWAE WCGPCKMIAP ILDEIADEYQ
CCBBBBBBCC HHHHHHHHCC CBBBBBBBBC CCCCHHHHHH HHHHHHHHHC

GKLTVAKLNI DQNPGTAPKY IGRGIPTLLL FKNGEVAATK VGALSKGQLK
CCBBBBBBBC CCCHHHHHHH HHHCCCBBBB BBCCCBBBBB BCCCHHHHHH

EFLDANLA
HHHHHHHH
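
A SEQBANK entry such as the one above is easy to read from a script.  The 
sketch below is an illustration only: it assumes that every entry starts with 
a ">" line, that "#" lines hold "KEY : value" pairs, and that the remaining 
rows alternate between sequence and secondary structure, which is how the two 
entries shown in this tutorial are laid out.  The header is abridged here and 
the function name parse_entry is hypothetical.

```python
ENTRY = """\
>THIOREDOXIN (E. COLI)
#SEQBANK ID: 242
#BRKHAVN ID: 2TRX
#NUM RESIDU: 108

SDKIIHLTDD DFDTDLVKAD GAILVDFWAE WCGPCKMIAP ILDEIADEYQ
CCBBBBBBCC HHHHHHHHCC CBBBBBBBBC CCCCHHHHHH HHHHHHHHHC

GKLTVAKLNI DQNPGTAPKY IGRGIPTLLL FKNGEVAATK VGALSKGQLK
CCBBBBBBBC CCCHHHHHHH HHHCCCBBBB BBCCCBBBBB BCCCHHHHHH

EFLDANLA
HHHHHHHH
"""


def parse_entry(text):
    """Split one entry into name, header fields, sequence and structure."""
    name, fields, rows = None, {}, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]
        elif line.startswith("#"):
            # Split on the first ':' only; reference values contain colons.
            key, _, value = line[1:].partition(":")
            fields.setdefault(key.strip(), []).append(value.strip())
        else:
            rows.append(line.replace(" ", ""))
    # Assumption: rows alternate sequence, structure, sequence, structure...
    sequence = "".join(rows[0::2])
    structure = "".join(rows[1::2])
    return name, fields, sequence, structure


name, fields, sequence, structure = parse_entry(ENTRY)
print(name, len(sequence), len(structure), fields["NUM RESIDU"][0])
```

Comparing the two lengths with the NUM RESIDU field is a quick sanity check 
that no row was lost when an entry was copied or edited.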


In sum, this exercise demonstrates how it is possible to begin with a 
fragmentary peptide sequence of unknown structure and function and end 
up with a great deal of knowledge about that peptide's putative structure, 
probable function and potential origin -- all in a matter of a few minutes.  
While this example clearly demonstrates the potential utility of SEQSEE it is 
important to understand that the results were obtained by adopting an 
efficient analytic strategy summarized below:

    1) Enter the new sequence into a SEQFILE
    2) Conduct a statistical analysis of the sequence using STATS
    3) Scan for sequence motifs using SEQSITE
    4) Carry out a fast database alignment with FAST_ALIGN
    5) Browse through the SEQBANK file to identify potentially related 
       structures and preliminary references


IX. SUMMARY OF SEQSEE MENU OPTIONS
SAMPLE INPUT AND OUTPUT WITH EXPLANATIONS

2. Enter/Edit a Sequence


            *******************************************
            *                                         *
            *    1. Choose option 1) to enter a       *
            *       new sequence                      *
            *    2. Enter the sequence name           *
            *    3. Enter the sequence (remember      *
            *       to type $ when finished)          *
            *    4. Check the sequence for errors     *
            *    5. Exit editor with ":q" or ":wq"    *
            *    6. Save the file                     *
            *                                         *
            *******************************************

The program known as SEQED is used for the entry and editing of new (or 
old) sequence files.  The program first queries the user as to whether he or 
she wishes to:

     1) Enter a new sequence
     2) Edit an old sequence 

If one chooses to enter a new sequence the program queries the user for the 
name of the sequence file (sequence filename), the name of the sequence 
(sequence name) and finally, the actual sequence (using the standard single 
letter amino acid code).  Sequences may be entered using either lower case 
letters, upper case letters or an arbitrary combination of both.  In other 
words, sequence entry is case independent.  The program also ignores blank 
characters so sequence entries may have as many blank spaces as desired.  
A "sequence ruler" is presented at the top of each sequence file entry line to 
permit quick identification of residue positions as they are typed.  After 
each group of 50 characters has been entered, the user is expected to press 
<return> so that a new sequence ruler can appear.  Upon completion of the 
sequence entry, the user must enter the '$' character to indicate to the 
computer that the typing process has finished.  Should any non-standard 
amino acid characters appear in the sequence file the program produces an 
error message and aborts the file saving procedure.
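
The entry rules just described (case independence, ignored blanks, the '$' 
terminator and rejection of non-standard characters) can be summarized in a 
short sketch.  This is not the SEQED source.  In particular, the set of 
accepted letters is an assumption (the 20 standard one-letter codes), and the 
function name read_sequence is hypothetical.

```python
# Assumption: only the 20 standard one-letter amino acid codes are accepted.
STANDARD = set("ACDEFGHIKLMNPQRSTVWY")


def read_sequence(lines):
    """Join typed lines up to the '$' terminator and validate the result."""
    residues = []
    finished = False
    for line in lines:
        for ch in line:
            if ch == "$":
                finished = True
                break
            if ch.isspace():
                continue                  # blanks are ignored
            residues.append(ch.upper())   # entry is case independent
        if finished:
            break
    if not finished:
        raise ValueError("sequence entry must end with '$'")
    bad = sorted(set(residues) - STANDARD)
    if bad:
        raise ValueError("non-standard characters: " + "".join(bad))
    return "".join(residues)


typed = ["msdklihitddsfdtdvikadgailvdfwaewcgpckmiapildeladey",
         "qgkltvakln$"]
sequence = read_sequence(typed)
print(len(sequence), sequence[:10])
```

Raising an error instead of silently dropping bad characters mirrors the 
behaviour described above, where the save is aborted.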


******************************************************************************

AVAILABLE OPTIONS IN "SEQSEE.PARMS":

1) There are no options available for this particular menu item.

******************************************************************************


**********************************************************************
* Package...:                SEQSEE  Version 1.2 (c)                 *
* Authors...:       Robert Boyko / Leigh Willard / David Wishart     *
*                        Fred Richards / Brian Sykes                 *
* Location..:                 University of Alberta                  *
*               Protein Engineering Network of Centres of Excellence * 
**********************************************************************

      *** Preliminaries ***                *** Alignments ***
   1) Help                              10) Fast Alignment Search
   2) Enter/Edit a Sequence             11) Exhaustive Alignment Search
   3) Retrieve Sequence from Database   12) Align 2 or more sequences

      *** Structural Analysis ***            *** Scanning ***
   4) Sequence Statistics               13) Pattern Search
   5) Structure Prediction              14) Homology Search
   6) SEQSITE Pattern Search            15) Dot Plot
   7) Flexibility                       16) Database Reference Search
   8) Hydrophobic Moment                17) File Viewer
   9) Hydrophobicity                     0) EXIT SEQSEE

   Enter the number of the desired function

>> 2                                     <--- <user input>


What would you like to do?

  1) Enter new sequence.
  2) Edit old sequence.
  0) Exit

Enter a number (then press return).

>> 1                                    <--- <user input>

Seqed (Version 1.2)

Enter name for sequence. Use underscores instead
of blanks to separate words. (eg. thioredoxin_human)

>> bacillus_redoxase                        <--- <user input>


Enter each amino acid (one letter code).
You may enter up to 50 amino acids on one line.
Press <return> to get a new prompt line.
When you are done enter $ and press <return>.


         1         2         3         4         5 
12345678901234567890123456789012345678901234567890
         |         |         |         |         |
msdklihitddsfdtdvikadgailvdfwaewcgpckmiapildeladey    <--- <user input>


         1         2         3         4         5 
12345678901234567890123456789012345678901234567890
         |         |         |         |         |
qgkltvakln$                                <--- <user input>


>Title: bacillus_redoxase
MSDKLIHITDDSFDTDVIKADGAILVDFWAEWCGPCKMIAPILDELADEY
QGKLTVAKLN
~
~
~
~
~
:q                                    <--- <user input>

Save this File? (Y/N):

>> y                                    <--- <user input>

Enter output filename.

>> redoxase.seq                            <--- <user input>
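
The listing above shows the layout that is saved: a ">Title:" line followed by 
the sequence in rows of 50 residues.  A small sketch of producing the same 
layout from a script follows.  It assumes the saved file is exactly what the 
editor listing shows, and format_seqfile is a hypothetical helper, not part of 
SEQSEE.

```python
def format_seqfile(name, sequence, width=50):
    """Return text laid out like the editor listing shown above."""
    lines = [">Title: " + name]
    for start in range(0, len(sequence), width):
        lines.append(sequence[start:start + width])
    return "\n".join(lines) + "\n"


text = format_seqfile(
    "bacillus_redoxase",
    "MSDKLIHITDDSFDTDVIKADGAILVDFWAEWCGPCKMIAPILDELADEYQGKLTVAKLN",
)
print(text, end="")
```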


3. Retrieve Sequence from Database


            *******************************************
            *                                         *
            *    1. Indicate how your query will      *
            *       be entered                        *
            *    2. Enter the search query (remember  *
            *       to type "quit" when finished)     *
            *    3. Check the output file for the     *
            *       desired sequence or sequences     *
            *    4. Edit the file (if desired)        *
            *    5. Exit the editor with :q or :wq    *
            *    6. Save the file                     *
            *                                         *
            *******************************************


The program SEQRET is designed specifically for the user to find and retrieve 
complete sequences from the PIR or SWISS-PROT databases using either the 
PIR/SWISS-PROT accession number or the protein name (or portion thereof).  
Note that multiple sequence identifiers using the conjunctive "&" symbol 
may be employed for increased specificity (e.g. CYSTIC&FIBROSIS&HUMAN 
for HUMAN CYSTIC FIBROSIS).  Thus one may seek and select only a single 
sequence for a specific purpose, or entire protein families to create special 
user-specified databases.  The sequences may be saved and/or edited for 
further analysis (e.g. multiple alignments).  All sequences are saved in a 
SEQFILE format.  
    When using this function, the user is required to identify the method 
by which the query will be entered (either the keyboard or through a UNIX 
file) as well as the exact Id numbers or sequence names which must be 
searched for in the database.  Note that the user MUST type "quit" on the 
final line of his or her search string.  The word "quit" is used by the program 
as a termination flag and is essential for proper functioning of the program.  
The user is also required to provide a name for the output file (we suggest 
using the suffix ".scan" for consistency).  Note that only the protein (or 
peptide) name and its sequence are included in the output.
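
The effect of the "&" conjunction can be pictured as follows.  The sketch is an 
approximation of the behaviour described above, not the SEQRET code: it 
assumes that each term is matched as a case-insensitive substring of the 
protein title and that underscores stand for blanks.  The function names are 
hypothetical, and the titles are taken from listings elsewhere in this manual.

```python
def parse_query(query):
    """Split 'A & B_C' into upper-case terms ['A', 'B C']."""
    terms = []
    for part in query.split("&"):
        term = part.strip().replace("_", " ").upper()
        if term:
            terms.append(term)
    return terms


def matches(title, query):
    """True when every term of the query occurs somewhere in the title."""
    terms = parse_query(query)
    title = title.upper()
    return bool(terms) and all(term in title for term in terms)


titles = [
    "THIOREDOXIN I - YEAST (SACCHAROMYCES CEREVISIAE)",
    "THIOREDOXIN - ANABAENA SP.",
    "ACTIN (RABBIT SKELETAL)",
]
selected = [t for t in titles if matches(t, "THIOREDOXIN & YEAST")]
print(selected)
```

Adding terms can only narrow the result, which is why longer conjunctions are 
useful for picking a single sequence out of a large family.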


******************************************************************************

AVAILABLE OPTIONS IN "SEQSEE.PARMS":

1) Choice of retrieval from PIR, PIR_IG, SWISS-PROT or SWISS-PROT_IG 
database formats.

2) Choice of file-update frequency (i.e. every 100, 500 or 1000 sequences).

******************************************************************************


**********************************************************************
* Package...:                SEQSEE  Version 1.2 (c)                 *
* Authors...:       Robert Boyko / Leigh Willard / David Wishart     *
*                        Fred Richards / Brian Sykes                 *
* Location..:                 University of Alberta                  *
*               Protein Engineering Network of Centres of Excellence * 
**********************************************************************

      *** Preliminaries ***                *** Alignments ***
   1) Help                              10) Fast Alignment Search
   2) Enter/Edit a Sequence             11) Exhaustive Alignment Search
   3) Retrieve Sequence from Database   12) Align 2 or more sequences

      *** Structural Analysis ***            *** Scanning ***
   4) Sequence Statistics               13) Pattern Search
   5) Structure Prediction              14) Homology Search
   6) SEQSITE Pattern Search            15) Dot Plot
   7) Flexibility                       16) Database Reference Search
   8) Hydrophobic Moment                17) File Viewer
   9) Hydrophobicity                     0) EXIT SEQSEE

   Enter the number of the desired function

>> 3                                    <--- <user input>


Seqret (Version 1.2)

How will you enter your search queries?

  1) Protein Name(s) entered from the keyboard
  2) Protein Name(s) taken from a file
  3) Protein Id(s) entered from the keyboard
  4) Protein Id(s) taken from a file
  0) Exit program

Enter a number (then press return)

>> 1                                    <--- <user input>

Enter one search string per line.
Use underscores instead of blanks (eg. CYSTIC_FIBROSIS).
Use '&' symbol for conjunction (eg. FIBROSIS & CYSTIC).
Type QUIT (then press return) when done.

>> THIOREDOXIN                            <--- <user input>

>> quit                                <--- <user input>


Reading database file: /sirius/seqsee/databases/pir/*

Proteins scanned: 1000      Matches found...:    8        
Proteins scanned: 2000      Matches found...:    8    
Proteins scanned: 3000      Matches found...:    8        
~
~

*******************************************************************

    Program......: seqret (version 1.2)
    Description..: Sequence Retrieval Results
    Date.........: Thu Feb 16 14:01:34 1993

    Database.....: PIR (Intelligenetics Version)

    Searchstrings: THIOREDOXIN

*******************************************************************

>TXBY1  THIOREDOXIN I - YEAST (SACCHAROMYCES CEREVISIAE)
MVTQLKSASEYDSALASGDKLVVVDFFATWCTPCKMIAPMIEKFAEQYSD
AAFYKLDVDEVSDVAQKAEVSSMPTLIFYKGGKEVTRVVGANPAAIKQAI
ASNV

>TXBY2  THIOREDOXIN II - YEAST (SACCHAROMYCES CEREVISIAE)
MVTQFKTASEFDSAIAQDKLVVVDFYATWCGPCKMIAPMIEKFSEQYPQA
DFYKLDVDELGDVAQKNEVSAMPTLLLFKNGKEVAKVVGANPAAIKQAIA
ANA

>TXEC   THIOREDOXIN PRECURSOR - ESCHERICHIA COLI
MLHQQRNQHARLIPVELYMSDKIIHLTDDSFDTDVLKADGAILVDFWAEW
CGPCKMIAPILDEIADEYQGKLTVAKLNIDQNPGTAPKYGIRGIPTLLLF
KNGEVAATKVGALSKGQLKEFLDANLA

>TXFK   THIOREDOXIN - CORYNEFORM BACTERIUM ATCC11425
ATVKVDNSNFQSDVLQSSEPVVVDFWAEWCGPCKMIAPALDEIATEMAGQ
VKIKLTVAKLNIDQNPGTAPKYGIRGIPTLLLFKNGEVAATKVGALSADW
IKASA

>TXAI   THIOREDOXIN - ANABAENA SP.
SAAAQVTDSTFKQEVLDSDVPVLVDFWAPWCGPCRMVAPVVDEIAQQYEG
KIKVVTVAKLNIDQNPGTAPKYGIRGIPTLLLFKNGEVAATKVGALSADW
TLEKHL
~
~
~
~
~

:q                                    <--- <user input>

Save this File? (Y/N):

>> y                                    <--- <user input>

Enter output filename.

>> trx.scan                            <--- <user input>
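
A file such as trx.scan holds several sequences, each introduced by a ">" 
line.  The sketch below shows one way to load such a file into a dictionary 
for further work.  It assumes the layout of the listing above (identifier 
first on the ">" line, description after it, sequence rows until the next ">") 
and uses a hypothetical function name, read_records.

```python
SCAN_TEXT = """\
>TXBY1  THIOREDOXIN I - YEAST (SACCHAROMYCES CEREVISIAE)
MVTQLKSASEYDSALASGDKLVVVDFFATWCTPCKMIAPMIEKFAEQYSD
AAFYKLDVDEVSDVAQKAEVSSMPTLIFYKGGKEVTRVVGANPAAIKQAI
ASNV

>TXBY2  THIOREDOXIN II - YEAST (SACCHAROMYCES CEREVISIAE)
MVTQFKTASEFDSAIAQDKLVVVDFYATWCGPCKMIAPMIEKFSEQYPQA
DFYKLDVDELGDVAQKNEVSAMPTLLLFKNGKEVAKVVGANPAAIKQAIA
ANA
"""


def read_records(text):
    """Map each identifier to its description and joined sequence."""
    records = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            ident, _, description = line[1:].partition(" ")
            current = ident
            records[current] = {"description": description.strip(),
                                "sequence": ""}
        elif current is not None:
            records[current]["sequence"] += line
    return records


records = read_records(SCAN_TEXT)
for ident, rec in records.items():
    print(ident, len(rec["sequence"]), rec["description"])
```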


4. Sequence Statistics


            *******************************************
            *                                         *
            *    1. Indicate how your sequence will   *
            *       be entered                        *
            *    2. Enter the sequence or sequence    *
            *       filename                          *
            *    3. Check the output file for         *
            *       interesting information           *
            *    4. Exit the editor with ":q"         *
            *    5. Save the file                     *
            *                                         *
            *******************************************


The STATS program carries out a simple statistical analysis of any given 
protein sequence.  When using this program, the user is required to indicate 
how the sequence will be entered (via a SEQFILE in this case), what the name 
of the SEQFILE will be and what the name of the output file should be (we 
chose the suffix ".stat" for consistency).  The output file provides information 
on the number of residues, molecular weight, the amino acid composition, 
average hydropathy (based on the Kyte-Doolittle parameters), total charge, 
predicted isoelectric point, expected quantity of exposed and interior surface 
area (Miller et al., 1987), expected packing volume (Richards, 1977), 
predicted specific volume, aggregation potential (Fisher, 1964), estimated 
solvation free energy of folding (Chiche et al., 1990), expected folded-protein 
radius and many other values that may be of structural or statistical 
interest.
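
Several of the simpler STATS values can be verified independently.  The sketch 
below recomputes the amino acid composition and the percentages of hydrophobic 
and hydrophilic residues for the tutorial sequence.  It is not the STATS code.  
The two residue groups are the ones printed in the STATS notes (ACFGHILMVWY 
and DEKNPQRST), and the function names are hypothetical.

```python
from collections import Counter

SEQ = ("MSDKLIHITDDSFDTDVIKADGAILVDFWAEWCGPCKMIAPILDELADEY"
       "QGKLTVAKLN")

HYDROPHOBIC = set("ACFGHILMVWY")   # as listed in the STATS notes
HYDROPHILIC = set("DEKNPQRST")     # as listed in the STATS notes


def composition(sequence):
    """Map each residue present to (count, percent of the sequence)."""
    counts = Counter(sequence)
    n = len(sequence)
    return {aa: (counts[aa], 100.0 * counts[aa] / n) for aa in sorted(counts)}


def percent_in(sequence, group):
    """Percentage of residues that belong to the given group."""
    return 100.0 * sum(aa in group for aa in sequence) / len(sequence)


comp = composition(SEQ)
pct_phobic = percent_in(SEQ, HYDROPHOBIC)
pct_philic = percent_in(SEQ, HYDROPHILIC)

print(f"I: {comp['I'][0]} residues, {comp['I'][1]:.2f}%")
print(f"Hydrophobic: {pct_phobic:.2f}%   Hydrophilic: {pct_philic:.2f}%")
print(f"Ratio of %Hydrophilic to %Hydrophobic: {pct_philic / pct_phobic:.2f}")
```

For bacillus_redoxase this gives 56.67% hydrophobic and 43.33% hydrophilic 
residues with a ratio of 0.76, matching the figures in the tutorial's STATS 
listing.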


******************************************************************************

AVAILABLE OPTIONS IN "SEQSEE.PARMS":

1) Choice of hydrophobicity values (kyte.parms, hphob.* files or user-
defined).

2) Choice of threshold or cutoff values.

3) Choice of definition for hydrophobic and hydrophilic amino acids.

4) Choice of molecular volume values (mol.volume or user-defined).

5) Choice of residue-specific surface area values (mol.surfarea or user-
defined).

6) Choice of amino acid partial specific volumes (mol.parspecvol or user-
defined).

7) Choice of residue-specific polar, nonpolar and charged surface areas 
(mol.asa or user-defined).

******************************************************************************


**********************************************************************
* Package...:                SEQSEE  Version 1.2 (c)                 *
* Authors...:       Robert Boyko / Leigh Willard / David Wishart     *
*                        Fred Richards / Brian Sykes                 *
* Location..:                 University of Alberta                  *
*               Protein Engineering Network of Centres of Excellence * 
**********************************************************************

      *** Preliminaries ***                *** Alignments ***
   1) Help                              10) Fast Alignment Search
   2) Enter/Edit a Sequence             11) Exhaustive Alignment Search
   3) Retrieve Sequence from Database   12) Align 2 or more sequences

      *** Structural Analysis ***            *** Scanning ***
   4) Sequence Statistics               13) Pattern Search
   5) Structure Prediction              14) Homology Search
   6) SEQSITE Pattern Search            15) Dot Plot
   7) Flexibility                       16) Database Reference Search
   8) Hydrophobic Moment                17) File Viewer
   9) Hydrophobicity                     0) EXIT SEQSEE

   Enter the number of the desired function

>> 4                                    <--- <user input>


Sequence Statistics (Version 1.2)


Your amino acid sequence is now required:

  1) Read sequence from an input file
  2) Sequence to be entered via keyboard
  3) I do not have my sequence ready

Enter a number (then press return).

>> 1                                    <--- <user input>

Enter input filename: 

>> redoxase.seq                            <--- <user input>


****************************************************************

    Program......: stats (version 1.2)
    Description..: Statistical Analysis of a Sequence
    Date.........: Thu Feb 16 13:02:21 1993

    Sequence Name: bacillus_redoxase
    1 MSDKLIHITD DSFDTDVIKA DGAILVDFWA EWCGPCKMIA PILDELADEY
   51 QGKLTVAKLN

****************************************************************

Molecular Weight......:    6684.86
Amino acids...........:    60
Mean residue weight...:    111.41


                *** Amino Acid Composition ***

  Amino    Freq       Freq        E(Freq)    Weight     E(weight)
  Acid   (total)   (percent)    (percent)  (percent)    (percent)
    A       6        10.00        <8.84>      6.40        <5.73>
    C       2         3.33        <2.09>      3.09        <1.97>
    D       9        15.00        <5.89>     15.54        <6.17>
    E       3         5.00        <5.90>      5.81        <6.94>
    F       2         3.33        <3.70>      4.42        <4.96>
    G       3         5.00        <8.29>      2.57        <4.31>
    H       1         1.67        <2.12>      2.06        <2.64>
    I       6        10.00        <5.40>     10.18        <5.57>
    K       5         8.33        <6.22>      9.01        <7.27>
    L       6        10.00        <7.93>     10.18        <8.18>
    M       2         3.33        <1.97>      3.94        <2.35>
    N       1         1.67        <4.59>      1.71        <4.78>
    P       2         3.33        <4.51>      2.91        <3.99>
    Q       1         1.67        <3.75>      1.92        <4.37>
    R       0         0.00        <4.21>      0.00        <6.00>
    S       2         3.33        <6.59>      2.61        <5.23>
    T       3         5.00        <5.96>      4.55        <5.49>
    V       3         5.00        <7.12>      4.46        <6.43>
    W       2         3.33        <1.37>      5.59        <2.32>
    Y       1         1.67        <3.56>      2.45        <5.30>


Note: E(x) are expected values based on average
    amino acid content of soluble proteins.

**************************************************************

Hydrophobicity Parameters: /canopus/rbo/seqsee/lib/kyte.parms

Average Hydrophobicity (ah)...................:    0.78
Notes: ah = -2.67 --> Average Protein
     ah >  0.10 --> Hydrophobic Protein
     ah < -6.00 --> Hydrophilic Protein

Ratio of Hydrophilicity to Hydrophobicity (rh):    0.95
Notes: rh =  1.22 --> Average Protein
     rh >  1.90 --> Non-folding Protein
     rh <  0.85 --> Insoluble Protein

Percentage of Hydrophobic residues............:   56.67
Notes: Average percentage is 52.44
     Hydrophobic Amino Acids are ACFGHILMVWY

Percentage of Hydrophilic residues............:   43.33
Notes: Average percentage is 47.56
     Hydrophilic Amino Acids are DEKNPQRST


Ratio of %Hydrophilic to %Hydrophobic.........:    0.76
Notes: rhp = 0.91  --> Average Protein
     rhp > 1.43  --> Non-folding Protein
     rhp < 0.77  --> Insoluble Protein

**************************************************************

Number of  Basic amino acids:         5
Number of Acidic amino acids:        12
Estimated pI for protein....:        4.60

pH:          3      4      5      6      7      8      9     10     11
Charge:    7.1    3.7   -2.6   -5.0   -5.9   -7.0   -9.0  -11.9  -14.4

Total linear charge density.:        0.32

**************************************************************

Polar Area of Extended Chain...............:    3666.20 Angs**2
Non-Polar Area of Extended Chain...........:    6923.10 Angs**2
Total Area of Extended Chain ..............:   10359.60 Angs**2


Polar ASA of Folded Protein................:    1117.84 Angs**2
Non-Polar ASA of Folded Protein............:    2839.88 Angs**2
ASA of folded protein .....................:    3957.72 Angs**2

Ratio of Folded to Extended Area...........:            0.40

*************************************************************

Buried Polar Area of Folded Protein........:    2096.61 Angs**2
Buried Non-polar Area of Folded Protein....:    3654.08 Angs**2
Buried Charge Area of Folded Protein.......:     239.61 Angs**2
Total Buried Surface.......................:    5990.30 Angs**2

Expected Number and Fraction of Residues 95% Buried

A:    1 (0.166)    C:    1 (0.284)    D:    0 (0.038)    E:    0 (0.022)
F:    1 (0.291)    G:    0 (0.127)    H:    0 (0.127)    I:    2 (0.317)
K:    0 (0.004)    L:    2 (0.284)    M:    1 (0.304)    N:    0 (0.041)
P:    0 (0.056)    Q:    0 (0.038)    R:    0 (0.013)    S:    0 (0.069)
T:    0 (0.079)    V:    1 (0.271)    W:    0 (0.218)    Y:    0 (0.085)

Number of buried Amino Acids...............:        7

*************************************************************

Packing Volume (estimate)..................:    8300.24 Angs**3
Packing Volume (actual)....................:    8149.90 Angs**3
Interior Volume of Protein.................:    4056.60 Angs**3
Exterior Volume of Protein.................:    4093.40 Angs**3

Partial Specific Volume....................:       0.73 ml/g

Fisher Volume Ratio (actual)...............:       1.01
Fisher Volume Ratio (idealized)............:       1.50

 >>> Molecule likely forms dimer or multimer (aggregates). <<<

Protein Solubility.........................:       1.47
Notes: solubility = 1.6 --> Average Protein
     solubility < 1.1 --> Insoluble Protein

 >>> Protein is likely water soluble. <<<

*************************************************************

Radius of Protein..........................:    15.17 Angs
RMS end to end distance of Ext. chain......:    81.24 Angs
Radius of Gyration of Extended chain.......:    33.17 Angs

*************************************************************

Solvation Free Energy of Folding...........:    -43.38 kcal/mol
~
~
~
~
~

:q                                    <--- <user input>

Save this File? (Y/N):

>> y                                    <--- <user input>

Enter output filename.

>> redoxase.stat                        <--- <user input>
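
    Several of the simpler values in the sample output above can be checked 
by hand.  The Python sketch below uses the hydrophobic and hydrophilic 
residue groupings printed in the output (ACFGHILMVWY and DEKNPQRST).  The 
function name "hydro_summary" is our own, and the rule that K and R count as 
basic while D and E count as acidic is an inference from the sample numbers, 
not a documented SEQSEE definition.

```python
# Residue groupings as printed in the sample STATS output.
HYDROPHOBIC = set("ACFGHILMVWY")
HYDROPHILIC = set("DEKNPQRST")

SEQUENCE = (
    "MSDKLIHITD" "DSFDTDVIKA" "DGAILVDFWA"
    "EWCGPCKMIA" "PILDELADEY" "QGKLTVAKLN"
)


def hydro_summary(sequence):
    """Percent hydrophobic/hydrophilic residues, their ratio, and charge counts."""
    total = len(sequence)
    n_phobic = sum(1 for r in sequence if r in HYDROPHOBIC)
    n_philic = sum(1 for r in sequence if r in HYDROPHILIC)
    pct_phobic = 100.0 * n_phobic / total
    pct_philic = 100.0 * n_philic / total
    ratio = pct_philic / pct_phobic if n_phobic else float("inf")
    return {
        "pct_hydrophobic": pct_phobic,
        "pct_hydrophilic": pct_philic,
        "ratio": ratio,
        # Assumption: basic = K + R, acidic = D + E (inferred from the sample).
        "basic": sum(1 for r in sequence if r in "KR"),
        "acidic": sum(1 for r in sequence if r in "DE"),
    }


if __name__ == "__main__":
    for name, value in hydro_summary(SEQUENCE).items():
        print(f"{name:16s} {value:8.2f}")
```

For the bacillus_redoxase sequence this gives 56.67 and 43.33 percent, a 
ratio of 0.76, 5 basic and 12 acidic residues, in agreement with the sample 
output.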


5. Structure Prediction


            *******************************************
            *                                         *
            *    1. Indicate how your sequence will   *
            *       be entered                        *
            *    2. Enter the sequence or sequence    *
            *       filename                          *
            *    3. Check the output file for         *
            *       interesting information           *
            *    4. Exit the editor with ":q"         *
            *    5. Save the file                     *
            *                                         *
            *******************************************


    The Structure Prediction function is a comprehensive structural 
analysis program which has been developed expressly for the SEQSEE 
software suite.  This program performs calculations on the extent and 
location of potential membrane spanning regions, the identification of short 
sequence folding motifs, the prediction of the protein folding class and the 
prediction of secondary structure using the cumulative results of six 
different and well-tested methods.
    When using this program, the user is required to indicate how the 
sequence will be entered (via a SEQFILE in this case), what the name of the 
SEQFILE will be and what the name of the output file should be (we chose 
the suffix ".struc" for consistency).  The output file provides information on 
the location (if any) of membrane spanning regions, the identification 
of the probable protein folding class, the identification and location of all 
sequence/structure motifs and a complete prediction of the secondary 
structure of the protein or peptide.  In the latter case a three state 
prediction is used where we have defined H = helix, B = beta strand and C = 
coil.  Note that in the output file, the following designations apply: 

    Homology      Structure prediction using the homology method of Levin 
                  et al. (1988).
    Moment        Structure prediction using the hydro-moment method of 
                  Eisenberg (1984).
    GOR           Structure prediction using the method of Garnier et al. 
                  (1978).
    Chou-Fas      Structure prediction using the method of Chou and 
                  Fasman (1978).
    MotifLit      Structure prediction using sequence/structure motifs 
                  taken from the literature.
    MotifCmp      Structure prediction using sequence/structure motifs 
                  from computer searches.
    Consens       Structure prediction using a weighted sum of the above 
                  methods.
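
    To make the idea of a "weighted sum" of predictions concrete, the 
Python sketch below combines several three-state prediction strings by 
weighted voting at each position.  This is only an illustration: the 
weights, the tie-breaking order and the function name "consensus" are 
assumptions of ours, and the actual weighting used by SEQSEE is set through 
the control file and is not reproduced here.

```python
def consensus(predictions, weights=None):
    """Combine per-method prediction strings (H, B, C; X = no vote).

    predictions: dict mapping method name -> prediction string.
    weights:     dict mapping method name -> vote weight (default 1.0 each).
    """
    names = list(predictions)
    length = len(predictions[names[0]])
    if any(len(p) != length for p in predictions.values()):
        raise ValueError("all prediction strings must have the same length")
    if weights is None:
        weights = {name: 1.0 for name in names}

    result = []
    for i in range(length):
        score = {"H": 0.0, "B": 0.0, "C": 0.0}
        for name in names:
            state = predictions[name][i]
            if state in score:          # "X" and other symbols cast no vote
                score[state] += weights[name]
        # Ties are broken in the order C, H, B (an arbitrary choice here).
        result.append(max("CHB", key=lambda s: score[s]))
    return "".join(result)


if __name__ == "__main__":
    toy = {"GOR": "HHB", "Chou-Fas": "HCB", "MotifLit": "CCX"}
    print(consensus(toy))
    print(consensus(toy, {"GOR": 3.0, "Chou-Fas": 1.0, "MotifLit": 1.0}))
```

Raising the weight of one method lets it override the others at positions 
where the methods disagree, which is the effect the scoring multipliers in 
the control file are meant to give the user control over.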


******************************************************************************

AVAILABLE OPTIONS IN "SEQSEE.PARMS":

1) Choice of membrane spanning hydrophobicity values (kyte.parms, 
hphob.* files or user-defined).

2) Choice of scaling constants for membrane spanning test.

3) Choice of sequence motifs for identifying secondary structures  
(seqmotifX.db or user-defined).

4) Choice of statistical summary reporting frequency (i.e. every 100, 500 or 
1000 sequences).

5) Option to print individual motifs which match to the query sequence.

6) Option to print prediction and scoring arrays.

7) Choice of structure database for homology structure prediction 
(SEQBANK.db or user-defined).

8) Choice of scoring matrix to perform homology-based secondary structure 
prediction.

9) Choice of minimum threshold (test-stat) to identify significant homology.

10) Choice of structure-weighted scoring multipliers.

11) Choice of off-set values and multipliers for score normalization.

12) Option to apply smoothing functions "x" times to predictions.

13) Choice of weighting constants to force N and C-terminal predictions to be 
COIL.

14) Choice of scaling constants to weight HELIX or BETA predictions 
differently.

15) Option to smooth predicted structure (reduces "noise").

16) Choice of hydrophobic moment parameters (moment.parms or user-
defined).

17) Choice of number and type of hydrophobic periodicity tests for HELIX, 
BETA and COIL.

18) Choice of window size and weighting factors for HELIX, BETA and 
COIL.

19) Choice of GOR parameters for GOR secondary structure prediction 
(gor.new or gor.orig).

20) Choice of Chou-Fas parameters for secondary structure prediction 
(cfas.parms or user-defined).


******************************************************************************


**********************************************************************
* Package...:                SEQSEE  Version 1.2 (c)                 *
* Authors...:       Robert Boyko / Leigh Willard / David Wishart     *
*                        Fred Richards / Brian Sykes                 *
* Location..:                 University of Alberta                  *
*               Protein Engineering Network of Centres of Excellence * 
**********************************************************************

      *** Preliminaries ***                *** Alignments ***
   1) Help                              10) Fast Alignment Search
   2) Enter/Edit a Sequence             11) Exhaustive Alignment Search
   3) Retrieve Sequence from Database   12) Align 2 or more sequences

      *** Structural Analysis ***            *** Scanning ***
   4) Sequence Statistics               13) Pattern Search
   5) Structure Prediction              14) Homology Search
   6) SEQSITE Pattern Search            15) Dot Plot
   7) Flexibility                       16) Database Reference Search
   8) Hydrophobic Moment                17) File Viewer
   9) Hydrophobicity                     0) EXIT SEQSEE

   Enter the number of the desired function

>> 5                                    <--- <user input>


Alexis - Structure Prediction (Version 1.2)

Your amino acid sequence is now required:

  1) Read sequence from an input file
  2) Sequence to be entered via keyboard
  3) I do not have my sequence ready

Enter a number (then press return).

>> 1                                    <--- <user input>

Enter input filename:

>> trx.seq                                <--- <user input>


*************************************************************

    Program......: alexis (version 1.2)
    Description..: Structure Prediction/Analysis
    Date.........: Thu Feb 16 13:04:16 1993

    Sequence Name: THIOREDOXIN - ESCHERICHIA COLI
    Amino acids..: 108

*************************************************************

     *** Membrane Spanning Region Check ***

     No membrane spanning region found.        
   
      *** Structural Motifs from Literature ***         

*************(1)************
Amino Acid......: 29
Sequence Matched: AEWCGPC
Database Motif..: [TA]*WC[AG][PH]C
Motif Prediction: BCCCCHH
Reference.......: THIOREDOXIN ACTIVE SITE I

*************(2)************
Amino Acid......: 63
Sequence Matched: NPG
Database Motif..: [PGDN][PG][PGDN]
Motif Prediction: CCC
Reference.......: 90% ACCURATE MOTIFS


*************************************************************

Expected % alpha helix content: 39
Expected % beta sheet content: 32
Expected % coil content: 27

Correlation Coefficients for Protein Folding Class:
 A = 0.937100
 B = 0.874118
 M = 0.939555

Protein belongs to MIXED folding class

*************************************************************

    *** Secondary Structure Prediction ***


Sequence:SDKIIHLTDDSFDTDVLKADGAILVDFWAEWCGPCKMIAPILDEIADEYQ
Homology:CCBBBBBBCCHHHHHHHCCCCBBBBBBBBCCCCHHHHHHHHHHHHHHHHC
Moment..:CCCBBBBCCCCCCBBBBCCCCHHHHHHHHHHCCCHHHHHHHHHHHHHHCC
GOR.....:CCCBBBBBCCCCHHHHHHHHHHHBBHHHHHCCCCCBBBCCHHHHHHHHHH
Chou-Fas:CCCHHHHHCCCCHHHHHHCCCCHHHHHHHHHCCCCCBHHHHHHHHHHHCC
MotifLit:XXXXXXXXXXXXXXXXXXXXXXXXXXXXBCCCCHHXXXXXXXXXXXXXXX
MotifCmp:XXXXXXXXXCXXXXXXXCXXXXXXXXXXXXXXXCXXXXHHHXXXXXXXXX
Consens.:CCBBBBBBCCHHHHHHHCCCCBBBBBBBBCCCCHHHHHHHHHHHHHHHHC
~
~
~
~
:q                                    <--- <user input>

Save this File? (Y/N):

>> y                                    <--- <user input>

Enter output filename.

>> trx.struc                            <--- <user input>
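
    The relationship between the motif hits and the "MotifLit" line in the 
sample output above can be sketched in Python as follows.  Each hit is 
written onto a track of "X" characters (no prediction) starting at the 
reported amino acid number.  The function name "motif_track" is our own and 
the sketch assumes that the reported position is the 1-based index of the 
first matched residue, which is what the sample output suggests.

```python
def motif_track(length, hits):
    """Build a prediction track of the given length from motif hits.

    hits: list of (start, prediction) pairs, where start is the 1-based
          position of the first matched residue and prediction is a string
          of H, B and C states, one per matched residue.
    """
    track = ["X"] * length
    for start, prediction in hits:
        for offset, state in enumerate(prediction):
            pos = start - 1 + offset
            if 0 <= pos < length:       # ignore residues outside this line
                track[pos] = state
    return "".join(track)


if __name__ == "__main__":
    # The two literature motif hits reported for thioredoxin in this section.
    hits = [(29, "BCCCCHH"), (63, "CCC")]
    # First 50 residues only, as in the first line of the sample prediction.
    print(motif_track(50, hits))
```

For the first 50 residues this yields 28 "X" characters, then BCCCCHH at 
positions 29 to 35, then 15 more "X" characters.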


6. Seqsite Pattern Search


            *******************************************
            *                                         *
            *    1. Indicate how your sequence will   *
            *       be entered                        *
            *    2. Enter the sequence or sequence    *
            *       filename                          *
            *    3. Check the output file for         *
            *       interesting information           *
            *    4. Exit the editor with ":q"         *
            *    5. Save the file                     *
            *                                         *
            *******************************************
    

This procedure allows the user to search any given sequence for active sites, 
binding sites, signature sequences, phosphorylation sites, antigenic sites, and 
related functional or structural sequence patterns.  A library of more than 
1000 signature sequence patterns is contained in the SEQSITE database.  An 
additional 50 phosphorylation sites are found in the PHOSITE database and a 
further 20 generalized antigenic sites are found in the EPISITE database.  This 
type of "function search" is extremely useful for determining the properties 
and features of newly sequenced or poorly characterized proteins.
    When using this program, the user is required to indicate how the 
sequence will be entered (via a SEQFILE in this case), what the name of the 
SEQFILE will be and what the name of the output file should be (we chose 
the suffix ".site" for consistency).  The output file provides information on 
all sequence motifs identified including: what sequence was matched in the 
query protein, where this match occurred, the identity of the matching 
sequence motif, the most current reference describing the sequence motif 
and the name of the sequence motif as it is most commonly referred to in 
the literature.


******************************************************************************

AVAILABLE OPTIONS IN "SEQSEE.PARMS":

1) Choice of general sequence motif database (SEQSITE.db or user-defined).

2) Choice of general phosphorylation site database (PHOSITE.db or user-
defined).

3) Choice of T-cell and B-cell antigenic site database (EPISITE.db or user-
defined).

******************************************************************************


**********************************************************************
* Package...:                SEQSEE  Version 1.2 (c)                 *
* Authors...:       Robert Boyko / Leigh Willard / David Wishart     *
*                        Fred Richards / Brian Sykes                 *
* Location..:                 University of Alberta                  *
*               Protein Engineering Network of Centres of Excellence * 
**********************************************************************

      *** Preliminaries ***                *** Alignments ***
   1) Help                              10) Fast Alignment Search
   2) Enter/Edit a Sequence             11) Exhaustive Alignment Search
   3) Retrieve Sequence from Database   12) Align 2 or more sequences

      *** Structural Analysis ***            *** Scanning ***
   4) Sequence Statistics               13) Pattern Search
   5) Structure Prediction              14) Homology Search
   6) SEQSITE Pattern Search            15) Dot Plot
   7) Flexibility                       16) Database Reference Search
   8) Hydrophobic Moment                17) File Viewer
   9) Hydrophobicity                     0) EXIT SEQSEE

   Enter the number of the desired function

>> 6                                    <--- <user input>

SEQSITE Pattern Search (Version 1.2)

Please select a sequence motif database

  1) SEQSITE.db    (general sequence motifs)
  2) PHOSITE.db    (general phosphorylation sites)
  3) EPISITE.db    (antigenic sites)

Enter a number (then press return).

>> 1                                    <--- <user input>


Your amino acid sequence is now required:

  1) Read sequence from an input file
  2) Sequence to be entered via keyboard
  3) I do not have my sequence ready

Enter a number (then press return).

>> 1                                    <--- <user input>

Enter input filename: 

>> redoxase.seq                            <--- <user input>


****************************************************************

    Program......: seqsite (version 1.2)
    Description..: Search for Interesting Motifs
    Date.........: Thu Feb 16 13:02:21 1993

    Sequence Name: bacillus_redoxase
    1 MSDKLIHITD DSFDTDVIKA DGAILVDFWA EWCGPCKMIA PILDELADEY
   51 QGKLTVAKLN

    Database.....: /sirius/local/seqsee/databases/seqsite.db

****************************************************************


**********(1)*********
Motif Matched...: *[TA]*WC[AG][PH]C*
Sequence Matched: WAEWCGPCK
Amino Acids.....: 29-37


 GLEASON, F.R. ET AL., FEMS MICRO REV. 54:271-297(1988)
 ACTIVE SITE FOR PROKARYOTIC/EUKARYOTIC THIOREDOXIN-LIKE MOLECULES

 Number of motifs found..:    1
 Number of motifs scanned: 1110
~
~
~
~
~
:q                                    <--- <user input>

Save this File? (Y/N):

>> y                                    <--- <user input>

Enter output filename.

>> redoxase.site                        <--- <user input>
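
    The kind of matching shown in the sample output above can be imitated 
with regular expressions.  The Python sketch below assumes that "*" in a 
motif stands for any single residue and that bracketed groups list the 
allowed residues at one position; this reading is inferred from the sample 
match (WAEWCGPCK against *[TA]*WC[AG][PH]C*) and may not cover the full 
SEQSITE motif syntax.  The function names are our own.

```python
import re

SEQUENCE = (
    "MSDKLIHITD" "DSFDTDVIKA" "DGAILVDFWA"
    "EWCGPCKMIA" "PILDELADEY" "QGKLTVAKLN"
)


def motif_to_regex(motif):
    """Translate a motif to a regular expression (assumes '*' = any residue)."""
    return motif.replace("*", ".")


def find_motif(motif, sequence):
    """Return (first residue, last residue, matched text) for each match.

    Positions are 1-based.  Overlapping matches are not reported.
    """
    regex = re.compile(motif_to_regex(motif))
    return [
        (m.start() + 1, m.end(), m.group())
        for m in regex.finditer(sequence)
    ]


if __name__ == "__main__":
    for first, last, text in find_motif("*[TA]*WC[AG][PH]C*", SEQUENCE):
        print(f"Sequence Matched: {text}")
        print(f"Amino Acids.....: {first}-{last}")
```

Run on the bacillus_redoxase sequence, this reports WAEWCGPCK at residues 
29-37, as in the sample output.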


7. Flexibility


            *******************************************
            *                                         *
            *    1. Indicate how your sequence will   *
            *       be entered                        *
            *    2. Enter the sequence or sequence    *
            *       filename                          *
            *    3. Check the output file for         *
            *       interesting information           *
            *    4. Exit the editor with ":q"         *
            *    5. Save the file                     *
            *                                         *
            *******************************************


The program named FLEQSEE predicts the flexibility and mobility of various 
regions in a protein based on sequence information alone.  Flexibility is 
calculated on the basis of the Karplus algorithm (Karplus and Schulz, 1985). 
In SEQSEE, flexibility may be used to determine the position and length 
of coil regions by locating all "significant" maxima (those maxima which 
exceed a minimum threshold) in the flexibility plot.  Flexibility plots may 
also be used to identify surface-seeking elements or to locate strongly 
antigenic regions of any given sequence.
    When using this program, the user is required to indicate how the 
sequence will be entered (via a SEQFILE in this case), what the name of the 
SEQFILE will be and what the name of the output file should be (we chose 
the suffix ".flex" for consistency).  The output file provides a numeric 
representation of the flexibility profile of the input sequence.  A legend 
located at the top of the file provides a means of interpreting the numbers in 
quasi-physical terms.  
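
    The smoothing and peak-picking steps described above can be sketched in 
Python as shown below.  The sketch takes a list of per-residue flexibility 
values, averages them over a 7-residue window with triangular weights and 
then reports maxima that exceed a threshold.  The weights, the threshold and 
the function names are illustrative assumptions; the values actually used 
by FLEQSEE are set in fleqsee.parms and the control file.

```python
def triangular_weights(window=7):
    """Weights rising linearly to 1.0 at the window centre (window must be odd)."""
    half = window // 2
    return [(half + 1 - abs(i - half)) / (half + 1) for i in range(window)]


def smooth(values, window=7):
    """Weighted moving average; windows are truncated at the chain ends."""
    weights = triangular_weights(window)
    half = window // 2
    profile = []
    for i in range(len(values)):
        total = 0.0
        weight_sum = 0.0
        for k, w in enumerate(weights):
            j = i + k - half
            if 0 <= j < len(values):
                total += w * values[j]
                weight_sum += w
        profile.append(total / weight_sum)
    return profile


def significant_maxima(profile, threshold):
    """1-based positions of local maxima that exceed the threshold."""
    return [
        i + 1
        for i in range(1, len(profile) - 1)
        if profile[i] > threshold
        and profile[i] >= profile[i - 1]
        and profile[i] > profile[i + 1]
    ]


if __name__ == "__main__":
    raw = [9.6, 10.3, 10.3, 9.5, 10.8, 10.3, 9.7, 10.4, 10.5, 10.8]  # toy values
    profile = smooth(raw)
    print([round(v, 2) for v in profile])
    print(significant_maxima(profile, threshold=10.0))
```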


******************************************************************************

AVAILABLE OPTIONS IN "SEQSEE.PARMS":

1) Choice of output format ("raw" or "scaled" scores).

2) Choice of flexibility parameters (fleqsee.parms or user-defined).

3) Choice of window size (default = 7 residues).

4) Option to vary weighting constants and weighting procedures (triangular, 
parabolic, linear, etc.).

******************************************************************************


**********************************************************************
* Package...:                SEQSEE  Version 1.2 (c)                 *
* Authors...:       Robert Boyko / Leigh Willard / David Wishart     *
*                        Fred Richards / Brian Sykes                 *
* Location..:                 University of Alberta                  *
*               Protein Engineering Network of Centres of Excellence * 
**********************************************************************

      *** Preliminaries ***                *** Alignments ***
   1) Help                              10) Fast Alignment Search
   2) Enter/Edit a Sequence             11) Exhaustive Alignment Search
   3) Retrieve Sequence from Database   12) Align 2 or more sequences

      *** Structural Analysis ***            *** Scanning ***
   4) Sequence Statistics               13) Pattern Search
   5) Structure Prediction              14) Homology Search
   6) SEQSITE Pattern Search            15) Dot Plot
   7) Flexibility                       16) Database Reference Search
   8) Hydrophobic Moment                17) File Viewer
   9) Hydrophobicity                     0) EXIT SEQSEE

   Enter the number of the desired function

>> 7                                    <--- <user input>


Fleqsee (Version 1.2)

Your amino acid sequence is now required:

  1) Read sequence from an input file
  2) Sequence to be entered via keyboard
  3) I do not have my sequence ready

Enter a number (then press return).

>> 1                                    <--- <user input>

Enter input filename:

>> zipper.seq                            <--- <user input>


(OUTPUT WHEN "RAW SCORE" OPTION IS SELECTED)

*****************************************************************

    Program......: fleqsee (version 1.2)
    Description..: Sequence Flexibility Scoring
    Date.........: Thu Feb 16 13:05:26 1993

    Sequence Name: leu_zippper
    Amino Acids..: 35
    Flex Parms...: /sirius/local/seqsee/lib/fleqsee.parms

*****************************************************************


             #   |  RESIDUE  |   B FACTOR
             #   |  RESIDUE  | (NORMALIZED)
           ---------------------------------
              1  |     L     |     9.61
              2  |     Q     |    10.28
              3  |     R     |    10.28
              4  |     M     |     9.47
              5  |     K     |    10.82
              6  |     Q     |    10.28
              7  |     L     |     9.67
              8  |     E     |    10.36
              9  |     D     |    10.53
             10  |     K     |    10.82
             11  |     V     |     9.82
             12  |     E     |    10.36
             13  |     E     |    10.36
             14  |     L     |     9.61
             15  |     L     |     9.61
             16  |     S     |    10.36
             17  |     K     |    10.93
             18  |     N     |    10.06
             19  |     Y     |     9.30
             20  |     H     |     8.94

~
~
~
~
~
~
~
~
:q                                    <--- <user input>

Save this File? (Y/N):

>> y                                    <--- <user input>

Enter output filename.

>> zipper.flex                            <--- <user input>


(OUTPUT WHEN "WEIGHTED SCORE" OPTION IS SELECTED)

*****************************************************************

    Program......: fleqsee (version 1.2)
    Description..: Sequence Flexibility Scoring
    Date.........: Thu Feb 16 13:05:26 1993

    Sequence Name: leu_zippper
    Amino Acids..: 35
    Flex Parms...: /sirius/local/seqsee/lib/fleqsee.parms


        *** Notes ***

    Flex Scores:    0 1 2 3 4 5 6 7 8 9
                      Low         High

    Likely coil regions found when:
        i) Flexibility is very high (8 or 9)
       ii) Regions with strong maxima (eg. 12466531)

*****************************************************************


Sequence...:LQRMKQLEDKVEELLSKNYHLENEVARLKKLVGER
Score......:98855566666555555433455543345555889
~
~
~
~
:q                                    <--- <user input>

Save this File? (Y/N):

>> y                                    <--- <user input>

Enter output filename.

>> zipper.flex                            <--- <user input>
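
    The single-digit score line shown above lends itself to simple 
post-processing.  The Python sketch below shows one way a numeric profile 
could be reduced to digits 0-9 and how positions scoring 8 or 9 (the first 
criterion in the legend) can be picked out.  The linear min-to-max scaling 
is an assumption made for illustration and is not necessarily the scaling 
FLEQSEE applies; the function names are our own.

```python
def scale_to_digits(values):
    """Map values linearly onto the digits 0-9 (lowest -> 0, highest -> 9)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return "0" * len(values)
    return "".join(str(int(9 * (v - lo) / (hi - lo) + 0.5)) for v in values)


def high_flex_positions(score_line, cutoff=8):
    """1-based positions whose single-digit score is at or above the cutoff."""
    return [i for i, ch in enumerate(score_line, start=1) if int(ch) >= cutoff]


if __name__ == "__main__":
    # Score line copied from the sample output for the leu_zippper sequence.
    scores = "98855566666555555433455543345555889"
    print(high_flex_positions(scores))
    print(scale_to_digits([0.0, 1.0, 4.0]))
```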


8. Hydrophobic Moment


            *******************************************
            *                                         *
            *    1. Indicate how your sequence will   *
            *       be entered                        *
            *    2. Enter the sequence or sequence    *
            *       filename                          *
            *    3. Check the output file for         *
            *       interesting information           *
            *    4. Exit the editor with ":q"         *
            *    5. Save the file                     *
            *                                         *
            *******************************************


MOMENT calculates the hydrophobic moment of a sequence using a modified 
Cornette et al. (1987) scale of hydrophobicity and the Fourier analysis 
technique of Eisenberg et al. (1984).  Calculations are performed over a set 
"sequence window" of predefined length using a range of values specific to 
helical periodicities (90 to 120 degrees) and beta strand periodicities (0 and 
160 to 180 degrees).
    When using this program, the user is required to indicate how the 
sequence will be entered (via a SEQFILE in this case), what the name of the 
SEQFILE will be and what the name of the output file should be (we chose 
the suffix ".mom" for consistency).  The output file provides a numeric 
representation of the hydrophobic moment profile of the input sequence.  A 
legend located at the top of the file provides a means of interpreting the 
numbers in quasi-physical terms.  Hydrophobic moment periodicity values 
and weighting schemes may be altered using the control file (through File 
Viewer).
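
    For readers unfamiliar with the hydrophobic moment, the Python sketch 
below computes it for a window of residues as the magnitude of the vector 
sum of the residue hydrophobicities, with each residue rotated by a fixed 
angle relative to its predecessor.  The hydrophobicity values are those 
printed in the sample output of this section.  The window length of 11, the 
choice of 100 and 170 degrees from the helix and beta ranges given above, 
the division by window length and the function names are all assumptions of 
this sketch; it does not apply the scaling factors used by MOMENT and is 
not expected to reproduce its numbers.

```python
import math

# Hydrophobicity parameters as printed in the sample MOMENT output.
HYDRO = {
    "A": -0.10, "C": 0.45, "D": -0.57, "E": -0.39, "F": 0.51,
    "G": -0.13, "H": -0.06, "I": 0.55, "K": -0.56, "L": 0.68,
    "M": 0.48, "N": -0.19, "P": -0.45, "Q": -0.53, "R": 0.07,
    "S": -0.19, "T": -0.40, "V": 0.54, "W": 0.02, "Y": 0.33,
}

ZIPPER = "LQRMKQLEDKVEELLSKNYHLENEVARLKKLVGER"


def hydrophobic_moment(segment, angle_deg):
    """Hydrophobic moment per residue of a segment at the given periodicity."""
    delta = math.radians(angle_deg)
    s = sum(HYDRO[r] * math.sin(n * delta) for n, r in enumerate(segment))
    c = sum(HYDRO[r] * math.cos(n * delta) for n, r in enumerate(segment))
    return math.hypot(s, c) / len(segment)


def moment_profile(sequence, angle_deg, window=11):
    """Moment for a window centred on each residue (truncated at the ends)."""
    half = window // 2
    return [
        hydrophobic_moment(sequence[max(0, i - half): i + half + 1], angle_deg)
        for i in range(len(sequence))
    ]


if __name__ == "__main__":
    helix = moment_profile(ZIPPER, 100.0)   # helical periodicity (assumed)
    beta = moment_profile(ZIPPER, 170.0)    # beta strand periodicity (assumed)
    for i, (res, h, b) in enumerate(zip(ZIPPER, helix, beta), start=1):
        print(f"{i:3d}  |  {res}  |  {b:6.2f}  |  {h:6.2f}")
```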


*******************************************************************************

AVAILABLE OPTIONS IN "SEQSEE.PARMS":

1) Choice of output format ("raw" or "scaled" scores).

2) Choice of hydrophobicity parameters (hmom.* files or user-defined).

3) Choice of number of hydrophobic periodicity tests.

4) Choice of type of hydrophobic periodicity tests (HELIX and/or BETA 
periodicity).

5) Choice of window size and periodicity angle for each periodicity test.

6) Control over application of smoothing functions.

******************************************************************************


**********************************************************************
* Package...:                SEQSEE  Version 1.2 (c)                 *
* Authors...:       Robert Boyko / Leigh Willard / David Wishart     *
*                        Fred Richards / Brian Sykes                 *
* Location..:                 University of Alberta                  *
*               Protein Engineering Network of Centres of Excellence * 
**********************************************************************

      *** Preliminaries ***                *** Alignments ***
   1) Help                              10) Fast Alignment Search
   2) Enter/Edit a Sequence             11) Exhaustive Alignment Search
   3) Retrieve Sequence from Database   12) Align 2 or more sequences

      *** Structural Analysis ***            *** Scanning ***
   4) Sequence Statistics               13) Pattern Search
   5) Structure Prediction              14) Homology Search
   6) SEQSITE Pattern Search            15) Dot Plot
   7) Flexibility                       16) Database Reference Search
   8) Hydrophobic Moment                17) File Viewer
   9) Hydrophobicity                     0) EXIT SEQSEE

   Enter the number of the desired function

>> 8                                    <--- <user input>


Hydrophobic Moment (Version 1.2)

Your amino acid sequence is now required:

  1) Read sequence from an input file
  2) Sequence to be entered via keyboard
  3) I do not have my sequence ready

Enter a number (then press return).

>> 1                                    <--- <user input>

Enter input filename:

>> zipper.seq                            <--- <user input>


(OUTPUT WHEN "RAW SCORE" OPTION IS SELECTED)

*****************************************************************

    Program......: moment (version 1.2)
    Description..: Hydrophobic Moment Scoring
    Date.........: Thu Feb 16 13:05:26 1993

    Sequence Name: leu_zippper
    Amino Acids..: 35
    Flex Parms...: /sirius/local/seqsee/lib/hmom.cornet

             *** Notes ***

             BETA SCALING FACTOR: 0.60
             HELIX SCALING FACTOR: 0.42

             HYDROPHOBICITY PARAMETERS
            ----------------------------
            |  A  -0.10      M   0.48  |
            |  C   0.45      N  -0.19  |
            |  D  -0.57      P  -0.45  |
            |  E  -0.39      Q  -0.53  |
            |  F   0.51      R   0.07  |
            |  G  -0.13      S  -0.19  |
            |  H  -0.06      T  -0.40  |
            |  I   0.55      V   0.54  |
            |  K  -0.56      W   0.02  |
            |  L   0.68      Y   0.33  |
            ----------------------------

*****************************************************************


         #   |  RESIDUE  |  RAW BETA  | RAW HELIX
       ----------------------------------------------
          1  |     L     |    0.25    |    0.21
          2  |     Q     |    0.20    |    0.22
          3  |     R     |    0.23    |    0.30
          4  |     M     |    0.05    |    0.32
          5  |     K     |    0.14    |    0.36
          6  |     Q     |    0.22    |    0.29
          7  |     L     |    0.24    |    0.30
          8  |     E     |    0.27    |    0.30
          9  |     D     |    0.25    |    0.29
~
~
~
:q                                    <--- <user input>

Save this File? (Y/N):

>> y                                    <--- <user input>

Enter output filename.

>> zipper.mom                            <--- <user input>


(OUTPUT WHEN "WEIGHTED SCORE" OPTION IS SELECTED)

*****************************************************************

    Program......: moment (version 1.2)
    Description..: Hydrophobic Moment Scoring
    Date.........: Thu Feb 16 13:05:52 1993

    Sequence Name: leu_zippper
    Amino Acids..: 35
    Moment Parms.: /sirius/local/seqsee/lib/kyte.parms


        *** Notes ***

    Moment scores........: 0 1 2 3 4 5 6 7 8 9
                          Low          High

    High helix scores indicate likely helical regions
    High beta scores indicate likely beta strands

*****************************************************************

Sequence...:LQRMKQLEDKVEELLSKNYHLENEVARLKKLVGER
Helix Score:67899799999887766555667788888889764
Beta Score.:21484445665432234555554345553203142
~
~
~
~
~
:q                                    <--- <user input>

Save this File? (Y/N):

>> y                                    <--- <user input>

Enter output filename.

>> zipper.mom                            <--- <user input>


9. Hydrophobicity


            *******************************************
            *                                         *
            *    1. Indicate how your sequence will   *
            *       be entered                        *
            *    2. Enter the sequence or sequence    *
            *       filename                          *
            *    3. Check the output file             *
            *       for interesting information       *
            *    4. Exit the editor with ":q"         *
            *    5. Save the file                     *
            *                                         *
            *******************************************


This procedure calculates the smoothed hydrophobicity of any given sequence 
(averaged over a window of user-defined length) using one of several 
hydrophobicity scales.  The operator may choose from the Eisenberg 
consensus scale (Eisenberg et al., 1984), the Kyte-Doolittle scale (Kyte and 
Doolittle, 1982), the Cornette scale (Cornette et al., 1987) or the Parker-HPLC 
scale (Parker et al., 1986).  Hydrophobicity charts may be used to 
approximate the positions of hydrophilic regions such as coil regions, 
exposed loops or B-cell antigenic determinants in many proteins.
    When using this program, the user is required to indicate how the 
sequence will be entered (via a SEQFILE in this case), the name of the 
SEQFILE and the name of the output file (we chose the suffix ".hydro" for 
consistency).  The output file provides a numeric representation of the 
hydrophobicity profile of the input sequence.  A legend located at the top 
of the file provides a means of interpreting the numbers in quasi-physical 
terms.  Hydrophobicity values and weighting schemes may be altered using 
the control file (through File Viewer).


******************************************************************************

AVAILABLE OPTIONS IN "SEQSEE.PARMS":

1) Choice of output format ("raw" or "scaled" scores).

2) Choice of hydrophobicity parameters (hphob.* files or user-defined).

3) Choice of window size (default = 7 residues).

4) Option to vary weighting constants and weighting procedures (triangular, 
parabolic, linear, etc.).

******************************************************************************
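
The window smoothing and weighting options listed above can be sketched 
in a few lines of Python.  This is only an illustration, not the code used 
by SEQSEE: the function names (smooth, triangular_weights, to_digits), the 
handling of the sequence ends (the window is simply truncated) and the 
linear 0-9 scaling are all assumptions.  The residue values are copied from 
the HYDROPHOBICITY PARAMETERS table printed in the sample raw-score output 
of this section.

```python
# Sketch only: edge handling and the 0-9 scaling are assumptions,
# not the documented behaviour of the SEQSEE hydrophobicity program.

# Values copied from the HYDROPHOBICITY PARAMETERS table shown in the
# sample raw-score output of this section.
HPHOB = {
    "A": 0.18, "C": 0.25, "D": -0.35, "E": -0.35, "F": 0.28,
    "G": -0.04, "H": -0.32, "I": 0.45, "K": -0.39, "L": 0.38,
    "M": 0.19, "N": -0.35, "P": -0.16, "Q": -0.35, "R": -0.45,
    "S": -0.08, "T": -0.07, "V": 0.42, "W": -0.09, "Y": -0.13,
}


def triangular_weights(window):
    """Weights that peak at the window centre, e.g. 1 2 3 4 3 2 1."""
    half = window // 2
    return [half + 1 - abs(k) for k in range(-half, half + 1)]


def smooth(seq, scale=HPHOB, window=7, weights=None):
    """Weighted moving average of per-residue values (odd window assumed)."""
    half = window // 2
    if weights is None:
        weights = [1.0] * (2 * half + 1)  # flat (unweighted) window
    profile = []
    for i in range(len(seq)):
        total = 0.0
        weight_sum = 0.0
        for k in range(-half, half + 1):
            j = i + k
            if 0 <= j < len(seq):  # truncate the window at the ends
                w = weights[k + half]
                total += w * scale[seq[j]]
                weight_sum += w
        profile.append(total / weight_sum)
    return profile


def to_digits(values, low, high):
    """Map values linearly onto the digits 0 (low) to 9 (high)."""
    span = high - low
    digits = []
    for v in values:
        d = int((v - low) / span * 10)
        digits.append(str(min(9, max(0, d))))
    return "".join(digits)


if __name__ == "__main__":
    seq = "LQRMKQLEDKVEELLSKNYHLENEVARLKKLVGER"
    profile = smooth(seq, window=7, weights=triangular_weights(7))
    print(seq)
    print(to_digits(profile, -0.45, 0.45))
```

Because the end handling and scaling are guesses, the digits printed by 
this sketch should not be expected to match the SEQSEE output line for 
line.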


**********************************************************************
* Package...:                SEQSEE  Version 1.2 (c)                 *
* Authors...:       Robert Boyko / Leigh Willard / David Wishart     *
*                        Fred Richards / Brian Sykes                 *
* Location..:                 University of Alberta                  *
*               Protein Engineering Network of Centres of Excellence * 
**********************************************************************

      *** Preliminaries ***                *** Alignments ***
   1) Help                              10) Fast Alignment Search
   2) Enter/Edit a Sequence             11) Exhaustive Alignment Search
   3) Retrieve Sequence from Database   12) Align 2 or more sequences

      *** Structural Analysis ***            *** Scanning ***
   4) Sequence Statistics               13) Pattern Search
   5) Structure Prediction              14) Homology Search
   6) SEQSITE Pattern Search            15) Dot Plot
   7) Flexibility                       16) Database Reference Search
   8) Hydrophobic Moment                17) File Viewer
   9) Hydrophobicity                     0) EXIT SEQSEE

   Enter the number of the desired function

>> 9                                    <--- <user input>


Hydrophobicity (Version 1.2)

Your amino acid sequence is now required:

  1) Read sequence from an input file
  2) Sequence to be entered via keyboard
  3) I do not have my sequence ready

Enter a number (then press return).

>> 1                                    <--- <user input>

Enter input filename:

>> zipper.seq                            <--- <user input>


(OUTPUT WHEN "RAW SCORE" OPTION IS SELECTED)

*****************************************************************

    Program......: hydro (version 1.2)
    Description..: Hydrophobicity Scoring of a Sequence
    Date.........: Thu Feb 16 13:05:26 1993

    Sequence Name: leu_zippper
    Amino Acids..: 35
    Flex Parms...: /sirius/local/seqsee/lib/hphob.kyte

         *** Notes ***

         SCALE FACTOR 1: -45
         SCALE FACTOR 2: 45

         HYDROPHOBICITY PARAMETERS
        ----------------------------
        |  A   0.18      M   0.19  |
        |  C   0.25      N  -0.35  |
        |  D  -0.35      P  -0.16  |
        |  E  -0.35      Q  -0.35  |
        |  F   0.28      R  -0.45  |
        |  G  -0.04      S  -0.08  |
        |  H  -0.32      T  -0.07  |
        |  I   0.45      V   0.42  |
        |  K  -0.39      W  -0.09  |
        |  L   0.38      Y  -0.13  |
        ----------------------------

*****************************************************************


         #   |  RESIDUE  |  RAW SCORE
       ---------------------------------
          1  |     L     |    -0.10
          2  |     Q     |    -1.00
          3  |     R     |    -1.60
          4  |     M     |    -1.50
          5  |     K     |    -1.80
          6  |     Q     |    -1.50
          7  |     L     |    -1.40
          8  |     E     |    -1.70
          9  |     D     |    -1.70
~
~
~
:q                                    <--- <user input>

Save this File? (Y/N):

>> y                                    <--- <user input>

Enter output filename.

>> zipper.hydro                            <--- <user input>


(OUTPUT WHEN "WEIGHTED SCORE" OPTION IS SELECTED)

*****************************************************************

    Program......: hydro (version 1.2)
    Description..: Hydrophobicity Scoring of a Sequence
    Date.........: Thu Feb 16 13:12:12 1993

    Sequence Name: leu_zippper
    Amino Acids..: 35
    Hydro Parms..: /sirius/local/seqsee/lib/kyte.parms


        *** Notes ***

    Hydro Scores: 0 1 2 3 4 5 6 7 8 9
              Low             High

    High scores indicate strong hydrophobic regions
    Low scores indicate strong hydrophilic regions

*****************************************************************


Sequence...:LQRMKQLEDKVEELLSKNYHLENEVARLKKLVGER
Score......:95222222223455422222223344443455433
~
~

:q                                    <--- <user input>

Save this File? (Y/N):

>> y                                    <--- <user input>

Enter output filename.

>> zipper.hydro                            <--- <user input>


10. Fast Alignment Search


            *******************************************
            *                                         *
            *    1. Indicate how your sequence will   *
            *       be entered                        *
            *    2. Enter the sequence filename       *
            *    3. Enter the number of alignments    *
            *       to be saved                       *
            *    4. Check the output file for         *
            *       interesting information           *
            *    5. Exit the editor with ":q"         *
            *    6. Save the file                     *
            *                                         *
            *******************************************


FAST_ALIGN is a k-tuple fast alignment algorithm based loosely on the 
speed-up protocols incorporated in Lipman and Pearson's FASTA (1988) and 
Altschul et al.'s BLAST (1990).  The program is capable of searching the 
complete PIR database and then ordering and aligning 50 homologous matches 
of a 100-residue query sequence in less than 90 seconds.  FAST_ALIGN may be 
used to align sequences against the PIR, SWISS-PROT or a user-specified 
database in the SEQFILE format.  Several choices of scoring matrices are 
possible and these include: the Unity matrix, the Dayhoff PAM 250 matrix 
(Dayhoff et al., 1983), the McLachlan matrix (McLachlan, 1971) and the RBO 
matrix.  The RBO matrix is the default scoring matrix.
    When using this program, the user is required to indicate how the 
sequence will be entered (via a SEQFILE in this case) and what the name of 
the SEQFILE will be.  The user is also required to provide the number of 
high-scoring alignments that will be saved (often no more than 100-200 are 
required) as well as the name of an output file (we chose the suffix ".align" 
for consistency).  The output file contains information on the identity of the 
protein where a potential alignment was found, the PIR or SWISS-PROT Id or 
accession number, an initial Fast Alignment Score, the Optimal Alignment 
Score and the number of exact matches found.  Vertical lines (|) are used to 
identify exact matches and asterisks (*) are used to identify homologous 
matches.
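
The general k-tuple idea behind programs of this kind can be sketched as 
follows.  This is a generic illustration in Python, not the FAST_ALIGN 
source: the function names, the tuple length k=2 and the simple hit count 
per diagonal are all assumptions.  The query is indexed once in a lookup 
table, and each database sequence is then scanned for shared k-tuples; the 
diagonal (offset) collecting the most hits marks the region worth aligning 
in detail.

```python
from collections import defaultdict


def build_lookup(query, k=2):
    """Map every k-tuple in the query to the positions where it starts."""
    table = defaultdict(list)
    for i in range(len(query) - k + 1):
        table[query[i:i + k]].append(i)
    return table


def best_diagonal(query, target, k=2):
    """Return (offset, hits) for the diagonal with the most shared k-tuples.

    offset = position in target minus position in query.
    """
    table = build_lookup(query, k)
    hits = defaultdict(int)
    for j in range(len(target) - k + 1):
        # .get() avoids adding empty entries to the lookup table
        for i in table.get(target[j:j + k], ()):
            hits[j - i] += 1
    if not hits:
        return None, 0
    offset = max(hits, key=hits.get)
    return offset, hits[offset]


if __name__ == "__main__":
    query = "MSDKLIHITDD"
    target = "MLHQQRNQHARLIPVELYMSDKIIHLTDD"
    print(best_diagonal(query, target))
```

Only sequences whose best diagonal scores well need to be passed on to the 
slower optimal alignment step, which is where the speed-up comes from.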


******************************************************************************

AVAILABLE OPTIONS IN "SEQSEE.PARMS":

1) Choice of sequence databases to be scanned (PIR, PIR_IG, SWISS-PROT or 
SWISS-PROT_IG).

2) Choice of scoring matrix (wt.align or user-defined).

3) Choice of minimum score to designate homologous residue pairs.

4) Choice of gap insertion and gap extension penalties.

******************************************************************************
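
The "Matching" line of the output, and the role of the minimum score for 
homologous residue pairs, can be illustrated with the short Python sketch 
below.  The function name match_line and the toy scoring function are 
hypothetical; SEQSEE takes its pair scores from the scoring matrix named 
in the control file, which is not reproduced here.

```python
# Hypothetical similarity groups, used only to make the example runnable.
# They stand in for a real scoring matrix such as the one SEQSEE reads.
TOY_GROUPS = ["ILVM", "DE", "KR", "NQ", "ST", "FWY"]


def toy_score(a, b):
    """Return 2 for identical residues, 1 for same-group residues, else 0."""
    if a == b:
        return 2
    for group in TOY_GROUPS:
        if a in group and b in group:
            return 1
    return 0


def match_line(query_aln, db_aln, score=toy_score, min_homolog=1):
    """Build the marker line: '|' exact match, '*' homologous, ' ' otherwise."""
    marks = []
    for q, d in zip(query_aln, db_aln):
        if q in "- " or d in "- ":          # gap or padding: no marker
            marks.append(" ")
        elif q == d:
            marks.append("|")
        elif score(q, d) >= min_homolog:
            marks.append("*")
        else:
            marks.append(" ")
    return "".join(marks)


if __name__ == "__main__":
    print("MSDKLIHITDD")
    print(match_line("MSDKLIHITDD", "MSDKIIHLTDD"))
    print("MSDKIIHLTDD")
```

Raising the minimum score makes the asterisks rarer; setting it above the 
highest non-identical score in the matrix removes them altogether.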


**********************************************************************
* Package...:                SEQSEE  Version 1.2 (c)                 *
* Authors...:       Robert Boyko / Leigh Willard / David Wishart     *
*                        Fred Richards / Brian Sykes                 *
* Location..:                 University of Alberta                  *
*               Protein Engineering Network of Centres of Excellence * 
**********************************************************************

      *** Preliminaries ***                *** Alignments ***
   1) Help                              10) Fast Alignment Search
   2) Enter/Edit a Sequence             11) Exhaustive Alignment Search
   3) Retrieve Sequence from Database   12) Align 2 or more sequences

      *** Structural Analysis ***            *** Scanning ***
   4) Sequence Statistics               13) Pattern Search
   5) Structure Prediction              14) Homology Search
   6) SEQSITE Pattern Search            15) Dot Plot
   7) Flexibility                       16) Database Reference Search
   8) Hydrophobic Moment                17) File Viewer
   9) Hydrophobicity                     0) EXIT SEQSEE

   Enter the number of the desired function

>> 10                                <--- <user input>


Fast Alignment (Version 1.2)

Your amino acid sequence is now required:

  1) Read sequence from an input file
  2) Sequence to be entered via keyboard
  3) I do not have my sequence ready

Enter a number (then press return).

>> 1                                    <--- <user input>

Enter input filename:

>> redoxase.seq                            <--- <user input>


This program keeps track of the top 'x' alignments.
Enter a value for 'x' where 0 < x < 500:

>> 200                                <--- <user input>

Initializing lookup table...

Reading database file: /sirius/seqsee/databases/pir/*

Proteins: 1000    BestScore: 6624    GroupScore: 6624
Proteins: 2000    BestScore: 6624    GroupScore:  495
Proteins: 3000    BestScore: 6624    GroupScore: 1147
~
~
***************************************************************

    Program......: fast_align (version 1.2)
    Description..: Fast Alignment on database
    Date.........: Thu Feb 16 13:16:00 1993

    Sequence Name: bacillus_redoxase
    Amino Acids..: 60

    Database.....: PIR (Intelligenetics Version)

    Scoring Mat..: /sirius/local/seqsee/lib/wt.align
    Gap Penalty..: 20
    Gap Size Pen.: 5
    Tuple Cut-off: 48

***************************************************************


Number of proteins tested.: 44890
Number of alignments found:   200


***********(1)**********
Title....: Thioredoxin precursor - Escherichia coli
Id.......: TXEC
NW Score.: 6624
FastScore: 1224
Matches..: 56

Query Seq..:                  MSDKLIHITDDSFDTDVIKADGAILVDFWAEW      32
Matching...:                  ||||*||*|||||||||*||||||||||||||
Database...:MLHQQRNQHARLIPVELYMSDKIIHLTDDSFDTDVLKADGAILVDFWAEW      50

Query Seq..:CGPCKMIAPILDELADEYQGKLTVAKLN                            60
Matching...:|||||||||||||*||||||||||||||
Database...:CGPCKMIAPILDEIADEYQGKLTVAKLNIDQNPGTAPKYGIRGIPTLLLF     100

Query Seq..:
Matching...:
Database...:KNGEVAATKVGALSKGQLKEFLDANLA                            127
~
~
~
~
~
:q                                    <--- <user input>

Save this File? (Y/N):

>> y                                    <--- <user input>

Enter output filename.

>> redoxase.align                        <--- <user input>


11. Exhaustive Alignment Search
PIR/SWISS-PROT Database Option


            *******************************************
            *                                         *
            *    1. Select the database to be         *
            *       searched                          *
            *    2. Indicate how your sequence will   *
            *       be entered                        *
            *    3. Enter the sequence or sequence    *
            *       filename                          *
            *    4. Indicate the number of            *
            *       alignments to be saved            *
            *    5. Check the output file for         *
            *       interesting information           *
            *    6. Exit the editor with ":q"         *
            *    7. Save the file                     *
            *                                         *
            *******************************************


NW_ALIGN is a program which carries out an exhaustive pair-wise 
alignment of any given query sequence to all other sequences in a given 
database.  Only those sequences with scores above a certain user-defined 
threshold are retained.  The algorithm used for this procedure is based on 
the Needleman-Wunsch (1970) approach for pair-wise alignment.  This 
dynamic programming method is guaranteed to find the optimal alignment 
between any two sequences for any given scoring matrix.  Alignments can be 
done against the PIR, SWISS-PROT, SEQBANK or a user-defined database in 
the SEQFILE format.  If alignments are done against SEQBANK, knowledge 
of the secondary structure is included to determine the location and length 
of gaps (Lesk et al., 1986).  A choice of scoring matrices and gap penalties is 
available.  The scoring matrices include: the Unity matrix, the Dayhoff 
PAM 250 matrix (Dayhoff et al., 1983), the McLachlan matrix (McLachlan, 
1971) and the RBO matrix.  The RBO matrix is the default scoring matrix.  
Other scoring matrices may be chosen by altering the SEQSEE control file 
(through the File Viewer option).  Scores are rigorously calculated on the 
basis of comparisons to randomized sequence alignments as recommended 
by Dayhoff et al. (1983).  With the PIR/SWISS-PROT option the program is 
extremely time-consuming: a query sequence of 100 residues typically takes 
up to 4 hours to complete on a SUN Sparcstation.  However, 
the improvement in overall alignment accuracy and the possibility of 
identifying very remote and previously unidentified relationships may well 
be worth the wait.  To get around the problem of tying up the computer for 
long periods of time, the user may wish to place an exhaustive alignment run 
into the background.  This can be done as follows:


    1) Press the "control" and "z" keys simultaneously to temporarily 
    stop the job.

    2) Type "bg" and press the "return" key to restart the program in 
    the background.

The results can be viewed at any time by re-opening the SEQSEE window and 
inspecting the *.tmp files that are automatically created and updated during 
the alignment run.
    When using this program, the user is required to identify which 
database he or she wishes to search (the PIR database in this case), how the 
sequence will be entered (via a SEQFILE in this case) and what the name of 
the SEQFILE will be.  The user is also required to provide the number of 
high-scoring alignments that will be saved (often no more than 50-100 are 
required) as well as the name of an output file (we chose the suffix ".align" 
for consistency).  The output file contains information on the name of the 
protein where a potential alignment was found, the PIR or SWISS-PROT Id or 
accession number, the Optimal Alignment Score, the Alignment Test Stat 
score (the number of standard deviations away from an expected "random" 
Optimal Alignment Score -- with 5.0 being the minimum for a significant 
match) and the number of exact matches found.  Vertical lines (|) are used 
to identify exact matches and asterisks (*) are used to identify homologous 
matches.
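
The Needleman-Wunsch approach mentioned above can be sketched in Python as 
follows.  This is a textbook version offered for orientation only, not the 
NW_ALIGN source.  It assumes a single linear gap penalty (SEQSEE has 
separate gap-insertion and gap-extension penalties), and the function 
names and the simple match/mismatch scoring are illustrative assumptions.

```python
def simple_score(a, b, match=1, mismatch=-1):
    """Toy scoring function standing in for a real scoring matrix."""
    return match if a == b else mismatch


def needleman_wunsch(a, b, score=simple_score, gap=-1):
    """Global alignment of a and b; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # F[i][j] holds the best score for aligning a[:i] with b[:j].
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(
                F[i - 1][j - 1] + score(a[i - 1], b[j - 1]),  # pair residues
                F[i - 1][j] + gap,                            # gap in b
                F[i][j - 1] + gap,                            # gap in a
            )
    # Trace back from the bottom-right corner to recover one optimal path.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0
                and F[i][j] == F[i - 1][j - 1] + score(a[i - 1], b[j - 1])):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return F[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(needleman_wunsch("ACGT", "AGT"))
```

The table has one cell for every pair of positions, so the running time 
grows with the product of the two sequence lengths.  Repeating this for 
every database entry is what makes an exhaustive search slow.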


******************************************************************************

AVAILABLE OPTIONS IN "SEQSEE.PARMS":

1) Choice of sequence database to be scanned (PIR, PIR_IG, SWISS-PROT, 
SWISS-PROT_IG).

2) Choice of scoring matrix (wt.rbo, wt.dayhoff, wt.levin, wt.mclach, wt.unit).

3) Choice of minimum score to designate homologous residue pairs.

4) Choice of method to sort and score aligned sequences (raw score, per 
residue score or jumbled).

5) Choice of jumble test values and thresholds.

6) Choice of file-update frequency (i.e. every 100, 500 or 1000 sequences).

7) Choice of gap-insertion and gap-extension penalties.

******************************************************************************
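
The idea behind a jumble-based test statistic (the number of standard 
deviations separating the real score from the scores of randomized 
sequences) can be sketched in Python as shown below.  This is an 
illustration under stated assumptions, not the SEQSEE implementation: the 
function names are hypothetical, the query (rather than the database 
sequence) is the one shuffled, 100 shuffles are used, and a simple 
identity count stands in for the alignment score.  The default seed is 
borrowed from the "Random Seed" line of the sample output.

```python
import random
import statistics


def identity_score(a, b):
    """Toy score: number of positions at which the two sequences agree."""
    return sum(1 for x, y in zip(a, b) if x == y)


def alignment_test_stat(query, target, score_fn=identity_score,
                        n_shuffles=100, seed=13791):
    """Return (real score - mean shuffled score) / std. dev. of shuffled scores."""
    rng = random.Random(seed)          # fixed seed gives repeatable results
    real = score_fn(query, target)
    residues = list(query)
    shuffled_scores = []
    for _ in range(n_shuffles):
        rng.shuffle(residues)          # same composition, random order
        shuffled_scores.append(score_fn("".join(residues), target))
    mean = statistics.mean(shuffled_scores)
    sd = statistics.pstdev(shuffled_scores)
    if sd == 0:
        return 0.0                     # no spread: the statistic is undefined
    return (real - mean) / sd


if __name__ == "__main__":
    print(alignment_test_stat("ACDEFGHIKL", "ACDEFGHIKL"))
```

In practice the score function would be the optimal alignment score 
itself, so each shuffle costs one full alignment.  This is one reason why 
the jumble test values and thresholds are left adjustable.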


**********************************************************************
* Package...:                SEQSEE  Version 1.2 (c)                 *
* Authors...:       Robert Boyko / Leigh Willard / David Wishart     *
*                        Fred Richards / Brian Sykes                 *
* Location..:                 University of Alberta                  *
*               Protein Engineering Network of Centres of Excellence * 
**********************************************************************

      *** Preliminaries ***                *** Alignments ***
   1) Help                              10) Fast Alignment Search
   2) Enter/Edit a Sequence             11) Exhaustive Alignment Search
   3) Retrieve Sequence from Database   12) Align 2 or more sequences

      *** Structural Analysis ***            *** Scanning ***
   4) Sequence Statistics               13) Pattern Search
   5) Structure Prediction              14) Homology Search
   6) SEQSITE Pattern Search            15) Dot Plot
   7) Flexibility                       16) Database Reference Search
   8) Hydrophobic Moment                17) File Viewer
   9) Hydrophobicity                     0) EXIT SEQSEE

   Enter the number of the desired function

>> 11                                <--- <user input>

Which database do you wish to search?

  1) Search PIR/SWISS-PROT database
  2) Search SEQBANK database
  0) Exit

Enter a number (then press return).

>> 1                                    <--- <user input>


Exhaustive Alignment (Version 1.2)

Your amino acid sequence is now required:

  1) Read sequence from an input file
  2) Sequence to be entered via keyboard
  3) I do not have my sequence ready

Enter a number (then press return).

>> 1                                    <--- <user input>

Enter input filename:

>> redoxase.seq                            <--- <user input>


This program keeps track of the top 'x' alignments.
Enter a value for 'x' where 0 < x < 500:

>> 50                                <--- <user input>

Reading database file: /sirius/seqsee/databases/pir/*

Proteins Scanned: 100    BestScore:  3.12    GroupScore: 3.12
Proteins Scanned: 200    BestScore:  3.44    GroupScore: 3.44
Proteins Scanned: 300    BestScore:  3.44    GroupScore: 3.22
~
~
***************************************************************

    Program......: nw_align (version 1.2)
    Description..: Best Alignment on PIR Database
    Date.........: Thu Feb 16 13:26:05 1993

    Sequence Name: bacillus_redoxase
    Amino Acids..: 60

    Database.....: PIR (Intelligenetics Version)

    Scoring Mat..: /sirius/local/seqsee/lib/wt.rbo
    Gap Penalty..: 10
    Gap Size Pen.: 2
    Sort Method..: 1
    Random Seed..: 13791

***************************************************************

Number of proteins tested.: 44890
Number of alignments found:    50


***********(1)**********
Title....: Thioredoxin precursor -- Escherichia coli
Id.......: TXEC
Test Stat: 20.83
NW Score.: 1224
Matches..: 56


Query Seq..:                  MSDKLIHITDDSFDTDVIKADGAILVDFWAEW      32
Matching...:                  ||||*||*|||||||||*||||||||||||||
Database...:MLHQQRNQHARLIPVELYMSDKIIHLTDDSFDTDVLKADGAILVDFWAEW      50

Query Seq..:CGPCKMIAPILDELADEYQGKLTVAKLN                            60
Matching...:|||||||||||||*||||||||||||||
Database...:CGPCKMIAPILDEIADEYQGKLTVAKLNIDQNPGTAPKYGIRGIPTLLLF     100
~
~
~
:q                                    <--- <user input>

Save this File? (Y/N):

>> y                                    <--- <user input>

Enter output filename.

>> redoxase.align                        <--- <user input>


11. Exhaustive Alignment Search
SEQBANK option


            *******************************************
            *                                         *
            *    1. Select the database to be         *
            *       searched                          *
            *    2. Indicate how your sequence will   *
            *       be entered                        *
            *    3. Enter the sequence or sequence    *
            *       filename                          *
            *    4. Indicate the number of            *
            *       alignments to be saved            *
            *    5. Check the output file for         *
            *       interesting information           *
            *    6. Exit the editor with ":q"         *
            *    7. Save the file                     *
            *                                         *
            *******************************************


The Exhaustive Alignment Search carries out an exhaustive pair-wise 
alignment of any given query sequence to all other sequences in a given 
database.  Only those sequences with scores above a certain user-defined 
threshold are retained.  The algorithm used for this procedure is based on 
the Needleman-Wunsch (1970) approach for pair-wise alignment.  This 
dynamic programming method is guaranteed to find the optimal alignment 
between any two sequences for any given scoring matrix.  Alignments can 
be done against the PIR, SWISS-PROT, SEQBANK (as in this case) or a 
user-defined database in the SEQFILE format.  If alignments are done against 
SEQBANK, knowledge of the secondary structure is included to determine the 
location and length of gaps (Lesk et al., 1986).  A choice of scoring matrices 
and gap penalties is available.  The scoring matrices include: the Unity 
matrix, the Dayhoff PAM 250 matrix (Dayhoff et al., 1983), the McLachlan 
matrix (McLachlan, 1971) and the RBO matrix.  The RBO matrix is the default 
scoring matrix.  Other scoring matrices may be chosen by altering the SEQSEE 
control file (through the File Viewer option).  Scores are rigorously calculated 
on the basis of comparisons to randomized sequence alignments as 
recommended by Dayhoff et al. (1983).  To get around the problem of tying 
up the computer for long periods of time, the user may wish to place this 
type of alignment run into the background.  This can be done as follows:

    1) Press the "control" and "z" keys simultaneously to temporarily 
    stop the job.

    2) Type "bg" and press the "return" key to restart the program in 
    the background.


The results can be viewed at any time by re-opening the SEQSEE window and 
inspecting the *.tmp files that are automatically created and updated during 
the alignment run.
    When using this option, the user is required to identify which 
database he or she wishes to search (the SEQBANK database in this case), 
how the sequence will be entered (via a SEQFILE in this case) and what the 
name of the SEQFILE will be.  The user is also required to provide the 
number of high scoring alignments that will be saved (often no more than 10 
is required) as well as the name of an output file (we chose the suffix ".align" 
for consistency).  The output file contains information on the name of the 
protein where a potential alignment was found, the SEQBANK Id or 
accession number, the Optimal Alignment Score, the Alignment Test Stat 
score (the number of standard deviations away from an expected "random" 
Optimal Alignment Score -- with 5.0 being the minimum for a significant 
match) and the number of exact matches found.  Vertical lines (|) are used 
to identify exact matches and asterisks (*) are used to identify homologous 
matches.  The secondary structure of the SEQBANK protein where the match 
was made is also included in the output file.


******************************************************************************

AVAILABLE OPTIONS IN "SEQSEE.PARMS":

1) Choice of sequence/structure database to be scanned (SEQBANK.db or 
user-defined).

2) Choice of scoring matrix (wt.rbo, wt.dayhoff, wt.levin, wt.mclach, wt.unit).

3) Choice of minimum score to designate homologous residue pairs.

4) Choice of method to sort and score aligned sequences (raw score, per 
residue score or jumbled).

5) Choice of jumble test values and thresholds.

6) Choice of file-update frequency (i.e. every 10, 50 or 100 sequences).

7) Choice of gap-insertion and gap-extension penalties.

8) Choice of gap-insertion and gap-extension penalties in regions of 
secondary structure.

******************************************************************************
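
One way to picture structure-dependent gap penalties is a per-position 
penalty list derived from the SEQBANK structure string, with gaps made more 
expensive inside helices (H) and beta strands (B) than in coil (C).  The 
Python sketch below is an assumption-laden illustration, not the SEQSEE 
method: the function name and the value 30 are hypothetical, and only the 
value 10 is taken from the "Gap Penalty" line of the sample output.

```python
def gap_open_penalties(structure, loop_penalty=10, ss_penalty=30):
    """Return one gap-insertion penalty per database position.

    structure is a string of H (helix), B (beta) and C (coil) codes, as in
    the "Structure" line of the output; ss_penalty is a hypothetical value.
    """
    return [ss_penalty if s in ("H", "B") else loop_penalty
            for s in structure]


if __name__ == "__main__":
    print(gap_open_penalties("CCBBBBBBCCHHHH"))
```

An alignment routine would then look up the penalty for the position where 
a gap is about to be opened instead of using a single constant, which 
steers gaps toward loop regions.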


**********************************************************************
* Package...:                SEQSEE  Version 1.2 (c)                 *
* Authors...:       Robert Boyko / Leigh Willard / David Wishart     *
*                        Fred Richards / Brian Sykes                 *
* Location..:                 University of Alberta                  *
*               Protein Engineering Network of Centres of Excellence * 
**********************************************************************

      *** Preliminaries ***                *** Alignments ***
   1) Help                              10) Fast Alignment Search
   2) Enter/Edit a Sequence             11) Exhaustive Alignment Search
   3) Retrieve Sequence from Database   12) Align 2 or more sequences

      *** Structural Analysis ***            *** Scanning ***
   4) Sequence Statistics               13) Pattern Search
   5) Structure Prediction              14) Homology Search
   6) SEQSITE Pattern Search            15) Dot Plot
   7) Flexibility                       16) Database Reference Search
   8) Hydrophobic Moment                17) File Viewer
   9) Hydrophobicity                     0) EXIT SEQSEE

   Enter the number of the desired function

>> 11                                <--- <user input>

Which database do you wish to search?

  1) Search PIR/SWISS-PROT database
  2) Search SEQBANK database
  0) Exit

Enter a number (then press return).

>> 2                                    <--- <user input>

SEQBANK Database Alignment (Version 1.2)

Your amino acid sequence is now required:

  1) Read sequence from an input file
  2) Sequence to be entered via keyboard
  3) I do not have my sequence ready

Enter a number (then press return).

>> 1                                    <--- <user input>

Enter input filename:

>> redoxase.seq                            <--- <user input>


This program keeps track of the top 'x' alignments.
Enter a value for 'x' where 0 < x < 500:

>> 10                                <--- <user input>


Reading database file: /sirius/seqsee/databases/SEQBANK.db

Proteins Scanned: 10    BestScore:  2.57    GroupScore: 2.57
Proteins Scanned: 20    BestScore:  2.57    GroupScore: 2.16
Proteins Scanned: 30    BestScore:  3.01    GroupScore: 3.01
~
~
***************************************************************

    Program......: sb_align (version 1.2)
    Description..: Find Best Alignments in SEQBANK
    Date.........: Thu Feb 16 13:45:15 1993

    Sequence Name: bacillus_redoxase
    Amino Acids..: 60

    Scoring Mat..: /sirius/local/seqsee/lib/wt.rbo
    Gap Penalty..: 10
    Gap Size Pen.: 2
    Sort Method..: 1
    Random Seed..: 13791

***************************************************************

Number of proteins tested.: 267
Number of alignments found:  10


************(1)************
Title....: THIOREDOXIN (E. COLI)
Id.......: 242
Score....: 598
Test Stat: 8.79
Matches..: 56


Query Seq..:MSDKLIHITDDSFDTDVIKADGAILVDFWAEWCGPCKMIAPILDELADEYQ
Matching...: ||| || ||||||||| ||||||||||||||||||||||||||| ||||
Database...: SDKIIHLTDDSFDTDVLKADGAILVDFWAEWCGPCKMIAPILDEIADEYQ
Structure..: CCBBBBBBCCHHHHHHHHCCCBBBBBBBBCCCCCHHHHHHHHHHHHHHHC

Query Seq..:GKLTVAKLN
Matching...:|||||||||
Database...:GKLTVAKLNIDQNPGTAPKYGIRGIPTLLLFKNGEVAATKVGALSKGQLKE
Structure..:CCBBBBBBBCCCCHHHHHHHHHHCCCBBBBBBCCCBBBBBBBCCHHHHHHH
~
~
~
:q                                    <--- <user input>

Save this File? (Y/N):

>> y                                    <--- <user input>

Enter output filename.

>> redoxase.align                        <--- <user input>


12. Align 2 or More Sequences


            *******************************************
            *                                         *
            *    1. Indicate you wish to enter one    *
            *       or more sequences                 *
            *    2. Enter a sequence filename         *
            *       containing one or more sequences  *
            *    3. Indicate you have finished        *
            *       entering the sequence files       *
            *    4. Check the output file for         *
            *       interesting information           *
            *    5. Exit the editor with ":q"         *
            *    6. Save the file                     *
            *                                         *
            *******************************************


The program MULT_ALIGN uses a modification of the pair-wise Needleman-
Wunsch protocol to align two or more protein sequences.  The method is 
closely related to the progressive alignment procedure first described by 
Barton and Sternberg (1987), which permits rapid and accurate multiple 
alignments for up to several hundred proteins.  A consensus sequence is also 
produced for each multiple alignment.  A choice of scoring matrices and gap 
penalties is available.  Sequences which are to be aligned must be contained 
in SEQFILE formats, either in the form of databases (for multiple 
alignments) or singly (for pair-wise alignments).  The procedure for aligning 
more than two sequences (like the fast alignment search described in 8) is 
fundamentally heuristic in nature and so it cannot be proven that the 
resulting alignments are mathematically optimal.
    When using this program, the user must have the sequences to be 
aligned stored in one or more files before beginning this operation.  The 
program prompts the user for sequence filenames until the user selects 
"I am done entering all my sequences".  The user is also asked for the name 
of an output file (we prefer the suffix ".mult" for consistency).  The program 
output includes the names and identification codes for all aligned proteins. 
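
For readers unfamiliar with the pair-wise Needleman-Wunsch protocol mentioned 
above, the following is a minimal sketch of the basic method in Python.  It is 
NOT the MULT_ALIGN source code.  It assumes a simple match/mismatch score and 
a single linear gap penalty, whereas SEQSEE uses a scoring matrix together 
with separate gap-insertion and gap-extension penalties.  The function name 
and its parameters are hypothetical.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global pairwise alignment with a linear gap penalty (illustrative only)."""
    rows, cols = len(a) + 1, len(b) + 1

    def sub(i, j):
        # Score for aligning residue a[i-1] with residue b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill the dynamic-programming table.
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one best alignment.
    top, bottom = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return score[-1][-1], "".join(reversed(top)), "".join(reversed(bottom))


if __name__ == "__main__":
    print(needleman_wunsch("ACGT", "AGT"))
```

A pair-wise alignment of this kind is guaranteed to be optimal for the chosen 
scores.  It is only the progressive combination of many such alignments that 
makes the multiple alignment heuristic.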


******************************************************************************

AVAILABLE OPTIONS IN "SEQSEE.PARMS":

1) Choice of scoring matrix (wt.rbo, wt.dayhoff, wt.levin, wt.mclach, wt.unit).

2) Choice of minimum score to designate homologous residue pairs.

3) Choice of method to sort and score aligned sequences (raw score, per 
residue score or jumbled)

4) Option to print all pairwise alignments.

5) Choice of threshold value to print consensus sequence.

6) Choice of gap-insertion and gap-extension penalties.

******************************************************************************

**********************************************************************
* Package...:                SEQSEE  Version 1.2 (c)                 *
* Authors...:       Robert Boyko / Leigh Willard / David Wishart     *
*                        Fred Richards / Brian Sykes                 *
* Location..:                 University of Alberta                  *
*               Protein Engineering Network of Centres of Excellence * 
**********************************************************************

      *** Preliminaries ***                *** Alignments ***
   1) Help                              10) Fast Alignment Search
   2) Enter/Edit a Sequence             11) Exhaustive Alignment Search
   3) Retrieve Sequence from Database   12) Align 2 or more sequences

      *** Structural Analysis ***            *** Scanning ***
   4) Sequence Statistics               13) Pattern Search
   5) Structure Prediction              14) Homology Search
   6) SEQSITE Pattern Search            15) Dot Plot
   7) Flexibility                       16) Database Reference Search
   8) Hydrophobic Moment                17) File Viewer
   9) Hydrophobicity                     0) EXIT SEQSEE

   Enter the number of the desired function

>> 12                                <--- <user input>


Multiple Alignment (Version 1.2)

Input of amino acid sequences for alignment

  1) I wish to enter one or more sequences.
  2) I am done entering all my sequences.
  0) Exit this program.

Enter a number (then press return).

>> 1                                    <--- <user input>

Enter filename containing sequences. Remember wildcard
characters are acceptable within the filename:

>> trx.seqs                            <--- <user input>

Reading Sequence: THIOREDOXIN PRECURSOR - ESCHERICHIA COLI
Reading Sequence: THIOREDOXIN - CORYNEFORM BACTERIUM ATCC11425
Reading Sequence: THIOREDOXIN - ANABAENA SP.
Reading Sequence: THIOREDOXIN M - SPINACH CHLOROPLAST
Reading Sequence: THIOREDOXIN M - SYNECHOCOCCUS SP.
Reading Sequence: THIOREDOXIN - RHODOBACTER SPHAEROIDES

Current number of sequences for alignment: 6

Input of amino acid sequences for alignment

  1) I wish to enter one or more sequences.
  2) I am done entering all my sequences.
  0) Exit this program.

Enter a number (then press return).

>> 2                                    <--- <user input>


Doing Pairwise Alignments.......


*****************************************************************

    Program......: mult_align (version 1.2)
    Description..: Align 2 or More Sequences
    Date.........: Thu Feb 16 13:25:11 1993

    Scoring Mat..: /sirius/local/seqsee/lib/wt.align
    Gap Penalty..: 10
    Gap Size Pen.: 2
    Sort Method..: 0
    Random Seed..: 13791
    Consensus %..: 70

*****************************************************************

Printing Multiple Alignment
***************************

Protein 1: THIOREDOXIN - ANABAENA SP.
Protein 2: THIOREDOXIN M - SYNECHOCOCCUS SP.
Protein 3: THIOREDOXIN - CORYNEFORM BACTERIUM ATCC11425
Protein 4: THIOREDOXIN - RHODOBACTER SPHAEROIDES
Protein 5: THIOREDOXIN PRECURSOR - ESCHERICHIA COLI
Protein 6: THIOREDOXIN M - SPINACH CHLOROPLAST


Protein  1:                     SAAAQ    VTDSTFKQEVLDSDVPVLVDF    26
Protein  2:                    MSVAAA    VTDATFKQEVLESSIPVLVDF    27
Protein  3:                      ATVK    VDNSNFQSDVLQSSEPVVVDF    25
Protein  4:                      STVP    VTDATFDTEVRKSDVPVVVDF    25
Protein  5: MLHQQRNQHARLIPVELYMSDKIIH    LTDDSFDTDVLKADGAILVDF    46
Protein  6:                   KASAEKFIVQDVNDSGWKEFVLQSSEPSMVDF    32
 Consensus: -----------------------------V-D--F---VL-S--P--VDF
~
~
~
~
~
:q                                    <--- <user input>

Save this File? (Y/N):

>> y                                    <--- <user input>

Enter output filename.

>> trx.mult                            <--- <user input>
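
The "Consensus" line in the output above can be understood with a short 
sketch.  The Python function below is an illustration only, not the MULT_ALIGN 
code.  It assumes that a residue is printed in the consensus when it occupies 
at least the "Consensus %" fraction (70 in this example) of the sequences at 
that column, and that "-" is printed otherwise.  The function name and the 
exact thresholding rule are assumptions.

```python
from collections import Counter


def consensus(rows, threshold=0.70):
    """Return a consensus string for equal-length aligned rows.

    A column yields a residue only if that residue appears in at least
    `threshold` of all rows; gaps (space or '-') never count as residues.
    """
    width = max(len(r) for r in rows)
    padded = [r.ljust(width) for r in rows]
    out = []
    for column in zip(*padded):
        counts = Counter(c for c in column if c not in " -")
        if counts:
            residue, count = counts.most_common(1)[0]
            out.append(residue if count / len(rows) >= threshold else "-")
        else:
            out.append("-")
    return "".join(out)


if __name__ == "__main__":
    # The last 21 aligned columns of the six thioredoxin sequences above.
    block = [
        "VTDSTFKQEVLDSDVPVLVDF",
        "VTDATFKQEVLESSIPVLVDF",
        "VDNSNFQSDVLQSSEPVVVDF",
        "VTDATFDTEVRKSDVPVVVDF",
        "LTDDSFDTDVLKADGAILVDF",
        "VNDSGWKEFVLQSSEPSMVDF",
    ]
    print(consensus(block))
```

With six sequences and a 70% threshold, a residue must appear in at least 
five of the six rows to be printed.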


13. Pattern Search
PIR/SWISS-PROT Database Option


            *******************************************
            *                                         *
            *    1. Indicate which database or        *
            *       sequence file you wish to search  *
            *    2. Enter the sequence pattern or     *
            *       patterns to be searched for       *
            *    3. Check the output file for         *
            *       interesting information           *
            *    4. Exit the editor with ":q"         *
            *    5. Save the file                     *
            *                                         *
            *******************************************
            

This procedure searches the SEQBANK, SWISS-PROT or PIR database or a 
single sequence of your own choosing to find exact pattern matches 
according to the following rules (note the sequence patterns are case 
INDEPENDENT):

    a) X         Match exact residue specified where X = any amino acid
    b) !X        Match any residue EXCEPT X 
    c) *         Wild card character--matches any amino acid
    d) [XYZ]     "OR" braces--match X "or" Y "or" Z.  
    e) X&Y       "AND" character--match X "and" Y no matter what the 
                 separation
    f) X{2,8}Y   Match X and Y if separation is between 2 and 8 residues. 
                 "Range" braces--allow a range of wild card characters. i.e. 
                 {2,8} = 2 to 8 "*"
    g) $**X      Match X if located 2 residues from N terminus -- 
                 "Termination" characters are used to mark either the 
                 beginning (N terminus) or end (C terminus) of a sequence

Pattern Search (PSEARCH) is constructed to allow the user to enter several 
patterns at once, both on a single line (using the "&" feature) or on separate 
lines.  Patterns appearing on separate lines are treated as "independent" 
patterns (meaning they don't have to appear in the same protein sequence) 
while patterns with "&" characters are viewed as "dependent" patterns 
(meaning they do have to appear in the same protein sequence).

Some examples of sequence pattern searches are given below: 

    AA***K      Find all occurrences of 2 alanines together 
                followed by any 3 residues followed by a single 
                lysine
    
    AA!P!P!PK   Find all occurrences of 2 alanines together 
                followed by any 3 residues (as long as they are 
                NOT prolines) followed by a single lysine. (i.e. look 
                for AA***K except AAP**K, AA*P*K, AA**PK, 
                AA*PPK, AAPP*K, AAPPPK)

    [AG][AG]*[KR]    Find all occurrences of 2 alanines or 2 glycines or 
                any combination of the two followed by any 
                residue followed by a lysine or an arginine. (i.e. 
                look for AA*K, AG*K, GA*K, GG*K, AA*R, AG*R, GA*R 
                and GG*R)
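
Users who know regular expressions may find it helpful to see how the rules 
above map onto that notation.  The sketch below is written in Python and is 
NOT the PSEARCH implementation.  The function names are hypothetical, and the 
handling of "&" (every sub-pattern must occur somewhere in the same sequence, 
with only the first hit of each reported) is our reading of the rules rather 
than a statement about the program's internals.

```python
import re


def translate(pattern):
    """Translate one SEQSEE-style pattern (without '&') into a Python regex."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "!":                       # !X     -> any residue except X
            out.append("[^" + pattern[i + 1] + "]")
            i += 2
        elif ch == "*":                     # *      -> any single residue
            out.append(".")
            i += 1
        elif ch == "[":                     # [XYZ]  -> X or Y or Z
            end = pattern.index("]", i)
            out.append(pattern[i:end + 1])
            i = end + 1
        elif ch == "{":                     # {2,8}  -> 2 to 8 wild cards
            end = pattern.index("}", i)
            out.append("." + pattern[i:end + 1])
            i = end + 1
        elif ch == "$":                     # $      -> N or C terminus
            out.append("^" if i == 0 else "$")
            i += 1
        else:                               # X      -> exact residue
            out.append(re.escape(ch))
            i += 1
    return "".join(out)


def find_matches(pattern, sequence):
    """Return 1-based (start, end) of the first hit for each '&' part.

    An empty list means that at least one part did not match.
    """
    hits = []
    for part in pattern.split("&"):
        found = re.search(translate(part), sequence, re.IGNORECASE)
        if found is None:
            return []
        hits.append((found.start() + 1, found.end()))
    return hits


if __name__ == "__main__":
    print(translate("AA!P!P!PK"))
    print(find_matches("AA***[RK]", "GGAACAARLL"))
```

Matching is done with the IGNORECASE flag because the patterns are case 
independent.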

    When using this subroutine, the user is required to identify which 
database he or she wishes to search (the PIR database in this case) and what 
the query pattern is (using the single-letter amino acid code).  Note that the 
user MUST type "quit" on the final line of his or her search string.  The word 
"quit" is used by the program as a termination flag and is essential 
for proper functioning of the program.  The user must also provide a name 
for the output file (we suggest using a ".pat" suffix for consistency).  The 
output file contains information on the name of the protein where a match 
was found, the PIR Id number and the location where the match begins in 
the database protein (DbRes).  When the SEQBANK database is searched, the 
secondary structure of the SEQBANK protein where the match was made is 
also included in the output file.


******************************************************************************

AVAILABLE OPTIONS IN "SEQSEE.PARMS":

1) Choice of sequence database to be scanned (PIR, SWISS-PROT, 
SEQBANK.db or user-defined).

2) Option to allow multiple matches of a search string in a sequence.

3) Choice of file-update frequency (i.e. every 100, 500 or 1000 
sequences).

******************************************************************************


**********************************************************************
* Package...:                SEQSEE  Version 1.2 (c)                 *
* Authors...:       Robert Boyko / Leigh Willard / David Wishart     *
*                        Fred Richards / Brian Sykes                 *
* Location..:                 University of Alberta                  *
*               Protein Engineering Network of Centres of Excellence * 
**********************************************************************

      *** Preliminaries ***                *** Alignments ***
   1) Help                              10) Fast Alignment Search
   2) Enter/Edit a Sequence             11) Exhaustive Alignment Search
   3) Retrieve Sequence from Database   12) Align 2 or more sequences

      *** Structural Analysis ***            *** Scanning ***
   4) Sequence Statistics               13) Pattern Search
   5) Structure Prediction              14) Homology Search
   6) SEQSITE Pattern Search            15) Dot Plot
   7) Flexibility                       16) Database Reference Search
   8) Hydrophobic Moment                17) File Viewer
   9) Hydrophobicity                     0) EXIT SEQSEE

   Enter the number of the desired function

>> 13                                <--- <user input>


Pattern Search (Version 1.2)

How do you wish to use Pattern Search?

  1) Search PIR/SWISS-PROT database
  2) Search SEQBANK database
  3) Search user-defined sequence
  0) Exit

Enter a number (then press return).

>> 1                                    <--- <user input>

enter one sequence pattern per line:
enter QUIT (then press return) when done:

>> IKYLEFISEAIIHVL                        <--- <user input>

>> quit                                <--- <user input>


Reading database file: /sirius/seqsee/databases/pir/*

Proteins Scanned: 1000     Matches: 0
Proteins Scanned: 2000     Matches: 0
Proteins Scanned: 3000     Matches: 0
Proteins Scanned: 4000   Matches: 0

Reading database file: /sirius/seqsee/databases/pir/*

Proteins Scanned: 5000     Matches: 9
~
~
~
~

*******************************************************************

    Program......: psearch (version 1.2)
    Description..: Pattern Search Results
    Date.........: Thu Feb 16 13:28:14 1993

    Database.....: PIR (Intelligenetics Version)

    SearchStrings: IKYLEFISEAIIHVL

*******************************************************************


***********(1)**********
Title......: Myoglobin - California sealion
Id.........: MYZC
Amino Acids: 101-115

Sequence..:IKYLEFISEAIIHVL
Matching..:IKYLEFISEAIIHVL


***********(2)**********
Title......: Myoglobin - Gray seal and harbor seal
Id.........: MYSLG       
Amino Acids: 101-115

Sequence..:IKYLEFISEAIIHVL
Matching..:IKYLEFISEAIIHVL
~
~
~
~
~
:q                                    <--- <user input>

Save this File? (Y/N):

>> y                                    <--- <user input>

Enter output filename.

>> myo.pat                                <--- <user input>


13. Pattern Search
SEQBANK Option


            *******************************************
            *                                         *
            *    1. Indicate which database or        *
            *       sequence file you wish to search  *
            *    2. Enter the sequence pattern or     *
            *       patterns to be searched for       *
            *    3. Check the output file for         *
            *       interesting information           *
            *    4. Exit the editor with ":q"         *
            *    5. Save the file                     *
            *                                         *
            *******************************************
            

This procedure searches the SEQBANK, SWISS-PROT or PIR database or a 
single sequence of your own choosing to find exact pattern matches 
according to the following rules (note the sequence patterns are case 
INDEPENDENT):

    a) X         Match exact residue specified where X = any amino acid
    b) !X        Match any residue EXCEPT X 
    c) *         Wild card character--matches any amino acid
    d) [XYZ]     "OR" braces--match X "or" Y "or" Z.  
    e) X&Y       "AND" character--match X "and" Y no matter what the 
                 separation
    f) X{2,8}Y   Match X and Y if separation is between 2 and 8 residues. 
                 "Range" braces--allow a range of wild card characters. i.e. 
                 {2,8} = 2 to 8 "*"
    g) $**X      Match X if located 2 residues from N terminus -- 
                 "Termination" characters are used to mark either the 
                 beginning (N terminus) or end (C terminus) of a sequence

Pattern Search (PSEARCH) is constructed to allow the user to enter several 
patterns at once, both on a single line (using the "&" feature) or on separate 
lines.  Patterns appearing on separate lines are treated as "independent" 
patterns (meaning they don't have to appear in the same protein sequence) 
while patterns with "&" characters are viewed as "dependent" patterns 
(meaning they do have to appear in the same protein sequence).

Some examples of sequence pattern searches are given below: 

    AA***K      Find all occurrences of 2 alanines together 
                followed by any 3 residues followed by a single 
                lysine
    
    AA!P!P!PK   Find all occurrences of 2 alanines together 
                followed by any 3 residues (as long as they are 
                NOT prolines) followed by a single lysine. (i.e. look 
                for AA***K except AAP**K, AA*P*K, AA**PK, 
                AA*PPK, AAPP*K, AAPPPK)

    [AG][AG]*[KR]    Find all occurrences of 2 alanines or 2 glycines or 
                any combination of the two followed by any 
                residue followed by a lysine or an arginine. (i.e. 
                look for AA*K, AG*K, GA*K, GG*K, AA*R, AG*R, GA*R 
                and GG*R)

    When using this subroutine, the user is required to identify which 
database he or she wishes to search (the SEQBANK database in this case) and 
what the query sequence is (using a single letter amino acid code).  Note that 
the user MUST type "quit" on the final line of his or her search string.  The 
word "quit" is used by the program as a termination flag and is essential 
for proper functioning of the program.  The user must also provide a name  
for the output file (we suggest using a ".pat" suffix for consistency).  The 
output file contains information on the name of the protein where a match 
was found, the SEQBANK Id  number and the location where the match 
begins in the database protein (DbRes). The secondary structure of the 
SEQBANK protein where the match was made is also included in the output 
file.


******************************************************************************

AVAILABLE OPTIONS IN "SEQSEE.PARMS":

1) Choice of sequence database to be scanned (PIR, SWISS-PROT, 
SEQBANK.db or user-defined).

2) Option to allow multiple matches of a search string in a sequence.

3) Choice of file-update frequency (i.e. every 100, 500 or 1000 
sequences).

******************************************************************************


**********************************************************************
* Package...:                SEQSEE  Version 1.2 (c)                 *
* Authors...:       Robert Boyko / Leigh Willard / David Wishart     *
*                        Fred Richards / Brian Sykes                 *
* Location..:                 University of Alberta                  *
*               Protein Engineering Network of Centres of Excellence * 
**********************************************************************

      *** Preliminaries ***                *** Alignments ***
   1) Help                              10) Fast Alignment Search
   2) Enter/Edit a Sequence             11) Exhaustive Alignment Search
   3) Retrieve Sequence from Database   12) Align 2 or more sequences

      *** Structural Analysis ***            *** Scanning ***
   4) Sequence Statistics               13) Pattern Search
   5) Structure Prediction              14) Homology Search
   6) SEQSITE Pattern Search            15) Dot Plot
   7) Flexibility                       16) Database Reference Search
   8) Hydrophobic Moment                17) File Viewer
   9) Hydrophobicity                     0) EXIT SEQSEE

   Enter the number of the desired function

>> 13                                <--- <user input>


Pattern Search (Version 1.2)

How do you wish to use Pattern Search?

  1) Search PIR/SWISS-PROT database
  2) Search SEQBANK database
  3) Search user-defined sequence
  0) Exit

Enter a number (then press return).

>> 2                                    <--- <user input>

enter each pattern one per line:
enter QUIT (then press return) when done:


>> AED**[MIL]                            <--- <user input>

>> KST***[KRG]*[LIVM]                    <--- <user input>

>> AA***[RK]                            <--- <user input>

>> quit                                <--- <user input>


*******************************************************************

    Program......: psearch (version 1.2)
    Description..: Pattern Search Results
    Date.........: Thu Feb 16 13:28:24 1993

    Database.....: SEQBANK

    SearchStrings: AED**[MIL]
                   KST***[KRG]*[LIVM]
                   AA***[RK]

*******************************************************************


***********(1)***********
Title......: ALCOHOL DEHYDROGENASE (HORSE LIVER)
Id.........: 7        
Amino Acids: 213-218

Sequence..: AACAAR
Matching..: AA***R
Structure.: HHHCCB

***********(2)***********
Title......: ALKALINE PHOSPHATASE (E. COLI)
Id.........: 9        
Amino Acids: 444-449

Sequence..: AALGLK
Matching..: AA***K
Structure.: HHHHCC
~
~
~
~
~
:q                                    <--- <user input>

Save this File? (Y/N):

>> y                                    <--- <user input>

Enter output filename.

>> misc.pat                            <--- <user input>


14.  Homology Search
PIR/SWISS-PROT Database Option


            *******************************************
            *                                         *
            *    1. Indicate which database or        *
            *       sequence file you wish to search  *
            *    2. Enter the sequence pattern        *
            *       to be searched for                *
            *    3. Enter the number of alignments    *
            *       to be saved                       *
            *    4. Check the output file for         *
            *       interesting information           *
            *    5. Exit the editor with ":q"         *
            *    6. Save the file                     *
            *                                         *
            *******************************************


The HSEARCH program searches the SWISS-PROT, PIR, SEQBANK or a 
compatible user-defined database (or sequence file) to find the "nearest" or 
most homologous matches to any given input sequence.  Homologies are 
determined according to any one of four user-defined scoring matrices 
(described earlier).  Gaps are not allowed in the homology search (if gaps are 
required, use the fast alignment option instead).  The homology search is a 
useful complement to other pattern search routines, especially when 
attempting to locate distantly related or difficult-to-identify sequence 
motifs. 
    When using this program, the user is required to identify which 
database he or she wishes to search (the PIR database in this case) as well as 
the sequence that is to be searched for in the database.  The user is also 
required to provide the name of an output file (we suggest using the suffix 
".hom") and the number of high-scoring matches that are to be kept (we 
chose 50).  The output file contains information on the name of the protein 
where a match was found, the PIR or SWISS-PROT Id or accession number, 
the location where the match begins in the database protein (DbRes) and the 
homology score (Score).  Vertical lines (|) are used to identify exact matches 
and asterisks (*) are used to identify homologous matches.
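
The idea of a gap-free homology search can be sketched in a few lines of 
Python.  The code below is an illustration only and is NOT the HSEARCH code.  
The scoring values (10 for an identity, 5 for a conservative pair, 0 
otherwise), the residue groups and the function names are all hypothetical 
stand-ins for a real scoring matrix and for the "minimum score to designate 
homologous residue pairs" option.

```python
import heapq

# Hypothetical groups of similar residues, standing in for a scoring matrix.
GROUPS = ["ILVM", "KRQ", "DE", "FYW", "ST", "AG"]
MIN_HOMOLOGOUS = 5   # assumed minimum score for marking a pair with '*'


def pair_score(a, b):
    """Toy residue-pair score: 10 if identical, 5 if in one group, else 0."""
    if a == b:
        return 10
    for group in GROUPS:
        if a in group and b in group:
            return 5
    return 0


def best_windows(query, target, keep=3):
    """Slide the query along the target without gaps; keep the top scores.

    Each hit is (score, 1-based start residue, matched window, marker line).
    """
    n = len(query)
    hits = []
    for start in range(len(target) - n + 1):
        window = target[start:start + n]
        scores = [pair_score(q, t) for q, t in zip(query, window)]
        marks = "".join(
            "|" if q == t else "*" if s >= MIN_HOMOLOGOUS else " "
            for q, t, s in zip(query, window, scores)
        )
        hits.append((sum(scores), start + 1, window, marks))
    return heapq.nlargest(keep, hits)


if __name__ == "__main__":
    for hit in best_windows("CFYQRCRGD", "AACRYRKCLQAGG"):
        print(hit)
```

Keeping only the `keep` best hits corresponds to the "top 'x' searches" value 
requested by the program.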


******************************************************************************

AVAILABLE OPTIONS IN "SEQSEE.PARMS":

1) Choice of sequence database to be scanned (PIR, SWISS-PROT or user-
defined).
2) Choice of scoring matrix (wt.* files or user-defined).
3) Choice of minimum score to designate homologous residue pairs.
4) Choice of file-update frequency (i.e. every 100, 500 or 1000 sequences).

******************************************************************************


**********************************************************************
* Package...:                SEQSEE  Version 1.2 (c)                 *
* Authors...:       Robert Boyko / Leigh Willard / David Wishart     *
*                        Fred Richards / Brian Sykes                 *
* Location..:                 University of Alberta                  *
*               Protein Engineering Network of Centres of Excellence * 
**********************************************************************

      *** Preliminaries ***                *** Alignments ***
   1) Help                              10) Fast Alignment Search
   2) Enter/Edit a Sequence             11) Exhaustive Alignment Search
   3) Retrieve Sequence from Database   12) Align 2 or more sequences

      *** Structural Analysis ***            *** Scanning ***
   4) Sequence Statistics               13) Pattern Search
   5) Structure Prediction              14) Homology Search
   6) SEQSITE Pattern Search            15) Dot Plot
   7) Flexibility                       16) Database Reference Search
   8) Hydrophobic Moment                17) File Viewer
   9) Hydrophobicity                     0) EXIT SEQSEE

   Enter the number of the desired function

>> 14                                <--- <user input>

Homology Search (Version 1.2)

How do you wish to use Homology Search?

  1) Search PIR/SWISS-PROT database
  2) Search SEQBANK database
  3) Search user-defined sequence
  0) Exit

Enter a number (then press return).

>> 1                                    <--- <user input>

Enter sequence (one-letter code).
Press <return> when done.

>> CFYQRCRGD                            <--- <user input>


This program keeps track of the top 'x' searches
Enter a value for 'x' where 0 < x < 500:

>> 50                                <--- <user input>


Reading database file: /sirius/seqsee/databases/pir/*

Proteins Scanned: 1000        BestScore 134        GroupScore 134
Proteins Scanned: 2000        BestScore 142        GroupScore 142
Proteins Scanned: 3000        BestScore 142        GroupScore 137
~
~


*******************************************************************

    Program......: hsearch (version 1.2)
    Description..: Homology Search Results
    Date.........: Thu Feb 16 14:29:17 1993

    Database.....: PIR (Intelligenetics Version)

    Scoring Mat..: /sirius/local/seqsee/lib/wt.align

*******************************************************************

    Number of proteins tested: 44890
    Number of matches found..:    50


***********(1)***********
Title......: *Lipase - Rat
Id.........: S03672    Amino Acid:   163    
Score......: 164
Query Seq..: CFYQRCRGD
Matching...: ||| || |
Database...: CFYGRCLGF


***********(2)***********
Title......: *Proline-rich protein precursor - Human
Id.........: A33568   Amino Acid:   108    
Score......: 158
Query Seq..: CFYQRCRGD
Matching...: |*|*||
Database...: CIYKRCQHP
~
~
~
~
~
:q                                    <--- <user input>

Save this File? (Y/N):

>> y                                    <--- <user input>

Enter output filename.

>> misc.hom                            <--- <user input>


14.  Homology Search
SEQBANK Option


            *******************************************
            *                                         *
            *    1. Indicate which database or        *
            *       sequence file you wish to search  *
            *    2. Enter the sequence pattern        *
            *       to be searched for                *
            *    3. Enter the number of alignments    *
            *       to be saved                       *
            *    4. Check the output file for         *
            *       interesting information           *
            *    5. Exit the editor with ":q"         *
            *    6. Save the file                     *
            *                                         *
            *******************************************


The HSEARCH program searches the PIR, SWISS-PROT, SEQBANK or a 
compatible user-defined database (or sequence file) to find the "nearest" or 
most homologous matches to any given input sequence.   Homologies are 
determined according to any one of four user-defined scoring matrices 
(described earlier).  Gaps are not allowed in the homology search (if gaps are 
required, use the fast alignment option instead).  The homology search is a 
useful complement to other pattern search routines, especially when 
attempting to locate distantly related or difficult-to-identify sequence 
motifs. 
    When using this program, the user is required to identify which 
database he or she wishes to search (the SEQBANK database in this case) as 
well as the sequence that is to be searched for in the database.  The user is 
also required to provide the number of high-scoring matches that are to be 
kept (we chose 10) as well as the name of an output file (we suggest using 
the suffix ".hom").  The output file contains information on the name of the 
protein where a match was found, the SEQBANK Id or accession number, the 
location where the match begins in the database protein (DbRes), the 
homology score (Score) and the secondary structure.  Vertical lines (|) are 
used to identify exact matches and asterisks (*) are used to identify 
homologous matches.


******************************************************************************

AVAILABLE OPTIONS IN "SEQSEE.PARMS":

1) Choice of sequence database to be scanned (PIR, SWISS-PROT or user-
defined).
2) Choice of scoring matrix (wt.* files or user-defined).
3) Choice of minimum score to designate homologous residue pairs.
4) Choice of file-update frequency (i.e. every 100, 500 or 1000 sequences).

******************************************************************************


**********************************************************************
* Package...:                SEQSEE  Version 1.2 (c)                 *
* Authors...:       Robert Boyko / Leigh Willard / David Wishart     *
*                        Fred Richards / Brian Sykes                 *
* Location..:                 University of Alberta                  *
*               Protein Engineering Network of Centres of Excellence * 
**********************************************************************

      *** Preliminaries ***                *** Alignments ***
   1) Help                              10) Fast Alignment Search
   2) Enter/Edit a Sequence             11) Exhaustive Alignment Search
   3) Retrieve Sequence from Database   12) Align 2 or more sequences

      *** Structural Analysis ***            *** Scanning ***
   4) Sequence Statistics               13) Pattern Search
   5) Structure Prediction              14) Homology Search
   6) SEQSITE Pattern Search            15) Dot Plot
   7) Flexibility                       16) Database Reference Search
   8) Hydrophobic Moment                17) File Viewer
   9) Hydrophobicity                     0) EXIT SEQSEE

   Enter the number of the desired function

>> 14                                <--- <user input>

Homology Search (Version 1.2)

How do you wish to use Homology Search?

  1) Search PIR/SWISS-PROT database
  2) Search SEQBANK database
  3) Search user-defined sequence
  0) Exit

Enter a number (then press return).

>> 2                                    <--- <user input>

enter sequence (one-letter code).
Press <return> when done.

>> CFYQRCRGD                            <--- <user input>

>> quit                                <--- <user input>


This program keeps track of the top 'x' searches
Enter a value for 'x' where 0 < x < 500:

>> 10                                <--- <user input>


*******************************************************************

    Program......: hsearch (version 1.2)
    Description..: Homology Search Results
    Date.........: Thu Feb 16 14:29:17 1993

    Database.....: SEQBANK

    Scoring Mat..: /sirius/local/seqsee/lib/wt.align

*******************************************************************

    Number of proteins tested: 267
    Number of matches found..:  10


***********(1)**********
Title.....: GLUCOCORTICOID RECEPTOR DNA BINDING DOMAIN (RAT)
Id........: 110    Amino Acid:  56        
Score.....: 126
Query Seq.: CFYQRCRGD
Matching..: | |**|
Database..: CRYRKCLQA
Structure.: HHHHHHHHH

***********(2)**********
Title.....: SHORT SCORPION TOXIN
Id........: 183    Amino Acid:  26    
Score.....: 124
Query Seq.: CFYQRCRGD
Matching..: ||  *|
Database..: CFGPQCLCN
Structure.: BBCCBBBBB
~
~
~
~

:q                                    <--- <user input>

Save this File? (Y/N):

>> y                                    <--- <user input>

Enter output filename.

>> misc.hom                            <--- <user input>


15. Dot Plot
Compare 2 Similar Sequences

            *******************************************
            *                                         *
            *    1. Indicate how you wish to use      *
            *       Dot Plot (option 2)               *
            *    2. Indicate how your first sequence  *
            *       will be entered                   *
            *    3. Enter the sequence filename       *
            *       for the first sequence            *
            *    4. Indicate how your second          *
            *       sequence will be entered          *
            *    5. Enter the sequence filename       *
            *       for the second sequence           *
            *    6. Enter the number of alignments    *
            *       to be saved                       *
            *    7. Check the output file for         *
            *       interesting information           *
            *    8. Exit the editor with ":q"         *
            *    9. Save the file                     *
            *                                         *
            *******************************************

DOTPLOT is a flexible program that produces character representations of 
standard dotplots.  DOTPLOT may be used to compare a sequence with itself 
(to identify internal repeats), with another sequence (for pair-wise 
alignments), or with a SEQFILE-compatible database or the PIR/SWISS-PROT 
databases (for medium-speed alignments).  
    When using this function, the user is asked to identify which type of 
dotplot will be done (one sequence against another, one sequence against a 
large number of sequences or one sequence against itself), how the 
sequence(s) will be entered (via the keyboard or through a SEQFILE), the 
number of "diagonals" that should be identified and saved and, finally, the 
name of the output file (we suggest using the suffix ".dot" for consistency).  
For each diagonal, the output file reports where the diagonal begins in the 
query sequence (QpRes), where it begins in the "database" sequence (DbRes), 
the level of homology (Score) and the location of the diagonal in the 
DOTPLOT matrix (Diagonal, with 0 being the main diagonal).
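
The diagonal bookkeeping described above can be illustrated with a short 
Python sketch.  This is not the DOTPLOT implementation: it scores whole 
diagonals by simple identity (1 per identical residue pair) instead of using 
the wt.align matrix, and it applies no length penalty or threshold.  The 
function name "best_diagonals" and the dictionary keys are hypothetical.  It 
assumes that the diagonal number equals DbRes minus QpRes, which is what the 
sample listings in this section suggest.

```python
def best_diagonals(query, database, top=3):
    """Score every ungapped diagonal of the dot-plot matrix by identity count.

    Diagonal d pairs query residue i with database residue i + d
    (0-based indices), so d = 0 is the main diagonal.
    """
    results = []
    for d in range(-(len(query) - 1), len(database)):
        # Query indices that have a partner residue on diagonal d.
        start = max(0, -d)
        stop = min(len(query), len(database) - d)
        score = sum(1 for i in range(start, stop) if query[i] == database[i + d])
        if score > 0:
            # Report 1-based start positions, in the style of QpRes/DbRes.
            results.append({"diagonal": d, "qp_res": start + 1,
                            "db_res": start + d + 1, "score": score})
    # Highest-scoring diagonals first.
    results.sort(key=lambda r: r["score"], reverse=True)
    return results[:top]


# The two leucine zipper sequences shown in the example listing.
hits = best_diagonals("LQKMKGLENKVAEKLSKNYHLERLRALENKLVGER",
                      "LQRMKQLEDKVEELLSKNYHLERLKALENKLVGER")
for hit in hits:
    print(hit)
```

Because identity scoring is much cruder than a weighted matrix, the scores 
printed by this sketch are not comparable to the Score values reported by 
DOTPLOT; only the diagonal numbering is meant to carry over.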


******************************************************************************

AVAILABLE OPTIONS IN "SEQSEE.PARMS":

1) Choice of sequence database to be scanned (PIR, SWISS-PROT or user-
defined).
2) Choice of scoring matrix (wt.* files or user-defined).
3) Choice of minimum score to designate homologous residue pairs.
4) Choice of minimum threshold score and length extension penalties.

******************************************************************************


**********************************************************************
* Package...:                SEQSEE  Version 1.2 (c)                 *
* Authors...:       Robert Boyko / Leigh Willard / David Wishart     *
*                        Fred Richards / Brian Sykes                 *
* Location..:                 University of Alberta                  *
*               Protein Engineering Network of Centres of Excellence * 
**********************************************************************

      *** Preliminaries ***                *** Alignments ***
   1) Help                              10) Fast Alignment Search
   2) Enter/Edit a Sequence             11) Exhaustive Alignment Search
   3) Retrieve Sequence from Database   12) Align 2 or more sequences

      *** Structural Analysis ***            *** Scanning ***
   4) Sequence Statistics               13) Pattern Search
   5) Structure Prediction              14) Homology Search
   6) SEQSITE Pattern Search            15) Dot Plot
   7) Flexibility                       16) Database Reference Search
   8) Hydrophobic Moment                17) File Viewer
   9) Hydrophobicity                     0) EXIT SEQSEE

   Enter the number of the desired function

>> 15                                <--- <user input>


Dotplot (Version 1.2)

How do you wish to use the dot plot algorithm?

  1) Do dotplot against the PIR/SWISS-PROT database
  2) Do dotplot with my two input sequences
  3) Look for internal repeats
  0) Exit

Enter a number (then press return)

>> 2                                    <--- <user input>

*** First amino acid sequence ***

Your amino acid sequence is now required:

  1) Read sequence from an input file
  2) Sequence to be entered via keyboard
  3) I do not have my sequence ready

Enter a number (then press return).

>> 1                                    <--- <user input>

Enter input filename:

>> zipper1.seq                            <--- <user input>


*** Second amino acid sequence ***

Your amino acid sequence is now required:

  1) Read sequence from an input file
  2) Sequence to be entered via keyboard
  3) I do not have my sequence ready

Enter a number (then press return).

>> 1                                    <--- <user input>

Enter input filename:

>> zipper2.seq                            <--- <user input>


This program keeps track of the top 'x' alignments.
Enter a value for 'x' where 0 < x < 500:

>> 20                                <--- <user input>


*******************************************************************

    Program......: dotplot (version 1.2)
    Description..: Finds Regions of Homology (no gaps)
    Date.........: Thu Feb 16 13:31:19 1993

    Scoring Mat..: /sirius/local/seqsee/lib/wt.align
    LengthPenalty: 5
    Min Threshold: 80
    mSearchFlag..: 0 (1=yes, 0=no)

    Sequence Name: leu_zipper1
             1 LQKMKGLENK VAEKLSKNYH LERLRALENK LVGER

*******************************************************************


Number of proteins tested:    1
Number of searches found:    3


**********(1)*********
Title...: leu_zipper1
Id......: Title:    
Score...: 467    Diagonal: 0        QpRes...: 1        DbRes...: 1    

Query Seq..:LQKMKGLENKVAEKLSKNYHLERLRALENKLVGER
Matching...:||*|| ||*|| | ||||||||||*||||||||||
Database...:LQRMKQLEDKVEELLSKNYHLERLKALENKLVGER


**********(2)*********
Title...: leu_zipper
Id......: Title:    
Score...: 123    Diagonal: 20    QpRes...: 1     DbRes...: 21        

Query Seq..:LQKMKGLENKV
Matching...:|***| ||||*
Database...:LERLKALENKL
~
~
~
~
~

:q                                    <--- <user input>

Save this File? (Y/N):

>> y                                    <--- <user input>

Enter output filename.

>> zipper.dot                            <--- <user input>


15. Dot Plot
Internal Repeat Option

            *******************************************
            *                                         *
            *    1. Indicate how you wish to use      *
            *       Dot Plot (option 3)               *
            *    2. Indicate how your sequence        *
            *       will be entered                   *
            *    3. Enter the sequence or sequence    *
            *       filename                          *
            *    4. Enter the number of alignments    *
            *       to be saved                       *
            *    5. Check the output file for         *
            *       interesting information           *
            *    6. Exit the editor with ":q"         *
            *    7. Save the file                     *
            *                                         *
            *******************************************

DOTPLOT is a flexible program that produces character representations of 
standard dotplots.  The low resolution of most character-based screens 
prevents a useful graphic representation of dotplot results, so a character 
representation with a user-defined "threshold" is used instead.  
DOTPLOT may be used to compare a sequence with itself (to identify internal 
repeats), with another sequence (for pair-wise alignments), or with a 
SEQFILE-compatible database or the PIR/SWISS-PROT databases (for 
medium-speed alignments).  
    When using this function, the user is asked to identify which type of 
dotplot will be done (one sequence against another, one sequence against a 
large number of sequences or one sequence against itself), how the 
sequence(s) will be entered, the number of "diagonals" that should be 
identified and saved and, finally, the name of the output file.  For each 
diagonal, the output file reports where the diagonal begins in the query 
sequence (QpRes), where it begins in the "database" sequence (DbRes), the 
level of homology (Score) and the location of the diagonal in the DOTPLOT 
matrix (Diagonal, with 0 being the main diagonal).
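
For the internal repeat option, a sequence is compared with itself, so the 
main diagonal is always a perfect match and carries no information.  The 
Python sketch below illustrates the idea under simplifying assumptions: it 
counts identical residue pairs on each off-diagonal instead of using a 
scoring matrix, threshold or length penalty, and the function name 
"repeat_offsets" is hypothetical.  It is not the DOTPLOT algorithm itself.

```python
def repeat_offsets(sequence):
    """Count identical residue pairs on each off-diagonal of a self-comparison.

    Offset d pairs residue i with residue i + d.  Offset 0 (the main
    diagonal) is skipped because a sequence always matches itself there,
    and negative offsets are skipped because the self-comparison matrix
    is symmetric.
    """
    counts = {}
    for d in range(1, len(sequence)):
        counts[d] = sum(1 for i in range(len(sequence) - d)
                        if sequence[i] == sequence[i + d])
    return counts


# The leu_zipper3 sequence from the example listing.
zipper3 = "LQRMKQLEDKVEELLSKNYHLERLKALENKLVGER"
ranked = sorted(repeat_offsets(zipper3).items(),
                key=lambda item: item[1], reverse=True)
for offset, identities in ranked[:5]:
    print(f"offset {offset:2d}: {identities} identical pairs")
```

An offset that collects many identical pairs points to a candidate repeat 
whose period equals that offset.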


******************************************************************************

AVAILABLE OPTIONS IN "SEQSEE.PARMS":

1) Choice of sequence database to be scanned (PIR, SWISS-PROT or user-
defined).
2) Choice of scoring matrix (wt.* files or user-defined).
3) Choice of minimum score to designate homologous residue pairs.
4) Choice of minimum threshold score and length extension penalties.

******************************************************************************


**********************************************************************
* Package...:                SEQSEE  Version 1.2 (c)                 *
* Authors...:       Robert Boyko / Leigh Willard / David Wishart     *
*                        Fred Richards / Brian Sykes                 *
* Location..:                 University of Alberta                  *
*               Protein Engineering Network of Centres of Excellence * 
**********************************************************************

      *** Preliminaries ***                *** Alignments ***
   1) Help                              10) Fast Alignment Search
   2) Enter/Edit a Sequence             11) Exhaustive Alignment Search
   3) Retrieve Sequence from Database   12) Align 2 or more sequences

      *** Structural Analysis ***            *** Scanning ***
   4) Sequence Statistics               13) Pattern Search
   5) Structure Prediction              14) Homology Search
   6) SEQSITE Pattern Search            15) Dot Plot
   7) Flexibility                       16) Database Reference Search
   8) Hydrophobic Moment                17) File Viewer
   9) Hydrophobicity                     0) EXIT SEQSEE

   Enter the number of the desired function

>> 15                                <--- <user input>


Dotplot (Version 1.2)

How do you wish to use the dot plot algorithm?

  1) Do dotplot against the PIR/SWISS-PROT database
  2) Do dotplot with my two input sequences
  3) Look for internal repeats
  0) Exit

Enter a number (then press return)

>> 3                                    <--- <user input>


*** First amino acid sequence ***

Your amino acid sequence is now required:

  1) Read sequence from an input file
  2) Sequence to be entered via keyboard
  3) I do not have my sequence ready

Enter a number (then press return).

>> 1                                    <--- <user input>

Enter input filename:

>> zipper3.seq                            <--- <user input>


This program keeps track of the top 'x' alignments.
Enter a value for 'x' where 0 < x < 500:

>> 20                                <--- <user input>


*******************************************************************

    Program......: dotplot (version 1.2)
    Description..: Finds Regions of Homology (no gaps)
    Date.........: Thu Feb 16 13:31:19 1993

    Scoring Mat..: /sirius/local/seqsee/lib/wt.align
    LengthPenalty: 5
    Min Threshold: 80
    mSearchFlag..: 0 (1=yes, 0=no)

    Sequence Name: leu_zipper3
             1 LQRMKQLEDK VEELLSKNYH LERLKALENK LVGER

*******************************************************************

    Number of proteins tested:    1
    Number of searches found:    1


***********(1)**********
Title....: leu_zipper3
Id.......: Title:
Score....: 60    Diagonal.: 14    QpRes...: 1        DbRes...: 15
        
    
Query Seq:LQRMKQLEDKV
Matching.:|*|*| ||*|*
Database.:LERLKALENKL
~
~
~
~
~
~


:q                                    <--- <user input>

Save this File? (Y/N):

>> y                                    <--- <user input>

Enter output filename.

>> zipper.dot                            <--- <user input>


16. Database Reference Search

            *******************************************
            *                                         *
            *    1. Indicate how your query will      *
            *       be entered                        *
            *    2. Enter the search query (remember  *
            *       to type "quit" when finished)     *
            *    3. Check the output file for         *
            *       desired information               *
            *    4. Exit the editor with ":q"         *
            *    5. Save the file                     *
            *                                         *
            *******************************************


The program REFSCAN is designed specifically to allow the user to find and 
retrieve sequence references from the PIR or SWISS-PROT databases using 
either the accession number, the name (or a portion thereof) or a 
bibliographic/functional reference.  Note that multiple search terms may be 
joined with the conjunctive "&" symbol to make a reference query more 
specific, for example: CYSTIC & FIBROSIS & HUMAN for HUMAN CYSTIC FIBROSIS. 
    When using this function, the user is required to identify the method 
by which the query will be entered (either the keyboard or through a UNIX 
file) as well as the exact Id numbers or reference words which must be 
searched for in the database.  Note that the user MUST type "quit" on the 
final line of the search string.  The word "quit" is used by the program 
as a termination flag and is essential for proper functioning of the program.  
After completing the query input, the user is required to provide a name 
for the output file (we suggest using the suffix ".scan" for consistency).  
The output from the reference search is essentially self-explanatory.  Note 
that the sequence of the peptide or protein is not included in the output.
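
The two query conventions described above (the "quit" terminator and the 
conjunctive "&") can be sketched in Python as follows.  This is only an 
illustration of the behaviour described in the text, not the REFSCAN source: 
the function names are hypothetical, case-insensitive substring matching is 
an assumption, and the three titles are invented for the example.

```python
def read_queries(lines):
    """Collect query lines up to (not including) the terminating 'quit'."""
    queries = []
    for line in lines:
        entry = line.strip()
        if entry.lower() == "quit":
            return queries
        if entry:
            queries.append(entry)
    raise ValueError('the query list must end with a line containing "quit"')


def matches_query(query, title):
    """Return True if every '&'-separated term occurs in the title."""
    terms = [term.strip().upper() for term in query.split("&") if term.strip()]
    text = title.upper()
    return bool(terms) and all(term in text for term in terms)


# Invented titles, used only to show the effect of the "&" conjunction.
titles = [
    "CYSTIC FIBROSIS TRANSMEMBRANE CONDUCTANCE REGULATOR - HUMAN",
    "CYSTIC FIBROSIS TRANSMEMBRANE CONDUCTANCE REGULATOR - MOUSE",
    "CYTOCHROME C - HUMAN",
]

for query in read_queries(["CYSTIC & FIBROSIS & HUMAN", "quit"]):
    for title in titles:
        if matches_query(query, title):
            print(title)
```

Each additional "&" term narrows the result, since a title is kept only when 
it contains every term in the query.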
    

******************************************************************************

AVAILABLE OPTIONS IN "SEQSEE.PARMS":

1) Choice of reference database to be scanned (PIR, SWISS-PROT or user-
defined).

2) Choice of file-update frequency (i.e. every 100, 500 or 1000 
sequences).

******************************************************************************


**********************************************************************
* Package...:                SEQSEE  Version 1.2 (c)                 *
* Authors...:       Robert Boyko / Leigh Willard / David Wishart     *
*                        Fred Richards / Brian Sykes                 *
* Location..:                 University of Alberta                  *
*               Protein Engineering Network of Centres of Excellence * 
**********************************************************************

      *** Preliminaries ***                *** Alignments ***
   1) Help                              10) Fast Alignment Search
   2) Enter/Edit a Sequence             11) Exhaustive Alignment Search
   3) Retrieve Sequence from Database   12) Align 2 or more sequences

      *** Structural Analysis ***            *** Scanning ***
   4) Sequence Statistics               13) Pattern Search
   5) Structure Prediction              14) Homology Search
   6) SEQSITE Pattern Search            15) Dot Plot
   7) Flexibility                       16) Database Reference Search
   8) Hydrophobic Moment                17) File Viewer
   9) Hydrophobicity                     0) EXIT SEQSEE

   Enter the number of the desired function

>> 16                                <--- <user input>


Refscan (Version 1.2)

How will you enter your search queries?

  1) Protein Name(s) entered from the keyboard
  2) Protein Name(s) taken from a file
  3) Protein Id(s) entered from the keyboard
  4) Protein Id(s) taken from a file
  0) Exit program

Enter a number (then press return)

>> 3                                    <--- <user input>

Enter one index code per line
Type QUIT (then press return) when done:

>> CCHU                                <--- <user input>

>> CCCZ                                <--- <user input>

>> quit                                <--- <user input>


Reading database file: /sirius/seqsee/databases/pir/*

Proteins scanned: 1000        
Proteins scanned: 2000        
~              
~
~


*****************************************************************

    Program......: refscan (version 1.2)
    Description..: Reference Retrieval Results
    Date.........: Thu Feb 16 14:00:01 1993

    Database.....: PIR (Intelligenetics Version)

    SearchStrings: CCHU
           CCHZ

*****************************************************************


>CCHU Cytochrome c - Human
ENTRY            CCHU        #Type Protein
TITLE            Cytochrome c - Human
DATE             #Sequence 30-Sep-1991  #Text   30-Jun-1992
PLACEMENT           1.0     1.0    1.0     1.0      1.0
SOURCE           Homo sapiens #Common-name man
ACCESSION        A31764\ A05676\ A00001
REFERENCE
    #Authors     Evans M.J., Scarpulla R.C.
    #Journal     Proc. Natl. Acad. Sci. U.S.A. (1988) 85:9625-9629
    #Title       The human somatic cytochrome c gene: two classes of
                 processed pseudogenes demarcate a period of rapid
                 molecular evolution.
    #Reference-number A31764
    #Accession   A31764
    #Molecule-type DNA
    #Residues    1-105 <EVA>
    #Cross-reference GB:M22877
REFERENCE            
    #Authors    Matsubara H., Smith E.L.
    #Journal    J. Biol. Chem. (1963) 238:2732-2753.
    #Reference-number A05676
    #Accession  A05676
    #Molecule-type protein
    #Residues   2-28;29-46;47-100;101-105 <MATS>
REFERENCE
    #Authors    Matsubara H., Smith E.L.
    #Journal    J. Biol. Chem. (1962) 237:3575-3576
    #Reference-number A00001   
    #Comment    66-Leu is found in 10% of the molecules in pooled
                protein.

GENETIC
    #Introns    57/1
SUPERFAMILY     #Name cytochrome c
KEYWORDS        acetylation\ electron transport\ heme\
                mitochondrion\ oxidative phosphorylation\     
                polymorphism\ respiratory chain
FEATURE
    2-105       #Protein cytochrome c (experimental)
                <MAT>\
    2           #Modified-site acetylated amino end (Gly)
                (in mature form) (experimental)\
    15,18       #Binding-site heme (covalent)\
    19,81       #Binding-site heme iron (his, Met) (axial ligands)
SUMMARY         # Molecular-weight 11749 #Length 105 #Checksum 3247
SEQUENCE

*********************************************************

>CCHZ Cytochrome c - Chimpanzee (tentative sequence)
ENTRY            CCHZ        #Type Protein
TITLE            Cytochrome c - Chimpanzee (tentative sequence)
DATE             #Sequence 17-Mar-1987  #Text  30-Jun-1992
PLACEMENT          1.0        1.0    1.0      1.0        1.0
~
~
~
~
~

:q                                    <--- <user input>

Save this File? (Y/N):

>> y                                    <--- <user input>

Enter output filename.

>> cyt.scan                            <--- <user input>


17. File Viewer


            *******************************************
            *                                         *
            *    1. Indicate which database or file   *
            *       you wish to browse or edit        *
            *    2. Browse or edit the output file    *
            *    3. Exit the editor with ":q" or :wq  *
            *                                         *
            *******************************************


The File Viewer option permits the user to edit or view a variety of database 
files.  Through this program it is possible to locate or identify complete 
sequences within the PIR or SWISS-PROT databases, to locate sequences 
from the SEQBANK database, to view or edit sequences written as SEQFILEs 
and to view, edit or change the SEQSEE control file ("seqsee.parms").  In the 
case of viewing PIR database information, all sequence name and accession 
data are contained in a single 1 Mb file called PIRSEE.  In the case of viewing 
the SWISS-PROT database, all sequence name and accession data are contained 
in a single 1 Mb file called SWISSEE.  Standard UNIX commands may be used 
for scrolling through or locating character strings in any of the files. 
    When using this function, the user is only required to choose which 
database or file he or she desires to view.  No other input is required.  The 
database formats have been discussed in earlier sections of this manual and 
will not be elaborated upon here.  The SEQSEE control file ("seqsee.parms") is 
mostly self-explanatory although users wishing to know more about the file 
may consult Appendix 1 for the annotated version of the control file.


******************************************************************************

AVAILABLE OPTIONS IN "SEQSEE.PARMS":

1) Choice of sequence database to be viewed (PIRSEE, SWISSEE, SEQBANK or 
user-defined).

2) Access to "seqsee.parms" file to alter individual program options.


******************************************************************************


**********************************************************************
* Package...:                SEQSEE  Version 1.2 (c)                 *
* Authors...:       Robert Boyko / Leigh Willard / David Wishart     *
*                        Fred Richards / Brian Sykes                 *
* Location..:                 University of Alberta                  *
*               Protein Engineering Network of Centres of Excellence * 
**********************************************************************

      *** Preliminaries ***                *** Alignments ***
   1) Help                              10) Fast Alignment Search
   2) Enter/Edit a Sequence             11) Exhaustive Alignment Search
   3) Retrieve Sequence from Database   12) Align 2 or more sequences

      *** Structural Analysis ***            *** Scanning ***
   4) Sequence Statistics               13) Pattern Search
   5) Structure Prediction              14) Homology Search
   6) SEQSITE Pattern Search            15) Dot Plot
   7) Flexibility                       16) Database Reference Search
   8) Hydrophobic Moment                17) File Viewer
   9) Hydrophobicity                     0) EXIT SEQSEE

   Enter the number of the desired function

>> 17                                <--- <user input>


File Viewer (Version 1.2)

What would you like to browse?

1) User specified file
2) PIRSEE database
3) SWISSEE database
4) SEQBANK database
5) SEQSEE control file
0) Exit

Enter a number (then press return).

>> 4                                    <--- <user input>


#                    SEQBANK
#                 (REVISED DEC. 1992)
#
#                COPYRIGHT APRIL, 1993
#                  DAVID S. WISHART
#
#               DEPARTMENT OF BIOCHEMISTRY
#                UNIVERSITY OF ALBERTA
#                  EDMONTON, ALBERTA
#                     CANADA
#                     T6G 2H7
#
#    SEQBANK is a compilation of sequences and "consensus" secondary
#structure assignments of soluble proteins and peptides which have had
#their.....


>ACTIN (RABBIT SKELETAL)
#REFERENCE : KABSCH, W. ET AL., NATURE 347:37-44 (1990)
#REFERENCE : FLAHERTY, K.M. ET AL., PNAS 88:5041-5045 (1991)
#SEQBANK ID: 1
#BRKHAVN ID:
#PIR-NBR ID: ATRB
#SWISPRO ID: ACTS$RABIT
#RESOLUTION: 2.8
#R FACTOR  : 23.8
#FOLD CLASS: M
#NUM RESIDU: 375

DEDETTALVC DNGSGLVKAG FAGDDAPRAV FPSIVGRPRH QGVMVGMGQK
CCCCCCBBBB BBBCCBBBBB BBCCCCCCBB BBCCBBBBCC CCCCCCCCCC

DSYVGDEAQS KRGILTLKYP IEHGIITNWD DMEKIWHHTF YNELRVAPEE
CBBBCHHHHH HCCBBBBBCC BBBCBBBCCH HHHHHHHHHH HCCCCCCCCC

HPTLLTEAPL NPKANREKTM QIMFETFNVP AMYVAIQAVL SLYASGRTTG
CCBBBBBCHH HHHHHHHHHH HHHHHCCCCC BBBBBBCHHH HHHHCCCCBB

IVLDSGDGVT HNVPIYEGYA LPHAIMRLDL AGRDLTDYLM KILTERGYSF
BBBBCCCCBB BBBBBBCCBB BCCBBBBBCC CHHHHHHHHH HHHHHHCCCC

VTTAEREIVR DIKEKLCYVA LDFENAMATA ASSSSLEKSY ELPDGQVITI
CCHHHHHHHH HHHHHHCCCC CHHHHHHHHH HCCCCCCBBB BBCCCCBBBB

GNERFRCPET LFQPSFIGME SAGIHETTYN SIMKCDIDIR KDLYANNVMS
CCHHHHHHHH HHHCCCCCCC CCHHHHHHHH HHHHCCCHHH HHHHCCBBBB

GGTTMYPGIA DRMQKEITAL APSTMKIKII APPERKYSVW IGGSILASLS
CCCCCCCCHH HHHHHHHHHH HCCCCCBBBB CCHHHHHHHH HHHHHHHHCC

TFQQMWITKQ EYDEAGPSIV HRKCF
HHHHHCCCCH HHHHHCCHHH HHHCC
~
~
~
~
~
:q                                    <--- <user input>
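
The SEQBANK record layout visible in the listing above (a ">" title line, 
"#KEY: value" annotation lines, then alternating sequence and structure 
lines in blocks of ten) can be read with a few lines of Python.  The sketch 
below is an illustration based only on that listing: the function name 
"parse_seqbank_entry" is hypothetical, it handles a single entry without the 
file banner, and the sample is a shortened excerpt of the actin record that 
keeps only its last sequence block.

```python
def parse_seqbank_entry(text):
    """Split one SEQBANK-style entry into title, annotations and sequences."""
    title, fields, rows = "", {}, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            title = line[1:].strip()
        elif line.startswith("#"):
            # Split on the first colon only; references contain colons too.
            key, _, value = line[1:].partition(":")
            fields.setdefault(key.strip(), []).append(value.strip())
        else:
            # Sequence and structure lines alternate; drop the block spacing.
            rows.append(line.replace(" ", ""))
    sequence = "".join(rows[0::2])
    structure = "".join(rows[1::2])
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure lengths differ")
    return {"title": title, "fields": fields,
            "sequence": sequence, "structure": structure}


# Shortened excerpt of the actin entry (last sequence block only).
sample = """>ACTIN (RABBIT SKELETAL)
#REFERENCE : KABSCH, W. ET AL., NATURE 347:37-44 (1990)
#SEQBANK ID: 1
#PIR-NBR ID: ATRB

TFQQMWITKQ EYDEAGPSIV HRKCF
HHHHHCCCCH HHHHHCCHHH HHHCC
"""

entry = parse_seqbank_entry(sample)
print(entry["title"])
print(entry["sequence"])
print(entry["structure"])
```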


X. HELP, ON-LINE DOCUMENTATION AND USER MANUALS

We have attempted to document as much of the SEQSEE program as possible.  
In addition to the manual you are presently reading, we have also provided 
a menu-driven on-line help facility to complement the manual.  
The hardcopy version of the user manual may be freely copied and 
distributed to anyone interested in using SEQSEE.  If the original version of 
this user manual is lost, a "low-end" copy of this manual may be printed 
from the file called "manual" which is included with the program.
    The On-line Help facility is essentially a shortened version of 
the SEQSEE manual.  If the user selects Help (option 1) from the main SEQSEE 
menu, the following HELP menu will appear:


                SEQSEE HELP (Version 1.2)

                What would you like help with?

                1) Authors, version, copyright notice
                2) Introduction to SEQSEE
                3) Recommendations for the beginner
                4) Brief explanation of main menu
                5) Detailed explanation of main menu
                6) Tutorial
                7) Common questions from users
                8) Sequence input format
                9) Amino Acid Info
                0) Exit

            Enter a number (then press return):


The ten functions in the HELP menu can be summarized as follows:

1) AUTHORS, VERSION, COPYRIGHT NOTICE - Provides a short notice 
regarding the location and addresses of the authors, the current version 
number of SEQSEE and the limitations that the user must agree to before 
using the program.

2) INTRODUCTION TO SEQSEE - Provides background information and a brief 
history about SEQSEE, its development and potential applications.

3) RECOMMENDATIONS FOR THE BEGINNER - Provides a list of procedures 
that the first-time user should undertake to make the operation of SEQSEE as 
easy and as convenient as possible.

4) BRIEF EXPLANATION OF MAIN MENU - Provides a short synopsis of 
SEQSEE menu functions.

5) DETAILED EXPLANATION OF MAIN MENU - Provides a more in-depth 
explanation of all 18 menu functions along with examples and references.

6) TUTORIAL - Provides a sample SEQSEE tutorial taken directly from this 
manual.

7) COMMON QUESTIONS FROM USERS - Lists some of the more common 
questions (along with the answers) which have been fielded by the authors 
from a variety of first-time users.

8) SEQUENCE INPUT FORMAT - Describes the SEQFILE format, which is the 
required format for all sequence data in SEQSEE.

9) AMINO ACID INFO - Provides a brief synopsis of amino acid names, 
abbreviations, structures and molecular weights.

0) EXIT - Returns the user to the main SEQSEE menu.


Upon pressing any HELP menu number (except 0) the user will be presented 
with a file displaying the information on the desired subject.  The file may 
be scrolled through using the scrolling control keys (h,j,k,l) and it may be 
exited by simply typing ":q".  On exiting from the file the user is 
automatically returned to the HELP menu.  HELP may be exited by pressing 
"0" (and <return>) as indicated on the menu.


XI. DATABANKS, DATABASES AND LIBRARIES

The SEQSEE suite of programs is complemented by more than 40 different 
databases and library files.  These databanks contain a vast array of 
structural, functional and physico-chemical information that has been 
collected and tabulated from many different sources.  In addition, we have 
generated several new databases specifically for SEQSEE to allow it to 
perform a number of novel functions.  Without these libraries many of the 
more important features of SEQSEE would be inoperable and the program 
would essentially cease to be useful.  Consequently we believe it is 
important to identify where these libraries are located, how they have been 
named and precisely what they contain.  Such an understanding will allow 
the user to update or modify these databanks whenever the need arises.


DATABASE LOCATION

The SEQSEE databanks are segregated into two categories or directories.  One 
is named DATABASES and the other is named LIB (for libraries). Databases 
are typically the larger of the two types of records with each file containing 
between 400 and 40,000 lines.  Currently, all SEQSEE database files reside in 
the directory called "seqsee/databases" and these include:

        1) EPISITE.db    4) SEQBANK.db      7) SEQSITE.db
        2) PHOSITE.db    5) SEQMOTIF1.db    8) SEQUENCES.db
        3) PIRSEE.db     6) SEQMOTIF2.db    9) SWISSEE.db

Typically, the PIR and SWISS-PROT sequence databases are also placed in the 
"seqsee/databases" directory.  Please note that these major sequence 
databases must be obtained separately by the user, either through an 
anonymous FTP site (SWISS-PROT from ncbi.nlm.nih.gov and PIR from 
ftp.bchs.uh.edu) or through the Intelligenetics Corporation.  Because of disk 
space limitations, the PIR and SWISS-PROT databases cannot be included 
with the SEQSEE program when it is obtained through our anonymous FTP 
site.  They can, however, be included in versions of SEQSEE sent by tape.


On the other hand, smaller data tables which typically contain physico-
chemical parameters are contained in the directory called "seqsee/lib" and 
these include:

    1) alexis.cys       12) homol.weights   23) mol.parspecvol
    2) alexis.norm      13) hphil.hopp      24) mol.surfarea
    3) cfas.data        14) hphob.cornet    25) mol.volume
    4) fleqsee.parms    15) hphob.eisen     26) mol.weights
    5) fracbur.parms    16) hphob.hplc      27) moment.parms
    6) gor.data         17) hphob.kyte      28) wt.align
    7) gor.orig.parms   18) hphob.rbo       29) wt.dayhoff
    8) hmom.cornet      19) kyte.parms      30) wt.levin
    9) hmom.eisen       20) membrane.parms  31) wt.mclach
   10) hmom.hplc        21) mol.asa         32) wt.rbo
   11) hmom.kyte        22) mol.fracbur     33) wt.unit

In the next few pages, we present a brief synopsis of what each of these files 
contains, beginning with the databases. 


DATABASE FILES

a) The PIR Database
One of the most important members of the SEQSEE database library is the 
National Biomedical Research Foundation's Protein Information Resource or 
the PIR (Dayhoff et al., 1983).  Among publicly available protein sequence 
databases it is by far the largest and most up-to-date.  The Sept. 1993 
release (version 34.0), which was used for the compilation of this manual, 
contains 44,890 protein and peptide sequence entries.  Most of these 
sequences are annotated with references and related information.  The PIR 
databank has been broken down into several subfiles containing older 
"annotated" entries and more preliminary "unannotated" entries. 

b) The SWISS-PROT Database
An equally important member of the SEQSEE database library is the 
European Molecular Biology Laboratory's (EMBL) SWISS-PROT.  Maintained 
and compiled by Amos Bairoch, this database is much more fully annotated 
and self-consistent than the PIR.  Accession codes are also much more 
rational in their construction.  The only drawback to the SWISS-PROT 
database is the dearth of immunoglobulin and peptide-fragment sequences.  
Fortunately, these are still collected by those operating the PIR database.  
SEQSEE currently uses version 23.0 of the SWISS-PROT which contains more 
than 24,000 sequence records.

c) The SEQBANK Database
SEQBANK contains a complete listing of the names, the references, the 
sequences and the secondary structure of proteins which have had their 
structures determined through X-ray crystallography or NMR spectroscopy.  
A total of 267 proteins are included in SEQBANK.  Each of the sequences is 
essentially unique with none of the entries being more than 50% homologous 
to any other entry.  As it presently stands, SEQBANK contains 50,582 
residues of which 17,688 are in helices (35.0%), 14,248 in beta-strands 
(28.2%) and 18,646 in coil configurations (36.9%).
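
The percentages quoted above follow directly from the residue counts.  As a 
quick check, the arithmetic can be reproduced with a few lines of Python.  
This sketch is not part of SEQSEE, and the variable names are our own:

```python
# Residue counts quoted in the text for the SEQBANK database.
counts = {"helix": 17688, "beta-strand": 14248, "coil": 18646}

# Total number of residues across all three structure classes.
total = sum(counts.values())


def percent(n, whole):
    """Return n as a percentage of whole, rounded to one decimal place."""
    return round(100.0 * n / whole, 1)


fractions = {name: percent(n, total) for name, n in counts.items()}

print(total)
for name, value in fractions.items():
    print(name, value)
```

Because each value is rounded independently, the three percentages add up 
to 100.1 rather than exactly 100.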

d) The SEQSITE Database
SEQSITE contains a list (including bibliographic entries) of approximately 
1000 sequence motifs and signature sequences which have been identified 
through extensive literature and computer searches.  Many of these 
sequence patterns have proven to be particularly effective in the 
identification of possible or probable enzymatic functions and in the location 
of active sites for a number of previously uncharacterized proteins.  We have 
attempted to adopt the nomenclature of Amos Bairoch's more fully 
annotated database called PROSITE (1990).

e) The EPISITE Database
EPISITE contains a relatively short list (including bibliographic entries) of 
recently identified B-cell and T-cell epitopes.  This may (or may not) assist in 
the identification of potential antigenic sites in a variety of proteins of 
immunological interest.  Antigenic sites are stored separately from the 
SEQSITE motifs because they are less well defined and, consequently, much 
more common than "signature" sequences.


f) The PHOSITE Database
The PHOSITE database contains a list (including bibliographic entries) of 
potential phosphorylation sites which have been described in the literature.  
Phosphorylation sites are stored separately from the SEQSITE motifs because 
they are generally less well defined and have a tendency to "overwhelm" 
important or useful signature sequence data because of their high 
abundance.


g) The SEQMOTIF1 Database
This small database consists of only 150 entries.  Unlike SEQMOTIF2 
(described below), this database includes most of the longer and more 
complex sequence-structure patterns found in proteins of known structure.  
Many of these have been derived from extensive literature or 
crystallographic database searches.  Some of the SEQMOTIF1 entries include 
such well-known structural elements as the helix-loop-helix domain of 
calcium-binding proteins, the helix-turn-helix motif of DNA-binding proteins, 
and the nucleotide binding fold of kinases and phosphorylases.  Many other, 
lesser known, structural motifs are also included.


h) The SEQMOTIF2 Database
SEQMOTIF2 is a databank containing short sequence strings which have been 
found to have a high propensity for certain secondary structures.  Previous 
workers (Rooman and Wodak, 1988), using much smaller databases, had 
shown that a number of short sequence patterns were regularly found in 
association with certain secondary structures.  By using a far larger database 
(SEQBANK) we have been able to extend this relatively short list of Rooman 
and Wodak's to include almost 1000 simplified sequence "motifs" with their 
associated secondary structures.


i) The PIRSEE Database
This database is essentially a shortened version of the PIR reference 
database.  PIRSEE contains only the protein sequence name (truncated to its 
first 50 characters, if it is longer) and its corresponding accession 
number.  Note that PIRSEE (and not the complete PIR) is the database which 
is presented when using the "File Viewer" command.


j) The SWISSEE Database
This database is essentially a shortened version of the SWISS-PROT database.  
SWISSEE contains only the protein sequence name (truncated to its first 50 
characters, if it is longer) and its corresponding accession number.  Note that 
SWISSEE (and not the complete SWISS-PROT) is the database which is 
presented when using the "File Viewer" command.


k) The SEQUENCES Database
This represents a compilation of the sequences derived from the SEQBANK 
database.  Because the sequences in SEQBANK are not in a suitable format to 
be used for direct queries with SEQSEE we have re-assembled all 267 
sequences into 267 separate sequence files.  The names of these files have 
been chosen to permit easy identification of the protein sequences contained 
within them (e.g. myo.seq = myoglobin).  All of the sequences have been 
extracted from either the PIR or SWISS-PROT databases directly.  Some of 
these sequence files have been edited to remove the leader sequences -- 
as required.


LIB FILES

a) ALEXIS.CYS - Contains amino acid and secondary structure content data 
for the prediction of folding classes among cysteine-rich proteins.

b) ALEXIS.NORM - Contains amino acid and secondary structure content data 
for the determination of folding classes among regular (low cysteine content) 
globular proteins.  The data is used in a modified predictive technique based 
on the approach of Chou and Zhang (1993).

c) CFAS.DATA - Contains recently updated Chou-Fasman parameters (Chou 
and Fasman, 1974; 1978) for secondary structure prediction.  The actual 
values were calculated from data in SEQBANK.

d) FLEQSEE.PARMS - Contains the B-factor values calculated by Karplus and 
Schulz (1985) used to calculate sequence flexibility in the program FLEQSEE.

e) FRACBUR.PARMS - Contains data on the expected fraction of buried 
residues in soluble globular proteins based on the data compiled by Janin 
(1979).

f) GOR.DATA - Contains recently re-derived parameters for secondary 
structure prediction based on the GOR (information theory) algorithm 
(Garnier et al., 1978).

g) GOR.ORIG.PARMS - Contains original parameters (Garnier et al., 1978) for 
secondary structure prediction using the GOR (information theory) algorithm.

h) HMOM.CORNET - Contains normalized hydrophobicity values determined 
by Cornette et al. (1987) which are reputed to be particularly good for 
calculating hydrophobic moments.

i) HMOM.EISEN - Contains hydrophobicity values calculated by Eisenberg and 
co-workers (1984) and normalized by Cornette et al. (1987) for the purpose 
of calculating hydrophobic moments.

j) HMOM.HPLC - Contains normalized hydrophobicity values calculated by 
Parker et al. (1986) and modified by Cornette et al. (1987) for the purpose of 
calculating hydrophobic moments.

k) HMOM.KYTE - Contains normalized hydrophobicity values calculated by 
Kyte and Doolittle (1982) and modified by Cornette et al. (1987) for the 
purpose of calculating hydrophobic moments.

l) HOMOL.WEIGHTS - Contains weighting parameters (multipliers) used by 
ALEXIS in calculating secondary structure via the homology method.

m) HPHIL.HOPP - Contains the original antigenicity/hydrophilicity values 
calculated by Hopp and Woods (See Cornette et al. (1987) for more 
information). 

n) HPHOB.CORNET - Contains original, unscaled hydrophobicity values 
determined by Cornette et al. (1987).

o) HPHOB.EISEN - Contains original, unscaled hydrophobicity values 
calculated by Eisenberg and co-workers (1984).

p) HPHOB.HPLC - Contains unscaled hydrophobicity values calculated by 
Parker et al. (1986).

q) HPHOB.KYTE - Contains original, unscaled hydrophobicity values calculated 
by Kyte and Doolittle (1982).

r) HPHOB.RBO - Contains original, unscaled hydrophobicity values calculated 
by R. Boyko and D. Wishart (unpublished).

s) KYTE.PARMS - Contains original hydrophobicity values calculated by Kyte 
and Doolittle (1982) which are used in the Klein algorithm (1985) to 
calculate the location of membrane helices.

t) MEMBRANE.PARMS - Contains modified Kyte-Doolittle hydrophobicity 
values which can be used by the Klein et al. (1985) algorithm to calculate the 
location of membrane spanning regions.

u) MOL.ASA - Contains the accessible surface area of all 20 amino acids 
measured in square angstroms as given by Richards (1977).

v) MOL.FRACBUR - Contains data on the expected fraction of buried residues 
in soluble globular proteins based on the data compiled by Janin (1979).

w) MOL.PARSPECVOL - Contains the partial specific volumes of all 20 amino 
acids (Creighton, 1984).

x) MOL.SURFAREA - Contains the surface area of all 20 amino acids measured 
in square angstroms as cited by Richards (1977).

y) MOL.VOLUME - Contains the molecular volumes of all 20 amino acids 
measured in cubic angstroms as cited by Richards (1977).

z) MOL.WEIGHTS - Contains the molecular weights of all 20 amino acids 
measured in daltons (Creighton, 1984).

aa) MOMENT.PARMS (CFAS / BHYDRO / HHYDRO) - Contains modified 
hydrophobicity and secondary structural propensity values to calculate the 
hydrophobic moment and its contribution to secondary structure in the 
program called MOMENT (See Eisenberg et al. (1984) for more details).

bb) WT.ALIGN - An amino acid exchange matrix which was specifically 
developed for the FAST_ALIGN program.

cc) WT.DAYHOFF - An exchange matrix developed by Dayhoff and co-
workers (1983) based on mutational replacement frequencies observed for a 
large number of proteins in the PIR database.  Also called the PAM 250 
matrix.  This is the most commonly used matrix in sequence alignments 
despite its many shortcomings.

dd) WT.LEVIN - An exchange matrix developed by Levin et al. (1986) for the 
purposes of secondary structure prediction based on sequence homology.

ee) WT.MCLACH - An amino acid exchange matrix developed by Andrew 
McLachlan (1971) based on the propensity of residues to 
substitute for one another as observed in crystal structures.  One of the 
best amino acid exchange matrices available, but unfortunately this is not 
widely known.

ff) WT.RBO - An improved exchange matrix developed by Robert Boyko 
(unpublished) for the purposes of secondary structure prediction based on 
sequence homology.

gg) WT.UNIT - A matrix used in alignments and homology searches where 
only the main diagonal contains non-zero entries.  The Unity matrix should 
be used for crude searches only.
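
To illustrate how a unity matrix behaves, the following Python sketch builds 
one and scores an ungapped comparison of two short peptides.  It is not 
SEQSEE code: the diagonal value of 1, the 20-letter alphabet and the function 
names are assumptions made for illustration only, and the actual contents of 
"wt.unit" should be checked in the library file itself.

```python
# The 20 standard amino acids in single letter code (assumed alphabet).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def unit_matrix(match=1, mismatch=0):
    """Build an exchange matrix whose only non-zero entries lie on the
    main diagonal.  The value 1 for a match is an assumption."""
    return {(a, b): (match if a == b else mismatch)
            for a in AMINO_ACIDS for b in AMINO_ACIDS}


def score_ungapped(seq1, seq2, matrix):
    """Sum the matrix scores over aligned positions (no gaps allowed)."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be the same length")
    return sum(matrix[(a, b)] for a, b in zip(seq1.upper(), seq2.upper()))


UNIT = unit_matrix()

# First ten residues of the human and chimp thioredoxin sequences shown
# in the SEQFILE examples of this manual; they differ at one position.
print(score_ungapped("MVKQIESKTA", "MVKHIESKTA", UNIT))
```

With such a matrix the score is simply the number of identical positions, 
which is why conservative substitutions go unrewarded and the matrix is 
suited to crude searches only.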


XII. SEQSEE FILE STRUCTURES

There are several file structures that have been adopted for the storage and 
manipulation of files in SEQSEE.  Most sequence files accessed or entered by 
the user are written and stored as SEQFILEs.  On the other hand, library and 
database files are stored in file-specific formats.  These database-specific 
formats have been designed to make the file contents both readable and 
accessible.

a) THE SEQFILE FORMAT
The SEQFILE structure is basically composed of a file marker (the symbol 
">"), and a title (sequence name) on the first line of the record.  The sequence 
(in single letter code) appears on all subsequent lines in the SEQFILE record.  
Observe that all SEQFILE sequences are stored in upper case letters but this 
is done only for enhanced readability.  All programs in SEQSEE are 
capable of reading these files regardless of whether they contain upper or 
lower case letters.  An example of a SEQFILE with a single sequence of 105 
residues is presented below:

    >Title: human_thioredoxin
    MVKQIESKTAFQEALDAAGDKLVVVDFSATWCGPCKMINPFFHSLSEKYS
    NVIFLEVDVDDCQDVASECEVKCTPTFQFFKKGQKVGEFSGANKEKLEAT
    INELV

Note that it is possible for more than one sequence to appear in any given 
SEQFILE as seen here:


    >Title: human_thioredoxin
    MVKQIESKTAFQEALDAAGDKLVVVDFSATWCGPCKMINPFFHSLSEKYS
    NVIFLEVDVDDCQDVASECEVKCTPTFQFFKKGQKVGEFSGANKEKLEAT
    INELV

    >Title: chimp_thioredoxin
    MVKHIESKTAFQEALDAAGDKLVLVDFSATWCGPCKMINPFFHSLSEKYS
    NVIFLEVDVDDCQDVASECEVKCTPTFQFYKKGQKVGEFSGANKEKLEAT
    INELV

    >Title: gorilla_thioredoxin
    MVKQIESKTAFQEALDAAGDKLLVVDFSATWCGPCKMINPFFHSISEKYS
    NVIFLEVDVDDCQDVASECEVKCTPTFQFFKRGQKVGEFSGANKEKLEAT
    INELV

    >Title: gibbon_thioredoxin
    MVKHIESKTAFQEALDAAGDKLVLVDFSATWCGPCKMINPFFHSLSEKYS
    NVIFLEVDVDDCQDVASECEVKCTPTFQFYKKGQKVGEFSGANKEKLEAT
    INELV
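
For users who wish to read SEQFILEs in their own scripts, the layout 
described above can be parsed along the following lines.  This is a minimal 
Python sketch and not the reader used inside SEQSEE; the function name 
"parse_seqfile" and the stripping of the "Title:" label are our own 
assumptions.

```python
def parse_seqfile(text):
    """Return a list of (title, sequence) pairs from SEQFILE-style text."""
    records = []
    title, chunks = None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue                      # skip blank lines between records
        if line.startswith(">"):          # ">" marks the start of a record
            if title is not None:
                records.append((title, "".join(chunks)))
            title = line[1:].strip()
            if title.lower().startswith("title:"):
                title = title[len("title:"):].strip()
            chunks = []
        elif title is not None:
            chunks.append(line.upper())   # letter case is not significant
    if title is not None:
        records.append((title, "".join(chunks)))
    return records


SAMPLE = """\
>Title: human_thioredoxin
MVKQIESKTAFQEALDAAGDKLVVVDFSATWCGPCKMINPFFHSLSEKYS
NVIFLEVDVDDCQDVASECEVKCTPTFQFFKKGQKVGEFSGANKEKLEAT
INELV

>Title: chimp_thioredoxin
MVKHIESKTAFQEALDAAGDKLVLVDFSATWCGPCKMINPFFHSLSEKYS
NVIFLEVDVDDCQDVASECEVKCTPTFQFYKKGQKVGEFSGANKEKLEAT
INELV
"""

records = parse_seqfile(SAMPLE)
for name, sequence in records:
    print(name, len(sequence))
```

Sequence lines are simply joined in the order in which they appear, so the 
number of residues per line does not matter.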


These multiple-sequence SEQFILEs may be constructed from the output of 
the "Retrieve Sequence from Database" option (#3 on the menu) or by 
cutting and pasting files together with the "vi" editor.  This 
latter procedure is best done outside the SEQSEE programming environment.
    As previously mentioned, there are a number of other file formats and 
file structures that are quite different from the SEQFILE format.  These are 
typically associated with the larger database files.  Examples are provided 
below:


b) THE PIR "Sequence" FORMAT
This is a much more compact file structure than what is normally presented 
as the 'typical' PIR format.  In this particular format only the protein 
identification code, the protein name and the protein sequence (in lower 
case) are included in the record.  This "new" file format greatly accelerates the 
searching and aligning processes in SEQSEE.


>CCHU    Cytochrome c - Human
gdvekgkkifimkcsqchtvekggkhktgpnlhglfgrktgqapgysytaanknkgiiwge
dtimeylenpkkyipgtkmifvgikkkeradliaylkkatnel

>CCCZ    Cytochrome c - Chimpanzee
gdvekgkkifimkcsqchtvekggkhktgpnlhglfgrktgqapgysytaanknkgiiwge
dtimeylenpkkyipgtkmifvgikkkeradliaylkkatnel

>CCMQR Cytochrome c - Rhesus macaque (tentative sequence)
gdvekgkkifimkcsqchtvekggkhktgpnlhglfgrktgqapgysytaanknkgiiwge
dtimeylenpkkyipgtkmifvgikkkeradliaylkkatnel
~
~
~

c) THE PIR "Reference" FORMAT
This format is relatively self-explanatory and is well documented in the 
materials that accompany the PIR database.  The example presented here is 
for reference purposes only.


>CCHU Cytochrome c - Human
ENTRY            CCHU        #Type Protein
TITLE            Cytochrome c - Human
DATE            #Sequence 30-Sep-1991  #Text   30-Jun-1992
PLACEMENT           1.0     1.0    1.0     1.0      1.0
SOURCE        Homo sapiens #Common-name man
ACCESSION        A31764\ A05676\ A00001
REFERENCE
    #Authors      Evans M.J., Scarpulla R.C.
    #Journal    Proc. Natl. Acad. Sci. U.S.A. (1988) 85:9625-9629
    #Title        The human somatic cytochrome c gene: two classes of
                    processed pseudogenes demarcate a period of rapid
                    molecular evolution.
    #Reference-number A31764
    #Accession   A31764
    #Molecule-type DNA
    #Residues    1-105 <EVA>
    #Cross-reference GB:M22877
REFERENCE            
    #Authors    Matsubara H., Smith E.L.
    #Journal    J. Biol. Chem. (1963) 238:2732-2753.
    #Reference-number A05676
    #Accession   A05676
    #Molecule-type protein
    #Residues    2-28;29-46;47-100;101-105 <MATS>
REFERENCE
    #Authors    Matsubara H., Smith E.L.
    #Journal      J. Biol. Chem. (1962) 237:3575-3576
    #Reference-number A00001   
    #Comment    66-Leu is found in 10% of the molecules in pooled
                    protein.
GENETIC
    #Introns        57/1
SUPERFAMILY        #Name cytochrome c
KEYWORDS        acetylation\ electron transport\ heme\
            mitochondrion\ oxidative phosphorylation\     
            polymorphism\ respiratory chain
FEATURE
    2-105                #Protein cytochrome c (experimental)
                        <MAT>\
    2                #Modified-site acetylated amino end (Gly)
                      (in mature form) (experimental)\
    15,18                #Binding-site heme (covalent)\
    19,81                #Binding-site heme iron (his, Met) (axial
                     ligands)
SUMMARY        # Molecular-weight 11749 #Length 105 #Checksum 3247
~
~

d) THE SEQBANK FORMAT
This is the format used for storage of sequence and structural information of 
"solved" protein and peptide structures.  The first line of each sequence 
record contains the name of the protein and its species of origin.  The 
subsequent lines mean the following:

    #REFERENCE : Current or reasonably definitive reference to the 
                 structure
    #SEQBANK ID: SEQBANK ID number
    #BRKHAVN ID: Brookhaven Protein Databank ID
    #PIR-NBR ID: PIR accession number
    #SWISPRO ID: SWISS-PROT ID code
    #RESOLUTION: Resolution in Angstroms
    #R FACTOR  : Refinement Factor in Percent
    #FOLD CLASS: protein folding class (B = all beta protein, A = all 
                 helical structure, M = mixed alpha helix/beta strand, 
                 AB = alpha-beta barrel, CB, CA and CM = cysteine-rich 
                 beta, alpha and mixed structures).
    #NUM RESIDU: number of amino acids


The remaining part of the file contains the complete sequence of the protein 
(in single letter amino acid code) and its secondary structure assignment as 
determined by X-ray crystallography or NMR.  Note that in these secondary 
structure records the following convention is used: H=helix, B=beta strand 
and C=coil.


>ACTIN (RABBIT SKELETAL)
#REFERENCE : KABSCH, W. ET AL., NATURE 347:37-44 (1990)
#REFERENCE : FLAHERTY, K.M. ET AL., PNAS 88:5041-5045 (1991)
#SEQBANK ID: 1
#BRKHAVN ID:
#PIR-NBR ID: ATRB
#SWISPRO ID: ACTS$RABIT
#RESOLUTION: 2.8
#R FACTOR  : 23.8
#FOLD CLASS: M
#NUM RESIDU: 375

DEDETTALVC DNGSGLVKAG FAGDDAPRAV FPSIVGRPRH QGVMVGMGQK
CCCCCCBBBB BBBCCBBBBB BBCCCCCCBB BBCCBBBBCC CCCCCCCCCC

DSYVGDEAQS KRGILTLKYP IEHGIITNWD DMEKIWHHTF YNELRVAPEE
CBBBCHHHHH HCCBBBBBCC BBBCBBBCCH HHHHHHHHHH HCCCCCCCCC

HPTLLTEAPL NPKANREKTM QIMFETFNVP AMYVAIQAVL SLYASGRTTG
CCBBBBBCHH HHHHHHHHHH HHHHHCCCCC BBBBBBCHHH HHHHCCCCBB

IVLDSGDGVT HNVPIYEGYA LPHAIMRLDL AGRDLTDYLM KILTERGYSF
BBBBCCCCBB BBBBBBCCBB BCCBBBBBCC CHHHHHHHHH HHHHHHCCCC

VTTAEREIVR DIKEKLCYVA LDFENAMATA ASSSSLEKSY ELPDGQVITI
CCHHHHHHHH HHHHHHCCCC CHHHHHHHHH HCCCCCCBBB BBCCCCBBBB

GNERFRCPET LFQPSFIGME SAGIHETTYN SIMKCDIDIR KDLYANNVMS
CCHHHHHHHH HHHCCCCCCC CCHHHHHHHH HHHHCCCHHH HHHHCCBBBB

GGTTMYPGIA DRMQKEITAL APSTMKIKII APPERKYSVW IGGSILASLS
CCCCCCCCHH HHHHHHHHHH HCCCCCBBBB CCHHHHHHHH HHHHHHHHCC

TFQQMWITKQ EYDEAGPSIV HRKCF
HHHHHCCCCH HHHHHCCHHH HHHCC
~
~
~
~
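
The record layout described above (a ">" name line, "#" annotation lines, 
then alternating sequence and structure lines) can be read with a short 
script.  The Python sketch below is an illustration only and is not taken 
from SEQSEE; the function name is hypothetical, and the sample is deliberately 
cut down to the first 100 residues of the actin record, so the "#NUM RESIDU" 
line has been left out.

```python
from collections import Counter


def parse_seqbank_record(text):
    """Split one SEQBANK-style record into name, fields, sequence and
    secondary structure string."""
    name, fields, body = None, {}, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line == "~":
            continue
        if line.startswith(">"):
            name = line[1:].strip()
        elif line.startswith("#"):
            # Split at the first colon: "#FOLD CLASS: M" -> "FOLD CLASS", "M"
            key, _, value = line[1:].partition(":")
            fields.setdefault(key.strip(), []).append(value.strip())
        else:
            body.append(line.replace(" ", ""))   # drop the 10-residue spacing
    sequence = "".join(body[0::2])    # 1st, 3rd, ... lines hold residues
    structure = "".join(body[1::2])   # 2nd, 4th, ... lines hold H/B/C codes
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure lengths differ")
    return name, fields, sequence, structure


SAMPLE = """\
>ACTIN (RABBIT SKELETAL)
#REFERENCE : KABSCH, W. ET AL., NATURE 347:37-44 (1990)
#RESOLUTION: 2.8
#FOLD CLASS: M

DEDETTALVC DNGSGLVKAG FAGDDAPRAV FPSIVGRPRH QGVMVGMGQK
CCCCCCBBBB BBBCCBBBBB BBCCCCCCBB BBCCBBBBCC CCCCCCCCCC

DSYVGDEAQS KRGILTLKYP IEHGIITNWD DMEKIWHHTF YNELRVAPEE
CBBBCHHHHH HCCBBBBBCC BBBCBBBCCH HHHHHHHHHH HCCCCCCCCC
"""

name, fields, sequence, structure = parse_seqbank_record(SAMPLE)
tally = Counter(structure)
print(name, len(sequence), dict(tally))
```

Counting the H, B and C codes in this way over every record is one route to 
composition figures of the kind quoted for SEQBANK in the database section.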

e) THE SEQSITE FORMAT
This is the format used for the SEQSITE file.  This file contains a fairly 
complete listing of short sequence motifs and their putative functions.  A 
reference is provided to permit a follow-up of the motif's suspected function.  
The same conventions regarding wildcard characters, end-of-sequence 
characters and so on are used in this file as in the Pattern Search function. 


LIBRARY OF SEQUENCE MOTIFS

>*[KRH][DEN]EL$
 SMITH M.J. ET AL., EMBO J. 8:3581-3586 (1989)
 ENDOPLASMIC RETICULUM DIRECTING SEQUENCE

>*RGD*
 RUOSLAHTI E. ET AL., CELL 44:517-518 (1986)
 FIBRONECTIN ADHESION SITE

>*CDPGYIGSR*
 GRAF, J. ET AL., CELL 48:989-996 (1987)
 MAMMAL LAMININ DOMAIN III B1 CHAIN CELL ATTACHMENT SITE

>*[DE][DE]*SG*G*
 BOURDON M.A. ET AL., PNAS 84:3194-3198 (1987)
 GLYCOSAMINOGLYCAN BINDING SITE

>*[DE][DE]**SG*G*
 BOURDON M.A. ET AL., PNAS 84:3194-3198 (1987)
 GLYCOSAMINOGLYCAN BINDING SITE

>*[DE]*[DE]*SG*G*
 BOURDON M.A. ET AL., PNAS 84:3194-3198 (1987)
 GLYCOSAMINOGLYCAN BINDING SITE
~
~
~
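
Each SEQSITE record therefore consists of a ">" line holding the pattern, 
followed by a reference line and a description of the putative function.  A 
minimal Python sketch for reading such records is given below.  It is not 
SEQSEE code, the function and key names are our own, and it makes no attempt 
to interpret the wildcard characters (for those conventions see the Pattern 
Search function).

```python
def parse_seqsite(text):
    """Return a list of motif records, each a dictionary with the keys
    "pattern", "reference" and "function"."""
    motifs = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line == "~":
            continue
        if line.startswith(">"):
            current = {"pattern": line[1:], "reference": "", "function": ""}
            motifs.append(current)
        elif current is not None:
            if not current["reference"]:
                current["reference"] = line        # first line after ">"
            else:
                # Any further lines belong to the function description.
                current["function"] = (current["function"] + " " + line).strip()
        # Lines before the first ">" (such as the library title) are ignored.
    return motifs


SAMPLE = """\
LIBRARY OF SEQUENCE MOTIFS

>*[KRH][DEN]EL$
 SMITH M.J. ET AL., EMBO J. 8:3581-3586 (1989)
 ENDOPLASMIC RETICULUM DIRECTING SEQUENCE

>*[DE][DE]*SG*G*
 BOURDON M.A. ET AL., PNAS 84:3194-3198 (1987)
 GLYCOSAMINOGLYCAN BINDING SITE
"""

motifs = parse_seqsite(SAMPLE)
for motif in motifs:
    print(motif["pattern"], "->", motif["function"])
```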

Other file structures exist in SEQSEE but those presented above represent the 
most important or the most commonly encountered record types.  Please feel 
free to browse through the other databases and library files -- but try to 
avoid altering their contents in any substantial way.  If you do find an error 
(either in content or in structure), please try to notify us as soon as possible.  
We will try to make the corrections in time for the next release of the 
program.


XIII. MANIPULATING AND EDITING FILES ON IRIS AND SUN WORKSTATIONS

IRIS and SUN Workstations operate under the UNIX operating system.  This 
particular operating system is fast becoming an industry standard because of 
its extensive support and the fact that it can be customized to suit the needs 
of almost any user or programmer.  Unfortunately, it is NOT the most 
user-friendly of operating systems.  
    The UNIX operating system is based on a file or directory hierarchy 
which essentially resembles a tree structure.  At the top of the tree is the 
root directory ("/"), which holds main directories such as "/home".  Moving 
up or down the tree is accomplished 
by changing directories (using the "cd" command).  The program SEQSEE and 
its associated subroutines reside in the directory "/home/local/seqsee".  In 
this location SEQSEE is actually accessible to all users from their default 
directory when they initially login.  The SEQSEE program may be started 
simply by typing "seqsee".
    To help the uninitiated with some of the intricacies of the UNIX 
system we present the following brief review of some of the more useful 
commands for directory and file manipulation in this "unified" operating 
environment.  Users familiar with the UNIX operating system should skip 
this section.


a) MOVING AND MAKING DIRECTORIES

cd                    Places user in home directory (typically "/home/usr").
cd ..                 Places user in parent directory (the next highest 
                      directory in the tree).
cd mydir              Changes current directory to "mydir".
cd bigdir/smalldir    Changes or moves user to the directory "smalldir" which 
                      is in "bigdir".
mkdir dir             Creates a new subdirectory called "dir".
pwd                   Print Working Directory -- indicates which directory the 
                      user is in.
ls                    Lists files in current directory.

b) MOVING AND MAKING FILES

vi file1        Creates the file "file1" and enters the user into the 
                vi editor (see later).
cp file1 file2  Copies "file1" to "file2".  A new "file2" is automatically 
                created.
mv file1 file2  Moves (renames) "file1" to "file2".
rm file1        Removes or deletes "file1" from the current directory.

c) VIEWING A FILE

vi file1        View/Create/Visual Edit "file1".
cat file1       Concatenates "file1" to the screen (i.e. lists its contents).
grep word file1 Searches "file1" for lines containing the pattern "word".


d) EDITING COMMANDS FOR THE "vi" EDITOR

The "vi" editor is the UNIX visual editor.  It may be started by simply typing 
"vi filename".  This editor is not particularly sophisticated compared to most 
editors available on even small microcomputers, but it is a universal UNIX 
editor and for that reason it is important to understand its command 
structure and mnemonic devices.  Following is a list of the more useful "vi" 
commands:

h        moves cursor left
j        moves cursor down
k        moves cursor up
l        moves cursor right
20+      moves cursor forward 20 lines
20-      moves cursor backwards 20 lines
G        moves cursor to end of file
^g       displays line number where cursor is placed
:n       moves cursor to line number "n"
/word/   searches for the next occurrence of the character string "word"
x        deletes character where cursor is placed
25dd     deletes 25 lines starting with current line (where the cursor is 
         located)
rp       replaces current character with the letter "p"
u        undoes previous editor command
.        repeats last edit command
i        enters into insert mode
esc      exits insert mode (where esc is the escape key)
:q!      quits editing, does not save changes
:wq      saves changes and quits editing
:q       quits editing if no changes made

    *Note that SEQSEE is constructed so that when analyses are completed, 
the program automatically prints the results to the screen while 
simultaneously putting the user into the "vi" editor.  In this way the user can 
manipulate the files in any way he or she wishes.  In most cases the user 
will only want to inspect the files and this may be done simply by scrolling 
through the output with the cursor control keys (hjkl).  The output or 
"results" file can be exited simply by typing ":q" or ":wq" which will then 
return the user to the next menu in SEQSEE.


XIV. PRINTING FILES FROM SEQSEE

Nearly all results produced from a SEQSEE sequence analysis are saved to a 
user-designated file.  These files may be edited either within SEQSEE or 
outside the program using the "vi" editor.  Printing files to a printer is a 
very system-dependent operation; if the user is unsure of how to produce a 
hardcopy output from their terminal, they should consult their 
system manager or local computer "expert" for more details.


XV. TROUBLE SHOOTING


A. QUESTIONS AND ANSWERS ABOUT SEQSEE


Q.  I have logged into my account and wish to use SEQSEE. The system 
    administrator has assured me that SEQSEE is installed and that I have 
    full access to it.  Tell me what steps I should take to most effectively 
    use this program.

A.  1) Although there are no absolute windowing requirements for 
    SEQSEE, it is recommended that your window be at least 80 characters 
    wide and 40 or more lines in length.  This will permit easy viewing of 
    analytical output, help files and function menus.  Having more than 
    one window on the screen will also allow you to look at intermediate 
    results while the program is running in another window.  Therefore 
    we strongly suggest that a "two-window" environment be used.

    2) Create your own directory for running SEQSEE and make this your 
    current directory (use the commands: mkdir seqsee; cd seqsee).  This 
    will help in the organization of your input files and results.


    3) Copy the control file "seqsee.parms" into the "seqsee" directory you 
    have just created.  Your system administrator should be able to tell 
    you where he/she has placed this file on the system.  Typically the 
    command to perform this operation is:
        

                 cp /usr/local/seqsee/seqsee.parms .

    4) If you already have sequence files, it is wise to copy them into your 
    "seqsee" directory as well.  Try to ensure that they are in the proper 
    format (see the sections on SEQFILE formats).  Note that you can 
    always use SEQSEE to create new sequence files which conform to the 
    SEQFILE format.

    5) Once you have completed all of these operations you are ready to use 
    SEQSEE.


Q.  I have typed "seqsee" and I don't get the main menu.  What's wrong?

A.  1) Have you typed "seqsee" correctly? (remember S-E-Q-S-E-E)
    2) Have all of the installation programs been run successfully?
    3) You might be in the wrong directory.  Check for a program called 
    "seqsee" in either your current directory or in some public place on 
    your system.  If you can't find it, ask your system administrator 
    where "seqsee" is supposed to reside.
    4) Check for the control file "seqsee.parms" either in your current 
    directory or in some public place on your system.  Check for any 
    possible corruptions to "seqsee.parms".


Q.  What exactly is the function of the file "seqsee.parms"?

A.  The file "seqsee.parms" contains all of the default parameters that 
    SEQSEE needs to run properly.  When it is run, SEQSEE will first check 
    your current directory to see if you have a "seqsee.parms" file.  If not, 
    it will then use the default "seqsee.parms" which the installation 
    program had previously created.  The control file should be relatively 
    self-explanatory and is also well documented in the manual. 


Q.   What are the most common items that could be changed in the 
    "seqsee.parms" file?

A.  There are many different sets of parameters ranging from 
    hydrophobicity values to similarity matrices which you may wish to 
    experiment with by changing their default values in the 
    "seqsee.parms" file.  Before doing so, however, we recommend 
    that you read up on the section regarding databases and library files.  
    Be aware that if you change similarity matrices (such as changing the 
    "wt.rbo" matrix to the "wt.dayhoff" matrix) you will also have to 
    change other parameters such as "gap penalty" and "gap size penalty".  
    There are also several "print" flags in the "seqsee.parms" file.  These 
    can be turned on or off depending on whether you want terse or 
    verbose output.  Many of the options in the "seqsee.parms" file are 
    strictly for the programmer or for those users who already have an 
    in-depth knowledge of how the algorithms work.


Q.  While running SEQSEE, the screen was cleared and I was placed in 
    some kind of editor.  How do I get out of this mode?

A.  SEQSEE uses the "vi" editor whenever it has results to show to the 
    user.  To exit this editing mode, type ":q" (or ":q!" if you have made 
    changes that you do not wish to keep) to exit without saving, or type 
    ":wq" to exit with all changes saved.


Q.  Is there some way to turn this fullscreen editing feature off?

A.  Yes.  Some people may not like this feature, others may be on 
    terminals which do not support "vi" and they would much prefer to 
    use the commands "more" or "cat" to view their results.  Either way, 
    you can turn off "vi" by changing the "vi" flag from 1 to 0 in the 
    "seqsee.parms" file.


Q.  What should I do if I want to get out of something that I mistakenly 
    got into?  For example, I am doing an exhaustive alignment search and 
    I realize I am using the wrong sequence as a query.

A.    The easiest way to get out of a predicament is to press the "control" 
    and "c" keys simultaneously.  This will kill the operation and take you 
    back to the main SEQSEE menu.  Another more drastic method of 
    terminating an operation is via the UNIX "kill" and "ps" commands (see 
    your UNIX manual for details).  When taking this form of action there 
    will likely be one or more temporary files created which should be 
    removed as soon as conveniently possible (these files typically contain 
    multi-digit numbers and a ".tmp" suffix).


Q.    Is there some way for me to check on the progress of a particular 
    search (or alignment) without having to wait for the search to end?

A.    Yes.  Most modules in SEQSEE keep intermediate results, especially 
    those functions which can take a very long time to run.  These results 
    are stored in a continually updated file appended with the suffix 
    ".tmp" or ".tmp.ids" (e.g. 653120924.tmp).  While SEQSEE is running in 
    one window, you may go to another window and type:  "more  *.tmp" 
    to see these intermediate results.
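
    If several searches have left temporary files behind, it can be 
    awkward to tell which one is current.  The following Python sketch 
    (our own illustration, not a SEQSEE utility; the function names are 
    hypothetical) lists the ".tmp" and ".tmp.ids" files in a directory, 
    newest first, and shows the last few lines of the newest one.

```python
import glob
import os


def latest_tmp_files(directory="."):
    """Return the .tmp and .tmp.ids files in a directory, newest first."""
    paths = (glob.glob(os.path.join(directory, "*.tmp")) +
             glob.glob(os.path.join(directory, "*.tmp.ids")))
    return sorted(paths, key=os.path.getmtime, reverse=True)


def tail(path, n=20):
    """Return the last n lines of a text file as a list of strings."""
    with open(path, errors="replace") as handle:
        return handle.readlines()[-n:]


files = latest_tmp_files(".")
for path in files:
    print(path)
if files:
    # Show the end of the most recently updated intermediate file.
    print("".join(tail(files[0])), end="")
```

    The script only reads the files, so it may safely be run in a second 
    window while the search is still in progress.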


Q.    When I save my search results in file "X", I also have a file in my 
    directory called "X.ids".  What is the purpose of this file?

A.    This "X.ids" file contains only the ID codes and protein names from the 
    results file.  Some of these results files can get pretty big and so, to 
    save you some time, SEQSEE provides a truncated listing of this file.  
    This "X.ids" file may be particularly useful if you are only interested in 
    viewing the names of proteins (as opposed to complete alignments) 
    which appeared in a particular search.


Q.    Now that I have my results, how do I print them out?

A.    The standard UNIX command to print a text file is "lpr <filename>".  
    However you should check with your system administrator to be sure 
    you know how to print text files.  Many facilities have their own 
    printing macros or are connected to certain specialized printers or 
    plotters which may require very specific commands.


Q.    Can I run SEQSEE in the background?

A.    Yes.  Running SEQSEE in the background allows you to start a search 
    and to continue that search after you have logged out or while other 
    users are logged in.  Once you have decided your search is running 
    properly and you wish to put the search into the background, press 
    the "control" and "z" keys simultaneously to temporarily stop the job.  
    Then type "bg".  This command restarts the program and sets it 
    running in the background.  You are now safe to log out and go home.


Q.    Can I change the priority at which SEQSEE is running?  For example, I 
    want my exhaustive alignment job to run only if no one else needs the 
    computer.

A.    Yes.  However, you can only change the priority after the job starts 
    running.  If you start an exhaustive alignment and wish to lower its 
    priority, issue the command:

                ps -ux | grep nw_align


    If you are running on a Silicon Graphics machine, use the following:

                ps | grep nw_align

    Then issue the UNIX command "renice 19 PID", where PID is the 
    process ID of the job whose priority you wish to change.  The PID 
    number is in the first column of the plain "ps" output; in the 
    "ps -ux" output it normally follows the user name in the second 
    column.  Please ask your system administrator if you are unsure how 
    this works.
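
    For those who prefer to script this step, the following is a minimal 
    Python sketch of the same operation.  It is not part of SEQSEE; the 
    function name "lower_priority" is hypothetical, and the sketch 
    assumes a UNIX system on which Python's os.setpriority is available.

```python
import os


def lower_priority(pid, niceness=19):
    """Set the nice value of process `pid` and return the new value.

    A niceness of 19 mirrors "renice 19 PID".  Ordinary users can
    normally only lower the priority of their own jobs.
    """
    os.setpriority(os.PRIO_PROCESS, pid, niceness)
    return os.getpriority(os.PRIO_PROCESS, pid)
```

    Here "pid" is the process ID read from the "ps" listing, so a call 
    such as lower_priority(12345) would act on a job whose PID is 12345 
    (an example number only).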

Q.    What does a "core dumped" message mean?

A.    This means that the program has crashed either due to a programming 
    bug or to a boundary limit being exceeded.  This may also happen if 
    the system has run out of "swap space" (See the UNIX system manual 
    for details).  Sometimes a swap space problem will be indicated by an 
    "out of memory" error message as well.


Q.    What can I do if I get a "core dumped" message?

A.    You may do two things.  First, check your SEQSEE control file 
    ("seqsee.parms") to see if any values have been altered or if they 
    differ substantially from the default parameters presented in the 
    manual.  Second, you may try varying your input to see if the problem 
    only occurs with your particular set of data.  If you are the system 
    administrator and have some programming knowledge and you find 
    that none of the above suggestions work, you may wish to re-compile 
    the corrupted module and to attempt to debug the program using 
    "dbx" to identify which line caused the program to crash.

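    To see quickly whether any values in a working control file differ 
    from the defaults, the parameter entries of the two files can be 
    compared.  The following is a minimal Python sketch using the 
    standard "difflib" module.  It is not part of SEQSEE, the function 
    names are hypothetical, and it assumes that an unmodified copy of the 
    control file has been kept for comparison.

```python
import difflib


def parameter_entries(text):
    """Return the parameter lines (those starting with '>>') in file order."""
    return [line.strip() for line in text.splitlines()
            if line.lstrip().startswith(">>")]


def changed_parameters(default_text, current_text):
    """List removed ('-') and added ('+') parameter entries."""
    diff = difflib.unified_diff(
        parameter_entries(default_text),
        parameter_entries(current_text),
        fromfile="default", tofile="current", lineterm="", n=0)
    # Drop the diff headers and keep only the changed entries.
    return [line for line in diff
            if line.startswith(("+", "-"))
            and not line.startswith(("+++", "---"))]
```

    Comments and blank lines are ignored, so only altered, added or 
    missing ">>" entries are reported.  An empty list means the two files 
    carry the same parameter entries in the same order.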

B. SEQSEE CHECKLIST (VERSION 1.2)

In addition to the HELP features offered on-line, here's a list of items that 
should be checked if, for some reason, you have any difficulty in obtaining 
results from SEQSEE.  This little list is not guaranteed to solve all of your 
problems but it should be quite helpful -- especially for first-time users.

1) Have you read the manual?
2) Are you in the right directory? (home/usr/seqsee or some similar 
variation)
3) Have you spelled "seqsee" correctly?
4) Have you pressed the <return> key after entering your response?
5) Have you answered the computer query correctly? (i.e. entered a number 
when a number was requested and a filename when a filename was 
requested)
6) Have you typed "$" to end your sequence entry?
7) Have you typed "quit" to end your filename or pattern entries?
8) Have you typed ":q" or ":wq" or ":q!" to exit the "vi" editor?
9) Are you using the proper "vi" editor commands?
10) Have you checked that your input filename is spelled correctly?
11) Does your input file exist or has it been deleted or placed in another 
directory?
12) Does your sequence contain any unusual or non-standard characters?
13) Is your input sequence file in the standard SEQFILE format?
14) Have you or someone else changed something in the "seqsee.parms" file 
that wasn't supposed to have been changed?
15) Are you in the right program?
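
Item 12 can be checked before a sequence is given to SEQSEE.  The 
following is a minimal Python sketch, not part of SEQSEE.  It treats as 
standard the twenty one-letter codes that appear in the hydrophobic and 
hydrophilic lists of the control file (Appendix 1); whether SEQSEE 
tolerates other letters is not stated here, so that choice is an 
assumption, as is the function name "unusual_characters".

```python
# The twenty one-letter amino acid codes assumed to be standard.
STANDARD_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")


def unusual_characters(sequence):
    """Return the sorted list of characters that are not standard residues.

    Whitespace is ignored and lower-case letters are treated as upper-case.
    """
    cleaned = "".join(sequence.split()).upper()
    return sorted(set(cleaned) - STANDARD_RESIDUES)
```

An empty list means that every character in the sequence is one of the 
twenty standard codes.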


C. NOTES FOR THE SYSTEM ADMINISTRATOR/PROGRAMMER
 REGARDING SEQSEE


1) Each function in SEQSEE is its own separate program with its own 
directory and Makefile.  The program called "seqsee" is only a driver 
program which calls other programs and which shows or saves the results.  
The source code for the driver is contained in "init.c", "calc.c" and "main.c".

2) ALEXIS is the program which performs the comprehensive analysis of 
secondary structure.  Like SEQSEE, it is a driver: it calls all the modules 
whose names begin with "a_" in order to compile its results.

3) The source code which was found to be common to most modules was 
placed in a directory called "libc".  UNIX dependent routines are found in 
"libc/unix.lib.c".  Most modules will compile independently of UNIX if the call 
to "get_date" is taken out.

4) The following naming conventions were adopted for the source code 
within each of the modules:

    1) *.h - global variables for the module.
    2) main.c - main program for the given function.
    3) init.c - routine to read in parameters from "seqsee.parms".
    4) menu.c - routine to produce menu I/O specific to the function.
    5) calc.c - routine to perform the calculations specific to the function.
    6) print.c - routines which handle the function output.
    7) dbase.c - routines to read local databases.


5) The following naming conventions were adopted for the non-source code 
files within the modules:

    1) test.run - sample input data (i.e. function < test.run)
    2) output - output from "test.run"
    3) output.ids - terse version of the output
    4) seqsee.parms - parameters required by the function
    5) *.seq - input sequence file
    6) wt.* - similarity scoring matrix


6) Most of the important boundary limitations for any particular algorithm 
can be found in the ".h" file of the corresponding directory.  For example 
there is a limit to the size of an input sequence (2000 residues).  It should be 
a fairly simple matter to change a boundary and then to type "make" to re-
compile that particular module.

7) Each source code directory has its own "seqsee.parms" file for testing 
purposes.

8) Most of the source code should be fairly straightforward to read and/or 
understand.  The one exception appears to be the "align.c" routine.  In 
attempting to make this algorithm as efficient as possible, we ended up 
sacrificing some of its programming clarity.


Recommended Readings


Fasman, G.D. (ed.) "Prediction of Protein Structure and the Principles of 
Protein Conformation". New York (Plenum), 1989.
The most comprehensive treatise on protein structure prediction available.  
Filled with dozens of contributions and reviews from many of the foremost 
experts in the field.  An excellent introduction to the subject.  Highly 
recommended for both novice and expert alike.


Doolittle, R.F. (ed.) "Molecular Evolution: Computer Analysis of Protein 
and Nucleic Acid Sequences". Methods in Enzymology, Vol. 183, 1990.
A fine complement to Fasman's work.  This is an equally comprehensive 
review with in-depth descriptions and useful assessments of numerous 
sequence analysis algorithms and programs.  Provides good summaries of 
how the field has developed and where the field is likely to go.  An 
excellent source-book for methods and ideas for sequence alignment, 
sequence assessment and cladistic deconvolution.


Doolittle, R.F., "Of URFs and ORFs: A Primer of How to Analyze Derived Amino 
Acid Sequences". California (University Science Books), 1987
A gem of a book.  One of the easiest to understand "how-to" references you 
can find.  Among the most informative texts on the subject of protein 
sequence analysis.  Get it before it goes out of print.


Gribskov, M.R. and Devereux, J. (ed.) "Sequence Analysis Primer", New York 
(W.H. Freeman and Co.) 1992.
An excellent, up-to-date account of both DNA and protein sequence analysis.  
It is filled with hundreds of illustrations and dozens of "real-life" examples.  
It also provides a very useful appendix with information on software, 
databases, terminology and extensive references.  This book should be in 
everyone's personal library.


Schulz, G.E., A Critical Evaluation of Methods for Prediction of Protein 
Secondary Structures, Ann. Rev. Biophys. and Biophys. Chem. 17, 1-22 
(1988).
A very fair-minded critique of secondary structure prediction methods.  
Offers a quick and easy-to-read introduction to the potential applications 
and probable short-comings of protein structure prediction.


Taylor, W.R., Pattern Matching Methods in Protein Sequence Comparison and 
Structure Prediction, Protein Eng. 2, 77-86 (1988).
An extremely informative review of the field as it stood in 1988.  Well 
written and easy to read.  Offers some excellent insights into a number of 
newer (and older) methods in protein sequence analysis.  Highly 
recommended.


General References


Altschul, S.F., Gish, W., Miller, W., Myers, E.W., & Lipman, D.J., Basic Local 
Alignment Search Tool, J. Mol. Biol. 215, 403-410 (1990).

Barton, G.J. & Sternberg, M.J.E., A Strategy for the Rapid Multiple Alignment 
of Protein Sequences, J. Mol. Biol. 198, 327-337 (1987).

Chiche, L., Gregoret, L.M., Cohen, F.E. & Kollman, P.A., Protein Model Structure 
Evaluation Using the Solvation Free Energy of Folding, Proc. Natl. Acad. Sci. 
(USA) 87, 3240-3243 (1990).

Chothia, C., Structural Invariants in Protein Folding, Nature 254, 304-308 
(1975).

Chothia, C., The Nature of the Accessible and Buried Surfaces in Proteins, J. 
Mol. Biol. 105, 1-14 (1976).

Chou, K.-C. and Zhang, C.-T., A Correlation-Coefficient Method to Predicting 
Protein-Structural Classes from Amino Acid Compositions, Eur. J. Biochem. 
207, 429-433 (1992)

Chou, P.Y. & Fasman, G.D., Empirical Predictions of Protein Conformation, Ann. 
Rev. Biochem. 47, 251-276 (1978).

Chou, P.Y. & Fasman, G.D., Prediction of Protein Conformation, Biochemistry 
13, 222-245 (1974).

Cornette, J.L., Cease, K.B., Margalit, H., Spouge, J.L., Berzofsky, J.A. & DeLisi, C. 
Hydrophobicity Scales and Computational Techniques for Detecting 
Amphipathic Structure in Proteins, J. Mol. Biol. 195, 659-685 (1987).

Creighton, T.E., "Proteins: Structures and Molecular Properties", W.H. 
Freeman, New York (1984).

Dayhoff, M.O., Barker, W.C. & Hunt, L.T., Establishing Homologies in Protein 
Sequences, Methods in Enzymology  91, 524-545 (1983)

Dayhoff, M.O., Schwartz, R.M. & Orcutt, B.C., A Model of Evolutionary Change 
in Proteins, Atlas of Protein Structure 5 (Suppl. 3) 345-352 (1979).

Eisenberg, D., Weiss, R.M. & Terwilliger, R.C., The Hydrophobic Moment 
Detects Periodicity in Protein Hydrophobicity, Proc. Nat. Acad. Sci. (USA) 81, 
140-144 (1984).

Fasman, G.D. & Gilbert, W.A., The Prediction of Transmembrane Protein 
Sequences and Their Conformation: an Evaluation, TIBS 15, 89-92 (1990).

Fisher, H.F., A Limiting Law Relating the Size and Shape of Protein Molecules 
to Their Composition, Proc. Natl. Acad. Sci. (USA) 51, 1285-1290 (1964).

Garnier, J., Osguthorpe, D.J. & Robson, B., Analysis of the Accuracy and 
Implementation of Simple Methods for Predicting the Secondary Structure of 
Globular Proteins, J. Mol. Biol. 120, 97-120 (1978).

Gibrat, J.F., Garnier, J. & Robson, B., Further Development of Protein 
Secondary Structure Prediction Using Information Theory, J. Mol. Biol. 198, 
425-443 (1987).

Gribskov, M., McLachlan, A.D. & Eisenberg, D., Profile Analysis: Detection of 
Distantly Related Proteins, Proc. Nat. Acad. Sci. (USA) 84, 4355-4358 (1987).

Janin, J., Surface and Inside Volumes in Globular Proteins, Nature 277, 491-
493 (1979).

Karplus, P.A. & Schulz, G.E., Prediction of Chain Flexibility in Proteins, 
Naturwissenschaften 72, 212-213 (1985).

Klein, P., Kanehisa, M. & DeLisi, C., The Detection and Classification of 
Membrane-Spanning Proteins, Biochim. Biophys. Acta 815, 468-476 (1985).

Kyte, J. & Doolittle, R.F., A Simple Method for Displaying the Hydropathic 
Character of a Protein, J. Mol. Biol. 157, 105-132 (1982).

Lesk, A.M., Levitt, M. & Chothia, C., Alignment of the Amino Acid Sequences 
of Distantly Related Proteins Using Variable Gap Penalties, Protein Eng. 1, 77-
78 (1986).

Levin, J.M. & Garnier, J., Improvements in a Secondary Structure Method 
Based on a Search for Local Sequence Homologies and its use as a Model 
Building Tool, Biochim. Biophys. Acta 955, 283-295 (1988).

Levin, J.M., Robson, B. & Garnier, J., An Algorithm for Secondary Structure 
Determination in Proteins Based on Sequence Similarity, FEBS Lett. 205, 303-
308 (1986).

Lipman, D.J. & Pearson, W.R., Rapid and Sensitive Protein Similarity Searches, 
Science, 227, 1435-1441 (1985).

McLachlan, A.D., Tests for Comparing Related Amino-acid Sequences: 
Cytochrome C & Cytochrome C551, J. Mol. Biol. 61, 409-423 (1971).

Miller, S., Janin, J., Lesk, A.M. & Chothia, C., Interior and Surface of Monomeric 
Proteins, J. Mol. Biol. 196, 641-656 (1987).

Needleman, S.B. & Wunsch, C.D., A General Method Applicable to the Search 
for Similarities in the Amino Acid Sequence of Two Proteins, J. Mol. Biol. 48, 
443-453 (1970).

Nishikawa, K. & Ooi, T., Amino Acid Sequence Homology Applied to Protein 
Secondary Structures and Joint Prediction with Existing Methods, Biochim. 
Biophys. Acta 871, 45-54 (1986).

Parker, J.M.R., Guo, D., & Hodges, R.S., New Hydrophobicity Scale Derived from 
HPLC Peptide Retention Data, Biochemistry 25, 5425-5431 (1986).

Pearson, W.R. & Lipman D.J., Improved Tools for Biological Sequence 
Comparison, Proc. Nat. Acad. Sci. (USA) 85, 2444-2448 (1988).

Richards, F.M., Areas, Volumes, Packing and Protein Structure, Ann. Rev. 
Biophys. Bioeng. 6, 151-175 (1977).

Rooman, M.J. & Wodak, S.J., Identification of Predictive Sequence Motifs 
Limited by Protein Structure Database Size, Nature 335, 45-49 (1988).

Rooman, M.J. & Wodak, S.J., Weak Correlation Between Predictive Power of 
Individual Sequence Patterns and Overall Prediction Accuracy in Proteins, 
Proteins: Struct. Func. Gen. 9, 68-78 (1991).

Rooman, M.J., Rodriguez, J. & Wodak, S.J., Relations Between Protein Sequence 
and Structure and Their Significance, J. Mol. Biol. 213, 337-350 (1990).

Schwartz, R.M. & Dayhoff, M.O., Matrices for Detecting Distant Relationships, 
Atlas of Protein Structure 5 (Suppl. 3) 353-358 (1979).

Sonnichsen, F.D., Sykes, B.D., Chao, H. & Davies, P.L., The Nonhelical Structure 
of Antifreeze Protein Type III. Science 259, 1154-1157 (1993).

Sweet, R.M., Evolutionary Similarity Among Peptide Segments is a Basis for 
Predicting Protein Folding, Biopolymers 25, 1566-1577 (1986).

Upton, C., Mossman, K. & McFadden, G., Encoding of a Homolog of the IFN-g 
Receptor by Myxoma Virus. Science 258, 1369-1372 (1992).

Upton, C., Stuart, D. & McFadden, G., Identification of a Pox Virus Gene 
Encoding a Uracil DNA Glycosylase. Proc. Natl. Acad. Sci. USA (in press).

Williams, R.W., Chang, A., Juretic, D. & Loughram, S., Secondary Structure 
Predictions and Medium Range Interactions, Biochim. Biophys. Acta 916, 
200-204 (1987).

Zamyatnin, A.A., Protein Volume in Solution, Prog. Biophys. Mol. Biol. 24, 
107-123 (1972).


APPENDIX 1
THE SEQSEE CONTROL FILE

In attempting to provide the user with as much operational flexibility as 
possible we have chosen to make the SEQSEE control file completely "user" 
accessible.  The control file contains default values of all the library 
filenames, parameters, penalties, matrices and other variables which are 
called whenever a function on SEQSEE is implemented.  By allowing free 
access to the control file we hope that the user will find it conducive to 
"experimenting" with different alignment matrices, hydrophobicity scales or 
sequence patterns to discover what values best suit his or her needs.  The 
control file may be accessed and altered through the "File Viewer" command 
while in SEQSEE or it may be altered outside SEQSEE by editing the file 
named "seqsee.parms" in the directory "/usr/local/seqsee".  A complete 
listing of the SEQSEE control file and all of its parameter options is provided 
below. 
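
The layout of the control file lends itself to simple scripted checks.  
The following is a minimal Python sketch, not part of SEQSEE, which reads 
the text of a control file and groups its ">>" entries in file order.  It 
assumes, from the listing in this appendix, that a row of five or more 
asterisks separates the sections and that the first entry of each section 
is its ID code; the function name "read_control_file" is hypothetical.

```python
def read_control_file(text):
    """Split seqsee.parms-style text into sections of ordered entries.

    Returns a list of (id_code, entries) pairs.  `entries` holds the raw
    text that follows each ">>" marker, in file order, including any
    trailing comment on the same line.
    """
    sections = []
    current = []

    def close_section():
        if current:
            tokens = current[0].split()
            id_code = tokens[0] if tokens else ""
            sections.append((id_code, current[1:]))

    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("*****"):
            # A row of asterisks ends the current section.
            close_section()
            current = []
        elif stripped.startswith(">>"):
            current.append(stripped[2:].strip())
        # Any other line is a comment or a blank line and is ignored.
    close_section()
    return sections
```

Because the order of the entries is significant, the sketch keeps each 
section as an ordered list rather than looking entries up by name.  It 
does not try to separate a value from a comment that follows it on the 
same line, since the listing does not mark that boundary consistently.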

        **** Parameter List for SEQSEE ****

Users should feel free to copy this file to their own directory and make any 
changes they feel appropriate.  Parameter entries are preceded by 2 
consecutive angle brackets; the order of the parameters must be maintained!  
Comments and blank lines can be placed anywhere.

*****************************************************************

Id code for main seqsee driver.
>> SEQSEE_V1.2

Location of programs that the seqsee driver will be calling
>> /canopus/rbo/seqsee/seqhelp/seqhelp
>> /canopus/rbo/seqsee/seqed/seqed
>> /canopus/rbo/seqsee/seqret/seqret
>> /canopus/rbo/seqsee/stats/stats
>> /canopus/rbo/seqsee/alexis/alexis
>> /canopus/rbo/seqsee/seqsearch/seqsearch
>> /canopus/rbo/seqsee/fleqsee/fleqsee
>> /canopus/rbo/seqsee/moment/moment
>> /canopus/rbo/seqsee/hydro/hydro
>> /canopus/rbo/seqsee/fast_align/fast_align
>> /canopus/rbo/seqsee/sb_align/sb_align
>> /canopus/rbo/seqsee/nw_align/nw_align
>> /canopus/rbo/seqsee/mult_align/mult_align
>> /canopus/rbo/seqsee/psearch/psearch
>> /canopus/rbo/seqsee/hsearch/hsearch
>> /canopus/rbo/seqsee/dotplot/dotplot
>> /canopus/rbo/seqsee/refscan/refscan
>> /canopus/rbo/seqsee/browse/browse

Automatically enter vi editor when results found (1=yes, 0=no).
>> 1

*****************************************************************
Id code for help function. Do not change this line.
>> HELP

Number of help files
>> 9

Location of each of the help files
>> /canopus/rbo/seqsee/docs/help.authors
>> /canopus/rbo/seqsee/docs/help.intro
>> /canopus/rbo/seqsee/docs/help.recom
>> /canopus/rbo/seqsee/docs/help.menu.brief
>> /canopus/rbo/seqsee/docs/help.menu.details
>> /canopus/rbo/seqsee/docs/help.tutorial
>> /canopus/rbo/seqsee/docs/help.ques
>> /canopus/rbo/seqsee/docs/help.seqfile
>> /canopus/rbo/seqsee/docs/help.aa.info

*****************************************************************
Id code for seqret. Do not change this line.
>> SEQRET

What format is the sequence database?
>> 4            1 = SWISS-PROT,  2 = PIR,
                3 = SWISS-PROT (intelligenetics version),
                4 = PIR (intelligenetics version)
>> 6             Number of files that compose the database

Location of each sequence database file
>> /canopus/rbo/seqsee/databases/pir.IG/PIR_ANNOTATED1.PDB
>> /canopus/rbo/seqsee/databases/pir.IG/PIR_ANNOTATED2.PDB
>> /canopus/rbo/seqsee/databases/pir.IG/PIR_UNANNOTATED1.PDB
>> /canopus/rbo/seqsee/databases/pir.IG/PIR_UNANNOTATED2.PDB
>> /canopus/rbo/seqsee/databases/pir.IG/PIR_UNREVIEWED1.PDB
>> /canopus/rbo/seqsee/databases/pir.IG/PIR_UNREVIEWED2.PDB

Update output file every 'x' proteins which are processed.
>> 1000

*****************************************************************
Id code for stats function. Do not change this line.
>> STATS

Location of SEQBANK database
>> /canopus/rbo/seqsee/databases/SEQBANK.db

hydrophobicity table
>> /canopus/rbo/seqsee/lib/kyte.parms

These thresholds are dependent on the hydrophobicity table used.
>>  0.10        hydrophobic proteins threshold
>> -6.00        hydrophilic proteins threshold
>>  0.85        protein insoluble threshold
>>  1.90        protein generally does not fold threshold
>>  0.77        protein insoluble threshold
>>  1.43        protein generally does not fold threshold

Hydrophobic Amino Acids
>> ACFGHILMVWY    one letter codes
>> 52.44        average percent of these amino acids in a protein

Hydrophilic Amino Acids
>> DEKNPQRST    one letter codes
>> 47.56        average percent of these amino acids in a protein

molecular weight table
>> /canopus/rbo/seqsee/lib/mol.weights

molecular volume table
>> /canopus/rbo/seqsee/lib/mol.volume

molecular surface area table
>> /canopus/rbo/seqsee/lib/mol.surfarea

molecular partial specific volume table
>> /canopus/rbo/seqsee/lib/mol.parspecvol

molecular polar, nonpolar, surface area table
>> /canopus/rbo/seqsee/lib/mol.asa

molecular fraction buried
>> /canopus/rbo/seqsee/lib/mol.fracbur

fraction of amino acids buried
>> /canopus/rbo/seqsee/lib/fracbur.parms

*****************************************************************
Function code for alexis
>> ALEXIS

Location of programs alexis will be running
>> /canopus/rbo/seqsee/a_membrane/a_membrane
>> /canopus/rbo/seqsee/a_motif/a_motif
>> /canopus/rbo/seqsee/a_homol/a_homol
>> /canopus/rbo/seqsee/a_moment/a_moment
>> /canopus/rbo/seqsee/a_gor/a_gor
>> /canopus/rbo/seqsee/a_cfas/a_cfas

Correlation tables for predicting protein structural classes
>> /canopus/rbo/seqsee/lib/alexis.norm        (for most sequences)
>> /canopus/rbo/seqsee/lib/alexis.cys        (for heavy cys sequences)

Remove intermediate results files?
>> 1            (1=yes, 0=no)

*****************************************************************
Identification code for the following set of parameters.
Do not change this line.
>> A_MEMBRANE

Location of membrane spanning hydrophobicity parms
>> /canopus/rbo/seqsee/lib/membrane.parms

Nature of membrane spanning test (scaling constants)
>> -9.02  170.00    14.27    

*****************************************************************
Identification code for the following set of parameters.
Do not change this line.
>> A_HOMOL

Enter the location of the SEQBANK database.
>> /canopus/rbo/seqsee/databases/SEQBANK.db

Tell program the location of the similarity scoring matrix.
>> /canopus/rbo/seqsee/lib/wt.rbo

Homologous segments must have a certain minimum test stat
before the secondary structure they represent is counted.
>> 3.20

Improve prediction by weighting of scores because of unequal
representation of secondary structure in the database.
>> 1.000        /* betastrand represent 28% of seqbank */
>> 0.820        /* coil represent almost 37% of seqbank */
>> 0.780        /* helix represent almost 35% of seqbank */

Offset and multiplier needed to normalize prediction
scores to mean=1000 and stddev=200.
>> 494.00   0.89

Improve prediction by applying smoothing function
>> 1            /* number of times to apply smoothing function */

Improve prediction by biasing random coils at sequence ends.
>> 1            /* 1=yes, 0=no */

Improve prediction by class weighting.
>> 1            /* 1=yes, 0=no */
>> 1.10         /* beta Scores */
>> 1.30         /* helix Scores */

Improve prediction by smoothing the predicted structure.
>> 1            /* 1=yes, 0=no */

*****************************************************************
Identification code for the following set of parameters
Do not change this line.
>> A_MOMENT

Tell program the location of the chou-fasman parameters.
>> /canopus/rbo/seqsee/lib/moment.cfas

Tell program the location of the hydrophobicity parms which
are biased for BetaStrands.
>> /canopus/rbo/seqsee/lib/moment.bhydro

Tell program the location of the hydrophobicity parms which
are biased for Helices.
>> /canopus/rbo/seqsee/lib/moment.hhydro

Beta Strand Prediction Parameters
>> 7                    /* window size */
>> 1 2 3 4 3 2 1        /* cfas weighting factors */
>> 2                    /* number of periodicity tests */
>> 160 180              /* periodicity angles */

Coil Prediction Parameters
>> 5                    /* window size */
>> 2 3 4 3 2            /* cfas weighting factors */

Helix Prediction Parameters
>> 11                   /* window size */
>> 2 3 3 3 3 3 3 3 3 3 2  /* cfas weighting factors */
>> 2                    /* number of periodicity tests */
>> 100 110              /* periodicity angles */

Offset and multiplier needed to normalize prediction
scores to mean=1000 and stddev=200.
>> 831.00   13.30

Improve prediction by applying smoothing function
>> 1            /* number of times to apply smoothing function */

Improve prediction by biasing random coils at sequence ends.
>> 1            /* 1=yes, 0=no */

Improve prediction by class weighting.
>> 1            /* 1=yes, 0=no */
>> 0.95         /* beta Scores */
>> 1.05         /* helix Scores */

Improve prediction by smoothing the predicted structure.
>> 1            /* 1=yes, 0=no */

*****************************************************************
Identification code for the following set of parameters.
Do not change this line.
>> A_GOR

Location of GOR parms
>> /canopus/rbo/seqsee/lib/gor.data

Offset and multiplier needed to normalize prediction
scores to mean=1000 and stddev=200.
>> 966.0    13.10

Improve prediction by applying smoothing function
>> 0            /* number of times to apply smoothing function */

Improve prediction by biasing random coils at sequence ends.
>> 1            /* 1=yes, 0=no */

Improve prediction by class weighting.
>> 1            /* 1=yes, 0=no */
>> 0.95         /* beta Scores */
>> 1.05         /* helix Scores */

Improve prediction by smoothing the predicted structure.
>> 1            /* 1=yes, 0=no */

*****************************************************************
Identification code for the following set of parameters.
Do not change this line.
>> A_CFAS

Tell program the location of the weighting parameters.
See the default listed here to understand the input format.
>> /canopus/rbo/seqsee/lib/cfas.data

BetaStrand window size
>> 7

Weighting factors within this window for BetaStrand
>> 1 2 3 4 3 2 1

Coil Window Size
>> 5

Weighting factors within this window for Coil
>> 1 2 3 2 1

Helix Window Size
>> 9

Weighting factors within this window for Helix
>> 1 2 3 4 5 4 3 2 1

Offset and multiplier needed to normalize prediction
scores to mean=1000 and stddev=200.
>> 953.0    13.50

Improve prediction by applying smoothing function
>> 1            /* number of times to apply smoothing function */

Improve prediction by biasing random coils at sequence ends.
>> 1            /* 1=yes, 0=no */

Improve prediction by class weighting.
>> 1            /* 1=yes, 0=no */
>> 1.02         /* beta Scores */
>> 1.00         /* helix Scores */

Improve prediction by smoothing the predicted structure.
>> 1            /* 1=yes, 0=no */

*****************************************************************
Function ID code for motif searching program (motifs from literature)
>> LIT_MOTIF

Location of motifs databases
>> /canopus/rbo/seqsee/databases/seqmotif1.db

Printing Parameters
>> 100        Print stats summary every 'x' motifs processed
>> 1            Print individual motifs which match (1=yes, 0=no)

*****************************************************************
Function ID code for motif searching program (computer generated 
dbase)
>> COMP_MOTIF

Location of motifs databases
>> /canopus/rbo/seqsee/databases/seqmotif2.db

Printing Parameters
>> 100        Print stats summary every 'x' motifs processed
>> 1            Print individual motifs which match (1=yes, 0=no)

*****************************************************************
ID code for seqsite function. Do not change this line.
>> SEQSEARCH

Number of seqsite databases
>> 3

Location of seqsite databases
>> /canopus/rbo/seqsee/databases/SEQSITE.db  (general sequence motifs)
>> /canopus/rbo/seqsee/databases/PHOSITE.db  (general phosphorylation sites)
>> /canopus/rbo/seqsee/databases/EPISITE.db  (antigenic sites)

*****************************************************************
Function ID code. Do not change this line.
>> FLEQSEE

Type of output, 0 = weighted scores, 1 = raw scores
>> 1

Location of flexibility parameters
>> /canopus/rbo/seqsee/lib/fleqsee.parms

Manipulating Flexibility Scores
>> 7                Window size
>> 1  2  3  4  3  2  1  Weighting constants based on window size

*****************************************************************
Function ID code. Do not change this line.
>> MOMENT

Type of output, 0 = weighted scores, 1 = raw scores
>> 1

Location of hydrophobicity parameters (hmom.* files)
>> /canopus/rbo/seqsee/lib/hmom.cornet

Nature of periodicity tests
>> 8             number of tests
>> 0     5    0     type(0=beta, 1=coil, 2=helix), window size, periodicity angle
>> 0     5  160
>> 0     5  170
>> 0     5  180
>> 2     9   90
>> 2     9  100
>> 2     9  110
>> 2     9  120

smoothing function to be applied 'x' times
>> 2

*****************************************************************
Function ID code. Do not change this line.
>> HYDRO

Type of output, 0 = weighted scores, 1 = raw scores
>> 1

Location of hydrophobicity parameters (hphob.* files)
>> /canopus/rbo/seqsee/lib/hphob.kyte

Manipulating hydrophobicity scores
>> 7                Window size
>> 1  2  3  4  3  2  1  Weighting constants based on window size

*****************************************************************
Function ID code. Do not change this line.
>> FAST_ALIGN

What format is the sequence database?
>> 4            1 = SWISS-PROT,  2 = PIR,
                3 = SWISS-PROT (intelligenetics version),
                4 = PIR (intelligenetics version)
>> 6             Number of files that compose the database

Location of each sequence database file
>> /canopus/rbo/seqsee/databases/pir.IG/PIR_ANNOTATED1.PDB
>> /canopus/rbo/seqsee/databases/pir.IG/PIR_ANNOTATED2.PDB
>> /canopus/rbo/seqsee/databases/pir.IG/PIR_UNANNOTATED1.PDB
>> /canopus/rbo/seqsee/databases/pir.IG/PIR_UNANNOTATED2.PDB
>> /canopus/rbo/seqsee/databases/pir.IG/PIR_UNREVIEWED1.PDB
>> /canopus/rbo/seqsee/databases/pir.IG/PIR_UNREVIEWED2.PDB

Tell program the location of the similarity scoring matrix.
>> /canopus/rbo/seqsee/lib/wt.align

What minimum value from the similarity scoring matrix would 
constitute a near match?
>> 5

Cut-off score for similar tuples. Note that this score 
depends on the matrix selected above. For example in the
matrix 'wt.align', FYE is similar to YFN if the cutoff score
is 50 or less. 
>> 48

Update output file every 'x' proteins which are processed.
>> 1000

Penalize the alignment score 'x' points every time a gap
needs to be introduced. The value of 'x' depends on the 
similarity scoring matrix, a typical value being the 3rd or
4th highest number in the matrix.
>> 20

Penalize the alignment score 'x' points for each entry in
the gap. This will keep the gap from getting too large.
>> 5

*****************************************************************
ID code for exhaustive alignment on seqsee database.
>> SB_ALIGN

Enter the location of the SEQBANK database.
>> /canopus/rbo/seqsee/databases/SEQBANK.db

Tell program the location of the similarity scoring matrix.
See the default listed here to understand the input format.
>> /canopus/rbo/seqsee/lib/wt.rbo

What minimum value from the similarity scoring matrix would
constitute a near match? 
>> 5

Random number seed used to jumble sequences.
>> 13791

sorting alignment scores
0 = sort by raw score (tends to overlook smaller sequences)
1 = sort by raw score / sequence len (fast, generally more accurate)
2 = sort by jumbling (very slow but most accurate)
>> 1

These parameters are only used if sort by jumbling option chosen.
Number of jumbles based on current test stat. (6 entries only!)
(eg, if after 18 jumbles the test stat exceeds 2 std dev, keep going).
   jumbles    std dev
>>       3         0.00
>>       8         1.00
>>      18         2.00
>>      50         3.00
>>     150         4.00
>>     500      9999.00 (this tstat value is ignored here)

Update output file every 'x' proteins processed.
>> 10

Penalize the alignment score 'x' points every time a gap
needs to be introduced. The value of 'x' depends on the
similarity scoring matrix, a typical value being the 3rd or
4th highest number in the matrix.
>> 10

Penalize the alignment score 'x' points for each entry in
the gap. This will keep the gap from getting too large.
>> 2

Penalty for a gap within a random coil region
>> 0

Penalty for a gap at the end of a helix or beta strand structure
>> 1

Penalty for a gap in the middle of a helix or beta strand structure
>> 4

*****************************************************************
Identification code for the following set of parameters.
>> NW_ALIGN

What format is the sequence database?
>> 4            1 = SWISS-PROT,  2 = PIR,
                3 = SWISS-PROT (intelligenetics version),
                4 = PIR (intelligenetics version)
>> 6             Number of files that compose the database

Location of each sequence database file
>> /canopus/rbo/seqsee/databases/pir.IG/PIR_ANNOTATED1.PDB
>> /canopus/rbo/seqsee/databases/pir.IG/PIR_ANNOTATED2.PDB
>> /canopus/rbo/seqsee/databases/pir.IG/PIR_UNANNOTATED1.PDB
>> /canopus/rbo/seqsee/databases/pir.IG/PIR_UNANNOTATED2.PDB
>> /canopus/rbo/seqsee/databases/pir.IG/PIR_UNREVIEWED1.PDB
>> /canopus/rbo/seqsee/databases/pir.IG/PIR_UNREVIEWED2.PDB

Tell program the location of the similarity scoring matrix.
Matrices such as Dayhoff can be used.
See the default listed here to understand the input format.
>> /canopus/rbo/seqsee/lib/wt.rbo

What minimum value from the similarity scoring matrix would 
constitute a near match?
>> 5

Random number seed used to jumble sequences
>> 13791

sorting alignment scores
0 = sort by raw score (tends to overlook smaller sequences)
1 = sort by raw score / sequence len (fast, generally more accurate)
2 = sort by jumbling (very slow but most accurate)
>> 1

These parameters are only used if sort by jumbling option chosen.
Number of jumbles based on current test stat. (6 entries only!)
(eg, if after 18 jumbles the test stat exceeds 2 std dev, keep going).
   jumbles    std dev
>>       3         0.00
>>       8         1.00
>>      18         2.00
>>      50         3.00
>>     150         4.00
>>     500      9999.00 (this tstat value is ignored here)

Update output file every 'x' proteins processed.
>> 50

Penalize the alignment score 'x' points every time a gap
needs to be introduced. The value of 'x' depends on the
similarity scoring matrix, a typical value being the 3rd or
4th highest number in the matrix.
>> 10

Penalize the alignment score 'x' points for each entry in
the gap. This will keep the gap from getting too large.
>> 2

*****************************************************************
ID function code. Do not change this line
>> MULT_ALIGN

Tell program the location of the similarity scoring matrix. 
>> /canopus/rbo/seqsee/lib/wt.rbo

What minimum value from the similarity scoring matrix would 
constitute a near match?
>> 5

Random number seed used to jumble sequences
>> 13791

sorting alignment scores
0 = sort by raw score (tends to overlook smaller sequences)
1 = sort by raw score / sequence len (fast, generally more accurate)
2 = sort by jumbling (very slow but most accurate)
>> 0

These parameters are only used if sort by jumbling option chosen.
Number of jumbles based on current test stat. (6 entries only!)
(eg, if after 18 jumbles the test stat exceeds 2 std dev, keep going).
   jumbles    std dev
>>       3         0.00
>>       8         1.00
>>      18         2.00
>>      18         3.00
>>      18         4.00
>>      18      9999.00 (this tstat value is ignored here)

Print pairwise alignments? (1=yes, 0=no)
>> 1

Consensus percent - Print the amino acid in the consensus sequence
if it is found above the consensus percent threshold.
>> 70

Penalize the alignment score 'x' points every time a gap
needs to be introduced. The value of 'x' depends on the
similarity scoring matrix, a typical value being the 3rd or
4th highest number in the matrix.
>> 10

Penalize the alignment score 'x' points for each entry in
the gap. This will keep the gap from getting too large.
>> 2

*****************************************************************
Identification code for this function. Do not change this line.
>> PSEARCH

What format is the sequence database?
>> 4            1 = SWISS-PROT,  2 = PIR,
                3 = SWISS-PROT (intelligenetics version),
                4 = PIR (intelligenetics version)
>> 6             Number of files that compose the database

Location of each sequence database file
>> /canopus/rbo/seqsee/databases/pir.IG/PIR_ANNOTATED1.PDB
>> /canopus/rbo/seqsee/databases/pir.IG/PIR_ANNOTATED2.PDB
>> /canopus/rbo/seqsee/databases/pir.IG/PIR_UNANNOTATED1.PDB
>> /canopus/rbo/seqsee/databases/pir.IG/PIR_UNANNOTATED2.PDB
>> /canopus/rbo/seqsee/databases/pir.IG/PIR_UNREVIEWED1.PDB
>> /canopus/rbo/seqsee/databases/pir.IG/PIR_UNREVIEWED2.PDB

Location of the structurally-determined database.
>> /canopus/rbo/seqsee/databases/SEQBANK.db

Allow multiple matches for a search string in a sequence
>> 0         1 = yes, 0 = no

Update output file every 'x' proteins which are processed.
>> 1000

*****************************************************************
Identification code for this function. Do not change this line.
>> HSEARCH

What format is the sequence database?
>> 4            1 = SWISS-PROT,  2 = PIR,
                3 = SWISS-PROT (intelligenetics version),
                4 = PIR (intelligenetics version)
>> 6             Number of files that compose the database

Location of each sequence database file
>> /canopus/rbo/seqsee/databases/pir.IG/PIR_ANNOTATED1.PDB
>> /canopus/rbo/seqsee/databases/pir.IG/PIR_ANNOTATED2.PDB
>> /canopus/rbo/seqsee/databases/pir.IG/PIR_UNANNOTATED1.PDB
>> /canopus/rbo/seqsee/databases/pir.IG/PIR_UNANNOTATED2.PDB
>> /canopus/rbo/seqsee/databases/pir.IG/PIR_UNREVIEWED1.PDB
>> /canopus/rbo/seqsee/databases/pir.IG/PIR_UNREVIEWED2.PDB

Location of structurally determined database.
>> /canopus/rbo/seqsee/databases/SEQBANK.db

Tell program the location of the similarity scoring matrix.
Matrices such as Dayhoff can be used.
>> /canopus/rbo/seqsee/lib/wt.align

What minimum value from the similarity scoring matrix would 
constitute a near match?
>> 5

Update output file every 'x' proteins which are processed.
>> 1000

*****************************************************************
Function ID code. Do not change this line.
>> DOTPLOT

What format is the sequence database?
>> 4            1 = SWISS-PROT,  2 = PIR,
                3 = SWISS-PROT (intelligenetics version),
                4 = PIR (intelligenetics version)
>> 6             Number of files that compose the database

Location of each sequence database file
>> /canopus/rbo/seqsee/databases/pir.IG/PIR_ANNOTATED1.PDB
>> /canopus/rbo/seqsee/databases/pir.IG/PIR_ANNOTATED2.PDB
>> /canopus/rbo/seqsee/databases/pir.IG/PIR_UNANNOTATED1.PDB
>> /canopus/rbo/seqsee/databases/pir.IG/PIR_UNANNOTATED2.PDB
>> /canopus/rbo/seqsee/databases/pir.IG/PIR_UNREVIEWED1.PDB
>> /canopus/rbo/seqsee/databases/pir.IG/PIR_UNREVIEWED2.PDB

Tell program the location of the similarity scoring matrix.
See the default listed here to understand the input format.
>> /canopus/rbo/seqsee/lib/wt.align

What minimum value from the similarity scoring matrix would 
constitute a near match?
>> 5

Length Penalty Value: subtract n*lenPenalty from our score where 
'n' is the number of amino acids.
>> 5

Threshold Score (homologous segments must score above)
>> 80

msearchFlag - Does multiple scans down diagonals
Only turn this flag on if database is small. (0 = off, 1 = on)
>> 0

Update output file every 'x' proteins which are processed.
>> 200


*****************************************************************
Identification code for this function. Do not change this line.
>> REFSCAN

What format is the sequence database?
>> 4            1 = SWISS-PROT,  2 = PIR,
                3 = SWISS-PROT (intelligenetics version),
                4 = PIR (intelligenetics version)
>> 6             Number of files that compose the database

Location of each sequence database file
>> /canopus/rbo/seqsee/databases/pir.IG/PIR_ANNOTATED1.PDB
>> /canopus/rbo/seqsee/databases/pir.IG/PIR_ANNOTATED2.PDB
>> /canopus/rbo/seqsee/databases/pir.IG/PIR_UNANNOTATED1.PDB
>> /canopus/rbo/seqsee/databases/pir.IG/PIR_UNANNOTATED2.PDB
>> /canopus/rbo/seqsee/databases/pir.IG/PIR_UNREVIEWED1.PDB
>> /canopus/rbo/seqsee/databases/pir.IG/PIR_UNREVIEWED2.PDB

Update output file every 'x' proteins which are processed.
>> 1000

*****************************************************************
Identification code for browse function. Do not change this line.
>> BROWSE

Location of SEQBANK database
>> /canopus/rbo/seqsee/databases/SEQBANK.db

Location of pirsee databases (Pir Titles + ID codes)
>> /canopus/rbo/seqsee/databases/PIRSEE.db

Location of swissee databases (Swiss-prot Titles + ID codes)
>> /canopus/rbo/seqsee/databases/SWISSEE.db

Location of Default Parameters file for SEQSEE
>> /canopus/rbo/seqsee/seqsee.parms
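
The six-entry jumble table that appears in several of the parameter sets 
above can be read as a stopping schedule: after each listed number of 
jumbles, the test statistic is compared with the std dev value on the same 
row, and jumbling continues to the next row only if the statistic exceeds 
that value.  The Python sketch below shows one possible reading of that rule.  
It is not SEQSEE's own code; the function name jumbles_needed and the 
callback tstat_after are hypothetical.

```python
from typing import Callable, List, Tuple

# Default schedule copied from the control file: (jumbles, std dev threshold).
SCHEDULE: List[Tuple[int, float]] = [
    (3, 0.00), (8, 1.00), (18, 2.00), (50, 3.00), (150, 4.00), (500, 9999.00),
]


def jumbles_needed(tstat_after: Callable[[int], float],
                   schedule: List[Tuple[int, float]] = SCHEDULE) -> int:
    """Return how many jumbles are run before the schedule says stop.

    tstat_after(n) is a hypothetical callback giving the test statistic
    (in standard deviations) once n jumbles have been done.
    """
    done = 0
    for i, (jumbles, threshold) in enumerate(schedule):
        done = jumbles
        is_last = i == len(schedule) - 1
        # The threshold on the last row is ignored: always stop there.
        if is_last or tstat_after(jumbles) <= threshold:
            break
    return done
```

Under this reading, a weak hit whose statistic stays at 0.5 std dev is 
dropped after 8 jumbles, while a hit that stays above 4 std dev runs the 
full 500.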


APPENDIX 2
EXPLANATION OF STATS OUTPUT


The Stats output can be divided into 8 sections describing estimates and 
predictions of molecular weight, amino acid composition, hydrophobicity, pH, 
surface area, volume, aggregation potential, radius and estimated solvation 
free energy of folding.  These are described in more detail below:

1) Molecular Weight Calculations:  These are accurate to 0.1 amu and are 
based on amino acid weights derived from Creighton (1984).  These values 
may be used in mass spectrometry calculations and in calibrating other 
weight-dependent physical methods.

2) Amino Acid Composition Calculations:  The estimated frequencies are 
derived from amino acid frequencies obtained from the SEQBANK database.  
Both the weight percent and numeric percent values may be used to identify 
unusually high or unusually low frequencies of certain types of amino acids.  
This data can be important in understanding certain physical characteristics 
of proteins.

3) Hydrophobicity Calculations:  Averages, percentages and ratios are 
calculated using the Kyte-Doolittle hydrophobicity values (Kyte and Doolittle, 
1982).  These values can be used to estimate the solubility, stability and 
"foldability" of peptides and proteins.  These are estimates and should only 
be considered as potential indicators of the physical properties of a query 
sequence.

4) Charge Calculations:  These are calculated using standard equations for 
charge and pH of linear sequences.  The estimated pI, pH and charge values 
do not take into account the actual tertiary structure of the protein molecule, 
so they will tend to differ slightly from measured values.  The linear charge 
density can be used to estimate the potential solubility of a peptide (or 
protein).  A linear charge density of at least 0.20 is usually required for a 
peptide to be soluble.

5) Surface Area Calculations:  These are estimates based on amino acid 
composition, molecular weight and the assumption that the protein folds into 
a globular shape (Miller et al., 1987).  They are not based on actual three-
dimensional structures.  However, these estimates have been found to be 
quite accurate and can be useful when one wishes to compare "predicted" 
values with "actual" values of known or modelled X-ray structures.  These 
comparisons can be used to assess the quality of a structure or structural 
model.  They may also give some indication of the likelihood that a peptide 
sequence will fold, according to theories developed by Ken Dill and others.

6) Volume Calculations:  These can give some indication of the expected 
compactness of a peptide or protein.  The values given are estimates based 
on amino acid composition and molecular weight.  The estimate of partial 
specific volume may be useful in ultracentrifugation studies (Zamyatnin, 
1972).

7) Solubility and Aggregation Calculations:  These calculations are based on 
relatively simple statistical theories and correlations regarding the 
propensity of some peptides and proteins to fold, to aggregate or to fall out 
of solution (Fisher, 1964).  It is important to note that the predictions are not 
based on actual three-dimensional structures and that there are no 
guarantees on their accuracy.  These predictions can be used to identify 
"problem" peptides and proteins that are about to be synthesized on a 
peptide synthesizer or expressed in bacteria.

8) Radius and Free Energy Calculations:  These are based on standard 
formulae found in most biochemistry texts.  The "Folded" value is based on 
the assumption that the sequence represents that of a water-soluble, 
monomeric, globular protein.  Free energy calculations are based on the 
paper by Chiche et al. (1990).


    Following is a more detailed description of the "stats" output file.  We 
have tried to provide algebraic expressions for as many of the statistical 
results as possible.  Most of these equations represent approximations or 
estimates -- they should not be considered "infallible".  The error associated 
with these approximations is typically +/-5 or +/-10%.  References for many 
of these expressions and the theory behind them can be found in the 
reference list provided at the end of this appendix.


*************************************************************

                    DEFINITIONS

        mw(i)   = molecular weight of amino acid type i
        a(i)    = number of amino acids of type i
        A(i)    = number of amino acids of type i in SEQBANK
        num     = number of residues
        NUM     = number of residues in SEQBANK
        hp(i)   = hydropathy of amino acid type i
        pk(i)   = pKa of side chain of amino acid type i
        asa(i)  = accessible surface area of amino acid type i
        pasa(i) = polar asa of amino acid type i
        nasa(i) = non-polar asa of amino acid type i
        v(i)    = volume of amino acid type i
        f(i)    = fractional buried surface area of amino acid type i
        fb(i)   = fraction of amino acids of type i found buried
        sv(i)   = specific volume for amino acid type i
        w(i)    = weight percent of amino acid type i

**************************************************************

Molecular Weight......: MW  = SUM{a(i)*mw(i)}
Amino acids...........: num = SUM{a(i)}
Mean residue weight...: MRW = MW/num
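
The molecular weight expression above is a count-weighted sum over amino 
acid types.  A minimal Python sketch follows, taking the mean residue weight 
as the molecular weight divided by the number of residues.  The mass table 
is an assumption: it lists commonly quoted average residue masses for only 
five amino acids, and the values SEQSEE derives from Creighton (1984) may 
differ.  Whether a water molecule is added for the free termini is not 
stated here, so the sketch does not add one.

```python
from collections import Counter

# Assumed average residue masses (amu) for a few amino acids only; the
# table SEQSEE uses (derived from Creighton, 1984) may differ slightly.
RESIDUE_MASS = {"G": 57.05, "A": 71.08, "S": 87.08, "V": 99.13, "L": 113.16}


def molecular_weight(seq: str, mw: dict = RESIDUE_MASS) -> float:
    """MW = sum over amino acid types i of a(i) * mw(i)."""
    counts = Counter(seq)  # a(i) for each amino acid type i
    return sum(n * mw[aa] for aa, n in counts.items())


def mean_residue_weight(seq: str, mw: dict = RESIDUE_MASS) -> float:
    """Mean residue weight = MW divided by the number of residues."""
    return molecular_weight(seq, mw) / len(seq)
```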


                *** Amino Acid Content ***

    Amino    Freq           Freq        E(Freq)    Weight     E(weight)
     Acid   (total)       (percent)    (percent)  (percent)    (percent)
    

             a(i)            a(i)         A(i)    a(i)*mw(i)   A(i)*mw(i)
                            ------       ------   ----------   ----------
                              num         NUM         num         NUM


Note: E(x) are expected values based on average
    amino acid content of soluble proteins.

**************************************************************

Hydrophobicity Parameters: /canopus/rbo/seqsee/lib/kyte.parms

Average Hydrophobicity (ah)...................:   AH = SUM{a(i)*hp(i)}
Notes: ah = -2.67 --> Average Protein
     ah >  0.10 --> Hydrophobic Protein
     ah < -6.00 --> Hydrophilic Protein

Ratio of Hydrophilicity to Hydrophobicity (rh):   RH = |hydrophilic/hydrophobic|
Notes: rh =  1.22 --> Average Protein
     rh >  1.90 --> Non-folding Protein
     rh <  0.85 --> Insoluble Protein
     hydrophilic = neg. comp. of hydropathy
     hydrophobic = pos. comp. of hydropathy

Percentage of Hydrophobic residues............:   %HB = (#A + #C + #F +...)/num
Notes: Average percentage is 52.44
     Hydrophobic Amino Acids are ACFGHILMVWY

Percentage of Hydrophilic residues............:   %HL = (#D + #E + #K +...)/num
Notes: Average percentage is 47.56
     Hydrophilic Amino Acids are DEKNPQRST

Ratio of %Hydrophilic to %Hydrophobic.........:   %HL/%HB
Notes: rhp = 0.91 --> Average Protein
       rhp > 1.43 --> Non-folding Protein
       rhp < 0.77 --> Insoluble Protein
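
The residue-count percentages and their ratio can be reproduced directly 
from the two amino acid lists given in the notes above.  The following 
Python sketch is illustrative only; the function names and the label 
"typical range" are hypothetical and are not part of the SEQSEE output.

```python
HYDROPHOBIC = set("ACFGHILMVWY")  # as listed in the notes above
HYDROPHILIC = set("DEKNPQRST")


def hydro_percentages(seq: str):
    """Return (%HB, %HL, %HL/%HB) for a one-letter amino acid sequence."""
    num = len(seq)
    pct_hb = 100.0 * sum(aa in HYDROPHOBIC for aa in seq) / num
    pct_hl = 100.0 * sum(aa in HYDROPHILIC for aa in seq) / num
    ratio = pct_hl / pct_hb if pct_hb else float("inf")
    return pct_hb, pct_hl, ratio


def classify_rhp(rhp: float) -> str:
    """Apply the rhp thresholds quoted in the notes above."""
    if rhp > 1.43:
        return "non-folding"
    if rhp < 0.77:
        return "insoluble"
    return "typical range"
```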

**************************************************************

Number of  Basic amino acids:        NB   = #K + #R
Number of Acidic amino acids:        NA   = #D + #E
Estimated pI for protein....:          PI   = pHi at which SUM{+/-1/(1 + 10**[pKi - pHi])} = 0

pH:        3    4    5    6    7    8    9    10    11
Charge:     charge = SUM{+/-1/(1 + 10**[pKi - pHi])}

Total linear charge density.:       LIND = {#K + #R + #D + #E + 2}/num
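
The charge and pI estimates above can be illustrated with a short Python 
sketch that sums one Henderson-Hasselbalch term per ionizable group and then 
searches for the pH at which the sum is zero.  Several things here are 
assumptions rather than SEQSEE's actual values or method: the pKa table 
(typical textbook numbers), the choice of ionizable groups, the inclusion of 
both termini, and the use of bisection.  The sketch uses the usual form in 
which the exponent is pH - pK for basic groups and pK - pH for acidic groups.

```python
# Assumed textbook side-chain and terminal pKa values; SEQSEE's own table
# may differ. The sign is +1 for basic groups and -1 for acidic groups.
PKA = {"K": (10.5, +1), "R": (12.5, +1), "H": (6.0, +1),
       "D": (3.9, -1), "E": (4.3, -1), "C": (8.3, -1), "Y": (10.1, -1)}
N_TERM = (8.0, +1)
C_TERM = (3.1, -1)


def net_charge(seq: str, ph: float) -> float:
    """Net charge of a linear sequence at the given pH."""
    groups = [N_TERM, C_TERM] + [PKA[aa] for aa in seq if aa in PKA]
    total = 0.0
    for pk, sign in groups:
        # Henderson-Hasselbalch: fraction of the group in its charged form.
        exponent = (ph - pk) if sign > 0 else (pk - ph)
        total += sign / (1.0 + 10.0 ** exponent)
    return total


def estimate_pi(seq: str, lo: float = 0.0, hi: float = 14.0,
                tol: float = 1e-4) -> float:
    """Bisect for the pH at which the net charge is zero."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if net_charge(seq, mid) > 0:
            lo = mid  # still positive: the pI lies at a higher pH
        else:
            hi = mid
    return (lo + hi) / 2.0
```

Because the net charge only decreases as the pH rises, bisection is 
sufficient to locate the zero crossing.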

**************************************************************

Polar Area of Extended Chain...............: PAEC = SUM{pasa(i)*a(i)}
Non-Polar Area of Extended Chain...........: NAEC = SUM{nasa(i)*a(i)}
Total Area of Extended Chain ..............: AEC  = PAEC + NAEC

Polar ASA of Folded Protein................: APFC = AFC - ANFC
Non-Polar ASA of Folded Protein............: ANFC = [NAEC*(-6.21 + 118*RFE)]/100
ASA of folded protein......................: AFC  = 7.11*MW**0.718

Ratio of Folded to Extended Area...........: RFE  = AFC/AEC
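
The folded-area expressions above depend on one another, so the order of 
evaluation matters: AFC and AEC first, then RFE, then ANFC and APFC.  The 
Python sketch below shows that order.  The function name folded_surface is 
hypothetical, and the extended-chain areas are passed in directly because 
the per-residue pasa(i) and nasa(i) tables are not reproduced here.

```python
def folded_surface(mw: float, paec: float, naec: float) -> dict:
    """Apply the folded-area expressions to a molecular weight and the
    polar / non-polar extended-chain areas (PAEC, NAEC)."""
    aec = paec + naec                             # total extended-chain area
    afc = 7.11 * mw ** 0.718                      # ASA of folded protein
    rfe = afc / aec                               # folded / extended ratio
    anfc = naec * (-6.21 + 118.0 * rfe) / 100.0   # non-polar folded ASA
    apfc = afc - anfc                             # polar folded ASA
    return {"AEC": aec, "AFC": afc, "RFE": rfe, "ANFC": anfc, "APFC": apfc}
```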

*************************************************************

Buried Polar Area of Folded Protein........: ABP  = 0.35*AB
Buried Non-polar Area of Folded Protein....: ABN  = 0.61*AB
Buried Charge Area of Folded Protein.......: ABC  = 0.04*AB
Total Buried Surface.......................: AB   = AEC - AFC


Expected Number and Fraction of Residues 95% Buried

        EFB(i)  = (f(i)*NB)/[num - NB + (f(i)*NB)]
        NUMB(i) = a(i)*EFB(i)

Number of buried Amino Acids...............: NB   = (num**0.333 - 2.0)**3.0
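
The buried-surface split and the buried-residue count above can be checked 
with a few lines of Python.  This is a sketch only; buried_summary is a 
hypothetical name, and the per-residue EFB(i) and NUMB(i) values are left 
out because they need the f(i) table.

```python
def buried_summary(aec: float, afc: float, num: int) -> dict:
    """Buried surface (split by type) and number of buried residues."""
    ab = aec - afc                       # total buried surface
    nb = (num ** 0.333 - 2.0) ** 3.0     # number of buried amino acids
    return {
        "AB": ab,
        "ABP": 0.35 * ab,                # buried polar area
        "ABN": 0.61 * ab,                # buried non-polar area
        "ABC": 0.04 * ab,                # buried charged area
        "NB": nb,
    }
```

For a chain of 125 residues this gives between 26 and 27 buried residues.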

*************************************************************

Packing Volume (estimate)..................: VP   = 1.245*MW
Packing Volume (actual)....................: VP   = SUM{a(i)*v(i)}
Interior Volume of Protein.................: VIN  = SUM{a(i)*fb(i)*v(i)}
Exterior Volume of Protein.................: VEXT = SUM{a(i)*(1-fb(i))*v(i)}

Partial Specific Volume....................: PSV  = SUM{sv(i)*w(i)}

*************************************************************

Fisher Volume Ratio (actual)...............: VR   = VEXT/VIN
Fisher Volume Ratio (idealized)............: VRT  = [RAD**3/(RAD - 4.0)**3] - 1

 If VR > VRT then molecule likely forms soluble monomer
 If VR >> VRT then molecule likely doesn't fold into compact structure
 If VR < VRT then molecule likely aggregates

Protein Solubility.........................: SOL  = RH*100 + LIND*100 + AH*5
Notes: solubility = 1.6 --> Average Protein
       solubility < 1.1 --> Insoluble Protein
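
The Fisher volume ratio test above compares the actual ratio VR with the 
idealized ratio VRT, which in turn needs the protein radius 
RAD = 3.875*(num**0.333).  A minimal Python sketch of that comparison is 
given below.  The function name fisher_check is hypothetical.  The "VR >> VRT" 
case is not handled, since no numerical cutoff for it is given here, and a 
tie is arbitrarily reported as aggregation.

```python
def fisher_check(vext: float, vin: float, num: int) -> tuple:
    """Return (VR, VRT, verdict) for exterior/interior volumes and length."""
    rad = 3.875 * num ** 0.333                # radius of protein
    vr = vext / vin                           # actual volume ratio
    vrt = rad ** 3 / (rad - 4.0) ** 3 - 1.0   # idealized volume ratio
    verdict = "likely soluble monomer" if vr > vrt else "likely aggregates"
    return vr, vrt, verdict
```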

*************************************************************

Radius of Protein..........................: RAD  = 3.875*(num**0.333)
RMS end to end distance of Ext. chain......: RMS  = (110*num)**0.5
Radius of Gyration of Extended chain.......: RG   = RMS/2.45

*************************************************************

Solvation Free Energy of Folding...........: SFE  = 16.02 - 0.99*num
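
The radius, chain-dimension and solvation free energy expressions above 
depend only on the number of residues, so they can be evaluated together.  
The Python sketch below does this; chain_dimensions is a hypothetical name 
and the code is an illustration of the formulas, not SEQSEE's own routine.

```python
import math


def chain_dimensions(num: int) -> dict:
    """Evaluate the length-only estimates for a chain of num residues."""
    rad = 3.875 * num ** 0.333      # radius of folded protein
    rms = math.sqrt(110.0 * num)    # RMS end-to-end distance, extended chain
    rg = rms / 2.45                 # radius of gyration, extended chain
    sfe = 16.02 - 0.99 * num        # solvation free energy of folding
    return {"RAD": rad, "RMS": rms, "RG": rg, "SFE": sfe}
```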


References

Chiche, L., Gregoret, L.M., Cohen, F.E. & Kollman, P.A., Protein Model Structure 
Evaluation Using the Solvation Free Energy of Folding, Proc. Natl. Acad. Sci. 
(USA) 87, 3240-3243 (1990).

Chothia, C., Structural Invariants in Protein Folding, Nature 254, 304-308 
(1975).

Chothia, C., The Nature of the Accessible and Buried Surfaces in Proteins, J. 
Mol. Biol. 105, 1-14 (1976).

Creighton, T.E., "Proteins: Structures and Molecular Properties", W.H. 
Freeman, New York (1984).

Fisher, H.F., A Limiting Law Relating the Size and Shape of Protein 
Molecules to Their Composition, Proc. Natl. Acad. Sci. (USA) 51, 1285-1290 
(1964).

Janin, J., Surface and Inside Volumes in Globular Proteins, Nature 277, 491-
493 (1979).

Kyte, J. & Doolittle, R.F., A Simple Method for Displaying the Hydropathic 
Character of a Protein, J. Mol. Biol. 157, 105-132 (1982).

Miller, S., Janin, J., Lesk, A.M. & Chothia, C., Interior and Surface of Monomeric 
Proteins, J. Mol. Biol. 196, 641-656 (1987).

Richards, F.M., Areas, Volumes, Packing and Protein Structure, Ann. Rev. 
Biophys. Bioeng. 6, 151-176 (1977).

Zamyatnin, A.A., Protein Volume in Solution, Prog. Biophys. Mol. Biol. 24, 
107-123 (1972).