Example of Type-2 analysis

This example uses the same sequence as in the "Example of an E-Mail Request" section, but requests Type-2 analysis instead of Type-1 analysis (which is shown on the "Example of Type-1 analysis" page). Click each plot to view a PDF version locally at higher resolution.

  1. Example of Type-2 analysis
    1. Requesting e-mail message
    2. E-mail acknowledgement from the server
    3. E-mail results cover letter
    4. Secondary-Structure Probabilities
    5. Strand/Turn/Helix Probabilities


Requesting e-mail message

This shows the e-mail message as it would be composed by the user. The WWW interface also generates something that looks like an e-mail message internally, but the user sees it only as an attachment to the acknowledgement message.
    To: psa-request@darwin.bu.edu
    Subject: Seq 23
    Analysis-assumptions: minimal

    ; psa-plot-format: postscript
    ; Wilson Brandlesnarf
    ; BMERC
    ; Boston MA
    ; 617-353-7123
    Sequence 23
    GWEIPEPYVWDESFRVFYEQLDEEHKKIFKGIFDCIRDNSAPNLATLVKV
    TTNHFTHEEAMMDAAKYSEVVPHKKMHKDFLEKIGGLSAPVDAKNVDYCK
    EWLVNHIKGTDFKYKGKL

    Regards, Wilson
Except for the "Analysis-assumptions:" line, this is the same message as illustrated in the "Example of an E-Mail Request" section; see there for an explanation of the syntax of e-mail messages. The first three lines are part of the e-mail header, which probably looks different in every system ever written for composing e-mail, so yours is unlikely to be an exception.
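
As a rough illustration, the request above could be assembled programmatically. This is only a sketch using Python's standard `email` package; the function name `build_request` and its parameters are hypothetical and are not part of the server or its documentation. It builds the message without sending it, and wraps the sequence at 50 residues per line to match the example.

```python
from email.message import EmailMessage

# The 118-residue example sequence from the request shown above.
SEQUENCE = (
    "GWEIPEPYVWDESFRVFYEQLDEEHKKIFKGIFDCIRDNSAPNLATLVKV"
    "TTNHFTHEEAMMDAAKYSEVVPHKKMHKDFLEKIGGLSAPVDAKNVDYCK"
    "EWLVNHIKGTDFKYKGKL"
)


def build_request(label, sequence, subject="Seq 23",
                  assumptions="minimal", width=50):
    """Build (but do not send) a Type-2 request message (hypothetical helper)."""
    msg = EmailMessage()
    msg["To"] = "psa-request@darwin.bu.edu"
    msg["Subject"] = subject
    # This header line is what selects Type-2 analysis in the example.
    msg["Analysis-assumptions"] = assumptions
    # Body: a ';' comment line, the label line, then the wrapped sequence.
    lines = ["; psa-plot-format: postscript", label]
    lines += [sequence[i:i + width] for i in range(0, len(sequence), width)]
    msg.set_content("\n".join(lines) + "\n")
    return msg


request = build_request("Sequence 23", SEQUENCE)
print(request)
```

The printed message also carries the MIME headers that the `email` package adds on its own; as noted above, header details differ from one mail system to the next.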


E-mail acknowledgement from the server

The acknowledgement consists mostly of an echo of the original mail message (together with whatever e-mail headers were added in transit).

    From: psa@darwin.bu.edu (Protein Structure Analysis server)
    To: wb@darwin.bu.edu
    Subject: Received request 14757: [Seq 23]
    Date: Wed, 18 Nov 1998 17:59:24 -0500

    We have received your request dated "Wed, 18 Nov 1998 17:58:25 -0500"
    containing an amino acid sequence of 118 residues labelled "Sequence
    23" for a protein structure analysis run; it has been queued as
    request number 14757.  There are no requests ahead of it in the queue.

    Note: You are getting the new Type 2 models because you requested them
    explicitly.  If this is not what you expected, change the
    "analysis-assumptions:" line in the message header to read:

            analysis-assumptions: monomeric-soluble

    starting in column one.  (Alphabetic case does not matter.)

    --------------------------- Original message ---------------------------
    Date: Wed, 18 Nov 1998 17:58:25 -0500
    Message-Id: <199811182258.RAA15982@gamow>
    From: Wilson Brandlesnarf <wb@darwin.bu.edu>
    To: psa-request@darwin.bu.edu
    Subject: Seq 23
    Analysis-assumptions: minimal

    ; psa-plot-format: postscript
    ; Wilson Brandlesnarf
    ; BMERC
    ; Boston MA
    ; 617-353-7123
    Sequence 23
    GWEIPEPYVWDESFRVFYEQLDEEHKKIFKGIFDCIRDNSAPNLATLVKV
    TTNHFTHEEAMMDAAKYSEVVPHKKMHKDFLEKIGGLSAPVDAKNVDYCK
    EWLVNHIKGTDFKYKGKL

    Regards, Wilson

The acknowledgement also includes the request ID assigned by the server upon receipt, a statement of the fact that the request was for a Type-2 analysis, and an indication of the server queue size. If the sequence length and/or label are not as expected, it could mean that the server had trouble parsing the message; in that case, please recheck and try again. If the message does not explicitly state that the request is for a Type-2 analysis run (i.e. it looks like an e-mail acknowledgement for a Type-1 analysis request), then the server was unable to parse the "Analysis-Assumptions:" field and started a Type-1 analysis by default. (This sort of confusion should never happen for requests submitted via the Web.)
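
The checks described above (sequence length, label, and the explicit Type-2 note) can be automated. The following is a minimal sketch that assumes the acknowledgement wording is exactly as in this example; the function name `check_ack` and the regular expressions are our own and would need adjusting if the server phrased its messages differently.

```python
import re

# First part of the acknowledgement body, copied from the example above.
ACK_BODY = """\
We have received your request dated "Wed, 18 Nov 1998 17:58:25 -0500"
containing an amino acid sequence of 118 residues labelled "Sequence
23" for a protein structure analysis run; it has been queued as
request number 14757.  There are no requests ahead of it in the queue.

Note: You are getting the new Type 2 models because you requested them
explicitly.
"""


def check_ack(body):
    """Extract the fields worth verifying from an acknowledgement (sketch)."""
    # Collapse whitespace so that phrases wrapped across lines still match.
    text = " ".join(body.split())
    seq = re.search(r'sequence of (\d+) residues labelled "([^"]*)"', text)
    req = re.search(r"request number (\d+)", text)
    return {
        "length": int(seq.group(1)) if seq else None,
        "label": seq.group(2) if seq else None,
        "request_id": int(req.group(1)) if req else None,
        # Absent for a request that fell back to Type-1 analysis.
        "type2": "Type 2 models" in text,
    }


print(check_ack(ACK_BODY))
```

A `type2` value of `False`, or a length or label that differs from what was submitted, would suggest rechecking the request as described above.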


E-mail results cover letter

This section covers the first e-mail message returned to the user when the analysis is complete. Since it is fairly large, we abbreviate it and break it into pieces for purposes of discussion; the full text of the cover letter is available separately.


    From: psa@darwin.bu.edu (Protein Structure Analysis server)
    To: wb@darwin.bu.edu
    Subject: Request 14757 result (1 of 3): [Seq 23]
    Date: Wed, 18 Nov 1998 18:00:19 -0500

    The analysis of your protein sequence has been completed.  A search of
    the Protein Data Bank, using Blast, indicates that your sequence is
    similar to the proteins 1A7D (length 118), 2MHR (length 118), 1A7E
    (length 118), 1HRB (length 113), 2HMQA (length 113), 2HMZA (length
    113), 1HMDA (length 113), and 1HMOA (length 113), which all have known
    structures.  The following analysis results were generated without
    reference to these known structures or any of their known homologs.
Note how the server has caught the fact that we have submitted a sequence of known structure in order to test the server; "Sequence 23" is in fact the sequence of PDB locus 2mhr.

After the initial "announcement" paragraph, there are several paragraphs explaining the other messages, and how to view the plots; we have omitted those here.


    ------------------------------- Sequence -------------------------------
    ; This is the actual sequence used.
    Sequence 23
    GWEIP EPYVW DESFR VFYEQ LDEEH KKIFK GIFDC IRDNS APNLA TLVKV
    TTNHF THEEA MMDAA KYSEV VPHKK MHKDF LEKIG GLSAP VDAKN VDYCK
    EWLVN HIKGT DFKYK GKL1
Following the text, the sequence is echoed in the form used by the server software.

Finally, the transcript from the compute engine is included.


    Analyzing Sequence 23. This is 18-Nov-98 (17:57:38).

    The sequence contains 118 residues.

    Using the Type-2 DSM library. 
    2 Type-2 DSMs are available for analyzing this sequence.

    FILTERING RESULTS:

    Model generic    has probability 0.96541
    Model mem_span   has probability 0.034587


    Secondary-Structure Probabilities:

                 RESIDUE      LOOP     HELIX      TURN    STRAND
                       1     0.400     0.205     0.140     0.255
                       2     0.419     0.226     0.062     0.292
                       3     0.415     0.187     0.126     0.272
                       4     0.525     0.132     0.209     0.133
                       5     0.490     0.148     0.261     0.101
                       6     0.570     0.112     0.211     0.107
                       7     0.448     0.199     0.120     0.233
                       8     0.404     0.235     0.094     0.267
                       9     0.313     0.375     0.089     0.222
                      10     0.349     0.396     0.126     0.129
                      11     0.275     0.486     0.125     0.115
                      12     0.252     0.516     0.097     0.135
                      13     0.177     0.594     0.056     0.173
                      14     0.173     0.624     0.024     0.180
                      15     0.155     0.643     0.016     0.186
                      16     0.185     0.648     0.019     0.148
                      17     0.128     0.734     0.027     0.112
                      18     0.128     0.771     0.035     0.066
                      19     0.107     0.819     0.031     0.043
                      20     0.090     0.852     0.022     0.036
                     . . .
                     110     0.536     0.100     0.126     0.238
                     111     0.404     0.110     0.096     0.390
                     112     0.374     0.114     0.073     0.439
                     113     0.323     0.111     0.123     0.444
                     114     0.324     0.110     0.192     0.373
                     115     0.386     0.110     0.194     0.310
                     116     0.364     0.190     0.117     0.329
                     117     0.347     0.238     0.069     0.346
                     118     0.360     0.199     0.052     0.389

    End of Log file for Sequence 23.
The transcript gives exact values (as opposed to values read off the plots) for the "generic" versus "mem_span" model probabilities and for the secondary-structure probabilities (shown graphically in the secondary-structure probability plot). The models themselves are described in more detail on the "Description of Type-2 DSMs" page.

For brevity, we omit the secondary-structure probabilities for residues 21 through 109.

Note that, unlike Type-1 analysis, there is no structural class probability plot, as there are only two models, neither of which denotes a specific tertiary structure.
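
Because the transcript is plain text, its exact values are easy to extract. The sketch below assumes the layout shown above (one "Model ... has probability ..." line per model, and five whitespace-separated columns per residue); `parse_transcript` is a hypothetical helper, and only three of the residue rows are reproduced to keep the example short.

```python
import re

# Excerpt of the transcript above: both model lines and three residue rows.
TRANSCRIPT = """\
Model generic    has probability 0.96541
Model mem_span   has probability 0.034587

             RESIDUE      LOOP     HELIX      TURN    STRAND
                   1     0.400     0.205     0.140     0.255
                  20     0.090     0.852     0.022     0.036
                 113     0.323     0.111     0.123     0.444
"""

STATES = ("LOOP", "HELIX", "TURN", "STRAND")


def parse_transcript(text):
    """Return (model probabilities, per-residue state probabilities)."""
    models, residues = {}, {}
    for line in text.splitlines():
        m = re.match(r"\s*Model (\S+)\s+has probability (\S+)", line)
        if m:
            models[m.group(1)] = float(m.group(2))
            continue
        fields = line.split()
        # Data rows have a residue number followed by four probabilities.
        if len(fields) == 5 and fields[0].isdigit():
            probs = [float(f) for f in fields[1:]]
            residues[int(fields[0])] = dict(zip(STATES, probs))
    return models, residues


models, residues = parse_transcript(TRANSCRIPT)
best_model = max(models, key=models.get)
best_state = {r: max(p, key=p.get) for r, p in residues.items()}
print(best_model, best_state)
```

Each residue row should sum to approximately 1, up to rounding in the third decimal place, which is a handy sanity check on the parsing.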


Secondary-Structure Probabilities

Contour plot of probabilities (21K)

This plot provides a detailed view of secondary-structure probabilities. Each row corresponds to a different secondary structural state, and each column corresponds to a different residue position. (See the "Secondary-Structure Probabilities" section of the Type-1 example for a detailed discussion of the secondary structural states used by the psa-request DSMs.) The probabilities of each residue being in each of the structural states are depicted using contour lines of constant probability in increments of 0.1. Areas surrounded by many contour lines are regions of high probability, while areas outside of the contours have low probabilities of less than 0.1.

For example, the 40th residue has probability slightly over 0.5 of being in a loop, because there are five contour lines surrounding the point on the loop row for this residue, though it lies very close to the innermost line.
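
The relationship between a probability and the number of contour lines around it can be written down directly. The following is a sketch under the assumption that contours are drawn at exactly 0.1, 0.2, and so on, as described above; the function name is our own, and 0.52 is a hypothetical stand-in for "slightly over 0.5", since the exact loop probability of residue 40 is not listed in the abbreviated transcript.

```python
import math


def contours_enclosing(p, step=0.1):
    """Number of contour levels (step, 2*step, ...) at or below probability p."""
    # The small epsilon guards against floating-point error when p is an
    # exact multiple of step (for example, 0.3 / 0.1 evaluates just under 3).
    return math.floor(p / step + 1e-9)


print(contours_enclosing(0.52))   # a point slightly over 0.5
print(contours_enclosing(0.09))   # a point outside all of the contours
```

Read in the other direction, a point surrounded by n contour lines has a probability of at least n times 0.1 and less than (n + 1) times 0.1.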

For reference, here are the DSSP secondary structure assignments for PDB locus 2mhr:

    SS        start   end
    Helix 1      19    37
    Helix 2      41    64
    Helix 3      70    85
    Helix 4      93   109
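
For comparing individual residues against these assignments, the helix ranges can be kept in a small lookup table. This is a sketch only: the ranges are read from the table above (helix 1 spans residues 19 to 37, helix 2 spans 41 to 64, helix 3 spans 70 to 85, helix 4 spans 93 to 109), the end points are assumed to be inclusive, and `helix_at` is a hypothetical helper.

```python
# DSSP helix ranges for PDB locus 2mhr, as (start, end), assumed inclusive.
DSSP_HELICES = {
    "Helix 1": (19, 37),
    "Helix 2": (41, 64),
    "Helix 3": (70, 85),
    "Helix 4": (93, 109),
}


def helix_at(residue):
    """Return the name of the DSSP helix containing `residue`, or None."""
    for name, (start, end) in DSSP_HELICES.items():
        if start <= residue <= end:
            return name
    return None


helical = sum(end - start + 1 for start, end in DSSP_HELICES.values())
print(helix_at(20), helix_at(40), helical)
```

Residue 40, used in the contour example above, falls in the gap between the first and second helices, which is consistent with its loop probability of slightly over 0.5.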

Notice that the Type-2 secondary-structure prediction for this sequence differs markedly from the secondary-structure probabilities in the Type-1 example. This is because all secondary structure predictions are made in light of the most probable DSM; since the "generic" DSM supports nearly arbitrary combinations of secondary structure, while the Type-1 DSMs are constructed for known folding classes, the Type-1 secondary structure prediction is necessarily much more constrained. In this case, since the answer is known, the Type-1 prediction is also much more nearly correct. In the Type-2 plot, we see two out of the four amphipathic helices, the first and third, depicted clearly with their buried and exposed residues. The second and fourth helices are visible, with the start of the second helix apparently shifted left somewhat. This shift would represent an improvement if not for the fact that the loop state gets greater weight for the first three-quarters of the length of the second helix, so that the prediction nearly misses the second helix altogether.

In contrast, the strand state is still considered rather improbable, except for an odd suggestion of a single amphipathic strand at the very end of the sequence. This serves as another example of how Type-1 analysis is in general preferable, since the Type-2 models do not describe specific structures and hence it is not possible to rule out such spurious isolated strands. However, if the requirements for Type-1 analysis as described in the "Overview" section are not met, it is quite possible, even likely, that no Type-1 DSM is correct for the sequence, in which case the secondary structure predictions would be made with the wrong underlying structural assumption.


Strand/Turn/Helix Probabilities

3 x-y plots (12K)

These plots show the probabilities for each residue position being in a strand, turn, or helix. This is the same information as in the transcript section of the cover letter, which provides exact values for all residues, and a subset of the information in the secondary-structure probabilities plot, which also breaks the helix and strand probabilities down by exposure.

For example, the 20th residue has a probability of greater than 0.8 of being in a helix, and a negligible probability (less than 0.04) of being in a turn or strand. The remaining probability, about 0.09, is the probability of being in a loop state. (The exact values for all residues are included in the cover letter.) The undulating helical probabilities in the third graph support the presence of at least three helices; strands are not likely, but not completely ruled out, either.
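
The loop probability does not have its own graph, but it follows by subtraction from the other three. A short worked example, using the residue 20 values from the transcript in the cover letter (the variable names are ours):

```python
# Residue 20 values from the transcript in the cover letter.
helix, turn, strand = 0.852, 0.022, 0.036

# The four states are exhaustive, so loop is whatever probability remains.
loop = 1.0 - (helix + turn + strand)
print(f"{loop:.3f}")
```

This agrees with the LOOP column for residue 20 in the transcript.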


Please direct your questions and comments about these Web pages and the PSA e-mail server to:

Bob Rogers <rogers@darwin.bu.edu>
BioMolecular Engineering Research Center
Boston University, Boston Massachusetts
Last modified: Mon Mar 12 13:30:03 EST 2001