This is similar to the example in the "Example of an E-Mail Request" section, but uses a different sequence that does in fact contain a WD repeat (it is the SwissProt locus CAFA_HUMAN, which has a region of four WD repeats) and requests a WD-repeat analysis.
Requesting e-mail message
This shows the e-mail message as it would be composed by the user. The
WWW interface also generates something that
looks like an email message internally, but the user only sees this as
an attachment to the acknowledgement message.
To: psa-request@darwin.bu.edu
Subject: Seq 42
; analysis-assumptions: wd-repeat
; Wilson Brandlesnarf
; BMERC
; Boston MA
; 617-353-7123
Sequence 42
MKVITCEIAWHNKEPVYSLDFQHGTAGRIHRLASAGVDTNVRIWKVEKGP
DGKAIVEFLSNLARHTKAVNVVRFSPTGEILASGGDDAVILLWKVNDNKE
PEQIAFQDEDEAQLNKENWTVVKTLRGHLEDVYDICWATDGNLMASASVD
NTAIIWDVSKGQKISIFNEHKSYVQGVTWDPLGQYVATLSCDRVLRVYSI
QKKRVAFNVSKMLSGIGAEGEARSYRMFHDDSMKSFFRRLSFTPDGSLLL
TPAGCVESGENVMNTTYVFSRKNLKRPIAHLPCPGKATLAVRCCPVYFEL
RPVVETGVELMSLPYRLVFAVASEDSVLLYDTQQSFPFGYVSNIHYHTLS
DISWSSDGAFLAISSTDGYCSFVTFEKDELGIPLKEKPVLNMRTPDTAKK
TKSQTHRGSSPGPRPVEGTPASRTQDPSSPGTTPPQARQAPAPTVIRDPP
SITPAVKSPLPGPSEEKTLQPSSQNTKAHPSRRVTLNTLQAWSKTTPRRI
NLTPLKTDTPPSSVPTSVISTPSTEEIQSETPGDAQGSPPELKRPRLDEN
KGGTESLDP
See the "Example of an E-Mail
Request" section for an explanation of the syntax of e-mail
messages. The first two lines are the e-mail header (it probably looks
different in every system ever written for composing e-mail, so your
system is unlikely to be an exception).
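As a rough illustration of the message body shown above, the following Python sketch splits a request into its option comments, label, and sequence. It is not the server's parser. It assumes that lines starting with ";" are comments, that a comment of the form "; key: value" is an option, that the first non-comment line is the label, and that all remaining lines are sequence. The function name parse_request is hypothetical.

```python
def parse_request(body):
    """Split a request body into (options, label, sequence).

    Assumed layout (an assumption, not the server's documented grammar):
    ';' lines are comments, '; key: value' comments are options,
    the first non-comment line is the label, the rest is sequence.
    """
    options, label, residues = {}, None, []
    for raw in body.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(";"):
            text = line[1:].strip()
            if ":" in text:
                key, value = text.split(":", 1)
                options[key.strip().lower()] = value.strip()
            continue
        if label is None:
            label = line                             # e.g. "Sequence 42"
        else:
            residues.append(line.replace(" ", ""))   # drop spacing inside a line
    return options, label, "".join(residues)
```

Checking len() of the returned sequence before sending gives the residue count that the server is expected to echo in its acknowledgement.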
The acknowledgement also includes the request ID assigned by the server upon receipt,
a statement of the fact that the request was for a WD-repeat analysis,
and an indication of the server queue size. If the sequence length
and/or label are not as expected, it could mean that the server had
trouble parsing the message; in that case, please try again. If the
message does not explicitly state that the request is for a WD-repeat
analysis run (i.e. it looks like an e-mail acknowledgement for a
Type-1 analysis request), then the server was unable to parse the "Analysis-Assumptions:"
field and started a Type-1 analysis by
default. (This sort of confusion should never happen for requests
submitted via the Web interface.)
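The checks described above can be automated. The sketch below is a minimal example that assumes the acknowledgement paragraph is worded as in the sample on this page; the wording of other acknowledgements is not known here, and the function name check_acknowledgement is hypothetical.

```python
import re


def check_acknowledgement(text, expected_length, expected_label):
    """Return a list of problems found in an acknowledgement message.

    Assumes the wording of the sample acknowledgement on this page.
    """
    flat = " ".join(text.split())   # undo the line wrapping of the e-mail
    match = re.search(r'sequence of (\d+) residues labelled "([^"]*)"', flat)
    if match is None:
        return ["acknowledgement paragraph not recognised"]
    problems = []
    if int(match.group(1)) != expected_length:
        problems.append("sequence length is %s, expected %d"
                        % (match.group(1), expected_length))
    if match.group(2) != expected_label:
        problems.append("label is %r, expected %r"
                        % (match.group(2), expected_label))
    if "WD repeat analysis" not in flat:
        problems.append("no WD-repeat statement; the server may have "
                        "started a Type-1 analysis by default")
    return problems
```

An empty list means that the length, the label, and the analysis type all agree with what was sent.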
This section covers the first e-mail message returned to the user
when the analysis is complete. Since it is fairly large, we break it
into pieces for purposes of discussion; the full text of the cover letter is available separately.
The next two sections are included only if the sequence is found to
contain a WD repeat. The first such section gives the predicted
structure of the WD repeat in the same tabular format as on the pages
describing members of the WD repeat
family. (In fact, since this is really the SwissProt
locus CAFA_HUMAN, this very alignment, in slightly
different form, appears there on the CAFA_HUMAN WD
repeat page.)
Finally, an attempt is made to find sequence similarities that do not
involve the WD-repeat motif itself between the submitted sequence and
other known WD repeat proteins.
The conserved portions of the WD repeat region itself were shadowed
with X's in order to search for sequences with similar loops; the loops
within the WD-repeat region itself were therefore sought collectively,
rather than independently.
The BLASTP search is done in the hope that knowledge of other
homologous domains (in the case of leader and trailer sequences) or of
similarities in potential active sites or protein-protein interaction
sites (in the case of intra-repeat loops) will aid in characterizing the
protein.
For more information on BLASTP itself, see either WU-BLAST (Washington
University BLAST) version 2.0 or the earlier NCBI-BLAST version 1.4. At
BMERC, we use WU-BLAST because it can handle gaps, but NCBI-BLAST is in
the public domain.
Finally, the transcript from the compute engine is included. Once
again, this has been excerpted for space; see the full text of the cover letter for
the complete transcript.
E-mail acknowledgement from the server
The acknowledgement consists mostly of an echo of the original mail
message (together with whatever e-mail headers were added in transit).
From: psa@darwin.bu.edu (Protein Structure Analysis server)
To: wb@darwin.bu.edu
Subject: Received request 425: [Seq 42]
Date: Tue, 5 Jan 1999 18:02:28 -0500
We have received your request dated "Tue, 5 Jan 1999 18:02:16 -0500"
containing an amino acid sequence of 559 residues labelled "Sequence
42" for a WD repeat analysis run; it has been queued as request number
425. There are no requests ahead of it in the queue.
--------------------------- Original message ---------------------------
Date: Tue, 5 Jan 1999 18:02:16 -0500
Message-Id: <199901052302.SAA15340@gamow>
From: Wilson Brandlesnarf <wb@darwin.bu.edu>
To: psa-request@darwin.bu.edu
Subject: Seq 42
; analysis-assumptions: wd-repeat
; Wilson Brandlesnarf
; BMERC
; Boston MA
; 617-353-7123
Sequence 42
MKVITCEIAWHNKEPVYSLDFQHGTAGRIHRLASAGVDTNVRIWKVEKGP
DGKAIVEFLSNLARHTKAVNVVRFSPTGEILASGGDDAVILLWKVNDNKE
PEQIAFQDEDEAQLNKENWTVVKTLRGHLEDVYDICWATDGNLMASASVD
NTAIIWDVSKGQKISIFNEHKSYVQGVTWDPLGQYVATLSCDRVLRVYSI
QKKRVAFNVSKMLSGIGAEGEARSYRMFHDDSMKSFFRRLSFTPDGSLLL
TPAGCVESGENVMNTTYVFSRKNLKRPIAHLPCPGKATLAVRCCPVYFEL
RPVVETGVELMSLPYRLVFAVASEDSVLLYDTQQSFPFGYVSNIHYHTLS
DISWSSDGAFLAISSTDGYCSFVTFEKDELGIPLKEKPVLNMRTPDTAKK
TKSQTHRGSSPGPRPVEGTPASRTQDPSSPGTTPPQARQAPAPTVIRDPP
SITPAVKSPLPGPSEEKTLQPSSQNTKAHPSRRVTLNTLQAWSKTTPRRI
NLTPLKTDTPPSSVPTSVISTPSTEEIQSETPGDAQGSPPELKRPRLDEN
KGGTESLDP
E-mail results cover letter
From: psa@darwin.bu.edu (Protein Structure Analysis server)
To: wb@darwin.bu.edu
Subject: Request 425 result (1 of 3): [Seq 42]
Date: Tue, 5 Jan 1999 18:05:46 -0500
The analysis of your protein sequence has been completed.
The tertiary class and profile probabilities we computed for your
sequence are appended below. The profile probabilities "profile1" and
"profile2" shown below identify residues that match the two diagnostic
profiles, to which the regular expression on the WD-repeat protein
webpage at http://bmerc-www.bu.edu/wdrepeat/ is an approximation. A
page of additional graphical output, in PostScript format, is being
sent to you in an additional e-mail message. This plot shows the
tertiary-class probability distributions, indicating the degree to
which the psa-request server believes that the sequence you submitted
could be a WD-repeat and how many repeats it believes the sequence
has. The final message contains a core format file with backbone
coordinates for the sequence as a beta propeller. For more
information, please see the WD repeat example on the
http://bmerc-www.bu.edu/psa/wd-example.htm page.
The initial "announcement" paragraph would have warned of PDB homologs
if any had been found; see the "E-mail results cover letter"
section of the "Example of Type-1 analysis"
page. After the announcement paragraph, there are several paragraphs
explaining the other messages, and how to view the plots; we have
omitted some of those here.
-------------------------- Original sequence ---------------------------
; This is the actual sequence used.
Sequence 42
MKVITCEIAW HNKEPVYSLD FQHGTAGRIH RLASAGVDTN VRIWKVEKGP
DGKAIVEFLS NLARHTKAVN VVRFSPTGEI LASGGDDAVI LLWKVNDNKE
PEQIAFQDED EAQLNKENWT VVKTLRGHLE DVYDICWATD GNLMASASVD
NTAIIWDVSK GQKISIFNEH KSYVQGVTWD PLGQYVATLS CDRVLRVYSI
QKKRVAFNVS KMLSGIGAEG EARSYRMFHD DSMKSFFRRL SFTPDGSLLL
TPAGCVESGE NVMNTTYVFS RKNLKRPIAH LPCPGKATLA VRCCPVYFEL
RPVVETGVEL MSLPYRLVFA VASEDSVLLY DTQQSFPFGY VSNIHYHTLS
DISWSSDGAF LAISSTDGYC SFVTFEKDEL GIPLKEKPVL NMRTPDTAKK
TKSQTHRGSS PGPRPVEGTP ASRTQDPSSP GTTPPQARQA PAPTVIRDPP
SITPAVKSPL PGPSEEKTLQ PSSQNTKAHP SRRVTLNTLQ AWSKTTPRRI
NLTPLKTDTP PSSVPTSVIS TPSTEEIQSE TPGDAQGSPP ELKRPRLDEN
KGGTESLDP1
Following the text, the sequence is echoed in the form used by the
server software.
--------------------- Predicted WD-repeat structure --------------------
Sequence42 [ 1] MKVITCEIAW
------ ------ ------
Sequence42.1 [ 11] HNKEPV YSLDFQ HGTAGRIH RLASAG VDT NVRIWK VEKGPDGKAI
[ 56] VEFLSNLA
Sequence42.2 [ 64] RHTKAV NVVRFS PTGE ILASGG DDA VILLWK VNDNKEPEQI
[105] AFQDEDEAQLNKENWTVVKTLR
Sequence42.3 [127] GHLEDV YDICWA TDGN LMASAS VDN TAIIWD VSKGQKISIF
[168] N
Sequence42.4 [169] EHKSYV QGVTWD PLGQ YVATLS CDR VLRVYS IQKKRVAFNV
[210] SKMLSGIGAEGEARSYRMFHDDSMKSFFRRLSFTPDGSLLLTPAGCVESG
[260] ENVMNTTYVFSRKNLKRPIAHLPCPGKATLAVRCCPVYFELRPVVETGVE
[310] LMSLPYRLVFAVASEDSVLLYDTQQSFPFGYVSNIHYHTLSDISWSSDGA
[360] FLAISSTDGYCSFVTFEKDELGIPLKEKPVLNMRTPDTAKKTKSQTHRGS
[410] SPGPRPVEGTPASRTQDPSSPGTTPPQARQAPAPTVIRDPPSITPAVKSP
[460] LPGPSEEKTLQPSSQNTKAHPSRRVTLNTLQAWSKTTPRRINLTPLKTDT
[510] PPSSVPTSVISTPSTEEIQSETPGDAQGSPPELKRPRLDENKGGTESLDP
------ ------ ------
Residues in columns marked with "------" are predicted to fold into
beta-strands.
Each line starting with the sequence name followed by
".n" describes the nth
repeat. Each repeat sequence is broken into columns that describe the
structure of the repeat; leading, trailing, and inter-repeat loops are
wrapped as necessary to maintain readability. The first three predicted
strands of each blade are shown in the second, fourth, and sixth
sequence columns, identified by sets of six dashes ("------")
placed above and below the columns. The columns are related to the
profiles as follows:
The exact position of the fourth strand varies within the seventh
column, and is not predicted by the profiles.
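To make the column layout concrete, here is a small Python sketch that reads one repeat line of the table and recovers the starting residue index of each column. It assumes the columns are separated by whitespace, as they appear on this page; the names parse_repeat_line and strands are hypothetical.

```python
import re

# "<name>.<n> [<start>] col1 col2 ... col7"
REPEAT_LINE = re.compile(r"^(\S+)\.(\d+)\s*\[\s*(\d+)\]\s+(.+)$")


def parse_repeat_line(line):
    """Return (name, repeat_number, [(start_index, residues), ...]) or None."""
    match = REPEAT_LINE.match(line.strip())
    if match is None:
        return None   # loop continuation lines and the leader are skipped
    name, repeat_no = match.group(1), int(match.group(2))
    position = int(match.group(3))
    columns = []
    for text in match.group(4).split():
        columns.append((position, text))
        position += len(text)   # next column starts right after this one
    return name, repeat_no, columns


def strands(columns):
    """The second, fourth, and sixth columns are the predicted strands."""
    return [columns[i] for i in (1, 3, 5) if i < len(columns)]
```

For the "Sequence42.1" line, the three strands start at residues 17, 31, and 40, and the seventh column ends at residue 55, which agrees with the "[ 56]" index on the loop line that follows it.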
----------------------- Similar WD repeat proteins ---------------------
Identification of the WD repeat 'domain' implicitly divides the
protein into three fragments: the WD repeat region itself, and the
subsequences before and after it. Regions that were at least 40
residues in length were used independently to search a BLASTP database
of corresponding regions in known WD repeat proteins. The conserved
portions of the WD repeat region itself were shadowed with X's in
order to search for sequences with similar loops, disregarding the
conserved repeat region. For more information, please see the WD
repeat example on the http://bmerc-www.bu.edu/psa/wd-example.htm page.
The BLASTP search results are as follows:
* The amino subsequence is too small for BLASTP searching (less
than 40 residues).
* The loop subsequence (length 189) has no BLASTP matches.
* The carboxy subsequence (length 349) matches CAFA_HUMAN (length
360) with a score of 1825 (P=1.7e-191).
This may help to characterize the protein.
Once the sequence is identified as a WD repeat, one can splice out the
non-WD-repeat leader, trailer, and loop portions, and use those to
search a database made from the corresponding pieces of known WD-repeat
sequences. This is done for subsequences that are at least 40 residues
in length; shorter sequences are presumed not to represent independent
domains, and therefore to be of little use for searching. Raw scores
and P values are reported for all hits with at least 40% equivalent
identities over the match region.
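The splitting step can be sketched as follows. This is only an illustration of the leader/trailer split and the 40-residue cutoff described above, not the server's code; the function name split_fragments and its 1-based, inclusive boundary convention are assumptions, and the handling of the loop portions is left out.

```python
def split_fragments(sequence, wd_start, wd_end, min_length=40):
    """Split off the leader and trailer around a WD-repeat region.

    wd_start and wd_end are 1-based, inclusive residue indices of the
    WD-repeat region (an assumed convention). Each fragment is returned
    together with a flag saying whether it is long enough to search.
    """
    fragments = {
        "amino": sequence[:wd_start - 1],   # residues before the region
        "carboxy": sequence[wd_end:],       # residues after the region
    }
    return {
        name: (fragment, len(fragment) >= min_length)
        for name, fragment in fragments.items()
    }
```

In the example on this page the first repeat starts at residue 11, so the leader has only 10 residues, which is why the cover letter reports the amino subsequence as too small for searching.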
------------------------------ Transcript ------------------------------
Analyzing Sequence 42. This is 5-Jan-99 (18:0:25).
Using the Type-3 DSM library mdatawd.
The sequence contains 559 residues.
FILTERING RESULTS:
3 Most Probable Super Classes:
1st Superclass wd repeat has probability 1
2nd Superclass generic has probability 2.414e-34
3 Most Probable Macro Classes:
1st Macroclass wd4 has probability 1
2nd Macroclass wd7 has probability 4.2808e-08
3rd Macroclass wd5 has probability 1.0328e-08
Profile probabilities
seq profile1 profile2
M 0 0
K 0 0
V 0 0
I 0 0
T 0 0
C 0 0
E 0 0
I 0 0
A 0 0
W 0 0
H 1 0
N 1 0
K 1 0
E 1 0
P 1 0
V 1 0
Y 1 0
S 1 0
L 1 0
D 1 0
F 1 0
Q 1 0
H 0 0
G 0 0
T 0 0
A 0 0
G 0 0
R 0 0
I 0 0
H 0 0
R 0 1
L 0 1
A 0 1
S 0 1
A 0 1
G 0 1
V 0 1
D 0 1
T 0 1
N 0 1
V 0 1
R 0 1
I 0 1
W 0 1
K 0 1
V 0 0
E 0 0
K 0 0
G 0 0
. . .
T 0 0
E 0 0
S 0 0
L 0 0
D 0 0
P 0 0
O 0 0
End of Log file for Sequence 42.
The transcript gives exact values (rather than values read off the plots) for
both superclasses that are considered, and for the
highest-scoring macroclasses (shown graphically in the structural class probability plot). Afterwards, if
the sequence has been predicted to be a WD repeat, the profile
probabilities are shown in tabular format. The number in each column
shows the probability of that residue appearing in any position in the
corresponding profile. The DSMs themselves and the profiles they use
are described in more detail on the "Description
of WD-repeat DSMs" page.
For brevity, we show only enough of the sequence to illustrate the relative placement of the two profiles over the first repeat. Notice that the sequence between the two profile hits (i.e. the sequence of residues with double zeros between the "1/0" and "0/1" residues) is "HGTAGRIH", as it appears in the "Sequence42.1" line in the "Predicted WD-repeat structure" section of the cover letter.
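The same observation can be reproduced from the table itself. The following Python sketch reads the three-column table and collects the residues that lie between the first profile1 hit and the profile2 hit that follows it. The 0.5 threshold for calling a residue a hit is an assumption made for illustration, and both function names are hypothetical.

```python
def parse_profile_table(text):
    """Return [(residue, profile1, profile2), ...] from the transcript table."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue
        try:
            rows.append((parts[0], float(parts[1]), float(parts[2])))
        except ValueError:
            continue   # header line or the ". . ." elision marker
    return rows


def residues_between_profiles(rows, threshold=0.5):
    """Residues between the first profile1 run and the following profile2 run."""
    gap, seen_profile1 = [], False
    for residue, p1, p2 in rows:
        if p1 > threshold:
            seen_profile1 = True
            gap = []                  # still inside the profile1 hit
        elif p2 > threshold:
            if seen_profile1:
                return "".join(gap)   # reached the profile2 hit
        elif seen_profile1:
            gap.append(residue)       # double-zero residue after profile1
    return None
```

Applied to the rows shown in the transcript, this yields "HGTAGRIH".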
In the structural class probability plot, we see that the generic
superclass has essentially zero probability, and the wd repeat
superclass has a probability near unity. This means that
psa-request is quite confident that the protein sequence is a
WD repeat.
Looking at the macroclass probabilities, we see a strong (not to say
exclusive) preference for wd4, a WD-repeat domain with four
repeats.
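For readers who want the exact figures rather than the plot, the class probabilities can be pulled out of the transcript with a short script. This sketch assumes the line wording shown in the transcript excerpt on this page ("1st Macroclass wd4 has probability 1"); the function names are hypothetical.

```python
import re

CLASS_LINE = re.compile(
    r"^\s*\d+(?:st|nd|rd|th)\s+(Superclass|Macroclass)\s+(.+?)"
    r"\s+has probability\s+(\S+)\s*$"
)


def parse_class_probabilities(transcript):
    """Return {"Superclass": [(name, p), ...], "Macroclass": [(name, p), ...]}."""
    result = {"Superclass": [], "Macroclass": []}
    for line in transcript.splitlines():
        match = CLASS_LINE.match(line)
        if match:
            kind, name, probability = match.groups()
            result[kind].append((name, float(probability)))
    return result


def most_probable(entries):
    """Name of the entry with the highest probability."""
    return max(entries, key=lambda entry: entry[1])[0]
```

For the transcript above, the most probable superclass is "wd repeat" and the most probable macroclass is "wd4".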
In this example, the server discovered four repeats, so the model
contains four beta sheets of four strands each. The sheets are
designated E1 through E4; each strand is preceded by
the secondary structure designator for the sheet to which it belongs.
Here we show the first strand only; the
full text of the core file message has sixteen such strands.
The model is produced by starting with a beta-propeller model
constructed with the appropriate number of blades. At present, there is
a fixed set of models, one for each number of blades from four to ten.
These models were each constructed by selecting a representative blade
from an actual beta propeller structure and replicating it with the
requisite geometry.
The psa-request server inserts the amino acid names and
sequence indices from the query sequence into the model at the positions
dictated by the profile matches. Accordingly, we see that the sequence
for this strand, "YSLDFQ", is the first strand of the first
blade (the second column on the line labeled "Sequence42.1") in
the "Predicted WD-repeat structure"
section of the cover letter. Atom numbers are assigned arbitrarily from
1. No additional processing (e.g. CHARMm relaxation) is done; users who
submit multiple queries that have the same number of repeats will find
that the returned numeric coordinates are identical.
Please direct your questions and comments about these Web pages and
the PSA e-mail server to:
Structural Class Probabilities
Beta propeller core model
If the server determines that the sequence contains a WD repeat with
four to ten repeats, then the final e-mail message of the set consists
of a skeleton structure for the
sequence.
E1
ATOM 1 N TYR 17 0.87 2.22 8.89 1.00 0.00
ATOM 2 CA TYR 17 0.88 3.40 7.94 1.00 0.00
ATOM 3 C TYR 17 1.23 3.06 6.45 1.00 0.00
ATOM 4 O TYR 17 1.07 3.93 5.55 1.00 0.00
ATOM 5 CB TYR 17 1.87 4.52 8.44 1.00 0.00
ATOM 6 N SER 18 1.68 1.80 6.20 1.00 0.00
ATOM 7 CA SER 18 2.05 1.39 4.82 1.00 0.00
ATOM 8 C SER 18 2.33 -0.14 4.68 1.00 0.00
ATOM 9 O SER 18 2.72 -0.84 5.67 1.00 0.00
ATOM 10 CB SER 18 3.30 2.24 4.34 1.00 0.00
ATOM 11 N LEU 19 2.08 -0.66 3.45 1.00 0.00
ATOM 12 CA LEU 19 2.33 -2.08 3.10 1.00 0.00
ATOM 13 C LEU 19 2.71 -2.11 1.59 1.00 0.00
ATOM 14 O LEU 19 2.38 -1.16 0.83 1.00 0.00
ATOM 15 CB LEU 19 1.07 -2.99 3.37 1.00 0.00
ATOM 16 N ASP 20 3.47 -3.15 1.18 1.00 0.00
ATOM 17 CA ASP 20 3.88 -3.29 -0.24 1.00 0.00
ATOM 18 C ASP 20 4.09 -4.77 -0.63 1.00 0.00
ATOM 19 O ASP 20 4.93 -5.49 -0.01 1.00 0.00
ATOM 20 CB ASP 20 5.21 -2.52 -0.51 1.00 0.00
ATOM 21 N PHE 21 3.31 -5.21 -1.66 1.00 0.00
ATOM 22 CA PHE 21 3.38 -6.61 -2.23 1.00 0.00
ATOM 23 C PHE 21 4.73 -6.88 -2.99 1.00 0.00
ATOM 24 O PHE 21 5.30 -5.97 -3.66 1.00 0.00
ATOM 25 CB PHE 21 2.23 -6.82 -3.28 1.00 0.00
ATOM 26 N GLN 22 5.23 -8.14 -2.90 1.00 0.00
ATOM 27 CA GLN 22 6.48 -8.51 -3.61 1.00 0.00
ATOM 28 C GLN 22 6.09 -8.97 -5.05 1.00 0.00
ATOM 29 O GLN 22 5.00 -9.58 -5.24 1.00 0.00
ATOM 30 CB GLN 22 7.18 -9.70 -2.86 1.00 0.00
The model coordinates are for the backbone atoms and beta carbons only
of the blade strands. No attempt is made to place loop residues or
sidechain atoms beyond the beta carbon.
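As a consistency check, the strand sequence can be read back from the ATOM records. The Python sketch below assumes the records are whitespace-separated as shown in the excerpt on this page (record type, atom number, atom name, residue name, residue index, then coordinates); the function name strand_sequence is hypothetical.

```python
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def strand_sequence(atom_lines):
    """Return (first_residue_index, one_letter_sequence) for a strand.

    Only alpha-carbon (CA) records are used, one per residue.
    """
    residues = {}
    for line in atom_lines:
        fields = line.split()
        if len(fields) < 5 or fields[0] != "ATOM":
            continue   # skips sheet designators such as "E1"
        atom_name, residue_name, residue_index = fields[2], fields[3], int(fields[4])
        if atom_name == "CA":
            residues[residue_index] = THREE_TO_ONE[residue_name]
    if not residues:
        return None
    ordered = sorted(residues)
    return ordered[0], "".join(residues[index] for index in ordered)
```

For the strand excerpted above this gives residue 17 and the sequence "YSLDFQ", matching the second column of the "Sequence42.1" line in the cover letter.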
Bob Rogers
<rogers@darwin.bu.edu>
Last modified: Wed Sep 27 21:24:15 EDT 2000
BioMolecular Engineering Research
Center
Boston University, Boston Massachusetts