MoRFchibi SYSTEM

Overview

MoRFchibi SYSTEM [1] predicts Molecular Recognition Features (MoRFs) in an amino acid query sequence. For each residue, six propensity scores are generated:

  • MoRF Propensities:
    1. MoRFCHiBi_Web, MCW [2]: an overall MoRF propensity score generated by combining the MC and MDC scores.
    2. MoRFCHiBi_Light, MCL [1]: a MoRF propensity score generated by combining the MC MoRF prediction and the IDP protein disorder prediction scores. This score mainly targets longer MoRFs.
    3. MoRFCHiBi, MC [3]: a MoRF prediction based solely on the local physicochemical properties of the amino acid sequence.
    4. MoRFDC, MDC [2]: a MoRF prediction based on the protein disorder prediction (IDP) and conservation information (ICS).
  • Disorder Propensity, IDP: a long-trend protein disorder prediction based on ESpritz [4] with the DisProt option, normalized as described in [2].
  • Conservation Propensity, ICS: ICS provides a general conservation propensity score assembled by aligning the query sequence to the SwissProt and the UniRef90 databases [2].

Each of these scores is normalized to approximately fit a Gaussian probability density function specified by the normal distribution N(0.5, 0.01) and is limited to the range [0..1] as described in the article [2].
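
To make the target distribution concrete, the following is a minimal sketch of a linear rescale to N(0.5, 0.01) followed by clipping to [0, 1]. It is not the procedure used by the server, which is described in [2]. The function name rescale_scores is hypothetical, and the sketch assumes that 0.01 is the variance (standard deviation 0.1).

```python
import statistics

TARGET_MEAN = 0.5
TARGET_STD = 0.1  # assumption: N(0.5, 0.01) states the variance, so std = sqrt(0.01)


def rescale_scores(raw_scores):
    """Linearly map raw scores to mean 0.5 and std 0.1, then clip to [0, 1].

    Illustration only; the normalization actually used is described in [2].
    """
    mean = statistics.fmean(raw_scores)
    std = statistics.pstdev(raw_scores)
    if std == 0:
        # All inputs are identical: every score maps to the target mean.
        return [TARGET_MEAN for _ in raw_scores]
    rescaled = (TARGET_MEAN + TARGET_STD * (x - mean) / std for x in raw_scores)
    return [min(1.0, max(0.0, value)) for value in rescaled]
```

With a standard deviation of 0.1, clipping only affects values more than five standard deviations from the mean, so it rarely changes a score.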

While our objective is to predict MoRF residues (i.e., the MCW propensity score), providing the component scores enables researchers to improve their interpretation of the MoRF prediction when extra information is available. For example, suppose a region has a moderate overall MCW propensity score and the IDP component of that score is low. If experimental data show that region to be disordered, one can infer that the MoRF propensity score could be higher. Likewise, manually curated multiple sequence alignments can be used to improve the quality of the conservation information and thus the interpretation of MoRF propensity scores.

Submitting A Job

To process amino acid sequences, enter (paste) sequences in FASTA format into the input box and click 'Submit Job'. When a job is submitted, each sequence is scanned for input errors and then entered into the queue structure and a job record is inserted into the 'Jobs' table for each sequence. Once a sequence is processed, results are made available through the results page.

Results Page

Each page holds the outcome of processing one query sequence and is saved on the server for 48 hours. Page life can be renewed manually by clicking the renew button on the results page or in the 'Saved For' column of the Jobs table.

The link to the results page is available through the [Ready/Not Ready] button in the 'Results' column of the Jobs table.

Note that if the browser session is lost, links to all results pages are also lost. If an email address is provided, notification emails are sent upon job completion with an attached copy of the results file and a link to the results page. Otherwise, it is strongly recommended that links to results pages be saved manually or bookmarked.

Input

The HTML server processes sequences in standard FASTA format. Spaces in the input sequence are ignored. When the Case Sensitive option is selected (the default), amino acids can only be represented with uppercase letters; lowercase letters generate an error. When it is not selected, both lowercase and uppercase letters can be used, and numbers are ignored and removed from the input sequence.
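
The input rules above can be sketched as follows. This is a minimal illustration, not the server's code: the function name clean_sequence is hypothetical, and the sketch assumes that only the 20 standard amino acid letters are accepted and that, in case-sensitive mode, any other character (including digits) is an error.

```python
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")  # assumption: 20 standard residues only


def clean_sequence(raw, case_sensitive=True):
    """Return the cleaned residue string, or raise ValueError on invalid input."""
    residues = []
    for ch in raw:
        if ch.isspace():
            continue  # whitespace (spaces, line breaks) is ignored in both modes
        if not case_sensitive:
            if ch.isdigit():
                continue  # digits are removed only when the option is off
            ch = ch.upper()  # lowercase letters are accepted in this mode
        if ch not in AMINO_ACIDS:
            raise ValueError(f"invalid character in sequence: {ch!r}")
        residues.append(ch)
    return "".join(residues)
```

For example, clean_sequence("1 mkef 10", case_sensitive=False) returns "MKEF", while clean_sequence("mkef") raises ValueError because lowercase letters are rejected in the default mode.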

An example sequence is available by clicking 'Input Example'. To clear the input box, click 'Clear'.

Graph Output

By default, graphs will display only the MoRFCHiBi_Web MoRF propensity score for the full sequence. Other values can be displayed by clicking on the appropriate label.

There is no ideal cut-off for a binary MoRF/non-MoRF decision. Since predictors are not perfect (i.e., the AUC is less than 1), any cut-off value is likely to leave some MoRF residues unidentified and/or to wrongly identify non-MoRF residues as MoRFs. Therefore, we recommend that each user choose the cut-off value best suited to their tolerance of false positives and false negatives. For users who prefer a generic cut-off, we suggest a MoRFCHiBi_Web value around 0.725. At this cut-off, MoRFCHiBi_Web has a (TPR, FPR) of (0.567, 0.077) and (0.405, 0.057) on TEST_EXP53 and TEST_HT, respectively. The Toggle MoRF Bands option displays or hides this binary MoRF/non-MoRF decision. Sections with fewer than 4 residues above this cut-off are not identified as MoRFs. Note that since propensity scores are not perfect, a longer MoRF might have only some of its residues above a given cut-off value.
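
The binary decision described above can be reproduced offline from the MCW column. The following is a minimal sketch under stated assumptions: the function name morf_regions is hypothetical, "above this cut-off" is read as strictly greater than, and a section is taken to be a run of consecutive residues.

```python
def morf_regions(scores, cutoff=0.725, min_length=4):
    """Return (start, end) pairs, 1-based and inclusive, for runs scoring above cutoff.

    Runs shorter than min_length residues are discarded.
    """
    regions = []
    run_start = None
    for index, score in enumerate(scores, start=1):
        if score > cutoff:
            if run_start is None:
                run_start = index  # a new run begins at this residue
        elif run_start is not None:
            if index - run_start >= min_length:
                regions.append((run_start, index - 1))
            run_start = None
    # Close a run that extends to the last residue.
    if run_start is not None and len(scores) + 1 - run_start >= min_length:
        regions.append((run_start, len(scores)))
    return regions


mcw = [0.8, 0.8, 0.8, 0.5, 0.8, 0.8, 0.8, 0.8, 0.6]
print(morf_regions(mcw))  # → [(5, 8)]
```

The first run (residues 1 to 3) is dropped because it has only three residues above the cut-off. Passing a different cutoff lets each user apply their own tolerance of false positives and false negatives.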

The vertical Propensity scale is adjusted automatically to best fit the data. The Toggle Y-Axis Bounds option can be used to change the Y-axis bounds to [0-1].

The menu provides options for printing the graph or downloading it in PNG, JPEG, PDF, or SVG format.

To zoom in, select an area using the mouse.

Text Output Example

01 #
02 # MoRFchibi SYSTEM <Release 1.0 Dec. 15 2015>
03 #
04 # The University of British Columbia
05 # Michael Smith Laboratories - Center for High-Throughput Biology
06 #
07 # Column Data type
08 # 1 residue index
09 # 2 residue
10 # 3 MCW
11 # 4 MCL
12 # 5 MC
13 # 6 MDC
14 # 7 IDP
15 # 8 ICS
16 #
17 >Example
18 1 M 0.733845 0.647014 0.59774 0.6498 0.496945 0.754775
19 2 K 0.717923 0.628357 0.59755 0.63156 0.496565 0.485545
20 3 E 0.720699 0.63223 0.598595 0.633745 0.49837 0.52193
21 4 F 0.718552 0.629868 0.59869 0.63118 0.498592 0.670653
22 ... ...... ...... ........................
23 ... ...... ..............................

Lines 1 to 16: Results header.

Line 17: The sequence FASTA title.

Lines 18 to the end of file:

  • Column 1: Sequence residue index.
  • Column 2: Sequence residue.
  • Column 3: MCW: MoRFchibi_Web propensity score.
  • Column 4: MCL: MoRFchibi_Light propensity score. This score targets longer MoRFs.
  • Column 5: MC: MoRFchibi prediction based solely on the local physicochemical properties of the amino acid sequence.
  • Column 6: MDC: MoRFdc propensity score based on long trends of protein disorder and on conservation information.
  • Column 7: IDP: Long-trend protein disorder propensity score generated by ESpritz with the DisProt option.
  • Column 8: ICS: Initial conservation score of each residue, a general conservation propensity score assembled from the two PSSMs generated by aligning the query sequence to the SwissProt and the UniRef90 databases.

All scores are normalized as described in [2].
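
A results file with the layout above can be read with a few lines of code. This is a minimal sketch: the function name parse_results is hypothetical, and it assumes that the leading line numbers (01, 02, ...) in the example are shown only for reference and are not present in the actual file.

```python
def parse_results(text):
    """Parse results text into (title, rows); each row is a dict keyed by column name."""
    columns = ("MCW", "MCL", "MC", "MDC", "IDP", "ICS")
    title = None
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or results header
        if line.startswith(">"):
            title = line[1:].strip()  # the sequence FASTA title
            continue
        fields = line.split()
        if len(fields) != 8:
            raise ValueError(f"expected 8 columns, got {len(fields)}: {line!r}")
        row = {"index": int(fields[0]), "residue": fields[1]}
        row.update(zip(columns, map(float, fields[2:])))
        rows.append(row)
    return title, rows
```

For instance, after title, rows = parse_results(open(path).read()), where path is the location of a downloaded results file, the list [row["MCW"] for row in rows] holds the overall MoRF propensity for each residue.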

Notification Email

If an email address is provided (providing an email address is optional), a notification email will be sent once a job is processed. Notification emails include an attached copy of the results file and a link to the results page.

Jobs Table

Each 'Job' record has the following fields:

  1. Id: a unique integer job id.
  2. Label: the job label is the FASTA sequence title. Job labels are not unique.
  3. Size [residues]: the size of the sequence in residues.
  4. Status: one of the following five values:
    • <Processing> The sequence is currently being processed.
    • <Position: x> The job is in the server queue at position x.
    • <Pending> The job is in the private user queue.
    • <Completed in Xs> The job has completed in X seconds.
    • <Failed> Error Message.
  5. Results: provides a link to the results page and a [Graph] button to display the graph.
  6. Save For: displays the number of hours left in the results page life. The renew button renews that life to 48 hours.

The Queue Structure

A two-tier queue system with a server queue and user queues is implemented to prevent a single user from dominating the server with a large number of jobs. In this structure, each user can place up to two jobs in the server queue. If a user submits more than two jobs, the extra jobs are placed temporarily in that user's private queue. Once a user's job in the server queue is completed, the job at the top of that user's queue (if one exists) is moved to the tail of the server queue. User queues are located on the server; thus, once the link to the results page is secured, users can safely close the browser.
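
The behaviour described above can be sketched as follows. This is an illustration of the idea, not the server's implementation: the class and method names are hypothetical, and the sketch assumes that a job counts against the two-job limit until it is completed.

```python
from collections import defaultdict, deque

MAX_IN_SERVER_QUEUE = 2  # per-user limit stated in the text


class TwoTierQueue:
    def __init__(self):
        self.server_queue = deque()            # (user, job) pairs; head runs next
        self.user_queues = defaultdict(deque)  # private overflow queue per user
        self.in_server = defaultdict(int)      # jobs each user has in the server queue

    def submit(self, user, job):
        """Place the job in the server queue, or in the user's queue if over the limit."""
        if self.in_server[user] < MAX_IN_SERVER_QUEUE:
            self.server_queue.append((user, job))
            self.in_server[user] += 1
        else:
            self.user_queues[user].append(job)

    def complete_next(self):
        """Finish the job at the head of the server queue and promote a waiting job."""
        user, job = self.server_queue.popleft()
        self.in_server[user] -= 1
        if self.user_queues[user]:
            # Move the top of that user's private queue to the tail of the server queue.
            self.server_queue.append((user, self.user_queues[user].popleft()))
            self.in_server[user] += 1
        return user, job
```

If one user submits three jobs and a second user then submits one, the third job waits in the first user's private queue, so the second user's job runs after at most two of the first user's jobs.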
