RSRE - RNA Structural Robustness Evaluator
 
1974 sequences analyzed since June 6, 2006.
 
Contents:

Introduction
Randomization methods
Robustness evaluation
Input and options
Output
References

Introduction

    Biological robustness, defined as the ability to maintain stable functioning in the face of various perturbations, has long interested biologists, going back to Fisher's work on dominance (1-3) and to Waddington's research on developmental canalization (4,5). Robustness is a fundamental and ubiquitously observed phenomenon in biological systems and has been found in RNA viruses (6-8), viroids (9,10) and microRNAs (11,12). Phenotypic robustness appears at various levels of biological organization, including gene expression, protein folding, metabolic flux, physiological homeostasis, development and even organismal fitness (13). Depending on whether the perturbations are heritable or not, robustness is classified as genetic or environmental robustness (14): genetic robustness describes the insensitivity of a phenotype to genetic mutations, whereas environmental robustness describes its insensitivity to environmental factors. A proper understanding of the origin and principles of robustness in biological systems will advance our understanding of evolution (15). Although the genetic and environmental robustness of phenotypes seem to be palpable phenomena in nature, robustness, and genetic robustness in particular, is exceedingly difficult to measure and to demonstrate empirically (16).

    The RSRE (RNA Structural Robustness Evaluator) described here is a noncommercial web server developed for analyzing RNA structural robustness, both genetic and environmental. As control sets for robustness evaluation, RSRE uses random sequences and four types of shuffled sequences (mono-shuffling, di-shuffling, and shuffling based on zero- and first-order Markov models). Standard RNA structural distance measures, namely tree-edit distance, string distance and base-pair distance, are used in RSRE. The robustness of a given RNA and of its control sequences is evaluated quantitatively using a generalized definition of neutrality. Finally, RSRE reports the statistical significance of the difference in robustness between the given RNA and its control sequences.

    RSRE can be valuable for exploring the origin and mechanisms of RNA robustness and can also support research on RNA evolution.


Randomization methods

   Randomization methods are often used to generate random sequences for assessing the statistical significance of properties of biological sequences. The random sequences represent the "background noise" from which real biological information can be distinguished (17). However, simple randomization of an RNA sequence destroys its mononucleotide and dinucleotide frequencies, which are often biased and are crucial for the physical stability of the secondary structure (11,18-20). It is therefore essential to control for base-composition bias in robustness analysis. To this end, in addition to purely random sequences, RSRE generates four types of shuffled sequences that preserve, exactly or approximately, the mononucleotide and/or dinucleotide composition of the native sequence. These randomization methods have been widely used in studies of the thermodynamic stability of RNA secondary structure (11,18-22). The five randomization methods implemented in RSRE are described in detail below:

  1. Random. This method produces random sequences with the same length as the original. The mononucleotide and dinucleotide frequencies are completely distorted using this method.

  2. Shuffling based on a zero-order Markov model. The mononucleotide frequencies of the native RNA sequence are calculated, and a random sequence is generated by drawing each base independently according to these frequencies until the length of the native sequence is reached (zero-order Markov process). Mononucleotide frequencies are therefore preserved only approximately.

  3. Mono-shuffling. This shuffling is trivial: the nucleotides of the sequence are simply permuted at random, so the mononucleotide frequencies are preserved exactly, whereas the dinucleotide frequencies are not preserved.

  4. Shuffling based on a first-order Markov model. This method derives a first-order Markov model from the conditional probabilities (dinucleotide transition frequencies) observed in the native sequence. A random nucleotide is chosen as the seed of a new sequence, after which nucleotides are added according to the conditional probabilities (first-order Markov process) until the new sequence has exactly the same length as the original. The resulting sequences have dinucleotide frequencies close to, but not exactly equal to, those of the original sequence. Mononucleotide frequencies are not preserved exactly either.

  5. Di-shuffling. The goal here is to shuffle a sequence while keeping the dinucleotide distribution (frequencies) constant. An implementation similar to the algorithm described by Workman and Krogh (20) is used. Both the dinucleotide and the mononucleotide frequencies are preserved exactly.
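The five randomization methods above can be sketched in Python as follows. This is a minimal illustration, not RSRE's own code: the function names are hypothetical, equal base probabilities for method 1 and the reseeding rule in method 4 are assumptions, and the di-shuffle uses the Altschul-Erickson random Eulerian-path construction, a standard way to preserve dinucleotide counts exactly, which may differ in detail from the implementation RSRE adapted from Workman and Krogh (20).

```python
import random
from collections import Counter, defaultdict

BASES = "ACGU"

def random_seq(n, rng):
    """Method 1: uniform random bases (assumption: equal base probabilities)."""
    return "".join(rng.choice(BASES) for _ in range(n))

def zero_order_markov(seq, rng):
    """Method 2: draw each base independently with the native base frequencies."""
    counts = Counter(seq)
    bases = sorted(counts)
    return "".join(rng.choices(bases, weights=[counts[b] for b in bases], k=len(seq)))

def mono_shuffle(seq, rng):
    """Method 3: random permutation; base counts are preserved exactly."""
    chars = list(seq)
    rng.shuffle(chars)
    return "".join(chars)

def first_order_markov(seq, rng):
    """Method 4: walk a first-order Markov chain estimated from native dinucleotides."""
    trans = defaultdict(Counter)
    for a, b in zip(seq, seq[1:]):
        trans[a][b] += 1
    out = [rng.choice(seq)]  # random seed nucleotide drawn from the native sequence
    while len(out) < len(seq):
        nxt = trans.get(out[-1])
        if not nxt:  # base seen only at the 3' end has no successor: reseed (assumption)
            out.append(rng.choice(seq))
            continue
        bases = list(nxt)
        out.append(rng.choices(bases, weights=[nxt[b] for b in bases])[0])
    return "".join(out)

def _reaches(v, target, exit_to):
    """True if following the chosen exit edges from v reaches target without a cycle."""
    seen = set()
    while v != target:
        if v in seen:
            return False
        seen.add(v)
        v = exit_to[v]
    return True

def di_shuffle(seq, rng):
    """Method 5: shuffle preserving dinucleotide counts exactly via a random
    Eulerian path through the dinucleotide multigraph (Altschul-Erickson idea)."""
    if len(seq) < 3:
        return seq
    edges = defaultdict(list)  # base -> successor bases, one entry per dinucleotide
    for a, b in zip(seq, seq[1:]):
        edges[a].append(b)
    last = seq[-1]
    # One "exit" edge per base other than `last`; resample until the exit edges
    # form a tree leading to `last`, so the walk below can use every edge.
    while True:
        exit_to = {v: rng.choice(succ) for v, succ in edges.items() if v != last}
        if all(_reaches(v, last, exit_to) for v in exit_to):
            break
    order = {}
    for v, succ in edges.items():
        rest = list(succ)
        if v in exit_to:
            rest.remove(exit_to[v])  # drop one copy; it is used last
        rng.shuffle(rest)
        order[v] = rest + ([exit_to[v]] if v in exit_to else [])
    out = [seq[0]]
    for _ in range(len(seq) - 1):
        out.append(order[out[-1]].pop(0))
    return "".join(out)
```

Passing an explicit `random.Random` instance keeps control sets reproducible; for example, `[di_shuffle(seq, random.Random(i)) for i in range(1000)]` would give 1,000 controls for one input sequence.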

Robustness evaluation

    Experimental studies have shown that some RNAs, such as miRNAs, tolerate certain changes in their secondary structure (23-26). To reflect this flexibility in sequence/structure requirements, we define the robustness $R_j$ at a given threshold $T_j$ as follows:

  $$R_j = \langle N_j(d) \rangle, \qquad j = 0, 1, 2, \ldots, 9 \tag{1}$$

where $d$ is the structural distance between the secondary structure of the original sequence and that of a mutant, and $N_j(d)$ is the number of mutants whose structural distance is less than or equal to the threshold $T_j$. $R_j$ is the average of $N_j(d)$ over all $3L$ one-mutant neighbors, where $L$ is the sequence length. For each distance metric, the maximum structural distance between the secondary structures of random sequences and those of their mutants was used as the baseline for setting the thresholds: $T_j$ ($j = 0, 1, 2, \ldots, 9$) was set to 0%, 10%, 20%, ..., 90% of this maximum, respectively. At threshold $T_0$, robustness reduces to the definition of neutrality (12). A larger value of $R_j$ indicates a higher level of robustness at threshold $T_j$.

    To mitigate the uncertainty of the predicted MFE structure, suboptimal structures of the mutants within 1 kcal/mol above the MFE (the default energy range of RNAsubopt) are considered. The distance between the wild-type structure and the set of possible mutant structures is then estimated by summing the contributions of all these structures, each weighted by its Boltzmann probability, following the method used in RDMAS (36).
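Equation (1) and the threshold scheme can be sketched in Python under stated assumptions. Folding itself (e.g. with RNAfold or RNAsubopt from the Vienna RNA package) is not reproduced, so mutant structures and suboptimal energies are passed in as inputs. Only base-pair distance is implemented, as the number of base pairs present in exactly one of the two structures. Each mutant is assumed to contribute 1 to $N_j$ when its distance is at most $T_j$, so $R_j$ becomes the fraction of mutants that are neutral at $T_j$; in the Boltzmann-weighted mode, a mutant's distance is assumed to be the probability-weighted mean over its suboptimal structures. All function names are hypothetical.

```python
import math

RT = 0.0019872 * 310.15  # gas constant (kcal/mol/K) times 37 degrees C in kelvin

def base_pairs(db):
    """Set of (i, j) base pairs encoded by a dot-bracket string."""
    stack, pairs = [], set()
    for i, c in enumerate(db):
        if c == "(":
            stack.append(i)
        elif c == ")":
            pairs.add((stack.pop(), i))
    return pairs

def bp_distance(s1, s2):
    """Base-pair distance: number of pairs found in exactly one structure."""
    return len(base_pairs(s1) ^ base_pairs(s2))

def point_mutants(seq, alphabet="ACGU"):
    """All 3L one-mutant neighbors of an RNA sequence."""
    return [seq[:i] + b + seq[i + 1:]
            for i, a in enumerate(seq) for b in alphabet if b != a]

def boltzmann_distance(wt_struct, subopts):
    """Boltzmann-weighted distance from the wild-type structure to a mutant's
    suboptimal set; subopts is a list of (dot_bracket, energy in kcal/mol)."""
    e_min = min(e for _, e in subopts)  # shift energies for numerical stability
    weights = [math.exp(-(e - e_min) / RT) for _, e in subopts]
    total = sum(w * bp_distance(wt_struct, s) for w, (s, _) in zip(weights, subopts))
    return total / sum(weights)

def robustness_profile(distances, d_max, levels=10):
    """R_j for j = 0..levels-1: fraction of mutants with distance <= T_j,
    where T_j = j / levels * d_max (d_max taken from random controls)."""
    n = len(distances)
    return [sum(d <= j * d_max / levels for d in distances) / n
            for j in range(levels)]
```

For example, with `d_max = 10` the thresholds are T_j = 0, 1, ..., 9, so a mutant at base-pair distance 1 counts as neutral from T_1 upward.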

Input and options  

    The RSRE web server has a step-by-step input interface and is easy to use. Because the analysis is time-consuming, only batch mode is currently supported.

Step 1: Entering your Email address.

    For each job, a valid email address is required so that a notification can be sent when the job is completed.

Step 2: Entering the sequence.

  1. Sequence name. A sequence name may be typed or pasted into the "Enter a name for your sequence:" text field. Names may contain the letters A-Z and a-z and the digits 0-9, but must begin with a letter. Names are limited to 40 characters; longer names are truncated to 40 characters.

  2. Sequence format. An RNA sequence can be entered either by pasting the raw sequence or by uploading a sequence file in FASTA format; Multi-FASTA (MFA) files are also supported. The sequence should be a string of unmodified RNA/DNA bases (A, U/T, G and C); any other characters are removed.
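The naming and sequence rules above can be sketched as a small input-cleaning routine. The function names are hypothetical, mapping T to U is an assumption (the text only says that both U and T are accepted), and the FASTA parser takes the first word of each header line as the sequence name; RSRE's own parser may behave differently.

```python
import re

NAME_RE = re.compile(r"[A-Za-z][A-Za-z0-9]*")  # letters/digits, starting with a letter
MAX_NAME = 40

def clean_name(name):
    """Validate a sequence name and truncate it to 40 characters."""
    if not NAME_RE.fullmatch(name):
        raise ValueError(f"invalid sequence name: {name!r}")
    return name[:MAX_NAME]

def clean_sequence(raw):
    """Keep only A, C, G, T and U (case-insensitive); map T to U (assumption)."""
    kept = re.sub(r"[^ACGTU]", "", raw.upper())
    return kept.replace("T", "U")

def parse_fasta(text):
    """Parse FASTA or Multi-FASTA text into a list of (name, sequence) pairs."""
    records, name, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                records.append((name, clean_sequence("".join(chunks))))
            header = line[1:].split()
            name = header[0] if header else ""
            chunks = []
        elif line:
            chunks.append(line)
    if name is not None:
        records.append((name, clean_sequence("".join(chunks))))
    return records
```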

Step 3: Specifying parameters

  1. Suboptimal structures. RSRE supports two modes: Not considered and Boltzmann Weighted Distance. For details, see the Robustness evaluation section.

  2. Number of control sequences. For each input sequence, between 100 and 1,000 control sequences can be generated.

  3. Randomization methods. Users can select any one of the five randomization methods described above.
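The Step 3 parameters and their allowed values can be captured in a small validated container. This is a hypothetical sketch: the class name, field names and option strings are invented for illustration and are not RSRE's web form fields; only the allowed ranges (two suboptimal-structure modes, 100 to 1,000 control sequences, five randomization methods) come from the text above, and no defaults are assumed.

```python
from dataclasses import dataclass

# Hypothetical option strings mirroring the choices described in Step 3.
SUBOPT_MODES = ("not_considered", "boltzmann_weighted")
METHODS = ("random", "zero_order_markov", "mono_shuffle",
           "first_order_markov", "di_shuffle")

@dataclass(frozen=True)
class JobOptions:
    suboptimal: str
    n_controls: int
    method: str

    def __post_init__(self):
        # Reject values outside the ranges stated in the manual.
        if self.suboptimal not in SUBOPT_MODES:
            raise ValueError(f"unknown suboptimal mode: {self.suboptimal}")
        if not 100 <= self.n_controls <= 1000:
            raise ValueError("number of control sequences must be 100 to 1,000")
        if self.method not in METHODS:
            raise ValueError(f"unknown randomization method: {self.method}")
```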

Step 4: Evaluating robustness.

    RSRE evaluates both environmental and genetic robustness. For environmental robustness, only the thermodynamic stability (minimum free energy) of the RNA sequences is evaluated. For genetic robustness, the secondary structure of the wild type (WT) is compared with those of its mutants using several distance measures for secondary structures (29,32-34): tree-edit distance and string distance (29,35), and base-pair distance (27), all computed with RNAdistance from the Vienna RNA package (version 1.6) (28,29).

Output

    When the job has been completed, a notification email containing a URL linked to the output page is sent to the user. The URL remains valid for 48 hours. The output page serves as an online interactive analysis interface: any result can be viewed as a graphic by selecting the content item and clicking the "view" button. A hyperlink at the bottom of the output page allows all results to be downloaded as a single packed ".gz" file for off-line analysis. The result file name has the form "yyyymmddhhmmss.no", where "yyyy" is the year, "mm" the month, "dd" the day, "hh" the hour, "mm" the minute, "ss" the second and "no" the serial number. For example, the 1026th sequence, submitted at 10:31:07 am local time on 29 October 2007, is assigned the name 20071029103107.1026. The analysis results contain:

  1. Robustness distribution. A separate folder is generated for environmental robustness and for genetic robustness under each structural distance metric. Each folder contains, in PNG format, the free energy distribution (environmental robustness) or the robustness distribution histograms at the different thresholds (genetic robustness) (Figure 1A-1D). The free energy values and the robustness values at the different levels for the input sequence and its control sequences are provided in the files "MFE.data" and "r.data", respectively (Figure 1E). In "r.data", the first row gives the robustness values of the input sequence at the ten levels (one column per level), and the following N rows give the robustness values of the N control sequences.

  2. Z-scores and P-values. The Z-scores and P-values from the statistical significance analysis of robustness are provided in a file named "z_p.data" (Figure 1F). For environmental robustness, "z_p.data" contains only one row; for genetic robustness, it contains ten rows, one for each threshold level. The three columns of "z_p.data" are the corresponding minimum free energy, Z-score and P-value, respectively.

  3. The control sequences. The corresponding control sequences are provided in Multi-FASTA (MFA) format (Figure 1G).

Figure 1. Robustness analysis results for the C. elegans let-7 microRNA precursor, computed for environmental robustness and for genetic robustness with the base-pair distance metric, using 1,000 control sequences that preserve the mononucleotide frequencies of let-7. (A) Free energy distribution histogram. (B)-(D) Distribution histograms of the robustness values at levels 1-3 for the coarse-grained tree distance. (E) The robustness values of let-7 and of the corresponding 1,000 control sequences at all ten levels. (F) The Z-score and P-value of let-7. (G) The corresponding 1,000 control sequences in FASTA format.
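The result-naming scheme and the kind of statistics reported in "z_p.data" can be sketched as follows. This is a hypothetical illustration: the function names are invented, "r.data" is assumed to be whitespace-separated with one column per level, and because the text does not say how RSRE computes its P-values, the sketch uses a Z-score based on the sample standard deviation and an empirical one-sided P-value (the fraction of controls at least as large as the input value); RSRE's own procedure may differ.

```python
import statistics
from datetime import datetime

def result_name(submitted, serial):
    """Result name of the form yyyymmddhhmmss.no (submission time + serial number)."""
    return f"{submitted:%Y%m%d%H%M%S}.{serial}"

def parse_result_name(name):
    """Inverse of result_name: recover the submission time and serial number."""
    stamp, serial = name.split(".")
    return datetime.strptime(stamp, "%Y%m%d%H%M%S"), int(serial)

def read_r_data(text):
    """Split r.data into the input row and the control rows.
    Assumption: whitespace-separated numbers, one column per threshold level."""
    rows = [[float(x) for x in line.split()]
            for line in text.splitlines() if line.strip()]
    return rows[0], rows[1:]

def z_and_p(value, controls):
    """Z-score against the controls (sample standard deviation) and an
    empirical one-sided P-value: fraction of controls >= value (assumption)."""
    z = (value - statistics.mean(controls)) / statistics.stdev(controls)
    p = sum(c >= value for c in controls) / len(controls)
    return z, p

def level_statistics(text):
    """One (value, Z, P) tuple per threshold level."""
    wt, controls = read_r_data(text)
    return [(wt[j], *z_and_p(wt[j], [row[j] for row in controls]))
            for j in range(len(wt))]
```

For free energies, where lower values indicate greater stability, the comparison in the empirical P-value would be reversed (counting controls with values less than or equal to the input).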

References  

1. Fisher,R.A. (1928) The possible modifications of the response of the wild type to recurrent mutations. Amer. Nat., 62, 115-116.

2. Fisher,R.A. (1928) Two further notes on the origin of dominance. Amer. Nat., 62, 571-574.

3. Fisher,R.A. (1931) The evolution of dominance. Biol. Rev., 6, 345-368.

4. Waddington,C.H. (1953) The genetic assimilation of an acquired character. Evolution, 7, 118-126.

5. Waddington,C.H. (1957) The strategy of the genes. MacMillan, New York.

6. Elena,S.F., Carrasco,P., Daros,J.A. and Sanjuan,R. (2006) Mechanisms of genetic robustness in RNA viruses. EMBO Rep., 7, 168-173.

7. Montville,R., Froissart,R., Remold,S.K., Tenaillon,O. and Turner,P.E. (2005) Evolution of mutational robustness in an RNA virus. PLoS Biol., 3, e381.

8. Wagner,A. and Stadler,P.F. (1999) Viral RNA and evolved mutational robustness. J. Exp. Zool., 285, 119-127.

9. Sanjuan,R., Forment,J. and Elena,S.F. (2006) In silico predicted robustness of viroids RNA secondary structures. I. The effect of single mutations. Mol. Biol. Evol., 23, 1427-1436.

10. Sanjuan,R., Forment,J. and Elena,S.F. (2006) In silico predicted robustness of viroids RNA secondary structures. II. Interaction between mutation pairs. Mol. Biol. Evol.

11. Bonnet,E., Wuyts,J., Rouze,P. and Van de Peer,Y. (2004) Evidence that microRNA precursors, unlike other non-coding RNAs, have lower folding free energies than random sequences. Bioinformatics, 20, 2911-2917.

12. Borenstein,E. and Ruppin,E. (2006) Direct evolution of genetic robustness in microRNA. Proc. Natl. Acad. Sci. U. S. A, 103, 6593-6598.

13. de Visser,J.A., Hermisson,J., Wagner,G.P., Ancel,M.L., Bagheri-Chaichian,H., Blanchard,J.L., Chao,L., Cheverud,J.M., Elena,S.F., Fontana,W. et al. (2003) Perspective: Evolution and detection of genetic robustness. Evolution Int. J. Org. Evolution, 57, 1959-1972.

14. Wagner,G.P., Booth,G. and Bagheri-Chaichian,H. (1997) A population genetic theory of canalization. Evolution, 51, 329-347.

15. Kitano,H. (2004) Biological robustness. Nat. Rev. Genet., 5, 826-837.

16. Gibson,G. and Wagner,G. (2000) Canalization in evolutionary genetics: a stabilizing theory? Bioessays, 22, 372-380.

17. Ponty,Y., Termier,M. and Denise,A. (2006) GenRGenS: software for generating random genomic sequences and structures. Bioinformatics, 22, 1534-1535.

18. Clote,P., Ferre,F., Kranakis,E. and Krizanc,D. (2005) Structural RNA has lower folding energy than random RNA of the same dinucleotide frequency. RNA, 11, 578-591.

19. Katz,L. and Burge,C.B. (2003) Widespread selection for local RNA secondary structure in coding regions of bacterial genes. Genome Res., 13, 2042-2051.

20. Workman,C. and Krogh,A. (1999) No evidence that mRNAs have lower folding free energies than random sequences with the same dinucleotide distribution. Nucleic Acids Res., 27, 4816-4822.

21. Rivas,E. and Eddy,S.R. (2000) Secondary structure alone is generally not statistically significant for the detection of noncoding RNAs. Bioinformatics, 16, 583-605.

22. Seffens,W. and Digby,D. (1999) mRNAs have greater negative folding free energies than shuffled or codon choice randomized sequences. Nucleic Acids Res., 27, 1578-1584.

23. Lee,Y., Ahn,C., Han,J., Choi,H., Kim,J., Yim,J., Lee,J., Provost,P., Radmark,O., Kim,S. et al. (2003) The nuclear RNase III Drosha initiates microRNA processing. Nature, 425, 415-419.

24. Zeng,Y., Wagner,E.J. and Cullen,B.R. (2002) Both natural and designed micro RNAs can inhibit the expression of cognate mRNAs when expressed in human cells. Mol. Cell, 9, 1327-1333.

25. Zeng,Y. and Cullen,B.R. (2004) Structural requirements for pre-microRNA binding and nuclear export by Exportin 5. Nucleic Acids Res., 32, 4776-4785.

26. Zeng,Y. and Cullen,B.R. (2003) Sequence requirements for micro RNA processing and function in human cells. RNA, 9, 112-123.

27. Wuchty,S., Fontana,W., Hofacker,I.L. and Schuster,P. (1999) Complete suboptimal folding of RNA and the stability of secondary structures. Biopolymers, 49, 145-165.

28. Hofacker,I.L. (2003) Vienna RNA secondary structure server. Nucleic Acids Res., 31, 3429-3431.

29. Hofacker,I.L., Fontana,W., Stadler,P.F., Bonhoeffer,L.S., Tacker,M. and Schuster,P. (1994) Fast folding and comparison of RNA secondary structures. Monatshefte für Chemie / Chemical Monthly, 125, 167-188.

30. Zuker,M. (2003) Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Res., 31, 3406-3415.

31. Zuker,M. and Stiegler,P. (1981) Optimal computer folding of large RNA sequences using thermodynamics and auxiliary information. Nucleic Acids Res., 9, 133-148.

32. Fontana,W., Konings,D.A., Stadler,P.F. and Schuster,P. (1993) Statistics of RNA secondary structures. Biopolymers, 33, 1389-1404.

33. Shapiro,B.A. (1988) An algorithm for comparing multiple RNA secondary structures. Comput. Appl. Biosci., 4, 387-393.

34. Shapiro,B.A. and Zhang,K.Z. (1990) Comparing multiple RNA secondary structures using tree comparisons. Comput. Appl. Biosci., 6, 309-318.

35. Hogeweg,P. and Hesper,B. (1984) Energy directed folding of RNA sequences. Nucleic Acids Res., 12, 67-74.

36. Shu,W., Bo,X., Liu,R., Zhao,D., Zheng,Z. and Wang,S. (2006) RDMAS: a web server for RNA deleterious mutation analysis. BMC Bioinformatics, 7, 404.


Copyright © 2007 Beijing Institute of Radiation Medicine
Maintained by Wenjie Shu