J Genomics 2016; 4:19-22. doi:10.7150/jgen.16082
Short Research Communication
1. Section of Developmental Genomics, Laboratory of Cellular and Developmental Biology, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, MD 20892, USA
2. Joint Initiative for Metrology in Biology, National Institute of Standards and Technology, Gaithersburg, MD 20899, USA
*These authors contributed equally to this work.
Spike-in RNAs are valuable controls for a variety of gene expression measurements. The External RNA Controls Consortium developed test sets that were used in a number of published reports. Here we provide an authoritative table that summarizes, updates, and corrects errors in the test version that ultimately resulted in the certified Standard Reference Material 2374. We have noted the existence of anti-sense RNA controls in the material, corrected sub-pool memberships, and commented on control RNAs that displayed inconsistent behavior.
Keywords: ERCC, spike-in controls, external RNA controls, NIST standard reference materials.
Advances in gene expression profiling technologies make it possible for individual groups to ask genome-wide questions. In addition, properly controlled experiments with well-described metadata can be reused to make discoveries not envisaged by the data producers. Standard reference materials greatly help to make these data robust and durable. The National Institute of Standards and Technology (NIST), as a part of the External RNA Controls Consortium (ERCC), developed 176 DNA plasmids that can be used as templates for RNA controls [1-3]. NIST Standard Reference Material (SRM) 2374 is a library composed of a subset of 96 plasmids. These same materials were used for commercially available ERCC RNA spike-in mixtures (Ambion/Thermo Fisher Scientific, Waltham, MA), which are formulations of 92 RNA molecules derived from the plasmids. The commercial collection does not include ERCC-00007, -00018, -00023, and -00128. One of the test versions that led to the SRM contained 96 RNA sequences that were transcribed from the plasmids, quantified, and mixed to form defined pools to be added to unknowns in transcription profiling experiments by array, sequencing, PCR, or other assays. These test pools were widely distributed and were used by the human and model organism Encyclopedia of DNA Elements projects [4, 5].
When “spiked” into an individual RNA sample, the readout from a single pool of ERCC controls can be used as a ruler. Each pool is designed to have a dynamic range of 2^20. The actual linear range of measurement, however, depends on the experimental platform. Spike-in measurements plotted against actual abundance fit a straight line in RNA-Seq and a monotonic sigmoidal curve in microarrays or bead-arrays, consistent with data compression in hybridization-based techniques.
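The "ruler" idea above can be sketched as a straight-line fit in log2 space. This is only an illustration, not the authors' analysis: the function name `fit_log2_line` and the toy data are hypothetical, and the example assumes an idealized RNA-Seq-like response in which the observed signal is a constant multiple of the spiked amount.

```python
import math

def fit_log2_line(expected, observed):
    """Ordinary least-squares fit of log2(observed) against log2(expected).

    Returns (slope, intercept). Pairs with a non-positive value are skipped
    because their logarithm is undefined.
    """
    xs, ys = [], []
    for e, o in zip(expected, observed):
        if e > 0 and o > 0:
            xs.append(math.log2(e))
            ys.append(math.log2(o))
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two usable spike-in measurements")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("expected abundances must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical toy data: 21 abundance levels spanning 2^0 .. 2^20,
# with an idealized response of exactly 3 signal units per molecule.
expected = [2.0 ** k for k in range(21)]
observed = [3.0 * e for e in expected]

slope, intercept = fit_log2_line(expected, observed)
dynamic_range_log2 = math.log2(max(expected) / min(expected))
print(slope, intercept, dynamic_range_log2)
```

A slope close to 1 over the whole range would be consistent with the linear behavior described for RNA-Seq; a slope that flattens at the extremes would be consistent with the sigmoidal compression described for hybridization-based platforms.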
Addition of a single pool of ERCC controls generates useful information, but their use can be enhanced when different pools of spike-in controls from different samples are directly compared. The “pools” of ERCC controls were mixed from multiple “subpools”, such that comparisons between “subpools” that belong to different “pools” generate abundance ratios that can be used as differential expression standards. There were two distinct sets of pools in the test version. Pools 12-15 follow a modified Latin-square design, using 5 different subpools (A-E). The numbers of RNA molecules in subpool A are equal in pools 12-15, and thus subpool A molecules generate a constant 1-to-1 proportion between the pools. Subpools B-E have differing molar concentrations that produce a trend in relative abundance across the pools of 1-, 1.5-, 2.5-, and 4-fold. For example, if pool 12 and pool 13 were used for two different samples, the log2-transformed ratios between different subpools will be 0, -0.585, -0.678, -0.737, and 2. The second set of pools, 78A and 78B, provides a pair of samples with reciprocal changes in relative abundance, i.e. 1.5-fold up and down, producing log2-transformed ratios of 0, 0.585, and -0.585.
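The expected ratios can be worked through with a short sketch. The cyclic assignment of the four abundance levels to subpools B-E used here is an assumption made for illustration; the text only states that the design is a modified Latin square, and the actual layout should be taken from Table 1. All names in the block are hypothetical.

```python
import math

# Relative abundance levels named in the text for subpools B-E.
LEVELS = [1.0, 1.5, 2.5, 4.0]
POOLS = [12, 13, 14, 15]
ROTATED_SUBPOOLS = ["B", "C", "D", "E"]

def relative_abundance(subpool, pool):
    """Relative abundance of a subpool in a pool under the ASSUMED cyclic layout."""
    if subpool == "A":
        return 1.0  # subpool A is held constant across pools 12-15
    i = ROTATED_SUBPOOLS.index(subpool)
    j = POOLS.index(pool)
    return LEVELS[(i + j) % len(LEVELS)]

def expected_log2_ratios(pool_x, pool_y):
    """Expected log2(pool_x / pool_y) for every subpool."""
    return {
        s: math.log2(relative_abundance(s, pool_x) / relative_abundance(s, pool_y))
        for s in ["A"] + ROTATED_SUBPOOLS
    }

def reciprocal_log2_ratios(fold=1.5):
    """Expected log2 ratios for a reciprocal pair such as 78A vs 78B."""
    return [0.0, math.log2(fold), -math.log2(fold)]

ratios_12_vs_13 = expected_log2_ratios(12, 13)
for subpool, value in ratios_12_vs_13.items():
    print(subpool, round(value, 3))
print([round(v, 3) for v in reciprocal_log2_ratios()])
```

Under this assumed layout, the pool 12 versus pool 13 comparison gives 0 for subpool A and the ratios log2(1/1.5) ≈ -0.585, log2(1.5/2.5) ≈ -0.737, log2(2.5/4) ≈ -0.678, and log2(4/1) = 2 for the rotated subpools.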
While production of the spike-in control RNAs was tightly controlled, it was a test set, and there are multiple cases where measurements of spike-in molecules do not match the original description and/or expectations. In this short note, we summarize data outlining problematic ERCC spike-ins. This information should be used in re-evaluating datasets using the test version, as well as any future work that may use remaining aliquots in circulation (Table 1, and Supplementary Material for more details).
The plasmid DNAs were sequenced and deposited in GenBank; however, the in vitro transcribed RNAs were not sequenced except during testing in RNA-Seq experiments. These experiments made it clear that seven ERCC controls had the complementary sequence, indicating that the transcripts were from the other strand (ERCC-00009, -00014, -00057, -00059, -00099, -00108, and -00116). As a result, these spike-in controls would not be measurable in hybridization-based assays. Similarly, they would not be aligned in a strand-specific RNA-Seq analysis unless strand specificity was “turned off” in the read quantification steps, or complementary sequences were provided for alignment. Additionally, plasmids are replicated in bacteria, where errors can be introduced. Differences between the sequences of the actual RNAs and the plasmids used as transcription templates are known, suggesting that such mutations occurred during plasmid propagation in the test set. The certified values of SRM 2374 are the sequences of the plasmids as distributed in the final set, and were determined by exhaustive sequencing.
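One way to provide complementary sequences for alignment is to reverse-complement the seven anti-sense controls in the reference before indexing. The sketch below assumes the reference is available as a simple mapping from control ID to DNA sequence; the function names and toy sequences are hypothetical and are not real ERCC sequences. Whether the anti-sense orientation applies to a given pool should be checked against Table 1.

```python
# Controls reported in the text as transcribed from the opposite strand.
ANTISENSE_IDS = {
    "ERCC-00009", "ERCC-00014", "ERCC-00057", "ERCC-00059",
    "ERCC-00099", "ERCC-00108", "ERCC-00116",
}

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    return seq.translate(_COMPLEMENT)[::-1]

def strand_corrected_reference(records):
    """Return a copy of the reference with anti-sense controls reverse-complemented.

    `records` maps control ID -> sequence as deposited; other entries are unchanged.
    """
    return {
        control_id: reverse_complement(seq) if control_id in ANTISENSE_IDS else seq
        for control_id, seq in records.items()
    }

# Toy sequences for illustration only.
toy_reference = {"ERCC-00009": "AACGT", "ERCC-00002": "AACGT"}
corrected = strand_corrected_reference(toy_reference)
print(corrected)
```

The alternative mentioned in the text, disabling strand specificity at the read quantification step, avoids editing the reference but applies to every feature in the analysis rather than only to the seven affected controls.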
There were instances of pooling errors in the test set. From multiple experiments that used 78A and 78B, we recognized that ERCC-00085 behaves like Subpool “C”, rather than the intended Subpool “B”. Therefore, when pools 78A and 78B were compared, ERCC-00085 displayed fold changes 33.3% greater than those in the original description. We have not detected ERCC-00084 in our experiments, and it is possible that this RNA was prepared from the ERCC-00085 plasmid DNA, effectively increasing the measurement of ERCC-00085. Similarly, we have corrected the pool membership of ERCC-00113 from Subpool C to Subpool D in pools 12-15. ERCC-00073 and ERCC-00144 did not provide accurate measurements [4, 6]. One reason for poor measurement may be the molecular properties of individual spike-in RNA species (e.g. size and secondary structure). Additionally, a previous study pointed out a discrepancy in ERCC-00116 measurements between poly-dT based mRNA enrichment and rRNA depletion protocols [3, 9]. The polyA tails on the ERCC spike-ins are not optimal for polyA+ selection, and using them prior to library production is not recommended. While there could well be additional instances of unexpected behavior of ERCC spike-in measurements, the information we provide here explains the unexpected ERCC behaviors that we have encountered to date.
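When re-evaluating a dataset that used the test version, the corrections described above can be applied to an existing annotation before computing expected ratios. The following is a minimal sketch under assumed data structures: the dictionary layout, the pool-set labels "78" and "12-15", and the function names are hypothetical, and only the control IDs and corrections named in the text are encoded.

```python
# Subpool corrections named in the text, keyed by (control ID, pool set).
SUBPOOL_CORRECTIONS = {
    ("ERCC-00085", "78"): "C",     # intended B, observed to behave like C
    ("ERCC-00113", "12-15"): "D",  # originally listed as C
}

# Controls the text flags as problematic, with a short reason.
FLAGGED_CONTROLS = {
    "ERCC-00084": "not detected",
    "ERCC-00073": "inaccurate measurement",
    "ERCC-00144": "inaccurate measurement",
    "ERCC-00116": "differs between poly-dT enrichment and rRNA depletion",
}

def corrected_subpool(control_id, pool_set, original_subpool):
    """Return the corrected subpool if one is recorded, else the original."""
    return SUBPOOL_CORRECTIONS.get((control_id, pool_set), original_subpool)

def usable_controls(control_ids):
    """Drop flagged controls, preserving the input order."""
    return [c for c in control_ids if c not in FLAGGED_CONTROLS]

print(corrected_subpool("ERCC-00085", "78", "B"))
print(usable_controls(["ERCC-00002", "ERCC-00084", "ERCC-00113"]))
```

Excluding flagged controls outright is a conservative choice; depending on the protocol, some of them (for example ERCC-00116 in an rRNA depletion workflow) may still be informative, so a per-experiment decision is likely more appropriate than a fixed filter.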
Table 1. Summarized information on the NIST-distributed ERCC spike-in control test version.
| ERCC Control | GenBank (a) | DNA (b) | Length (nt) (c) | % GC (c) | MW | Subpool in pools 12 to 15 | Subpool in pool 78 |
(a) Sequence mismatches between the GenBank entries and the resequenced RNAs.
(b) Syn: De novo synthetic design, Mjan: Methanocaldococcus jannaschii, Bsub: Bacillus subtilis.
(c) Length and GC content include poly(A) sequence.
(d) Reversed (anti-sense) in Pools 12-15.
(e) ERCC-00084 is not detected. ERCC-00084 and ERCC-00085 may both have been prepared from the ERCC-00085 plasmid. ERCC-00085 behaves as C in some batches of Pools 78A and 78B.
(f) Corrected Pool membership to D and corrected Pool concentrations accordingly.
(g) Poor performing.
(h) Consistently under-reports abundance.
(i) Consistently over-reports abundance in Pools 78A and 78B.
(j) Particularly unsuitable for polyA+ isolation.
(k) ERCC-00104 has a length of either 2202 nt or 2203 nt.
(l) Not present in current commercial collections.
Supplemental file 1.
ERCC - External RNA Controls Consortium, NIST - National Institute of Standards and Technology, SRM - Standard Reference Material, ENCODE - Encyclopedia of DNA Elements.
The authors would like to acknowledge the careful experimental work by Sarah Helber to prepare the RNA and complex mixtures required for the test pools. This work was supported in part by the Intramural Research Program of the National Institutes of Health, NIDDK.
Certain commercial equipment, instruments, or materials are identified in this paper in order to specify the experimental procedure adequately. Such identification is not intended to imply recommendation or endorsement by the National Institute of Standards and Technology (NIST), nor is it intended to imply that the materials or equipment identified are necessarily the best available for the purpose.
The authors declare no competing interests.
1. Baker SC, Bauer SR, Beyer RP, Brenton JD, Bromley B, Burrill J, Causton H, Conley MP, Elespuru R, Fero M. et al. The External RNA Controls Consortium: a progress report. Nature methods. 2005;2(10):731-734
2. ERCC. Proposed methods for testing and selecting the ERCC external RNA controls. BMC genomics. 2005;6:150
3. Munro SA, Lund SP, Pine PS, Binder H, Clevert DA, Conesa A, Dopazo J, Fasold M, Hochreiter S, Hong H. et al. Assessing technical performance in differential gene expression experiments with external spike-in RNA control ratio mixtures. Nature communications. 2014;5:5125
4. Jiang L, Schlesinger F, Davis CA, Zhang Y, Li R, Salit M, Gingeras TR, Oliver B. Synthetic spike-in standards for RNA-seq experiments. Genome research. 2011;21(9):1543-1551
6. Pine PS, Munro SA, Parsons JR, McDaniel J, Lucas AB, Lozach J, Myers TG, Su Q, Jacobs-Helber SM, Salit M. Evaluation of the External RNA Controls Consortium (ERCC) reference material using a modified Latin square design. BMC Biotechnol. 2016;16(1):54
7. Malone JH, Oliver B. Microarrays, deep sequencing and the true measure of the transcriptome. BMC biology. 2011;9:34
8. Standard reference material 2374; DNA sequence library for external RNA controls. https://www-s.nist.gov/srmors/certificates/view_certGIF.cfm?certificate=2374
9. Qing T, Yu Y, Du T, Shi L. mRNA enrichment protocols determine the quantification characteristics of external RNA spike-in controls in RNA-Seq studies. Science China Life sciences. 2013;56(2):134-142
Corresponding author: Hangnoh Lee, Ph.D. Email: hangnoh.leegov Phone: 301-594-1716/ Fax: 301-496-5239.