J Genomics 2016; 4:19-22. doi:10.7150/jgen.16082

Short Research Communication

External RNA Controls Consortium Beta Version Update

Hangnoh Lee1*, P. Scott Pine2*, Jennifer McDaniel2, Marc Salit2, Brian Oliver1

1. Section of Developmental Genomics, Laboratory of Cellular and Developmental Biology, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, MD 20892, USA
2. Joint Initiative for Metrology in Biology, National Institute of Standards and Technology, Gaithersburg, MD 20899, USA
*These authors contributed equally to this work.

This is an open access article distributed under the terms of the Creative Commons Attribution (CC BY-NC) License. See http://ivyspring.com/terms for full terms and conditions.
How to cite this article:
Lee H, Pine PS, McDaniel J, Salit M, Oliver B. External RNA Controls Consortium Beta Version Update. J Genomics 2016; 4:19-22. doi:10.7150/jgen.16082. Available from http://www.jgenomics.com/v04p0019.htm

Abstract

Spike-in RNAs are valuable controls for a variety of gene expression measurements. The External RNA Controls Consortium developed test sets that were used in a number of published reports. Here we provide an authoritative table that summarizes, updates, and corrects errors in the test version that ultimately resulted in the certified Standard Reference Material 2374. We have noted the existence of anti-sense RNA controls in the material, corrected sub-pool memberships, and commented on control RNAs that displayed inconsistent behavior.

Keywords: ERCC, spike-in controls, external RNA controls, NIST standard reference materials.

 

Advances in gene expression profiling technologies make it possible for individual groups to ask genome-wide questions. In addition, properly controlled experiments with well-described metadata can be reused repeatedly to make discoveries not envisaged by the data producers. Standard reference materials greatly help to make these data robust and durable. The National Institute of Standards and Technology (NIST), as a part of the External RNA Controls Consortium (ERCC), developed 176 DNA plasmids that can be used as templates for RNA controls [1-3]. NIST Standard Reference Material (SRM) 2374 is a library composed of a subset of 96 plasmids. These same materials were used for commercially available ERCC RNA spike-in mixtures (Ambion/Thermo Fisher Scientific, Waltham, MA), which are formulations of 92 RNA molecules derived from the plasmids. The commercial collection does not include ERCC-00007, -00018, -00023, and -00128. One of the test versions that led to the SRM contained 96 RNA sequences that were transcribed from the plasmids, quantified, and mixed to form defined pools to be added to unknowns in transcription profiling experiments by array, sequencing, PCR, or other assays. These test pools were widely distributed and were used by the human and model organism Encyclopedia of DNA Elements (ENCODE) projects [4, 5].

When “spiked” into an individual RNA sample, the readout from a single pool of ERCC controls can be used as a ruler. Each pool is designed to have a dynamic range of 2^20. The actual linear range of measurement, however, depends on the experimental platform. Spike-in measurements plotted against actual abundance fit a straight line in RNA-Seq and a monotonic sigmoidal curve in microarrays or bead-arrays [6], consistent with data compression in hybridization-based techniques [7].
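The "ruler" idea can be sketched as a straight-line fit in log2 space, which is the behavior described above for RNA-Seq. The following is a minimal sketch only: the function name `fit_log2_line` and the input values are illustrative assumptions, not data or code from this study. A slope near 1 would indicate that measurements scale proportionally with abundance across the range; a sigmoidal response, as described for hybridization platforms, would not be captured well by a single line.

```python
import math

def fit_log2_line(nominal, measured):
    """Least-squares fit of log2(measured) against log2(nominal).

    Returns (slope, intercept). Pairs with a non-positive value are
    skipped because log2 is undefined for them.
    """
    pairs = [(math.log2(n), math.log2(m))
             for n, m in zip(nominal, measured) if n > 0 and m > 0]
    if len(pairs) < 2:
        raise ValueError("need at least two positive pairs")
    mean_x = sum(x for x, _ in pairs) / len(pairs)
    mean_y = sum(y for _, y in pairs) / len(pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    if sxx == 0:
        raise ValueError("nominal abundances must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical nominal abundances spanning twenty doublings (2**0 to 2**20).
nominal = [2 ** k for k in range(0, 21, 4)]
# Made-up measurements that are exactly proportional to abundance.
measured = [3 * n for n in nominal]

slope, intercept = fit_log2_line(nominal, measured)
print(f"slope={slope:.3f} intercept={intercept:.3f}")
```

With real data, the residuals from this line at the low and high ends of the range are one simple way to see where a platform departs from linearity.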

Addition of a single pool of ERCC controls generates useful information, but the controls become more informative when different pools, spiked into different samples, are directly compared. The “pools” of ERCC controls were mixed from multiple “subpools”, such that comparisons between “subpools” that belong to different “pools” generate abundance ratios that can be used as differential expression standards. There were two distinct sets of pools in the test version. Pools 12-15 follow a modified Latin-square design, using 5 different subpools (A-E). The numbers of RNA molecules in subpool A are equal in pools 12-15, and thus subpool A molecules generate a constant 1-to-1 proportion between the pools. Subpools B-E have differing molar concentrations that produce a trend in relative abundance across the pools of 1-, 1.5-, 2.5-, and 4-fold. For example, if pool 12 and pool 13 were used for two different samples, the log2-transformed ratios between different subpools will be 0, -0.585, -0.678, -0.737, and 2 [6]. The second set of pools, 78A and 78B, provides a pair of samples with reciprocal changes in relative abundance, i.e. 1.5-fold up and down, producing log2-transformed ratios of 0, 0.585, and -0.585.
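The expected ratios follow directly from the stated abundance levels, and a short calculation makes the arithmetic explicit. This is a sketch under a stated assumption: the levels for subpools B-E are taken to rotate by one position between the two pools being compared. That rotation, and the names `LEVELS`, `pool_design`, and `log2_ratios`, are illustrative and are not the documented layout of pools 12-15 (see [6] for the actual design).

```python
import math

# Relative abundance levels for subpools B-E, as stated in the text.
LEVELS = [1.0, 1.5, 2.5, 4.0]

def pool_design(shift):
    """Relative abundance of each subpool in one hypothetical pool.

    Subpool A is held constant; subpools B-E take the four levels,
    rotated by `shift` positions (an assumption for illustration).
    """
    design = {"A": 1.0}
    for i, name in enumerate("BCDE"):
        design[name] = LEVELS[(i + shift) % len(LEVELS)]
    return design

def log2_ratios(pool_x, pool_y):
    """log2(pool_x / pool_y) for every subpool."""
    return {s: math.log2(pool_x[s] / pool_y[s]) for s in pool_x}

first = pool_design(0)   # B=1,   C=1.5, D=2.5, E=4
second = pool_design(1)  # B=1.5, C=2.5, D=4,   E=1
ratios = log2_ratios(first, second)

for subpool in sorted(ratios):
    print(f"{subpool}: {ratios[subpool]:+.3f}")
# A: +0.000
# B: -0.585
# C: -0.737
# D: -0.678
# E: +2.000

# Reciprocal 1.5-fold changes, as in pools 78A and 78B.
print(f"{math.log2(1.5):+.3f} {math.log2(1 / 1.5):+.3f}")
# +0.585 -0.585
```

Note that log2(2.5/4) evaluates to about -0.678 and log2(1.5/2.5) to about -0.737, so the three intermediate ratios are close to one another while the wrap-around comparison (4 versus 1) gives a fourfold change.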

While production of the spike-in control RNAs was tightly controlled, it was a test set, and there are multiple cases where measurements of spike-in molecules do not match the original description and/or expectations. In this short note, we summarize data outlining problematic ERCC spike-ins. This information should be used in re-evaluating datasets using the test version, as well as any future work that may use remaining aliquots in circulation (Table 1, and Supplementary Material for more details).

The plasmid DNAs were sequenced and deposited in GenBank; however, the in vitro transcribed RNAs were not sequenced except during testing in RNA-Seq experiments. These experiments made it clear that seven ERCC controls had the complementary sequence, indicating that the transcripts were from the other strand (ERCC-00009, -00014, -00057, -00059, -00099, -00108, and -00116). As a result, these spike-in controls would not be measurable in hybridization-based assays [6]. Similarly, they would not be aligned in a strand-specific RNA-Seq analysis unless strand specificity was “turned off” in read quantification steps, or complementary sequences were provided for alignment. Additionally, plasmids are replicated in bacteria, where errors can be introduced. Differences between the sequences of the actual RNAs and the plasmids used as transcription templates are known [4], suggesting that such mutations occurred during plasmid propagation in the test set. The certified values of SRM 2374 are the sequences of the plasmids as distributed in the final set, and were determined by exhaustive sequencing [8].
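One way to provide complementary sequences for alignment is to reverse-complement the reference entries for the seven anti-sense controls before building an alignment index. The sketch below assumes the references are already loaded into a dictionary keyed by control ID; the names `ANTISENSE_IDS`, `reverse_complement`, and `strand_corrected` are hypothetical, and the sequences in the usage example are toy strings, not ERCC sequences.

```python
# The seven controls reported in the text as transcribed from the other strand.
ANTISENSE_IDS = {
    "ERCC-00009", "ERCC-00014", "ERCC-00057", "ERCC-00059",
    "ERCC-00099", "ERCC-00108", "ERCC-00116",
}

# Translation table for DNA bases; N is left as N.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]

def strand_corrected(references):
    """Copy a {control_id: sequence} mapping, reverse-complementing
    only the entries listed in ANTISENSE_IDS."""
    return {
        name: reverse_complement(seq) if name in ANTISENSE_IDS else seq
        for name, seq in references.items()
    }

# Toy sequences for illustration only.
toy = {"ERCC-00009": "AAGCTTG", "ERCC-00012": "ACGTT"}
fixed = strand_corrected(toy)
print(fixed["ERCC-00009"])  # CAAGCTT
print(fixed["ERCC-00012"])  # ACGTT (unchanged)
```

The alternative mentioned above, disabling strand specificity during read quantification, avoids editing the reference but applies to every feature in the analysis rather than to these seven controls alone.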

There were instances of pooling errors in the test set. From multiple experiments that used 78A and 78B, we recognized that ERCC-00085 behaves like Subpool “C”, rather than the intended Subpool “B”. Therefore, when pools 78A and 78B were compared, ERCC-00085 displayed fold changes 33.3% greater than those in the original description. We have not detected ERCC-00084 in our experiments, and it is possible that this RNA was prepared from ERCC-00085 plasmid DNA, effectively increasing the measurement of ERCC-00085. Similarly, we have corrected the pool membership of ERCC-00113 from Subpool C to Subpool D in pools 12-15. ERCC-00073 and ERCC-00144 did not provide accurate measurements [4, 6]. Poor measurement may be due to the molecular properties of individual spike-in RNA species (e.g. size and secondary structure). Additionally, a previous study pointed out a discrepancy in ERCC-00116 measurements between poly-dT based mRNA enrichment and rRNA depletion protocols [3, 9]. The polyA tails on the ERCC spike-ins are not optimal for polyA+ selection, and applying polyA+ selection to spiked samples prior to library production is not recommended [4]. While there could well be additional instances of unexpected behavior of ERCC spike-in measurements, the information we provide here explains the unexpected ERCC behaviors that we have encountered to date.

 Table 1 

Summarized information on NIST distributed ERCC spike-in control test version.

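| ERCC Control | GenBank (a) | DNA (b) | Length (nt) (c) | % GC (c) | MW | Subpool in pools 12-15 | Subpool in pool 78 |
|---|---|---|---|---|---|---|---|
| ERCC-00002 (a) | DQ459430 | Syn | 1061 | 51 | 341,162 | B | B |
| ERCC-00003 (a) | DQ516784 | Mjan | 1023 | 33 | 327,530 | A | A |
| ERCC-00004 (a) | DQ516752 | Mjan | 523 | 34 | 167,216 | C | C |
| ERCC-00007 (l) | EF011068 | Bsub | 1135 | 46 | 362,636 | D | A |
| ERCC-00009 (d) | DQ668364 | Bsub | 984 | 47 | 316,584 | E | C |
| ERCC-00012 | DQ883670 | Syn | 994 | 51 | 320,263 | A | A |
| ERCC-00013 (a) | EF011062 | Bsub | 808 | 43 | 261,415 | B | B |
| ERCC-00014 (a,d) | DQ875385 | Mjan | 1957 | 44 | 631,409 | C | B |
| ERCC-00016 | DQ883664 | Syn | 844 | 48 | 271,684 | D | A |
| ERCC-00017 (a) | DQ459420 | Syn | 1136 | 51 | 367,042 | E | C |
| ERCC-00018 (a,l) | EF011065 | Bsub | 1026 | 43 | 330,493 | C | C |
| ERCC-00019 | DQ883651 | Syn | 644 | 49 | 207,543 | B | B |
| ERCC-00022 | DQ855004 | Syn | 751 | 47 | 241,178 | C | C |
| ERCC-00023 (l) | DQ516744 | Mjan | 273 | 31 | 88,186 | D | A |
| ERCC-00024 | DQ854993 | Syn | 536 | 46 | 173,128 | E | C |
| ERCC-00025 (a) | DQ883689 | Syn | 1994 | 50 | 640,941 | A | A |
| ERCC-00028 (a) | DQ459419 | Syn | 1130 | 51 | 364,285 | B | B |
| ERCC-00031 (a) | DQ459431 | Syn | 1138 | 48 | 365,732 | E | C |
| ERCC-00033 | DQ516796 | Mjan | 2022 | 33 | 651,534 | D | B |
| ERCC-00034 (a) | DQ855001 | Syn | 1019 | 49 | 328,139 | E | A |
| ERCC-00035 (a) | DQ459413 | Syn | 1130 | 51 | 364,378 | A | A |
| ERCC-00039 | DQ883656 | Syn | 740 | 49 | 238,322 | B | B |
| ERCC-00040 (a) | DQ883661 | Syn | 744 | 53 | 239,738 | C | B |
| ERCC-00041 | EF011069 | Bsub | 1123 | 45 | 363,007 | D | C |
| ERCC-00042 (a) | DQ516783 | Mjan | 1023 | 39 | 325,750 | E | B |
| ERCC-00043 (a) | DQ516787 | Mjan | 1023 | 33 | 330,122 | A | C |
| ERCC-00044 (a) | DQ459424 | Syn | 1156 | 50 | 372,347 | B | B |
| ERCC-00046 (a) | DQ516748 | Mjan | 522 | 35 | 168,087 | C | C |
| ERCC-00048 | DQ883671 | Syn | 992 | 48 | 320,110 | D | B |
| ERCC-00051 | DQ516740 | Mjan | 274 | 34 | 88,356 | C | A |
| ERCC-00053 (a) | DQ516785 | Mjan | 1023 | 31 | 327,971 | A | C |
| ERCC-00054 | DQ516731 | Mjan | 274 | 37 | 88,966 | B | B |
| ERCC-00057 (d) | DQ668366 | Bsub | 1021 | 50 | 328,287 | C | A |
| ERCC-00058 (a) | DQ459418 | Syn | 1136 | 50 | 366,548 | D | C |
| ERCC-00059 (d) | DQ668356 | Bsub | 525 | 48 | 168,750 | E | A |
| ERCC-00060 (a) | DQ516763 | Mjan | 523 | 31 | 168,195 | A | C |
| ERCC-00061 (a) | DQ459426 | Syn | 1136 | 50 | 366,454 | B | B |
| ERCC-00062 (a) | DQ516786 | Mjan | 1023 | 31 | 328,505 | C | A |
| ERCC-00067 | DQ883653 | Syn | 644 | 47 | 207,451 | D | A |
| ERCC-00069 (a) | DQ459421 | Syn | 1137 | 50 | 366,664 | E | A |
| ERCC-00071 | DQ883654 | Syn | 642 | 48 | 206,115 | A | C |
| ERCC-00073 (g) | DQ668358 | Bsub | 603 | 47 | 193,958 | B | B |
| ERCC-00074 (a) | DQ516754 | Mjan | 522 | 35 | 167,539 | C | A |
| ERCC-00075 (a) | DQ516778 | Mjan | 1023 | 36 | 325,442 | D | B |
| ERCC-00076 (a) | DQ883650 | Syn | 642 | 50 | 206,436 | E | B |
| ERCC-00077 | DQ516742 | Mjan | 273 | 33 | 87,694 | A | A |
| ERCC-00078 | DQ883673 | Syn | 993 | 50 | 320,094 | B | B |
| ERCC-00079 | DQ883652 | Syn | 644 | 49 | 207,757 | A | C |
| ERCC-00081 (a) | DQ854991 | Syn | 534 | 49 | 172,323 | D | A |
| ERCC-00083 (a) | DQ516780 | Mjan | 1023 | 35 | 325,668 | E | A |
| ERCC-00084 (e) | DQ883682 | Syn | 994 | 50 | 320,445 | A | C |
| ERCC-00085 (e) | DQ883669 | Syn | 844 | 49 | 271,323 | B | B |
| ERCC-00086 (a) | DQ516791 | Mjan | 1020 | 32 | 328,632 | C | B |
| ERCC-00092 (a) | DQ459425 | Syn | 1124 | 50 | 361,716 | D | B |
| ERCC-00095 (a) | DQ516759 | Mjan | 521 | 37 | 166,307 | E | B |
| ERCC-00096 (a,i) | DQ459429 | Syn | 1107 | 51 | 356,565 | A | C |
| ERCC-00097 (a) | DQ516758 | Mjan | 523 | 36 | 167,189 | B | B |
| ERCC-00098 (a) | DQ459415 | Syn | 1143 | 51 | 368,970 | C | C |
| ERCC-00099 (a,d) | DQ875387 | Bsub | 1350 | 41 | 434,408 | D | A |
| ERCC-00104 (a,k) | DQ516815 | Mjan | 2022 | 33 | 647,370 | E | C |
| ERCC-00108 (d) | DQ668365 | Bsub | 1022 | 49 | 328,424 | A | A |
| ERCC-00109 (a) | DQ854998 | Syn | 536 | 46 | 172,925 | B | B |
| ERCC-00111 | DQ883685 | Syn | 994 | 47 | 319,359 | C | A |
| ERCC-00112 (a) | DQ459422 | Syn | 1136 | 47 | 364,932 | D | C |
| ERCC-00113 (a,f) | DQ883663 | Syn | 840 | 50 | 270,697 | D | A |
| ERCC-00116 (d,j) | DQ668367 | Bsub | 1991 | 50 | 639,986 | B | B |
| ERCC-00117 (a) | DQ459412 | Syn | 1136 | 51 | 365,757 | C | A |
| ERCC-00120 (a) | DQ854992 | Syn | 536 | 48 | 172,605 | D | A |
| ERCC-00123 (a) | DQ516782 | Mjan | 1022 | 36 | 324,911 | E | C |
| ERCC-00126 (a) | DQ459427 | Syn | 1119 | 51 | 359,790 | A | C |
| ERCC-00128 (a,l) | DQ459428 | Syn | 1133 | 48 | 364,405 | B | B |
| ERCC-00130 | EF011072 | Bsub | 1059 | 46 | 342,268 | C | C |
| ERCC-00131 (a) | DQ855003 | Syn | 771 | 47 | 248,276 | D | A |
| ERCC-00134 (a) | DQ516739 | Mjan | 274 | 31 | 88,594 | E | C |
| ERCC-00136 (a) | EF011063 | Bsub | 1033 | 42 | 333,363 | A | C |
| ERCC-00137 (a) | DQ855000 | Syn | 537 | 50 | 173,218 | B | B |
| ERCC-00138 (a) | DQ516777 | Mjan | 1022 | 33 | 327,949 | C | C |
| ERCC-00142 (a) | DQ883646 | Syn | 493 | 50 | 159,090 | D | C |
| ERCC-00143 | DQ668362 | Bsub | 784 | 49 | 251,705 | E | A |
| ERCC-00144 (h) | DQ854995 | Syn | 538 | 46 | 173,404 | A | C |
| ERCC-00145 | DQ875386 | Bsub | 1042 | 44 | 336,179 | B | B |
| ERCC-00147 (a) | DQ516790 | Mjan | 1023 | 36 | 331,125 | C | A |
| ERCC-00148 | DQ883642 | Syn | 494 | 49 | 159,911 | D | B |
| ERCC-00150 | DQ883659 | Syn | 743 | 47 | 239,128 | E | A |
| ERCC-00154 (a) | DQ854997 | Syn | 537 | 50 | 173,317 | A | C |
| ERCC-00156 | DQ883643 | Syn | 494 | 49 | 159,199 | B | B |
| ERCC-00157 (a) | DQ839618 | Syn | 1019 | 50 | 328,635 | C | C |
| ERCC-00158 (a) | DQ516795 | Mjan | 1021 | 34 | 328,797 | D | A |
| ERCC-00160 (a) | DQ883658 | Syn | 743 | 46 | 239,437 | E | C |
| ERCC-00162 (a) | DQ516750 | Mjan | 523 | 36 | 166,409 | A | A |
| ERCC-00163 (a) | DQ668359 | Bsub | 543 | 47 | 174,949 | B | B |
| ERCC-00164 (a) | DQ516779 | Mjan | 1022 | 37 | 324,758 | C | A |
| ERCC-00165 | DQ668363 | Bsub | 872 | 50 | 279,788 | D | C |
| ERCC-00168 (a) | DQ516776 | Mjan | 1024 | 34 | 326,399 | E | A |
| ERCC-00170 (a) | DQ516773 | Mjan | 1024 | 34 | 330,808 | A | B |
| ERCC-00171 | DQ854994 | Syn | 505 | 48 | 163,022 | B | B |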

(a) Sequence mismatches between the GenBank entries and the resequenced RNAs (see [4]).

(b) Syn: De novo synthetic design, Mjan: Methanocaldococcus jannaschii, Bsub: Bacillus subtilis.

(c) Length and GC content include poly(A) sequence.

(d) Reversed (anti-sense) in Pools 12-15.

(e) ERCC-00084 is not detected. ERCC-00084 and ERCC-00085 may both have been prepared from the ERCC-00085 plasmid. ERCC-00085 behaves as Subpool C in some batches of Pools 78A and 78B.

(f) Corrected Pool membership to D and corrected Pool concentrations accordingly.

(g) Poor performing.

(h) Consistently under-reports abundance.

(i) Consistently over-reports abundance in Pools 78A and 78B.

(j) Particularly unsuitable for polyA+ isolation.

(k) ERCC-00104 has a length of either 2202 nt or 2203 nt.

(l) Not present in current commercial collections.
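When re-evaluating a dataset, the footnote flags above can be encoded as a lookup so that affected controls are excluded consistently. The following is a minimal sketch: the names `FLAGS` and `filter_controls` are hypothetical, flag (a) is omitted because it applies to most controls, and which flags to exclude for a given assay is a judgment left to the analyst rather than a recommendation made here.

```python
# Footnote flags (d) through (l) from Table 1, keyed by control ID.
# Flag (a), sequence mismatches, is deliberately left out of this sketch.
FLAGS = {
    "ERCC-00007": "l", "ERCC-00009": "d", "ERCC-00014": "d",
    "ERCC-00018": "l", "ERCC-00023": "l", "ERCC-00057": "d",
    "ERCC-00059": "d", "ERCC-00073": "g", "ERCC-00084": "e",
    "ERCC-00085": "e", "ERCC-00096": "i", "ERCC-00099": "d",
    "ERCC-00104": "k", "ERCC-00108": "d", "ERCC-00113": "f",
    "ERCC-00116": "dj", "ERCC-00128": "l", "ERCC-00144": "h",
}

def filter_controls(control_ids, exclude_flags):
    """Return the controls that carry none of the flags in `exclude_flags`.

    `exclude_flags` is any iterable of single-letter flags, e.g. "dg".
    Input order is preserved.
    """
    exclude = set(exclude_flags)
    return [c for c in control_ids if not (set(FLAGS.get(c, "")) & exclude)]

# Example: drop anti-sense (d) and poor-performing (g) controls.
candidates = ["ERCC-00002", "ERCC-00009", "ERCC-00073", "ERCC-00116"]
kept = filter_controls(candidates, "dg")
print(kept)  # ['ERCC-00002']
```

For instance, a hybridization-based analysis might pass "d" to drop the anti-sense controls, whereas a polyA+ RNA-Seq analysis might pass "j".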

Supplementary Material

Attachment

Supplemental file 1.

Abbreviations

ERCC - External RNA Controls Consortium, NIST - National Institute of Standards and Technology, SRM - Standard Reference Material, ENCODE - Encyclopedia of DNA Elements.

Acknowledgements

The authors would like to acknowledge the careful experimental work by Sarah Helber to prepare the RNA and complex mixtures required for the test pools. This work was supported in part by the Intramural Research Program of the National Institutes of Health, NIDDK.

Disclaimer

Certain commercial equipment, instruments, or materials are identified in this paper in order to specify the experimental procedure adequately. Such identification is not intended to imply recommendation or endorsement by the National Institute of Standards and Technology (NIST), nor is it intended to imply that the materials or equipment identified are necessarily the best available for the purpose.

Competing Interests

The authors declare no competing interests.

References

1. Baker SC, Bauer SR, Beyer RP, Brenton JD, Bromley B, Burrill J, Causton H, Conley MP, Elespuru R, Fero M. et al. The External RNA Controls Consortium: a progress report. Nature methods. 2005;2(10):731-734

2. ERCC. Proposed methods for testing and selecting the ERCC external RNA controls. BMC genomics. 2005;6:150

3. Munro SA, Lund SP, Pine PS, Binder H, Clevert DA, Conesa A, Dopazo J, Fasold M, Hochreiter S, Hong H. et al. Assessing technical performance in differential gene expression experiments with external spike-in RNA control ratio mixtures. Nature communications. 2014;5:5125

4. Jiang L, Schlesinger F, Davis CA, Zhang Y, Li R, Salit M, Gingeras TR, Oliver B. Synthetic spike-in standards for RNA-seq experiments. Genome research. 2011;21(9):1543-1551

5. https://genome.ucsc.edu/ENCODE/protocols/dataStandards/

6. Pine PS, Munro SA, Parsons JR, McDaniel J, Lucas AB, Lozach J, Myers TG, Su Q, Jacobs-Helber SM, Salit M. Evaluation of the External RNA Controls Consortium (ERCC) reference material using a modified Latin square design. BMC Biotechnol. 2016;16(1):54

7. Malone JH, Oliver B. Microarrays, deep sequencing and the true measure of the transcriptome. BMC biology. 2011;9:34

8. Standard reference material 2374; DNA sequence library for external RNA controls. https://www-s.nist.gov/srmors/certificates/view_certGIF.cfm?certificate=2374

9. Qing T, Yu Y, Du T, Shi L. mRNA enrichment protocols determine the quantification characteristics of external RNA spike-in controls in RNA-Seq studies. Science China Life sciences. 2013;56(2):134-142

Author contact

Corresponding author: Hangnoh Lee, Ph.D. Email: hangnoh.leegov Phone: 301-594-1716; Fax: 301-496-5239.


Published 2016-7-26