Sci. Aging Knowl. Environ., 20 March 2002
Vol. 2002, Issue 11, p. pe4
[DOI: 10.1126/sageke.2002.11.pe4]


Deciphering the Gene Expression Profile of Long-Lived Snell Mice

Kevin G. Becker

The author is at the DNA Array Unit, National Institute on Aging, National Institutes of Health, Baltimore, MD 21224, USA.

Key Words: DNA array • microarray • gene expression profile

"Once upon a time, there was a prince who wanted to marry a princess, but only a real princess would do. There were plenty of princesses, but he could never be certain that they were really, truly princesses... Now the Queen knew that indeed this princess was a true princess, because only a true princess could feel a pea through twenty mattresses." (n = 20). --From The Princess and the Pea, Hans Christian Andersen, 1836.

Introduction

How does one know what results are "real" or "not real" in large-scale gene expression analysis (also known as DNA microarray analysis)? This question is a matter of great interest and debate among many research groups studying complex biological processes ranging from yeast biology to cancer genetics to neuroscience to aging. All are struggling with the analysis and interpretation of large-scale gene expression studies. In a recent paper in the Journal of Gerontology (1), Dozmorov et al. report the results of such DNA array experiments performed with the long-lived Snell dwarf mouse. In this Perspective, I argue that although the authors have developed a solid analytical approach, further refinements are needed to produce robust conclusions.

The Snell dwarf mouse (genotype dw/dw) is an interesting model for the study of longevity and age-related physiological changes (2) (see Genetically Altered Mice entry tg13). Like many other complex biological problems, the challenge of characterizing this animal model can be approached with gene expression profiling. Dozmorov et al. identify transcriptional changes in the livers of Snell dwarf mice that they hope will give insight into the origins of the longevity and age-related effects of the dw/dw phenotype.

There is controversy in the field about the proper ways to generate and analyze microarray data, with the standard having been set by elegant work in yeast model systems (3). Many analytical approaches have been proposed based on yeast data sets. What might work well in simple systems, however, might or might not always work in complex multicellular systems.

There are multiple steps at which one could go awry along the path to interpretation of array results. These steps include experimental design, selection of high-quality arrays, isolation of high-quality messenger RNA (mRNA), fluorescent or radioactive labeling of the cDNA produced from the mRNA, development of hybridization parameters, data acquisition and normalization, statistical analysis, and validation of array data. As with any experiment, a number of methods can produce quality data and thus relevant results. Likewise, there are multiple ways to assess data quality. Proper experimental design (4), assay replication (5), appropriate statistical analysis (6), systematic parallel validation such as tissue microarrays (7), sharing of data (8), and replication by independent groups, as well as current realistic expectations of what microarrays can and cannot do, all may contribute to believable, reportable results.

Fig. 1. DNA array results--authentic or artifactual?

Use of Rigorous Statistical Analysis in Deciphering DNA Arrays

In the Snell dwarf paper, the authors emphasize the need for increasingly rigorous statistical analysis of replicates--an approach that was lacking in a number of older publications of microarray results. They promote statistical analysis as the dominant strategy for assessment of their microarray results and wisely use two different approaches to normalize data, as well as four increasingly conservative approaches to statistical assessment. The key to a robust statistical analysis of microarray data is the use of multiple statistical approaches.
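The text does not specify which two normalization approaches the authors applied, so the following is an illustration only: two strategies common in array work are global median scaling (rescale each array so its median intensity matches a reference) and quantile normalization (force all arrays to share one intensity distribution). A minimal sketch with hypothetical intensity values:

```python
def median(values):
    """Median of a list of numbers."""
    s = sorted(values)
    n = len(s)
    mid = n // 2
    return s[mid] if n % 2 else (s[mid - 1] + s[mid]) / 2

def median_scale(array, reference_median=1.0):
    """Global normalization: rescale so the array's median equals a reference."""
    m = median(array)
    return [x * reference_median / m for x in array]

def quantile_normalize(arrays):
    """Quantile normalization: every array is forced onto the same
    intensity distribution (the mean intensity at each rank)."""
    n = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    rank_means = [sum(a[i] for a in sorted_arrays) / len(arrays)
                  for i in range(n)]
    result = []
    for a in arrays:
        ranks = sorted(range(n), key=lambda i: a[i])
        out = [0.0] * n
        for rank, idx in enumerate(ranks):
            out[idx] = rank_means[rank]
        result.append(out)
    return result
```

Because the two methods make different assumptions (median scaling preserves each array's distribution shape; quantile normalization discards it), comparing results under both is one way to check that a finding is not an artifact of the normalization step.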

The use of statistics is important, but it is unclear whether ever more rigorous statistical thresholds improve the analysis of array data, especially with small sample sizes (in Dozmorov et al., n = 4 animals per group). In microarray studies, it is hard to know whether P < 0.00001 yields more believable expression data than P < 0.001. Tightening statistical thresholds by an order of magnitude, as the authors did, might not produce results as reliable as validating the array data with Western blots or quantitative polymerase chain reaction (PCR) experiments on the same samples. Rather than performing more statistics, depositing all of the raw data on a Web site so that independent groups can freely reanalyze and rework the experiments might be of greater value, because the ultimate conclusions would then be less dependent on any one analysis technique. In short, does following Bonferroni's advice to correct for multiple comparisons (9, 10) really allow investigators to sleep a little easier?
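The Bonferroni correction at issue here simply divides the significance threshold alpha by the number of simultaneous tests m (or, equivalently, multiplies each raw P value by m), capping the chance of any false positive at the cost of statistical power. A minimal sketch with made-up P values:

```python
def bonferroni(p_values, alpha=0.05):
    """Return which hypotheses survive a Bonferroni-corrected threshold.

    Each raw P value is compared against alpha / m, where m is the number
    of simultaneous tests; this controls the family-wise error rate.
    """
    m = len(p_values)
    threshold = alpha / m
    return [p < threshold for p in p_values]

# Made-up raw P values for four genes.
raw = [0.00001, 0.02, 0.04, 0.3]
print(bonferroni(raw))  # threshold = 0.05/4 = 0.0125; only the first survives
```

With thousands of genes rather than four, the corrected threshold becomes extremely stringent, which is exactly the trade-off the text questions: fewer false positives, but potentially many real changes discarded.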

Statistical methods are not themselves free of assumptions. Controversies also plague the statistical assessment of genetic linkage studies that use whole-genome scans (11). In these scans, polymorphic markers spaced evenly across the genome are genotyped in collections of patient DNA, quite often in an attempt to make sense of complex human disorders. Many statistical assessments of whole-genome scans rest on differing assumptions and genetic models; not surprisingly, these studies are prone to false positives and poor reproducibility. They are essentially preliminary screens, none of which by itself identifies a gene, in part because statistical models of complex human disease are imperfect and simplistic. The proof of the pudding does not come until investigators roll up their sleeves, go into the lab, and ultimately clone the gene (12). Likewise, in microarray studies, placing too much weight on simple statistical assumptions in place of other validation approaches may be inappropriate.

Statistical False Positives Versus Molecular False Positives

Two classes of approaches can identify false positive results in array analysis: statistical and molecular. Dozmorov et al. use a statistical approach. They adhere to the commonly held view that the number of false positives in a microarray experiment grows with the number of genes tested on the array and that these false positives occur at random (so-called type I errors). They state that type I errors "are particularly likely to plague studies in which hundreds or thousands of genes are examined simultaneously..." In this paper, where 2352 genes are represented on the array, ~118 genes would therefore be expected to score positive by random chance at a P = 0.05 level of statistical confidence.
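That expectation is simple arithmetic: under the null hypothesis, each gene independently has a 5% chance of crossing the P = 0.05 threshold, so the expected count of chance positives is the number of genes times alpha:

```python
# Expected number of genes scoring positive by chance alone at P = 0.05.
n_genes, alpha = 2352, 0.05
expected = n_genes * alpha                     # 117.6, i.e., ~118 genes

# Probability that at least one gene on the array scores positive by chance,
# assuming independent tests: effectively certain for an array this large.
p_at_least_one = 1 - (1 - alpha) ** n_genes

print(round(expected), p_at_least_one)
```

This is the calculation behind the ~118 figure in the text; note that it predicts only the number of chance positives, not which genes they are.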

This notion is a statistical estimate of false positives based on a single parameter: the number of genes tested. It is a useful approach but can often be misleading, because it is not a direct measurement of the false positive signals actually reported by the array. The statistical estimate of type I errors does not take into account molecular explanations for signals that are ultimately determined to be inappropriately positive. By far the most common molecular explanation for incorrect biological assignment of array-based gene expression results is cross-hybridization within gene families (13), which generally occurs because standard wash conditions are applied to the entire set of cDNA or oligonucleotide probes.

Fig. 2 lists molecular explanations for "incorrect" positive and negative signals on an array that might be reported as relevant biological results. These incorrect assignments generally result from problems with array design (incorrect sequence identification), target quality (DNA contamination of RNA), target labeling (poor labeling with the fluorescent dye Cy5 versus Cy3), or hybridization parameters (imperfect wash conditions for certain probes). Such difficulties, which quite often lead to experimental variation that confounds array analysis, are unrelated to the number of oligonucleotides or cDNA elements on the array.

Fig. 2. Common reasons for false positive and false negative results encountered in microarray expression analysis.

Judging the false positive rate simply on the size of the array might be a useful benchmark in the analysis of microarrays. In some cases, however, it could lead to a false sense of security if a result happens to sneak in under a given statistical threshold. As researchers in all fields are painfully aware, false positive and false negative artifacts often reproduce quite well. Independent assay validation has historically been the most powerful approach to dealing with inherent errors in a primary screening assay. Although validation assays have their own types of error, individual and systematic errors from the primary screen are generally weeded out in the validation step.

Validation of Microarray Results

Dozmorov et al. used reverse transcription-PCR (RT-PCR) assays to validate the conclusion that 11 genes of interest were differentially expressed. They selected these 11 genes from an initial pool of 60 differentially expressed genes identified in their microarray analysis [see Table 2 in (1)]. Differential expression of 7 of the 11 genes (63.6%) was confirmed in the RT-PCR experiments. Even among the seven genes scored as confirmed, there were considerable quantitative differences between the dwarf/normal mouse expression ratios from the microarrays and those from the RT-PCR assays [for example, 0.23 (array) versus 0.02 (RT-PCR) for a phosphodiesterase gene; 0.04 (array) versus 0.00 (RT-PCR) for a hydroxysteroid dehydrogenase; and 0.09 (array) versus 0.01 (RT-PCR) for insulin-like growth factor IA]. These findings do not provide compelling evidence that increasingly rigorous statistical selection alone leads to "real" or reliable microarray results.
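The direction-versus-degree disagreement in those ratio pairs is easy to quantify. A small sketch using only the three ratio pairs quoted above:

```python
# Array vs. RT-PCR dwarf/normal expression ratios, as quoted in the text.
pairs = {
    "phosphodiesterase": (0.23, 0.02),
    "hydroxysteroid dehydrogenase": (0.04, 0.00),
    "insulin-like growth factor IA": (0.09, 0.01),
}

for gene, (array_ratio, pcr_ratio) in pairs.items():
    # Both assays agree on direction: a ratio below 1 means lower
    # expression in dwarf mice.
    same_direction = (array_ratio < 1.0) and (pcr_ratio < 1.0)
    # But the measured degree of change can differ by roughly an order
    # of magnitude between the two platforms.
    discrepancy = array_ratio / pcr_ratio if pcr_ratio > 0 else float("inf")
    print(f"{gene}: same direction={same_direction}, "
          f"array/PCR gap={discrepancy:.1f}x")
```

The qualitative calls agree, but the quantitative gaps (roughly 9- to 12-fold where both values are nonzero) underline why the validation step, not the statistical threshold, is doing the real work here.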

A Balanced Approach

Under ideal conditions, the analysis, organization, and presentation of microarray data require a balanced, multilayered approach, because the determination of whether certain data are "real" or reliable is a complex issue. Few researchers would dispute that statistical assessment of microarray data is crucial and has a central role in microarray analysis; indeed, rigorous statistical analysis is not done often enough in these studies. However, in these early days of microarray experiments, analysis should not rely simply on statistical assessment but also on other equally important experimental practices, such as replicating individual RNA samples; increasing the number of individual biological samples; and, in particular, emphasizing systematic validation of individual results. In addition, complete microarray data sets (not just summary data) should be made publicly available for independent analysis, which Web-based publishing now makes easy. It is not always possible to meet all of these objectives, often because of cost or limitations on RNA availability. Efforts toward such goals, however, are well worthwhile: they are likely to be rewarded by robust conclusions that hold up to experimental scrutiny and the test of time. And maybe, with those refinements to microarray analysis, we will all sleep a little better.


  1. I. Dozmorov, A. Galecki, Y. Chang, R. Krzesicki, M. Vergara, R. A. Miller, Gene expression profile of long-lived Snell dwarf mice. J. Gerontol. A Biol. Sci. Med. Sci. 57, B99-B108 (2002).[Abstract/Free Full Text]
  2. K. Flurkey, J. Papaconstantinou, R. A. Miller, D. E. Harrison, Lifespan extension and delayed immune and collagen aging in mutant mice with defects in growth hormone production. Proc. Natl. Acad. Sci. U.S.A. 98, 6736-6741 (2001).[Abstract/Free Full Text]
  3. J. L. DeRisi, V. R. Iyer, P. O. Brown, Exploring the metabolic and genetic control of gene expression on a genomic scale. Science 278, 680-686 (1997).[Abstract/Free Full Text]
  4. S. Draghici, A. Kuklin, B. Hoff, S. Shams, Experimental design, analysis of variance and slide quality assessment in gene expression arrays. Curr. Opin. Drug Discov. Dev. 4, 332-337 (2001).[Medline]
  5. M. L. Lee, F. C. Kuo, G. A. Whitmore, J. Sklar, Importance of replication in microarray gene expression studies: statistical methods and evidence from repetitive cDNA hybridizations. Proc. Natl. Acad. Sci. U.S.A. 97, 9834-9839 (2000).[Abstract/Free Full Text]
  6. V. G. Tusher, R. Tibshirani, G. Chu, Significance analysis of microarrays applied to the ionizing radiation response. Proc. Natl. Acad. Sci. U.S.A. 98, 5116-5121 (2001).[Abstract/Free Full Text]
  7. S. Mousses, A. Kallioniemi, P. Kauraniemi, A. Elkahloun, O. P. Kallioniemi, Clinical and functional target validation using tissue and cell microarrays. Curr. Opin. Chem. Biol. 6, 97-101 (2002).[CrossRef][Medline]
  8. K. G. Becker, The sharing of cDNA microarray data. Nature Rev. Neurosci. 2, 438-440 (2001).[CrossRef][Medline]
  9. D. Curran-Everett, Multiple comparisons: philosophies and illustrations. Am. J. Physiol. Regul. Integr. Comp. Physiol. 279, R1-8 (2000).
  10. T. V. Perneger, What's wrong with Bonferroni adjustments. Br. Med. J. 316, 1236-1238 (1998).[Free Full Text]
  11. P. Visscher, C. Haley, True and false positive peaks in genomewide scans: The long and the short of it. Genet. Epidemiol. 20, 409-414 (2001).[CrossRef][Medline]
  12. J. P. Hugot, M. Chamaillard, H. Zouali, S. Lesage, J. P. Cezard, J. Belaiche, S. Almer, C. Tysk, C. A. O'Morain, M. Gassull, V. Binder, Y. Finkel, A. Cortot, R. Modigliani, P. Laurent-Puig, C. Gower-Rousseau, J. Macry, J. F. Colombel, M. Sahbatou, G. Thomas, Association of NOD2 leucine-rich repeat variants with susceptibility to Crohn's disease. Nature 411, 599-603 (2001).[CrossRef][Medline]
  13. E. M. Evertsz, J. Au-Young, M. V. Ruvolo, A. C. Lim, M. A. Reynolds, Hybridization cross-reactivity within homologous gene families on glass cDNA microarrays. Biotechniques 31, 1182 (2001).

Science of Aging Knowledge Environment. ISSN 1539-6150