Blend variances to achieve black

b Data is not available in GEO and downloaded from author’s website.


Mixture-model based estimation of gene expression variance from public database improves identification of differentially expressed genes in small sized microarray data

† The authors wish it to be known that, in their opinion, the first two authors should be regarded as joint First authors.

Associate Editor: John Quackenbush
Received 2009 Jun 10; Revised 2009 Dec 3; Accepted 2009 Dec 9.
Copyright © The Author(s) 2009. Published by Oxford University Press.

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.5), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Motivation: The small number of samples in many microarray experiments is a challenge for the correct identification of differentially expressed genes (DEGs) by conventional statistical means. Information from public microarray databases can help identify DEGs more efficiently. To model the various experimental conditions of a public microarray database, we applied a Gaussian mixture model and extracted bi- or tri-modal distributions of gene expression. The prior variance of Baldi’s Bayesian framework was then estimated for the analysis of small sample-sized datasets.

Results: First, we estimated the prior variance of a gene’s expression by pooling variances obtained from mixture modeling of large samples in the public microarray database. Then, using the prior variance, we identified DEGs in small sample-sized test datasets with Baldi’s framework. For the benchmark study, we generated test datasets of several samples each from relatively large datasets. Our proposed method outperformed the other benchmark methods in detecting gold-standard DEGs from the test datasets. The results may serve as evidence supporting the use of public microarray databases in microarray data analysis.


1 INTRODUCTION

Differential expression analysis of large-scale microarray studies requires more than five replicates in each comparison group for stable results (Hwang et al., 2002; Pavlidis et al., 2003). Many microarray studies, however, are performed with fewer than five samples in each group because of cost limitations or the scarcity of biological source material. In the analysis of small sample-sized microarray data, it is difficult to correctly identify differentially expressed genes using standard group-comparison statistics: the gene-specific variance estimates that determine the statistical significance of observed changes in gene expression become unstable with a small number of replicates.

Many methods have been introduced to address this variance estimation problem. A popular approach has been some form of regularization of the t-test. In the significance analysis of microarrays (SAM) (Tusher et al., 2001), a small non-specific constant is added to all variance estimates so that they do not become too small. In Cyber-T (Baldi and Long, 2001), the variance of a gene is estimated by a posterior variance in a Bayesian framework, which combines a prior variance from neighboring genes with the data variance of the gene. Empirical Bayes methods compensate for the small number of replicates by combining information across arrays (Efron and Raftery, 2001; Kendziorski et al., 2003; Maureen et al., 2006). These Bayesian approaches try to improve the identification of differentially expressed genes by using information from other genes with similar expression.
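The two regularization ideas above can be written compactly. The notation is an assumption introduced here for illustration and is not taken from the article: \bar{x}_{i,1} and \bar{x}_{i,2} are the group means of gene i, s_i is its standard error (first formula) or its sample standard deviation within a group of n replicates (second formula), s_0 is the small added constant, \sigma_{0,i}^2 is the prior variance and \nu_0 is the weight given to the prior. In the usual presentation of SAM, the constant is added to the standard error rather than to the variance itself.

```latex
% SAM-style statistic: the constant s_0 keeps the denominator away from zero
d_i = \frac{\bar{x}_{i,1} - \bar{x}_{i,2}}{s_i + s_0}

% Cyber-T-style posterior variance: weighted blend of prior and data variance
\sigma_i^2 = \frac{\nu_0\,\sigma_{0,i}^2 + (n - 1)\,s_i^2}{\nu_0 + n - 2}
```

With few replicates (small n), the second expression is dominated by the prior term, which is why the quality of the prior variance matters most for small sample-sized data.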

On the other hand, a very different approach, the GEO-adjusted method, was suggested by Kim and Park (2004) to estimate the ‘natural’ variance of individual genes using a large number of previously performed experiments. This became possible with large public databases of microarray experiments such as the Gene Expression Omnibus (GEO) (Edgar et al., 2002) and ArrayExpress (Brazma et al., 2003). This approach has a natural strength over the Bayesian methods in that the gene-specific variance is estimated not from the expression of other genes but from prior expression values of the same gene.

However, the GEO-adjusted method estimates gene-specific variances from the GEO database alone, without using any information from the experimental data. Moreover, the variance estimate is not specific to the experimental dataset. Expression variance is not only gene-specific but also condition-specific. While one may want an estimate of the gene-specific variance under a condition comparable to that of the experimental dataset, direct computation over the whole GEO database returns the global variance rather than the variance within the desired condition.
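The gap between global and within-condition variance can be seen in a toy calculation. The expression values below are made up for illustration and are not from GEO; they stand for one gene measured under two hypothetical conditions.

```python
from statistics import variance

# Hypothetical log-expression values of a single gene under two conditions.
condition_a = [5.0, 5.2, 4.8, 5.1, 4.9]
condition_b = [9.0, 9.1, 8.9, 9.2, 8.8]

# Variance computed over the pooled values, ignoring the condition labels.
global_var = variance(condition_a + condition_b)

# Variance computed separately within each condition.
within_a = variance(condition_a)
within_b = variance(condition_b)

print(f"global: {global_var:.3f}, within A: {within_a:.3f}, within B: {within_b:.3f}")
```

The global variance is inflated by the difference between the condition means, so it overstates the variability expected among replicates of a single condition.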

Because the GEO database is an aggregate of many experiments across many different conditions, we cannot assume that a gene has a single distribution across the whole database. As demonstrated in Figure 1, the distribution of a gene's expression in the GEO database may be composed of multiple distributions. Therefore, it makes more sense to assume that the expression of a gene has a multi-distributional structure in the GEO database rather than a single-distribution structure.

Figure 1. Examples of distributions of GEO-wide gene expression. Expression density plots were obtained from ∼1400 microarrays present in the GEO database. While the probe 1799_at seems to show a uni-modal distribution, the probes 1737_s_at and 195_s_at seem to show bi- and tri-modal distributions, respectively. These density plots were generated with the R density function, without applying the Gaussian mixture model. It may not be sensible to estimate gene-specific variance under the assumption that a gene has a single expression distribution across the GEO database. Using the Gaussian mixture model, we decomposed the distributions of 1737_s_at and 195_s_at into two and three Gaussian distributions, respectively. In the Affymetrix U95A platform, 6173 (48.9%) and 4384 (34.7%) of the 12 625 probes are modeled by the Gaussian mixture model as having bi- and tri-modal distributions, respectively.
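The decomposition described in the caption can be sketched with a small one-dimensional expectation-maximization (EM) fit. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the quantile-based initialization, the fixed iteration count and the use of BIC to choose between one, two and three components are all choices made here, and the data are synthetic rather than taken from GEO.

```python
import math
import random

def log_likelihood(data, weights, means, variances):
    """Log-likelihood of the data under a 1-D Gaussian mixture."""
    total = 0.0
    for x in data:
        p = sum(w * math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)
                for w, m, v in zip(weights, means, variances))
        total += math.log(max(p, 1e-300))
    return total

def fit_gmm_1d(data, k, n_iter=200, min_var=1e-6):
    """Fit a k-component 1-D Gaussian mixture by EM (hypothetical helper)."""
    n = len(data)
    ordered = sorted(data)
    # Assumption: start the means at evenly spaced quantiles of the data.
    means = [ordered[int((i + 0.5) * n / k)] for i in range(k)]
    grand_mean = sum(data) / n
    start_var = max(sum((x - grand_mean) ** 2 for x in data) / n, min_var)
    variances = [start_var] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: responsibility of each component for each value.
        resp = []
        for x in data:
            dens = [w * math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)
                    for w, m, v in zip(weights, means, variances)]
            total = max(sum(dens), 1e-300)
            resp.append([d / total for d in dens])
        # M-step: re-estimate weight, mean and variance of each component.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-300)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            variances[j] = max(
                sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj,
                min_var)
    return weights, means, variances

def bic(data, k):
    """Bayesian information criterion of the fitted k-component mixture."""
    weights, means, variances = fit_gmm_1d(data, k)
    n_params = 3 * k - 1  # k means, k variances, k - 1 free weights
    return (-2.0 * log_likelihood(data, weights, means, variances)
            + n_params * math.log(len(data)))

# Synthetic bimodal "expression" values standing in for one probe.
random.seed(0)
values = ([random.gauss(4.0, 0.3) for _ in range(200)]
          + [random.gauss(9.0, 0.3) for _ in range(200)])
weights, means, variances = fit_gmm_1d(values, 2)
best_k = min((1, 2, 3), key=lambda k: bic(values, k))
print(best_k, [round(m, 2) for m in sorted(means)])
```

The per-component variances returned by the fit are the quantities of interest here: each is much smaller than the variance of the pooled values, which mixes the spread within a mode with the distance between modes.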

In the present study, we performed a comparative study of methods for estimating gene-specific, condition-adjusted variances of gene expression for two-group comparisons in microarray data with fewer than five samples in each group.

We propose the GEO-MixtureBayesian method (GMixBayes for short), which estimates prior variances from the GEO database using a Gaussian mixture model and then integrates the priors with the data variances from the experimental data into posterior variances. The Gaussian mixture model improves the prior variance estimation from the GEO database by exploiting its multi-distributional structure. The Bayesian framework improves on the previous GEO-adjusted method (Kim and Park, 2004) by using both reference and experiment information. We found that GMixBayes outperforms the regularized t-test and the GEO-adjusted method in the identification of differentially expressed genes (DEGs) in gene expression microarray studies.
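The two steps, pooling mixture-component variances into a prior and blending that prior with the data variance, might look roughly like the sketch below. Several things here are assumptions rather than details given in this section: the degrees-of-freedom weighting used for pooling, the Cyber-T-style posterior formula, the prior weight `nu0`, the function names and all numeric values.

```python
import math
from statistics import mean, variance

def pooled_prior_variance(component_vars, component_sizes):
    """Pool per-component variances, weighting each by (size - 1).

    Assumption: standard pooled-variance weighting; the pooling rule
    actually used by the authors is not stated in this section.
    """
    num = sum((n - 1) * v for v, n in zip(component_vars, component_sizes))
    den = sum(n - 1 for n in component_sizes)
    return num / den

def regularized_t(x, y, prior_var, nu0):
    """Two-group t-like statistic using posterior variances.

    Requires at least two values per group and nu0 + n > 2.
    """
    def posterior_variance(sample):
        n = len(sample)
        return (nu0 * prior_var + (n - 1) * variance(sample)) / (nu0 + n - 2)

    se = math.sqrt(posterior_variance(x) / len(x) + posterior_variance(y) / len(y))
    return (mean(x) - mean(y)) / se

# Hypothetical mixture fit for one probe: two components with these
# variances and (rounded) numbers of arrays assigned to each.
prior = pooled_prior_variance([0.04, 0.09], [101, 51])

# Hypothetical small experiment: two replicates per group.
control = [5.0, 5.4]
treated = [6.0, 6.2]
t_value = regularized_t(control, treated, prior_var=prior, nu0=10)
print(round(prior, 4), round(t_value, 2))
```

With only two replicates per group, the sample variances alone would be very unstable; the prior term keeps the denominator close to the variance typical of that gene within a single condition.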


000 Line – Black Buna Blend Squeegee

A squeegee made with an ideal blend of performance and economy. This floor squeegee features a standard-duty frame and is recommended for general maintenance. A great bid item for the price-conscious. Made in the USA.

  • 16 Gauge plated steel frame
  • 7/32” x 2” black Buna blend rubber
  • All steel frame with rounded corners for protection and safety in shipping
  • Straight frames are equipped with a built in scraper
  • Socket accepts a standard 1 1/8” tapered handle
  • T-66, 66, and PCT-97 sockets are also available
  • The refill blade is the 200 Line

Lengths Available:

014 ─ 14 Inch Floor Squeegee
018 ─ 18 Inch Floor Squeegee
024 ─ 24 Inch Floor Squeegee
030 ─ 30 Inch Floor Squeegee
036 ─ 36 Inch Floor Squeegee
Dimension: .250″x2″ (000 Line .219″x2″)
Durometer: 55-60
Tensile Strength: Fair
Elongation: Good
Compression Set: Fair to Good
Heat Resistance: Poor
Resilience or Rebound: Good
Abrasion Resistance: Fair to Good
Tear Resistance: Fair
Flame Resistance: Good
Impermeability, Gas: Fair
Weathering Resistance: Poor
Low Temp Limit: -10°F to -30°F
High Temp Limit: 250°F
Colin Wynn