Difference in Goodness of fit when applying limma to one-colour vs. two-colour arrays
@andreasschuettler-8486
European Union

Hello,

I have a question about the outcome of the limma workflow when comparing one-colour studies with two-colour common-reference studies. I noticed that R-squared seems to be much higher on average for the models fitted to one-colour studies than for those fitted to two-colour studies, and I do not have an explanation for this.

As an example, I took two datasets from the limma User's Guide and followed the proposed workflow. I calculated R-squared as proposed in this thread: "limma eBayes: how to determine goodness of fit?"

1. One-colour:

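##### Load data and libraries #####
# biocLite() has been retired; BiocManager is the current Bioconductor installer
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("ecoliLeucine")
library(ecoliLeucine)
library(limma)
library(affy)
Data <- ecoliLeucine

##### limma workflow #####
eset <- rma(Data)   # background-correct, normalize and summarize to log2 expression
strain <- c("lrp-","lrp-","lrp-","lrp-","lrp+","lrp+","lrp+","lrp+")
design <- model.matrix(~factor(strain))
colnames(design) <- c("lrp-","lrp+vs-")
fit <- lmFit(eset, design)
fit <- eBayes(fit)
tabletop_0 <- topTable(fit, coef = 2, number = 40, adjust.method = "BH")

##### Goodness of fit #####
sst <- rowSums(exprs(eset)^2)                # total sum of squares about zero (uncentred)
ssr <- sst - fit$df.residual * fit$sigma^2   # sst minus the residual sum of squares
rsq <- ssr / sst
summary(rsq)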

2. Two-colour:

##### Load data #####
library(limma)
load("../Apoa1.RData")   # downloaded from http://bioinf.wehi.edu.au/limma

##### limma workflow #####
MA <- normalizeWithinArrays(RG)
design <- cbind("Control-Ref" = 1, "KO-Control" = MA$targets$Cy5 == "ApoAI-/-")
fit <- lmFit(MA, design)
fit <- eBayes(fit)
tabletop_1 <- topTable(fit, coef = 2, number = 15, genelist = fit$genes$NAME)

##### Goodness of fit #####
sst <- rowSums(MA$M^2)                       # total sum of squares about zero (uncentred)
ssr <- sst - fit$df.residual * fit$sigma^2   # sst minus the residual sum of squares
rsq <- ssr / sst
summary(rsq)

So is there anything wrong with my calculation of R-squared (or anything else)? Do the two-colour arrays really have a worse fit? Or am I missing something important?

I appreciate any comments and help...

Best

Andreas

@gordon-smyth
WEHI, Melbourne, Australia

You are getting much higher Rsq for the single-channel platform because you are not computing Rsq correctly; in particular, the expression for sst is not correct. The calculation you are using gives values that are too large for both platforms: slightly too large for the two-colour platform and very much too large for the single-channel platform.

I won't give you corrected formulas, because Rsq doesn't seem very useful to me. It just computes correlation between the predictor and the log-expression values, i.e., evaluates differential expression, and the topTable results from limma are a better way to achieve the same aim. You could even transform the moderated t-statistics to correlations if you wanted (but why would you?).
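For readers curious about the transformation mentioned above, here is a minimal sketch that is not part of the original answer. For an ordinary t-statistic with `df` residual degrees of freedom, the corresponding correlation is r = t / sqrt(t^2 + df). The helper name `t_to_r` is hypothetical, and the check uses simulated data and base R only.

```r
# Hypothetical helper: convert a t-statistic to a correlation-scale value.
t_to_r <- function(t, df) {
  t / sqrt(t^2 + df)
}

# Check against base R on simulated two-group data (no limma needed).
set.seed(1)
group <- rep(c(0, 1), each = 4)
y <- 8 + 0.5 * group + rnorm(8, sd = 0.3)

fit_lm <- lm(y ~ group)
t_stat <- coef(summary(fit_lm))["group", "t value"]

r_from_t <- t_to_r(t_stat, df = fit_lm$df.residual)
r_direct <- cor(y, group)

# The two values should agree up to floating-point error.
all.equal(r_from_t, r_direct)
```

With a limma fit, the analogous call would be something like `t_to_r(fit$t[, 2], fit$df.total)`. Using the moderated degrees of freedom in this formula is an assumption made here for illustration, and the result is no longer an ordinary sample correlation.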

If you want to compare the precision of the different microarray platforms, see this article for a careful platform comparison:

 http://www.ncbi.nlm.nih.gov/pubmed/17118209

See the following article for a theoretical discussion of when a two colour common reference experiment will outperform a single channel version of the same platform:

 http://www.biomedcentral.com/1471-2105/14/165

But that doesn't mean that an early home-made two-colour array (as used for the ApoA1 case study) will perform better than a commercial single-channel array with a completely different chemistry (as used for the E. coli case study).
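To illustrate the point about sst made at the start of this answer: the conventional R-squared for a model with an intercept measures the total sum of squares about each row's mean, whereas `rowSums(y^2)` measures it about zero. The sketch below is not from the original answer. It uses simulated data and base R only, and the names `y`, `rss`, `rsq_uncentred` and `rsq_centred` are illustrative. In a limma fit, `fit$df.residual * fit$sigma^2` plays the role of `rss`.

```r
set.seed(42)
n_genes <- 200
group <- rep(c(0, 1), each = 4)
design <- cbind(Intercept = 1, Group = group)

# Simulated log-expression around 8 (an assumed one-colour scale) with no
# true group effect, so a sensible R-squared should be small.
y <- matrix(rnorm(n_genes * 8, mean = 8, sd = 0.3), nrow = n_genes)

# Residual sum of squares per gene from least squares on the design matrix.
res <- qr.resid(qr(design), t(y))   # 8 x n_genes matrix of residuals
rss <- colSums(res^2)

# Uncentred total sum of squares, as in the question.
sst_uncentred <- rowSums(y^2)
rsq_uncentred <- (sst_uncentred - rss) / sst_uncentred

# Centred total sum of squares (about each gene's own mean).
sst_centred <- rowSums((y - rowMeans(y))^2)
rsq_centred <- (sst_centred - rss) / sst_centred

summary(rsq_uncentred)   # close to 1 because the mean level dominates
summary(rsq_centred)     # much smaller
```

Two-colour M-values are log-ratios whose row means tend to sit near zero, so the two versions differ far less there, which is consistent with the "slightly too large" versus "very much too large" remark above.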


Thanks a lot for this fast and clear answer!
 

@james-w-macdonald-5106
United States

I'm not sure there is a real take-home message here. There are any number of things that could conspire to make the one-color array data have larger R-squared values than the two-color data.

For example, the E. coli data will tend to be more similar to technical replicates than the mouse data. Even though mice are highly inbred, taking several aliquots from the same solution of E. coli and growing them in replicate flasks is not likely to impart much biological variability, so, all else being equal, I would expect lower intra-group variability for the E. coli data than for the mouse data.

I didn't try to track down the provenance of the ApoA1 data, but it is highly likely that those data were generated by different people in a different lab at a different time than the E. coli data. Any one of those differences could impart higher intra-group variability to the ApoA1 data, which you are interpreting as platform differences.
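The effect of intra-group variability on R-squared can be sketched with a small simulation. This is an illustration under stated assumptions, not an analysis of either dataset: the group difference, the two standard deviations and the helper name `mean_rsq` are all made up for the example.

```r
set.seed(7)
group <- rep(c(0, 1), each = 4)

# Hypothetical helper: mean R-squared over many simulated genes that share
# a fixed group difference and a given within-group standard deviation.
mean_rsq <- function(within_sd, effect = 1, n_genes = 500) {
  rsq <- replicate(n_genes, {
    y <- 8 + effect * group + rnorm(length(group), sd = within_sd)
    summary(lm(y ~ group))$r.squared
  })
  mean(rsq)
}

rsq_low_noise  <- mean_rsq(within_sd = 0.2)   # assumed: tightly replicated samples
rsq_high_noise <- mean_rsq(within_sd = 1.0)   # assumed: more variable samples

c(low_noise = rsq_low_noise, high_noise = rsq_high_noise)
```

The same group difference yields a lower average R-squared when the within-group spread is larger, so a gap in R-squared between two experiments does not by itself point to the platform.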

If you really wanted to see whether there is a difference between one- and two-color data, the MAQC has a big data set on GEO that you could play with (http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE5350), where they took pools of the same RNA and sent them to multiple labs for analysis, using multiple different one- and two-color arrays. That would be closer to an apples-to-apples comparison.
