I have a question about the results of the limma workflow when comparing one-colour studies with two-colour common-reference studies. I noticed that R-squared seems to be, on average, much higher for the gene-wise models in one-colour studies than in two-colour studies, and I do not have an explanation for this.
As an example, I took two datasets from the limma User's Guide and followed the workflow proposed there. I calculated R-squared as suggested in the thread "limma eBayes: how to determine goodness of fit?".
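For reference, the goodness-of-fit lines in both examples compute the following for each gene $g$. Here $y_{gj}$ is the log-expression value (one-colour) or M-value (two-colour) on array $j$, $d_g$ is `fit$df.residual` and $s_g$ is `fit$sigma`, so $d_g s_g^2$ is the residual sum of squares:

$$
\mathrm{SST}_g = \sum_{j=1}^{n} y_{gj}^{2}, \qquad
\mathrm{RSS}_g = d_g\, s_g^{2}, \qquad
R_g^{2} = \frac{\mathrm{SST}_g - d_g\, s_g^{2}}{\mathrm{SST}_g} = 1 - \frac{\mathrm{RSS}_g}{\sum_{j=1}^{n} y_{gj}^{2}}
$$

Note that $\mathrm{SST}_g$ is taken around zero, not around the gene mean $\bar{y}_g$, so this is the uncentred form of R-squared.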
1. One Colour:
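##### Load data and libraries #####
# biocLite() is defunct in current Bioconductor; BiocManager::install() replaces it
# (install BiocManager itself once with install.packages("BiocManager"))
BiocManager::install("ecoliLeucine")
library(ecoliLeucine)
library(limma)
library(affy)
data(ecoliLeucine)
Data <- ecoliLeucine

##### limma workflow #####
eset <- rma(Data)
strain <- c("lrp-","lrp-","lrp-","lrp-","lrp+","lrp+","lrp+","lrp+")
# set the levels explicitly so "lrp-" is the reference level regardless of locale sort order
design <- model.matrix(~factor(strain, levels = c("lrp-", "lrp+")))
colnames(design) <- c("lrp-", "lrp+vs-")
fit <- lmFit(eset, design)
fit <- eBayes(fit)
tabletop_0 <- topTable(fit, coef = 2, number = 40, adjust.method = "BH")

##### Goodness of fit #####
sst <- rowSums(exprs(eset)^2)
ssr <- sst - fit$df.residual * fit$sigma^2
rsq <- ssr / sst
summary(rsq)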
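As a hedged sanity check of what that formula measures, here is a base-R sketch that does not use limma. It fits one simulated gene (values invented on a log2-intensity scale around 8, roughly the magnitude of RMA output, not taken from ecoliLeucine) with the same intercept-plus-group design. Under these assumptions the post's formula equals the R-squared that `summary.lm()` reports when the model has no separate intercept term (the uncentred version), and it is larger than the usual centred R-squared.

```r
# Toy check of the goodness-of-fit formula on ONE simulated gene.
# Assumption: invented log2-scale values (~8), not ecoliLeucine data.
set.seed(1)
strain <- factor(rep(c("lrp-", "lrp+"), each = 4), levels = c("lrp-", "lrp+"))
X <- model.matrix(~strain)               # intercept + group effect, as in the post
y <- 8 + 0.5 * (strain == "lrp+") + rnorm(8, sd = 0.3)

fit1 <- lm(y ~ 0 + X)                     # same columns as X, no extra intercept term
d <- fit1$df.residual                     # plays the role of fit$df.residual
s <- summary(fit1)$sigma                  # plays the role of fit$sigma

# Formula from the post
sst <- sum(y^2)
ssr <- sst - d * s^2
rsq_post <- ssr / sst

# summary.lm() is uncentred without an intercept term, centred with one
rsq_uncentred <- summary(fit1)$r.squared
rsq_centred <- summary(lm(y ~ strain))$r.squared

print(c(post = rsq_post, uncentred = rsq_uncentred, centred = rsq_centred))
```

Both fits span the same column space, so they share the residual sum of squares; only the reference point of the total sum of squares differs.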
2. Two Colour:
##### Load data (Apoa1.RData downloaded from http://bioinf.wehi.edu.au/limma) #####
load("../Apoa1.RData")
MA <- normalizeWithinArrays(RG)
design <- cbind("Control-Ref" = 1, "KO-Control" = MA$targets$Cy5 == "ApoAI-/-")
fit <- lmFit(MA, design)
fit <- eBayes(fit)
tabletop_1 <- topTable(fit, coef = 2, number = 15, genelist = fit$genes$NAME)
##### Goodness of fit #####
sst <- rowSums(MA$M^2)
ssr <- sst - fit$df.residual * fit$sigma^2
rsq <- ssr / sst
summary(rsq)
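The same per-gene calculation can be vectorised in base R in the way `rowSums()` is used in the code above. This is a hedged illustration with simulated M-values centred near zero: the 200 genes, 6 arrays and the handful of shifted genes are all assumptions, not the Apoa1 data. It uses `lm.fit()` on the transposed matrix as a stand-in for `lmFit()`, so `d * sigma2` plays the role of `fit$df.residual * fit$sigma^2`.

```r
# Vectorised goodness-of-fit for many genes, mimicking a two-colour
# common-reference layout. All numbers are simulated assumptions.
set.seed(2)
n_genes <- 200
is_ko <- rep(c(FALSE, TRUE), each = 3)
X <- cbind("Control-Ref" = 1, "KO-Control" = as.numeric(is_ko))

# Genes in rows, arrays in columns, like MA$M
M <- matrix(rnorm(n_genes * length(is_ko), sd = 0.4), nrow = n_genes)
M[1:20, is_ko] <- M[1:20, is_ko] + 1.5    # a few hypothetical DE genes

# lm.fit() expects observations in rows, so transpose to arrays x genes
fit <- lm.fit(X, t(M))
d <- fit$df.residual                       # same residual df for every gene here
sigma2 <- colSums(fit$residuals^2) / d     # per-gene residual variance

# Formula from the post, applied to all genes at once
sst <- rowSums(M^2)
rsq <- (sst - d * sigma2) / sst
print(summary(rsq))

# Cross-check gene 1 against summary.lm() with no separate intercept term
rsq_gene1 <- summary(lm(M[1, ] ~ 0 + X))$r.squared
```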
Is there anything wrong with my calculation of R-squared (or with anything else)? Do two-colour arrays simply give a worse fit, or am I missing something important?
I would appreciate any comments or help.