Entering edit mode
Koen Van den Berge
▴
180
@koen-van-den-berge-6369
Last seen 5 months ago
Ghent University, Belgium
Dear Bioconductors,
I am currently analysing gene expression data using microarrays. I
currently have a simple model which looks at plant development. The
design matrix can be replicated as follows:
devStadium <- factor(rep(c("Ovule" , "Seed24h" , "Globular" ,
"Cotyledon" , "MGreen" ,
"PMGreen" , "Seedling"),each=2))
design.null <-model.matrix(~ devStadium ,
contrasts.arg=list(devStadium=contr.sum))
Next, I fit the linear models and construct a contrast matrix:
fit.null <- lmFit(eset.par.diff,design=design.null)
contrast.matrix <- makeContrasts(
Intercept = (Intercept),
Ovule= devStadium1,
Seed24h= devStadium2,
Globular= devStadium3,
Cotyledon= devStadium4,
MGreen= devStadium5,
PMGreen= devStadium6,
Seedling= -c(devStadium1+devStadium2+devStadium3+
devStadium4+devStadium5+devStadium6),
levels = design.null)
fit.null <- contrasts.fit(fit.null, contrast.matrix)
fit.null <- eBayes(fit.null)
Since I have coded the design matrix using sum coding, the intercept
should indicate the average gene expression over all the seven
developmental stadia. I test how many genes show significant
expression during development through:
null.ict.table <-
topTable(fit.null,coef=1,n=nrow(fit.null),adjust.method="BH")
mean(null.ict.table$adj.P.Val < 0.05)
This output tells me that 80.5% of the genes show expression during
plant development.
Furthermore, I would like to perform an F test over all the
developmental stadia I have sampled. In total, there are seven stadia.
However, the application of this F test is uncertain to me.
According to the manual, the F-statistic gained from the eBayes step
tests whether any of the contrasts are non-zero for that gene.
However, since I have defined an intercept in my contrasts, this might
not be the way to go. Therefore, I thought of making a small contrast
matrix (L) myself, using the following code:
L <- matrix(0,nrow=8,ncol=1)
L[2:8,1] <- 1
null.stadia.table <-
topTable(fit.null,coef=L,n=nrow(fit.null),adjust.method="BH")
mean(null.stadia.table$adj.P.Val < 0.05)
My reasoning here is that the 1?s represent the coefficients to be
tested (thus the seven developmental stadia). The results of this
output show me that EXACTLY the same amount of genes turn out to be
significant than with testing the intercept.
I don?t know why this is, but it makes me believe my approach is
wrong. If anyone can explain me why, this would lighten things up for
me. Anyway, I proceeded using something else
null.stadia.table <-
topTable(fit.null,coef=c(2:8),n=nrow(fit.null),adjust.method="BH")
mean(null.stadia.table$adj.P.Val < 0.05)
This approach is similar to using the contrast matrix
L <- matrix(0,nrow=7,ncol=7)
diag(L) <- 1
rbind(rep(0,7),L)
Here, I state testing the seven developmental stadium coefficients,
and I get a lot less genes that turn out to be significant than with
the two approaches above.
My primary question aims at the differences between both contrast
matrices I have constructed in testing while using topTable.
Thank you in advance,
Koen