Question: Testing for DE genes
1
2.1 years ago by
Lennart.Vermander10 wrote:

Hello everyone

For a group project, we had to process a raw micro array data set using bioconductor and draw some conclusions. We chose this data set for our project. We looked a bit at the data and preprocessed it, but when we wanted to test for differential expressed genes, we got some strange results. Only a couple of genes showed up as DE, where we expected a lot more. The last few days we've searched through our code for errors and possible explanations, but we don't seem to find any differences in method of working from the examples we got. Can anyone could spot an obvious error in our code or give us another way to work? The code shows the part where we tested for DE genes.

#Obtain expression values
##########################
biocLite("affyPLM")
library("affyPLM")
eset<-threestep(affy.data,background.method="RMA.2",normalize.method="quantile", summary.method="median.polish")

#obtain the expression values on gene level with the function exprs()
e<-exprs(eset)
dim(e)

#construction of design matrix
library(limma)
treatment<-factor(c(rep("Tis11withLPS",3),rep("Tis11",3),rep("GFPwithLPS",3),rep("GFP",3)))
X<-model.matrix(~0+treatment)
colnames(X)<-c("GFP","GFPwithLPS","Tis11","Tis11withLPS")

#fit linear model
lm.fit<-lmFit(e,X)

#construction of contrasts
mc<-makeContrasts("Tis11withLPS-GFPwithLPS","Tis11-GFP","Tis11withLPS-Tis11","GFPwithLPS-GFP", levels=X)
c.fit<-contrasts.fit(lm.fit,mc)
eb<-eBayes(c.fit)

toptable(eb, sort.by = "logFC")
topTableF(eb)

modFpvalue <- eb\$F.p.value
indx <- p.adjust(modFpvalue, method = "bonferroni") < 0.05
sig <- modFpvalue[indx]
nsiggenes <- length(sig)
modified 2.1 years ago by Aaron Lun24k • written 2.1 years ago by Lennart.Vermander10
2
2.1 years ago by
Aaron Lun24k
Cambridge, United Kingdom
Aaron Lun24k wrote:

A Bonferroni adjustment for genome-wide data analysis is crazy. You should be using method="BH" instead, if you want to the correction manually via p.adjust. topTable does it automatically anyway, so why not let it do the job for you? Just set n=Inf to get statistics for all genes, and p.value=0.05 to get DE genes.

Most of the code is basically copy-paste from the examples we got, including the use of Bonferonni. But I don't really think the error lies in the correction method. With Bonferonni we got 10 DE genes and with BH that still was only a meager 20 genes. On such a large dataset, that just seems way too low.

4

In that case, there are several other possibilities:

• Have you filtered out low-abundance genes?
• Are there outlier or low-quality arrays to be removed or downweighted with arrayWeights?
• Is there a mean-variance trend that warrants the use of trend=TRUE in eBayes?
• If there are outlier variances, setting robust=TRUE in eBayes may also be useful.

Read the limma user's guide and documentation for more details on each of these functions/settings. If you still get few DE genes... well, maybe that's because you don't have many DE genes. It happens sometimes.

4

Had few spare minutes left.... so I quickly analyzed this data set. Regarding Aaron's final conclusion: if you create an MDS plot using limma's plotMDS function, you indeed see the treatment groups don't nicely separate. Hence, not surprising that only few probe sets are differentially expressed.... Even though LPS was used as stimulus....

Plot is available here:

http://imgur.com/Ykv97b0

Also, realize that by using topTable without speciifying a contrast, you basically check which probe sets are differentially regulated among all 4 tested contrasts. If you specify a coefficient, e.g. topTableF(eb, coef=4), you will get the probe sets that are differentially expressed by LPS  in the control-transfected cells. That may still provide some interesting results....

1

Guido's plot also suggests that there is a strong pair effect. It may be worth digging into the metadata and seeing if samples were generated in batches. If you can block on the batch, you can reduce the variance estimates and improve power to detect DE genes within batches.

Thanks for your replies! For the group task we had to analyze a similar RNAseq dataset as well. We finished this yesterday and the results showed little differential expression too. So we already kind of figured that it was indeed because of the data set, not because of some big mistake we made.

We will try reproducing the MDS-figure and write our conclusions from that, and draw what conclusions we can using the topTable function. Again, thanks for the replies and making time to take a deeper look into it.