Question: Testing for DE genes
gravatar for Lennart.Vermander
13 months ago by
Lennart.Vermander10 wrote:

Hello everyone

For a group project, we had to process a raw micro array data set using bioconductor and draw some conclusions. We chose this data set for our project. We looked a bit at the data and preprocessed it, but when we wanted to test for differential expressed genes, we got some strange results. Only a couple of genes showed up as DE, where we expected a lot more. The last few days we've searched through our code for errors and possible explanations, but we don't seem to find any differences in method of working from the examples we got. Can anyone could spot an obvious error in our code or give us another way to work? The code shows the part where we tested for DE genes.

#Obtain expression values 
eset<-threestep(,background.method="RMA.2",normalize.method="quantile", summary.method="median.polish") 

#obtain the expression values on gene level with the function exprs() 

#construction of design matrix 

#fit linear model<-lmFit(e,X)

#construction of contrasts 
mc<-makeContrasts("Tis11withLPS-GFPwithLPS","Tis11-GFP","Tis11withLPS-Tis11","GFPwithLPS-GFP", levels=X)<,mc) 

toptable(eb, = "logFC")

#Extract p-values & adjustment
modFpvalue <- eb$F.p.value
indx <- p.adjust(modFpvalue, method = "bonferroni") < 0.05
sig <- modFpvalue[indx]
nsiggenes <- length(sig)
ADD COMMENTlink modified 13 months ago by Aaron Lun19k • written 13 months ago by Lennart.Vermander10
gravatar for Aaron Lun
13 months ago by
Aaron Lun19k
Cambridge, United Kingdom
Aaron Lun19k wrote:

A Bonferroni adjustment for genome-wide data analysis is crazy. You should be using method="BH" instead, if you want to the correction manually via p.adjust. topTable does it automatically anyway, so why not let it do the job for you? Just set n=Inf to get statistics for all genes, and p.value=0.05 to get DE genes.

ADD COMMENTlink modified 13 months ago • written 13 months ago by Aaron Lun19k

Most of the code is basically copy-paste from the examples we got, including the use of Bonferonni. But I don't really think the error lies in the correction method. With Bonferonni we got 10 DE genes and with BH that still was only a meager 20 genes. On such a large dataset, that just seems way too low.


ADD REPLYlink written 13 months ago by Lennart.Vermander10

In that case, there are several other possibilities:

  • Have you filtered out low-abundance genes?
  • Are there outlier or low-quality arrays to be removed or downweighted with arrayWeights?
  • Is there a mean-variance trend that warrants the use of trend=TRUE in eBayes?
  • If there are outlier variances, setting robust=TRUE in eBayes may also be useful.

Read the limma user's guide and documentation for more details on each of these functions/settings. If you still get few DE genes... well, maybe that's because you don't have many DE genes. It happens sometimes.

ADD REPLYlink modified 13 months ago • written 13 months ago by Aaron Lun19k

Had few spare minutes left.... so I quickly analyzed this data set. Regarding Aaron's final conclusion: if you create an MDS plot using limma's plotMDS function, you indeed see the treatment groups don't nicely separate. Hence, not surprising that only few probe sets are differentially expressed.... Even though LPS was used as stimulus....


Plot is available here:


Also, realize that by using topTable without speciifying a contrast, you basically check which probe sets are differentially regulated among all 4 tested contrasts. If you specify a coefficient, e.g. topTableF(eb, coef=4), you will get the probe sets that are differentially expressed by LPS  in the control-transfected cells. That may still provide some interesting results....

ADD REPLYlink modified 13 months ago • written 13 months ago by Guido Hooiveld2.3k

Guido's plot also suggests that there is a strong pair effect. It may be worth digging into the metadata and seeing if samples were generated in batches. If you can block on the batch, you can reduce the variance estimates and improve power to detect DE genes within batches.

ADD REPLYlink written 13 months ago by Aaron Lun19k

Thanks for your replies! For the group task we had to analyze a similar RNAseq dataset as well. We finished this yesterday and the results showed little differential expression too. So we already kind of figured that it was indeed because of the data set, not because of some big mistake we made. 

We will try reproducing the MDS-figure and write our conclusions from that, and draw what conclusions we can using the topTable function. Again, thanks for the replies and making time to take a deeper look into it. 

ADD REPLYlink written 13 months ago by Lennart.Vermander10
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 287 users visited in the last hour