Testing for DE genes
1
1
Entering edit mode
@lennartvermander-13038
Last seen 6.9 years ago

Hello everyone

For a group project, we had to process a raw micro array data set using bioconductor and draw some conclusions. We chose this data set for our project. We looked a bit at the data and preprocessed it, but when we wanted to test for differential expressed genes, we got some strange results. Only a couple of genes showed up as DE, where we expected a lot more. The last few days we've searched through our code for errors and possible explanations, but we don't seem to find any differences in method of working from the examples we got. Can anyone could spot an obvious error in our code or give us another way to work? The code shows the part where we tested for DE genes.

#Obtain expression values 
########################## 
biocLite("affyPLM") 
library("affyPLM") 
eset<-threestep(affy.data,background.method="RMA.2",normalize.method="quantile", summary.method="median.polish") 

#obtain the expression values on gene level with the function exprs() 
e<-exprs(eset) 
head(e) 
dim(e) 

#construction of design matrix 
library(limma) 
treatment<-factor(c(rep("Tis11withLPS",3),rep("Tis11",3),rep("GFPwithLPS",3),rep("GFP",3))) 
X<-model.matrix(~0+treatment) 
colnames(X)<-c("GFP","GFPwithLPS","Tis11","Tis11withLPS")

#fit linear model 
lm.fit<-lmFit(e,X)

#construction of contrasts 
mc<-makeContrasts("Tis11withLPS-GFPwithLPS","Tis11-GFP","Tis11withLPS-Tis11","GFPwithLPS-GFP", levels=X) 
c.fit<-contrasts.fit(lm.fit,mc) 
eb<-eBayes(c.fit)

toptable(eb, sort.by = "logFC")
topTableF(eb)

#Extract p-values & adjustment
modFpvalue <- eb$F.p.value
indx <- p.adjust(modFpvalue, method = "bonferroni") < 0.05
sig <- modFpvalue[indx]
nsiggenes <- length(sig)
affymetrix microarrays limma differential gene expression • 1.5k views
ADD COMMENT
2
Entering edit mode
Aaron Lun ★ 28k
@alun
Last seen 1 hour ago
The city by the bay

A Bonferroni adjustment for genome-wide data analysis is crazy. You should be using method="BH" instead, if you want to the correction manually via p.adjust. topTable does it automatically anyway, so why not let it do the job for you? Just set n=Inf to get statistics for all genes, and p.value=0.05 to get DE genes.

ADD COMMENT
0
Entering edit mode

Most of the code is basically copy-paste from the examples we got, including the use of Bonferonni. But I don't really think the error lies in the correction method. With Bonferonni we got 10 DE genes and with BH that still was only a meager 20 genes. On such a large dataset, that just seems way too low.

 

ADD REPLY
4
Entering edit mode

In that case, there are several other possibilities:

  • Have you filtered out low-abundance genes?
  • Are there outlier or low-quality arrays to be removed or downweighted with arrayWeights?
  • Is there a mean-variance trend that warrants the use of trend=TRUE in eBayes?
  • If there are outlier variances, setting robust=TRUE in eBayes may also be useful.

Read the limma user's guide and documentation for more details on each of these functions/settings. If you still get few DE genes... well, maybe that's because you don't have many DE genes. It happens sometimes.

ADD REPLY
4
Entering edit mode

Had few spare minutes left.... so I quickly analyzed this data set. Regarding Aaron's final conclusion: if you create an MDS plot using limma's plotMDS function, you indeed see the treatment groups don't nicely separate. Hence, not surprising that only few probe sets are differentially expressed.... Even though LPS was used as stimulus....

 

Plot is available here:

http://imgur.com/Ykv97b0

 

Also, realize that by using topTable without speciifying a contrast, you basically check which probe sets are differentially regulated among all 4 tested contrasts. If you specify a coefficient, e.g. topTableF(eb, coef=4), you will get the probe sets that are differentially expressed by LPS  in the control-transfected cells. That may still provide some interesting results....

ADD REPLY
1
Entering edit mode

Guido's plot also suggests that there is a strong pair effect. It may be worth digging into the metadata and seeing if samples were generated in batches. If you can block on the batch, you can reduce the variance estimates and improve power to detect DE genes within batches.

ADD REPLY
0
Entering edit mode

Thanks for your replies! For the group task we had to analyze a similar RNAseq dataset as well. We finished this yesterday and the results showed little differential expression too. So we already kind of figured that it was indeed because of the data set, not because of some big mistake we made. 

We will try reproducing the MDS-figure and write our conclusions from that, and draw what conclusions we can using the topTable function. Again, thanks for the replies and making time to take a deeper look into it. 

ADD REPLY

Login before adding your answer.

Traffic: 883 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6