Sub : Limma ebayes
ssrajan86 • 0
@ssrajan86-8563
Italy

Dear All, 

Can someone please advise me on the following problem and a possible solution? I am not very familiar with gene expression studies.

Q. I have a gene expression data set containing more than 16,000 probes (the whole data) after normalization and filtering.

A subset of these probes (4,000 probes) was considered independently for the eBayes test. I performed DE analysis on both the whole data (16,000 probes) and the subset data (4,000 probes).

When I compared the two sets of results based on the adjusted p-value (adj.P.Val), some probes that are differentially expressed in the subset data were not found to be DE in the whole data. I would expect the probes that are DE in the subset to be DE in the whole data as well. It would be great to know whether my assumption is wrong.

limma ebayes statistical inference • 1.9k views
@gordon-smyth
WEHI, Melbourne, Australia

The intention of the eBayes() function is that you will run it on all the genes, after normalization and filtering. The idea is to utilize information from the whole ensemble of genes. It is not usually correct to rerun eBayes on subsets of genes, and the results will obviously change if you do.

Similarly, you need to apply multiple testing adjustment to all the genes that you are considering in your analysis. For this reason, it is not usually correct to run topTable() on a subset of genes unless there was some a priori reason for focusing on that subset of genes.
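As a rough illustration of the two points above, the sketch below runs eBayes() once on all probes and only then restricts the results to a subset. The data are simulated, and the subset indices are hypothetical stand-ins for a set of probes chosen a priori; this is not the original poster's data or pipeline.

```r
library(limma)

set.seed(1)
# Simulated data (hypothetical): 1000 probes x 6 samples, two groups of three
expr <- matrix(rnorm(1000 * 6), nrow = 1000,
               dimnames = list(paste0("probe", 1:1000), paste0("s", 1:6)))
group <- factor(rep(c("group1", "group2"), each = 3))
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)

fit <- lmFit(expr, design)
fit <- contrasts.fit(fit, makeContrasts(group1 - group2, levels = design))
fit <- eBayes(fit)  # run once, on all probes

# Full results: multiple testing adjustment over all 1000 probes
all_res <- topTable(fit, coef = 1, number = Inf, sort.by = "none")

# Hypothetical a priori subset: subset the fitted object, not the input data,
# so the moderated statistics still borrow information from all probes
idx <- 1:250
sub_res <- topTable(fit[idx, ], coef = 1, number = Inf, sort.by = "none")

# Moderated t-statistics and raw p-values agree with the full analysis;
# only adj.P.Val is recomputed over the 250 probes in the subset
head(cbind(all_res[idx, c("t", "P.Value", "adj.P.Val")],
           adj.P.Val.subset = sub_res$adj.P.Val))
```

Subsetting after eBayes() keeps the prior estimated from the whole ensemble, so the only thing that changes is the number of tests entering the adjustment.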

Aaron Lun ★ 28k
@alun
The city by the bay

Without seeing the code, it's hard to tell for sure, but the most probable cause is that you've got fewer probes in the subset data. This means that the severity of the p-value adjustment for multiple testing (the BH method, in this case) is reduced in the subset. As a result, you can end up with probes in the subset that have adjusted p-values lower than the corresponding values from the full set. This would result in cases where probes are significant in the subset analysis but are not significant in the analysis with the full data.
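A small base R sketch of this effect, using made-up p-values rather than anything from the original analysis: the same raw p-values are adjusted once over 20 tests and once over a hypothetical subset of 6.

```r
# Hypothetical raw p-values for 20 probes: four small, sixteen unremarkable
p_full <- c(0.0005, 0.004, 0.012, 0.03, seq(0.1, 1, length.out = 16))

# Hypothetical subset of six probes
idx <- c(1, 2, 3, 4, 10, 20)

adj_full <- p.adjust(p_full, method = "BH")       # adjusted over 20 tests
adj_sub  <- p.adjust(p_full[idx], method = "BH")  # adjusted over 6 tests

# Same raw p-values, different adjusted values
data.frame(probe = idx, p = p_full[idx],
           adj_full = adj_full[idx], adj_sub = adj_sub)

# Probes below 0.05 only in the subset analysis
only_in_subset <- idx[adj_sub < 0.05 & adj_full[idx] >= 0.05]
only_in_subset
```

Here probes 3 and 4 pass the 0.05 threshold only in the subset analysis. The size and even the direction of the change depend on which probes end up in the subset, so this is an illustration of the mechanism rather than a general rule.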

In addition, if you ran eBayes separately on the full and subsetted data, there will be changes in the statistics due to the information being shared in empirical Bayes shrinkage, e.g., prior variance and degrees of freedom. The size of these changes will depend on how you selected the subset. For example, if the subset was formed by selecting high-abundance probes, you'd end up with more low-variance probes (assuming a decreasing mean-variance relationship) such that the estimated prior variance decreases. This would result in lower (unadjusted) p-values in the subset, as all probes are shrunk towards a smaller prior.
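The abundance example can be sketched with simulated data. Everything below (the mean-variance trend, the abundance cutoff, the group layout) is an assumption chosen for illustration, not a reconstruction of the original data set.

```r
library(limma)

set.seed(2)
n_probes <- 2000
# Hypothetical decreasing mean-variance trend:
# higher-abundance probes are less variable
abundance <- seq(4, 12, length.out = n_probes)
sdev <- seq(1, 0.2, length.out = n_probes)
# mean and sd are recycled down the columns, so row i uses abundance[i], sdev[i]
expr <- matrix(rnorm(n_probes * 6, mean = abundance, sd = sdev),
               nrow = n_probes)

group <- factor(rep(c("group1", "group2"), each = 3))
design <- model.matrix(~ group)

# eBayes on all probes versus on a high-abundance subset only
fit_all <- eBayes(lmFit(expr, design))
high <- abundance > 10  # hypothetical cutoff
fit_sub <- eBayes(lmFit(expr[high, ], design))

# Prior variance and prior degrees of freedom under the two fits
c(full = fit_all$s2.prior, subset = fit_sub$s2.prior)
c(full = fit_all$df.prior, subset = fit_sub$df.prior)

# Change in unadjusted moderated p-values for the same probes
summary(fit_sub$p.value[, 2] - fit_all$p.value[high, 2])
```

In this setup the subset contains only low-variance probes, so its estimated prior variance is smaller than the one obtained from all probes, and the moderated statistics for the very same probes differ between the two fits.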

ssrajan86 • 0
@ssrajan86-8563

Dear Aaron Lun,

Thanks for your suggestion. I agree about the influence of the variance and degrees of freedom in the subset data. Probes that passed the corrected p-value threshold in the subset data did not pass it in the full data. In the whole data, I found a few of these DE probes with adj.P.Val around 0.05; in other words, they were present but borderline.

library(limma)
# x1 (loaded earlier): probe IDs in column 1, expression values in columns 2-7
m <- as.matrix(x1[, 2:7])
rownames(m) <- x1[, 1]
sam_group <- read.csv("samplebfile.csv")
clas <- sam_group$Batch
# One design column per group, no intercept
design <- model.matrix(~ -1 + factor(clas))
colnames(design) <- c("group1", "group2")
fit <- lmFit(m, design)
dim(fit)
contrast.matrix <- makeContrasts(group1 - group2, levels = design)
fit2 <- contrasts.fit(fit, contrast.matrix)
fit3 <- eBayes(fit2)
# adjust = "fdr" is the BH method
fit4 <- topTable(fit3, coef = 1, adjust = "fdr", sort.by = "P", number = 4000)

            
