Question: Is it valid to use only genes of interest before eBayes step in limma?
minabashir0 wrote (2.0 years ago):

Hi all,

I used Agilent microarrays to study gene expression, but I'm only interested in non-coding genes. When is the best time to remove the coding genes? Currently, I do this after fitting the linear model but before eBayes:

fit <- lmFit(E, design = design)
contr <- makeContrasts(A-B, C-B, A-C, levels = design)
fit <- contrasts.fit(fit, contrasts = contr)

nc <- fit[grepl("antisense|non-coding|pseudogene|vault|non-protein", fit$genes$Description), ]
nc.eb <- eBayes(nc)

Is this valid? If not, how would you proceed?

Thank you for your help,

Mina

modified 2.0 years ago by Aaron Lun • written 2.0 years ago by minabashir0
Aaron Lun (Cambridge, United Kingdom) wrote (2.0 years ago):

The idea of the EB step is to share information across genes to estimate the variance. Even if you aren't (biologically) interested in protein-coding genes, they still provide some useful (statistical) information required for variance estimation. In contrast, if you only have 5 non-coding genes in your nc object, there's not a lot of information to share. Using all genes improves the reliability of the shrinkage statistics (i.e., the estimate of the prior variance and degrees of freedom) and of the downstream DE analysis.

In summary, I would only filter out uninteresting genes after eBayes. Then you can have your cake (reliable variance estimates) and eat it too (fewer tests during multiplicity correction). Note that you should have already filtered out low-abundance genes, as these won't provide much information for variance estimation anyway.
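
As a rough sketch of that ordering (not a drop-in replacement for your script), the example below runs eBayes on all probes and subsets afterwards. The expression matrix, group labels and Description annotation are simulated here purely for illustration and are assumptions, as are the names fit.eb, is.nc and tab; only the contrasts and the grepl pattern come from the question.

```r
library(limma)

set.seed(1)

# Simulated stand-in for the real data (assumption): 1000 probes, 3 groups x 3 arrays.
n.genes <- 1000
groups <- factor(rep(c("A", "B", "C"), each = 3))
E <- matrix(rnorm(n.genes * length(groups)), nrow = n.genes)
rownames(E) <- paste0("probe", seq_len(n.genes))

# Hypothetical annotation; the real one would come with the Agilent data.
genes <- data.frame(
  Description = sample(c("protein-coding", "antisense", "pseudogene"),
                       n.genes, replace = TRUE, prob = c(0.9, 0.05, 0.05)),
  stringsAsFactors = FALSE
)

design <- model.matrix(~ 0 + groups)
colnames(design) <- levels(groups)

fit <- lmFit(E, design = design)
fit$genes <- genes
contr <- makeContrasts(A - B, C - B, A - C, levels = design)
fit <- contrasts.fit(fit, contrasts = contr)

# Empirical Bayes on ALL probes, so the prior variance and prior df
# are estimated from the full set.
fit.eb <- eBayes(fit)

# Only now keep the non-coding probes.
is.nc <- grepl("antisense|non-coding|pseudogene|vault|non-protein",
               fit.eb$genes$Description)
nc.eb <- fit.eb[is.nc, ]

# topTable adjusts the p-values over the rows that remain in nc.eb,
# i.e. over the non-coding probes only.
tab <- topTable(nc.eb, coef = 1, number = Inf)
```

The subsetted object keeps the s2.prior and df.prior estimated from all probes, while the adj.P.Val column in tab is computed over the retained probes only.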