limma-voom when testing differential expression of a subset of genes.
2
2
Entering edit mode
@ankurchakravarthy10-8368
Last seen 6.8 years ago
United Kingdom

Dear list users, 

I am trying to test if a given, predefined set of genes is differentially expressed between two groups in an RNA-seq experiment (count data, using voom) . I am therefore interested in only evaluating the hypothesis of differential expression within my set, so as to limit the burden of multiple-testing, for instance. 

In light of this, which of the following is likely to be the most appropriate approach. 

1) reduce the counts matrix to my gene set first, estimate mean-variance weights using voom, go on to use lmFit, eBayes and topTable to identify DEGs

or

2) model the mean-variance relationship across the whole matrix of counts with voom, then subset the voom object to only include my genes of interest, and then use lmFit, eBayes and topTable. 

or 

3) Carry out mean-variance modelling and shrinkage on the whole dataset, then subset the topTable to my gene set and re-estimate false discovery rates from the raw P values, since the only hypotheses I intended to test were those related to my subset of genes. 

In terms of numbers, I am looking at a subset of 153 genes out of a total of 20,000. 

 

Best wishes, 

Ankur Chakravarthy

limma voom differential gene expression • 2.5k views
ADD COMMENT
3
Entering edit mode
@james-w-macdonald-5106
Last seen 34 minutes ago
United States

You should use #3, as both voom() and eBayes() are intended to use all available data to make their estimates. And do note that MArrayLM objects can be subsetted as if they were data.frames or matrices. So if you do

v <- voom(dglst.object)
fit <- lmFit(v, design)
fit2 <- contrasts.fit(fit, contrast) ## if necessary
fit2 <- eBayes(fit2)
fit2 <- fit2[<indicator variable to subset>,]
topTable(fit2, 1)

then the BH adjustment for the genes in your topTable will be based on just the subset of genes you care about.

ADD COMMENT
3
Entering edit mode
@gordon-smyth
Last seen 7 hours ago
WEHI, Melbourne, Australia

Just to add to Jim's answer, you should filter out unexpressed or very low expressed genes before running voom. So #3 is correct provided that the "whole dataset" is interpreted to mean all expressed genes.

ADD COMMENT

Login before adding your answer.

Traffic: 754 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6