Question: Error when calculating DE using the classical approach in edgeR
0
2.7 years ago by
lynski0080 wrote:

I am trying to use edgeR to analyse differential gene expression from my RNA-seq dataset.  The experimental design is as follows: I had 3 different treatment conditions i.e. 1 factor with 3 different levels.  And there are 3 replicate RNA samples (counts libraries) for each of the 3 treatments i.e. 9 samples in total.  I have made a DGEList with the counts for the 9 samples.  I have filtered to remove features with metatags and low expression.  I have also made a design matrix giving the information that there are 3 groups with 3 samples assigned to each group.  The design matrix (called design) looks like this:

             Group1       Group2        Group3
Sample1         1           0              0
Sample2         1           0              0
Sample3         1           0              0
Sample4         0           1              0
Sample5         0           1              0
Sample6         0           1              0
Sample7         0           0              1
Sample8         0           0              1
Sample9         0           0              1

I have also estimated dispersion using the following
​
​
DG <- estimateDisp(DG, design, robust=TRUE)

where DG is my DGEList and design is my design matrix.  Because my experimental design is very simple I don't think I need to use a GLM approach to analyse differential expression and I only want to do pair-wise comparisons i.e. find DE between group 1 and group 2 and between group 1 and group 3.  So I think I want to use the edgeR classic approach.  However, when I try the following code:

> et <- exactTest(DG, pair=c("group1","group2"))

​ I get the following error:

At least one element of given pair is not a group.
Groups are: 1

Can someone please help explain where I am going wrong?  I think that my DGEList is not getting the information from my data matrix (design) or perhaps my data matrix is over-complicated for my experimental design and I need to somehow include the group information in my DGEList.  At the moment if I print my DGEList the $samples table looks like this: $samples
files group lib.size norm.factors
counts3D7_1-1 counts3D7_1-1     1 13251744    2.0632398
counts3D7_2-1 counts3D7_2-1     1 13809955    0.9912349
counts3D7_3-1 counts3D7_3-1     1 12328705    1.3071140
counts3D7_1-2 counts3D7_1-2     1 12605616    0.6537380
counts3D7_2-2 counts3D7_2-2     1 13392599    0.6290064

(with not all rows printed) so I think that the correct group information isn't there.  Any help is greatly appreciated.

modified 2.7 years ago by Gordon Smyth37k • written 2.7 years ago by lynski0080
Answer: Error when calculating DE using the classical approach in edgeR
4
2.7 years ago by
Aaron Lun23k
Cambridge, United Kingdom
Aaron Lun23k wrote:

To answer your immediate question, you need to specify the groupings in group:

DG$samples$group <- rep(c("group1", "group2", "group3", each=3))

... and then re-run estimateDisp and exactTest. The design matrix is not used in the classic method, so you can't expect the grouping information there to carry through to exactTest. Indeed, supplying design to estimateDisp will use the GLM approach for estimating the dispersions. While this is not wrong per se, it doesn't make much sense to use one approach for estimation and another for testing.

Now, at a more conceptual level, a simple experimental design doesn't preclude the use of the GLM approach in edgeR. I always use GLMs to analyze one-way layouts - and, in your case, since you've already set up the design matrix, why not go all the way? In fact, I dare say the QL-based methods (which use GLMs) would outperform the exact test as the former accounts for the uncertainty of the dispersion estimates and the latter does not. There's no trade-off in performance here; both the GLM-based approach and the exact test work well, but one works for a variety of designs and the other... doesn't.

Many thanks for your helpful suggestions.  I am now calculating differential expression using the GLM approach.  One further question: you say that supplying design to estimateDisp will use the GLM approach for estimating the dispersions.  Does this mean that I don't need to go back and use qlmQLFit to estimate dispersions prior to calculating differential expression using glmQLFTest ?

1

You still need to use both of them. estimateDisp estimates the negative binomial dispersions, while glmQLFit estimates the quasi-likelihood dispersions. The book chapter has more details on this: