I got interested in edgeR's capability of dealing with low counts and came across an error which turned out to be non-reproducible if you follow the function's body code line by line. I made a reproducible example for the non-reproducible error :-).
`counts = matrix(c(3,0,0,0,0,0, 0,0,0,0,4,0, 0,3,0,0,0,0, 0,0,0,3,0,0, 0,3,0,1,0,0, 0,0,0,0,3,0, 0,0,4,0,0,0, 0,3,0,0,0,0, 3,0,0,0,0,0, 3,0,0,0,0,0, 0,0,0,8,9,0),byrow=TRUE,nrow=11,ncol=6) dispersion= 1.077855e-07 design = model.matrix(~factor(rep(c("A","B"),each=3))`
Now if you run edgeR's base function for estimating tag wise dispersions
dispCoxReidInterpolateTagwise(y=counts, dispersion=1.077855e-07, design=model.matrix(~factor(rep(c("A","B"),each=3)))) Error in dispCoxReidInterpolateTagwise(y = counts, dispersion = 1.077855e-07, : no residual degrees of freedom
You get the error message saying it has no residual degrees of freedom. Curiously, this error is not printed when running over the function's body code using the same arguments.
Even more curious: the function works when you remove a seemingly problematic line, which is the last line in this case. It is similar when you have a matrix of a 100 genes: this one line seems to produce the error.
dispCoxReidInterpolateTagwise(y=counts[-11,], dispersion=1.077855e-07, design=model.matrix(~factor(rep(c("A","B"),each=3))))
Which returns the tag wise dispersions. I realise the dispersion is very low, but it is what is estimated from the larger matrix of low counts and I think this should not pose an error. I have been looking at the dispCoxReidInterpolateTagwise
function because the error was called when I ran estimateTagwiseDisp on the matrix.
<font face="monospace">How to fix this error?</font>
<font face="monospace">I have edgeR version 3.12.0 installed.</font>
Yes, it is a bug in the internal subsetting step. It's been fixed and it will be updated online soon.
Note that the
dispCoxReidInterpolateTagwise
function usesmin.row.sum=5
to filter out lowly expressed genes by default. Only genes (or rows) with total sum of counts above 5, which is the last row only in the example, are used for dispersion estimation. This bug affects the data of which only one gene is above the threshold. When the seemingly problematic line (the last line) is removed, it simply returns the input values ofdispersion
, i.e., it does nothing (or it works).Thank you Yunshun and Aaron, this cleared things up. Indeed, it works when using the
estimateDisp
rather than theestimateGLMTagwisdeDisp
function, this was a great help.Thank you for your reply. I realise the usefulness of the
estimateDisp function
though I need the estimateGLMTagwise function to work as I want the unshrunken tagwisde dispersion (i.e. in my estimateGLMTagwise function I set the prior.df to zero).You can do the same for
estimateDisp
, if so inclined. But you should be very careful about how you use the resulting estimate. The dispersion will not be stably estimated without shrinkage if you only have ~6 libraries, so the resulting inferences from any fitted model are likely to be quite unstable.