Search
Question: edgeR::estimateDisp gives Error in qr.default(x) : NA/NaN/Inf in foreign function call (arg 1) (minimal working example included)
0
15 months ago by
endrebak8530
endrebak8530 wrote:
​require(edgeR) # edgeR_3.12.1

counts = "Sample1 Sample2 Sample3 Sample4
1 14 19 38 39
2 18 22 38 37
3 23 29 35 41
4 24 34 37 39
5 22 37 40 41"

design = "Name Sample
1 Sample1 1
2 Sample2 1
3 Sample3 0
4 Sample4 0"

totals = c(9490840, 9491499, 9483062, 9488195)

print(counts.df)
print(dim(counts.df))
print(rownames(counts.df))
## print(samples.df)

y <- DGEList(counts.df, lib.size=totals)

y <- estimateDisp(y, design.df)

If you copy paste the code above you should get the error:

Error in qr.default(x) : NA/NaN/Inf in foreign function call (arg 1)
Calls: estimateDisp ... glmFit -> glmFit.default -> nonEstimable -> qr -> qr.default
In qr.default(x) : NAs introduced by coercion
Execution halted

Why does this happen? What can I do to fix it?

Ps. the traceback is

> traceback()
6: qr.default(x)
5: qr(x)
4: nonEstimable(design)
3: glmFit.default(y$counts[sel, ], design, offset = offset[sel, ], dispersion = 0.05, prior.count = 0) 2: glmFit(y$counts[sel, ], design, offset = offset[sel, ], dispersion = 0.05,
prior.count = 0)
1: estimateDisp(y, design.df)
modified 15 months ago • written 15 months ago by endrebak8530
0
15 months ago by
endrebak8530
endrebak8530 wrote:

My design matrix is wrong. The first column should be rownames, not a data column.

1

Even so, you should not be using Sample alone as a design matrix. If you did, you would get:

design <- design.df[,"Sample",drop=FALSE]

##   Sample
## 1      1
## 2      1
## 3      0
## 4      0

Now, in a log-link GLM, the fitted value is equal to the exponential of the offset plus the a linear sum of coefficients. However, all your predictors for samples 3 and 4 are zero, meaning that the fitted value is just the exponential of the offset, which is equal to the library size. In other words, no matter what the gene is, your design matrix is saying that the mean expression in samples 3 and 4 should be equal to the total number of reads in the library. This is surely nonsensical. What you should be doing is something like this:

sample <- factor(design.df$Sample) design <- model.matrix(~sample) ## (Intercept) sample1 ## 1 1 1 ## 2 1 1 ## 3 1 0 ## 4 1 0 ## attr(,"assign") ## [1] 0 1 ## attr(,"contrasts") ## attr(,"contrasts")$sample
## [1] "contr.treatment"

... which gives you an intercept term to account for gene-specific expression in all samples.