edgeR::estimateDisp gives Error in qr.default(x) : NA/NaN/Inf in foreign function call (arg 1) (minimal working example included)
1
0
Entering edit mode
endrebak85 ▴ 40
@endrebak85-10660
Last seen 5.3 years ago
github.com/endrebak/
​require(edgeR) # edgeR_3.12.1

counts = "Sample1 Sample2 Sample3 Sample4
1 14 19 38 39
2 18 22 38 37
3 23 29 35 41
4 24 34 37 39
5 22 37 40 41"

design = "Name Sample
1 Sample1 1
2 Sample2 1
3 Sample3 0
4 Sample4 0"

totals = c(9490840, 9491499, 9483062, 9488195)

counts.df = read.table(text=counts, sep=" ")
design.df = read.table(text=design, sep=" ")

print(counts.df)
print(dim(counts.df))
print(rownames(counts.df))
## print(samples.df)

y <- DGEList(counts.df, lib.size=totals)

y <- estimateDisp(y, design.df)

 

If you copy paste the code above you should get the error:

 

Error in qr.default(x) : NA/NaN/Inf in foreign function call (arg 1)
Calls: estimateDisp ... glmFit -> glmFit.default -> nonEstimable -> qr -> qr.default
In addition: Warning message:
In qr.default(x) : NAs introduced by coercion
Execution halted

 

Why does this happen? What can I do to fix it?

Ps. the traceback is

> traceback()
6: qr.default(x)
5: qr(x)
4: nonEstimable(design)
3: glmFit.default(y$counts[sel, ], design, offset = offset[sel,
       ], dispersion = 0.05, prior.count = 0)
2: glmFit(y$counts[sel, ], design, offset = offset[sel, ], dispersion = 0.05,
       prior.count = 0)
1: estimateDisp(y, design.df)
edgeR estimatedisp • 5.1k views
ADD COMMENT
0
Entering edit mode
endrebak85 ▴ 40
@endrebak85-10660
Last seen 5.3 years ago
github.com/endrebak/

My design matrix is wrong. The first column should be rownames, not a data column.

ADD COMMENT
1
Entering edit mode

Even so, you should not be using Sample alone as a design matrix. If you did, you would get:

design <- design.df[,"Sample",drop=FALSE]

##   Sample
## 1      1
## 2      1
## 3      0
## 4      0

Now, in a log-link GLM, the fitted value is equal to the exponential of the offset plus the a linear sum of coefficients. However, all your predictors for samples 3 and 4 are zero, meaning that the fitted value is just the exponential of the offset, which is equal to the library size. In other words, no matter what the gene is, your design matrix is saying that the mean expression in samples 3 and 4 should be equal to the total number of reads in the library. This is surely nonsensical. What you should be doing is something like this:

sample <- factor(design.df$Sample)
design <- model.matrix(~sample)

##   (Intercept) sample1
## 1           1       1
## 2           1       1
## 3           1       0
## 4           1       0
## attr(,"assign")
## [1] 0 1
## attr(,"contrasts")
## attr(,"contrasts")$sample
## [1] "contr.treatment"

... which gives you an intercept term to account for gene-specific expression in all samples.

ADD REPLY
0
Entering edit mode

Thanks for the reply. I did need an intercept :)

ADD REPLY

Login before adding your answer.

Traffic: 612 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6