Search
Question: edgeR::estimateDisp gives Error in qr.default(x) : NA/NaN/Inf in foreign function call (arg 1) (minimal working example included)
0
gravatar for endrebak85
3 months ago by
endrebak8510
endrebak8510 wrote:
​require(edgeR) # edgeR_3.12.1

counts = "Sample1 Sample2 Sample3 Sample4
1 14 19 38 39
2 18 22 38 37
3 23 29 35 41
4 24 34 37 39
5 22 37 40 41"

design = "Name Sample
1 Sample1 1
2 Sample2 1
3 Sample3 0
4 Sample4 0"

totals = c(9490840, 9491499, 9483062, 9488195)

counts.df = read.table(text=counts, sep=" ")
design.df = read.table(text=design, sep=" ")

print(counts.df)
print(dim(counts.df))
print(rownames(counts.df))
## print(samples.df)

y <- DGEList(counts.df, lib.size=totals)

y <- estimateDisp(y, design.df)

 

If you copy paste the code above you should get the error:

 

Error in qr.default(x) : NA/NaN/Inf in foreign function call (arg 1)
Calls: estimateDisp ... glmFit -> glmFit.default -> nonEstimable -> qr -> qr.default
In addition: Warning message:
In qr.default(x) : NAs introduced by coercion
Execution halted

 

Why does this happen? What can I do to fix it?

Ps. the traceback is

> traceback()
6: qr.default(x)
5: qr(x)
4: nonEstimable(design)
3: glmFit.default(y$counts[sel, ], design, offset = offset[sel,
       ], dispersion = 0.05, prior.count = 0)
2: glmFit(y$counts[sel, ], design, offset = offset[sel, ], dispersion = 0.05,
       prior.count = 0)
1: estimateDisp(y, design.df)
ADD COMMENTlink modified 3 months ago • written 3 months ago by endrebak8510
0
gravatar for endrebak85
3 months ago by
endrebak8510
endrebak8510 wrote:

My design matrix is wrong. The first column should be rownames, not a data column.

ADD COMMENTlink written 3 months ago by endrebak8510
1

Even so, you should not be using Sample alone as a design matrix. If you did, you would get:

design <- design.df[,"Sample",drop=FALSE]

##   Sample
## 1      1
## 2      1
## 3      0
## 4      0

Now, in a log-link GLM, the fitted value is equal to the exponential of the offset plus the a linear sum of coefficients. However, all your predictors for samples 3 and 4 are zero, meaning that the fitted value is just the exponential of the offset, which is equal to the library size. In other words, no matter what the gene is, your design matrix is saying that the mean expression in samples 3 and 4 should be equal to the total number of reads in the library. This is surely nonsensical. What you should be doing is something like this:

sample <- factor(design.df$Sample)
design <- model.matrix(~sample)

##   (Intercept) sample1
## 1           1       1
## 2           1       1
## 3           1       0
## 4           1       0
## attr(,"assign")
## [1] 0 1
## attr(,"contrasts")
## attr(,"contrasts")$sample
## [1] "contr.treatment"

... which gives you an intercept term to account for gene-specific expression in all samples.

ADD REPLYlink modified 3 months ago • written 3 months ago by Aaron Lun14k

Thanks for the reply. I did need an intercept :)

ADD REPLYlink written 3 months ago by endrebak8510
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 152 users visited in the last hour