edgeR: Error in if (any(input.mean < 0)) stop("input.mean must be non-negative") :
@bastianhornung-8471
Last seen 9.4 years ago
Netherlands

Hi everyone,

I have a problem with edgeR right now, and I cannot figure out what the cause is.

I use these commands:

x <- read.delim("input.csv",row.names="Contigs")
group <- c(1,1,2,2,3,3,1,1,4,4,5,5,1,1,6,6,7,7,8,8,9,9,10,10,11,11,12,13,13,14,14,15,15,16,16)
library("edgeR")
d <- DGEList(counts=x, group=group)
d$samples$lib.size <- colSums(d$counts)
d <- calcNormFactors(d)
d <- estimateCommonDisp(d, verbose=TRUE)

And then get the following error.

Error in if (any(input.mean < 0)) stop("input.mean must be non-negative") :
  missing value where TRUE/FALSE needed
Calls: estimateCommonDisp -> equalizeLibSizes -> q2qnbinom

The file does not contain any negative values (but quite a few zeros), so I don't understand where this error is coming from.

Any help would be appreciated :).

(I'd like to attach the input file, but right now can't find a way to do that, help o_O)

EDIT: The input is here: http://www.mediafire.com/view/1pv8ocwlll2cth6/input.test.csv

@james-w-macdonald-5106
Last seen 4 days ago
United States

That's a pretty crazy data set. Only 8 samples have counts for more than 50% of the contigs (27 have fewer than 50% of the contigs counted, and 19 have counts for fewer than 10% of the contigs). Anyway, when you run calcNormFactors(), you are converting all of your normalization factors to NaN, which will obviously keep any downstream analysis from functioning properly:

> dglst <- DGEList(dat[,-1])
> dglst$samples
    group lib.size norm.factors
X1      1     1357            1
X2      1     1554            1
X3      1       33            1
X4      1       84            1
X5      1       98            1
X6      1       41            1
X7      1     1597            1
X8      1      693            1
X9      1       30            1
X10     1       69            1
X11     1      116            1
X12     1      153            1
X13     1      740            1
X14     1      849            1
X15     1      123            1
X16     1      105            1
X17     1      119            1
X18     1      121            1
X19     1      143            1
X20     1      343            1
X21     1      131            1
X22     1      113            1
X23     1       76            1
X24     1       59            1
X25     1      132            1
X26     1      163            1
X27     1       94            1
X28     1        5            1
X29     1       13            1
X30     1       17            1
X31     1       13            1
X32     1       73            1
X33     1       93            1
X34     1       27            1
X35     1       35            1
> dglst <- calcNormFactors(dglst)
> dglst$samples
    group lib.size norm.factors
X1      1     1357          NaN
X2      1     1554          NaN
X3      1       33          NaN
X4      1       84          NaN
X5      1       98          NaN
X6      1       41          NaN
X7      1     1597          NaN
X8      1      693          NaN
X9      1       30          NaN
X10     1       69          NaN
X11     1      116          NaN
X12     1      153          NaN
X13     1      740          NaN
X14     1      849          NaN
X15     1      123          NaN
X16     1      105          NaN
X17     1      119          NaN
X18     1      121          NaN
X19     1      143          NaN
X20     1      343          NaN
X21     1      131          NaN
X22     1      113          NaN
X23     1       76          NaN
X24     1       59          NaN
X25     1      132          NaN
X26     1      163          NaN
X27     1       94          NaN
X28     1        5          NaN
X29     1       13          NaN
X30     1       17          NaN
X31     1       13          NaN
X32     1       73          NaN
X33     1       93          NaN
X34     1       27          NaN
X35     1       35          NaN
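
(For what it's worth, the sparsity figures quoted above can be checked with something like this sketch; it assumes dat is the posted input.test.csv read in with read.delim() and that its first column holds the contig IDs, as in the code above:)

frac.detected <- colMeans(dat[,-1] > 0)  # fraction of contigs with a non-zero count, per sample
sum(frac.detected > 0.5)                 # samples with counts for more than half of the contigs
sum(frac.detected < 0.1)                 # samples with counts for fewer than 10% of the contigs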

I don't have any suggestions for you - I am not sure you can do anything reasonable with such sparse data. Perhaps either Gordon Smyth or Aaron Lun will have some ideas that you can try.


You could certainly use voom(), which will convert all those zeros to some non-zero value, and then run a conventional ANOVA-style analysis on your data. However, given the huge differences between samples, you end up with pretty much every contig being significantly different. But maybe this is just a subset of your data?
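
(For concreteness, a minimal sketch of what that voom route could look like, assuming a DGEList dge built from the counts and the group vector from the original post; the coefficient name is only an illustration:)

library(edgeR)                      # also loads limma
group <- factor(group)
design <- model.matrix(~group)
v <- voom(dge, design, plot=TRUE)   # log-CPM values with observation-level weights
vfit <- eBayes(lmFit(v, design))    # per-contig linear models with moderated statistics
topTable(vfit, coef="group8")       # e.g. group 8 versus the baseline group 1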

I had a similar sparse data set from some single-cell RNA-Seq data, and I used voom() precisely because of the NaNs from calcNormFactors(). If you have single-cell data, you might also try the monocle package. I spoke with Cole Trapnell at the BioC2015 conference, and he said it should work fine when you only have dozens of samples, as opposed to the hundreds I had thought it needed. I haven't tried it on my own data yet, but it's on my To Do list...

Jenny

Hmm... nothing obvious comes to mind. There just don't seem to be enough non-zero features (i.e., contigs) for reliable TMM normalization. Indeed, even in the best case where all the counts are non-zero, TMM normalization will still assume that the majority of contigs are non-DE, which may not be appropriate if you're expecting a large number of these contigs to differ between samples. Spike-ins might have been useful in this situation, as they would have allowed us to avoid the problems above (with some caveats), but I don't see any in your test data.

Also, to add to James' comment: while voom will technically work, I don't anticipate that it will give sensible results. For starters, normalization in voom traditionally uses the normalization factors computed by edgeR; if we used those here, we'd end up with the same problem as in the original post, and if we didn't normalize at all, that would compromise the quantitative interpretation of the results. Modelling of the mean-variance trend in voom also struggles at low counts with high discreteness, and this is exacerbated by the small number of features, which precludes reliable trend fitting.

Thanks for all the answers :).

The dataset is indeed only a part of the whole. These are genes belonging to... er... a taxonomic entity (I can't even call it an OTU) from a metatranscriptomic dataset, which is why it's so sparse in places. I need to separate the "organisms" in there, because the population changes vastly between samples, and normalizing everything as one will not give results that make sense.

Using another method might be an option, but I'd need to apply it to the whole dataset.

I see that TMM might not be appropriate, but the problem is that it might be appropriate for some of the subsets and not for others, and there are too many of them to handle manually, so I'd need a general approach.

Maybe using RPKM is the best approach here. The more advanced statistics are not really made for this type of data, and an approach which does not derive anything from the overall behaviour of the dataset might be the best.
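
(If it helps, edgeR can produce RPKM or CPM values directly. A rough sketch, assuming the DGEList d from the original post built without the calcNormFactors() step, so the normalization factors stay at their default of 1; contig.length is a hypothetical vector of contig lengths in bp, in the same order as rownames(d):)

rpkm.values <- rpkm(d, gene.length=contig.length)   # reads per kilobase per million
cpm.values  <- cpm(d)                               # counts per million, if lengths are unavailable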

Any other thoughts would be appreciated :), because my statistics are rather weak :/.

@gordon-smyth
Last seen 14 hours ago
WEHI, Melbourne, Australia

Yes, it's a crazy dataset, but you can still analyse it in edgeR.

Note that you can get results out of calcNormFactors() by specifying a refColumn or by filtering out low-count contigs before you run calcNormFactors().
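
(A sketch of what those two options could look like, assuming the filtered dge object defined in the code further down; the choice of reference column is only an illustration:)

dge <- calcNormFactors(dge, refColumn=1)   # option 1: pick sample 1 explicitly as the TMM reference
dge <- calcNormFactors(dge)                # option 2: run TMM after filtering low-count contigs, as below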

But you don't need to run calcNormFactors() to use edgeR, and here it is better to omit it because there aren't enough positive counts to make scaling normalization meaningful.

An MDS plot shows clear differences between the groups:

> counts <- read.delim("input.test.csv",row.names="Contigs")
> keep <- rowSums(counts)>4
> dge <- DGEList(counts=counts[keep,])
> group <- c(1,1,2,2,3,3,1,1,4,4,5,5,1,1,6,6,7,7,8,8,9,9,10,10,11,11,12,13,13,14,14,15,15,16,16)
> plotMDS(dge, labels=group)

The safest DE method would be the quasi-likelihood (QL) pipeline in edgeR. For example, to find contigs DE in group 8 vs group 1:

> group <- factor(group)
> design <- model.matrix(~group)
> fit <- glmQLFit(dge,design,robust=TRUE)
> plotQLDisp(fit)
> results <- glmQLFTest(fit,coef="group8")
> topTags(results)