Search
Question: scran::trendVar throwing error
0
10 months ago by
zhalehsafikhani0 wrote:

Hello,

I want to use the trendVar function in screen packs to get the highly variant genes across my single cell samples.
But for some reason it is not working for me and throwing the following error:

Error in .trend_var(assay(x, i = assay.type), ..., subset.row = subset.row) :
need at least 4 values for non-linear curve fitting

I think it throws the error at lout <- .Call(cxx_fit_linear_model, QR$qr, QR$qraux, x, subset.row - 1L, FALSE)

Apparently doesn’t keep enough genes for some reason.

This is how I use it though:

dim(cells.counts)

[1] 18574  3483

sce <- SingleCellExperiment(list(counts=cells.counts))
sce <- scran::computeSumFactors(sce)
summary(sizeFactors(sce))

Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
0.08161 0.61558 0.89761 1.00000 1.28977 3.65026

quantile(logcounts(sce))

0%        25%        50%        75%       100%
0.0000000  0.0000000  0.0000000  0.7415879 13.7971704

fit <- trendVar(sce, parametric=TRUE)

Error in .trend_var(assay(x, i = assay.type), ..., subset.row = subset.row) :
need at least 4 values for non-linear curve fitting

I put the dimension of the data and the range of the values and sum factors for you in case it might help figuring out the issue.

modified 10 months ago • written 10 months ago by zhalehsafikhani0
0
10 months ago by
Aaron Lun21k
Cambridge, United Kingdom
Aaron Lun21k wrote:

Firstly, I assume that you are using the latest versions of the relevant packages (see the posting guide); you did some quality control on the cells; and you ran scater::normalize at some point.

Secondly, a cursory examination of the scran:::.trend_var code will reveal the true origin of the error:

    if (parametric) {
if (length(kept.vars) <= 3L) {
stop("need at least 4 values for non-linear curve fitting")
}

Finally, if you look at ?trendVar, you will see a min.mean argument that specifies the minimum mean log-count for genes to be used for trend fitting. I had set this to 0.1, which I thought was low enough to never cause anyone problems. I am surprised that fewer than four genes in your data set had a rowMeans(logcounts(sce)) above 0.1, these does not seem likely even for sparse droplet data. Nonetheless, you can set min.mean=0 to turn off mean filtering completely.

0
10 months ago by
zhalehsafikhani0 wrote:

Hi Aaron,

[1] scran_1.6.5                scater_1.6.1               SingleCellExperiment_1.0.0

And yes I was doing sce <- scater::normalize(sce) right before calling quantile on logcounts(sce), apparently was dropped in my copy and paste.

I don't think the the threshold for gene expressions is an issue though as I should have enough by 0.1:

quantile(rowMeans(logcounts(sce)))
0%          25%          50%          75%         100%
0.0006372668 0.0148420739 0.1515419067 0.6003230574 9.4099977354
> length(which(rowMeans(logcounts(sce)) > 0.1))
[1] 10298

Secondly; well, I don't have any idea what's happening here. trendVar will automatically remove genes with zero variance and low mean, but it doesn't seem like the mean threshold is the problem. On the other hand, I can't imagine that your genes have zero variance across all cells, unless you're feeding trendVar an artificial matrix of some sort?

In any case, if you want to resolve this problem, you can either send me an anonymized copy of the data for me to look at, or you can try debugging it yourself. Run the following lines of code:

debug(scran:::.trend_var)
out <- scran:::.trend_var(sce, parametric=TRUE)

... step through each line of the function, and see what summary(vars > 1e-08) is after its creation. Also check out what subset.row is, it should be something like 1:10298 after its reassignment.

I shared the data with you on Google drive Aaron. I would appreciate if you take a look to see if you find anything odd with the data which makes it doesnt fit trendVar function criteria.

I realized what it was - you need to set use.spikes=FALSE, otherwise it will try to use the spike-in transcripts to fit the curve. You don't have any because you're working with 10X data, hence the error. This should have been obvious to me, my apologies; see also "Additional notes on gene selection" in ?trendVar.

More generally, note that you can store the count matrix as a sparse matrix, there's no need to coerce to a matrix any more. Also, I would suggest fiddling around with the trend fitting parameters, the following gives me a nice trend:

out <- trendVar(sce, parametric=TRUE, method="spline", df=10, use.spikes=FALSE)
plot(out$mean, out$var)
curve(out\$trend(x), col="red", add=TRUE)

The next release will allow even more flexibility in controlling the trend fitting, so you can get even better fits.