Hi!
I'm currently working on some scRNA-seq data (UMI-based, very low counts) and following the simpleSingleCell workflow posted on Bioconductor (www.bioconductor.org/help/workflows/simpleSingleCell/). At the trend-fitting step that estimates technical variation (unfortunately without spike-ins), trendVar() fails for two of my three samples. I still don't understand what exactly is causing the problem, but I noticed that the two failing samples have some negative size factors after running computeSumFactors(). It's only 7 out of ~2,500 cells, though.
Anyway, here is the original error message:
In xy.coords(x, y, xlabel, ylabel, log) :
  7 x values <= 0 omitted from logarithmic plot
Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) :
  0 (non-NA) cases
Calls: trendVar ... .local -> .trend_var -> .get_nls_starts -> lm -> lm.fit
The list of commands I run is roughly this:
sce <- SingleCellExperiment(assays = list(counts = counts))
exprs(sce) <- log2(calculateCPM(sce, use.size.factors = FALSE) + 1)
keep_feature <- rowSums(exprs(sce) > 0) > 0
sce <- sce[keep_feature,]
sce <- computeSumFactors(sce, min.mean=0.1)
sce <- normalize(sce)
var.fit <- trendVar(sce, parametric=TRUE, span=0.2, use.spikes=FALSE)
var.out <- decomposeVar(sce, var.fit)
Does anybody have an idea about what I can do to fix this? I tried adjusting the min.mean parameter but that didn't change anything. I'm a bit clueless about where to start looking for the problem.
Thanks,
Roman
Thank you for the quick and very helpful response! Being more stringent on the features actually hadn't occurred to me. I'll test that and, in parallel, check whether it makes sense to also apply some kind of outlier rule on the total number of features detected per cell. There doesn't seem to be a direct link with total counts, though: the cells with few detected features still have 3k-8k molecule counts and so might be actual cells.
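For anyone following along, here is a rough sketch of the two checks I have in mind, written in base R on simulated counts rather than my real data. The cutoffs (min_mean, n_mads) and all variable names are placeholders I made up for illustration, not values recommended by the workflow. If I remember correctly, scater's isOutlier() implements a similar MAD-based rule, but I haven't relied on it here.

```r
# Simulated UMI-like counts (genes x cells); illustrative data only.
set.seed(1)
n_genes <- 200
n_cells <- 100
lambda <- rep(c(0.02, 0.5), each = n_genes / 2)  # 100 rare, 100 abundant genes
counts <- matrix(rpois(n_genes * n_cells, lambda = rep(lambda, times = n_cells)),
                 nrow = n_genes, ncol = n_cells)

# Make cells 1-5 low-complexity: only 10 genes detected, but many molecules.
counts[, 1:5] <- 0
counts[1:10, 1:5] <- 50

# 1) Stricter gene filter: keep genes whose average count reaches a cutoff.
min_mean <- 0.1  # hypothetical cutoff
keep_gene <- rowMeans(counts) >= min_mean
filtered <- counts[keep_gene, , drop = FALSE]

# 2) Outlier rule on detected features per cell: flag cells more than
#    n_mads median absolute deviations below the median (log10 scale).
n_features <- colSums(counts > 0)
total_counts <- colSums(counts)
n_mads <- 3  # hypothetical cutoff
log_feat <- log10(n_features)
low_complexity <- log_feat < median(log_feat) - n_mads * mad(log_feat)

# Compare flagged and unflagged cells on both metrics.
print(tapply(n_features, low_complexity, median))
print(tapply(total_counts, low_complexity, median))
```

In this toy setup the flagged cells have few detected features but high total counts, which is the pattern I see in my data, so a rule based on total counts alone would not catch them.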
Thanks again!