Question: scran: trendVar() / lm.fit() fails in single-cell RNAseq experiment
roman.hillje wrote (8 months ago):

Hi!

I'm currently working on some scRNA-seq data (UMI-based, very low counts) and following the workflow posted on Bioconductor (www.bioconductor.org/help/workflows/simpleSingleCell/). At the trend-fitting step used to estimate technical variation (unfortunately without spike-ins), trendVar() fails for two of my three samples. I still don't understand exactly what is causing the problem, but I noticed that the two failing samples have some negative size factors after running computeSumFactors(). It's only 7 out of ~2,500 cells, though.

Anyway, here is the original error message:

In xy.coords(x, y, xlabel, ylabel, log) :
7 x values <= 0 omitted from logarithmic plot
Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) :
0 (non-NA) cases
Calls: trendVar ... .local -> .trend_var -> .get_nls_starts -> lm -> lm.fit
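For reference, the final error itself comes from base R: lm() drops incomplete cases by default (na.action = na.omit) before calling lm.fit(), and if no complete cases remain, lm.fit() stops with "0 (non-NA) cases". The toy snippet below uses made-up numbers (not scran's internals) to reproduce that message with an all-NaN response. It is only a sketch of one plausible route to the error, assuming the per-gene values that trendVar() fits against ended up entirely missing.

```r
# Made-up per-gene statistics: the response is entirely NaN, as could happen
# if missing values propagated from badly normalised cells.
mean.expr <- c(0.5, 1.0, 2.0, 4.0, 8.0)
log.var   <- rep(NaN, length(mean.expr))

# lm() removes incomplete rows (na.omit by default), leaves nothing to fit,
# and lm.fit() then stops with "0 (non-NA) cases".
msg <- tryCatch(
  lm(log.var ~ mean.expr),
  error = function(e) conditionMessage(e)
)
print(msg)
```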

The list of commands I run is roughly this:

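library(SingleCellExperiment)
library(scater)
library(scran)
sce <- SingleCellExperiment(assays = list(counts = counts))
# Provisional log-CPM values, used only to drop all-zero genes
exprs(sce) <- log2(calculateCPM(sce, use.size.factors = FALSE) + 1)
keep_feature <- rowSums(exprs(sce) > 0) > 0
sce <- sce[keep_feature,]
# Pooling-based size factors; a few cells came out negative here
sce <- computeSumFactors(sce, min.mean=0.1)
# Log-normalised expression using those size factors
sce <- normalize(sce)
# Mean-variance trend fitted to endogenous genes (no spike-ins)
var.fit <- trendVar(sce, parametric=TRUE, span=0.2, use.spikes=FALSE)
var.out <- decomposeVar(sce, var.fit)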

Does anybody have an idea about what I can do to fix this? I tried adjusting the min.mean parameter but that didn't change anything. I'm a bit clueless about where to start looking for the problem.

Thanks,

Roman

Modified 8 months ago by Aaron Lun.
Aaron Lun (Cambridge, United Kingdom) wrote (8 months ago):

Any negative size factors will generate NA log-expression values after running scater::normalize. This will interfere with all of the downstream analyses. The solution is to remove the problematic cells prior to running computeSumFactors; the documentation for this function has a few suggestions about how to deal with negative size factors (in the section aptly named "Dealing with negative size factors").
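A minimal base-R sketch of why negative size factors break things, assuming log-normalisation of the form log2(count / size factor + 1), which is roughly what scater::normalize computes with its default pseudo-count of 1. The counts and size factors are made up, and lognorm() is a hypothetical helper, not a scater function. With a SingleCellExperiment the equivalent subsetting would be sce[, sizeFactors(sce) > 0].

```r
# Toy data (made up): 4 genes x 3 cells; cell3 gets a negative size factor,
# mimicking the cells flagged after computeSumFactors().
counts <- matrix(c(5, 0, 3, 1,
                   2, 4, 0, 6,
                   1, 2, 2, 0),
                 nrow = 4,
                 dimnames = list(paste0("gene", 1:4), paste0("cell", 1:3)))
sf <- c(cell1 = 1.2, cell2 = 0.9, cell3 = -0.3)

# Hypothetical helper: divide each cell (column) by its size factor, add a
# pseudo-count of 1 and take log2. log2() of a negative number is NaN, so the
# "NaNs produced" warnings are silenced here.
lognorm <- function(counts, sf) {
  suppressWarnings(log2(sweep(counts, 2, sf, "/") + 1))
}

bad <- lognorm(counts, sf)
print(colSums(is.na(bad)))  # missing values appear only in cell3
print(rowMeans(bad))        # per-gene means are missing for genes 1-3

# Fix: drop cells with non-positive size factors before normalising.
# In the real workflow, computeSumFactors() would then be re-run on the
# retained cells.
keep <- sf > 0
good <- lognorm(counts[, keep, drop = FALSE], sf[keep])
print(anyNA(good))          # FALSE: no missing log-expression values remain
```

Every gene with a non-zero count in cell3 picks up a missing value, so any per-gene statistic computed across all cells (means, variances, and hence the fitted trend) is contaminated by just one bad cell.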

My first guess would be to apply more conservative quality control to the cells: the 7 offending cells probably have relatively small library sizes or few expressed features. We use computeSumFactors regularly on droplet-based scRNA-seq data, so that shouldn't inherently be a problem.
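One way to make the QC more conservative is a median-absolute-deviation rule on the log library size and the log number of detected genes, along the lines of the 3-MAD filtering the simpleSingleCell workflow does with scater::isOutlier(). The sketch below is a hand-rolled base-R version so it runs without Bioconductor; low_outlier() is a hypothetical helper and the count matrix is simulated, not real data.

```r
# Hypothetical helper: flag values more than `nmads` median absolute
# deviations below the median, on a log10 scale.
low_outlier <- function(x, nmads = 3) {
  lx <- log10(x + 1)                   # +1 guards against cells with zero
  lx < median(lx) - nmads * mad(lx)    # mad() is scaled (x1.4826) to match an SD
}

# Simulated counts: 200 genes x 50 cells, with cell 50 made to look like a
# low-quality cell with only 5 detected genes.
set.seed(1)
counts <- matrix(rpois(200 * 50, lambda = 2), nrow = 200)
counts[, 50] <- 0
counts[1:5, 50] <- 3

lib.size   <- colSums(counts)          # total UMI count per cell
n.detected <- colSums(counts > 0)      # number of detected genes per cell

# Discard a cell if it is a low outlier on either metric.
discard <- low_outlier(lib.size) | low_outlier(n.detected)
which(discard)
```

Before dropping anything, it can be worth inspecting lib.size[discard] to see whether cells flagged on detected genes still have substantial counts.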

roman.hillje replied:

Thank you for the quick and very helpful response! It hadn't occurred to me to be more stringent on the number of detected features. I'll test that, and in parallel check whether it makes sense to also apply an outlier rule to the total number of features detected per cell. There doesn't seem to be a direct link with total counts, though: the cells with few features still have 3k-8k molecule counts, so they might be actual cells.

Thanks again!