scran: trendVar() / lm.fit() fails in single-cell RNAseq experiment

roman.hillje • 0

@romanhillje-13023

Hi!

I'm currently working on some scRNA-seq data (UMI-based, very low counts) and following the workflow posted on Bioconductor (www.bioconductor.org/help/workflows/simpleSingleCell/). During the trend-fitting step to estimate technical variation (unfortunately without spike-ins), trendVar() fails for two of my three samples. I still don't understand exactly what is causing the problem, but I noticed that the two failing samples have some negative size factors after running computeSumFactors(). It's only 7 out of ~2,500 cells, though.

Anyway, here is the original error message:

In xy.coords(x, y, xlabel, ylabel, log) :
  7 x values <= 0 omitted from logarithmic plot
Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) : 
  0 (non-NA) cases
Calls: trendVar ... .local -> .trend_var -> .get_nls_starts -> lm -> lm.fit

The list of commands I run is roughly this:

sce <- SingleCellExperiment(assays = list(counts = counts))
exprs(sce) <- log2(calculateCPM(sce, use.size.factors = FALSE) + 1)
keep_feature <- rowSums(exprs(sce) > 0) > 0
sce <- sce[keep_feature,]
sce <- computeSumFactors(sce, min.mean=0.1)
sce <- normalize(sce)
var.fit <- trendVar(sce, parametric=TRUE, span=0.2, use.spikes=FALSE)
var.out <- decomposeVar(sce, var.fit)

Does anybody have an idea about what I can do to fix this? I tried adjusting the min.mean parameter but that didn't change anything. I'm a bit clueless about where to start looking for the problem.

Thanks,

Roman

score 2 · Accepted Answer · 2018-01-08

Aaron Lun ★ 28k

@alun

Any negative size factors will generate NA log-expression values after running scater::normalize. This will interfere with all of the downstream analyses. The solution is to remove the problematic cells prior to running computeSumFactors; the documentation for this function has a few suggestions about how to deal with negative size factors (in the section aptly named "Dealing with negative size factors").
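
To illustrate why a negative size factor ends up as a missing value, here is a minimal base-R sketch. The counts and size factors are made-up toy values, and the formula log2(count / size factor + 1) is only an approximation of what the normalization step computes; the commented lines at the end assume that size factors are stored in `sce` and have not been tested on real data.

```r
# Toy counts for one gene across four cells (made-up values, illustration only).
counts <- c(0, 3, 1, 5)
# Hypothetical size factors; the third one is negative.
sf <- c(0.8, 1.2, -0.1, 1.1)

# Approximate log-transformation: log2(count / size factor + 1).
# A negative size factor makes the argument of log2() negative, giving NaN,
# which is.na() also reports as missing.
logexprs <- suppressWarnings(log2(counts / sf + 1))
print(is.na(logexprs))

# Keep only cells with strictly positive size factors.
keep <- sf > 0
logexprs_ok <- log2(counts[keep] / sf[keep] + 1)

# On a real object the same check might look like this (untested sketch):
#   summary(sizeFactors(sce))
#   sce <- sce[, sizeFactors(sce) > 0]
```

Checking summary(sizeFactors(sce)) right after computeSumFactors() is a quick way to spot the problem before it propagates to trendVar().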

My first guess would be to apply some more conservative quality control on the cells - the 7 offending cells probably have relatively low library sizes or total numbers of expressed features. We use computeSumFactors regularly for droplet-based scRNA-seq data so that shouldn't inherently be a problem.
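
As a rough sketch of what more conservative quality control could look like, the base-R example below flags cells whose library size or number of detected features lies several median absolute deviations below the median on a log scale. The count matrix is simulated, the `low_outlier` helper and the threshold of 3 MADs are assumptions made for illustration rather than recommended settings, and this is not necessarily the exact rule used in the workflow.

```r
# Simulated count matrix: genes in rows, cells in columns (illustration only).
set.seed(1)
counts <- matrix(rpois(200 * 50, lambda = 1), nrow = 200)
# Make the first two cells deliberately poor (very few counts).
counts[, 1:2] <- rpois(200 * 2, lambda = 0.02)

# Per-cell QC metrics.
lib_size   <- colSums(counts)       # total counts per cell
n_features <- colSums(counts > 0)   # number of detected genes per cell

# Hypothetical helper: flag values more than `nmads` MADs below the median
# on the log10 scale.
low_outlier <- function(x, nmads = 3) {
  lx <- log10(x + 1)
  lx < median(lx) - nmads * mad(lx)
}

# Discard cells that are low outliers on either metric.
discard   <- low_outlier(lib_size) | low_outlier(n_features)
counts_qc <- counts[, !discard, drop = FALSE]
```

The filtered matrix would then be used to rebuild the object and rerun computeSumFactors(), so that the size factors are estimated without the low-quality cells.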

ADD COMMENT • link 7.2 years ago Aaron Lun ★ 28k

Thank you for the quick and very helpful response! Being more stringent on the features actually didn't occur to me. I'll test that and, in parallel, check whether it makes sense to also apply some kind of outlier rule on the total number of features detected per cell. There doesn't seem to be a direct link with total counts, though: the cells with few features still have 3k-8k molecule counts and so might be actual cells.

Thanks again!

ADD REPLY • link 7.2 years ago roman.hillje • 0