Question: Scran normalization error with computeSumFactors
hrishi27n wrote (20 months ago):

Hello All,

I am trying to analyze scRNA-seq data using both scater and scran. When I run normalization with scran's computeSumFactors method, I get the following error. Could someone please explain what this error means and whether there is a way to fix it? I appreciate all your help and suggestions, thanks!


Code snippet: 

library(scater)
library(scran)

ScaterObject <- newSCESet(
    countData = mycountFrame,
    phenoData = pheno_data
)

# Keep genes with a non-zero count in at least one cell
keep_feature <- rowSums(counts(ScaterObject) > 0) > 0
ScaterObject <- ScaterObject[keep_feature, ]

ScaterObject <- computeSumFactors(ScaterObject, sizes = c(20, 40, 60, 80))




Error in .local(x, ...) :
  not enough cells in each cluster for specified 'sizes'
Calls: computeSumFactors ... .local -> computeSumFactors -> computeSumFactors -> .local
Execution halted


Aaron Lun (Cambridge, United Kingdom) wrote (20 months ago):

The problem is pretty much what the error message says: you don't have enough cells to use the pool sizes (i.e., the values in sizes) that you've specified to computeSumFactors. No pool size can exceed the number of cells in each cluster (without clusters, all cells form a single cluster), so a pool of 80 cells needs at least 80 cells. One option is to reduce the number of cells in each pool, e.g., to sizes between 10 and 40. This is probably fine as long as your genes are moderately to highly expressed. Otherwise you'll just end up with lots of zeroes, which defeats the purpose of pooling in the first place.
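The condition behind the error can be checked before calling computeSumFactors. The base-R sketch below uses a hypothetical helper, usable_pool_sizes(), which is not part of scran. It assumes, based on the wording of the error message, that every pool size must be no larger than the number of cells in the smallest cluster (or in the whole data set when no clusters are given); scran's internal check may differ between versions.

```r
# Hypothetical helper, not part of scran: keep only the pool sizes that
# fit into the smallest group of cells. This mirrors the condition named
# in the error message; scran's internal check may differ between versions.
usable_pool_sizes <- function(n_cells, sizes, clusters = NULL) {
  if (is.null(clusters)) {
    smallest <- n_cells                # no clusters: all cells form one group
  } else {
    stopifnot(length(clusters) == n_cells)
    smallest <- min(table(clusters))   # number of cells in the smallest cluster
  }
  sizes[sizes <= smallest]
}

# Assuming 50 cells (a made-up number), pools of 60 and 80 cells cannot be formed
usable_pool_sizes(50, c(20, 40, 60, 80))
# [1] 20 40

# Smaller pools in the 10-40 range all fit
usable_pool_sizes(50, c(10, 20, 30, 40))
# [1] 10 20 30 40
```

The surviving values could then be passed on as computeSumFactors(ScaterObject, sizes = ...).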

On a related note, it seems that you're keeping genes that are expressed at any level in any cell. This will probably result in a large number of genes that are only expressed in a handful of cells (e.g., 1 or 2), which in turn leads to lots of zeroes in the pooled expression profiles. You might end up with nonsensical size factors of zero when the median ratio is computed; at the very least, using the median ratio as a robust approximation to the mean won't be accurate for very low counts. I would suggest more aggressive filtering (e.g., using only genes with a mean count above 1) so that the size factors are calculated accurately.
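To see how rarely detected genes can drag a median ratio down to zero, here is a deliberately simplified base-R simulation. It is not scran's algorithm (there is no library-size scaling and only one pool is used); it just takes the median, across genes, of the ratio between a pool's summed counts and the average cell scaled to the same pool size. The simulated counts and the helper pooled_median_ratio() are hypothetical.

```r
set.seed(1)
n_cells <- 50
# Simulated data (made up): 900 rarely detected genes, 100 moderately expressed genes.
# Every cell has the same underlying size factor, so the "true" ratio is 1.
gene_means <- c(rep(0.02, 900), rep(20, 100))
counts <- matrix(rpois(length(gene_means) * n_cells, lambda = gene_means),
                 nrow = length(gene_means))

# Hypothetical helper: median over genes of (pooled counts / pooled average cell)
pooled_median_ratio <- function(mat, pool) {
  reference <- rowMeans(mat) * length(pool)   # average cell scaled to the pool size
  median(rowSums(mat[, pool, drop = FALSE]) / reference)
}

pool <- 1:5   # a small pool of five cells

# Lenient filter from the question: non-zero count in at least one cell
lenient <- counts[rowSums(counts > 0) > 0, ]
# Stricter filter: mean count above 1
strict <- counts[rowMeans(counts) > 1, ]

pooled_median_ratio(lenient, pool)  # 0: most genes have a pooled count of zero
pooled_median_ratio(strict, pool)   # close to the true value of 1
```

With the stricter filter only the moderately expressed genes remain, so the median reflects the shared size factor instead of the excess of zeroes.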


Thanks for the reply, Aaron. I will try reducing the pool sizes. You are also right that I need a more aggressive filtering strategy.

I will add something like the following to remove lowly expressed genes:

keep_feature <- rowMeans(counts(ScaterObject)) >= 5 
Reply written 20 months ago by hrishi27n.

Hi Aaron,

I have 12 samples in the following groups:

Group 1: samples 1, 10, 11, 12

Group 2: samples 5, 6, 8, 9

Group 3: samples 2, 3, 4, 7

What values should I use for the sizes argument in the command below?

sce <- computeSumFactors(sce, sizes=c())


Reply written 17 months ago by bioinforesearchquestions.

With so few samples, I don't think pooling would provide much benefit. You don't mention what type of data you have, but if it's bulk RNA-seq data, you might as well use TMM normalization. If it's single-cell RNA-seq data... well, regardless of what statistical magic you use, there's not much you can do with 12 cells.
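The arithmetic behind this answer can be made concrete, assuming the three groups from the question were supplied through the clusters argument of computeSumFactors (the question does not say how the groups would be used). No pool can hold more cells than the smallest cluster, so every value in sizes would have to be 4 or less. The variable names in this base-R sketch are illustrative only.

```r
# Group membership of the 12 samples listed in the question
group <- integer(12)
group[c(1, 10, 11, 12)] <- 1L
group[c(5, 6, 8, 9)]    <- 2L
group[c(2, 3, 4, 7)]    <- 3L

cells_per_group <- table(group)   # 4 samples in each group
max_pool <- min(cells_per_group)  # largest pool that fits in every group
max_pool
# [1] 4
```

Pools of at most 4 cells are barely larger than single cells, which is why pooling adds little here.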

Reply written 17 months ago by Aaron Lun.