Question

Scran normalization error with computeSumFactors


hrishi27n ▴ 20

@hrishi27n-11821


Hello All,

I am trying to analyze scRNA-seq data using both scater and scran. When I run normalization with scran's computeSumFactors method, I get the error below. Could someone please explain what this error means and whether there is a way to fix it? I appreciate all your help and suggestions, thanks!

Code snippet:

ScaterObject <- newSCESet(
    countData = mycountFrame,
    phenoData = pheno_data
)
keep_feature <- rowSums(counts(ScaterObject) > 0) > 0
ScaterObject <- ScaterObject[keep_feature, ]
ScaterObject <- computeSumFactors(ScaterObject, sizes = c(20, 40, 60, 80))
summary(sizeFactors(ScaterObject))

Error:

Error in .local(x, ...) :
  not enough cells in each cluster for specified 'sizes'
Calls: computeSumFactors ... .local -> computeSumFactors -> computeSumFactors -> .local
Execution halted

scran scater • 1.8k views

ADD COMMENT • link updated 7.2 years ago by Aaron Lun ★ 28k • written 7.2 years ago by hrishi27n ▴ 20

score 3 · Accepted Answer · 2017-01-31


Aaron Lun ★ 28k

@alun


The problem is pretty much what the error message says: you don't have enough cells to use the pool sizes (i.e., the sizes argument) that you've specified in computeSumFactors. One option is to reduce the number of cells in each pool, e.g., to sizes ranging from 10 to 40. This is probably fine as long as your genes are moderately to highly expressed. Otherwise you'll just end up with lots of zeroes, which defeats the purpose of pooling in the first place.
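As a rough illustration of choosing pool sizes that fit the data, here is a minimal base R sketch. The helper pick_pool_sizes is hypothetical (it is not part of scran), and the rule it applies, that a pool cannot be larger than the number of available cells, is an assumption; the exact threshold that computeSumFactors enforces may be stricter depending on the scran version.

```r
# Hypothetical helper (not part of scran): keep only the candidate pool
# sizes that do not exceed the number of available cells.
# Assumption: a pool size is usable when it is <= n_cells.
pick_pool_sizes <- function(n_cells, candidates = c(20, 40, 60, 80)) {
    usable <- candidates[candidates <= n_cells]
    if (length(usable) == 0L) {
        stop("too few cells (", n_cells, ") for any candidate pool size")
    }
    usable
}

# With 50 cells, pools of 60 and 80 cells cannot be formed.
sizes <- pick_pool_sizes(50)
print(sizes)

# The same check with a smaller candidate range of 10 to 40.
small_sizes <- pick_pool_sizes(50, candidates = seq(10, 40, by = 10))
print(small_sizes)

# The result would then be passed on, e.g.:
# ScaterObject <- computeSumFactors(ScaterObject, sizes = sizes)
```

In practice n_cells would be ncol(ScaterObject), or the size of the smallest cluster if clusters are supplied.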

On a related note, it seems that you're keeping genes that are expressed at any level in any cell. This will probably result in a large number of genes that are only expressed in a handful (i.e., 1-2) of cells, which in turn leads to lots of zeroes in the pooled expression profiles. You might end up with nonsensical size factors of zero when the median ratio is computed; at the very least, the use of the median ratio as a robust approximation to the mean won't be accurate for very low counts. I would suggest performing some more aggressive filtering (e.g., using only genes with a mean count above 1), for the sake of accurately calculating the size factors.
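To make the difference between the two filters concrete, here is a small base R sketch on a toy count matrix. The matrix values and the gene and cell names are made up purely for illustration; with a real object the matrix would come from counts(ScaterObject).

```r
# Toy count matrix (made-up values): 4 genes x 6 cells.
counts <- matrix(
    c(0, 0, 0, 0, 0, 0,     # gene_a: never detected
      3, 0, 0, 0, 0, 0,     # gene_b: detected in a single cell
      1, 2, 0, 3, 1, 2,     # gene_c: mean count 1.5
      9, 7, 8, 12, 10, 8),  # gene_d: mean count 9
    nrow = 4, byrow = TRUE,
    dimnames = list(paste0("gene_", letters[1:4]), paste0("cell_", 1:6))
)

# Filter from the question: keeps any gene detected in at least one cell,
# so gene_b survives even though it is zero in 5 of 6 cells.
keep_any <- rowSums(counts > 0) > 0

# Stricter filter: keeps only genes with a mean count above 1.
keep_mean <- rowMeans(counts) > 1

print(keep_any)
print(keep_mean)

# Subset the matrix with the stricter filter.
filtered <- counts[keep_mean, , drop = FALSE]
print(rownames(filtered))
```

The appropriate cutoff depends on sequencing depth and protocol, so the threshold of 1 here is only the example value mentioned above.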

ADD COMMENT • link 7.2 years ago Aaron Lun ★ 28k


Thanks for the reply, Aaron. I will try reducing the pool sizes. You are also right that I need a more aggressive filtering strategy.

I will add something like the following to ensure that lowly expressed genes are removed.

keep_feature <- rowMeans(counts(ScaterObject)) >= 5

ADD REPLY • link 7.2 years ago hrishi27n ▴ 20


Hi Aaron,

I have 12 samples in the following groups.

Group1 - Samples (1, 10, 11, 12)

Group2 - Samples (5, 6, 8, 9)

Group3 - Samples (2, 3, 4, 7)

What numbers should I select for the sizes option in the command below?

sce <- computeSumFactors(sce, sizes=c())

ADD REPLY • link 6.9 years ago bioinforesearchquestions • 0


With so few samples, I don't think pooling would provide much benefit. You don't mention what type of data you've got, but if it's bulk RNA-seq data, you might as well use TMM normalization. If it's single-cell RNA-seq data... well, regardless of what statistical magic you use, there's not much you can do with 12 cells.
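For the bulk RNA-seq case, a minimal sketch of TMM normalization with edgeR might look like the following. It assumes the edgeR package is installed; the count matrix is simulated (made-up data), and only the group layout is taken from the question.

```r
library(edgeR)

# Simulated counts (made-up data): 1000 genes x 12 samples.
set.seed(1)
counts <- matrix(
    rnbinom(1000 * 12, mu = 50, size = 5), nrow = 1000,
    dimnames = list(paste0("gene_", 1:1000), paste0("sample_", 1:12))
)

# Group labels following the sample layout given in the question.
group <- character(12)
group[c(1, 10, 11, 12)] <- "Group1"
group[c(5, 6, 8, 9)] <- "Group2"
group[c(2, 3, 4, 7)] <- "Group3"

# Build the DGEList and compute TMM normalization factors.
y <- DGEList(counts = counts, group = factor(group))
y <- calcNormFactors(y, method = "TMM")

# One normalization factor per sample, stored alongside the library sizes.
print(y$samples)
```

With real data, counts would be replaced by the actual gene-by-sample count matrix.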

ADD REPLY • link 6.9 years ago Aaron Lun ★ 28k