Question: Scran normalization error with computeSumFactors
hrishi27n wrote (9 months ago):

Hello All,

I am trying to analyze scRNA-seq data using both scater and scran. While trying to run normalization with scran's computeSumFactors method, I get the following error. Could someone please explain what this error means and whether there is a way to fix it? I appreciate all your help and suggestions, thanks!

Code snippet:

library(scater)
library(scran)

ScaterObject <- newSCESet(
    countData = mycountFrame,
    phenoData = pheno_data
)
# Keep genes with a non-zero count in at least one cell
keep_feature <- rowSums(counts(ScaterObject) > 0) > 0
ScaterObject <- ScaterObject[keep_feature, ]
ScaterObject <- computeSumFactors(ScaterObject, sizes = c(20, 40, 60, 80))
summary(sizeFactors(ScaterObject))

Error:

Error in .local(x, ...) :
  not enough cells in each cluster for specified 'sizes'
Calls: computeSumFactors ... .local -> computeSumFactors -> computeSumFactors -> .local
Execution halted

Aaron Lun (Cambridge, United Kingdom) wrote (9 months ago):

The problem is pretty much what the error message says: you don't have enough cells to use the pool sizes (i.e., in sizes) that you've specified to computeSumFactors. One option is to reduce the number of cells in each pool, e.g., to sizes between 10 and 40. This is probably fine as long as your genes are moderately to highly expressed. Otherwise, you'll just end up with lots of zeroes, which defeats the purpose of pooling in the first place.
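A minimal base-R sketch of when this error would fire, assuming (as in the scran versions discussed in this thread) that computeSumFactors requires every cluster to contain at least max(sizes) cells, and that all cells are treated as one cluster when no clusters argument is supplied. The helpers cluster_sizes(), sizes_are_feasible() and feasible_sizes() are hypothetical and are not part of scran.

```r
# Hypothetical helpers (not part of scran). They mimic the check behind
# "not enough cells in each cluster for specified 'sizes'", assuming that
# every cluster must contain at least max(sizes) cells.
cluster_sizes <- function(ncells, clusters = NULL) {
  # With no clusters supplied, all cells are treated as a single cluster
  if (is.null(clusters)) clusters <- rep(1L, ncells)
  table(clusters)
}

sizes_are_feasible <- function(ncells, sizes, clusters = NULL) {
  all(cluster_sizes(ncells, clusters) >= max(sizes))
}

# Keep only the pool sizes that fit into the smallest cluster
feasible_sizes <- function(ncells, sizes, clusters = NULL) {
  sizes[sizes <= min(cluster_sizes(ncells, clusters))]
}

# Example with 50 cells and no clustering
sizes_are_feasible(50, c(20, 40, 60, 80))      # FALSE: pools of 60 and 80 cannot be formed
sizes_are_feasible(50, seq(10, 40, by = 10))   # TRUE
feasible_sizes(50, c(20, 40, 60, 80))          # 20 40
```

If the check fails for your data, the smaller pool sizes suggested above (e.g. sizes = seq(10, 40, by = 10)) would be passed to the sizes argument of computeSumFactors instead.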

On a related note, it seems that you're keeping genes that are expressed at any level in any cell. This will probably result in a large number of genes that are only expressed in a handful (e.g., 1-2) of cells, which in turn leads to lots of zeroes in the pooled expression profiles. You might end up with nonsensical size factors of zero when the median ratio is computed; at the very least, the median ratio won't be an accurate robust approximation to the mean for very low counts. I would suggest performing more aggressive filtering (e.g., using only genes with a mean count above 1) for the sake of accurately calculating the size factors.
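A toy base-R illustration of the zero-median problem, assuming a simplified pooling step: a pool's counts are summed and divided gene-wise by an average pseudo-cell scaled to the pool size, and the median of those ratios is taken. This covers only the ratio step; scran's deconvolution then combines many pools to recover per-cell factors, which is not shown. The toy matrix and pool_ratio_median() are hypothetical.

```r
# Hypothetical toy data: 4 well-expressed genes and 6 genes seen in one cell each.
cell_factor <- c(1, 1, 2, 2, 1, 1)                          # true relative library sizes
high <- matrix(10 * rep(cell_factor, each = 4), nrow = 4)   # 4 genes x 6 cells
rare <- matrix(0, nrow = 6, ncol = 6)
rare[cbind(1:6, c(4, 5, 6, 4, 5, 6))] <- 1                  # a count of 1 in a single cell
counts <- rbind(high, rare)                                  # 10 genes x 6 cells

# Hypothetical helper: median ratio of a pool's summed counts to an average
# pseudo-cell scaled to the pool size (simplified; not scran's implementation).
pool_ratio_median <- function(counts, pool) {
  pooled <- rowSums(counts[, pool, drop = FALSE])
  reference <- rowMeans(counts) * length(pool)
  median(pooled / reference)
}

# Unfiltered: 6 of the 10 genes have a pooled count of zero in cells 1-3,
# so the median ratio collapses to zero.
pool_ratio_median(counts, 1:3)       # 0

# Filtering on mean count removes the rare genes.
keep <- rowMeans(counts) > 1
filtered <- counts[keep, , drop = FALSE]
pool_ratio_median(filtered, 1:3)     # about 1 (cells with factors 1, 1, 2)
pool_ratio_median(filtered, 3:4)     # about 1.5 (cells with factors 2, 2)
```

The same unfiltered calculation for cells 4-6 gives a median of 2 instead of about 1, because there the rare genes inflate rather than zero out the ratios.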


Thanks for the reply, Aaron. I will try reducing the pool sizes; you are also right that I need a more aggressive filtering strategy.

I will add something like the following to remove lowly expressed genes:

keep_feature <- rowMeans(counts(ScaterObject)) >= 5

(Reply by hrishi27n, 9 months ago)

Hi Aaron,

I have 12 samples in the following groups:

Group 1: samples 1, 10, 11, 12
Group 2: samples 5, 6, 8, 9
Group 3: samples 2, 3, 4, 7

Which numbers should I choose for the sizes argument in the command below?

sce <- computeSumFactors(sce, sizes=c())

(Reply by bioinforesearchquestions, 6 months ago)

With so few samples, I don't think pooling would provide much benefit. You don't mention what type of data you've got, but if it's bulk RNA-seq data, you might as well use TMM normalization. If it's single-cell RNA-seq data... well, regardless of what statistical magic you use, there's not much you can do with 12 cells.

(Reply by Aaron Lun, 6 months ago)
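For bulk RNA-seq, TMM factors are normally computed with edgeR's calcNormFactors (whose default method is "TMM"). To show what the method does, here is a simplified base-R sketch, assuming a fixed reference sample (column 1) and trimming fractions of 30% on the log-ratios (M) and 5% on the average log-expression (A). Unlike edgeR, it omits the precision weights and the automatic choice of reference sample, so its numbers will not match edgeR exactly. tmm_sketch() and the toy matrix are hypothetical.

```r
# Hypothetical helper: simplified, unweighted TMM-style normalization factors.
# Not edgeR's implementation: no precision weights, fixed reference column.
tmm_sketch <- function(counts, ref = 1, logratio_trim = 0.3, sum_trim = 0.05) {
  lib_size <- colSums(counts)
  p_ref <- counts[, ref] / lib_size[ref]
  raw <- numeric(ncol(counts))
  for (k in seq_len(ncol(counts))) {
    p_k <- counts[, k] / lib_size[k]
    ok <- counts[, k] > 0 & counts[, ref] > 0   # drop genes with a zero count
    M <- log2(p_k[ok] / p_ref[ok])              # gene-wise log-ratio
    A <- 0.5 * log2(p_k[ok] * p_ref[ok])        # gene-wise average log-expression
    n <- length(M)
    lo_M <- floor(n * logratio_trim)
    lo_A <- floor(n * sum_trim)
    # Keep genes outside the extreme tails of both M and A
    keep <- rank(M) > lo_M & rank(M) <= n - lo_M &
            rank(A) > lo_A & rank(A) <= n - lo_A
    raw[k] <- if (any(keep)) 2^mean(M[keep]) else 1
  }
  # Rescale so that the factors multiply to 1 (geometric mean of 1)
  raw / exp(mean(log(raw)))
}

# Hypothetical example: s2 is a 2x deeper copy of s1; s3 equals s1 except for
# two strongly up-regulated genes, which inflate its library size.
y <- 10:29
bulk_counts <- cbind(s1 = y, s2 = 2 * y, s3 = y)
bulk_counts[1:2, "s3"] <- bulk_counts[1:2, "s3"] * 100
factors <- tmm_sketch(bulk_counts)
effective_lib_size <- colSums(bulk_counts) * factors
effective_lib_size
```

Because the remaining 18 genes of s1 and s3 are identical, their effective library sizes come out approximately equal, while s2 stays about twice as large. With the 12 samples above, bulk_counts would simply be a genes-by-12 matrix.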