Question: ERROR - Size factors should be positive real numbers (using normalize() function)
8 months ago by
kushshah10
University of North Carolina, Chapel Hill, USA
kushshah10 wrote:

I have a SingleCellExperiment object, and no matter what I do, when I run normalize(filtered.sce), I get the error: size factors should be positive real numbers.

It is my understanding that even though computeSumFactors() coerces to positive by default if necessary, it doesn't imply that normalize() will run automatically.

I have done many things to my pancreas dataset (Segerstolpe et al., 2016) in terms of QC after starting with the 1308 high-quality cells specified in the metadata. Nothing seems to be working:

• libsize.drop <- isOutlier(sce$total_counts, nmads=3, type="lower", log=TRUE)
• feature.drop <- isOutlier(sce$total_features_by_counts, nmads=3, type="lower", log=TRUE)
• spike.drop <- isOutlier(sce$pct_counts_ERCC, nmads=3, type="higher")
• Together, these three methods removed 62, 73, and 143 cells, respectively, from the original 1308. This seems to be a lot.
• After defining ave.raw.counts <- calcAverage(sce, use_size_factors=FALSE), I've reduced the sce object down to the genes with ave.raw.counts >= 1, which is about 14000 out of the original 25000 genes.

When running filtered.sce <- computeSumFactors(filtered.sce), it runs WITHOUT any warning of encountering negative size factor estimates. However, when running the following two commands, I get a warning and then an error:

• filtered.sce <- computeSpikeFactors(filtered.sce, type="ERCC", general.use=FALSE)
• Warning message: zero spike-in counts during spike-in normalization
• filtered.sce <- normalize(filtered.sce)
• Error in .local(object,...): size factors should be positive real numbers

I even tried filtering by keep <- ave.raw.counts >= 50 just to see if there was any way I could get it to work, but my final error during normalization was still size factors should be positive real numbers.

I would appreciate any help as to why this may be happening. I can also provide any more information that is required. Thank you so much.

modified 7 months ago • written 8 months ago by kushshah10

Answer: ERROR - Size factors should be positive real numbers (using normalize() function)
8 months ago by
Aaron Lun25k
Cambridge, United Kingdom
Aaron Lun25k wrote:

First, calm down. Secondly, let's have a look at the warning:

zero spike-in counts during spike-in normalization

Sounds pretty straightforward. If you don't have any spike-in counts for a cell, you can't compute a meaningful spike-in size factor for that cell. (Technically, the spike-in size factor is reported as zero, which is meaningless; hence the warning.) This then leads to the error in normalize, because otherwise it would divide the counts for that cell by zero.
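To make the mechanism concrete, here is a minimal base-R sketch on a made-up count matrix. It is not scran's actual code: it assumes, for illustration only, that a spike-in size factor is the cell's total spike-in count scaled so the factors average to 1. All object names and values are hypothetical.

```r
# Toy counts (hypothetical): 3 endogenous genes and 2 ERCC spike-ins, 4 cells.
# matrix() fills column by column, so each group of 5 values is one cell.
counts <- matrix(
    c(10, 0, 5, 3, 2,    # cell1
      20, 4, 1, 6, 1,    # cell2
      15, 2, 8, 0, 0,    # cell3: no spike-in reads at all
      30, 1, 2, 9, 4),   # cell4
    nrow = 5,
    dimnames = list(c("GeneA", "GeneB", "GeneC", "ERCC-1", "ERCC-2"),
                    paste0("cell", 1:4))
)

is.spike <- grepl("^ERCC-", rownames(counts))

# Assumed definition: total spike-in count per cell, scaled to a mean of 1.
spike.totals <- colSums(counts[is.spike, , drop = FALSE])
spike.sf <- spike.totals / mean(spike.totals)

# Cells whose factor is not strictly positive cannot be normalized.
bad.cells <- names(spike.sf)[spike.sf <= 0]

# Dividing spike-in counts by a zero factor gives 0/0 = NaN for that cell.
scaled.spikes <- sweep(counts[is.spike, , drop = FALSE], 2, spike.sf, "/")

print(spike.sf)
print(bad.cells)
print(scaled.spikes)
```

In this toy example only the cell without spike-in reads ends up with a zero factor, which is the situation the warning describes.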
So, depending on what you aim to do, you can either:

1. If you must have the spike-ins for a downstream analysis step, remove the cells with zero spike-in size factors.
2. Otherwise, remove the spike-ins and proceed onward with all cells.

Of course, you can do both of these steps, e.g., do 1 to estimate the technical mean-variance trend for feature selection, and then do 2 to use all cells for downstream analysis (possibly with the subset of features selected from 1). This is, in fact, exactly what I did with this same data set here.

P.S.

Together, these three methods removed 62, 73, and 143 cells, respectively, from the original 1308. This seems to be a lot.

I lose about 10% of cells in routine experiments, so what you're seeing is not so bad. Keep in mind that the three methods will overlap, so the total number of removed cells is unlikely to be the sum of 62, 73 and 143.

Of course, what they consider to be "not-low-quality" may or may not be your definition of "high quality". It's all pretty arbitrary and there's a lot of wiggle room during quality control - I mean, what cell isn't damaged by getting dunked in a foreign buffer and shot through microfluidics? They're all going to be a bit screwed up, but the hope is that there's still something useful in there.

Another factor is that there are strong patient-to-patient differences in sample processing (e.g., in the spike-in percentages if nothing else), which suggests that batch= should be used in isOutlier. Perhaps I should have done so in my code, but frankly, I was so tired from wrangling their "count" matrix into shape that I just moved on ASAP.

modified 8 months ago • written 8 months ago by Aaron Lun25k

This is extremely helpful, thank you so much. I've also batched isOutlier() by individual now.

Had a quick question - does "remove cells with zero spike-in size factors" mean "remove cells whose read count for every spike-in is zero"? If so, I was looking at the code you linked to. Your for.hvg <- sce.emtab[,sizeFactors(sce.emtab, "ERCC") > 0 & sce.emtab$Donor!="AZ"] line seems to be accomplishing this?

Doing the same with my sce object (specifically, filtered.sce.spike <- filtered.sce[,sizeFactors(filtered.sce,"ERCC") > 0]) results in filtered.sce.spike having zero columns (zero cells). I had defined 72 spike-ins earlier. Am I missing something simple here? Perhaps there is a way I need to denote spike-ins that I have not done properly?

Had a quick question - does "remove cells with zero spike-in size factors" mean "remove cells whose read count for every spike-in is zero"?

Yes.

Perhaps there is a way I need to denote spike-ins that I have not done properly?

You probably filtered them out in your calcAverage filtering step. I would suggest not filtering explicitly, but rather using subset.row to filter within each function as needed. See comments here.
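A small base-R sketch of how this can happen, using invented numbers rather than the actual dataset: if the spike-in rows have low average counts, an explicit abundance filter drops them, and every cell then has a spike-in total of zero. Keeping the full matrix and applying the filter as a row subset only where it is needed (the role subset.row plays in the suggestion above) avoids that. The threshold and all names here are hypothetical.

```r
# Toy counts (hypothetical): rows are features, columns are cells.
counts <- matrix(
    c(10, 20, 15, 30,   # GeneA
       4,  6,  2,  8,   # GeneB
       0,  1,  0,  0,   # GeneC (low abundance)
       1,  0,  0,  1,   # ERCC-1 (low abundance)
       0,  1,  0,  0),  # ERCC-2 (low abundance)
    nrow = 5, byrow = TRUE,
    dimnames = list(c("GeneA", "GeneB", "GeneC", "ERCC-1", "ERCC-2"),
                    paste0("cell", 1:4))
)
is.spike <- grepl("^ERCC-", rownames(counts))
keep <- rowMeans(counts) >= 1   # explicit abundance filter

# Explicit filtering: the spike-in rows are removed along with GeneC,
# so the per-cell spike-in totals are all zero afterwards.
filtered <- counts[keep, , drop = FALSE]
spikes.left <- sum(grepl("^ERCC-", rownames(filtered)))
spike.after <- colSums(filtered[grepl("^ERCC-", rownames(filtered)), , drop = FALSE])

# Row-subset approach: leave 'counts' intact and restrict rows per step.
lib.sizes <- colSums(counts[keep & !is.spike, , drop = FALSE])  # abundant genes only
spike.totals <- colSums(counts[is.spike, , drop = FALSE])       # all spike-ins
cells.with.spikes <- spike.totals > 0

print(spikes.left)
print(spike.after)
print(spike.totals)
print(cells.with.spikes)
```

Whether the real filter removed the spike-ins depends on their abundance in that dataset; checking how many spike-in rows remain after filtering is a quick way to confirm.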
