Hello! I am following the Aaron T. Lun et al. (2016) paper to analyze my brain scRNA-seq data, but I am stuck at the gene filtering step. My data are in an sce object, and I have already filtered out outlier cells (with isOutlier) and low-abundance genes (with a threshold of 0.1, because we used UMIs and that is what they recommend). The last part looks like this:
ave.counts <- calcAverage(sce)
hist(log10(ave.counts), breaks=100, main="", col="grey", xlab=expression(Log[10]~"average count"))
abline(v=log10(0.1), col="red", lwd=2, lty=2)
Here I can already see that gene expression in my cells is very low (bad quality?), because almost all genes fall around -1 on the log10 average count x-axis.
So I lose most genes with the 0.1 threshold. Nevertheless, I continued, removing only the genes with an average count of 0:
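To put a number on how many genes the 0.1 cutoff removes, something like the following could be used. This is only a minimal sketch on made-up values: `ave.counts.demo` is a hypothetical stand-in for the `ave.counts` vector computed above, and the gene names are invented.

```r
# Hypothetical stand-in for ave.counts (per-gene average counts); values are made up.
ave.counts.demo <- c(geneA = 0, geneB = 0.02, geneC = 0.08, geneD = 0.15, geneE = 1.2)

# Genes at or above the 0.1 cutoff used for UMI data in this post.
keep <- ave.counts.demo >= 0.1

n.kept <- sum(keep)        # number of genes that pass the cutoff
frac.lost <- mean(!keep)   # fraction of genes that would be removed

print(n.kept)
print(frac.lost)
```

On the real data, the same two lines applied to `ave.counts` would give the number of genes retained and the fraction lost.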
rowData(sce)$ave.count <- ave.counts
to.keep <- ave.counts > 0
sce <- sce[to.keep,]
Finally, I tried to calculate size factors by deconvolution with computeSumFactors, first doing a quick clustering with quickCluster because, as far as I understand, this is better for a big dataset like mine (10000 cells, around 5000 after filtering; 24000 genes, around 17000 after removing those with zero average counts). The problem came when I ran:
high.ave <- rowData(sce)$ave.count >= 0.1
clusters <- quickCluster(sce, subset.row=high.ave, method="igraph")
sce <- computeSumFactors(sce, cluster=clusters, subset.row=high.ave, min.mean=NULL)
I obtained this warning:
Warning message:
In .computeSumFactors(assay(x, i = assay.type), subset.row = subset.row, :
  encountered negative size factor estimates
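To see how widespread the problem is, the size factors could be inspected along these lines. This is a sketch on an invented vector: `sf.demo` is a hypothetical stand-in for the values that `sizeFactors(sce)` would return after the call above.

```r
# Hypothetical stand-in for sizeFactors(sce); values are made up for illustration.
sf.demo <- c(1.3, 0.8, -0.05, 0.4, -0.2, 2.1)

# Count the cells whose size factor estimate is negative.
is.negative <- sf.demo < 0
n.negative <- sum(is.negative)
frac.negative <- mean(is.negative)

print(summary(sf.demo))
print(n.negative)
print(frac.negative)
```

If only a handful of cells are affected, that would suggest a few very low-count cells rather than a problem with the whole dataset, although that is an assumption to be checked against the real values.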
To be continued below...