Question

scran::computeSumFactors with single-cell ATAC-seq data

Angelos Armen • 0

@angelos-armen-21507

United Kingdom

Hi,

What are your thoughts about using scran::computeSumFactors with (10x) single-cell ATAC-seq data? Should one use the default value of min.mean = 1 for read data? I called scater::calculateAverage on one of my datasets: 89% of the peaks have a mean count > 0.1 and 23% have a mean count > 1. The corresponding values for a 10x single-cell RNA dataset are 18% and 1.5% (so I see why min.mean = 0.1 is needed for UMI data).
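For readers who want to reproduce this kind of check, here is a minimal sketch in base R. The count matrix is simulated (the gamma/Poisson parameters are assumptions chosen only for illustration, not values from the dataset above), and it uses a plain row mean rather than scater::calculateAverage, which also adjusts for library size before averaging.

```r
# Simulated peak-by-cell count matrix (hypothetical data, illustration only).
set.seed(1)
n_peaks <- 2000
n_cells <- 100
peak_means <- rgamma(n_peaks, shape = 0.5, rate = 1)  # assumed abundance distribution

# rpois() recycles peak_means every n_peaks values, and matrix() fills
# column by column, so row i is drawn with mean peak_means[i].
counts <- matrix(rpois(n_peaks * n_cells, lambda = peak_means),
                 nrow = n_peaks, ncol = n_cells)

# Plain per-peak average count across cells.
ave <- rowMeans(counts)

# Fraction of peaks that would pass each candidate min.mean threshold.
frac_above <- c("0.1" = mean(ave > 0.1), "1" = mean(ave > 1))
print(frac_above)
```

On a real dataset, `ave` would instead come from scater::calculateAverage applied to the peak count matrix.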

An ATAC-specific concern is that, as the size factor increases, the (measured) number of cuts in small peaks will stop increasing at some point and result in incorrect ratios.

scran

updated 3.5 years ago by Aaron Lun • written 3.5 years ago by Angelos Armen

score 1 · Accepted Answer · 2020-11-01

I don't handle scATAC-seq data personally, but if you're dealing with peaks, I would guess that the features are defined based on being somewhat high-abundance (e.g., in the pool across all cells). So, in effect, Cell Ranger has already done a bit of filtering for you, in contrast to the gene expression case where you just get a (possibly zero) count reported for every gene regardless of whether it's actually expressed. This would explain why you get a higher percentage of features with means above the threshold.

As for the specific threshold to use - if it's UMI data, you might as well use 0.1 and make use of more of your features. Remember, someone's already done the filtering for you, so there's no clear need to be even more aggressive with the filtering in computeSumFactors() on top of that.
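As a rough sketch of what that looks like in practice (assuming the scran and SingleCellExperiment packages are installed; the count matrix here is simulated, and the object name `sce` and the simulation parameters are placeholders rather than anything from the original dataset):

```r
library(SingleCellExperiment)
library(scran)

# Simulated peak-by-cell counts (hypothetical data, illustration only).
set.seed(1)
n_peaks <- 2000
n_cells <- 200
peak_means <- rgamma(n_peaks, shape = 0.5, rate = 1)  # assumed abundance distribution
counts <- matrix(rpois(n_peaks * n_cells, lambda = peak_means),
                 nrow = n_peaks, ncol = n_cells)

sce <- SingleCellExperiment(assays = list(counts = counts))

# Use the lower abundance threshold so that more peaks contribute to the pools.
sce <- computeSumFactors(sce, min.mean = 0.1)

# One size factor per cell.
summary(sizeFactors(sce))
```

With real data, it may also be worth passing a `clusters` argument (e.g., from scran::quickCluster) so that pooling happens within groups of similar cells; whether that helps for scATAC-seq is not something this sketch tests.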

I don't really understand what you mean by the number of cuts in small peaks. I assume you're referring to the fact that coverage is capped because a diploid cell has only two copies of each locus. That's true enough, but if your size factor continues to increase regardless, it indicates that the cap isn't really in effect, e.g., due to PCR duplicates or whatever. (This assumes that the size factor calculation adjusts for composition biases.)