I assume this will have an impact on identifying significant gene regulation events, as p-value correction for multiple testing (both BH and Bonferroni) directly depends on the number of features, so more genes are likely to end up being significant in clusters that have fewer features.
Your logic assumes that the extra detected genes are always non-DE and thus will only contribute to the numerator of the BH correction factor. In reality, you'll probably be adding a few genes with low p-values, which bumps up the denominator and reduces the severity of the correction.
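To make that concrete, here's a toy illustration with simulated p-values (all numbers are made up for demonstration):

```r
set.seed(0)
base.p <- c(rbeta(100, 0.5, 20), runif(900))       # 1000 genes, first 100 "DE"
null.extras <- runif(500)                          # 500 extra genes, all non-DE
mixed.extras <- c(rbeta(20, 0.5, 20), runif(480))  # 500 extras, some DE

# Discoveries among the original 1000 genes under each scenario:
in.base <- seq_len(1000)
sum(p.adjust(base.p, method="BH")[in.base] <= 0.05)                   # no extras
sum(p.adjust(c(base.p, null.extras), method="BH")[in.base] <= 0.05)   # extras all null
sum(p.adjust(c(base.p, mixed.extras), method="BH")[in.base] <= 0.05)  # extras include DE
```

The last count will be at least as large as the second, i.e., extras with low p-values mitigate the harsher correction you'd expect from the increased number of tests.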
Moreover, these cell types with more detected features per cell will probably have higher coverage, which will increase the power to detect DE genes. This means that, outside of the behavior of the BH correction, you would expect to get lower p-values anyway.
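A back-of-the-envelope simulation of the coverage effect, using a fixed two-fold change with Poisson counts (purely illustrative, not a model of your data):

```r
set.seed(1)
pval.at.depth <- function(mu) {
    y1 <- rpois(5, mu)      # 5 "samples" in condition 1
    y2 <- rpois(5, 2 * mu)  # two-fold change in condition 2
    poisson.test(c(sum(y1), sum(y2)), c(5, 5))$p.value
}
median(replicate(1000, pval.at.depth(5)))   # low coverage
median(replicate(1000, pval.at.depth(50)))  # high coverage: much lower p-values
```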
Is it too restrictive to just correct all p-values using the highest number of features?
I don't really know what you mean here.
Or are there any other methods to compensate for this potential bias?
The "bias", such as it is, can be paraphrased as such.
- Different cell types have different coverage, owing to the differences in their RNA content and/or number of cells.
- This causes differences in DE detection power when you compare between conditions.
- And that's just how it is. You'd get the same effect for bulk RNA-seq studies with different sequencing depths.
You could "compensate" for this bias by downsampling all cell types to have the same number of cells and coverage per cell, so that each comparison is equally powered... but that seems rather excessive. Cell types without much information won't give you a lot of DE genes, and that's probably okay for most applications. If you want to make a statement like "one cell type is more affected by the conditions than another cell type" based on the number of DE genes, then large differences in power will cause problems.
As an aside, I don't mind each cell type's comparisons being performed separately; in fact, that's exactly what scran::pseudoBulkDGE() does. This approach insulates each comparison from odd behavior in particular cell types, strange effects from the many zeroes for genes that are silent in all but one cell type, and differences in variance between cell types. The latter is especially relevant if you have a highly heterogeneous population where many genes are strongly DE, such that the average expression of a particular gene is not a good proxy for the cell type-specific means during empirical Bayes (EB) shrinkage to the mean-variance trend.
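For reference, usage is along these lines (object and column names here are placeholders, not from your data):

```r
library(scuttle)
library(scran)

# Assumes 'sce' has per-cell labels in sce$celltype, sample of origin in
# sce$sample, and a two-level condition factor in sce$condition.
summed <- aggregateAcrossCells(sce, ids=colData(sce)[, c("celltype", "sample")])

# One edgeR analysis per cell type, each insulated from the others:
de.results <- pseudoBulkDGE(summed,
    label=summed$celltype,
    design=~condition,
    coef="conditiontreated",  # assumes 'treated' is the non-reference level
    condition=summed$condition)
```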
Wouldn't a proper DE analysis have a single count matrix with all involved genes and samples (or cell types, clusters, or whatever you analyse), which is then filtered to only include relevant genes (filterByExpr, at least x counts in y samples or groups, any filtering you can think of)? The number of tested genes should then be the same in all comparisons rather than being nested within cell type, and therefore the multiple-testing burden would be the same for all, no?
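Something like this, with edgeR (object names are placeholders):

```r
library(edgeR)

# 'counts' is one matrix with all samples/pseudo-bulk profiles,
# 'group' gives the cell type/condition combination of each column.
y <- DGEList(counts=counts, group=group)
keep <- filterByExpr(y, group=group)  # one shared filter across everything
y <- y[keep, , keep.lib.sizes=FALSE]
# All downstream comparisons now test the same set of genes, so the
# multiple-testing burden is identical across comparisons.
```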
I agree with ATpoint. I also think your assumption is incorrect. Clusters with more expressed genes will generally have more DE genes at a given FDR cutoff rather than fewer.