Different number of marker genes between same clusters
ludmi.b (@ludmib-21990)

Dear all,

I'm analyzing some single-cell data using the SingleCellExperiment package. I'm trying to detect marker genes between clusters using the findMarkers() function. In particular, I wanted to check how many marker genes between clusters 3 and 7 have FDR < 0.01 and |logFC| > 1.5.

When I construct a marker set for cluster 3, the number of genes with FDR < 0.01 and |logFC.7| > 2 is 81. But if I construct a marker set for cluster 7, the number of genes with FDR < 0.01 and |logFC.3| > 2 is 807.

The thing is that I was expecting to get 81 marker genes with those thresholds for cluster 7 as well. Since I'm totally new to this topic (and I apologize if I'm asking a silly question), does anyone know whether it's normal to get such a difference between the two clusters? And if so, could anybody explain to me why it's happening?

Tags: sce, markers, single cell, DE, clusters
Aaron Lun (@alun)

For starters, it would help to show your code and to tag the question with the package you're using. I'm going to guess that you're talking about scran::findMarkers, because the SingleCellExperiment package has no such function.

In findMarkers(), DE comparisons are performed between pairs of clusters; then for each cluster, all results from all relevant pairwise comparisons (i.e., involving that cluster) are consolidated into a single marker list. This has various benefits (discussed at the end of this section) but the main point for your question is that there is no reason to expect symmetry between the marker lists for clusters 3 and 7 because their lists combine information from all other clusters.

Specifically, putting aside the direct comparison between clusters 3 and 7: if cluster 7 is very different from the other clusters (aside from 3), you'll get more DEGs for cluster 7. If cluster 3 is similar to the other clusters (aside from 7), you'll get fewer DEGs for cluster 3. This is especially true with a log-fold change threshold. Imagine the following gene:

• DE with a log-fold change of 1.5 between clusters 1 and 3
• DE with a log-fold change of 1.5 between clusters 3 and 7
• DE with a log-fold change of 3 between clusters 1 and 7

This would not be detected with your lfc threshold as a DE gene for cluster 3, but would be detected as a DE gene for cluster 7.
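To make the asymmetry concrete, here is a toy sketch in base R of the gene described above. It is only an illustration, not how findMarkers() computes its statistics: findMarkers() tests each comparison against the threshold and combines p-values, whereas this sketch just compares observed log-fold changes. The cluster means, the threshold of 2 and the hypothetical helpers pairwise_lfc() and is_marker() are assumptions chosen to loosely mimic the default pval.type="any" behaviour.

```r
# Hypothetical mean log-expression of one gene in three clusters, chosen so that
# 1 vs 3 and 3 vs 7 differ by 1.5 while 1 vs 7 differs by 3.
cluster.means <- c("1" = 0, "3" = 1.5, "7" = 3)
lfc.threshold <- 2

# Log-fold changes of `target` against every other cluster.
pairwise_lfc <- function(target, means) {
    others <- setdiff(names(means), target)
    setNames(means[target] - means[others], others)
}

# Simplified rule: the gene counts as a marker for `target` if ANY pairwise
# |log-fold change| exceeds the threshold.
is_marker <- function(target, means, threshold) {
    any(abs(pairwise_lfc(target, means)) > threshold)
}

res <- sapply(c("3", "7"), is_marker, means = cluster.means,
              threshold = lfc.threshold)
print(res)  # cluster 3: FALSE, cluster 7: TRUE
```

The direct 3-vs-7 difference is 1.5 in both lists; only the comparison against cluster 1 pushes the gene over the threshold for cluster 7.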

If you want statistics for a specific pairwise comparison, set full.stats=TRUE in findMarkers() and pull out the stats.3, etc. from the marker DataFrame for cluster 7 (or vice versa). This should yield identical results with the log-fold changes flipped in sign, provided you set direction="any".
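A minimal sketch of that suggestion, assuming recent scran and scuttle versions are installed. The mockSCE() data and the round-robin cluster labels are stand-ins for a real dataset, so the numbers themselves are meaningless. Because each marker DataFrame is sorted by its own ranking, the cluster 7 list is reordered by gene name before comparing its nested stats.3 column with stats.7 from the cluster 3 list.

```r
library(scran)    # findMarkers()
library(scuttle)  # mockSCE(), logNormCounts()

# Mock data and hypothetical labels, used only so that clusters "3" and "7" exist.
sce <- logNormCounts(mockSCE())
clusters <- rep(as.character(1:7), length.out = ncol(sce))

markers <- findMarkers(sce, clusters, full.stats = TRUE, direction = "any")

# Align the cluster 7 list to the gene order of the cluster 3 list.
m3 <- markers[["3"]]
m7 <- markers[["7"]][rownames(m3), ]

# Nested DataFrames holding the direct 3-vs-7 comparison from each side.
s3v7 <- m3$stats.7
s7v3 <- m7$stats.3

# Log-fold changes should flip sign; p-values should be identical.
all.equal(s3v7$logFC, -s7v3$logFC)
all.equal(s3v7$log.p.value, s7v3$log.p.value)

# Count genes passing FDR < 0.01 and |logFC| > 2 from each side; log.FDR is on
# the log scale, so the FDR cutoff is compared as log(0.01).
n3 <- sum(s3v7$log.FDR < log(0.01) & abs(s3v7$logFC) > 2)
n7 <- sum(s7v3$log.FDR < log(0.01) & abs(s7v3$logFC) > 2)
c(n3, n7)
```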

ludmi.b (@ludmib-21990)

Thanks for the advice, next time I'll be sure to show the code - and yes, I was talking about scran::findMarkers.

Your explanation was very clear and helpful. Indeed, it makes no sense to expect symmetry between the marker lists when each list combines the pairwise comparisons against all the other clusters.

I tried to implement the code as suggested:

markers <- findMarkers(sca, clusters, full.stats=TRUE, direction="any")


Now the DataFrame contains several stats columns as a result. Each one has the fields stats.7.logFC, stats.7.log.p.value and stats.7.log.FDR. Is it reasonable for all the values of stats.7.log.p.value and stats.7.log.FDR to be negative? In some clusters I also got some zeros, though far fewer than negative values.


Anyway, as the field names suggest (e.g., stats.7$log.FDR), these are log-p-values and log-FDRs. If you log-transform a p-value, you would hope that the result is non-positive; otherwise your p-values would be greater than 1, which would obviously not be good.
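A small base R sketch of what those values mean, assuming the logs are natural logs (what R's log() uses). The log-FDR values below are made up for illustration. exp() recovers the FDR, a value of 0 corresponds to an FDR of exactly 1 (common after adjustment, which would explain the zeros), and an FDR cutoff can be applied directly on the log scale.

```r
# Made-up log-FDR values, as they might appear in a column like stats.7$log.FDR.
log.fdr <- c(-12.3, -4.6, -0.7, 0)

# Back-transform to the FDR scale; every value lies in [0, 1].
fdr <- exp(log.fdr)
all(log.fdr <= 0)

# FDR < 0.01 is equivalent to log.FDR < log(0.01), so either scale works.
sig.log <- log.fdr < log(0.01)
sig.raw <- fdr < 0.01
identical(sig.log, sig.raw)
```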