Question: Different number of marker genes between same clusters
ludmi.b wrote (6 weeks ago):

Dear all,

I'm analyzing some single-cell data using the SingleCellExperiment package. I'm trying to detect marker genes between clusters using the findMarkers() function. In particular, I wanted to check how many marker genes there are between clusters 3 and 7 that have an FDR < 0.01 and pass a |logFC| threshold of 1.5.

When I construct a marker set for cluster 3, the number of genes with an FDR < 0.01 that pass a |logFC.7| threshold of 2 is 81. But if I construct a marker set for cluster 7, the number of genes with an FDR < 0.01 that pass a |logFC.3| threshold of 2 is 807.

The thing is that I was expecting to get 81 marker genes with those thresholds for cluster 7 as well. I'm totally new to this topic, so I apologize if this is a silly question: is it normal to get such a difference between the two clusters? And if so, could anybody explain to me why it's happening?

Thank you in advance

Answer: Different number of marker genes between same clusters
Aaron Lun (Cambridge, United Kingdom) wrote (6 weeks ago):

For starters, it would help to show code, and it would help to tag the question with the package that you're using. I'm going to guess that you're talking about scran::findMarkers, because the SingleCellExperiment package has no such function.

In findMarkers(), DE comparisons are performed between pairs of clusters; then for each cluster, all results from all relevant pairwise comparisons (i.e., involving that cluster) are consolidated into a single marker list. This has various benefits (discussed at the end of this section) but the main point for your question is that there is no reason to expect symmetry between the marker lists for clusters 3 and 7 because their lists combine information from all other clusters.

Specifically, putting aside the direct comparison between clusters 3 and 7: if cluster 7 is very different from the other clusters (aside from 3), you'll get more DEGs in its marker list. If cluster 3 is similar to the other clusters (aside from 7), you'll get fewer DEGs in its list. This is especially true with an lfc threshold. Imagine the following gene:

  • DE with a log-fold change of 1.5 between clusters 1 and 3
  • DE with a log-fold change of 1.5 between clusters 3 and 7
  • DE with a log-fold change of 3 between clusters 1 and 7

This would not be detected with your lfc threshold as a DE gene for cluster 3, but would be detected as a DE gene for cluster 7.
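To make the example above concrete, here is a minimal base-R sketch. It is a deliberate simplification: it only asks whether any pairwise log-fold change involving a cluster exceeds the threshold, whereas findMarkers() combines p-values across the pairwise comparisons. The per-cluster means and the threshold of 2 are assumptions, chosen only to reproduce the three log-fold changes listed.

```r
# Hypothetical mean log-expression of one gene in clusters 1, 3 and 7.
means <- c("1" = 0, "3" = 1.5, "7" = 3)

# Pairwise log-fold changes: lfc[i, j] = mean of cluster i minus mean of cluster j.
lfc <- outer(means, means, "-")
dimnames(lfc) <- list(names(means), names(means))

threshold <- 2  # assumed |logFC| cutoff

# For each cluster, does any comparison against another cluster pass the cutoff?
detected <- sapply(names(means), function(cl) {
    others <- setdiff(names(means), cl)
    any(abs(lfc[cl, others]) > threshold)
})
detected  # 1: TRUE, 3: FALSE, 7: TRUE
```

The 3-vs-7 comparison itself is identical in both directions (1.5 in absolute value); the asymmetry comes entirely from the comparison against cluster 1.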

If you want statistics for a specific pairwise comparison, set full.stats=TRUE in findMarkers() and pull out the stats.3, etc. from the marker DataFrame for cluster 7 (or vice versa). This should yield identical results with the log-fold changes flipped in sign, provided you set direction="any".
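A rough sketch of what that could look like, since no code or data was posted. The data here are simulated with scuttle::mockSCE() and the cluster labels are assigned at random, so both are stand-ins for your own object and clustering; the thresholds are the ones from the question, and the log-FDRs are assumed to be natural logs.

```r
library(scran)    # findMarkers()
library(scuttle)  # mockSCE(), logNormCounts()

set.seed(100)
sce <- logNormCounts(mockSCE())  # simulated data, stand-in for the real object
# Hypothetical cluster labels, assigned at random for illustration only.
clusters <- factor(sample(c("1", "3", "7"), ncol(sce), replace = TRUE))

markers <- findMarkers(sce, groups = clusters,
                       full.stats = TRUE, direction = "any")

# Statistics for the 3-vs-7 comparison, taken from each cluster's table.
# The two tables are ordered differently, so align them by gene name first.
genes <- rownames(sce)
from3 <- markers[["3"]][genes, ]$stats.7
from7 <- markers[["7"]][genes, ]$stats.3

# Same test seen from both sides: log-fold changes flip sign, log-FDRs agree.
all.equal(from3$logFC, -from7$logFC)
all.equal(from3$log.FDR, from7$log.FDR)

# Number of genes passing the thresholds for this one comparison.
n3 <- sum(from3$log.FDR < log(0.01) & abs(from3$logFC) > 2)
n7 <- sum(from7$log.FDR < log(0.01) & abs(from7$logFC) > 2)
```

Filtering on these per-comparison statistics, rather than on the combined FDR column of each marker table, is what gives the same count from either side.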

Answer: Different number of marker genes between same clusters
ludmi.b wrote (6 weeks ago):

Thanks for the advice, next time I'll be sure to show the code - and yes, I was talking about scran::findMarkers.

Your explanation was very clear and helpful. Actually, it does not make sense to expect symmetry between the marker lists if DE comparisons are performed between pairs of clusters.

I tried to implement the code as suggested:

markers <- findMarkers(sca, clusters, full.stats=TRUE, direction="any")

Now I have several stats fields in the resulting DataFrame. Each one shows, for example, a stats.7.logFC, a stats.7.log.p.value and a stats.7.log.FDR. Is it reasonable for all values of stats.7.log.p.value or stats.7.log.FDR to be negative? In some clusters I also got some zeros, but not as many as negative values.


If you are responding to an existing answer, be sure to use the "Add Comment" button, rather than adding your own answer.

Anyway, as the names of the fields suggest (e.g., stats.7$log.FDR), these are log-p-values and log-FDRs. If you log-transform a p-value, you would expect the result to be non-positive; a positive value would mean that your p-value was greater than 1, which would not be good.
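A quick base-R check of that reasoning, using made-up p-values: the log of anything in (0, 1] is at most zero, a zero corresponds to a p-value (or FDR) of exactly 1, and exp() takes you back to the original scale.

```r
# Made-up p-values for illustration; any value in (0, 1] behaves the same way.
p <- c(1, 0.5, 0.01, 1e-10)

log.p <- log(p)      # natural log, as an assumed stand-in for log.p.value
all(log.p <= 0)      # TRUE: log-p-values are never positive
log.p[1] == 0        # TRUE: a zero means the p-value was exactly 1

# Back-transform to the usual scale, e.g. before applying an FDR cutoff.
p.back <- exp(log.p)
all.equal(p.back, p)
```

So the zeros you see are just genes with a p-value or FDR of 1 in that comparison, and filtering at FDR < 0.01 is the same as filtering at log.FDR < log(0.01).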

Reply written 6 weeks ago by Aaron Lun