k argument in buildSNNGraph
@angelos-armen-21507
Last seen 4 months ago
United Kingdom

The documentation of buildSNNGraph says:

The choice of k can be roughly interpreted as the minimum cluster size.

Could someone please explain this?

scran • 1.1k views
Aaron Lun ★ 28k
@alun
Last seen 11 hours ago
The city by the bay

There's nothing special here. If you have a subpopulation with fewer than k+1 cells, buildSNNGraph() will forcibly construct edges between cells in that subpopulation and cells in other subpopulations. This increases the risk that the subpopulation will not form its own cluster as it is more interconnected with the rest of the cells in the dataset.
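The forced edges can be illustrated with a toy nearest-neighbour search. This is only a sketch in plain Python, not scran's R implementation: the `knn` helper, the 1-D coordinates and the group sizes are all made up for illustration, and it covers only the k-nearest-neighbour step, not the shared-neighbour weighting.

```python
def knn(points, k):
    """For each point, return the indices of its k nearest other points (1-D)."""
    result = []
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        # Sort by distance; ties are broken by index to keep results deterministic.
        others.sort(key=lambda j: (abs(points[j] - p), j))
        result.append(others[:k])
    return result


# Hypothetical data: a small subpopulation of 3 cells, a large one of 20 cells.
small = [0.0, 0.1, 0.2]
large = [10.0 + 0.1 * i for i in range(20)]
points = small + large
labels = ["small"] * len(small) + ["large"] * len(large)

k = 5
neighbours = knn(points, k)


def outside_count(i):
    """Number of cell i's k nearest neighbours that belong to another group."""
    return sum(labels[j] != labels[i] for j in neighbours[i])


# Each small-group cell has only 2 same-group candidates, so the remaining
# k - 2 neighbours have to come from the other subpopulation.
forced = [outside_count(i) for i in range(len(small))]
print(forced)
```

With 3 cells and k = 5, every cell in the small group has to take k - (3 - 1) = 3 neighbours from outside, whereas cells in the 20-cell group can fill all k slots from within their own group.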

I guess the wording of the documentation is misleading, as k is better interpreted as the anticipated size of the smallest subpopulation. It is not a specification of the size of the smallest cluster that you are willing to obtain. The actual minimum cluster size is at the mercy of the community detection algorithm that you choose, if it enforces (explicitly or otherwise) a minimum cluster size at all.


If you have a subpopulation with fewer than k+1 cells, buildSNNGraph() will forcibly construct edges between cells in that subpopulation and cells in other subpopulations. This increases the risk that the subpopulation will not form its own cluster as it is more interconnected with the rest of the cells in the dataset.

Yes, this is what I was thinking too, but I wouldn't interpret k as the minimum cluster size. Suppose that we have a subpopulation S1 with fewer than k+1 cells. Then the cells in S1 will have a set of cells C from the nearest subpopulation S2 among their nearest neighbours. If S2 is large enough and far enough away from S1, then the cells in C will only have nearest neighbours from S2, and S1 and S2 will not merge.

On the other hand, if S2 is of a similar size to S1, then S1 and S2 may indeed merge regardless of their distance. So I would think of k + 1 as the size of the smallest discoverable isolated subpopulation (where isolated means that the intra-subpopulation distances are smaller than the distances to the other subpopulations).
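The two scenarios can be compared by counting nearest-neighbour links in each direction. Again, this is a rough sketch in plain Python with invented 1-D data and hypothetical helpers (`knn`, `cross_links`). It looks only at the k-nearest-neighbour step and does not model the shared-neighbour weighting or the community detection, which ultimately decide whether the groups merge.

```python
def knn(points, k):
    """For each point, return the indices of its k nearest other points (1-D)."""
    result = []
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: (abs(points[j] - p), j))
        result.append(others[:k])
    return result


def cross_links(a, b, k):
    """Count kNN links pointing from group a into b, and from b into a."""
    points = a + b
    nn = knn(points, k)
    n_a = len(a)
    a_to_b = sum(j >= n_a for i in range(n_a) for j in nn[i])
    b_to_a = sum(j < n_a for i in range(n_a, len(points)) for j in nn[i])
    return a_to_b, b_to_a


k = 5
s1 = [0.0, 0.1, 0.2]                          # fewer than k + 1 cells
big_s2 = [10.0 + 0.1 * i for i in range(20)]  # large and far away
small_s2 = [10.0, 10.1, 10.2]                 # same distance, similar size to S1

far_large = cross_links(s1, big_s2, k)
far_small = cross_links(s1, small_s2, k)
print(far_large)
print(far_small)
```

When S2 is large, links point from S1 into S2 but none point back, so the cells of S2 keep all of their neighbours within S2. When S2 is as small as S1, both groups are forced to reach into each other even though the distance between them is unchanged.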


Sure, but I wouldn't sweat the details. The outcome is subject to enough extra factors that the exact interpretation can't be easily pinned down. I've already updated the documentation to be a bit more precise, but it probably doesn't really matter.
