Question: k argument in buildSNNGraph
Angelos Armen wrote (3 months ago):

The documentation of buildSNNGraph() says:

The choice of k can be roughly interpreted as the minimum cluster size.

Could someone explain this, please?

Tag: scran
Answer: k argument in buildSNNGraph
Aaron Lun (Cambridge, United Kingdom) wrote (3 months ago):

There's nothing special here. buildSNNGraph() finds the k nearest neighbours of every cell, so if you have a subpopulation with fewer than k+1 cells, it will forcibly construct edges between cells in that subpopulation and cells in other subpopulations. This increases the risk that the subpopulation will not form its own cluster, as it is more interconnected with the rest of the cells in the dataset.
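A minimal sketch of why this happens, assuming a plain brute-force k-nearest-neighbour search in Python on made-up 2-D coordinates (this is not scran's own neighbour search, and the helpers `knn` and `external_neighbours` are hypothetical). A subpopulation of 3 cells can offer each of its members only 2 within-group neighbours, so with k = 5 the remaining 3 slots must be filled from another subpopulation, however far away it is.

```python
import math

def knn(points, k):
    """Brute-force k-nearest-neighbour search: for each point, return the
    indices of its k closest other points (ties broken by index)."""
    result = []
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: (math.dist(p, points[j]), j))
        result.append(others[:k])
    return result

def external_neighbours(nn, labels, i):
    """Neighbours of cell i that belong to a different subpopulation."""
    return [j for j in nn[i] if labels[j] != labels[i]]

# Hypothetical toy data: 3 cells in a distant "small" subpopulation,
# 20 cells in a "large" one roughly 98 units away.
small = [(100.0 + 0.1 * i, 0.0) for i in range(3)]
large = [(0.1 * i, 0.0) for i in range(20)]
points = small + large
labels = ["small"] * len(small) + ["large"] * len(large)

for k in (2, 5):
    nn = knn(points, k)
    counts = [len(external_neighbours(nn, labels, i)) for i in range(len(small))]
    print(f"k={k}: external neighbours per small-population cell = {counts}")
```

This prints `[0, 0, 0]` for k = 2 and `[3, 3, 3]` for k = 5. In general, a subpopulation of m cells supplies at most m - 1 within-group neighbours, so each of its cells must take at least k - (m - 1) neighbours from elsewhere, which is positive exactly when m < k + 1.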

I guess the wording of the documentation is misleading, as the interpretation of k is that of the anticipated size of the smallest subpopulation. It is not a specification of the size of the smallest cluster that you are willing to obtain. The actual minimum cluster size is at the mercy of the community detection algorithm that you choose, if it enforces (explicitly or otherwise) a minimum cluster size at all.


Angelos Armen replied:

Yes, this is what I was thinking too, but I wouldn't interpret k as the minimum cluster size. Suppose that we have a subpopulation S1 with fewer than k+1 cells. Then the cells in S1 will have a set of cells C from the nearest subpopulation S2 as nearest neighbours. If S2 is large enough and far enough away from S1, then the cells in C will only have nearest neighbours from S2, so S1 and S2 will not merge.
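The S1/S2 scenario above can be checked on toy data. This is a sketch under stated assumptions: brute-force k-nearest-neighbour lists in Python on made-up coordinates, with `knn`, `C` and `back_links` as hypothetical names. It looks only at the neighbour lists, not at the shared-neighbour edge weights or the community detection step that decide the final clusters.

```python
import math

def knn(points, k):
    """Brute-force k-nearest-neighbour search (ties broken by index)."""
    result = []
    for i, p in enumerate(points):
        others = sorted((j for j in range(len(points)) if j != i),
                        key=lambda j: (math.dist(p, points[j]), j))
        result.append(others[:k])
    return result

k = 5
# Hypothetical toy data: S1 has 3 cells (fewer than k + 1); S2 has 30 cells
# and sits roughly 97 units away.
s1 = [(100.0 + 0.1 * i, 0.0) for i in range(3)]
s2 = [(0.1 * i, 0.0) for i in range(30)]
points = s1 + s2
in_s1 = set(range(len(s1)))

nn = knn(points, k)

# C: the S2 cells that S1 cells are forced to take as neighbours.
C = sorted({j for i in in_s1 for j in nn[i] if j not in in_s1})

# For each cell in C, which of its own k neighbours lie in S1?
back_links = {c: [j for j in nn[c] if j in in_s1] for c in C}

print("C =", C)
print("cells in C with an S1 neighbour:", [c for c, b in back_links.items() if b])
```

Here C contains the three S2 cells closest to S1 (indices 30, 31 and 32), and none of them lists an S1 cell among its own neighbours, so the links from S1 into S2 are one-directional.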

On the other hand, if S2 is of similar size to S1, then S1 and S2 may indeed merge regardless of their distance. So I would think of k+1 as the size of the smallest discoverable isolated subpopulation (where isolated means that the intra-subpopulation distances are smaller than the distances to the other subpopulations).
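A small sketch of the "similar size" case, again assuming brute-force neighbour lists on made-up coordinates (the `cross_fraction` helper is hypothetical). With two 3-cell subpopulations and k = 5, every cell must list all 5 other cells, so the neighbour structure is identical whether the groups are 10 or 1000 units apart. With k = 2 (so that k+1 equals the subpopulation size), no neighbour slot crosses between the groups.

```python
import math

def knn(points, k):
    """Brute-force k-nearest-neighbour search (ties broken by index)."""
    result = []
    for i, p in enumerate(points):
        others = sorted((j for j in range(len(points)) if j != i),
                        key=lambda j: (math.dist(p, points[j]), j))
        result.append(others[:k])
    return result

def cross_fraction(gap, size=3, k=5):
    """Fraction of all k-NN slots that point to the other subpopulation when
    two subpopulations of `size` cells each are placed `gap` units apart."""
    s1 = [(0.1 * i, 0.0) for i in range(size)]
    s2 = [(gap + 0.1 * i, 0.0) for i in range(size)]
    points = s1 + s2
    group = [0] * size + [1] * size
    nn = knn(points, k)
    cross = sum(1 for i, nbrs in enumerate(nn) for j in nbrs if group[j] != group[i])
    return cross / (len(points) * k)

for gap in (10.0, 1000.0):
    print(f"gap={gap}: k=5 -> {cross_fraction(gap)}, k=2 -> {cross_fraction(gap, k=2)}")
```

For both gaps this gives 0.6 with k = 5 (3 of every cell's 5 neighbours are in the other group) and 0.0 with k = 2, which matches treating k+1 as the smallest subpopulation size that can stay separate at the neighbour-search stage.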


Aaron Lun replied:

Sure, but I wouldn't sweat the details. It's subject to enough extra factors that the exact interpretation can't be easily pinned down. I've already updated the documentation to be a bit more precise, but it probably doesn't really matter.
