The documentation of buildSNNGraph says:
"The choice of k can be roughly interpreted as the minimum cluster size."
Could someone explain this, please?
There's nothing special here. If you have a subpopulation with fewer than k+1 cells, buildSNNGraph() will forcibly construct edges between cells in that subpopulation and cells in other subpopulations. This increases the risk that the subpopulation will not form its own cluster, as it is more interconnected with the rest of the cells in the dataset.
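To make the k+1 threshold concrete, here is a minimal sketch in plain Python. It does not call buildSNNGraph(); it only mimics the nearest-neighbour search step on two simulated, well-separated 2D blobs, and the function name cross_edges and all the simulation settings are made up for illustration. It counts how many of the k nearest neighbours of the small subpopulation's cells fall outside that subpopulation.

```python
import math
import random

def cross_edges(n_small, n_large=200, k=10, seed=0):
    """Count directed kNN edges leaving the small subpopulation."""
    rng = random.Random(seed)
    # Two well-separated 2D blobs: label 0 is the small subpopulation,
    # label 1 is the large one. These are simulated cells, not real data.
    cells = [(rng.gauss(0, 1), rng.gauss(0, 1), 0) for _ in range(n_small)]
    cells += [(rng.gauss(100, 1), rng.gauss(100, 1), 1) for _ in range(n_large)]

    leaving = 0
    for i, (xi, yi, li) in enumerate(cells):
        if li != 0:
            continue
        # Brute-force neighbour search: rank every other cell by distance.
        ranked = sorted(
            (math.hypot(xi - xj, yi - yj), lj)
            for j, (xj, yj, lj) in enumerate(cells)
            if j != i
        )
        # Any of the k nearest neighbours with a different label is a
        # forced link out of the small subpopulation.
        leaving += sum(1 for _, lj in ranked[:k] if lj != 0)
    return leaving

print(cross_edges(5))   # 30: each of the 5 cells has only 4 same-group neighbours
print(cross_edges(10))  # 10: one neighbour per cell must come from outside
print(cross_edges(11))  # 0: k+1 cells is enough to fill every neighbour list
```

With k=10, the outgoing links only vanish once the small group has k+1 = 11 cells, because each cell needs k neighbours other than itself. The sketch says nothing about edge weights or the community detection that follows.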
I guess the wording of the documentation is misleading, as k is better interpreted as the anticipated size of the smallest subpopulation. It is not a specification of the size of the smallest cluster that you are willing to obtain. The actual minimum cluster size is at the mercy of the community detection algorithm that you choose, if it enforces (explicitly or otherwise) a minimum cluster size at all.
Yes, this is what I was thinking too, but I wouldn't interpret k as the minimum cluster size. Suppose that we have a subpopulation S1 with fewer than k+1 cells. Then the cells in S1 will have a set of cells C from the nearest subpopulation S2 as nearest neighbours. If S2 is large enough and far enough away from S1, then the cells in C will only have nearest neighbours from S2, and S1 and S2 will not merge.
On the other hand, if S2 is of similar size to S1, then S1 and S2 may indeed merge regardless of their distance. So I would think of k+1 as the size of the smallest discoverable isolated subpopulation (where isolated means that the intra-subpopulation distances are smaller than the distances to the other subpopulations).
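The asymmetry between the two cases can be sketched in plain Python. This is only a toy version of the nearest-neighbour search on simulated 2D blobs, not a call to buildSNNGraph(), and it ignores the shared-neighbour weighting and the clustering step; the function name directed_cross_edges and the blob settings are hypothetical. It counts directed neighbour links from S1 into S2 and from S2 into S1.

```python
import math
import random

def directed_cross_edges(n1, n2, k=10, seed=0):
    """Return (S1->S2, S2->S1) counts of directed kNN links."""
    rng = random.Random(seed)
    # Simulated, well-separated subpopulations (illustrative values only).
    cells = [(rng.gauss(0, 1), rng.gauss(0, 1), "S1") for _ in range(n1)]
    cells += [(rng.gauss(100, 1), rng.gauss(100, 1), "S2") for _ in range(n2)]

    counts = {"S1": 0, "S2": 0}
    for i, (xi, yi, li) in enumerate(cells):
        # Rank all other cells by Euclidean distance to cell i.
        ranked = sorted(
            (math.hypot(xi - xj, yi - yj), lj)
            for j, (xj, yj, lj) in enumerate(cells)
            if j != i
        )
        # Count the k nearest neighbours that belong to the other group.
        counts[li] += sum(1 for _, lj in ranked[:k] if lj != li)
    return counts["S1"], counts["S2"]

# Large S2: S1 reaches into S2, but no S2 cell lists an S1 cell.
print(directed_cross_edges(5, 200))  # (30, 0)

# S2 also smaller than k+1: links are forced in both directions.
print(directed_cross_edges(5, 8))    # (30, 24)
```

In the first case the links are one-directional, which is the situation where S1 has a chance of staying separate. In the second case both groups are forced to list each other as neighbours, whatever the distance between them.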
Sure, but I wouldn't sweat the details. It's subject to enough extra factors that the exact interpretation can't be easily pinned down. I've already updated the documentation to be a bit more precise, but it probably doesn't really matter.