Question

k argument in buildSNNGraph

Angelos Armen • 0

@angelos-armen-21507

Last seen 2.4 years ago

United Kingdom

In the documentation of buildSNNGraph it says that

The choice of k can be roughly interpreted as the minimum cluster size.

Could someone explain this, please?

scran • 912 views

ADD COMMENT • link updated 4.7 years ago by Aaron Lun ★ 28k • written 4.7 years ago by Angelos Armen • 0

score 1 · Accepted Answer · 2019-08-08

Aaron Lun ★ 28k

@alun

Last seen 3 hours ago

The city by the bay

There's nothing special here. If you have a subpopulation with fewer than k+1 cells, buildSNNGraph() will forcibly construct edges between cells in that subpopulation and cells in other subpopulations. This increases the risk that the subpopulation will not form its own cluster as it is more interconnected with the rest of the cells in the dataset.

I guess the wording of the documentation is misleading: k is better interpreted as the anticipated size of the smallest subpopulation, not as a specification of the size of the smallest cluster that you are willing to obtain. The actual minimum cluster size is at the mercy of the community detection algorithm that you choose, if it enforces (explicitly or otherwise) a minimum cluster size at all.
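The k+1 threshold follows from simple counting: a cell in a subpopulation of m cells has only m-1 possible neighbours of its own kind, so if m < k+1, at least k-(m-1) of its k nearest neighbours must come from elsewhere. Below is a minimal Python sketch of just the nearest-neighbour search, not of the shared-neighbour weighting in buildSNNGraph() or of any clustering step. The one-dimensional positions and the helper names `knn` and `outside_neighbours` are made up for illustration.

```python
def knn(points, k):
    """For each point, return the indices of its k nearest other points (brute force)."""
    result = []
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        # Sort by distance, breaking ties by index so the result is deterministic.
        others.sort(key=lambda j: (abs(points[j] - p), j))
        result.append(others[:k])
    return result


def outside_neighbours(points, labels, k):
    """For each point, count how many of its k nearest neighbours carry a different label."""
    nn = knn(points, k)
    return [sum(labels[j] != labels[i] for j in nn[i]) for i in range(len(points))]


# Hypothetical data: a 3-cell subpopulation far away from a 20-cell one.
small = [0.0, 0.1, 0.2]
large = [100.0 + 0.1 * i for i in range(20)]
points = small + large
labels = ["S1"] * len(small) + ["S2"] * len(large)

k = 5
counts = outside_neighbours(points, labels, k)
small_counts = counts[:3]   # [3, 3, 3]: k - (3 - 1) neighbours are forced outside S1
large_counts = counts[3:]   # all 0: S2 has more than enough cells of its own
print(small_counts, max(large_counts))
```

With k = 2 the same three cells find both neighbours inside their own group, so nothing forces them to connect to the larger group.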

ADD COMMENT • link 4.7 years ago Aaron Lun ★ 28k

If you have a subpopulation with fewer than k+1 cells, buildSNNGraph() will forcibly construct edges between cells in that subpopulation and cells in other subpopulations. This increases the risk that the subpopulation will not form its own cluster as it is more interconnected with the rest of the cells in the dataset.

Yes, this is what I was thinking too, but I wouldn't interpret k as the minimum cluster size. Suppose that we have a subpopulation S1 with fewer than k+1 cells. Then the cells in S1 will have a set of cells C from the nearest subpopulation S2 as nearest neighbours. If S2 is large enough and far enough away from S1, then the cells in C will only have nearest neighbours from S2, and so S1 and S2 will not merge.

On the other hand, if S2 is similar in size to S1, then S1 and S2 may indeed merge regardless of their distance. So I would think of k+1 as the size of the smallest discoverable isolated subpopulation (where isolated means that the intra-subpopulation distances are smaller than the distances to the other subpopulations).
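The asymmetry described here can be seen by counting nearest-neighbour links in each direction. This is a rough Python sketch under the same simplification as the argument itself: it looks only at who picks whom as a nearest neighbour, and does not model how buildSNNGraph() turns shared neighbours into edge weights or how a community detection algorithm then behaves. The positions and the helper names `knn` and `cross_links` are hypothetical.

```python
def knn(points, k):
    """For each point, return the indices of its k nearest other points (brute force)."""
    result = []
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: (abs(points[j] - p), j))
        result.append(others[:k])
    return result


def cross_links(points, labels, k):
    """Count directed nearest-neighbour links between groups, keyed by (from, to)."""
    groups = sorted(set(labels))
    counts = {(a, b): 0 for a in groups for b in groups if a != b}
    for i, neighbours in enumerate(knn(points, k)):
        for j in neighbours:
            if labels[i] != labels[j]:
                counts[(labels[i], labels[j])] += 1
    return counts


k = 5
s1 = [0.0, 0.1, 0.2]  # fewer than k+1 cells

# Case 1: S2 is large and far away. Links go from S1 to S2 but not back.
s2_large = [100.0 + 0.1 * i for i in range(20)]
far = cross_links(s1 + s2_large, ["S1"] * 3 + ["S2"] * 20, k)
print(far)      # {('S1', 'S2'): 9, ('S2', 'S1'): 0}

# Case 2: S2 is as small as S1. Links go both ways, however distant the groups are.
s2_small = [1000.0, 1000.1, 1000.2]
similar = cross_links(s1 + s2_small, ["S1"] * 3 + ["S2"] * 3, k)
print(similar)  # {('S1', 'S2'): 9, ('S2', 'S1'): 9}
```

In the second case each cell has only five other cells to choose from, so with k = 5 every cell is a neighbour of every other cell and the distance between the groups plays no role.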

ADD REPLY • link 4.7 years ago Angelos Armen • 0

Sure, but I wouldn't sweat the details. It's subject to enough extra factors that the exact interpretation can't be easily pinned down. I've already updated the documentation to be a bit more precise, but it probably doesn't really matter.

ADD REPLY • link 4.7 years ago Aaron Lun ★ 28k