Inconsistencies in buildSNNGraph() function
Lucas • 0
@b941f4e6
The Netherlands

Working with Spatial Transcriptomics data (Large Spatial Experiment)

I have observed that when I perform dimensionality reduction as follows:

speBreast <- runPCA(speBreast, 
                subset_row = sel)

and then:

speBreast <- runUMAP(speBreast,
              dimred = 'PCA')

I obtain a certain result with the functions buildSNNGraph and igraph::cluster_leiden for the clustering of my samples.

k <- 10
g <- buildSNNGraph(speBreast,
                   k = k,
                   use.dimred = 'PCA')

k <- igraph::cluster_leiden(g, 
                            objective_function = 'modularity',
                            resolution = 1.2)

However, if I omit runUMAP() I obtain a substantially different clustering, even though buildSNNGraph() is called with use.dimred = 'PCA'.

I believe this is caused by runUMAP(), since when I change its dimred parameter from 'PCA' to NULL I obtain exactly the same clustering as when I omit runUMAP() entirely, as described above.

Attached are the results of the Leiden clustering (right subplot) for runUMAP() with dimred = 'PCA' and with dimred = NULL, respectively.

[Figure: when runUMAP dimred = 'PCA']
[Figure: when runUMAP dimred = NULL]

Any help is greatly appreciated

scran seurat • 57 views

I just played with this a bit and I think it comes down to not setting a fixed seed before running the Leiden clustering. The same goes for PCA and UMAP: you should get used to always fixing the seed before running such functions. Please try with fixed seeds and check whether the issue persists.

Aaron Lun ★ 29k
@alun
The city by the bay

The most obvious thing that comes to mind is the RNG seed. cluster_leiden() does some random number generation, somewhere deep in the igraph C code. Similarly, UMAP involves random number generation during the layout optimization and (sometimes) the initialization. If you skip the UMAP step, R's global RNG is in a different state when cluster_leiden() starts running, which changes the results of the clustering.
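To illustrate the mechanism in isolation, here is a minimal base-R sketch. It does not use scater or igraph; consume_rng() is a hypothetical stand-in for any step that draws random numbers (such as a UMAP layout optimization), and sample.int() stands in for the downstream stochastic step.

```r
# Hypothetical stand-in for any stochastic step that draws from R's global RNG.
consume_rng <- function(n = 100) invisible(runif(n))

# Case 1: an extra RNG-consuming step runs before the downstream draw.
set.seed(42)
consume_rng()
with_extra_step <- sample.int(1000, 5)

# Case 2: same starting seed, but the extra step is skipped.
set.seed(42)
without_extra_step <- sample.int(1000, 5)

# The downstream draws differ because the RNG stream was advanced in case 1.
differs <- !identical(with_extra_step, without_extra_step)

# Case 3: resetting the seed immediately before the draw reproduces case 2,
# regardless of what consumed random numbers earlier.
consume_rng()
set.seed(42)
after_reset <- sample.int(1000, 5)
reproduced <- identical(after_reset, without_extra_step)
```

Here differs and reproduced should both be TRUE: the same seed gives different downstream results when something in between consumes random numbers, and resetting the seed right before the downstream step removes that dependence.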

Setting dimred=NULL will instruct runUMAP() to run a PCA internally. If I remember correctly, this defaults to IRLBA, which also hits up the RNG for some more random numbers. I would hypothesize that this changes the RNG state yet again, such that the ensuing sequence of random numbers in cluster_leiden() can recover the previous clustering.

This can be easily tested by calling set.seed() before running cluster_leiden(). Once the seed is set, you should see the same clustering regardless of what you do with runUMAP() beforehand.
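The test can be sketched in a self-contained way as follows. This is only an illustration under stated assumptions: kmeans() is used as a stand-in for the stochastic clustering step (it is not cluster_leiden()), rnorm() is a stand-in for whatever runUMAP() does to the RNG, and cluster_with_seed() and the seed value are hypothetical.

```r
# Hypothetical helper: optionally run an RNG-consuming step, then reset the
# seed immediately before a stochastic clustering step.
cluster_with_seed <- function(x, seed, run_extra_step) {
  if (run_extra_step) {
    invisible(rnorm(1000))   # stand-in for runUMAP() advancing the RNG
  }
  set.seed(seed)             # seed is set right before the clustering
  kmeans(x, centers = 3, nstart = 1)$cluster  # stand-in for cluster_leiden()
}

# Toy data: 100 points in 2 dimensions.
set.seed(1)
x <- matrix(rnorm(200), ncol = 2)

with_umap_like_step <- cluster_with_seed(x, seed = 100, run_extra_step = TRUE)
without_umap_like_step <- cluster_with_seed(x, seed = 100, run_extra_step = FALSE)

# With the seed fixed just before clustering, the assignments should match.
same_clustering <- identical(with_umap_like_step, without_umap_like_step)
```

In the workflow from the question, the equivalent change would be a set.seed() call on the line directly before igraph::cluster_leiden(g, ...), followed by comparing the membership vectors obtained with and without the runUMAP() step.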

FWIW this is no longer a problem in many of my newer packages. I just use a fixed seed for random number generation to avoid these annoying surprises, statistical correctness be damned.
