::Edit:: The embedded images are not formatting well, so I have changed them to external links.
I have been following the DESeq2 vignette for bulk RNA-Seq data and got to the section where it is recommended that you make a heat map of the Euclidean distances between samples.
In addition to the recommended heat map using vst-transformed counts, I decided to also make heat maps using the raw and DESeq-normalized counts. What I found, however, did not make sense.
When using the raw counts to generate the sample distances, I found that one sample stood out as an outlier (S1). As shown below, this sample has high distances to the other samples, with distances to two samples in particular being especially high (S12 and S6).
This was expected, since I knew that this sample showed elevated levels of RNA degradation compared to the others.
However, when using the normalized counts, the result did not make sense. Sample S1 no longer looked like an outlier; it looked normal. Instead, a different sample, S13, looked like an outlier with a nearly identical pattern (overall higher distances, with two samples that were especially high, S6 and S7).
I found this to be concerning as it almost looks like the normalization procedure swapped the sample labels (after some testing, however, I determined this was not the case). This is not what I would have expected from sample normalization, especially with the outlier pattern 'switching' to a different sample.
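To make the question concrete, here is a minimal toy sketch in Python (not my actual DESeq2 code or data) of the comparison I am describing: Euclidean sample distances on raw counts versus counts divided by median-of-ratios size factors. The function names (`size_factors`, `sample_distances`, `most_distant_sample`) and the 4-gene, 3-sample matrix are made up for illustration, and the size-factor calculation is a simplified version of the median-of-ratios idea, so treat it as an assumption rather than the package's implementation.

```python
import math
from statistics import median


def size_factors(counts):
    """Simplified median-of-ratios size factors.

    counts: list of genes, each a list of per-sample counts.
    Only genes with a nonzero count in every sample are used.
    """
    n_samples = len(counts[0])
    usable = [g for g in counts if all(c > 0 for c in g)]
    # Log geometric mean of each usable gene across samples.
    log_geo_means = [sum(math.log(c) for c in g) / n_samples for g in usable]
    factors = []
    for j in range(n_samples):
        log_ratios = [math.log(g[j]) - m for g, m in zip(usable, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors


def normalize(counts):
    """Divide each sample's counts by its size factor."""
    sf = size_factors(counts)
    return [[c / s for c, s in zip(g, sf)] for g in counts]


def sample_distances(counts):
    """Pairwise Euclidean distances between samples (columns)."""
    n_samples = len(counts[0])
    cols = [[g[j] for g in counts] for j in range(n_samples)]
    return [[math.dist(a, b) for b in cols] for a in cols]


def most_distant_sample(dist):
    """Index of the sample with the largest mean distance to the others."""
    n = len(dist)
    means = [sum(row) / (n - 1) for row in dist]
    return means.index(max(means))


# Hypothetical toy matrix: rows are genes, columns are samples 0, 1, 2.
# Sample 0 has roughly twice the depth; sample 2 differs in the last gene only.
toy_counts = [
    [200, 100, 100],
    [400, 200, 200],
    [600, 300, 300],
    [800, 400, 100],
]

raw_outlier = most_distant_sample(sample_distances(toy_counts))
norm_outlier = most_distant_sample(sample_distances(normalize(toy_counts)))
print("most distant sample (raw counts):", raw_outlier)
print("most distant sample (normalized counts):", norm_outlier)
```

Comparing `raw_outlier` and `norm_outlier` shows whether the sample with the largest mean distance changes once sequencing depth is divided out. I do not know whether this is related to what I see in my real data.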
Does anyone have an idea what might be going on? Or is there something I could do to better understand what is happening here?