I have a question about heatmap visualization of bulk RNA-seq gene expression data from different datasets, using row-wise z-score scaling. With the same gene list, a heatmap built only from samples of a single dataset highlights differences in gene expression between patients and controls (Figure 1). When the matrix also includes samples from other datasets, the differences between patients and controls seem to disappear, and instead samples from different datasets show opposite expression trends (Figure 2). Can you give me some suggestions on how to solve this problem?
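There can be considerable differences between datasets just due to different sample handling, sequencing methods, and other batch effects, and these may be dominating the differential expression you see within your original dataset. Because the row z-score is computed across all samples in the matrix, a large offset between datasets ends up driving the colors and the smaller patient vs control difference gets compressed.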
For a heatmap, you could consider trying to subtract out the dataset effect, for example with limma::removeBatchEffect.
I'm not sure if you have both conditions in both datasets? If not, there might not be much value in combining the datasets.
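To make the idea of subtracting the dataset effect concrete, here is a minimal sketch in plain Python for a single gene. It is not limma's implementation: it simply centers each dataset on the overall mean, which is roughly what a batch-only correction amounts to in the simplest case. The expression values, the dataset labels and the helper names row_zscore and center_by_dataset are all made up for illustration.

```python
from statistics import mean, pstdev

# Hypothetical log-expression values for ONE gene across 8 samples.
expr      = [5.0, 5.2, 6.0, 6.2, 9.0, 9.2, 10.0, 10.2]
dataset   = ["A", "A", "A", "A", "B", "B", "B", "B"]
condition = ["control", "control", "patient", "patient"] * 2


def row_zscore(values):
    """Z-score one row (one gene) across all samples."""
    m = mean(values)
    s = pstdev(values)
    return [(v - m) / s for v in values]


def center_by_dataset(values, datasets):
    """Shift each dataset so its mean equals the overall mean of the row."""
    overall = mean(values)
    ds_mean = {
        d: mean([v for v, dd in zip(values, datasets) if dd == d])
        for d in set(datasets)
    }
    return [v - ds_mean[d] + overall for v, d in zip(values, datasets)]


raw_z = row_zscore(expr)
corrected_z = row_zscore(center_by_dataset(expr, dataset))

for d, c, r, k in zip(dataset, condition, raw_z, corrected_z):
    print(f"{d} {c:8s} raw z = {r:+.2f}   corrected z = {k:+.2f}")
```

In the raw z-scores, every sample from dataset A is negative and every sample from dataset B is positive, so the heatmap colors follow the dataset. After centering, controls are negative and patients positive in both datasets. This only behaves well here because both datasets contain both conditions in the same proportions; if one dataset were all patients and the other all controls, the same centering would remove the biological difference along with the dataset effect.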