Hi, I am working on a differential expression analysis of multiple leukemia RNA-seq datasets retrieved from SRA. One of my datasets contains both normal and leukemic samples, whereas the other two contain only leukemic samples. Although I set the normal samples as the reference level, the sample distance matrix plot clusters samples by dataset of origin: samples from the same dataset group together regardless of whether they are normal or leukemic. Moreover, the list of significantly differentially expressed genes produced by DESeq2 changes when I use samples from multiple datasets instead of just one. I think this problem arises from the different library preparation and sequencing protocols of each dataset (batch effects), if I am right. I would be grateful if someone could help me fix this issue so that I can obtain a correct gene list and plot.
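To make the layout concrete, here is a minimal Python sketch of my sample design. The dataset labels (A, B, C) and the per-group sample counts are hypothetical placeholders, not my real numbers, and the `~ dataset + condition` model is only an assumption about how a dataset term might be added. It builds the corresponding model matrix with NumPy, checks whether it is full rank, and lists which datasets contain both conditions.

```python
import numpy as np

# Hypothetical layout: (dataset, condition) -> number of samples.
# Labels and counts are placeholders for illustration only.
layout = {
    ("A", "normal"): 3,
    ("A", "leukemic"): 3,
    ("B", "leukemic"): 3,
    ("C", "leukemic"): 3,
}
datasets = ["A", "B", "C"]
conditions = ("normal", "leukemic")


def design_row(dataset, condition):
    """One model-matrix row for ~ dataset + condition.

    Reference levels are dataset A and condition 'normal'.
    Columns: intercept, dataset B, dataset C, condition leukemic.
    """
    return [
        1.0,
        1.0 if dataset == "B" else 0.0,
        1.0 if dataset == "C" else 0.0,
        1.0 if condition == "leukemic" else 0.0,
    ]


# Repeat each group's row once per sample in that group.
rows = [design_row(d, c) for (d, c), n in layout.items() for _ in range(n)]
X = np.array(rows)

# Full rank means the dataset and condition coefficients can be told apart.
full_rank = bool(np.linalg.matrix_rank(X) == X.shape[1])

# Only datasets holding both conditions compare normal and leukemic
# samples within the same batch.
informative = [d for d in datasets if all((d, c) in layout for c in conditions)]

print("design is full rank:", full_rank)
print("datasets with both conditions:", informative)
```

If this sketch reflects my situation correctly, the model is estimable, but only the dataset that contains both conditions provides a within-batch comparison of normal and leukemic samples.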