Hello, everyone! I am working with pseudobulk RNA-seq data and am having trouble designing an appropriate analysis, because the batch effects are confounded with the conditions and the conditions are unbalanced. Here is a summary of my data.
Challenges:
- The Diagnosis groups (e.g. healthy vs. cancer) are not represented in the same batches, so Diagnosis is confounded with Batch. This makes it impossible to adjust for batch effects using the typical design `~ Diagnosis + Batch`.
- I'm interested in comparing healthy vs. cancer samples while removing batch effects.
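To make the confounding concrete, here is a minimal Python/NumPy sketch (not edgeR code) that builds a treatment-coded design matrix for `~ Diagnosis + Batch`, with healthy and dataset1 as reference levels, similar in structure to what R's `model.matrix` would produce. The sample layout (dataset1 containing only healthy samples, dataset2 and dataset3 containing only cancer samples) and the helper `treatment_dummies` are hypothetical, assumed purely for illustration.

```python
import numpy as np

# Hypothetical layout (assumption): dataset1 holds only healthy samples,
# dataset2 and dataset3 hold only cancer samples, two samples each.
diagnosis = ["healthy", "healthy", "cancer", "cancer", "cancer", "cancer"]
batch = ["dataset1", "dataset1", "dataset2", "dataset2", "dataset3", "dataset3"]


def treatment_dummies(values, reference):
    """Hypothetical helper: 0/1 columns for every level except `reference`."""
    levels = [lvl for lvl in sorted(set(values)) if lvl != reference]
    cols = np.array([[1.0 if v == lvl else 0.0 for lvl in levels] for v in values])
    return cols, levels


diag_cols, diag_levels = treatment_dummies(diagnosis, reference="healthy")
batch_cols, batch_levels = treatment_dummies(batch, reference="dataset1")

# Intercept + Diagnosis dummy + Batch dummies, i.e. ~ Diagnosis + Batch.
intercept = np.ones((len(diagnosis), 1))
design = np.hstack([intercept, diag_cols, batch_cols])
column_names = (["(Intercept)"]
                + ["Diagnosis" + lvl for lvl in diag_levels]
                + ["Batch" + lvl for lvl in batch_levels])

rank = np.linalg.matrix_rank(design)
print(column_names)
print("columns:", design.shape[1], "rank:", rank)

# The Diagnosis column is exactly the sum of the two batch columns,
# which is why the matrix is rank-deficient.
print(np.allclose(design[:, 1], design[:, 2] + design[:, 3]))
```

Under this assumed layout the script reports 4 columns but rank 3: the Diagnosis column is a linear combination of the batch columns, so no linear model on this design can separate the Diagnosis effect from the batch effects.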
Questions:
- Is there an alternative model or approach in edgeR, limma-voom, DESeq2, or any other tool that can handle confounded batch effects? (Currently I'm using edgeR with Diagnosis as the only factor in the design matrix, but the MDS plot shows separate clusters for dataset1, dataset2, and dataset3.)
- Would combining Diagnosis and Batch into a single group factor be advisable here?
- Are there any tools that accept preprocessed (batch-corrected) data for differential expression analysis? (I assume edgeR only works with raw counts.)
Thank you in advance for the help.