I'm using DESeq2 for differential gene expression analysis, and for downstream analysis I'm going to classify healthy vs. disease states using a support vector machine. The data comprise healthy and disease samples, which were sequenced in 13 batches. I have 2 questions:
1) I used design = ~ batch + condition, and when I ran resultsNames(dds), the results I found were only comparisons of each batch against the first batch, whereas I'm looking for all differentially expressed genes between healthy and disease states across all samples.
How can I find all differentially expressed genes between healthy and disease states? Should I ignore batch effects?
2) DESeq2 doesn't remove batch effects; it only models them. So how can I use this in my classification? I'm using the FPKM values of the differentially expressed genes as input to the support vector machine.
Many thanks