I have a somewhat complex experimental design for a differential expression (DEG) analysis, and I want to make sure I am setting it up correctly, because I have been getting some strange results.
Description of the Data
- I have three treatments: A, B, and C.
- The design is partially paired: some samples come from the same patient, but not all. For example, patient X might contribute two samples that differ by treatment. However, only a fraction of the samples are paired.
- The samples also have batch effects, because they were processed in different batches.
- The samples come from different tissues (e.g., lung, breast).
My objective is to find genes differentially expressed between treatments A and C while controlling for patient ID, batch, and tissue.
Based on my understanding of linear models, I think the design formula should be ~ PatientID + Batch + Tissue + Treatment, as follows:
dds <- DESeqDataSetFromMatrix(countData = counts, colData = coldata, design = ~ PatientID + Batch + Tissue + Treatment)
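One quick check on whether a design like this can be fit as intended is to build the model matrix in base R and compare its rank with its number of columns; DESeq2 refuses to fit a design whose model matrix is not full rank. The sketch below is only an illustration on an invented toy sample sheet: the column names (PatientID, Batch, Tissue, Treatment) and every value are assumptions, not the real data. It uses only base R's `model.matrix` and `qr`.

```r
# Hypothetical toy sample sheet (all values invented for illustration):
# 8 samples from 6 patients; only P1 and P2 contribute two samples each,
# and each patient's samples come from a single tissue.
coldata <- data.frame(
  PatientID = factor(c("P1", "P1", "P2", "P2", "P3", "P4", "P5", "P6")),
  Batch     = factor(c("b1", "b2", "b1", "b2", "b1", "b2", "b1", "b2")),
  Tissue    = factor(c("lung", "lung", "breast", "breast",
                       "lung", "breast", "lung", "breast")),
  Treatment = factor(c("A", "C", "A", "C", "B", "A", "C", "B"),
                     levels = c("A", "B", "C"))
)

# Patients contributing more than one sample (the paired part of the design).
paired <- names(which(table(coldata$PatientID) > 1))

# Rank of the design matrix implied by a formula.
rank_of <- function(formula, data) qr(model.matrix(formula, data = data))$rank

# Full design from the post: compare rank against the number of coefficients.
mm <- model.matrix(~ PatientID + Batch + Tissue + Treatment, data = coldata)
full_rank <- qr(mm)$rank == ncol(mm)
cat("paired patients:", paired, "\n")
cat("columns:", ncol(mm), " rank:", qr(mm)$rank, " full rank:", full_rank, "\n")

# Tissue is constant within each patient here, so adding it on top of
# PatientID adds no new column space (the two ranks are equal).
cat("rank ~ PatientID:         ", rank_of(~ PatientID, coldata), "\n")
cat("rank ~ PatientID + Tissue:", rank_of(~ PatientID + Tissue, coldata), "\n")
```

In this toy layout, each unpaired patient has a single sample, so its own PatientID coefficient can fit that sample exactly; only the paired patients then carry information about Treatment, and Tissue is redundant with PatientID.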
Here, coldata is the sample information table (one row per sample, with the PatientID, Batch, Tissue, and Treatment columns). The results look odd: the DEGs are heavily skewed in one direction, with the numbers of up- and downregulated genes nowhere near balanced, and some of the top genes are strongly associated with particular tissues. This may simply be because the samples are not spread evenly across treatments, tissues, and patients, in which case it may be the best I can do. Still, I want to make sure my setup is correct, or find out whether there is a better way to do this.

I also keep the samples from treatment B even though I am not looking for DEGs involving B. My reasoning is that, since these correspond to different control conditions, it is best to leave them in so that the effect of each control variable on each gene is estimated better. Thanks!
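To see how unevenly the treatments are spread over tissues and batches, cross-tabulating the sample sheet is a quick first check. The sheet below is an invented toy example (column names and values are assumptions, not the actual data). An empty cell means that treatment was never observed in that tissue or batch, so comparisons involving it rely on the model's additive assumptions rather than on direct within-tissue contrasts.

```r
# Hypothetical toy sample sheet (values invented for illustration only).
coldata <- data.frame(
  Tissue    = factor(c("lung", "lung", "lung", "lung",
                       "breast", "breast", "breast", "lung")),
  Batch     = factor(c("b1", "b1", "b2", "b2", "b1", "b2", "b2", "b1")),
  Treatment = factor(c("A", "A", "A", "C", "C", "C", "B", "B"),
                     levels = c("A", "B", "C"))
)

# Number of samples for each treatment within each tissue and each batch.
by_tissue <- table(Treatment = coldata$Treatment, Tissue = coldata$Tissue)
by_batch  <- table(Treatment = coldata$Treatment, Batch = coldata$Batch)
print(by_tissue)
print(by_batch)

# Share of each tissue's samples that received each treatment (columns sum to 1).
print(round(prop.table(by_tissue, margin = 2), 2))

# Flag treatment/tissue combinations that were never sampled.
has_empty_cell <- any(by_tissue == 0)
cat("empty treatment x tissue cells:", has_empty_cell, "\n")
```

With B kept in the fitted model, DESeq2 can still report only the A-versus-C comparison, e.g. `results(dds, contrast = c("Treatment", "C", "A"))`, which gives log2 fold changes of C relative to A. The B samples stay in the same fit, so they can contribute to shared estimates (such as dispersions and batch or tissue effects) where the design allows.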