Greetings! I am seeking advice on analyzing an MPRA dataset with DESeq2. We chose DESeq2 based on its successful application in the Abell et al. (2022) MPRA paper, which used a similar haplotype-based MPRA design. I expected that, with some minor adjustments, their approach could be adapted to our dataset as well.
Our library includes variant pairs in close proximity (up to 75 bp apart) and two enhancer types: blood-specific enhancers and tissue-wide enhancers. Each variant pair is represented by 200 bp oligos (170 bp genomic sequence + adapters), covering all four haplotypes per SNP pair. Here is an example of how one haplotype group is encoded:
Blood_specific_1000_alt1_alt2_C/C_anc/der.
Blood_specific_1000_alt1_ref2_C/A_anc/anc.
Blood_specific_1000_ref1_alt2_T/C_der/der.
Blood_specific_1000_ref1_ref2_T/A_der/anc.
My goal is to identify which non-reference haplotypes are differentially active compared to the corresponding ref/ref haplotype within each group.
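To make the encoding and the intended comparisons concrete, here is a minimal Python sketch of how the oligo names could be split into a group ID and an allele combination, and how the non-reference haplotypes could be paired with ref/ref within each group. The field names (`group`, `allele_combination`, etc.), the `ref_ref` label, and both helper functions are hypothetical and only assume the naming pattern shown in the example above.

```python
import re
from collections import defaultdict

# Oligo names copied from the example haplotype group (trailing periods dropped).
names = [
    "Blood_specific_1000_alt1_alt2_C/C_anc/der",
    "Blood_specific_1000_alt1_ref2_C/A_anc/anc",
    "Blood_specific_1000_ref1_alt2_T/C_der/der",
    "Blood_specific_1000_ref1_ref2_T/A_der/anc",
]

# Assumed pattern: <group>_<ref|alt>1_<ref|alt>2_<allele/allele>_<anc|der>/<anc|der>
NAME_RE = re.compile(
    r"^(?P<group>.+)_(?P<a1>ref|alt)1_(?P<a2>ref|alt)2_"
    r"(?P<alleles>[ACGT]/[ACGT])_(?P<ancestry>(?:anc|der)/(?:anc|der))$"
)

def parse_oligo_name(name):
    """Split one oligo name into its fields (hypothetical helper)."""
    m = NAME_RE.match(name.strip().rstrip("."))
    if m is None:
        raise ValueError(f"unexpected oligo name: {name!r}")
    return {
        "group": m["group"],
        "allele_combination": f"{m['a1']}_{m['a2']}",
        "alleles": m["alleles"],
        "ancestry": m["ancestry"],
    }

def contrasts_by_group(names, reference="ref_ref"):
    """For each group, pair every non-reference haplotype with the reference."""
    groups = defaultdict(list)
    for name in names:
        rec = parse_oligo_name(name)
        groups[rec["group"]].append(rec["allele_combination"])
    out = {}
    for group, combos in groups.items():
        if reference not in combos:
            raise ValueError(f"group {group!r} has no {reference} haplotype")
        out[group] = [(c, reference) for c in sorted(combos) if c != reference]
    return out

for group, pairs in contrasts_by_group(names).items():
    for test, ref in pairs:
        print(f"{group}: {test} vs {ref}")
```

Each group yields three comparisons against ref/ref, which is the set of contrasts I am hoping to test.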
My current design formula is: ~ material + allele_combination + material:replicate + material:allele_combination. I included replicate as an interaction term nested within material, but there has been some disagreement about whether this is appropriate, and I may be overfitting the design.
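As a rough way to frame the overfitting question, the sketch below counts the coefficients this formula would produce and checks whether the resulting design matrix is full rank. It rests on several assumptions that are not stated above: `material` has two levels (labelled DNA and RNA here), there are three replicates, each count-matrix column is one material x replicate x allele_combination combination, and the dummy coding mimics treatment contrasts (with `material:replicate` expanded as replicate within each material level). The actual matrix built by R for my data may differ, so this is only an illustration.

```python
from fractions import Fraction
from itertools import product

# Hypothetical factor levels; the first level of each factor is the baseline.
materials = ["DNA", "RNA"]
replicates = ["1", "2", "3"]
alleles = ["ref_ref", "ref_alt", "alt_ref", "alt_alt"]

def design_row(material, replicate, allele):
    """One row of an assumed treatment-coded design matrix for
    ~ material + allele_combination + material:replicate + material:allele_combination."""
    row = [1]                                                      # intercept
    row += [int(material == m) for m in materials[1:]]             # material
    row += [int(allele == a) for a in alleles[1:]]                 # allele_combination
    row += [int(material == m and replicate == r)                  # replicate nested in material
            for r in replicates[1:] for m in materials]
    row += [int(material == m and allele == a)                     # material:allele_combination
            for m in materials[1:] for a in alleles[1:]]
    return row

def matrix_rank(rows):
    """Exact rank via Gauss-Jordan elimination on fractions."""
    m = [[Fraction(v) for v in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        pivot = next((i for i in range(rank, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for i in range(len(m)):
            if i != rank and m[i][col] != 0:
                factor = m[i][col] / m[rank][col]
                m[i] = [x - factor * y for x, y in zip(m[i], m[rank])]
        rank += 1
    return rank

samples = list(product(materials, replicates, alleles))
X = [design_row(*s) for s in samples]
n_samples, n_coef = len(X), len(X[0])
rank = matrix_rank(X)
print(f"columns={n_samples} coefficients={n_coef} rank={rank} "
      f"residual_df={n_samples - rank}")
```

Under these assumed levels the design has 12 coefficients for 24 columns and is full rank, so the nested replicate term costs four coefficients (two per material). Whether that is too many for the real number of replicates is exactly what I am unsure about.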
I would be grateful for any advice, comments, or discussion on the best design to use in this case.
Thank you very much, I appreciate it!