Question

DESeq2 design for haplotype MPRA

Rita • 0

Greetings! I am seeking advice on analysing an MPRA dataset with DESeq2. We chose DESeq2 because it was applied successfully in the Abell et al., 2022 MPRA paper, which used a similar haplotype-based MPRA design. I expected that, with some minor adjustments, their approach could be adapted to our dataset as well.

Our library includes variant pairs in close proximity (up to 75 bp apart) and two enhancer types: blood-specific enhancers and tissue-wide enhancers. Each variant pair is represented by 200 bp oligos (170 bp genomic sequence + adapters), covering all four haplotypes per SNP pair. Here is an example of how one haplotype group is encoded:

Blood_specific_1000_alt1_alt2_C/C_anc/der.
Blood_specific_1000_alt1_ref2_C/A_anc/anc.
Blood_specific_1000_ref1_alt2_T/C_der/der.
Blood_specific_1000_ref1_ref2_T/A_der/anc.

My goal is to identify which non-reference haplotypes are differentially active compared to the corresponding ref/ref haplotype within each group.

My current design formula is:

~ material + allele_combination + material:replicate + material:allele_combination

I included replicate as an interaction term nested within material, but there has been some disagreement about whether this is appropriate, and I may be overfitting the design.

I would be grateful for any advice, comments, or discussion on what the best design would be in this case. Thanks

score 2 · Answer 1 · 2025-10-14

> My goal is to identify which non-reference haplotypes are differentially active compared to the corresponding ref/ref haplotype within each group.

You can use DESeq2 for this, but note that you can also use, e.g., the mpra Bioconductor package with limma-voom; both approaches use a design matrix and coefficient contrasts to perform the comparisons.

Use one size factor per sample, not per allele. You can do this by estimating size factors on the matrix of _sample_ counts, and then assigning those size factors to the matching columns once you have a matrix of _sample x allele_ counts. That is, use estimateSizeFactors() and then sizeFactors()<-.

~ material + allele + material:rep + material:allele looks good: it gives you a term per rep and lets you compare allelic effects across the groups defined by material. I think you can drop the allele main effect term from the formula to obtain per-material allelic effects.
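
The two points above could be sketched roughly as follows. This is only an illustration on simulated counts, not the poster's pipeline: it assumes `material` has the two levels DNA and RNA (the thread does not spell this out), three replicates per material, and four haplotypes per variant pair. All object names (`oligo_counts`, `wide`, `col_info`, and so on) are hypothetical.

```r
library(DESeq2)
set.seed(1)

# Toy dimensions and labels (all hypothetical)
n_pairs <- 200
haps <- c("ref1_ref2", "ref1_alt2", "alt1_ref2", "alt1_alt2")
samples <- data.frame(
  material  = factor(rep(c("DNA", "RNA"), each = 3), levels = c("DNA", "RNA")),
  replicate = factor(rep(1:3, times = 2))
)
rownames(samples) <- paste(samples$material, samples$replicate, sep = "_")

# Simulated oligo x sample counts; rows run pair by pair, haplotype fastest
pair_mu <- rep(2^runif(n_pairs, 6, 10), each = length(haps))
oligo_counts <- matrix(
  rnbinom(length(pair_mu) * nrow(samples), mu = pair_mu, size = 10),
  ncol = nrow(samples), dimnames = list(NULL, rownames(samples))
)

# Step 1: one size factor per sample, estimated on the sample-level matrix
dds_samples <- DESeqDataSetFromMatrix(oligo_counts, colData = samples, design = ~ 1)
sf <- sizeFactors(estimateSizeFactors(dds_samples))

# Step 2: reshape to variant pair x (sample, haplotype) columns
col_info <- expand.grid(allele = haps, sample = rownames(samples),
                        stringsAsFactors = FALSE)
wide <- t(sapply(seq_len(n_pairs), function(p) {
  rows <- (p - 1) * length(haps) + seq_along(haps)
  as.vector(oligo_counts[rows, ])  # haplotype fastest within sample, as in col_info
}))
dimnames(wide) <- list(paste0("pair", seq_len(n_pairs)),
                       paste(col_info$sample, col_info$allele, sep = "."))
wide_coldata <- data.frame(
  material  = samples[col_info$sample, "material"],
  replicate = samples[col_info$sample, "replicate"],
  allele    = factor(col_info$allele, levels = haps),  # ref/ref is the reference level
  row.names = colnames(wide)
)

# Step 3: nested design from the answer; confirm it is full rank before fitting
design <- ~ material + allele + material:replicate + material:allele
mm <- model.matrix(design, wide_coldata)
stopifnot(qr(mm)$rank == ncol(mm))

dds <- DESeqDataSetFromMatrix(wide, colData = wide_coldata, design = design)
sizeFactors(dds) <- unname(sf[col_info$sample])  # same factor for every allele of a sample
dds <- DESeq(dds)

# Interaction coefficients: each non-reference haplotype vs ref/ref,
# in RNA beyond the corresponding difference in DNA
coefs <- grep("^materialRNA\\.allele", resultsNames(dds), value = TRUE)
res <- lapply(setNames(coefs, coefs), function(nm) results(dds, name = nm))
```

In this toy layout the design has 12 coefficients for 24 columns, and the rank check is one quick way to see whether the nested `material:replicate` term makes the model unestimable. Dropping the `allele` main effect, as suggested above, changes which coefficients appear in `resultsNames(dds)`, so it is worth inspecting that list before picking names for `results()`.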

score 0 · Answer 2 · 2025-11-18

aimesd808 • 0

I appreciate you telling me about this. Now it's crystal clear to me.
