DESeq2 design for haplotype MPRA
Rita • 0
@9eba1973
Estonia

Greetings! I am seeking advice on analysing an MPRA dataset with DESeq2. We chose DESeq2 because it was applied successfully in the Abell et al. (2022) MPRA paper, which used a similar haplotype-based MPRA design. I expected that, with some minor adjustments, their approach could be adapted to our dataset as well.

Our library includes variant pairs in close proximity (up to 75 bp apart) and two enhancer types: blood-specific enhancers and tissue-wide enhancers. Each variant pair is represented by 200 bp oligos (170 bp genomic sequence plus adapters), covering all four haplotypes per SNP pair. An example of the encoding for one haplotype group:

Blood_specific_1000_alt1_alt2_C/C_anc/der.
Blood_specific_1000_alt1_ref2_C/A_anc/anc.
Blood_specific_1000_ref1_alt2_T/C_der/der.
Blood_specific_1000_ref1_ref2_T/A_der/anc.

My goal is to identify which non-reference haplotypes are differentially active compared to the corresponding ref/ref haplotype within each group.

My current design formula is: ~ material + allele_combination + material:replicate + material:allele_combination. I included replicate as an interaction term nested within material, but there has been some disagreement about whether this is appropriate, and I may be overfitting the design.
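One way to check the overfitting concern is to build the design matrix on a toy sample sheet and compare the number of coefficients with the number of columns. The sketch below uses base R only; the material levels "DNA" and "RNA", the three replicates, and the haplotype level names are assumptions for illustration and are not taken from the actual dataset.

```r
# Hypothetical sample sheet: one row per material x replicate x haplotype column.
# "DNA"/"RNA", three replicates and the haplotype labels are illustrative assumptions.
coldata <- expand.grid(
  allele_combination = c("ref1_ref2", "ref1_alt2", "alt1_ref2", "alt1_alt2"),
  replicate = factor(1:3),
  material = c("DNA", "RNA")
)

# The design from the question, with replicate nested within material.
design <- ~ material + allele_combination + material:replicate +
  material:allele_combination
mm <- model.matrix(design, data = coldata)

n_coef <- ncol(mm)                   # number of coefficients to estimate
full_rank <- qr(mm)$rank == n_coef   # TRUE if no coefficient is redundant
colnames(mm)                         # inspect which terms R actually created
```

In this toy layout the matrix has 12 coefficients for 24 columns and is full rank, so the nested replicate term is estimable here. Whether the same holds for the real data depends on how many replicates and materials it actually contains.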

I would be grateful for any advice or comments on what the best design would be in this case. Thanks!

Bioconductor • 47 views
@mikelove
United States

> My goal is to identify which non-reference haplotypes are differentially active compared to the corresponding ref/ref haplotype within each group.

You can use DESeq2 for this, but note that you can also use, for example, the mpra Bioconductor package with limma-voom, since both approaches use a design matrix and coefficient contrasts to perform comparisons.

  • Use one size factor per sample, not per allele. You can do this by estimating size factors on the matrix of _sample_ counts, and then assigning those size factors to the matching columns once you have a matrix of _sample x allele_ counts. That is, use estimateSizeFactors() and then sizeFactors()<-.
  • ~ material + allele + material:rep + material:allele looks good: it gives you a term per rep and lets you compare allelic effects across the different material groups. I think you can drop the allele main effect term from the formula to obtain per-material allelic effects.
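The two points above can be sketched roughly as follows. This is a minimal illustration on simulated counts, not the poster's actual pipeline: the sample sheet, the "DNA"/"RNA" levels, the two-allele simplification and the column naming are all assumptions. It also uses estimateSizeFactorsForMatrix(), the matrix-level counterpart of estimateSizeFactors() in DESeq2, so that the size factors can be computed before the dataset object exists.

```r
library(DESeq2)

set.seed(1)
# Hypothetical layout: 2 materials x 3 replicates = 6 samples, 2 alleles per sample.
coldata <- expand.grid(allele = c("ref", "alt"), rep = factor(1:3),
                       material = c("DNA", "RNA"))
coldata$sample <- paste(coldata$material, coldata$rep, sep = "_")
rownames(coldata) <- paste(coldata$sample, coldata$allele, sep = "_")

# Simulated oligo x (sample x allele) count matrix, for illustration only.
n_oligo <- 50
counts <- matrix(rnbinom(n_oligo * nrow(coldata), mu = 200, size = 10),
                 nrow = n_oligo,
                 dimnames = list(paste0("oligo", seq_len(n_oligo)),
                                 rownames(coldata)))

# Collapse allele columns to one column per sample, then estimate size factors there.
sample_counts <- t(rowsum(t(counts), group = coldata$sample))
sf <- estimateSizeFactorsForMatrix(sample_counts)

# Design without the allele main effect: one allelic effect per material.
dds <- DESeqDataSetFromMatrix(countData = counts, colData = coldata,
                              design = ~ material + material:rep + material:allele)

# Both allele columns of a sample receive that sample's size factor.
sizeFactors(dds) <- unname(sf[coldata$sample])
```

After running DESeq(dds), resultsNames(dds) lists the exact coefficient names, and the per-material allele terms can then be compared with a contrast in results().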

Thank you very much, I appreciate it!
