Apologies for cross-posting on Biostars, but the part most relevant to me involves procedures using Bioconductor packages. I'm working on a haplotype-resolved diploid assembly of a plant genome, in which each chromosome is represented by two FASTA/GTF pairs rather than a single consensus. I want to carry out count-based differential expression analysis of bulk RNA-seq data with Bioconductor (e.g. limma, edgeR or DESeq2), but I'm unsure how to adapt the standard workflow to this dual-sequence setup.
Experimental Design:
- Organism: Plant.
- Samples: 3 replicates of Condition A and 3 replicates of Condition B.
- Data: Paired-end RNA-seq reads (150 bp, 30 million reads per sample) aligned to a haplotype-resolved genome assembly.
- Goal: Identify DE genes between Conditions A and B, accounting for haplotype-specific expression.
I would appreciate your opinions on:
- Would it make sense to concatenate the two haplotype FASTAs (and GTFs) into one "merged" reference, or would it be better to keep them separate and run two parallel alignments?
I would also like to know how to use the Subread package to take haplotype information into account.
I was also wondering how to build the count matrix:
- Option A: Separate counts per haplotype (two columns per sample) and then sum counts for downstream DE?
- Option B: Sum at the gene level before DE and ignore haplotype origin?
- Option C: Test allele-specific expression by including haplotype as a factor in the design?
About the statistical modelling in DESeq2/limma/edgeR:
- If I keep haplotypes separate, can I simply aggregate counts (geneA_hap1 + geneA_hap2) into a single count per sample?
- If I wish to model allele-specific changes (e.g. hap1 vs. hap2 expression bias across conditions), what design formulas or contrasts are recommended?
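To make the aggregation question concrete, here is a toy sketch in plain Python (not Bioconductor code) of what I mean by summing geneA_hap1 + geneA_hap2 per sample. The feature IDs, the `_hap1`/`_hap2` suffix convention, the sample order and the counts are all made up for illustration. In a real assembly the two haplotype annotations may not share gene IDs, so I assume an explicit allele-pairing table would be needed instead of the suffix trick.

```python
# Toy per-haplotype counts: feature ID -> counts for samples A1, A2, B1, B2.
# The "<gene>_hap1" / "<gene>_hap2" naming is a hypothetical convention.
hap_counts = {
    "geneA_hap1": [10, 12, 30, 28],
    "geneA_hap2": [5, 7, 9, 11],
    "geneB_hap1": [100, 90, 95, 105],  # no hap2 copy annotated
}


def sum_to_gene_level(counts):
    """Sum the rows of all haplotype copies of each gene, sample by sample."""
    gene_counts = {}
    for feature_id, row in counts.items():
        # Split "geneA_hap1" into the gene ID and the haplotype suffix.
        gene_id, _, _haplotype = feature_id.rpartition("_")
        if gene_id in gene_counts:
            gene_counts[gene_id] = [a + b for a, b in zip(gene_counts[gene_id], row)]
        else:
            gene_counts[gene_id] = list(row)
    return gene_counts


gene_counts = sum_to_gene_level(hap_counts)
for gene_id, row in gene_counts.items():
    print(gene_id, row)
```

Here geneB has no hap2 copy, so its gene-level count is just the hap1 count; how such unpaired genes should be handled is part of what I am asking.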
I know this is a lot of questions, and they are not really focused on code that can actually be used. My idea was to get some general opinions about the procedure first, and then focus on the specific code for the analysis. Thanks in advance!
Given that you have cross-posted to Biostars and under a different name, it would be helpful to link to the cross-post: https://www.biostars.org/p/9612183/