Hello,
This question is cross-posted from Biostars, as I wasn't sure which platform is more appropriate for it.
I am studying the gene expression of a species that has undergone a duplication event. I have a synteny table of gene duplicates for multiple tissue types, which was derived using the genome of a related ancestral species (that existed prior to the duplication event).
I want to identify loci where the two duplicates have significantly different expression levels, and I was wondering whether I could use DESeq2 to do this. In particular, I was going to set up a count table whose columns are samples covering all tissue x duplicate combinations, as follows:
Locus_id | t1_d1_r1 | t1_d1_r2 | t1_d1_r3 | t1_d2_r1 | t1_d2_r2 | t1_d2_r3 | t2_d1_r1 | t2_d1_r2 | t2_d1_r3 | ...
Here t denotes the tissue type, d denotes the duplicate (corresponding to subgenomes 1 and 2), and r denotes one of three replicates. I was then considering a design that can identify differentially expressed loci, for example by conducting a likelihood ratio test (LRT) to see whether the duplicate factor is significant in the design.
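To make the layout concrete, here is a minimal Python sketch (not DESeq2 itself) of how I would derive the sample annotation from the column names and what the full and reduced design matrices for a likelihood ratio test would look like. The two-tissue example and the helper names `parse_sample` and `design_row` are hypothetical and only for illustration; in DESeq2 terms I have in mind something like a full design of ~ tissue + duplicate against a reduced design of ~ tissue with test="LRT".

```python
import re
from itertools import product

# Hypothetical column names following the t<tissue>_d<duplicate>_r<replicate>
# pattern; two tissues are assumed here purely for illustration.
columns = [f"t{t}_d{d}_r{r}" for t, d, r in product((1, 2), (1, 2), (1, 2, 3))]

PATTERN = re.compile(r"^t(\d+)_d(\d+)_r(\d+)$")


def parse_sample(name):
    """Split a column name into tissue, duplicate and replicate labels."""
    match = PATTERN.match(name)
    if match is None:
        raise ValueError(f"unexpected column name: {name}")
    tissue, duplicate, replicate = match.groups()
    return {
        "sample": name,
        "tissue": f"t{tissue}",
        "duplicate": f"d{duplicate}",
        "replicate": f"r{replicate}",
    }


def design_row(sample, tissues, include_duplicate):
    """Build one design-matrix row: intercept, tissue dummies, optional duplicate term."""
    row = [1]  # intercept
    # The first tissue level is the reference, so it gets no column.
    row += [1 if sample["tissue"] == t else 0 for t in tissues[1:]]
    if include_duplicate:
        # d1 is the reference level; the column marks samples from duplicate d2.
        row.append(1 if sample["duplicate"] == "d2" else 0)
    return row


samples = [parse_sample(c) for c in columns]
tissues = sorted({s["tissue"] for s in samples})

# Full model: tissue + duplicate. Reduced model: tissue only.
full = [design_row(s, tissues, include_duplicate=True) for s in samples]
reduced = [design_row(s, tissues, include_duplicate=False) for s in samples]

for s, row in zip(samples, full):
    print(s["sample"], row)
```

The only difference between the two matrices is the duplicate column, so a likelihood ratio test between them asks, per locus, whether the duplicate term explains expression beyond the tissue effect.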
My question is whether this violates the assumptions of the DESeq2 framework. I assumed that, because the genes in each pair are duplicates, it is acceptable to treat each pair as a single row and estimate the means and dispersion per gene pair.
Any feedback on this is much appreciated.
Thank you.