DESeq2 design with two variables
Hello, I've conducted a DEG analysis with DESeq2 and I just want to make sure that I'm doing it right. I have the following information for the design:


region: region1, region1, region2

species: species1, species2, species1

group: region1_species1, region1_species2, region2_species1 (unfortunately, region2_species2 is missing!)

--> So I have both species from region1, but only species1 from region2, and I want to test for both species and region. Within each region/species combination I have 5 replicates.

Is it ok to use the combined factor (group) to test for my comparisons of interest:

--> differences in species: region1_species1 vs region1_species2

--> differences in region: region1_species1 vs region2_species1

dds_all<- DESeqDataSetFromMatrix(countData = matrix,
                                 colData = info, 
                                 design = ~ group)

Or is it better to use a nested design (something like: ~ species + region)?

What I can see so far is a huge effect of species, so it makes sense to separate them (e.g. to test for region within the respective species).

Regarding the correct statistical modeling approach, I unfortunately don't have time to vet people's analysis plan but have to reserve my time for DESeq2 software usage issues. I recommend collaborating with a statistician to help choose the correct design and contrasts.

Hi Michael, I totally understand. I'm asking since both options (combining the factors into one & the nested design) seem to be ok from a statistical point of view. But it would be nice to get an opinion about a design in which the factor levels are not "equal" (i.e. region2_species2 is missing); I was hoping to get support here for such "special cases".
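As a rough sanity check of that claim, here is a minimal base-R sketch (no DESeq2 involved) that compares the design matrices of the two options. The `info` data frame below is a mock sample table built from the level names in my post, assuming 5 replicates per group; it is not my real colData.

```r
# Mock sample table (assumption: 5 replicates per group, level names as in the post)
info <- data.frame(
  region  = factor(rep(c("region1", "region1", "region2"), each = 5)),
  species = factor(rep(c("species1", "species2", "species1"), each = 5))
)
info$group <- factor(paste(info$region, info$species, sep = "_"))

# Design matrices for the two candidate designs
mm_group    <- model.matrix(~ group, data = info)
mm_additive <- model.matrix(~ species + region, data = info)

# Same entries, only the column names differ
all(mm_group == mm_additive)

# Both have 3 columns and rank 3, i.e. they are full rank
qr(mm_group)$rank
qr(mm_additive)$rank

# For comparison: adding the interaction gives an all-zero column
# for the missing region2_species2 cell, so the matrix is not full rank
mm_interaction <- model.matrix(~ species * region, data = info)
ncol(mm_interaction)
qr(mm_interaction)$rank
```

If I read this correctly, with region1 and species1 as reference levels, ~ group and ~ species + region describe the same fit here and differ only in how the coefficients are named. Only a design with the species:region interaction term would run into the missing combination.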

I just really don't have any extra time to consult in this way on the support site. Sorry!


