Hello edgeR support team,
First off, thank you for the incredibly useful and well-documented package. It has been a great help for our research.
We have a data set with two tissues sampled from many species, which are clustered into several species groups (orthology assignment is upstream of this analysis). The sampling is unbalanced in both the number of biological replicates per species and the number of species per group, but it is balanced between tissue types, which is the primary contrast of interest. Here's a small example of what we're working with, including the two design groupings under consideration (Option1 and Option2):
> ExSamples
           Species  Tissue  Replicate  SpeciesGroup  Option1  Option2
Sample.1   Spp1     T1      rep1       Grp1          Grp1.T1  Spp1.T1
Sample.2   Spp1     T2      rep1       Grp1          Grp1.T2  Spp1.T2
Sample.3   Spp1     T1      rep2       Grp1          Grp1.T1  Spp1.T1
Sample.4   Spp1     T2      rep2       Grp1          Grp1.T2  Spp1.T2
Sample.5   Spp2     T1      rep1       Grp1          Grp1.T1  Spp2.T1
Sample.6   Spp2     T2      rep1       Grp1          Grp1.T2  Spp2.T2
Sample.7   Spp3     T1      rep1       Grp1          Grp1.T1  Spp3.T1
Sample.8   Spp3     T2      rep1       Grp1          Grp1.T2  Spp3.T2
Sample.9   Spp3     T1      rep2       Grp1          Grp1.T1  Spp3.T1
Sample.10  Spp3     T2      rep2       Grp1          Grp1.T2  Spp3.T2
Sample.11  Spp4     T1      rep1       Grp2          Grp2.T1  Spp4.T1
Sample.12  Spp4     T2      rep1       Grp2          Grp2.T2  Spp4.T2
Sample.13  Spp4     T1      rep2       Grp2          Grp2.T1  Spp4.T1
Sample.14  Spp4     T2      rep2       Grp2          Grp2.T2  Spp4.T2
Sample.15  Spp5     T1      rep1       Grp3          Grp3.T1  Spp5.T1
Sample.16  Spp5     T2      rep1       Grp3          Grp3.T2  Spp5.T2
Sample.17  Spp5     T1      rep2       Grp3          Grp3.T1  Spp5.T1
Sample.18  Spp5     T2      rep2       Grp3          Grp3.T2  Spp5.T2
Sample.19  Spp6     T1      rep1       Grp3          Grp3.T1  Spp6.T1
Sample.20  Spp6     T2      rep1       Grp3          Grp3.T2  Spp6.T2
For one portion of the analysis we'd like to contrast the two tissues across all species, giving equal weight to each species group, and within the groups equal weight to each species. My first approach was to use species group by tissue as the design levels:
> design.Option1 <- model.matrix(~ 0 + ExSamples$Option1)
> colnames(design.Option1) <- levels(factor(ExSamples$Option1))
> contrast.Option1 <- makeContrasts(
+     (Grp1.T1 + Grp2.T1 + Grp3.T1) / 3
+       - (Grp1.T2 + Grp2.T2 + Grp3.T2) / 3,
+     levels = design.Option1)
However, this results in the better-sampled species dominating the within-group estimates, since each replicate is given equal weight within its species group. In this example, species 5 contributes two of the three samples per tissue in species group 3, so it dominates that group, whereas philosophically we'd like to treat species 5 and 6 equivalently.
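To make the imbalance concrete, here is a minimal base R sketch that rebuilds the species, tissue, and group columns of the example table and tabulates the share of T1 samples each species contributes to its group. The variable names (`species`, `tissue`, `group`, `share`) are illustrative only, and sample share is just a rough proxy for how much each species influences the fitted group-level coefficient, not an exact statement about the GLM fit.

```r
# Rebuild the species/tissue/group columns of the example table (base R only).
species <- rep(paste0("Spp", 1:6), times = c(4, 2, 4, 4, 4, 2))
tissue  <- rep(c("T1", "T2"), times = 10)   # samples alternate T1, T2
group   <- c(Spp1 = "Grp1", Spp2 = "Grp1", Spp3 = "Grp1",
             Spp4 = "Grp2", Spp5 = "Grp3", Spp6 = "Grp3")[species]

# Count T1 samples per species within each species group.
t1     <- tissue == "T1"
counts <- table(group[t1], species[t1])

# Row-wise proportions: share of each group's T1 samples from each species.
share <- prop.table(counts, margin = 1)
print(round(share, 2))
```

In this table Spp5 accounts for two thirds of the Grp3 T1 samples and Spp6 for one third, which is the imbalance described above.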
I'm also concerned that it's inappropriate to use the higher-level grouping when calculating normalization factors and dispersions.
An alternative approach that I think solves these problems would be to use species by tissue as the design levels, then weight each species by the reciprocal of the number of species in its group when defining the contrast, so that every group contributes equally regardless of how many species it contains:
> design.Option2 <- model.matrix(~ 0 + ExSamples$Option2)
> colnames(design.Option2) <- levels(factor(ExSamples$Option2))
> contrast.Option2 <- makeContrasts(
+     ( (Spp1.T1 + Spp2.T1 + Spp3.T1) / 3
+       + (Spp4.T1) / 1
+       + (Spp5.T1 + Spp6.T1) / 2 ) / 3
+     - ( (Spp1.T2 + Spp2.T2 + Spp3.T2) / 3
+       + (Spp4.T2) / 1
+       + (Spp5.T2 + Spp6.T2) / 2 ) / 3,
+     levels = design.Option2)
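As a sanity check on the weights implied by this contrast, here is a minimal base R sketch that derives the same coefficients from the species-to-group mapping rather than typing them by hand. The names `spp_group`, `w`, and `contrast_vec` are hypothetical, and the final reordering assumes the design columns follow the sorted levels of Option2, which should be checked against colnames(design.Option2) before use.

```r
# Species-to-group mapping taken from the example table.
spp_group <- c(Spp1 = "Grp1", Spp2 = "Grp1", Spp3 = "Grp1",
               Spp4 = "Grp2", Spp5 = "Grp3", Spp6 = "Grp3")

# Weight per species: 1 / (number of groups * number of species in its group).
n_groups   <- length(unique(spp_group))
n_in_group <- as.numeric(table(spp_group)[spp_group])
w <- setNames(1 / (n_groups * n_in_group), names(spp_group))

# Contrast vector over the species-by-tissue coefficients: +w for T1, -w for T2.
contrast_vec <- c(setNames(w,  paste0(names(w), ".T1")),
                  setNames(-w, paste0(names(w), ".T2")))

# Assumption: design columns are in sorted level order (Spp1.T1, Spp1.T2, ...).
contrast_vec <- contrast_vec[sort(names(contrast_vec))]
print(round(contrast_vec, 3))

# The T1 weights sum to 1 and each species group receives 1/3 in total.
stopifnot(isTRUE(all.equal(sum(w), 1)))
stopifnot(isTRUE(all.equal(as.numeric(tapply(w, spp_group, sum)), rep(1/3, 3))))
```

If the ordering matches the design columns, a numeric vector like this could presumably be supplied through the `contrast` argument of glmLRT or glmQLFTest in place of the makeContrasts call; I haven't verified that it gives identical results on our data.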
To wrap up, here are some specific questions/requests for advice:
- Does this second approach accomplish what I'm aiming for, namely to both give each species group equal weight AND weight each species equally within its group?
- If this is a valid approach, is there a more natural way within edgeR to define the contrast?
- Somewhat tangentially, is it ever appropriate to use higher-level groupings for calculating normalization factors and dispersion values, when a more granular and biologically appropriate grouping is available?
Thank you very much for taking the time to consider this problem. I look forward to your response.
David