Question

Building contrasts in edgeR with hierarchical and unbalanced experiment design

David • 0

Hello edgeR support team,

First off, thank you for the incredibly useful and well-documented package. It has been a great help for our research.

We have a data set with two tissues sampled from many species, which are clustered into several species groups (orthology assignment is upstream of this analysis). The sampling is unbalanced in both number of biological replicates per species and number of species per group (but balanced between tissue types, which is the primary contrast of interest). Here's a small example of what we're working with, including the options for design groupings under consideration:

> ExSamples
          Species Tissue Replicate SpeciesGroup Option1 Option2
Sample.1     Spp1     T1      rep1         Grp1 Grp1.T1 Spp1.T1
Sample.2     Spp1     T2      rep1         Grp1 Grp1.T2 Spp1.T2
Sample.3     Spp1     T1      rep2         Grp1 Grp1.T1 Spp1.T1
Sample.4     Spp1     T2      rep2         Grp1 Grp1.T2 Spp1.T2
Sample.5     Spp2     T1      rep1         Grp1 Grp1.T1 Spp2.T1
Sample.6     Spp2     T2      rep1         Grp1 Grp1.T2 Spp2.T2
Sample.7     Spp3     T1      rep1         Grp1 Grp1.T1 Spp3.T1
Sample.8     Spp3     T2      rep1         Grp1 Grp1.T2 Spp3.T2
Sample.9     Spp3     T1      rep2         Grp1 Grp1.T1 Spp3.T1
Sample.10    Spp3     T2      rep2         Grp1 Grp1.T2 Spp3.T2
Sample.11    Spp4     T1      rep1         Grp2 Grp2.T1 Spp4.T1
Sample.12    Spp4     T2      rep1         Grp2 Grp2.T2 Spp4.T2
Sample.13    Spp4     T1      rep2         Grp2 Grp2.T1 Spp4.T1
Sample.14    Spp4     T2      rep2         Grp2 Grp2.T2 Spp4.T2
Sample.15    Spp5     T1      rep1         Grp3 Grp3.T1 Spp5.T1
Sample.16    Spp5     T2      rep1         Grp3 Grp3.T2 Spp5.T2
Sample.17    Spp5     T1      rep2         Grp3 Grp3.T1 Spp5.T1
Sample.18    Spp5     T2      rep2         Grp3 Grp3.T2 Spp5.T2
Sample.19    Spp6     T1      rep1         Grp3 Grp3.T1 Spp6.T1
Sample.20    Spp6     T2      rep1         Grp3 Grp3.T2 Spp6.T2

For one portion of the analysis we'd like to contrast the two tissues across all species, giving equal weight to each species group, and within the groups equal weight to each species. My first approach was to use species group by tissue as the design levels:

> design.Option1 <- model.matrix(~ 0 + ExSamples$Option1)
> colnames(design.Option1) <- levels(ExSamples$Option1)
> contrast.Option1 <- makeContrasts( ( Grp1.T1 + Grp2.T1 + Grp3.T1 ) / 3
+                                  - ( Grp1.T2 + Grp2.T2 + Grp3.T2 ) / 3 ,
+                                    levels = design.Option1 )

However, this lets the better-sampled species dominate the within-group estimate, because every replicate is given equal weight within its species group. In this example, species group 3 is dominated by species 5 (two replicates per tissue versus one for species 6), whereas we would like to treat species 5 and 6 equivalently.
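The imbalance can be made concrete with a small base-R sketch. It uses ordinary least squares as a stand-in for the edgeR fit, which is an assumption made purely for illustration: the negative binomial GLM does not weight samples in exactly this way, but the direction of the effect should be the same. The object names (`spp_reps`, `spp_group`, `W`) are hypothetical and simply rebuild the example table above.

```r
# Illustration only: ordinary least squares stands in for the edgeR GLM fit.
# Replicates per species and species-to-group mapping, copied from the table.
spp_reps  <- c(Spp1 = 2, Spp2 = 1, Spp3 = 2, Spp4 = 2, Spp5 = 2, Spp6 = 1)
spp_group <- c(Spp1 = "Grp1", Spp2 = "Grp1", Spp3 = "Grp1",
               Spp4 = "Grp2", Spp5 = "Grp3", Spp6 = "Grp3")

# One T1 and one T2 sample per replicate, in the same order as ExSamples.
species <- rep(rep(names(spp_reps), times = spp_reps), each = 2)
tissue  <- rep(c("T1", "T2"), times = sum(spp_reps))
option1 <- factor(paste(spp_group[species], tissue, sep = "."))

design <- model.matrix(~ 0 + option1)
colnames(design) <- levels(option1)

# Row i of W holds the weight each sample contributes to coefficient i
# under least squares: (X'X)^-1 X'.
W <- solve(crossprod(design)) %*% t(design)

# Total weight of each species in the Grp3.T1 coefficient.
w_by_species <- tapply(W["Grp3.T1", ], species, sum)
print(round(w_by_species, 3))
```

Under this analogy, species 5 carries two thirds of the weight in the `Grp3.T1` coefficient and species 6 only one third, which is the dominance described above.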

I'm also concerned it's inappropriate to use the higher-level grouping to calculate normalization and dispersion.

An alternative approach that I think solves these problems is to use species by tissue as the design levels, then weight the species unequally when defining the contrast (to account for the different number of species per group):

> design.Option2 <- model.matrix( ~ 0 + ExSamples$Option2)
> colnames(design.Option2) <- levels(ExSamples$Option2)
> contrast.Option2 <- makeContrasts( ( (Spp1.T1 + Spp2.T1 + Spp3.T1)/3
+                                       + (Spp4.T1)/1
+                                       + (Spp5.T1 + Spp6.T1)/2 ) / 3
+                                  - ( (Spp1.T2 + Spp2.T2 + Spp3.T2)/3
+                                       + (Spp4.T2)/1
+                                       + (Spp5.T2 + Spp6.T2)/2 ) / 3 ,
+                                  levels = design.Option2)
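One way to check the weights implied by this contrast, and to avoid typing them by hand, is to derive them from the species-to-group mapping. The sketch below uses base R only. The names `spp_group`, `spp_weight`, and `contrast_vec` are hypothetical, and it assumes the design columns are the alphabetically sorted `Species.Tissue` labels, as `levels()` would return for the Option2 factor.

```r
# Sketch: build the Option2 contrast weights from the group structure.
spp_group <- c(Spp1 = "Grp1", Spp2 = "Grp1", Spp3 = "Grp1",
               Spp4 = "Grp2", Spp5 = "Grp3", Spp6 = "Grp3")

# Each species gets 1 / (species in its group * number of groups).
group_size <- table(spp_group)
spp_weight <- 1 / (as.numeric(group_size[spp_group]) * length(group_size))
names(spp_weight) <- names(spp_group)

# Positive weights for T1 coefficients, negative weights for T2.
contrast_vec <- c(spp_weight, -spp_weight)
names(contrast_vec) <- c(paste(names(spp_group), "T1", sep = "."),
                         paste(names(spp_group), "T2", sep = "."))

# Reorder to match the assumed (alphabetical) design column order.
contrast_vec <- contrast_vec[sort(names(contrast_vec))]
print(round(contrast_vec, 3))
```

The T1 weights sum to 1 and the T2 weights to -1, with each group contributing one third and each species an equal share of its group's third. A numeric vector like this could presumably be passed through the `contrast` argument of `glmLRT` or `glmQLFTest` in place of the `makeContrasts` output, provided its order matches `colnames(design.Option2)`.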

To wrap up, here are some specific questions and requests for advice:

1. Does this second approach accomplish what I'm aiming for, namely to give each species group equal weight AND to weight each species equally within its group?
2. If this is a valid approach, is there a more natural way within edgeR to define the contrast?
3. Somewhat tangentially, is it ever appropriate to use higher-level groupings to calculate normalization factors and dispersion values when a more granular and biologically appropriate grouping is available?

Thank you very much for taking the time to consider this problem. I look forward to your response,

David

edgeR • 424 views
