Nested modeling for unequal groups with edgeR
@emmanouela-repapi-6515
United Kingdom

Hello, 

I have a fairly complicated experimental design and would like some help/feedback on building the model.matrix. The data come from an experiment with two groups of mice (young/old). Cells from each mouse were sorted with one of two markers (sort1/sort2), and each sort yields positive and negative cells. The problem is that the mice are nested within both the sorts and the age groups, and the groups are unequal: some have 3 mice, some 4 and some 5. To explain a bit better, my samples matrix looks like this:

         sort  cell     age   mouse mouse_nest
sample1  sort1 positive young 1     1
sample2  sort1 negative young 1     1
sample3  sort1 positive young 2     2
sample4  sort1 negative young 2     2
sample5  sort1 positive young 3     3
sample6  sort1 negative young 3     3
sample7  sort1 positive young 4     4
sample8  sort1 negative young 4     4
sample9  sort1 positive young 5     5
sample10 sort1 negative young 5     5
sample11 sort1 positive old   6     1
sample12 sort1 negative old   6     1
sample13 sort1 positive old   7     2
sample14 sort1 negative old   7     2
sample15 sort1 positive old   8     3
sample16 sort1 negative old   8     3
sample17 sort1 positive old   9     4
sample18 sort1 negative old   9     4
sample19 sort2 positive young 10    1
sample20 sort2 negative young 10    1
sample21 sort2 positive young 11    2
sample22 sort2 negative young 11    2
sample23 sort2 positive young 12    3
sample24 sort2 negative young 12    3
sample25 sort2 positive old   13    1
sample26 sort2 negative old   13    1
sample27 sort2 positive old   14    2
sample28 sort2 negative old   14    2
sample29 sort2 positive old   15    3
sample30 sort2 negative old   15    3

Initially I thought of splitting the data in two (sort1 and sort2 groups) and then using a nested design within that:

  design <- model.matrix( ~ cell + age +  age:cell + age:mouse_nest)

which works for sort2 but not for sort1, because the young and old groups contain different numbers of mice (5 vs 4). As far as I understand, the way to resolve this is either to remove a pair of samples so that I have 4 mice in each group, or to remove the age:mouse_nest term. However, neither solution sounds great to me because a) I don't like removing samples and b) there seem to be differences between the mice. How do people go about choosing which is best? By looking at the dispersion estimates? Are there any other ways to resolve this?
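To illustrate why the sort1 design fails, here is a minimal base-R sketch. It rebuilds the sort1 rows of the sample table above (the data frame name s1 is made up for the illustration) and shows that the nested formula produces a column for a fifth "old" mouse that does not exist. Dropping all-zero columns is a workaround commonly suggested for nested designs with unequal groups; it is an addition here, not something proposed in this thread.

```r
# sort1 samples only, reconstructed from the sample table in the question.
s1 <- data.frame(
  cell       = factor(rep(c("positive", "negative"), times = 9),
                      levels = c("negative", "positive")),
  age        = factor(rep(c("young", "old"), times = c(10, 8)),
                      levels = c("young", "old")),
  mouse_nest = factor(c(rep(1:5, each = 2), rep(1:4, each = 2)))
)

design <- model.matrix(~ cell + age + age:cell + age:mouse_nest, data = s1)

# Columns with no non-zero entry correspond to mice that do not exist
# (here the fifth mouse of the "old" group).
empty <- colSums(design != 0) == 0
colnames(design)[empty]

# Compare the rank with the number of columns before and after
# dropping the empty column(s).
rank_before <- qr(design)$rank
design_ok   <- design[, !empty, drop = FALSE]
rank_after  <- qr(design_ok)$rank
c(columns = ncol(design), rank = rank_before)
c(columns = ncol(design_ok), rank = rank_after)
```

If the reduced matrix has full column rank, it can be passed on as the design without removing any samples or the age:mouse_nest term.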

I would also like to compare the positive cells of one marker (sort1) with the positive cells of the other marker (sort2). That means putting all the samples together, but then the nesting problem becomes even greater because of the differences in group sizes. Is the best way to just put these samples together (sort1+ve vs sort2+ve) for young and old and forget about the nesting altogether, using a design matrix like the one below?

  design <- model.matrix( ~ age + sort + age:sort)

(or the equivalent form of combining them into one factor and using contrasts)

Thank you in advance for all your help!

Best wishes,

Emma

@ryan-c-thompson-5618
Scripps Research, La Jolla, CA

The only factor that is nested inside another factor is mouse, so I think your best bet is to use limma-voom with duplicateCorrelation. Assuming you want the complete 3-way interaction between age, sort, and cell, I would create a group variable and use that in the design:

library(limma)
group <- interaction(sort, cell, age, sep=".", drop=TRUE)
design <- model.matrix(~0 + group)

and then proceed with the analysis as described here[1], using mouse as the block argument to duplicateCorrelation. If you don't want the 3-way interaction, use whatever design you like involving those three variables, but leave out mouse, since that is handled as a random effect by duplicateCorrelation.
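A rough sketch of how that workflow might look end to end is given below. It is an illustration only: the count matrix is simulated noise, the data frame name targets and the contrast names are invented here, and the two-pass voom/duplicateCorrelation sequence is one common way to combine the two functions rather than necessarily the exact recipe in the linked answer. It assumes the limma package (and statmod, which duplicateCorrelation relies on) is installed.

```r
library(limma)

# Sample table reconstructed from the question: 15 mice, 2 samples each.
n_mice <- c(5, 4, 3, 3)  # sort1/young, sort1/old, sort2/young, sort2/old
targets <- data.frame(
  sort  = rep(c("sort1", "sort1", "sort2", "sort2"), times = 2 * n_mice),
  age   = rep(c("young", "old", "young", "old"), times = 2 * n_mice),
  cell  = rep(c("positive", "negative"), times = sum(n_mice)),
  mouse = factor(rep(seq_len(sum(n_mice)), each = 2))
)

# One coefficient per sort/cell/age combination; mouse is left out.
group  <- with(targets, interaction(sort, cell, age, sep = ".", drop = TRUE))
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)

# Simulated counts, purely so the sketch runs (ASSUMPTION: not real data).
set.seed(1)
counts <- matrix(rnbinom(500 * nrow(targets), mu = 100, size = 5), nrow = 500,
                 dimnames = list(paste0("gene", 1:500), NULL))

# First pass: voom weights, then the within-mouse correlation.
v      <- voom(counts, design)
corfit <- duplicateCorrelation(v, design, block = targets$mouse)

# Second pass: voom again, now accounting for the blocking.
v   <- voom(counts, design, block = targets$mouse,
            correlation = corfit$consensus.correlation)
fit <- lmFit(v, design, block = targets$mouse,
             correlation = corfit$consensus.correlation)

# Example contrasts between groups of interest (names are illustrative).
cont <- makeContrasts(
  sort1_pos_vs_neg_young   = sort1.positive.young - sort1.negative.young,
  sort1_vs_sort2_pos_young = sort1.positive.young - sort2.positive.young,
  levels = design
)
fit2 <- eBayes(contrasts.fit(fit, cont))
topTable(fit2, coef = "sort1_vs_sort2_pos_young", number = 5)
```

With real data the counts would normally be filtered and normalized first (for example via an edgeR DGEList), which is omitted here for brevity.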

I don't think the unequal group sizes are an issue in this design, as long as you are properly modelling all the variables involved.

[1]: A: using duplicateCorrelation with limma+voom for RNA-seq data


Thank you for your answer Ryan! In my mind the cell factor is also nested within sort, because the positive cells are specific to the sort in question. There shouldn't be a great difference between the negative cells of the two sorts, because they are just the remaining cells from either sort. In theory, if you take different things out of the same pool of cells you are left with different groups of cells, but I wouldn't expect significant differences.

I think you are right about using limma for this analysis. Assuming that cell is also nested, something like this:

 design <- model.matrix( ~ sort + sort:cell + age + age:cell + sort:age + sort:age:cell)

would look unreasonably complicated to interpret properly, so I guess the best way is to keep the main effect of cell even if not much comes out of it. Correct me if I am wrong.

Many thanks, 

Emma


If you think that the negatives for both sorts should be equivalent, you can represent this by creating a single factor with 3 levels: "negative", "positive1", "positive2". By using this factor in place of sort and cell, you'll be comparing both positive groups to a common baseline consisting of all the negative samples from both sorts.

Either way, my recommended way to construct a design matrix for any interaction model is still to combine all the interacting factors into a single "group" variable as demonstrated above and then use a design of ~0+group, giving you a coefficient for each unique combination of factor levels, and then to construct contrasts between your groups of interest.
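As a minimal sketch of the pooled-negatives idea, the snippet below rebuilds the sample table from the question, collapses sort and cell into a single 3-level factor, and combines it with age into a group variable. The names targets and celltype are invented for the illustration.

```r
# Sample table reconstructed from the question: 15 mice, 2 samples each.
n_mice <- c(5, 4, 3, 3)  # sort1/young, sort1/old, sort2/young, sort2/old
targets <- data.frame(
  sort  = rep(c("sort1", "sort1", "sort2", "sort2"), times = 2 * n_mice),
  age   = rep(c("young", "old", "young", "old"), times = 2 * n_mice),
  cell  = rep(c("positive", "negative"), times = sum(n_mice)),
  mouse = factor(rep(seq_len(sum(n_mice)), each = 2))
)

# Collapse sort and cell: all negatives are pooled into one level.
celltype <- with(targets,
                 ifelse(cell == "negative", "negative",
                        ifelse(sort == "sort1", "positive1", "positive2")))
celltype <- factor(celltype, levels = c("negative", "positive1", "positive2"))
table(celltype, targets$age)

# One coefficient per celltype/age combination.
group  <- interaction(celltype, targets$age, sep = ".", drop = TRUE)
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)
colnames(design)
```

Mouse is still left out of the design here, on the assumption that it is passed as the block argument to duplicateCorrelation as described earlier in the thread.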


With regard to your suggested design above, be aware that as long as you have sort:age:cell in the design, including or excluding any of the previous terms will only result in a different parametrization of the same design. So the design that you suggested is just a more complicated version of my suggested ~0+group.

(This only applies to factor variables, though. I think the situation with numeric/continuous variables is a bit different.)
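One way to check the equivalence claim numerically is to compare the column spaces of the two design matrices: if each matrix has the same rank as the two matrices bound together, they span the same space. The sketch below uses base R only and rebuilds the sample table from the question; the names targets, design_a and design_b are invented for the illustration.

```r
# Sample table reconstructed from the question, with factor columns.
n_mice <- c(5, 4, 3, 3)  # sort1/young, sort1/old, sort2/young, sort2/old
targets <- data.frame(
  sort = factor(rep(c("sort1", "sort1", "sort2", "sort2"), times = 2 * n_mice)),
  age  = factor(rep(c("young", "old", "young", "old"), times = 2 * n_mice)),
  cell = factor(rep(c("positive", "negative"), times = sum(n_mice)))
)

# The nested-style formula from the earlier reply.
design_a <- model.matrix(
  ~ sort + sort:cell + age + age:cell + sort:age + sort:age:cell,
  data = targets
)

# The one-coefficient-per-group design.
group    <- with(targets, interaction(sort, cell, age, sep = ".", drop = TRUE))
design_b <- model.matrix(~ 0 + group)

# Equal ranks for A, B and cbind(A, B) mean identical column spaces.
rank_a  <- qr(design_a)$rank
rank_b  <- qr(design_b)$rank
rank_ab <- qr(cbind(design_a, design_b))$rank
c(A = rank_a, B = rank_b, combined = rank_ab)
```

Identical column spaces mean the two designs give the same fitted values; only the meaning of the individual coefficients differs, which is why contrasts on the group design are usually easier to interpret.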
