Hi all,
I am new to edgeR and I would like to get some advice on constructing a design matrix for my RNA-seq data.
We have 24 subjects from 12 families. In this experiment we want to measure the difference between Disease and Healthy while adjusting for age/genetic differences (Family), as well as for gender and/or sequencing lane bias (the gender effect is stronger).
We have 7 gender-matched families while the other 5 families are not gender-matched.
In this case we are not interested in the effect of the disease within any individual family, but in the average effect of the disease over the 12 families. We are also not interested in gender differences in the disease effect.
Donor  Diagnosis  Family  Gender  seqLane
1      Disease    A       Male    Lane 1
2      Healthy    A       Male    Lane 1
3      Disease    B       Male    Lane 1
4      Healthy    B       Male    Lane 1
5      Disease    C       Female  Lane 1
6      Healthy    C       Female  Lane 1
7      Disease    D       Male    Lane 1
8      Healthy    D       Female  Lane 1
9      Disease    E       Male    Lane 2
10     Healthy    E       Male    Lane 2
11     Disease    F       Male    Lane 2
12     Healthy    F       Male    Lane 2
13     Disease    G       Female  Lane 2
14     Healthy    G       Female  Lane 2
15     Disease    H       Male    Lane 2
16     Healthy    H       Female  Lane 2
17     Disease    I       Male    Lane 3
18     Healthy    I       Male    Lane 3
19     Disease    J       Male    Lane 3
20     Healthy    J       Female  Lane 3
21     Disease    K       Male    Lane 3
22     Healthy    K       Female  Lane 3
23     Disease    L       Male    Lane 3
24     Healthy    L       Female  Lane 3
- I am having trouble setting up the right design matrix for this analysis:
design <- model.matrix(~0 + Family + Family:Diagnosis + Gender + seqLane)
Is this formula close to what I am trying to achieve?
- If I needed to use RUVg (with empirical control genes) to remove the effects of unwanted variables before the DE analysis (the effects of gender/sequencing bias are stronger than that of disease), should I include just the covariates of interest, i.e. Diagnosis, Family and the factors computed by RUVg (W_1), as follows:
design <- model.matrix(~0 + Diagnosis + Family + W_1)
Any opinions are greatly appreciated. Thanks for your time!
I know that's a valid design, but can you really model family well with only two members in each family, when they also differ in diagnosis?
It's a paired comparison, equivalent to a paired t-test. The fact that the two members of each family differ in diagnosis is the whole reason why the paired comparison works. The family effects are completely removed, not modeled.
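To make the paired-comparison point concrete, here is a minimal base R sketch. The `targets` data frame is reconstructed from the sample table in the question; the additive formula `~ Family + Diagnosis` is my assumption about the paired design being discussed, and the response `y` is simulated toy data rather than real expression values. It shows that the design is full rank, that `seqLane` is redundant once `Family` is in the model (each family sits entirely within one lane), and that the Diagnosis t-statistic from the blocked linear model matches a paired t-test.

```r
# Sample layout reconstructed from the table in the question
# (24 donors, 12 families, one Disease and one Healthy member per family).
targets <- data.frame(
  Family    = factor(rep(LETTERS[1:12], each = 2)),
  Diagnosis = factor(rep(c("Disease", "Healthy"), times = 12),
                     levels = c("Healthy", "Disease")),
  seqLane   = factor(rep(c("Lane1", "Lane2", "Lane3"), each = 8))
)

# Assumed paired design: Family as a blocking factor plus one Diagnosis effect.
design <- model.matrix(~ Family + Diagnosis, data = targets)
ncol(design)        # 13 columns: intercept + 11 family contrasts + Diagnosis
qr(design)$rank     # 13, so every coefficient is estimable

# Adding seqLane creates columns that are sums of family columns,
# so the lane effect cannot be estimated separately from Family.
design_lane <- model.matrix(~ Family + Diagnosis + seqLane, data = targets)
lane_is_redundant <- qr(design_lane)$rank < ncol(design_lane)

# Toy response with large family effects (simulated, for illustration only).
set.seed(1)
y <- rnorm(24) + rep(rnorm(12, sd = 3), each = 2)

# t-statistic for Diagnosis from the blocked linear model ...
fit  <- lm(y ~ Family + Diagnosis, data = targets)
t_lm <- coef(summary(fit))["DiagnosisDisease", "t value"]

# ... and from a paired t-test on the within-family differences.
t_paired <- t.test(y[targets$Diagnosis == "Disease"],
                   y[targets$Diagnosis == "Healthy"],
                   paired = TRUE)$statistic

same_t <- isTRUE(all.equal(unname(t_lm), unname(t_paired)))
```

In an edgeR analysis the same `design` would presumably be passed to the usual dispersion estimation and model fitting steps, with the test performed on the `DiagnosisDisease` coefficient.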
Hi Gordon,
Thanks for your advice; it is very helpful.
If we were able to sequence another two gender-unmatched families (in which the females have Disease and the males are Healthy), and assuming the sequencing batch effect is accounted for in that scenario, do you think that would help control for the gender confounder in this study?
Yes, it would help somewhat. The strategy I suggested (remove sex-linked genes) should work reasonably well regardless of whether you sequence more samples or not, providing that the disease itself is not strongly sex-linked.
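A rough sketch of what removing sex-linked genes could look like in base R. Everything here is illustrative: the `counts` matrix and `genes` annotation table are hypothetical toy objects, and treating "sex-linked" as "located on chromosome X or Y" is an assumption, since the exact criterion is not spelled out in this thread.

```r
# Hypothetical toy annotation: gene symbols with their chromosomes.
genes <- data.frame(
  Symbol = c("ACTB", "XIST", "GAPDH", "RPS4Y1", "TP53", "DDX3Y"),
  Chr    = c("7",    "X",    "12",    "Y",      "17",   "Y"),
  stringsAsFactors = FALSE
)

# Hypothetical toy count matrix: 6 genes x 4 samples, rows matching `genes`.
set.seed(42)
counts <- matrix(rpois(6 * 4, lambda = 50), nrow = 6,
                 dimnames = list(genes$Symbol, paste0("Donor", 1:4)))

# Assumption: a gene counts as sex-linked if it lies on chromosome X or Y.
sex_linked <- genes$Chr %in% c("X", "Y")

# Keep only the remaining genes, and keep the annotation in sync.
counts_filtered <- counts[!sex_linked, , drop = FALSE]
genes_filtered  <- genes[!sex_linked, , drop = FALSE]

rownames(counts_filtered)
```

The filtered counts would then go into the paired analysis as before. With real data the chromosome column would come from whatever gene annotation was used for read counting.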
Hi Gordon,
It is not a sex-linked condition, though a few reports in the literature describe a slight difference in incidence between males and females. I will proceed to remove the sex-linked genes from the analysis as you suggested. Thanks for your input!