Nested design matrix in edgeR
ScA • 0
@sca-23526

Hi all,

I am new to edgeR and I would like to get some advice on constructing a design matrix for my RNA-seq data.

We have 24 subjects from 12 families and in this experiment we want to measure the difference between Disease and Healthy, adjusting for age/genetic differences (Family), as well as gender and/or sequencing lane bias (gender effect is stronger).

We have 7 gender-matched families while the other 5 families are not gender-matched.

In this case we are not interested in the effect of the disease within an individual family, but in the average effect of the disease across the 12 families. Nor are we interested in whether the disease effect differs between genders.

Donor   Diagnosis Family    Gender      seqLane
1       Disease     A       Male        Lane 1
2       Healthy     A       Male        Lane 1
3       Disease     B       Male        Lane 1
4       Healthy     B       Male        Lane 1
5       Disease     C       Female      Lane 1
6       Healthy     C       Female      Lane 1
7       Disease     D       Male        Lane 1
8       Healthy     D       Female      Lane 1
9       Disease     E       Male        Lane 2
10      Healthy     E       Male        Lane 2
11      Disease     F       Male        Lane 2
12      Healthy     F       Male        Lane 2
13      Disease     G       Female      Lane 2
14      Healthy     G       Female      Lane 2
15      Disease     H       Male        Lane 2
16      Healthy     H       Female      Lane 2
17      Disease     I       Male        Lane 3
18      Healthy     I       Male        Lane 3
19      Disease     J       Male        Lane 3
20      Healthy     J       Female      Lane 3
21      Disease     K       Male        Lane 3
22      Healthy     K       Female      Lane 3
23      Disease     L       Male        Lane 3
24      Healthy     L       Female      Lane 3
  1. I am having trouble setting up the right design matrix for this analysis:

design <- model.matrix(~0 + Family + Family:Diagnosis + Gender + seqLane)

Is this formula close to what I am trying to achieve?

  2. If I need to use RUVg (with empirical control genes) to remove unwanted variation before the DE analysis (the gender and sequencing-lane effects are stronger than the disease effect), do I include just the covariates of interest, Diagnosis and Family, plus the factor computed by RUVg (W_1), as follows?

design <- model.matrix(~0 + Diagnosis + Family + W_1)

Any opinions are greatly appreciated. Thanks for your time!

edger limma paired blocking ruvseq • 1.4k views
@gordon-smyth
WEHI, Melbourne, Australia

This is a paired-comparison experiment so the design matrix is simply

design <- model.matrix(~ Family + Diagnosis)
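As a minimal sketch of how this design could be built from the sample table in the question: the data frame name `targets` and the choice of Healthy as the reference level are my assumptions, not something stated in the thread. Only base R is needed to construct the matrix; the edgeR calls are shown as comments because no count matrix is given here.

```r
# Sample table transcribed from the question: 12 families, one Disease and
# one Healthy donor per family. `targets` is an assumed name.
targets <- data.frame(
  Family    = factor(rep(LETTERS[1:12], each = 2)),
  Diagnosis = factor(rep(c("Disease", "Healthy"), times = 12),
                     levels = c("Healthy", "Disease"))  # Healthy as reference (assumption)
)

# Additive model: one baseline per family plus a single Disease-vs-Healthy effect.
design <- model.matrix(~ Family + Diagnosis, data = targets)

dim(design)                     # 24 samples by 13 coefficients
colnames(design)[ncol(design)]  # the Diagnosis coefficient is the last column

# With edgeR loaded and a count matrix `counts` (hypothetical, not shown in
# the thread), a typical quasi-likelihood analysis would look roughly like:
#   y   <- DGEList(counts = counts)
#   y   <- y[filterByExpr(y, design), , keep.lib.sizes = FALSE]
#   y   <- calcNormFactors(y)
#   y   <- estimateDisp(y, design)
#   fit <- glmQLFit(y, design)
#   res <- glmQLFTest(fit, coef = "DiagnosisDisease")
```

Testing the last coefficient gives the average Disease-vs-Healthy log-fold-change across families, which is the quantity the question asks for.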

There is an additional problem with the families that are not gender-matched: every one of them confounds Female with Healthy and Male with Disease, so there is no way to separate the two effects. If I were analysing your experiment, I would remove all sex-linked genes from the analysis (Xist and the Y-chromosome genes) and then ignore gender. If you don't wish to do that, then you will essentially need to remove the non-matched families from the analysis.
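The confounding can be checked directly from the sample table, and the gene filter can be sketched in a few lines. The Gender column below is transcribed from the question; the `genes` annotation table is a toy, hypothetical example (a real analysis would take chromosome and symbol information from the annotation attached to the count matrix).

```r
# Gender per donor, transcribed from the table in the question.
M <- "Male"; Fe <- "Female"
targets <- data.frame(
  Family    = factor(rep(LETTERS[1:12], each = 2)),
  Diagnosis = factor(rep(c("Disease", "Healthy"), times = 12),
                     levels = c("Healthy", "Disease")),
  Gender    = factor(c(M, M,  M, M,  Fe, Fe,  M, Fe,   # families A-D
                       M, M,  M, M,  Fe, Fe,  M, Fe,   # families E-H
                       M, M,  M, Fe, M, Fe,   M, Fe))  # families I-L
)

# A family is gender-matched if both of its members have the same gender.
matched <- tapply(targets$Gender, targets$Family,
                  function(g) length(unique(g)) == 1)
sum(matched)   # number of gender-matched families

# In the non-matched families, Gender and Diagnosis carry the same information.
unmatched <- targets[targets$Family %in% names(matched)[!matched], ]
table(unmatched$Gender, unmatched$Diagnosis)

# Toy gene annotation (hypothetical) to illustrate the suggested filter:
# drop XIST and every Y-chromosome gene, keep the rest.
genes <- data.frame(Symbol = c("XIST", "DDX3Y", "GAPDH", "ACTB"),
                    Chr    = c("X", "Y", "12", "7"),
                    stringsAsFactors = FALSE)
keep <- !(genes$Chr == "Y" | genes$Symbol == "XIST")
genes[keep, ]

# With an edgeR DGEList `y` whose rows match `genes`, the same logical
# vector could be applied as:  y <- y[keep, , keep.lib.sizes = FALSE]
```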

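There is no need to adjust for sequencing lane. Both members of each family were sequenced on the same lane, so lane is nested within Family and is already accounted for by the Family term.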
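One way to see the nesting is to compare the rank of the design matrix with and without a lane term. This is a base R sketch using the lane assignments from the question's table; the object names are my own.

```r
# Families A-D ran on lane 1, E-H on lane 2, I-L on lane 3 (from the question).
targets <- data.frame(
  Family    = factor(rep(LETTERS[1:12], each = 2)),
  Diagnosis = factor(rep(c("Disease", "Healthy"), times = 12),
                     levels = c("Healthy", "Disease")),
  seqLane   = factor(rep(c("Lane1", "Lane2", "Lane3"), each = 8))
)

without_lane <- model.matrix(~ Family + Diagnosis, data = targets)
with_lane    <- model.matrix(~ Family + Diagnosis + seqLane, data = targets)

# Adding seqLane adds two columns but no new information: each lane column
# is a sum of Family columns, so the rank does not increase.
ncol(without_lane); qr(without_lane)$rank
ncol(with_lane);    qr(with_lane)$rank
```

Because the matrix with seqLane is not of full column rank, the lane coefficients are not estimable alongside Family, and I would expect edgeR's fitting functions to reject such a design.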

I know that's a valid design, but can you really model family well with only two members in each family, when they also differ in diagnosis?

It's a paired comparison, equivalent to a paired t-test. The fact that the two members of each family differ in diagnosis is the whole reason why the paired comparison works. The family effects are completely removed, not modeled.
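The equivalence can be illustrated on simulated data for a single gene. This sketch uses an ordinary linear model on made-up log-expression values rather than edgeR's negative binomial GLM, so it shows the analogy and not edgeR's actual computation; all numbers are arbitrary assumptions.

```r
set.seed(1)
n_fam     <- 12
family    <- factor(rep(LETTERS[1:n_fam], each = 2))
diagnosis <- factor(rep(c("Disease", "Healthy"), times = n_fam),
                    levels = c("Healthy", "Disease"))

# Simulated log-expression: large family-to-family differences, a disease
# shift of 1, and small within-family noise (all values are made up).
fam_effect <- rnorm(n_fam, sd = 3)
y <- 5 + fam_effect[as.integer(family)] +
     1 * (diagnosis == "Disease") + rnorm(2 * n_fam, sd = 0.5)

# Blocked linear model: Family as a blocking factor plus Diagnosis.
fit  <- lm(y ~ family + diagnosis)
t_lm <- coef(summary(fit))["diagnosisDisease", "t value"]

# Paired t-test: one-sample t-test on the within-family differences.
d        <- y[diagnosis == "Disease"] - y[diagnosis == "Healthy"]
t_paired <- unname(t.test(d)$statistic)

all.equal(t_lm, t_paired)  # the two t-statistics agree
df.residual(fit)           # 11, the same as n_fam - 1 for the paired test
```

The family baselines cancel in the within-family differences, which is why large family effects do not inflate the error of the Diagnosis estimate.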

Hi Gordon,

Thanks for your advice; it is very helpful.

If we were able to sequence another two gender-unmatched families (in which the female has Disease and the male is Healthy), and assuming the sequencing batch effect is accounted for in that scenario, do you think that would help control for the gender confounder in this study?

Yes, it would help somewhat. The strategy I suggested (remove sex-linked genes) should work reasonably well regardless of whether you sequence more samples or not, providing that the disease itself is not strongly sex-linked.

Hi Gordon,

It is not a sex-linked condition, though a few studies have reported a slight difference in incidence between males and females. I will proceed to remove the sex-linked genes from the analysis as you suggested. Thanks for your input!
