Question

Nested design matrix in edgeR

ScA • 0

@sca-23526

Hi all,

I am new to edgeR and I would like to get some advice on constructing a design matrix for my RNA-seq data.

We have 24 subjects from 12 families. In this experiment we want to measure the difference between Disease and Healthy, adjusting for age/genetic differences (Family) as well as gender and/or sequencing lane bias (the gender effect is the stronger of the two).

We have 7 gender-matched families while the other 5 families are not gender-matched.

In this case we are not interested in the effect of the disease within an individual family, but in the average effect of the disease over the 12 families. We are also not interested in gender differences in the disease effect.

Donor   Diagnosis Family    Gender      seqLane
1       Disease     A       Male        Lane 1
2       Healthy     A       Male        Lane 1
3       Disease     B       Male        Lane 1
4       Healthy     B       Male        Lane 1
5       Disease     C       Female      Lane 1
6       Healthy     C       Female      Lane 1
7       Disease     D       Male        Lane 1
8       Healthy     D       Female      Lane 1
9       Disease     E       Male        Lane 2
10      Healthy     E       Male        Lane 2
11      Disease     F       Male        Lane 2
12      Healthy     F       Male        Lane 2
13      Disease     G       Female      Lane 2
14      Healthy     G       Female      Lane 2
15      Disease     H       Male        Lane 2
16      Healthy     H       Female      Lane 2
17      Disease     I       Male        Lane 3
18      Healthy     I       Male        Lane 3
19      Disease     J       Male        Lane 3
20      Healthy     J       Female      Lane 3
21      Disease     K       Male        Lane 3
22      Healthy     K       Female      Lane 3
23      Disease     L       Male        Lane 3
24      Healthy     L       Female      Lane 3

I am having trouble setting up the right design matrix for this analysis:

design <- model.matrix(~0 + Family + Family:Diagnosis + Gender + seqLane)

Is this formula close to what I am trying to achieve?

If I needed to use RUVg (with empirical control genes) to remove the effects of unwanted variables before the DE analysis (the effects of gender and sequencing bias are stronger than that of disease), should I include just the covariates of interest, Diagnosis and Family, plus the factor computed by RUVg (W_1), as follows?

design <- model.matrix(~0 + Diagnosis + Family + W_1)

Any opinions are greatly appreciated. Thanks for your time!

edger limma paired blocking ruvseq • 2.1k views

ADD COMMENT • link updated 5.7 years ago by Gordon Smyth 53k • written 5.7 years ago by ScA • 0

score 2 · Accepted Answer · 2020-05-14

Gordon Smyth 53k

@gordon-smyth

WEHI, Melbourne, Australia

This is a paired-comparison experiment so the design matrix is simply

design <- model.matrix(~ Family + Diagnosis)
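
As a minimal sketch of what this design looks like for the 24 samples in the question, the factors below are rebuilt from the posted table using base R only. Making Healthy the reference level is my assumption, chosen so that the last coefficient reads as Disease minus Healthy.

```r
# Factors rebuilt from the sample table in the question (Donors 1..24).
Family <- factor(rep(LETTERS[1:12], each = 2))
# Assumption: Healthy is set as the reference level, so the Diagnosis
# coefficient is the within-family log fold change Disease vs Healthy.
Diagnosis <- factor(rep(c("Disease", "Healthy"), times = 12),
                    levels = c("Healthy", "Disease"))

design <- model.matrix(~ Family + Diagnosis)

dim(design)                       # samples x coefficients
colnames(design)[ncol(design)]    # name of the disease coefficient
qr(design)$rank == ncol(design)   # TRUE if every coefficient is estimable
```

The matrix has one intercept, eleven family columns and a single Diagnosis column. In an edgeR workflow, this design would typically be passed to `estimateDisp()` and `glmQLFit()`, and the disease effect tested with `glmQLFTest(fit, coef = "DiagnosisDisease")`.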

There is an additional problem with the families that are not gender-matched. The problem is that every one of the non-matched families confounds Female with Healthy and Male with Disease, meaning that there is no way to unravel the two effects. If I was analysing your experiment, I would remove all sex-linked genes from the analysis (Xist and Y chromosome) and then ignore gender. If you don't wish to do that, then you will essentially need to remove the non-matched families from the analysis.
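
The confounding can be checked directly from the sample table. The sketch below is base R only, with the gender vectors typed in from the question. It finds the non-matched families and cross-tabulates Gender against Diagnosis within them.

```r
# Factors rebuilt from the sample table in the question.
Family <- factor(rep(LETTERS[1:12], each = 2))
Diagnosis <- factor(rep(c("Disease", "Healthy"), times = 12))
# Gender of the Disease and Healthy member of families A..L, from the table.
disease_sex <- c("M", "M", "F", "M", "M", "M", "F", "M", "M", "M", "M", "M")
healthy_sex <- c("M", "M", "F", "F", "M", "M", "F", "F", "M", "F", "F", "F")
# Interleave so the order matches Donors 1..24 (Disease, Healthy, ...).
Gender <- factor(as.vector(rbind(disease_sex, healthy_sex)))

# A family is gender-matched if both members have the same gender.
matched <- tapply(Gender, Family, function(g) length(unique(g)) == 1)
unmatched_families <- names(matched)[!matched]

# Within the non-matched families, Gender and Diagnosis line up exactly.
keep <- Family %in% unmatched_families
tab <- table(Gender = Gender[keep], Diagnosis = Diagnosis[keep])
tab
```

Every sample from a non-matched family falls into either the Male/Disease or the Female/Healthy cell, so within those families a gender difference cannot be told apart from a disease difference.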

There is no need to adjust for sequencing lane, because both members of every family were sequenced in the same lane, so the lane effect is already accounted for by Family.
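
To illustrate why the lane term is redundant, the sketch below (base R only, factors rebuilt from the table in the question) adds seqLane to the paired design and compares the number of columns with the rank of the matrix.

```r
# Factors rebuilt from the sample table in the question.
Family <- factor(rep(LETTERS[1:12], each = 2))
Diagnosis <- factor(rep(c("Disease", "Healthy"), times = 12),
                    levels = c("Healthy", "Disease"))
seqLane <- factor(rep(paste0("Lane", 1:3), each = 8))

# Each family appears in exactly one lane, so lane is nested within family.
lanes_per_family <- rowSums(table(Family, seqLane) > 0)

# Adding seqLane adds columns but no new information.
design_lane <- model.matrix(~ Family + Diagnosis + seqLane)
n_cols <- ncol(design_lane)
n_rank <- qr(design_lane)$rank
c(columns = n_cols, rank = n_rank)
```

The rank is lower than the number of columns because each lane column is a sum of family columns. The lane coefficients are therefore not estimable, and edgeR would report that the design matrix is not of full rank.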

ADD COMMENT • link 5.7 years ago Gordon Smyth 53k

I know that's a valid design, but can you really model family well with only two members in each family, when they also differ in diagnosis?

ADD REPLY • link 5.7 years ago swbarnes2 ★ 1.4k

It's a paired comparison, equivalent to a paired t-test. The fact the two members of each family differ in diagnosis is the whole reason why the paired comparison works. The family effects are completely removed, not modeled.
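
The equivalence with a paired t-test can be seen on a single simulated gene. This is only a sketch with made-up numbers: the response `y`, the effect sizes and the seed are all hypothetical, and an ordinary linear model stands in for the negative binomial GLM that edgeR actually fits.

```r
set.seed(1)
Family <- factor(rep(LETTERS[1:12], each = 2))
Diagnosis <- factor(rep(c("Disease", "Healthy"), times = 12),
                    levels = c("Healthy", "Disease"))

# Hypothetical log-expression for one gene: a large family effect,
# a small disease effect and some noise.
family_effect <- rnorm(12, sd = 2)[as.integer(Family)]
y <- family_effect + 0.5 * (Diagnosis == "Disease") + rnorm(24, sd = 0.5)

# Additive model with Family as a blocking factor.
fit <- lm(y ~ Family + Diagnosis)
t_lm <- coef(summary(fit))["DiagnosisDisease", "t value"]

# Paired t-test on the within-family differences (samples are in family order).
paired <- t.test(y[Diagnosis == "Disease"], y[Diagnosis == "Healthy"],
                 paired = TRUE)
t_paired <- unname(paired$statistic)

all.equal(t_lm, t_paired)
```

The two t-statistics agree because the Family terms absorb each family's baseline, leaving only the within-family Disease minus Healthy differences to estimate the disease effect.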

ADD REPLY • link 5.7 years ago Gordon Smyth 53k

Hi Gordon,

Thanks for your advice, it is very helpful.

If we were able to sequence another two gender-unmatched families (in which the females have Disease and the males are Healthy), and assuming the sequencing batch effect is accounted for in that scenario, do you think that would help control for the gender confounder in this study?

ADD REPLY • link 5.7 years ago ScA • 0

Yes, it would help somewhat. The strategy I suggested (remove sex-linked genes) should work reasonably well regardless of whether you sequence more samples or not, providing that the disease itself is not strongly sex-linked.
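
A minimal sketch of the filtering step, assuming a gene annotation table with hypothetical `Symbol` and `Chr` columns and a toy count matrix (neither comes from the original data):

```r
# Hypothetical annotation and counts for five genes and four samples.
genes <- data.frame(
  Symbol = c("GAPDH", "XIST", "UTY", "ACTB", "DDX3Y"),
  Chr    = c("12",    "X",    "Y",   "7",    "Y")
)
counts <- matrix(seq_len(20), nrow = 5,
                 dimnames = list(genes$Symbol, paste0("S", 1:4)))

# Flag XIST and every gene on the Y chromosome.
sex_linked <- genes$Chr == "Y" | genes$Symbol == "XIST"

# Drop the flagged genes before fitting the model.
counts_filtered <- counts[!sex_linked, , drop = FALSE]
rownames(counts_filtered)
```

With an edgeR `DGEList` object `y` whose rows are in the same order as the annotation, the equivalent subsetting would be `y[!sex_linked, , keep.lib.sizes = FALSE]`.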

ADD REPLY • link 5.7 years ago Gordon Smyth 53k

Hi Gordon,

It is not a sex-linked condition, though a few studies have reported a slight difference in incidence between males and females. I will proceed to remove the sex-linked genes from the analysis as you suggested. Thanks for your input!

ADD REPLY • link 5.7 years ago ScA • 0