Question

How to account for population variation within regions in EdgeR?

0

Entering edit mode

swaegers.janne • 0

@swaegersjanne-11229

Last seen 7.5 years ago

Hi everyone,

I am trying to fit a multi factor comparison using EdgeR

I want to analyse expression data from samples collected from different populations at different latitudes (3 populations at both latitudes) which were subjected to a treatment.

I want to know the effect of latitude (lat), treatment (treat) and their interaction on the expression data.

My data looks like this:

sample   pop   lat   treat
1   1   N   1
2   1   N   1
3   1   N   2
4   1   N   2
5   2   N   1
6   2   N   1
7   2   N   2
8   2   N   2
9   3   N   1
10   3   N   1
11   3   N   2
12   3   N   2
13   4   S   1
14   4   S   1
15   4   S   2
16   4   S   2
17   5   S   1
18   5   S   1
19   5   S   2
20   5   S   2
21   6   S   1
22   6   S   1
23   6   S   2
24   6   S   2

When I want to fit a model using this design

design <- model.matrix(~lat+pop+treat+lat:treat)

the matrix is not of full rank because pop is nested in lat.

I have read in another post (DESeq2 paired multifactor test) and in the Edge R user guide (section 3.5) that this can be solved by creating another population variable (pop.nested) like this:

sample   pop   lat   treat   pop.nested
1   1   N   1   1
2   1   N   1   1
3   1   N   2   1
4   1   N   2   1
5   2   N   1   2
6   2   N   1   2
7   2   N   2   2
8   2   N   2   2
9   3   N   1   3
10   3   N   1   3
11   3   N   2   3
12   3   N   2   3
13   4   S   1   1
14   4   S   1   1
15   4   S   2   1
16   4   S   2   1
17   5   S   1   2
18   5   S   1   2
19   5   S   2   2
20   5   S   2   2
21   6   S   1   3
22   6   S   1   3
23   6   S   2   3
24   6   S   2   3

Yet, how does EdgeR handle this data? It seems statistically not sound to me as now population 1 and 4 are assumed to be same in the model?

Could someone comment on this?
Thanks a lot!

edger multifactor nested design • 1.5k views

ADD COMMENT • link 7.8 years ago • updated 7.6 years ago swaegers.janne • 0

score 1 · Answer 1 · 2016-08-04

1

Entering edit mode

Aaron Lun ★ 28k

@alun

Last seen 5 hours ago

The city by the bay

The best way to proceed would be something like this:

pop <- factor(rep(1:6, each=4))
lat <- factor(rep(c("N", "S"), each=12))
treatment <- factor(rep(rep(1:2, each=2), 6))
design <- model.matrix(~0 + pop + lat:treatment)
design <- design[,-grep("treatment1", colnames(design))]
colnames(design) <- sub(":", "_", colnames(design))

Now, looking at colnames(design), you get:

[1] "pop1"            "pop2"            "pop3"            "pop4"           
[5] "pop5"            "pop6"            "latN_treatment2" "latS_treatment2"

The first 6 coefficients represent blocking factors for the populations, while the last two represent latitude-specific treatment effects (of treatment 2 over treatment 1). You can drop them separately to test for the effect of treatment at either latitude; or you can compare them to each other, to test for differential treatment effects between latitudes:

con <- makeContrasts(latN_treatment2 - latS_treatment2, levels=design)

... or you can average them, to test for the average effect of treatment across both latitudes:

con <- makeContrasts((latN_treatment2 + latS_treatment2)/2, levels=design)

However, you cannot directly test for the effect of latitude because this is nested in population. Comparisons between populations are not sensible in this model as we've used all the population-level information to estimate the pop* coefficients. If you want to do this, you should consider using voom with duplicateCorrelation, which allows you to compare between levels of a blocking factor.

ADD COMMENT • link 7.8 years ago Aaron Lun ★ 28k

0

Entering edit mode

Hi, I have another question related to this answer. Can I also compare latN.treat20 to latS.treat20 using this design? So compare the gene expression within the 20 treatment between the N and S latitude? How would you go about to test this? Thanks in advance!

ADD REPLY • link 7.6 years ago swaegers.janne • 0

0

Entering edit mode

Where did the 20 come from? Your post only has values of 1 or 2 for treat.

ADD REPLY • link 7.6 years ago Aaron Lun ★ 28k

0

Entering edit mode

Sorry, I used the wrong numbers. So using the above-mentioned design, my question is: Can I also compare latN_treatment1 to latS.treatment1 using this design? So compare the gene expression within the treatment 1 between the N and S latitude? How would you go about to test this? Would this involve making a subset of the data? Thanks in advance!

ADD REPLY • link 7.6 years ago swaegers.janne • 0

0

Entering edit mode

Remember, latN_treatment2 and latS_treatment2 represent the log-fold change between treatments at each latitude; you can't have a separate coefficient like latN_treatment1 or latS_treatment1 in this design, because the log-fold change is already computed in treatment 2 relative to treatment 1.

More generally, if you want to directly compare expression between the N/1 and S/1 groups, you can't do this with this design because the different latitudes belong in different pop levels. You've already blocked on pop, so a direct comparison is simply not possible. Rather, you'll have to subset the data to only take the treatment 1 samples, and then pool the two treatment 1 samples at each pop level, such that there would be one pooled library per pop level. These can be treated as direct replicates and modelled with a simple one-way layout based on latitude.

ADD REPLY • link 7.6 years ago Aaron Lun ★ 28k

0

Entering edit mode

Allright, thanks!

I'm still wondering though: could I also make the dummy variable "pop.nested" (as below) and then make a design like this: design = ~0 +lat + lat:pop.n

Would this be a valid alternative for pooling my samples within the populations?

sample	pop	lat	treat	pop.nested
1	1	N	1	1
2	1	N	1	1
5	2	N	1	2
6	2	N	1	2
9	3	N	1	3
10	3	N	1	3
13	4	S	1	1
14	4	S	1	1
17	5	S	1	2
18	5	S	1	2
21	6	S	1	3
22	6	S	1	3

ADD REPLY • link 7.6 years ago swaegers.janne • 0

0

Entering edit mode

No. This doesn't properly account for correlations for samples from the same pop. Consider a gene with no latitude effect; in this situation, your design implies that samples 1, 2, 13 and 14 should be independent replicates. However, 1 and 2 are from the same pop and will be correlated, as will 13 and 14. Failure to account for this means that the degrees of freedom in the model - and the confidence of downstream inferences - will be overstated. This is why I recommended pooling, to avoid worrying about these correlations.

ADD REPLY • link 7.6 years ago Aaron Lun ★ 28k

0

Entering edit mode

Thanks for your reply! Do you have a suggestion on how to pool the data within populations? Just add up the counts? Or would you first do a normalisation procedure? In EdgeR would this mean adding up the rows of the samples within populatons after y <- calcNormFactors(y) ?

ADD REPLY • link 7.6 years ago swaegers.janne • 0

1

Entering edit mode

Run sumTechReps before calcNormFactors.

ADD REPLY • link 7.6 years ago Aaron Lun ★ 28k

score 0 · Answer 2 · 2016-08-07

0

Entering edit mode

swaegers.janne • 0

@swaegersjanne-11229

Last seen 7.5 years ago

Thanks for your clear and fast answer!

ADD COMMENT • link 7.8 years ago swaegers.janne • 0

score 0 · Answer 3 · 2016-09-20

0

Entering edit mode

swaegers.janne • 0

@swaegersjanne-11229

Last seen 7.5 years ago

Hi again,

I actually have the same question regarding DESeq2. How can I implement the above method to account for populations into the design formula of DESeq2? I can not find a way to edit the design matrix as you can not give a matrix as input in DESeq2. At least, I can't find a straightforward way to do that.

Any help is greatly appreciated.

Best,

Janne

ADD COMMENT • link 7.6 years ago swaegers.janne • 0

0

Entering edit mode

Probably best to post this as a separate question with a DESeq2 tag.

ADD REPLY • link 7.6 years ago Aaron Lun ★ 28k

0

Entering edit mode

done! How to account for population variation within regions in DESeq2?

ADD REPLY • link 7.6 years ago swaegers.janne • 0