Question: How to account for population variation within regions in EdgeR?
0
gravatar for swaegers.janne
3.1 years ago by
swaegers.janne0 wrote:

Hi everyone,

I am trying to fit a multi factor comparison using EdgeR

I want to analyse expression data from samples collected from different populations at different latitudes (3 populations at both latitudes) which were subjected to a treatment.

I want to know the effect of latitude (lat), treatment (treat) and their interaction on the expression data.

My data looks like this:

sample    pop    lat    treat
1    1    N    1
2    1    N    1
3    1    N    2
4    1    N    2
5    2    N    1
6    2    N    1
7    2    N    2
8    2    N    2
9    3    N    1
10    3    N    1
11    3    N    2
12    3    N    2
13    4    S    1
14    4    S    1
15    4    S    2
16    4    S    2
17    5    S    1
18    5    S    1
19    5    S    2
20    5    S    2
21    6    S    1
22    6    S    1
23    6    S    2
24    6    S    2

When I want to fit a model using this design

design <- model.matrix(~lat+pop+treat+lat:treat)

the matrix is not of full rank because pop is nested in lat.

I have read in another post (DESeq2 paired multifactor test) and in the Edge R user guide (section 3.5) that this can be solved by creating another population variable (pop.nested) like this:

sample    pop    lat    treat    pop.nested
1    1    N    1    1
2    1    N    1    1
3    1    N    2    1
4    1    N    2    1
5    2    N    1    2
6    2    N    1    2
7    2    N    2    2
8    2    N    2    2
9    3    N    1    3
10    3    N    1    3
11    3    N    2    3
12    3    N    2    3
13    4    S    1    1
14    4    S    1    1
15    4    S    2    1
16    4    S    2    1
17    5    S    1    2
18    5    S    1    2
19    5    S    2    2
20    5    S    2    2
21    6    S    1    3
22    6    S    1    3
23    6    S    2    3
24    6    S    2    3

Yet, how does EdgeR handle this data? It seems statistically not sound to me as now population 1 and 4 are assumed to be same in the model?

Could someone comment on this?
Thanks a lot!

 

 

 

ADD COMMENTlink modified 2.9 years ago • written 3.1 years ago by swaegers.janne0
Answer: How to account for population variation within regions in EdgeR?
1
gravatar for Aaron Lun
3.1 years ago by
Aaron Lun24k
Cambridge, United Kingdom
Aaron Lun24k wrote:

The best way to proceed would be something like this:

pop <- factor(rep(1:6, each=4))
lat <- factor(rep(c("N", "S"), each=12))
treatment <- factor(rep(rep(1:2, each=2), 6))
design <- model.matrix(~0 + pop + lat:treatment)
design <- design[,-grep("treatment1", colnames(design))]
colnames(design) <- sub(":", "_", colnames(design))

Now, looking at colnames(design), you get:

[1] "pop1"            "pop2"            "pop3"            "pop4"           
[5] "pop5"            "pop6"            "latN_treatment2" "latS_treatment2"

The first 6 coefficients represent blocking factors for the populations, while the last two represent latitude-specific treatment effects (of treatment 2 over treatment 1). You can drop them separately to test for the effect of treatment at either latitude; or you can compare them to each other, to test for differential treatment effects between latitudes:

con <- makeContrasts(latN_treatment2 - latS_treatment2, levels=design)

... or you can average them, to test for the average effect of treatment across both latitudes:

con <- makeContrasts((latN_treatment2 + latS_treatment2)/2, levels=design)

However, you cannot directly test for the effect of latitude because this is nested in population. Comparisons between populations are not sensible in this model as we've used all the population-level information to estimate the pop* coefficients. If you want to do this, you should consider using voom with duplicateCorrelation, which allows you to compare between levels of a blocking factor.

ADD COMMENTlink modified 3.1 years ago • written 3.1 years ago by Aaron Lun24k

Hi, I have another question related to this answer. Can I also compare latN.treat20 to latS.treat20 using this design? So compare the gene expression within the 20 treatment between the N and S latitude? How would you go about to test this? Thanks in advance!

ADD REPLYlink written 2.9 years ago by swaegers.janne0

Where did the 20 come from? Your post only has values of 1 or 2 for treat.

ADD REPLYlink written 2.9 years ago by Aaron Lun24k

Sorry, I used the wrong numbers. So using the above-mentioned design, my question is: Can I also compare latN_treatment1 to latS.treatment1 using this design? So compare the gene expression within the treatment 1 between the N and S latitude? How would you go about to test this? Would this involve making a subset of the data? Thanks in advance!

ADD REPLYlink written 2.9 years ago by swaegers.janne0

Remember, latN_treatment2 and latS_treatment2 represent the log-fold change between treatments at each latitude; you can't have a separate coefficient like latN_treatment1 or latS_treatment1 in this design, because the log-fold change is already computed in treatment 2 relative to treatment 1.

More generally, if you want to directly compare expression between the N/1 and S/1 groups, you can't do this with this design because the different latitudes belong in different pop levels. You've already blocked on pop, so a direct comparison is simply not possible. Rather, you'll have to subset the data to only take the treatment 1 samples, and then pool the two treatment 1 samples at each pop level, such that there would be one pooled library per pop level. These can be treated as direct replicates and modelled with a simple one-way layout based on latitude.

ADD REPLYlink modified 2.9 years ago • written 2.9 years ago by Aaron Lun24k

Allright, thanks!

I'm still wondering though: could I also make the dummy variable "pop.nested" (as below) and then make a design like this: design = ~0 +lat + lat:pop.n

Would this be a valid alternative for pooling my samples within the populations?

sample    pop    lat    treat    pop.nested
1    1    N    1    1
2    1    N    1    1
5    2    N    1    2
6    2    N    1    2
9    3    N    1    3
10    3    N    1    3
13    4    S    1    1
14    4    S    1    1
17    5    S    1    2
18    5    S    1    2
21    6    S    1    3
22    6    S    1    3
ADD REPLYlink written 2.9 years ago by swaegers.janne0

No. This doesn't properly account for correlations for samples from the same pop. Consider a gene with no latitude effect; in this situation, your design implies that samples 1, 2, 13 and 14 should be independent replicates. However, 1 and 2 are from the same pop and will be correlated, as will 13 and 14. Failure to account for this means that the degrees of freedom in the model - and the confidence of downstream inferences - will be overstated. This is why I recommended pooling, to avoid worrying about these correlations.

ADD REPLYlink modified 2.9 years ago • written 2.9 years ago by Aaron Lun24k

Thanks for your reply! Do you have a suggestion on how to pool the data within populations? Just add up the counts? Or would you first do a normalisation procedure? In EdgeR would this mean adding up the rows of the samples within populatons after y <- calcNormFactors(y) ?

ADD REPLYlink written 2.9 years ago by swaegers.janne0
1

Run sumTechReps before calcNormFactors.

ADD REPLYlink written 2.9 years ago by Aaron Lun24k
Answer: How to account for population variation within regions in EdgeR?
0
gravatar for swaegers.janne
3.0 years ago by
swaegers.janne0 wrote:

Thanks for your clear and fast answer!

ADD COMMENTlink written 3.0 years ago by swaegers.janne0
Answer: How to account for population variation within regions in EdgeR?
0
gravatar for swaegers.janne
2.9 years ago by
swaegers.janne0 wrote:

Hi again,

I actually have the same question regarding DESeq2. How can I implement the above method to account for populations into the design formula of DESeq2? I can not find a way to edit the design matrix as you can not give a matrix as input in DESeq2. At least, I can't find a straightforward way to do that.

Any help is greatly appreciated.

Best,

Janne

ADD COMMENTlink written 2.9 years ago by swaegers.janne0

Probably best to post this as a separate question with a DESeq2 tag.

ADD REPLYlink written 2.9 years ago by Aaron Lun24k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 248 users visited in the last hour