Question

EdgeR between/within comparisons (3.5) with nested non-balanced data set

1

Entering edit mode

jcalis ▴ 10

@jcalis-11562

Last seen 5.0 years ago

Dear all,

I wanted to ask if I can extend from the edgeR user manual 3.5 example, on between/within subjects comparisons.

I'm thinking on 3 ways I would like to extend:

Adding non-paired samples. From some patients I have only a single sample (ie not both tissueA and tissueB; for example, only tissueA from patient s19), is there a way to include these in the model?
Non-balanced paired sample counts. I have more paired (ie both tissueA and tissueB) samples from the Disease1 group (n=7) than from the Healthy (n=6) group, how can I include all paired samples? If I include them in the current model the coefficient DiseaseHealthy:patientIndexp7 is mentioned to be not estimable and this throws an error.
Introducing other co-factors. Can I add co-factors to the model, e.g. treatment?

See below for the samples and model.

Thanks for all your help/suggestions.

Best regards, Jorg

#

targets frame (full):

file           tissue     patient       Disease
file007       tissueA         s22      Disease1
file006       tissueB         s22      Disease1
file004       tissueA         s19      Disease1
file005       tissueB         s21       Healthy
file009       tissueB         s25      Disease1
file011       tissueB         s26       Healthy
file012       tissueB         s27       Healthy
file014       tissueA         s28      Disease1
file013       tissueB         s28      Disease1
file015       tissueA         s29       Healthy
file016       tissueB         s29       Healthy
file017       tissueB         s32       Healthy
file019       tissueA         s33      Disease1
file018       tissueB         s33      Disease1
file020       tissueA         s34       Healthy
file021       tissueB         s34       Healthy
file022       tissueA         s35      Disease1
file024       tissueA         s36       Healthy
file023       tissueB         s36       Healthy
file025       tissueB         s37      Disease1
file026       tissueA         s38      Disease1
file027       tissueB         s38      Disease1
file028       tissueA         s39      Disease1
file029       tissueB         s39      Disease1
file030       tissueA         s40      Disease1
file031       tissueB         s40      Disease1
file032       tissueA         s41      Disease1
file034       tissueA         s42       Healthy
file035       tissueA         s43       Healthy
file036       tissueB         s43       Healthy
file037       tissueA         s44       Healthy
file038       tissueB         s44       Healthy
file639       tissueA         s45       Healthy
file040       tissueB         s45       Healthy
file041       tissueA         s46      Disease1
file042       tissueA         s47       Healthy
file043       tissueB         s47       Healthy
file044       tissueA         s48      Disease1
file045       tissueB         s50      Disease1
file001       tissueA          s6      Disease1
file002       tissueB          s6      Disease1
file003       tissueA          s7      Disease1

#

targets frame (balanced/paired):

file           tissue     patient       Disease  patientIndex
file007       tissueA         s22      Disease1            p1
file006       tissueB         s22      Disease1            p1
file014       tissueA         s28      Disease1            p2
file013       tissueB         s28      Disease1            p2
file015       tissueA         s29       Healthy            p1
file016       tissueB         s29       Healthy            p1
file019       tissueA         s33      Disease1            p3
file018       tissueB         s33      Disease1            p3
file020       tissueA         s34       Healthy            p2
file021       tissueB         s34       Healthy            p2
file024       tissueA         s36       Healthy            p3
file023       tissueB         s36       Healthy            p3
file026       tissueA         s38      Disease1            p4
file027       tissueB         s38      Disease1            p4
file028       tissueA         s39      Disease1            p5
file029       tissueB         s39      Disease1            p5
file030       tissueA         s40      Disease1            p6
file031       tissueB         s40      Disease1            p6
file035       tissueA         s43       Healthy            p4
file036       tissueB         s43       Healthy            p4
file037       tissueA         s44       Healthy            p5
file038       tissueB         s44       Healthy            p5
file042       tissueA         s47       Healthy            p6
file043       tissueB         s47       Healthy            p6

This works with the model as discussed in the manual 3.5:

~Disease + Disease:patientIndex + Disease:tissue

Thanks for all your help/suggestions.

Best regards, Jorg

edgeR nested glm glmQLFit • 1.1k views

ADD COMMENT • link updated 5.0 years ago by Aaron Lun ★ 28k • written 5.0 years ago by jcalis ▴ 10

0

Entering edit mode

And your contrast of interest is... what?

ADD REPLY • link 5.0 years ago Aaron Lun ★ 28k

0

Entering edit mode

I would like to make several contrasts:

disease vs healthy. For all tissues, and tissue specific (tissueAdisease - tissueAhealthy or tissueBdisease - tissueBhealthy)
tissueA vs tissueB.
Difference of differences contrast for disease vs healthy in one tissue vs the other: (tissueAdisease - tissueAhealthy) - (tissueBdisease - tissueBhealthy)

Thanks for your help, Jorg

ADD REPLY • link 5.0 years ago jcalis ▴ 10

score 3 · Answer 1 · 2019-04-17

3

Entering edit mode

Aaron Lun ★ 28k

@alun

Last seen 34 minutes ago

The city by the bay

Use voom and block on patient with duplicateCorrelation. This is the only way to achieve all of your desired contrasts in a single model, as you cannot compare between disease states while blocking on patient in the design matrix. As a bonus, this strategy will make use of the patients that only contribute one sample, whereas these would not be informative with edgeR (as the blocking term for each patient would absorb all its information). As for your other questions:

Balancing isn't necessary, unbalanced designs do not compromise the validity of the tests.
You can add as many co-factors as you like, as long as they're not confounding. You should try to understand the assumptions regarding their interactions (or lack thereof) with the current factors.

ADD COMMENT • link 5.0 years ago Aaron Lun ★ 28k

0

Entering edit mode

Thanks a lot for your help. I've been looking into how to implement your suggestions and came to the following. I'd like to share my current set-up here, both to see if it is correct, and to help others facing the same problem.

Aside from the factors mentioned above, I now also include another categorical co-factor, that is patient specific (e.g. sex). So, the meta data looks like this:

file           tissue     patient       Disease    otherCoFactor
file007       tissueA         s22      Disease1                X
file006       tissueB         s22      Disease1                X
file004       tissueA         s19      Disease1                Y
...
file042       tissueA         s47       Healthy                X
file043       tissueB         s47       Healthy                X
file044       tissueA         s48      Disease1                Y
file045       tissueB         s50      Disease1                X
file001       tissueA          s6      Disease1                Y
file002       tissueB          s6      Disease1                Y
file003       tissueA          s7      Disease1                X

The meta data is in an object called metaData, in which the following are set as reference levels, than making the design:

metaData$tissue        <- relevel(metaData$tissue,        ref="tissueA")
metaData$patient       <- relevel(metaData$patient,       ref="s22")
metaData$Disease       <- relevel(metaData$Disease,       ref="Healthy")
metaData$otherCoFactor <- relevel(metaData$otherCoFactor, ref="X")

design <- model.matrix(~otherCoFactor + Disease + tissue + Disease:tissue, data=metaData)

Starting with the DGElist object y:

y      <- calcNormFactors(y)
v      <- voom(y, design)
corfit <- duplicateCorrelation(v, design, block = metaData$patient)
v      <- voom(y, design, block = metaData$patient, correlation = corfit$consensus)

fit <- lmFit(v, design, block = metaData$patient, correlation = corfit$consensus)

# Using the following factors to set up contrasts:
colnames(design)
[1] "(Intercept)"                   "otherCoFactorY"
[3] "DiseaseDisease1"               "tissuetissueB" 
[5] "DiseaseDisease1:tissuetissueB"

MYcontrasts <- makeContrasts(
  Disease1vsHealthy_tissueA   = DiseaseDisease1 ,
  Disease1vsHealthy_tissueB   = 
    (DiseaseDisease1 + tissuetissueB + DiseaseDisease1:tissuetissueB) - (tissuetissueB) ,
  tissueBvsA_inHealthy  = tissuetissueB ,
  tissueBvsA_inDisease1 = 
    (DiseaseDisease1 + tissuetissueB + DiseaseDisease1:tissuetissueB) - (DiseaseDisease1) ,
  differentialDiseaseResponseBetweenTissues =  
    ((DiseaseDisease1 + tissuetissueB + DiseaseDisease1:tissuetissueB) - (tissuetissueB))
     - (DiseaseDisease1), 
  levels=design)

fit2 <- contrasts.fit(fit, MYcontrasts)
fit2 <- eBayes(fit2)

# To find DE genes for a specific contrast:
geneFit_Disease1vsHealthy_tissueB <- topTable(fit2, adjust="BH", n=Inf, coef="Disease1vsHealthy_tissueB")

# To find DE gene sets with camera for a specific contrast:
camera_Disease1vsHealthy_tissueB <- camera(v, idx, design, contrast=MYcontrasts[,"Disease1vsHealthy_tissueB"])

Let me know if you have any suggestions/comments/questions.

ADD REPLY • link 5.0 years ago jcalis ▴ 10

0

Entering edit mode

You would save yourself a lot of mental gymnastics by using a cell-means model:

grouping <- factor(paste0(metadata$tissue, ".", metadata$disease))
design <- model.matrix(~0 + grouping + metadata$otherCoFactor)

# adding column names for cleaner contrast set-up
colnames(design)[seq_len(nlevels(grouping))] <- levels(grouping)

... and then just doing your comparisons simply with:

MYcontrasts <- makeContrasts(grouping
  Disease1vsHealthy_tissueA =  A.disease1 - A.healthy ,
  Disease1vsHealthy_tissueB =  B.disease1 - B.healthy
  tissueBvsA_inHealthy  = B.healthy - A.healthy,
  tissueBvsA_inDisease1 = B.disease1 - A.disease1,
  differentialDiseaseResponseBetweenTissues =  
    (A.disease1 - A.healthy) - (B.disease1 - B.healthy),
  levels=design)

I've constructed the contrasts based on the names that you've given them, rather than trying to make them mathematically equivalent to the contrasts in your code, because I have a feeling that some of your current contrasts are wrong. I would need to sit down and look through the design matrix to be sure; but the fact that a reader can't check the correctness straight away is already a problem in itself.

ADD REPLY • link 5.0 years ago Aaron Lun ★ 28k

0

Entering edit mode

Thanks again! I'll move on with the grouping as you suggested, I agree that it is always better to make something as easy (to follow) as possible.

Just to check whether something went wrong or not. Here is the way I selected the factors for each contrast, first by selecting factors for each group:

For A.Healthy: (Intercept) For B.Healthy: (Intercept) + tissuetissueB For A.Disease1: (Intercept) + DiseaseDisease1 For B.Disease1: (Intercept) + tissuetissueB + DiseaseDisease1 + DiseaseDisease1:tissuetissueB

Than, for the contrasts I fill out these factors based on how each group is contrasted and get the following:

Disease1vsHealthy_tissueA = A.disease1 - A.healthy = 
    ((Intercept) + DiseaseDisease1) - ((Intercept)) = 
    DiseaseDisease1

Disease1vsHealthy_tissueB = B.disease1 - B.healthy = 
    ((Intercept) + tissuetissueB + DiseaseDisease1 + DiseaseDisease1:tissuetissueB) - 
    ((Intercept) + tissuetissueB) = 
    DiseaseDisease1 + DiseaseDisease1:tissuetissueB

tissueBvsA_inHealthy = B.healthy - A.healthy = 
    ((Intercept) + tissuetissueB) - ((Intercept)) = 
    tissuetissueB

tissueBvsA_inDisease1 = B.Disease1 - A.Disease1 = 
    ((Intercept) + tissuetissueB + DiseaseDisease1 + DiseaseDisease1:tissuetissueB) - 
    ((Intercept) + DiseaseDisease1) = 
    tissuetissueB + DiseaseDisease1:tissuetissueB

# Note that for this last contrast I have set it up as B-A, not A-B as in your example. 
# Maybe that's where you thought things go wrong?

differentialDiseaseResponseBetweenTissues =  
    (B.disease1 - B.healthy) - (A.disease1 - A.healthy) = 
    (((Intercept) + tissuetissueB + DiseaseDisease1 + DiseaseDisease1:tissuetissueB) - 
    ((Intercept) + tissuetissueB)) - 
    (((Intercept) + DiseaseDisease1) - ((Intercept))) = 
    (DiseaseDisease1 + DiseaseDisease1:tissuetissueB) - (DiseaseDisease1) = 
    DiseaseDisease1:tissuetissueB

ADD REPLY • link 5.0 years ago jcalis ▴ 10

0

Entering edit mode

The theory looks fine, but you can see how much extra effort it was to prove it. Also, your code wouldn't have worked in the first place because : isn't a valid name.

ADD REPLY • link 5.0 years ago Aaron Lun ★ 28k