edgeR design matrix and contrasts for comparisons both between and within subjects
1
0
Entering edit mode
@rebeccaleajohnston-22750
Last seen 2.5 years ago
Australia

Hi Bioconductor Community,

I would like to confirm that I have constructed my design matrix appropriately and am testing the correct coefficients/contrasts given my biological questions.

My experiment is similar to Section 3.5 "Comparisons both between and within subjects" of the edgeR user guide. I found a related question asked here and so have set up my design matrix based on the answers from this post to avoid the use of interaction terms (which I find confusing). This post is also related, as we have very similar research questions.

Experimental design and design matrix

Here is a reproducible example of my experimental design and design matrix. Note that the Control and Diseased patients are different individuals.

(targets <- 
   data.frame(Disease = c(rep("Control", 6), rep("Diseased", 6)),
              Patient = c(paste0("C", (gl(3, 2, 6))), 
                          paste0("D", (gl(3, 2, 6)))),
              Tissue = rep(c("Blood", "Bone"), 6)))

    Disease Patient Tissue
1   Control      C1  Blood
2   Control      C1   Bone
3   Control      C2  Blood
4   Control      C2   Bone
5   Control      C3  Blood
6   Control      C3   Bone
7  Diseased      D1  Blood
8  Diseased      D1   Bone
9  Diseased      D2  Blood
10 Diseased      D2   Bone
11 Diseased      D3  Blood
12 Diseased      D3   Bone

Patient <- targets$Patient
Disease <- targets$Disease
Tissue <- targets$Tissue
Group <- paste0(Disease, ".", Tissue)

design <- model.matrix(~Patient + Group)
# Get to full rank
design <- design[,!grepl("Blood$", colnames(design))]
colnames(design)
[1] "(Intercept)"        "PatientC2"          "PatientC3"          "PatientD1"          "PatientD2"          "PatientD3"         
[7] "GroupControl.Bone"  "GroupDiseased.Bone"

Biological questions

I have three main biological questions. Am I using the correct approaches to answer them?

Q1) To find genes differentially expressed (DE) between Control Bone vs. Control Blood:

glmLRT(fit, coef = "GroupControl.Bone")

Q2) To find genes DE between Diseased Bone vs. Diseased Blood:

glmLRT(fit, coef = "GroupDiseased.Bone")

Q3) And most importantly, to find genes DE between Diseased Bone vs. Diseased Blood (as per Q2), but ensure they are unique to the diseased condition (i.e. not DE in Control Bone vs. Control Blood as per Q1): Find the intersect (e.g. using dplyr::full_join) of the DE gene lists from Q1 and Q2 (after filtering each list based on FDR and logFC) to determine which genes are common and unique to each group.

Or is there a better approach to answer Q3? Do I need to use limma::duplicateCorrelation? As I don't believe this contrast answers Q3:

myContrast <- makeContrasts(GroupDiseased.Bone - GroupControl.Bone, levels = design)

Rather, I believe it tests whether the differences between Bone and Blood are the same between Diseased and Control patients, which is a slightly different question.

Your help and advice would be greatly appreciated!

Many thanks, Rebecca

edger rnaseq design and contrast matrix • 2.1k views
ADD COMMENT
2
Entering edit mode
@gordon-smyth
Last seen 9 minutes ago
WEHI, Melbourne, Australia

Q1) Yes.

Q2) Yes.

Q3) The following code gives the simple intersection of genes DE for Q2 but not for Q1:

Control.DE <- glmLRT(fit, coef = "GroupControl.Bone")
Disease.DE <- glmLRT(fit, coef = "GroupDisease.Bone")
Control <- decideTests( Control.DE )
Disease <- decideTests( Disease.DE )
DiseaseOnly.DE <- Disease & !Control
Top <- topTags( Disease.DE, n=Inf, sort="none")
Top.DiseaseOnly <- Top[DiseaseOnly.DE,]

Or alternatively:

Control.DE <- glmLRT(fit, coef = "GroupControl.Bone")
Disease.DE <- glmLRT(fit, coef = "GroupDisease.Bone")
Control <- topTags( Control.DE, n=Inf, sort="none")
Disease <- topTags( Disease.DE, n=Inf, sort="none")
DiseaseOnly <- Disease[Control$table$FDR > 0.05 & Disease$table$FDR < 0.05,]

There are many variations on this. If you want a stricter list, you could require a moderately small p-value for myContrast as well, to avoid including genes that are nearly the same in Disease and Control but just miss out on significance for Control by a very small margin. Or an alternative way to achieve the same purpose is to increase the FDR cutoff for the Control decideTests step.

PS. I think full_join would be disasterous in this context.

ADD COMMENT
0
Entering edit mode

Excellent, thank you so much for your help Gordon!

As a follow up question for my understanding: Given my approaches to answering Q1 and Q2 above are correct, how does this work? Because here, the intercept term represents PatientC1, yes? Yet if we specify the coefficient "GroupControl.Bone" in glmLRT it compares to "GroupControl.Blood" even though this is not the intercept term. And the same is true when specifying the coefficient "GroupDisease.Bone", it compares to "GroupDisease.Blood". Is this just a feature of additive models that I have to remember? Or how can I check what each coefficient represents, given each group doesn't have it's own coefficient?

PS. Yes I should have explained how I would use full_join in this context, see below. I get the same result as your approach, it's just a longer way to get there. I definitely prefer your way!

library("tidyverse")
Control.DE <- glmLRT(fit, coef = "GroupControl.Bone")
Disease.DE <- glmLRT(fit, coef = "GroupDisease.Bone")

Control <- topTagsControl.DE, n = Inf, sort = "none")
Disease <- topTagsDisease.DE, n = Inf, sor t= "none")

Control.table <- Control$table %>% 
  rownames_to_column("Gene.Name") %>% 
  filter(FDR < 0.05) %>%
  selectGene.Name, logFC, FDR) %>% 
  rename(logFC.Control = logFC, FDR.Control = FDR)

Disease.table <- Disease$table %>% 
  rownames_to_column("Gene.Name") %>% 
  filter(FDR < 0.05) %>%
  selectGene.Name, logFC, FDR) %>% 
  rename(logFC.Disease = logFC, FDR.Disease = FDR)

Top.Disease.rm.Control <- 
  full_join(Control.table, Disease.table, by = "Gene.Name") %>% 
  filteris.na(logFC.Control)) 
ADD REPLY
0
Entering edit mode

You're asking me to explain your own design matrix? Including Patient in the linear model adjusts for the baseline level of al the patients. It makes no difference which one is the intercept. Indeed the results would be unchanged if you removed the intercept altogether by 0+Patient+Group.

Basically, you are simply doing two paired comparisons, one for the Control treatment and one for Disease treatment. The design matrix is the two paired-comparison design matrices merged together.

ADD REPLY
0
Entering edit mode

My apologies, I don't think I'm explaining my question very well. I am used to constructing very simple design matrices like ~0 + Group where "Group" is one factor with two or more levels, so all the groups have their own coefficients that can easily be accessed using makeContrasts. But in my case, where "Group" is a combination of two factors, not all groups have their own coefficient (i.e. coefficients for "GroupControl.Blood" and "GroupDiseased.Blood" don't exist). So it isn't clear to me how the last two coefficients ("GroupControl.Bone" and "GroupDiseased.Bone") represent the logFC of Control Bone over Control Blood, and logFC of Diseased Bone over Diseased Blood, respectively. I came across the explanation of the coefficients by reading other posts on Bioconductor, but I don't understand how it works.

I think this is close to the answer I'm seeking:

Basically, you are simply doing two paired comparisons, one for the Control treatment and one for Disease treatment. The design matrix is the two paired-comparison design matrices merged together.

If it's too difficult to explain more than this, could you please suggest some further reading? Thanks again for your help, I really appreciate your time.

ADD REPLY

Login before adding your answer.

Traffic: 853 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6