Question

Help with model for differential expression analysis

0

Entering edit mode

Lucy ▴ 60

@lucy-17014

Last seen 10 months ago

United Kingdom

Hi,

I am struggling to design a model for my differential expression analysis and I was hoping for some advice.

I have two different vaccines and within each vaccine group, I have four timepoints from a single donor (with some missing values). I also have a batch variable that I need to control for, as the libraries were generated and sequenced in three batches. As the timepoints are pre-prime (T1) and post-prime (T2) and pre-boost (T3) and post-boost (T4), I am interested in the following comparisons:

T2 vs. T1 for vaccine A - response to prime dose of vaccine A
T4 vs. T3 for vaccine A - response to boost dose of vaccine A
T2 vs. T1 for vaccine B - response to prime dose of vaccine B
T4 vs. T3 for vaccine B - response to boost dose of vaccine B
(T2 vs. T1 for vaccine A) compared with (T2 vs. T1 for vaccine B) - how does prime response differ for the two vaccines?
(T4 vs. T3 for vaccine A) compared with (T4 vs. T3 for vaccine B) - how does boost response differ for the two vaccines?
(T4 vs. T3 for vaccine A) compared with (T2 vs. T1 for vaccine A) - how does boost response compare to prime response for vaccine A?
(T4 vs. T3 for vaccine B) compared with (T2 vs. T1 for vaccine B) - how does boost response compare to prime response for vaccine B?

My samples table looks something like this (but I have ~ 10 donors per vaccine group and not all donors have every timepoint):

vaccine	timepoint	donor	batch
A	T1	D1	B1
A	T2	D1	B1
A	T3	D1	B2
A	T4	D1	B1
A	T1	D2	B2
A	T2	D2	B1
A	T3	D2	B1
A	T4	D2	B1
A	T1	D3	B2
A	T2	D3	B1
A	T3	D3	B3
A	T4	D3	B1
B	T1	D4	B2
B	T2	D4	B1
B	T3	D4	B3
B	T4	D4	B3
B	T1	D5	B1
B	T2	D5	B1
B	T3	D5	B2
B	T4	D5	B3
B	T1	D6	B1
B	T2	D6	B2
B	T3	D6	B2
B	T4	D6	B3

One option would be to split the samples into prime (T1, T2) and boost (T3, T4) timepoints and set up the model in a similar way to the "Comparisons both between and within subjects" example in the edgeR manual. However, this would then not allow me to compare the difference between prime and boost within a vaccine.

Any thoughts would be most appreciated.

Best wishes,

Lucy

DifferentialExpression edgeR • 1.3k views

ADD COMMENT • link 18 months ago • updated 17 months ago Lucy ▴ 60

score 0 · Answer 1 · 2023-12-30

0

Entering edit mode

Gordon Smyth 52k

@gordon-smyth

Last seen 9 hours ago

WEHI, Melbourne, Australia

The approach outined in the edgeR manual for between and within subjects does not require you to split the samples into prime and boost. The approach works for any number of factor levels. The approach outlined in that section seems directly applicable to your data (if you ignore batch) and would seem to answer all your questions. It makes all your intended comparisons quite straightforward.

As an alternative, the method outlined in the limma User's Guide under "multi-level" designs is also applicable.

ADD COMMENT • link 18 months ago Gordon Smyth 52k

0

Entering edit mode

Thank you. I have a design matrix with the following columns: donor1, donor2, donor3, ..., batch2, batch3, vaccineA.T2, vaccineB.T2, vaccineA.T3, vaccineB.T3, vaccineA.T4, vaccineB.T4

The main comparisons I am unsure how to make are:

(T4 vs. T3 for vaccine A) compared with (T4 vs. T3 for vaccine B) - how does boost response differ for the two vaccines?
(T4 vs. T3 for vaccine A) compared with (T2 vs. T1 for vaccine A) - how does boost response compare to prime response for vaccine A?
(T4 vs. T3 for vaccine B) compared with (T2 vs. T1 for vaccine B) - how does boost response compare to prime response for vaccine B?

Does the following make sense:

my.contrasts <- makeContrasts(
    A_B_boost = (vaccineA.T4 - vaccineA.T3) - (vaccineB.T4 - vaccineB.T3), 
    A_boost_prime = (vaccineA.T4 - vaccineA.T3) - vaccineA.T2, 
    B_boost_prime = (vaccineB.T4 - vaccineB.T3) - vaccineB.T2, 
    levels = design)

Then e.g. for boost vs. prime, vaccine A:

qlf <- glmQLFTest(fit, contrast = my.contrasts[, "A_boost_prime"])
topTags(qlf, n = nrow(counts))

ADD REPLY • link 18 months ago Lucy ▴ 60

0

Entering edit mode

The design matrix columns and contrasts look correct in principle. Why are you unsure?

Keep in mind that I have not seen either your complete data or the commands used to create the design matrix, so I can't comment on whether the batch correction is correct.

ADD REPLY • link 18 months ago Gordon Smyth 52k

0

Entering edit mode

Thank you. Regarding batch correction, I detected a batch effect by PCA that could be successfully removed with ComBat, so I'm hoping that edgeR will also be able to deal with the batch effect. However, since the T1 timepoint for each donor/vaccine combination can only be in one batch, I don't know whether this could cause issues?

The design matrix was created as below following the example at EdgeR - Model matrix for complex model similar to user guide 3.5 example - but more complex.

df <- data.frame(
    vaccine = factor(y$samples$vaccine), 
    donor = factor(y$samples$donor), 
    batch = factor(y$samples$batch), 
    timepoint = factor(y$samples$timepoint))

design <- model.matrix(~ 0 + donor + batch + vaccine:timepoint, df)
design <- design[, !grepl("T1$", colnames(design))]    # removing vaccineA.T1 and vaccineB.T1 columns from design matrix
colnames(design) <- gsub(":", ".", colnames(design))

How would the coefficients be interpreted in this scenario:

donor coefficients e.g. donor1, donor2 - log expression at T1
batch coefficients (batch2, batch3) - log fold change in expression relative to batch1 at T1 - would this be on average for both vaccine A and B, because assuming no interaction effect between batch and vaccine?
vaccine/timepoint coefficients: vaccineA.T2 would be log fold change in expression between vaccineA.T2 and vaccineA.T1; vaccineA.T3 would be log fold change in expression between vaccineA.T3 and vaccineA.T1?

For me, it would be more intuitive to write the boost vs. prime contrasts as below, but this doesn't fit the format of the design matrix. I think this is where my confusion comes in.

my.contrasts <- makeContrasts(
    A_boost_prime = (vaccineA.T4 - vaccineA.T3) - (vaccineA.T2 - vaccineA.T1), 
    B_boost_prime = (vaccineB.T4 - vaccineB.T3) - (vaccineB.T2 - vaccineB.T1), 
    levels = design)

I am keen to update my understanding of GLMs, so I would be happy to be pointed to some useful books or resources.

ADD REPLY • link 18 months ago Lucy ▴ 60

0

Entering edit mode

Gordon Smyth, it would be great to understand if my interpretation of the coefficients is correct or what I am missing?

ADD REPLY • link 17 months ago Lucy ▴ 60

vaccine	timepoint	donor	batch
A	T1	D1	B1
A	T2	D1	B1
A	T3	D1	B2
A	T4	D1	B1
A	T1	D2	B2
A	T2	D2	B1
A	T3	D2	B1
A	T4	D2	B1
A	T1	D3	B2
A	T2	D3	B1
A	T3	D3	B3
A	T4	D3	B1
B	T1	D4	B2
B	T2	D4	B1
B	T3	D4	B3
B	T4	D4	B3
B	T1	D5	B1
B	T2	D5	B1
B	T3	D5	B2
B	T4	D5	B3
B	T1	D6	B1
B	T2	D6	B2
B	T3	D6	B2
B	T4	D6	B3

vaccine	timepoint	donor	batch
A	T1	D1	B1
A	T2	D1	B1
A	T3	D1	B2
A	T4	D1	B1
A	T1	D2	B2
A	T2	D2	B1
A	T3	D2	B1
A	T4	D2	B1
A	T1	D3	B2
A	T2	D3	B1
A	T3	D3	B3
A	T4	D3	B1
B	T1	D4	B2
B	T2	D4	B1
B	T3	D4	B3
B	T4	D4	B3
B	T1	D5	B1
B	T2	D5	B1
B	T3	D5	B2
B	T4	D5	B3
B	T1	D6	B1
B	T2	D6	B2
B	T3	D6	B2
B	T4	D6	B3

vaccine	timepoint	donor	batch
A	T1	D1	B1
A	T2	D1	B1
A	T3	D1	B2
A	T4	D1	B1
A	T1	D2	B2
A	T2	D2	B1
A	T3	D2	B1
A	T4	D2	B1
A	T1	D3	B2
A	T2	D3	B1
A	T3	D3	B3
A	T4	D3	B1
B	T1	D4	B2
B	T2	D4	B1
B	T3	D4	B3
B	T4	D4	B3
B	T1	D5	B1
B	T2	D5	B1
B	T3	D5	B2
B	T4	D5	B3
B	T1	D6	B1
B	T2	D6	B2
B	T3	D6	B2
B	T4	D6	B3