Question

Model and Contrasts with 4 variables

0

Entering edit mode

Cristina • 0

@bcb87936

Last seen 9 months ago

United Kingdom

I have a question regarding a RNA-seq data set I have.

This data consists of 2 disease groups (Disease 1, Disease 2), 2 treatments (Treatment 1, Treatment 2), and 2 times points (Timepoint 1, Timepoint 2). And in total 30 Patients.

I want to make the following comparisons:

Timepoint 2 vs Timepoint 1 for Patients on Treatment 1
Timepoint 2 vs Timepoint 1 for Patients on Treatment 2
Treatment 1 vs Treatment 2 irrespective of Disease at Timepoint 2
Treatment 1 vs Treatment 2 for Patients with Disease 1 at Timepoint 2
Treatment 1 vs Treatment 2 for Patients with Disease 2 at Timepoint 2

What would be the best way to design the model and do the comparisons?

contrast edgeR DESeq2 • 779 views

ADD COMMENT • link updated 10 months ago by James W. MacDonald 65k • written 10 months ago by Cristina • 0

score 0 · Answer 1 · 2023-06-13

0

Entering edit mode

James W. MacDonald 65k

@james-w-macdonald-5106

Last seen 1 day ago

United States

Have you read the respective vignettes for edgeR and DESeq2? Both provide extensive examples. This support site is meant to provide technical advice for very specific questions about how to use Bioconductor software rather than provide basic statistics advice. Using statistical analysis tools presupposes that you already understand the statistics part, and if you don't then it's up to you to either learn or find someone local who does have the required knowledge.

ADD COMMENT • link 10 months ago James W. MacDonald 65k

0

Entering edit mode

Thank you - yes I have read both manuals which are very helpful. However, I couldn't find an example that did exactly what I wanted. I have done the following:

`design <- model.matrix(~ Donor)

Treatment1.Timepoint2 <- Treatment== "Treatment1" & Timepoint=="Timepoint2"
Treatment2.Timepoint2 <- Treatment== "Treatment2" & Timepoint=="Timepoint2"
design <- cbind(design,  Treatment1.Timepoint2,  Treatment2.Timepoint2)

dge <- estimateDisp(dge, design)

fit <- glmQLFit(dge, design)
qlf1 <- glmQLFTest(fit, coef="Treatment1.Timepoint2")
topTags(qlf1)

qlf2 <- glmQLFTest(fit, coef="Treatment2.Timepoint2")
topTags(qlf2)

qlf3 <- glmQLFTest(fit, coef=31:32) #where 31:32 are "Treatment1.Timepoint2":"Treatment2.Timepoint2"
 topTags(qlf3)

qlf4 <- glmQLFTest(fit, contrast=c(rep(0, 30),1,-1)) #where 0:30 are the Donors(n=30)
topTags(qlf4)

ADD REPLY • link 10 months ago Cristina • 0

0

Entering edit mode

The comparisons you describe are not 100% clear, but the way I interpret them, qlf1 and qlf2 answer your first two comparisons. The remainder are all within timepoint2 comparisons, at which point there is no longer any pairing, so you can just make the usual comparisons. Normally the best thing to do is to fit a cell means model and make the correct comparisons using contrasts. Something like

combo <- paste(Treatment, Timepoint, Disease, sep = ".")
design2 <- model.matrix(~ 0 + combo)
contrast <- makeContrasts((Treatment1.Timepoint2.Disease1 + Treatment1.Timepoint2.Disease2)/2 - (Treatment2.Timepoint2.Disease1 + Treatment2.Timepoint2.Disease2)/2,
                           Treatment2.Timepoint2.Disease1 - Treatment1.Timepoint2.Disease1,
                           Treatment2.Timepoint2.Disease2 - Treatment1.Timepoint2.Disease2,
                            levels = design2)

Your qlf3 comparison is testing for any genes that change expression between time 1 and time 2 for either treatment, and your qlf4 is doing the interaction term, where you are asking for genes that react differently between timepoints depending on the treatment.

ADD REPLY • link 10 months ago James W. MacDonald 65k

0

Entering edit mode

Yes, you got the comparisons I am trying to make! Thanks for the explanation. I was just wondering how do you adjust for baseline differences between the patients in your model? Should you not consider Donors for the model?

ADD REPLY • link 10 months ago Cristina • 0

0

Entering edit mode

You could. As I noted, qlf3 is already doing an F-test for any difference between time 1 and 2 for either treatment. In other words, Treatment1.Timepoint2 and Treatment1.Timepoint2 in your model are the difference between time 2 and time 1, within each of the two treatment arms. And qlf3 is doing an F-test that tests for either (or both) of those coefficients being different from zero. You could also do

glmQLFTest(fit, contrast = rep(c(0,0.5), c(30,2)))

Which would test that the average of the difference between time 1 and 2 for the two treatments is different from zero, which would adjust for the baseline donor effect. But that is not the same as what you asked for earlier.

ADD REPLY • link 10 months ago James W. MacDonald 65k