Comparing between two differential expression analysis - two different fold changes
1
0
Entering edit mode
Marina ▴ 40
@marina-15019
Last seen 3.2 years ago

Hi,

I’m trying to do some differential expression analysis which I believe that it should be trivial but I couldn’t find any reference for what I’m looking for. I’m pretty sure that the solution will be much shorter than the description below for what I’m looking for (sorry for that…):

I would like to find differentially expressed genes that indicate that a treatment A has a higher impact comparing a treatment B, where each of the treatments is compared to DIFFERENT reference levels as follows:

• I have overall of 24 samples.
• 6 first samples are of 6 different patients (group A) before a treatment ”A” was given. I consider the 6 patients as biological replicants, as I’m not interested to differentiate between patients.
• Next 6 samples are of the same 6 patients as above, but after the treatment “A” was given. Once again, I consider all these 6 samples as biological replicants.
• Next 6 samples are of another group (group B) of 6 patients, before the treatment “B” was given. All of these 6 samples are considered as biological replicants.
• Last 6 samples are of the 6 patients of group B, after the treatment “B” was given.
•

I would like to find a differential expression for the following: [fold change of  ” after treatment A” vs ”before treatment A”], compared to [fold change of ”after treatment B” vs ” before treatment B”].

I managed easily to apply DESeq2 for a differential expression of  ”before treatment X” vs ”after treatment X”, but now I have 2 result matrices that include LOG2FC, pvalue and adjusted pvalue, and I’m not sure if and how I should proceed and compare between them. I prefer that DESeq2 will do the entire required analysis from the beginning, if possible.

I was looking in the DESeq2’s vignette and I tried the following code which works well, but I have no idea if it does what I mean:

d <- expand.grid(condition=c(rep("beforeT",6), rep("afterT",6)), treatment=c("A", "B"))

coldataBOTH <- data.frame(row.names=colnames(BOTH), d)

ddsBOTH <- DESeqDataSetFromMatrix(countData=BOTH, colData=coldataBOTH, design=~condition+treatment)

#(the matrix BOTH is the raw count matrix that includes 24 libraries)

mcols(ddsBOTH) <- geneID

ddsBOTH <- DESeq(ddsBOTH)

resB <- results(ddsBOTH)

I suspect that what I really need is described under either the section “Multi-factor designs”or probably under ”Interactions” in the following great, just recently updated, vignette:

https://bioconductor.org/packages/3.7/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#interactions

However, I’m not sure that I could find there a good correspondence with what I’m trying to do.

I would really appreciate if someone can show me an explicit example, tailored for my case (I guess that it should be just an addition of 2-3 lines to my code)…

If DESEQ2 is not the right tool I would be happy to get any recommendation for a proper one. Or any other statistical suggestions.

p.s – just for the case that the someone in the future will encounter my question and it’s not what he will be looking for, I found some hints for what I am trying to do in previous topics in this forum:

DESEQ2--compare two log2-transformed fold changes

DESeq2: Paired Test

But once again I failed to relate and implement the help that was given in those topics into my case.

0
Entering edit mode

A difference between fold changes can be found with an interaction. Have you seen the diagram in the vignette? Before I post code, I’m curious why you dont want to account for patient in the comparison.

0
Entering edit mode

Hi Michael,

Thanks so much for your reply. Do you mean the diagrams in the Interaction section in the vignette? – I had a look into it and although they were clear enough, I’m not sure how to deduce from these figures to what I need.

One of my problems is that I’m new for the entire RNAseq and differential expression business. My supervisor and I first thought that since we would like to know whether the treatment is successful or not, generally speaking, we should not take into account differentiations in the patients. We are not looking for personal medicine or personal treatment so we hoped that collecting data from 6 different patients for each treatment would allow us considering them as biological replicates. However, your question hit me and I now believe that since we deal with different patients we should also consider differentiating between treatments while considering the different patients. Either way, I would like then to implement both options:

1. differential expression between treatments (“treatment-A referenced to before-treatment-A” vs “treatment-B referenced to before-treatment-B”), patients of each treatments are considered as replicates.
2. differential expression between treatments (“treatment-A referenced to before-treatment-A” vs “treatment-B referenced to before-treatment-B”), while considering different patients - meaning using coupled analysis.

Another problem of mine is that I’m even not a rookie in R. I have only (awfully) basic knowledge in programming, so I study mainly from examples. I’m not sure that I fully understand the format of the design equation, and especially not meaning of “:” when specifying multiple factors. I tried the following code in order to differentiate between treatments referenced to their base line, considering the different patients:

> d2 <- expand.grid(condition=c(rep("beforeT",6), rep("afterT",6)), treatment=c("A", "B"), patient=c("P"))

>d2$patient <- c(rep(paste0("P", 1:6),2), rep(paste0("P", 7:12),2)) >coldataBOTH <- data.frame(row.names=colnames(BOTH), d2) >ddsBOTH <- DESeqDataSetFromMatrix(countData=BOTH, colData=coldataBOTH, >design=~treatment+treatment:condition+treatment:condition:patient) #(the matrix BOTH is the raw count matrix that includes 24 libraries) But I get “Model matrix not full rank” error. I also tried instead to combine all factors into a single factor: >ddsBOTH$group <- factor(paste0(c(rep("beforeT",6), rep("afterT",6)), c(rep("A", 12), rep("B",12)), c(rep(paste0("P", 1:6),2), rep(paste0("P", 7:12),2))))

>design(ddsBOTH) <- ~group

but I’m not sure how to extract now my interesting compares…

Any help will be highly appreciated!

0
Entering edit mode

Simon, Michael, Many thanks to both of you!

0
Entering edit mode
Simon Anders ★ 3.7k
@simon-anders-3855
Last seen 3.8 years ago
Zentrum für Molekularbiologie, Universi…

Hi Marina

Mike must have overlooked that you have two different sets of patients, and each patient only gets either treatment "A" or treatment "B". An interaction would be needed if you had the both treatment applied to the same patient, but in your case, it is actually straightforward.

I use the following slightly obscure R code to produce an example sample table:

sampleTable <- data.frame(
sample = sprintf( "Sample%02d", 1:24 ),
patient = rep( sprintf( "Pat%02d", 1:12 ), each=2 ),
treatment = c( rep( c( "before", "afterA" ), 6 ), rep( c( "before", "afterB" ), 6 ) ) )


It produces the following data frame, with one row per sample:

     sample patient treatm
1  Sample01   Pat01 before
2  Sample02   Pat01 afterA
3  Sample03   Pat02 before
4  Sample04   Pat02 afterA
5  Sample05   Pat03 before
6  Sample06   Pat03 afterA
7  Sample07   Pat04 before
8  Sample08   Pat04 afterA
9  Sample09   Pat05 before
10 Sample10   Pat05 afterA
11 Sample11   Pat06 before
12 Sample12   Pat06 afterA
13 Sample13   Pat07 before
14 Sample14   Pat07 afterB
15 Sample15   Pat08 before
16 Sample16   Pat08 afterB
17 Sample17   Pat09 before
18 Sample18   Pat09 afterB
19 Sample19   Pat10 before
20 Sample20   Pat10 afterB
21 Sample21   Pat11 before
22 Sample22   Pat11 afterB
23 Sample23   Pat12 before
24 Sample24   Pat12 afterB


I assume that your sample table looks like this, too. Make your own sample table, with one row per sample, and two columns, labelled patient with the ID of the patient and treatment, with values before, afterA, and afterB.

It is helpful (though not really necessary) to use

mf$treatment <- relevel( mf$treatment, "before" )


to make the before level the default base of comparisons. (Note that we have only one "before" treatment level. After all, before treatment, there is no difference yet between A and B.)

Now, I get me some (random, non-sensical) example count data for 24 samples with

library( DESeq2 )
exampleCounts <- counts(makeExampleDESeqDataSet( m=24 ))


and run DESeq

dds <- DESeqDataSetFromMatrix( exampleCounts, mf, ~ patient + treatment )
dds <- DESeq(dds)


You can now use three result commands to get results for three comparisons of interest:

results( dds, c( "treatment", "afterA", "before" ) )
results( dds, c( "treatment", "afterB", "before" ) )
results( dds, c( "treatment", "afterB", "afterA" ) )


The first two show the genes affected by the two treatments, respectively. The last one compares the two treatments against each other and looks for significant differences. As "before" is the same for both, we do not need to subtract it or even mention it.

Note that our design formula above (~ patient + treatment) mentions patient to account for this so-called "blocking factor". This causes the exact opposite of looking at the differences between patients: namely, it causes the baseline expression values (the "before" expression) to be subtracted so that the treatment effects are calculated as the averages of the differences between after and before treatment, with the difference taken for each patient first, and averaged over paatients afterwards.

If you want to understand this last point better, remind yourself first from an elementary statistics textbook about the difference between an unpaired and a paired t test, and the read up on how this generalizes to blocking factors in ANOVA. What you have is often called a "split-plot design" in textbooks.

0
Entering edit mode

I was working towards, one can also build that kind of design with interactions, something like ~group + group:nested.individual + group:treatment, but I'm not even sure which takes more effort actually :-)

0
Entering edit mode

Right. That's why I made an example sample table for myself to check, because I wasn't sure either. Turned out it's easier without interaction.

0
Entering edit mode

Simon, Michael,

Many thanks to both of you. I appreciate the detailed explanation – it helped a lot. It looks like I’m done with these comparisons thanks to you.

It was a brilliant idea to treat all “before” patients as the same reference. I guess that I should had done this in the first place. I had no idea why I had a wrong conception that I had to treat all “before” patients as beforeA and beforeB… I guess that it is easier to think about a “fully” paired test.

Thanks also for referring me to the proper statistical model. I’m going to read about split-plot design as I feel that I need to expand my knowledge about statistics in general and specifically about statistic models that I need to use.

Although it has no relevance right now, but I guess that eventually I might also need to do what I asked in the beginning: how to compare between 2 treatments, where each has its own reference. For example, let’s say that I would like to compare 2 treatments: treatment A that is given to women only, and treatment B that is given to men only, and I would like to compare between treatment A vs. treatment B. The result of this question should be which of the 2 treatments is better. I understand that this result does not mean that generally treatment A is better than treatment B, for example, but it says only that treatment A is better for women, comparing treatment B for men.

How can this be done?

Thanks again!

0
Entering edit mode

Regarding your last question: First not that asking for difference in treatment success is somewhat different to asking for difference in gene expression, so let's stick to the latter. Now, if you see a gene that goes up much more strongly in men upon treatment A than in women upon treatment B, you cannot say whether this means (a) that the gene generally reacts more strongly to any treatment if the subject is a man rather than a woman, or (b) the gene reacts more strongly to treatment A than to treatment B, no matter the sex of the subject or (c) both.

Such an experiment design cannot distinguish (a) from (b), no matter what, and hence there is not point in the question how to analyze this. You hinted that you might not need to be able to distinguish (a) from (b), but I find it hard to imagine a situation where this would be the case.