Question

DESEQ2--compare two log2-transformed fold changes

2

Entering edit mode

an.anand233 ▴ 30

@ananand233-9844

Last seen 7.8 years ago

Hi Folks,

I have some RNA-seq datasets from four genotypes (A, B, C, D) and two conditions (time1 and time2). Each genotype in each condition has three replicates. I would like to compare whether log2(A/B) equals log2(C/D) or not. Does someone have suggestions on how to implement this test in DESeq2? And how to set a contrast.

Thanks a lot.

Zhu

deseq2 compare two log2-transformated fold changes • 5.0k views

ADD COMMENT • link 8.2 years ago an.anand233 ▴ 30

score 1 · Answer 1 · 2016-03-03

1

Entering edit mode

an.anand233 ▴ 30

@ananand233-9844

Last seen 7.8 years ago

Hi Michael,

Thanks for a reply. I think it's possible to split the conditions (different time point) into two separate datasets and let one dataset has only one factor (genotype) ; I read the DESeq2 manual, but it seems most examples are dealing with lineal model, like comparing (A+B) to (C+D). But I wonder if it is possible to compare log2(A/B) to log2(C/D) and get adjusted p values (test equality of the ratios at each timepoint)?

Best,

Zhu

ADD COMMENT • link 8.2 years ago an.anand233 ▴ 30

2

Entering edit mode

hi Zhu,

You don't need to split into separate datasets.

So, the easiest way to test, e.g. B/A = D/C is using interaction terms. First you need to code some new variables. Here I create a little toy dataset which doesn't have the 3 replicates per genotype x condition, but you'll see how to do that same to your colData. Create two variables which each divide A,B,C,D into two groups. Maybe you can give some more meaningful names here than geno2 and geno3 (for example, some description of whatever A and C have in common, and the same for whatever A and B have in common).

> d <- expand.grid(geno=letters[1:4], time=c("1","2"))
> d
  geno time
1    a    1
2    b    1
3    c    1
4    d    1
5    a    2
6    b    2
7    c    2
8    d    2
> d$geno2 <- ifelse(d$geno %in% c("a","c"), "1", "2")
> d
  geno time geno2
1    a    1     1
2    b    1     2
3    c    1     1
4    d    1     2
5    a    2     1
6    b    2     2
7    c    2     1
8    d    2     2
> d$geno3 <- ifelse(d$geno %in% c("a","b"), "1", "2")
> d
  geno time geno2 geno3
1    a    1     1     1
2    b    1     2     1
3    c    1     1     2
4    d    1     2     2
5    a    2     1     1
6    b    2     2     1
7    c    2     1     2
8    d    2     2     2

Now if you didn't have two time points, you would just use a design ~geno2 + geno3 + geno2:geno3 and run DESeq() and then test the interaction term using something like results(dds, name="geno22.geno32") ... you'll have to look at resultsNames(dds) to see what the name of the interaction term is for your dataset. The interaction term will have a low p-value if there is a small probability of seeing such an large difference in observed fold changes under the null hypothesis: B/A = D/C.

However, you can simply add time to the design in the following way to create more model terms to account for the two time points, including two interaction terms:

~ time + time:geno2 + time:geno3 + time:geno2:geno3

Now you can test name="time1.geno22.geno32" for B/A = D/C at time 1, and likewise for time 2. See the names in resultsNames(dds).

Note: I'm assuming that all of these terms are factors. You can make sure with: class(dds$time), etc.

ADD REPLY • link 8.2 years ago Michael Love 41k

0

Entering edit mode

Hi Michael,

How do you interpret the log2foldchanges in the results table for testing the interaction term? It seem like the larger the difference between fold changes, the larger the log2foldchange is in the results table. I'm not sure about the sign of the log2foldchange.

ADD REPLY • link 6.7 years ago rrcutler ▴ 70

1

Entering edit mode

Take a look at the diagram of interactions in the vignette. The interaction is the difference in the effect size between groups, where it compares to the reference level (e.g. Control group or WT or whatever you specified). So + means "bigger effect than in reference group", and - means "smaller effect", and 0 means the effect is the same size as in the reference group.

Note that by bigger and smaller I refer to the signed value, not the absolute value. So if the main effect is negative, and the interaction term is negative, this means the effect is larger in absolute value (a bigger negative number) relative to the reference group.

ADD REPLY • link 6.7 years ago Michael Love 41k

score 0 · Answer 2 · 2016-03-03

0

Entering edit mode

Michael Love 41k

@mikelove

Last seen 1 day ago

United States

Can you say how your desired tests involve the time points? Do you want to test equality of these ratios at each time point?

ADD COMMENT • link 8.2 years ago Michael Love 41k

score 0 · Answer 3 · 2016-03-03

0

Entering edit mode

an.anand233 ▴ 30

@ananand233-9844

Last seen 7.8 years ago

Hi Mike, thank you very much for your detail suggestion. I have tested running this design with my data. Just have a couple of more questions to be classified: 1) the interaction term geno2:geno3 is to test if the effect of geno2 is equal or different in geno3 1 vs geno3 2? if equal, "geno22.geno32" is non-significant and no geno2:geno3 effect in defined model? 2) the results (adj p values) of "geno22.geno32" is same to "geno21.geno31 (fold changes is minus), so not shown in resultsNames()? 3) for replicates (e.g. b1, b2, b3), I can simply use same factor numbers in geno2 (as "2") or geno3 (as "1"), or I need a new variable?

ADD COMMENT • link 8.2 years ago an.anand233 ▴ 30

0

Entering edit mode

1) yes the interaction tests if the ratio is equal. note that testing

B/A = D/C

is identical to testing

B/D = A/C

Non-significance does not imply equality, for example, you could have an underpowered experiment. This is a general rule in frequentist statistics: p-value > alpha does not necessarily mean you have evidence that the null is true.

2) There is a single interaction term for the test of the ratios being equal. You can control which levels are in the denominator of the fold change by setting the reference level of the factors using relevel() (see vignette).

3) My code above for a toy dataset does not have replicates, but these would just be repeated rows of exactly what I have there. You would not define new variables.

ADD REPLY • link 8.2 years ago Michael Love 41k

0

Entering edit mode

Thanks very much for the classification. I am bad in this statistical thoughts. So p-value<alpha we can reject the null hypothesis of the quality, but p-value>alpha we can NOT accept the hypothesis. Does this mean we need another test and set the null hypothesis to be B/A ≠ D/C and see if this to be rejected or not? and how to design this in DESeq2?

ADD REPLY • link 8.2 years ago an.anand233 ▴ 30

0

Entering edit mode

Yes in fact you can use another test for small fold changes, by setting a different null hypothesis.

For background, you should definitely read the DESeq2 paper, the section on hypothesis tests with thresholds and in particular "Specifying maximum effect size".

Then you can use lfcThresold and altHypothesis="lessAbs". (See the vignette section on threshold tests or ?results for details on how to do this using DESeq2.)

ADD REPLY • link 8.2 years ago Michael Love 41k

0

Entering edit mode

Great, I will try disabling the LFC shrinkage, using the same design but altHypothesis. I think lfcThresold is same with the default (0) or weak (e.g. 0.05) for testing the equality. Many thanks.

ADD REPLY • link 8.2 years ago an.anand233 ▴ 30