Search
Question: DESEQ2--compare two log2-transformed fold changes
0
gravatar for an.anand233
20 months ago by
an.anand23310
an.anand23310 wrote:

Hi Folks,

I have some RNA-seq datasets from four genotypes (A, B, C, D) and two conditions (time1 and time2). Each genotype in each condition has three replicates. I  would like to compare whether log2(A/B) equals log2(C/D) or not. Does someone have suggestions on how to implement this test in DESeq2? And how to set a contrast.

Thanks a lot.

Zhu

ADD COMMENTlink modified 20 months ago • written 20 months ago by an.anand23310
1
gravatar for an.anand233
20 months ago by
an.anand23310
an.anand23310 wrote:

Hi Michael,

Thanks for a reply.  I think it's possible to split the conditions (different time point) into two separate datasets and let one dataset has only one factor (genotype) ; I read the DESeq2 manual, but it seems most examples are dealing with lineal model, like comparing (A+B) to (C+D). But I wonder if it is possible to compare log2(A/B) to log2(C/D) and get adjusted p values (test equality of the ratios at each timepoint)?

Best,

Zhu

ADD COMMENTlink modified 20 months ago • written 20 months ago by an.anand23310
1

hi Zhu,

You don't need to split into separate datasets.

So, the easiest way to test, e.g. B/A = D/C is using interaction terms. First you need to code some new variables. Here I create a little toy dataset which doesn't have the 3 replicates per genotype x condition, but you'll see how to do that same to your colData. Create two variables which each divide A,B,C,D into two groups. Maybe you can give some more meaningful names here than geno2 and geno3 (for example, some description of whatever A and C have in common, and the same for whatever A and B have in common).

> d <- expand.grid(geno=letters[1:4], time=c("1","2"))
> d
  geno time
1    a    1
2    b    1
3    c    1
4    d    1
5    a    2
6    b    2
7    c    2
8    d    2
> d$geno2 <- ifelse(d$geno %in% c("a","c"), "1", "2")
> d
  geno time geno2
1    a    1     1
2    b    1     2
3    c    1     1
4    d    1     2
5    a    2     1
6    b    2     2
7    c    2     1
8    d    2     2
> d$geno3 <- ifelse(d$geno %in% c("a","b"), "1", "2")
> d
  geno time geno2 geno3
1    a    1     1     1
2    b    1     2     1
3    c    1     1     2
4    d    1     2     2
5    a    2     1     1
6    b    2     2     1
7    c    2     1     2
8    d    2     2     2

Now if you didn't have two time points, you would just use a design ~geno2 + geno3 + geno2:geno3 and run DESeq() and then test the interaction term using something like results(dds, name="geno22.geno32") ... you'll have to look at resultsNames(dds) to see what the name of the interaction term is for your dataset. The interaction term will have a low p-value if there is a small probability of seeing such an large difference in observed fold changes under the null hypothesis: B/A = D/C.

However, you can simply add time to the design in the following way to create more model terms to account for the two time points, including two interaction terms:

~ time + time:geno2 + time:geno3 + time:geno2:geno3

Now you can test name="time1.geno22.geno32" for B/A = D/C at time 1, and likewise for time 2. See the names in resultsNames(dds).

Note: I'm assuming that all of these terms are factors. You can make sure with: class(dds$time), etc.

ADD REPLYlink written 20 months ago by Michael Love14k

Hi Michael,

How do you interpret the log2foldchanges in the results table for testing the interaction term? It seem like the larger the difference between fold changes, the larger the log2foldchange is in the results table. I'm not sure about the sign of the log2foldchange.

ADD REPLYlink written 3 months ago by rrcutler30
1

Take a look at the diagram of interactions in the vignette. The interaction is the difference in the effect size between groups, where it compares to the reference level (e.g. Control group or WT or whatever you specified). So + means "bigger effect than in reference group", and - means "smaller effect", and 0 means the effect is the same size as in the reference group.

Note that by bigger and smaller I refer to the signed value, not the absolute value. So if the main effect is negative, and the interaction term is negative, this means the effect is larger in absolute value (a bigger negative number) relative to the reference group.

ADD REPLYlink modified 3 months ago • written 3 months ago by Michael Love14k
0
gravatar for Michael Love
20 months ago by
Michael Love14k
United States
Michael Love14k wrote:

Can you say how your desired tests involve the time points? Do you want to test equality of these ratios at each time point?

ADD COMMENTlink written 20 months ago by Michael Love14k
0
gravatar for an.anand233
20 months ago by
an.anand23310
an.anand23310 wrote:

Hi Mike, thank you very much for your detail suggestion. I have tested running this design with my data. Just have a couple of more questions to be classified: 1) the interaction term  geno2:geno3 is to test if the effect of geno2 is equal or different in  geno3 1 vs geno3 2?  if equal, "geno22.geno32" is non-significant and no geno2:geno3 effect in defined model?  2) the results (adj p values) of "geno22.geno32" is same to "geno21.geno31 (fold changes is minus), so  not shown in resultsNames()?  3) for replicates (e.g. b1, b2, b3), I can simply use same factor numbers in geno2 (as "2") or geno3 (as "1"), or I need a new variable?

ADD COMMENTlink written 20 months ago by an.anand23310

1) yes the interaction tests if the ratio is equal. note that testing

B/A = D/C 

is identical to testing

B/D = A/C

Non-significance does not imply equality, for example, you could have an underpowered experiment. This is a general rule in frequentist statistics: p-value > alpha does not necessarily mean you have evidence that the null is true.

2) There is a single interaction term for the test of the ratios being equal. You can control which levels are in the denominator of the fold change by setting the reference level of the factors using relevel() (see vignette).

3) My code above for a toy dataset does not have replicates, but these would just be repeated rows of exactly what I have there. You would not define new variables.

ADD REPLYlink written 20 months ago by Michael Love14k

Thanks very much for the classification. I am bad in this statistical thoughts. So p-value<alpha we can reject the null hypothesis of the quality, but p-value>alpha we can NOT accept the hypothesis. Does this mean we need another test and set the null hypothesis to be B/A ≠ D/C and see if this to be rejected or not? and how to design this in DESeq2?

ADD REPLYlink written 20 months ago by an.anand23310

Yes in fact you can use another test for small fold changes, by setting a different null hypothesis.

For background, you should definitely read the DESeq2 paper, the section on hypothesis tests with thresholds and in particular "Specifying maximum effect size".

Then you can use lfcThresold and altHypothesis="lessAbs". (See the vignette section on threshold tests or ?results for details on how to do this using DESeq2.)

ADD REPLYlink written 20 months ago by Michael Love14k

Great, I will try disabling the LFC shrinkage, using the same design but altHypothesis. I think lfcThresold is same with the default (0) or weak (e.g. 0.05) for testing the equality. Many thanks.

ADD REPLYlink modified 20 months ago • written 20 months ago by an.anand23310
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 171 users visited in the last hour