Question

How to set the contrast for a design with two 2-level factors and an interaction term

solgakar@bi.technion.ac.il

Hi,

I have a question about how to define contrasts when the design includes two 2-level factors and an interaction term: design = ~ genome + condition + condition:genome.

The output of resultsNames(dds) is:

"Intercept" "genome_yb1_vs_v252" "condition_mice_vs_log" "genomeyb1.conditionmice"

I need the following comparisons:

1. mice vs log in all the samples

2. mice vs log only in v252 samples

3. mice vs log only in yb1 samples

I defined the contrast for each comparison as follows:

1. contrast = c("condition","mice","log")

2. contrast = list("condition_mice_vs_log")

3. contrast = list(c("condition_mice_vs_log","genomeyb1.conditionmice"))

I get the same results for the first two comparisons. For which of the comparisons is the contrast correct, and how should I define the contrast for the other comparison?
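A minimal base-R sketch (not DESeq2 itself) of how the coefficients combine in this parameterization. It assumes v252 and log are the reference levels, as the resultsNames suggest, and uses made-up log2-scale coefficient values purely for illustration. Under these assumptions, condition_mice_vs_log on its own is the mice vs log effect within v252, and adding genomeyb1.conditionmice gives the effect within yb1. This is consistent with comparisons 1 and 2 returning identical results.

```r
# Hypothetical 2x2 layout: two samples per genome/condition cell.
# Reference (first) levels are assumed to be v252 and log.
coldata <- data.frame(
  genome    = factor(rep(c("v252", "yb1"), each = 4), levels = c("v252", "yb1")),
  condition = factor(rep(c("log", "mice"), times = 4), levels = c("log", "mice"))
)

# Same parameterization as ~ genome + condition + condition:genome
mm <- model.matrix(~ genome + condition + genome:condition, data = coldata)

# Made-up log2-scale coefficients, in the column order of mm:
# intercept, genomeyb1, conditionmice, genomeyb1:conditionmice
beta <- c(5, 1, 2, -1)

# Expected log2 value of every sample, then averaged within each cell
mu    <- drop(mm %*% beta)
cells <- tapply(mu, list(coldata$genome, coldata$condition), mean)

# Comparison 2: mice vs log within v252 -> main condition coefficient alone
c_v252 <- c(0, 0, 1, 0)
# Comparison 3: mice vs log within yb1 -> main condition coefficient + interaction
c_yb1  <- c(0, 0, 1, 1)

lfc_v252 <- sum(c_v252 * beta)
lfc_yb1  <- sum(c_yb1 * beta)

# Each contrast reproduces the corresponding difference of cell means
stopifnot(
  isTRUE(all.equal(lfc_v252, cells["v252", "mice"] - cells["v252", "log"])),
  isTRUE(all.equal(lfc_yb1,  cells["yb1",  "mice"] - cells["yb1",  "log"]))
)
cat("v252:", lfc_v252, " yb1:", lfc_yb1, "\n")
```

In DESeq2 terms, the two vectors correspond to contrast = list("condition_mice_vs_log") and contrast = list(c("condition_mice_vs_log", "genomeyb1.conditionmice")), taken in the order of resultsNames(dds).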

Thank you,

Karen

score 0 · Answer 1 · 2014-12-10

Michael Love

Hi,

See the similar post "deseq2: coding 2x2 design". Your desired contrast #1 (mice vs log in all the samples) might be described as the average effect across the two genome groups, and you can use a numeric contrast as described in that post.
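One way to read "the average effect across the two groups" is as the mean of the two within-genome mice vs log differences. In the coefficient order of resultsNames(dds), this works out to the numeric vector c(0, 0, 1, 0.5). The base-R check below assumes v252 and log are the reference levels and uses made-up coefficient values. It is an illustration of the arithmetic, not output from DESeq2.

```r
# Made-up log2-scale coefficients, named to match resultsNames(dds) in the question
beta <- c(Intercept = 5, genome_yb1_vs_v252 = 1,
          condition_mice_vs_log = 2, genomeyb1.conditionmice = -1)

# Cell means implied by the ~ genome + condition + condition:genome parameterization
v252_log  <- beta[["Intercept"]]
v252_mice <- v252_log + beta[["condition_mice_vs_log"]]
yb1_log   <- v252_log + beta[["genome_yb1_vs_v252"]]
yb1_mice  <- yb1_log + beta[["condition_mice_vs_log"]] + beta[["genomeyb1.conditionmice"]]

# Average of the two within-genome mice vs log differences
avg_effect <- mean(c(v252_mice - v252_log, yb1_mice - yb1_log))

# Numeric contrast: 1 x main condition effect + 0.5 x interaction
avg_contrast <- c(0, 0, 1, 0.5)
stopifnot(isTRUE(all.equal(sum(avg_contrast * beta), avg_effect)))

# With a fitted DESeqDataSet named dds, the vector would be passed as
# results(dds, contrast = c(0, 0, 1, 0.5))   (not run here)
```

The 0.5 weight on the interaction comes from averaging: the v252 difference contributes only the main effect, the yb1 difference contributes the main effect plus the interaction, so their mean carries half of the interaction.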

score 0 · Answer 2 · 2014-12-22

solgakar@bi.technion.ac.il

Hi Michael,

Thanks for the help and the fast reply.

I ran all the comparisons and have another question:

When comparing mice vs log only in the yb1 samples, I get genes with no expression in any of these samples, yet they still receive fold-change and adjusted p-values. I set contrast = list(c("condition_mice_vs_log","genomeyb1.conditionmice")).

These genes are expressed in other samples in the dataset that are not part of this comparison. Can you explain why that affects the results? It is confusing to get such results.

For example, for gene VV2026 all 4 samples in this differential expression test had no expression (normalized counts of 0), yet baseMean = 84.13789 and the statistics were: log2FoldChange -0.47838; pvalue 0.619603; padj 0.714385.

Thanks a lot,

Karen

Hi Karen,

The nonzero LFC here arises because models with an interaction term include shrinkage on the interaction term but not on the main effects. The inference borrows strength from the other group and from the other genes: the interaction effects were found to be small over all genes, and the condition effect was found to be large for this gene in the other group (v252), so the model is essentially predicting that if the counts for yb1 rose above zero, a negative LFC would be likely.

To avoid such situations, you can either run a model with a single factor, "~ group", where group encodes the combination of condition and genome (for example "mice_yb1"), or you can set betaPrior=FALSE to turn off the shrinkage of the interaction terms. In either case, the LFCs for your contrast will be closer to zero.

Reply by Michael Love
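Here is a minimal sketch of the "~ group" alternative suggested in the reply. It assumes the sample table has columns named genome and condition, with the level names from the question. The two factors are pasted into one hypothetical group variable, so each genome/condition combination gets its own level, and mice vs log within yb1 becomes a direct comparison of two groups rather than the sum of a main effect and an interaction coefficient. The base-R part runs on its own. The DESeq2 calls are only comments and assume an existing DESeqDataSet named dds.

```r
# Hypothetical sample table with the two original factors
coldata <- data.frame(
  genome    = factor(rep(c("v252", "yb1"), each = 4)),
  condition = factor(rep(c("log", "mice"), times = 4))
)

# Combine the two factors into one grouping factor with levels such as "mice_yb1"
coldata$group <- factor(paste(coldata$condition, coldata$genome, sep = "_"))
print(levels(coldata$group))

# Single-factor design: an intercept for the reference group plus one column per other group
mm <- model.matrix(~ group, data = coldata)
print(colnames(mm))

# Assumed DESeq2 usage (not run here):
# dds$group <- factor(paste(dds$condition, dds$genome, sep = "_"))
# design(dds) <- ~ group
# dds <- DESeq(dds)
# results(dds, contrast = c("group", "mice_yb1", "log_yb1"))   # mice vs log within yb1
#
# Alternatively, keep the interaction design and turn off the prior:
# dds <- DESeq(dds, betaPrior = FALSE)
```

With the group factor, each of the three comparisons in the question can be written in terms of the four group levels directly, which some users find easier to reason about than interaction coefficients.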