Search
Question: DESeq2: Compare expression of one group to the mean of others
1
3.4 years ago by
johntlovell10
United States
johntlovell10 wrote:

Hi Folks, this is a conceptual question:

I have a RNA-seq dataset where I am trying to make comparisons among 3 levels of a factor - two parental genotypes and an F1. In addition to asking whether there is pairwise DE between genotypes (simple using contrast arguments in results), I would like to ask whether the F1 is intermediate between the two parents. This test is usually accomplished by comparing the mid-parent value (the estimate of the parental means) to the distribution of expression values of the F1. Is there a way to implement this test in DESeq2 - perhaps by setting a contrast between a value (mid parent mean) and the F1 count distribution?

Thanks,

John Lovell

Postdoctoral Fellow, UT Austin.

modified 3.4 years ago by Michael Love20k • written 3.4 years ago by johntlovell10

Could you show us your sample table? Especially, do you have several related trios of father, mother, offspring as individuals (as in mice) or rather of whole collections (as in flies)?

FYI... the way I am currently running the analysis follows:

1. extract varianceStabilizedTransformation data
2. calculate arithmetic mean of parents (mid parent vale)
3. conduct a one-sample t-test with mu = midParentValue, data= F1VstData

I am not sure that this is appropriate, but I couldn't think of how to implement it within DESeq. I guess another option would be to extract the estimateSizeFactors normalized data, then run glm.nb.

4
3.4 years ago by
Michael Love20k
United States
Michael Love20k wrote:

Hi John,

I think that you can perform this test with a numeric contrast. If you have a design ~ genotype, and the reference level of genotype is the F1 generation (you can set this using relevel(), see the vignette), then the resultsNames(dds) should be something like:

Intercept, genotypehalfil, genotypehal, genotypefil

You want a numeric contrast of halfil against 1/2 (hal + fil):

results(dds, contrast=c(0, 1, -1/2, -1/2))

Or equivalently, using the list style of contrast:

results(dds, contrast = list("genotypehalfil", c("genotypehal","genotypefil")),
listValues=c(1, -1/2))

This gives you the log fold change of halfil over the midparent. So small p-values for genes in which halfil is above or below the midparent value.

Dear Michael,

In the latest version of DESeq2, resultsNames(dds) do not show something like:

Intercept, genotypehalfil, genotypehal, genotypefil

but like this:

>resultsNames(dds)
[1] "Intercept"        "condition_b_vs_a" "condition_c_vs_a"

if I want to do the above comparation, I should use DESeq(dds, betaPrior = TRUE), then I find something confusing.

>resultsNames(dds)
[1] "Intercept"  "conditiona" "conditionb" "conditionc"(conditiona is the hyb condition, and the other two are P1 and P2, I've set the conditionia  to be reference)
>mid<-results(dds, contrast=c(0, 1, -1/2, -1/2))
>mid[rownames(mid)=="geneA",]
log2 fold change (MAP): 0,+1,-0.5,-0.5
Wald test p-value: 0,+1,-0.5,-0.5
DataFrame with 1 row and 6 columns
baseMean   log2FoldChange             lfcSE
<numeric>        <numeric>         <numeric>
geneA 72.3609376218373 1.15447089513261 0.228187646879045
<numeric>            <numeric>            <numeric>
geneA 5.05930496642775 4.20787419598304e-07 6.10070739443769e-05
>mid_norm<-counts(dds, normalized=TRUE)
>mid_norm[rownames(mid_norm)=="geneA",]
sample_hyb_1     sample_hyb_2     sample_hyb_3     sample_P1_1     sample_P1_2     sample_P1_3     sample_P2_1
75.965725  62.666961  69.122855   3.389493   3.060267   4.232354 143.554684
sample_P2_2     sample_P2_3
148.588003 140.668097

So for geneA, the mean of parents(P1,P2) is similar to hyb. (3.38+3.06+4.23+143.55+148.58+140.66)/6=73.91,(75.96+62.66+69.12)/3=69.25, they are similar.

How could the log2FoldChange be as large as 1.154? that means a foldchange of 2^1.154=2.23. Could you please give some explanation?

or are there any other better methods for the latest version if I want to compare genotypehalfil to the mean of genotypehal and genotypefil? Looking forward to your reply. Thank you very much！

Best,

Aifu.

1

When you average the coefficients, it's not equivalent to averaging the data and then computing a LFC.

It's closer to this:

log2(70/70) - .5*log2(3/70) - .5*log2(140/70)

which gives an LFC above 1.

Thank you very much for your explanation, Michael. But I still have some confusion. Your explanation seems reasonable for geneA above. I find another gene, geneB, seems different.

> bb<-counts(dds,normalized=T)
> b<-bb[rownames(bb)=="geneB",]
> b
R_SY_1    R_SY_2    R_SY_3    R_MH_1    R_MH_2    R_MH_3    R_ZS_1    R_ZS_2
58.13703  70.72414  46.96809  80.50047  75.48658 110.04121  28.71094  26.10330
R_ZS_3
22.81104
> n<-mean(b[1:3])
> m<-mean(b[4:6])
> z<-mean(b[7:9])
> log2(n) - .5*log2(m) - .5*log2(z)
[1] 0.2910863
> res[rownames(res)=="geneB",]
log2 fold change (MAP): 0,+1,-0.5,-0.5
Wald test p-value: 0,+1,-0.5,-0.5
DataFrame with 1 row and 6 columns
baseMean   log2FoldChange             lfcSE
<numeric>        <numeric>         <numeric>
geneB 57.7203113936858 1.06110078378387 0.240486475709396
<numeric>            <numeric>            <numeric>
geneB 4.41230959310205 1.02273726663086e-05 0.000258779640785735

The log2FoldChange is 1.06110078378387, quite different from 0.29. Could you please explain this to me? Thank you for your attention.

1

There is shrinkage applied. The output is simply the linear combination of the coefficients. You can look at the coefficients yourself with coef(dds). You will find that multiplying 1, -.5, -.5 times the  coefficients gives the LFC.

I unfortunately don’t have much spare time at the moment so I may not reply to further gene examples.

Yes, that's true! Thank you very much!

Aifu.

0
3.4 years ago by
johntlovell10
United States
johntlovell10 wrote:

Hi Simon, Thanks for your response.

I have many biological replicates (clones) derived from a single individual of each parent and the F1 for each genotype. So it is just one trio. A library was prepared for each clone, so there was no pooling of samples.

Here is the colData from the DESeq object. I am currently fitting a model where only genotype (halfil=f1, hal=parent1, fil=parent2) is the response variable, but I am open to fitting other models that could more accurately pick up the mid parent value.

Thanks, John

jgi.id generation genotype
1      F1_G948_376         F1 halfil
2      F1_G969_327         F1 halfil
3      F1_H005_732         F1 halfil
4    F1_H016.1_219         F1 halfil
5      F1_H021_969         F1 halfil
6      F1_H040_457         F1 halfil
7    F1_H042.1_688         F1 halfil
8      F1_H064_616         F1 halfil
9      F1_H107_918         F1 halfil
10     F1_H128_677         F1 halfil
11     F1_H131_966         F1 halfil
12     F1_H140_842         F1 halfil
13     F1_H148_386         F1 halfil
14     F1_H151_243         F1 halfil
15     F1_H185_520         F1 halfil
16     F1_H220_365         F1 halfil
17     F1_H229_417         F1 halfil
18     F1_H231_250         F1 halfil
19       F1_H239_8         F1 halfil
20      F1_H249_22         F1 halfil
21     F1_H253_725         F1 halfil
22     F1_H270_964         F1 halfil
23     F1_H273_254         F1 halfil
24     F1_H317_925         F1 halfil
25     F1_H343_981         F1 halfil
26   F1_H347.1_410         F1 halfil
27     F1_H350_879         F1 halfil
28     F1_H352_759         F1 halfil
29      F1_H355_38         F1 halfil
30   FIL2_G956_490         F0    fil
31     FIL2_G957_7         F0    fil
32   FIL2_G964_479         F0    fil
33   FIL2_G968_100         F0    fil
34    FIL2_G976_42         F0    fil
35   FIL2_G979_865         F0    fil
36   FIL2_H033_594         F0    fil
37   FIL2_H037_176         F0    fil
38   FIL2_H058_486         F0    fil
39   FIL2_H068_368         F0    fil
40    FIL2_H092_19         F0    fil
41   FIL2_H097_895         F0    fil
42   FIL2_H098_248         F0    fil
43   FIL2_H114_549         F0    fil
44   FIL2_H119_563         F0    fil
45     FIL2_H120_2         F0    fil
46   FIL2_H144_888         F0    fil
47   FIL2_H170_247         F0    fil
48   FIL2_H172_475         F0    fil
49   FIL2_H182_534         F0    fil
50   FIL2_H192_228         F0    fil
51   FIL2_H207_610         F0    fil
52   FIL2_H208_103         F0    fil
53   FIL2_H211_367         F0    fil
54    FIL2_H213_58         F0    fil
55   FIL2_H228_485         F0    fil
56   FIL2_H235_289         F0    fil
57   FIL2_H267_130         F0    fil
58   FIL2_H275_502         F0    fil
59    FIL2_H293_55         F0    fil
60   FIL2_H297_504         F0    fil
61    FIL2_H326_12         F0    fil
62   FIL2_H338_904         F0    fil
63   FIL2_H345_323         F0    fil
64   HAL2_G950_542         F0    hal
65 HAL2_G959.1_220         F0    hal
66   HAL2_G988_482         F0    hal
67   HAL2_G996_353         F0    hal
68   HAL2_H008_403         F0    hal
69   HAL2_H020_447         F0    hal
70    HAL2_H057_24         F0    hal
71   HAL2_H074_147         F0    hal
72    HAL2_H090_76         F0    hal
73   HAL2_H101_237         F0    hal
74   HAL2_H126_821         F0    hal
75   HAL2_H152_332         F0    hal
76   HAL2_H166_959         F0    hal
77   HAL2_H188_341         F0    hal
78   HAL2_H193_752         F0    hal
79 HAL2_H195.1_306         F0    hal
80    HAL2_H236_73         F0    hal
81   HAL2_H246_762         F0    hal
82   HAL2_H261_344         F0    hal
83   HAL2_H266_370         F0    hal
84   HAL2_H305_910         F0    hal
85   HAL2_H309_469         F0    hal
86   HAL2_H312_538         F0    hal
87   HAL2_H319_128         F0    hal
88   HAL2_H351_435         F0    hal