Question

Problems with LRT code using DESeq2

newsomew13 • 0

@newsomew13-9225

United States

Hello,

I have some questions about running a likelihood ratio test (LRT) in the DESeq2 package in R. My experiment has two factors, sex and population, with 8-12 replicates per condition. I want to identify genes that show an interaction between sex and population (specifically, genes with sexually dimorphic expression in populations A and B but not in populations C and D).
Unfortunately, I'm not sure I've set up the code properly, as my LRT results don't agree with the results from some of my more heuristic methods.
Can anyone tell me whether I'm making any glaring mistakes in my code?

countData <- read.table("Counttable.txt", header=TRUE)

# One sex and one population label per sample, in column order
stickleDesign <- data.frame(
  row.names = colnames(countData),
  sex = c("M", "F", "M", cont...),
  pop = c("A", "A", "B", cont...)
)

library("DESeq2")

dds <- DESeqDataSetFromMatrix(countData=countData,
                              colData=stickleDesign,
                              design=~ sex)
dds <- DESeq(dds)

# Multi-Factor Analysis

design(dds) <- formula(~ sex + pop + sex:pop)
dds <- DESeq(dds, test="LRT", full=~ sex + pop + sex:pop, reduced=~ sex + pop)

resMFType <- results(dds, contrast=c("sex", "M", "F"))

Additionally, how can I tell whether the model with the interaction term is actually better than one without it? Is there an objective way to tell? I was expecting a single p-value representing the fit of the model, but I don't see anything like that in the output.

Thanks for the help!
Will

Tags: deseq2, LRT, differential gene expression

Posted 8.7 years ago by newsomew13 (updated 8.5 years ago)

score 0 · Answer 1 · 2015-11-23

Michael Love 42k

@mikelove

United States

Can you post your sessionInfo()? The current version of DESeq2 is 1.10, which we encourage users to use. The current help files online correspond to this version, so it's important to update.

To test the interaction you should just use results(dds), without the contrast argument, which may be giving you some other result than the LRT p-values. For more details, see the description of interaction terms in the current vignette and the section of ?results that discusses the LRT.

If the p-value for the LRT is small, this is evidence against the null hypothesis that the interaction terms are equal to zero, i.e. that sex and pop have only additive effects and no interaction.
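A minimal sketch of that workflow, run on simulated counts from DESeq2's makeExampleDESeqDataSet() rather than the real stickleback data. The column names sex and pop, the 2 sexes × 4 populations layout, and the level labels are assumptions chosen to mirror the question; adapt them to your own colData.

```r
library("DESeq2")

# Hypothetical stand-in for the real count table:
# 500 genes, 16 samples = 2 sexes x 4 populations x 2 replicates.
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 500, m = 16)
dds$sex <- factor(rep(c("F", "M"), times = 8))
dds$pop <- factor(rep(c("A", "B", "C", "D"), each = 4))

# The full model includes the interaction; the reduced model drops it.
design(dds) <- ~ sex + pop + sex:pop
dds <- DESeq(dds, test = "LRT", reduced = ~ sex + pop)

# No 'contrast' argument: pvalue and padj are the per-gene LRT results
# for dropping the sex:pop interaction terms.
res <- results(dds)
mcols(res)$description
summary(res)
```

Genes with a small padj in res are the candidates for a sex-by-population interaction.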

Posted 8.7 years ago by Michael Love

Hi Michael, is this still true in the current version? Should I avoid the contrast argument when using the LRT instead of the Wald test? If so, how are my log2 fold changes calculated? Is the first level of the factor, in alphabetical order, compared to the others?

Reply by morien, 6.9 years ago

If you use contrast and want to test that contrast, you have to specify test="Wald".

If you use contrast without test="Wald", the only thing that changes is the LFC; the p-values are still the LRT p-values. See the section of ?results that discusses the LRT and p-values.
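A sketch of the three cases, again on hypothetical simulated data from makeExampleDESeqDataSet() in the same assumed 2 × 4 layout. Note that with this design the sex_M_vs_F coefficient, and hence contrast = c("sex", "M", "F"), is the sex effect in the reference population (A here).

```r
library("DESeq2")

# Hypothetical simulated data: 2 sexes x 4 populations x 2 replicates.
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 500, m = 16)
dds$sex <- factor(rep(c("F", "M"), times = 8))
dds$pop <- factor(rep(c("A", "B", "C", "D"), each = 4))
design(dds) <- ~ sex + pop + sex:pop
dds <- DESeq(dds, test = "LRT", reduced = ~ sex + pop)

# 1. LRT p-values; the printed LFC is the last element of resultsNames(dds).
res_lrt <- results(dds)

# 2. Same LRT p-values; only the printed LFC switches to M vs F.
res_lrt_con <- results(dds, contrast = c("sex", "M", "F"))

# 3. Wald test of the M vs F contrast itself (sex effect in population A).
res_wald <- results(dds, contrast = c("sex", "M", "F"), test = "Wald")
```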

See the vignette regarding factor levels.
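In base R, factor() sorts the levels by default, and the first level becomes the reference that the other levels are compared against. A short sketch with made-up labels; on a DESeqDataSet the same relevel() call would be applied to a colData column (for example dds$sex) before running DESeq().

```r
# Default: levels are sorted, so "F" is the reference level
sex_default <- factor(c("M", "F", "M", "F"))
levels(sex_default)

# Set the reference explicitly instead of relying on sort order
sex_relevel <- relevel(sex_default, ref = "M")
levels(sex_relevel)

# Or fix the full level order when creating the factor
pop <- factor(c("A", "B", "C", "D"), levels = c("D", "C", "B", "A"))
levels(pop)
```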

Reply by Michael Love, 6.9 years ago

Thanks, that clarifies things.

Reply by morien, 6.9 years ago

score 0 · Answer 2 · 2015-11-24

> sessionInfo()
R version 3.1.3 (2015-03-09)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 8 x64 (build 9200)

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets
[8] methods base

other attached packages:
[1] DESeq2_1.6.3              RcppArmadillo_0.5.600.2.0
[3] Rcpp_0.12.1               GenomicRanges_1.18.4
[5] GenomeInfoDb_1.2.5        IRanges_2.0.1
[7] S4Vectors_0.4.0           BiocGenerics_0.12.1

loaded via a namespace (and not attached):
[1] acepack_1.3-3.3      annotate_1.44.0      AnnotationDbi_1.28.2
[4] base64enc_0.1-3      BatchJobs_1.6        BBmisc_1.9
[7] Biobase_2.26.0       BiocParallel_1.0.3   brew_1.0-6
[10] checkmate_1.6.2      cluster_2.0.1        codetools_0.2-14
[13] colorspace_1.2-6     DBI_0.3.1            digest_0.6.8
[16] fail_1.3             foreach_1.4.3        foreign_0.8-63
[19] Formula_1.2-1        genefilter_1.48.1    geneplotter_1.44.0
[22] ggplot2_1.0.1        grid_3.1.3           gridExtra_2.0.0
[25] gtable_0.1.2         Hmisc_3.17-0         iterators_1.0.8
[28] lattice_0.20-30      latticeExtra_0.6-26 locfit_1.5-9.1
[31] magrittr_1.5         MASS_7.3-39          munsell_0.4.2
[34] nnet_7.3-9           plyr_1.8.3           proto_0.3-10
[37] RColorBrewer_1.1-2   reshape2_1.4.1       rpart_4.1-9
[40] RSQLite_1.0.0        scales_0.3.0         sendmailR_1.2-1
[43] splines_3.1.3        stringi_0.5-5        stringr_1.0.0
[46] survival_2.38-1      tools_3.1.3          XML_3.98-1.3
[49] xtable_1.7-4         XVector_0.6.0

Thanks Michael,
Will

score 0 · Answer 3 · 2016-01-13

Unfortunately I am still having difficulty extracting the results of my LRT. When I use results(dds) I get the following:

log2 fold change (MLE): sexM.popS
LRT p-value: '~ sex + pop + sex:pop' vs '~ sex + pop'
DataFrame with 22456 rows and 6 columns
                     baseMean log2FoldChange      lfcSE      stat    pvalue
                    <numeric>      <numeric> <numeric> <numeric> <numeric>
ENSGACG00000000002 0.2013952      5.3650025 10.2462723 0.3718070 0.9460029
ENSGACG00000000003 6.2212988      0.5003966 1.7919992 1.2702247 0.7362154
ENSGACG00000000004 1.8083348      2.5071250 4.5249928 0.4426256 0.9313029
ENSGACG00000000005 0.9426167     -2.9608755 5.7570828 3.1274730 0.3723858
ENSGACG00000000006 31.7050037     -0.5546616 0.4258378 1.7519404 0.6254487
...                       ...            ...        ...       ...       ...
ENSGACG00000022902 0.0000000             NA         NA        NA        NA
ENSGACG00000022903 0.0000000             NA         NA        NA        NA
ENSGACG00000022904 0.2886593      -2.760850   4.739929 3.184223 0.3640844
ENSGACG00000022905 0.0000000             NA         NA        NA        NA
ENSGACG00000022906 1.1238251       3.225585   4.525376 4.876424 0.1810729
                        padj
                   <numeric>
ENSGACG00000000002        NA
ENSGACG00000000003 0.9982040
ENSGACG00000000004 0.9988318
ENSGACG00000000005        NA
ENSGACG00000000006 0.9982040
...                      ...
ENSGACG00000022902        NA
ENSGACG00000022903        NA
ENSGACG00000022904        NA
ENSGACG00000022905        NA
ENSGACG00000022906        NA

I'm hoping to obtain a single p-value for the model, not p-values for each individual gene. I feel like I'm missing something simple, but I'm lost.
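DESeq2 fits a separate model for each gene, so the LRT produces one statistic and one p-value per row rather than a single p-value for the whole model. As a check, the pvalue column can be reproduced from the stat column as the upper tail of a chi-squared distribution whose degrees of freedom equal the number of coefficients dropped by the reduced model. Assuming two sexes and four populations, that is (2 - 1) × (4 - 1) = 3 interaction terms. The numbers below are copied from the first three rows of the output above.

```r
# stat and pvalue for the first three genes in the printed results
stat   <- c(0.3718070, 1.2702247, 0.4426256)
pvalue <- c(0.9460029, 0.7362154, 0.9313029)

# Interaction coefficients removed by the reduced model
# (assumes 2 sexes x 4 populations)
df <- (2 - 1) * (4 - 1)

# Upper-tail chi-squared probability of each LRT statistic
recomputed <- pchisq(stat, df = df, lower.tail = FALSE)
round(recomputed, 4)
```

Each gene therefore gets its own test of "interaction vs. additive model", and the padj column corrects those per-gene p-values for multiple testing.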

Additionally, while I'm interested in whether the degree of sexual dimorphism varies by population, it is just as important to determine which populations appear to be the least sexually dimorphic. So if the p-value for the LRT is significant, I suspect I will need to run some sort of post hoc analysis to determine how the groups differ. Does anyone have any suggestions for which analyses to run here?
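One possible follow-up, sketched under assumptions and not taken from this thread's answers: after the LRT, use Wald tests (test = "Wald", as described in the reply above) to estimate the M vs F difference within each population, following the interaction approach in the DESeq2 vignette. The data are hypothetical simulations from makeExampleDESeqDataSet(), and the coefficient names assume factor levels F/M and A/B/C/D with F and A as references; check resultsNames(dds) on the real object. The padj < 0.1 cutoff is only an illustrative choice.

```r
library("DESeq2")

# Hypothetical simulated data: 2 sexes x 4 populations x 2 replicates.
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 500, m = 16)
dds$sex <- factor(rep(c("F", "M"), times = 8))
dds$pop <- factor(rep(c("A", "B", "C", "D"), each = 4))
design(dds) <- ~ sex + pop + sex:pop
dds <- DESeq(dds, test = "LRT", reduced = ~ sex + pop)

resultsNames(dds)

# Sex effect (M vs F) within each population, using Wald tests.
# Reference population A: the main sex effect alone.
sexByPop <- list(A = results(dds, name = "sex_M_vs_F", test = "Wald"))
# Other populations: main sex effect plus that population's interaction term.
for (p in c("B", "C", "D")) {
  sexByPop[[p]] <- results(dds,
                           contrast = list(c("sex_M_vs_F", paste0("sexM.pop", p))),
                           test = "Wald")
}

# Does the sex difference in population B differ from that in A?
diffBA <- results(dds, name = "sexM.popB", test = "Wald")

# Number of genes with a sex difference in each population at padj < 0.1
sapply(sexByPop, function(r) sum(r$padj < 0.1, na.rm = TRUE))
```

Comparing these per-population counts and effect sizes is one way to see which populations look least sexually dimorphic, while the interaction coefficients (such as sexM.popB) test whether the dimorphism differs from the reference population.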

Thanks in advance for the help!
Will