Problems with LRT code using DESeq2
3
0
Entering edit mode
newsomew13 • 0
@newsomew13-9225
Last seen 7.9 years ago
United States

Hello,

I have some questions about running a likelihood ratio test in the DESeq2 package in R. My experiment has two factors: sex and population, with 8-12 replicates per condition. I wish to identify genes that show an interaction between sex and population (specifically, genes with a sexually dimorphic expression in populations A and B, but not populations C and D).
Unfortunately, I'm not sure that I've set up the code properly, as my LRT results don't seem to corroborate with the results from some of my more heuristic methods.
Can anyone tell me if I'm making any glaring mistakes in my code?

 

countData <- read.table ("Counttable.txt", header=TRUE)

stickleDesign = data.frame (
row.names = colnames(countData),
sex = c("M", "F", "M", cont... )
pop = c("A", "A", "B", cont... )

library("DESeq2")

dds <- DESeqDataSetFromMatrix (countData=countData,
colData=stickleDesign,
design = ~sex)
dds <- DESeq(dds)

# Multi-Factor Analysis

design(dds) <- formula(~ sex + pop + sex:pop)
dds <- DESeq(dds, test=c("LRT"), full=~ sex + pop + sex:pop, reduced=~sex + pop)

resMFType <- results(dds, contrast=c("sex","M","F"))

 

Additionally, how do I tell that the model with the interaction term is necessarily better than one without? Is there an objective way to tell? I guess I was expecting a single p-value to represent the effectiveness of the model, but I don't see that I'm provided with anything like that.

Thanks for the help!
Will

deseq2 LRT differential gene expression • 2.8k views
ADD COMMENT
0
Entering edit mode
@mikelove
Last seen 4 hours ago
United States
Can you post your sessionInfo? The current version of DESeq2 is 1.10, which we encourage users to use. The current help files online correspond to this version so it's important to update. To test the interaction you should just use: results(dds) And not use the contrast argument, which may be giving you some other result, not the LRT pvalues. For more details, see the description of interaction terms in the current vignette, and the section of ?results which discusses LRT. If the pvalue for the LRT is small, this is evidence against the null hypothesis that the interaction term is equal to zero, i.e. that sex and pop have additive effects only and no interaction.
ADD COMMENT
0
Entering edit mode

Hi Michael, Is this still true in the current version? Should I refrain from using the "contrast" argument when using LRT instead of the wald test? If so, how are my log2 fold changes calculated? First category in the factor compared to the others in alphabetical order?

ADD REPLY
0
Entering edit mode

If you use 'contrast' and want to test the contrast, you have to specify test="Wald"

If you use contrast, the only change will be the LFC. See the section of ?results that discusses LRT and p-values.

See the vignette regarding factor levels.

ADD REPLY
0
Entering edit mode

Thanks, that clarifies things.

ADD REPLY
0
Entering edit mode
newsomew13 • 0
@newsomew13-9225
Last seen 7.9 years ago
United States

> sessionInfo()
R version 3.1.3 (2015-03-09)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 8 x64 (build 9200)

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets
[8] methods   base     

other attached packages:
[1] DESeq2_1.6.3              RcppArmadillo_0.5.600.2.0
[3] Rcpp_0.12.1               GenomicRanges_1.18.4     
[5] GenomeInfoDb_1.2.5        IRanges_2.0.1            
[7] S4Vectors_0.4.0           BiocGenerics_0.12.1      

loaded via a namespace (and not attached):
 [1] acepack_1.3-3.3      annotate_1.44.0      AnnotationDbi_1.28.2
 [4] base64enc_0.1-3      BatchJobs_1.6        BBmisc_1.9          
 [7] Biobase_2.26.0       BiocParallel_1.0.3   brew_1.0-6          
[10] checkmate_1.6.2      cluster_2.0.1        codetools_0.2-14    
[13] colorspace_1.2-6     DBI_0.3.1            digest_0.6.8        
[16] fail_1.3             foreach_1.4.3        foreign_0.8-63      
[19] Formula_1.2-1        genefilter_1.48.1    geneplotter_1.44.0  
[22] ggplot2_1.0.1        grid_3.1.3           gridExtra_2.0.0     
[25] gtable_0.1.2         Hmisc_3.17-0         iterators_1.0.8     
[28] lattice_0.20-30      latticeExtra_0.6-26  locfit_1.5-9.1      
[31] magrittr_1.5         MASS_7.3-39          munsell_0.4.2       
[34] nnet_7.3-9           plyr_1.8.3           proto_0.3-10        
[37] RColorBrewer_1.1-2   reshape2_1.4.1       rpart_4.1-9         
[40] RSQLite_1.0.0        scales_0.3.0         sendmailR_1.2-1     
[43] splines_3.1.3        stringi_0.5-5        stringr_1.0.0       
[46] survival_2.38-1      tools_3.1.3          XML_3.98-1.3        
[49] xtable_1.7-4         XVector_0.6.0

Thanks Michael,
Will

ADD COMMENT
0
Entering edit mode

You should update your install of R to the latest version (3.2.2), then reinstall your bioconductor packages so that you are using Bioconductor version 3.2 which has DESeq2 v1.10.1. You are using DESeq2 v1.6.3 which is a few cycles out of date.

 

ADD REPLY
0
Entering edit mode
newsomew13 • 0
@newsomew13-9225
Last seen 7.9 years ago
United States

Unfortunately I am still having difficulty extracting the results of my LRT. When I use results(dds) I get the following:

log2 fold change (MLE): sexM.popS
LRT p-value: '~ sex + pop + sex:pop' vs '~ sex + pop'
DataFrame with 22456 rows and 6 columns
                     baseMean log2FoldChange      lfcSE      stat    pvalue
                    <numeric>      <numeric>  <numeric> <numeric> <numeric>
ENSGACG00000000002  0.2013952      5.3650025 10.2462723 0.3718070 0.9460029
ENSGACG00000000003  6.2212988      0.5003966  1.7919992 1.2702247 0.7362154
ENSGACG00000000004  1.8083348      2.5071250  4.5249928 0.4426256 0.9313029
ENSGACG00000000005  0.9426167     -2.9608755  5.7570828 3.1274730 0.3723858
ENSGACG00000000006 31.7050037     -0.5546616  0.4258378 1.7519404 0.6254487
...                       ...            ...        ...       ...       ...
ENSGACG00000022902  0.0000000             NA         NA        NA        NA
ENSGACG00000022903  0.0000000             NA         NA        NA        NA
ENSGACG00000022904  0.2886593      -2.760850   4.739929  3.184223 0.3640844
ENSGACG00000022905  0.0000000             NA         NA        NA        NA
ENSGACG00000022906  1.1238251       3.225585   4.525376  4.876424 0.1810729
                        padj
                   <numeric>
ENSGACG00000000002        NA
ENSGACG00000000003 0.9982040
ENSGACG00000000004 0.9988318
ENSGACG00000000005        NA
ENSGACG00000000006 0.9982040
...                      ...
ENSGACG00000022902        NA
ENSGACG00000022903        NA
ENSGACG00000022904        NA
ENSGACG00000022905        NA
ENSGACG00000022906        NA

I'm hoping to obtain a single p-value for the model, not p-values for each individual gene. I feel like I'm missing something simple, but I'm lost.

Additionally, while I'm interested in whether the degree of sexual dimorphism varies by population, it is just as important to determine which populations appear to be the least sexually dimorphic. So if the p-value for the LRT is significant, I suspect I will need to run some sort of post hoc analysis to determine how the groups differ. Does anyone have any suggestions for which analyses to run here?

Thanks in advance for the help!
Will

 

ADD COMMENT
0
Entering edit mode

I don't see why you would want to do that.

You now have a test, for each gene, whether there is evidence of differences in the sex effect across population. For a well powered experiment, some set of genes will reject the null hypothesis, and after adjustment, you have a set of genes with an expected FDR less than some threshold, which is one kind of endpoint in a differential expression analysis (there are then downstream explorations, gene set testing, etc).

To combine tests across all genes, you could use one of the many methods to combine p-values, but then you are testing the null hypothesis that no genes have any significant interaction, which will likely be trivial to reject, and then what? That combined test provides much less information and in my opinion is not very meaningful to a reader.

To determine which groups differ by sex compared to the reference population, you can look at the interaction terms by their size. Large interactions indicate bigger differences in the sex effect in that population relative to the reference population. You can pull out the matrix of all the coefficients with:

coef(dds)

And then examine the columns which are interactions between sex and population. This could be plotted for a subset of genes with a heatmap for example (see vignette).

ADD REPLY

Login before adding your answer.

Traffic: 790 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6