Design and Different contrast results with betaPrior
1
0
Entering edit mode
bkrajacich • 0
@bkrajacich-13259
Last seen 6.9 years ago

Hi Michael-

I've been trying to get some microbiome analysis through DESeq2, and while I've had some success, I've also been running into some snags and want to make sure I'm on the right track.

The study design of my samples is overly complicated (they were done before I was here), but are as follows:

I have field (2 locations) and laboratory (1 location) samples with some pooled (3 samples in one run) and individual samples for each group. 

Additionally for my field samples I have two different timepoints from different seasons (dry and wet).

Total this is 53 samples (see tables below).

Field Samples: Wet Season Dry Season
Location 1 (Sahel) 8 Pooled, 6 Individuals 5 Pooled, 6 Individuals
Location 2 (Perm) 8 Pooled, 6 Individuals 3 Pooled, 5 Individuals

 

Lab: Samples:
Location 1 (Lab) 4 Pooled, 2 Individual

 

The principle comparisons I want are:

  • Field Location 1: Wet Season vs. Dry Season
  • Field Location 2: Wet Season vs. Dry Season
  • Field (overall) vs. Lab 

As far as the pooled vs. individual comparison I don't believe they make much difference as from PCAs, diversity metrics, and qualitative composition I don't see differences, but maybe I can control for this in my design.

My initial run through was with me dropping laboratory samples from the analysis first, and then based on your discussion here: A: DESEq2 comparison with mulitple cell types under 2 conditions, I made a joint variable "SeasonLoc" so I had my two locations merged with their season for 4 total (full code).:

#pull in DESeq Matrix/Info
setwd("C:/Acronis/microbiomedata/DEseqMatrix/")
cts <- as.data.frame(read.csv("deseqmatrix.csv",row.names=1, check.names=FALSE, stringsAsFactors = FALSE))
coldata <- read.csv("coldata.csv", row.names=1)

#make DESeqDataSet, perform abundance analysis
dds <- DESeqDataSetFromMatrix(cts, coldata, ~SeasonLoc)
dds <- estimateSizeFactors(dds, type="poscounts")
dds <- DESeq(dds, fitType="parametric",betaPrior = TRUE)
resultsNames(dds)

[1] "Intercept"         "SeasonLocPermDry"  "SeasonLocPermWet"  "SeasonLocSahelDry" "SeasonLocSahelWet"

And then I do my contrasts (I think now with the right format after the changes to results/lfcShrink in 1.16.1):

resT <- results(dds, contrast=c("SeasonLoc", "PermDry", "PermWet"))
resT <- lfcShrink(dds, contrast=c("SeasonLoc", "PermDry", "PermWet"), res=resT)
df <- as.data.frame(resT[order(resT$padj, na.last=NA), ])

My problem then is that I only find one genera to be differentially expressed at p-adj < 0.05, and it something seems off with the log2fold changes. I.E. one taxa that has 8.44x higher abundance in the Dry Season based on raw count data (5.63 higher when accounting for sizeFactors) says it is only 1.50 fold higher in the results table and is found non-significant. Maybe I'm not understanding how the baseMeans are calculated? It's just counts / size factor for each gene, correct? The basemean does or doesn't account for the samples defined by the contrast?

Furthermore, If I try to use the betaPrior=FALSE flag it changesthe resultsNames, giving:

> resultsNames(dds)
[1] "Intercept"     "SeasonLoc_PermWet_vs_PermDry"  "SeasonLoc_SahelDry_vs_PermDry" "SeasonLoc_SahelWet_vs_PermDry"

So then I can't compare "SahelDry" to "SahelWet". (i.e. Dry vs. Wet Season in Location 1).

 

Because of all this, I feel like I'm off with something (maybe related to the design formula?), and I'm not sure how to make the betaPrior=FALSE implementation work with my contrasts.

Here is a link to my matrix and coldata (I'm doing my analysis from phyloseq, but this was easier to share):

https://www.dropbox.com/s/jz2r4dvcrh9nhba/DEseqMatrix.zip?dl=0

Here also is my sessionInfo() if it helps:

> sessionInfo()
R version 3.4.0 Patched (2017-06-09 r72776)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252    

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] ggplot2_2.2.1              dplyr_0.7.0                DESeq2_1.16.1              phyloseq_1.20.0           
 [5] SummarizedExperiment_1.6.3 DelayedArray_0.2.7         matrixStats_0.52.2         Biobase_2.36.2            
 [9] GenomicRanges_1.28.3       GenomeInfoDb_1.12.2        IRanges_2.10.2             S4Vectors_0.14.3          
[13] BiocGenerics_0.22.0       

loaded via a namespace (and not attached):
 [1] jsonlite_1.5            splines_3.4.0           foreach_1.4.3           assertthat_0.2.0        Formula_1.2-1          
 [6] latticeExtra_0.6-28     GenomeInfoDbData_0.99.0 RSQLite_1.1-2           backports_1.1.0         lattice_0.20-35        
[11] glue_1.0.0              digest_0.6.12           RColorBrewer_1.1-2      XVector_0.16.0          checkmate_1.8.2        
[16] colorspace_1.3-2        htmltools_0.3.6         Matrix_1.2-10           plyr_1.8.4              XML_3.98-1.7           
[21] genefilter_1.58.1       zlibbioc_1.22.0         xtable_1.8-2            scales_0.4.1            BiocParallel_1.10.1    
[26] htmlTable_1.9           tibble_1.3.3            annotate_1.54.0         mgcv_1.8-17             nnet_7.3-12            
[31] lazyeval_0.2.0          survival_2.41-3         magrittr_1.5            memoise_1.1.0           nlme_3.1-131           
[36] MASS_7.3-47             foreign_0.8-68          vegan_2.4-3             tools_3.4.0             data.table_1.10.4      
[41] stringr_1.2.0           locfit_1.5-9.1          munsell_0.4.3           cluster_2.0.6           AnnotationDbi_1.38.1   
[46] Biostrings_2.44.1       ade4_1.7-6              compiler_3.4.0          rlang_0.1.1             rhdf5_2.20.0           
[51] grid_3.4.0              RCurl_1.95-4.8          iterators_1.0.8         biomformat_1.4.0        htmlwidgets_0.8        
[56] igraph_1.0.1            labeling_0.3            bitops_1.0-6            base64enc_0.1-3         gtable_0.2.0           
[61] codetools_0.2-15        multtest_2.32.0         DBI_0.6-1               R6_2.2.1                reshape2_1.4.2         
[66] gridExtra_2.2.1         knitr_1.16              Hmisc_4.0-3             permute_0.9-4           ape_4.1                
[71] stringi_1.1.5           Rcpp_0.12.11            geneplotter_1.54.0      rpart_4.1-11            acepack_1.4.1

 

 

Any input would be greatly appreciated (sorry for the length).

Thanks,

Ben  

deseq2 • 762 views
ADD COMMENT
0
Entering edit mode

Hi Michael, thanks for checking in.

I think something is weird though with the betaprior T/F call. i.e. if I do a contrast of SahelDry to SahelWet with betaPrior=FALSE or betaPrior=TRUE

I get:

  baseMean log2FoldChange stat pvalue padj
Tatumella w/ betaPrior=FALSE 2.62663926 0.3387139 -7.43407692 1.053007e-13 4.043546e-11
Tatumella w/ betaPrior=TRUE 2.62663926 0.3491179 -0.58953541 0.5555022 0.9709815

This genera is a really low abundance one in these samples (i.e 19 reads in two samples of SahelWet to 30 reads in 1 sample of SahelDry)---I'm not sure why it comes up so differently.

I've also played with the fitTypes, but it doesn't seem to change things greatly.

 

Any thoughts?

-Ben

ADD REPLY
0
Entering edit mode

I couldn't find this gene when running the code on my machine. Anyway here is the code I would recommend

dds <- estimateSizeFactors(dds, type="poscounts")
dds <- DESeq(dds, minReplicates=FALSE) # replacement procedure probably not good for zero inflated data
res <- results(dds, contrast=c("SeasonLoc","SahelDry","SahelWet"))

 

ADD REPLY
1
Entering edit mode
@mikelove
Last seen 18 hours ago
United States

Re: "I can't compare "SahelDry" to "SahelWet"" You can, just use the contrast argument, this works regardless of betaPrior's value. As for some results not being significant but you feel they should be, it could be that the model is not well specified. Newer methods designed for metagenomics may outperform, such as metagenomeSeq or method called zingeR.

 

ADD COMMENT

Login before adding your answer.

Traffic: 1062 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6