Question

Design and Different contrast results with betaPrior

0

Entering edit mode

bkrajacich • 0

@bkrajacich-13259

Last seen 6.9 years ago

Hi Michael-

I've been trying to get some microbiome analysis through DESeq2, and while I've had some success, I've also been running into some snags and want to make sure I'm on the right track.

The study design of my samples is overly complicated (they were done before I was here), but are as follows:

I have field (2 locations) and laboratory (1 location) samples with some pooled (3 samples in one run) and individual samples for each group.

Additionally for my field samples I have two different timepoints from different seasons (dry and wet).

Total this is 53 samples (see tables below).

Field Samples:	Wet Season	Dry Season
Location 1 (Sahel)	8 Pooled, 6 Individuals	5 Pooled, 6 Individuals
Location 2 (Perm)	8 Pooled, 6 Individuals	3 Pooled, 5 Individuals

Lab:	Samples:
Location 1 (Lab)	4 Pooled, 2 Individual

The principle comparisons I want are:

Field Location 1: Wet Season vs. Dry Season
Field Location 2: Wet Season vs. Dry Season
Field (overall) vs. Lab

As far as the pooled vs. individual comparison I don't believe they make much difference as from PCAs, diversity metrics, and qualitative composition I don't see differences, but maybe I can control for this in my design.

My initial run through was with me dropping laboratory samples from the analysis first, and then based on your discussion here: A: DESEq2 comparison with mulitple cell types under 2 conditions, I made a joint variable "SeasonLoc" so I had my two locations merged with their season for 4 total (full code).:

#pull in DESeq Matrix/Info
setwd("C:/Acronis/microbiomedata/DEseqMatrix/")
cts <- as.data.frame(read.csv("deseqmatrix.csv",row.names=1, check.names=FALSE, stringsAsFactors = FALSE))
coldata <- read.csv("coldata.csv", row.names=1)

#make DESeqDataSet, perform abundance analysis
dds <- DESeqDataSetFromMatrix(cts, coldata, ~SeasonLoc)
dds <- estimateSizeFactors(dds, type="poscounts")
dds <- DESeq(dds, fitType="parametric",betaPrior = TRUE)
resultsNames(dds)

[1] "Intercept"         "SeasonLocPermDry"  "SeasonLocPermWet"  "SeasonLocSahelDry" "SeasonLocSahelWet"

And then I do my contrasts (I think now with the right format after the changes to results/lfcShrink in 1.16.1):

resT <- results(dds, contrast=c("SeasonLoc", "PermDry", "PermWet"))
resT <- lfcShrink(dds, contrast=c("SeasonLoc", "PermDry", "PermWet"), res=resT)
df <- as.data.frame(resT[order(resT$padj, na.last=NA), ])

My problem then is that I only find one genera to be differentially expressed at p-adj < 0.05, and it something seems off with the log2fold changes. I.E. one taxa that has 8.44x higher abundance in the Dry Season based on raw count data (5.63 higher when accounting for sizeFactors) says it is only 1.50 fold higher in the results table and is found non-significant. Maybe I'm not understanding how the baseMeans are calculated? It's just counts / size factor for each gene, correct? The basemean does or doesn't account for the samples defined by the contrast?

Furthermore, If I try to use the betaPrior=FALSE flag it changesthe resultsNames, giving:

> resultsNames(dds)
[1] "Intercept"     "SeasonLoc_PermWet_vs_PermDry"  "SeasonLoc_SahelDry_vs_PermDry" "SeasonLoc_SahelWet_vs_PermDry"

So then I can't compare "SahelDry" to "SahelWet". (i.e. Dry vs. Wet Season in Location 1).

Because of all this, I feel like I'm off with something (maybe related to the design formula?), and I'm not sure how to make the betaPrior=FALSE implementation work with my contrasts.

Here is a link to my matrix and coldata (I'm doing my analysis from phyloseq, but this was easier to share):

https://www.dropbox.com/s/jz2r4dvcrh9nhba/DEseqMatrix.zip?dl=0

Here also is my sessionInfo() if it helps:

> sessionInfo()
R version 3.4.0 Patched (2017-06-09 r72776)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C LC_TIME=English_United States.1252

attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base

other attached packages:
[1] ggplot2_2.2.1 dplyr_0.7.0 DESeq2_1.16.1 phyloseq_1.20.0
[5] SummarizedExperiment_1.6.3 DelayedArray_0.2.7 matrixStats_0.52.2 Biobase_2.36.2
[9] GenomicRanges_1.28.3 GenomeInfoDb_1.12.2 IRanges_2.10.2 S4Vectors_0.14.3
[13] BiocGenerics_0.22.0

loaded via a namespace (and not attached):
[1] jsonlite_1.5 splines_3.4.0 foreach_1.4.3 assertthat_0.2.0 Formula_1.2-1
[6] latticeExtra_0.6-28 GenomeInfoDbData_0.99.0 RSQLite_1.1-2 backports_1.1.0 lattice_0.20-35
[11] glue_1.0.0 digest_0.6.12 RColorBrewer_1.1-2 XVector_0.16.0 checkmate_1.8.2
[16] colorspace_1.3-2 htmltools_0.3.6 Matrix_1.2-10 plyr_1.8.4 XML_3.98-1.7
[21] genefilter_1.58.1 zlibbioc_1.22.0 xtable_1.8-2 scales_0.4.1 BiocParallel_1.10.1
[26] htmlTable_1.9 tibble_1.3.3 annotate_1.54.0 mgcv_1.8-17 nnet_7.3-12
[31] lazyeval_0.2.0 survival_2.41-3 magrittr_1.5 memoise_1.1.0 nlme_3.1-131
[36] MASS_7.3-47 foreign_0.8-68 vegan_2.4-3 tools_3.4.0 data.table_1.10.4
[41] stringr_1.2.0 locfit_1.5-9.1 munsell_0.4.3 cluster_2.0.6 AnnotationDbi_1.38.1
[46] Biostrings_2.44.1 ade4_1.7-6 compiler_3.4.0 rlang_0.1.1 rhdf5_2.20.0
[51] grid_3.4.0 RCurl_1.95-4.8 iterators_1.0.8 biomformat_1.4.0 htmlwidgets_0.8
[56] igraph_1.0.1 labeling_0.3 bitops_1.0-6 base64enc_0.1-3 gtable_0.2.0
[61] codetools_0.2-15 multtest_2.32.0 DBI_0.6-1 R6_2.2.1 reshape2_1.4.2
[66] gridExtra_2.2.1 knitr_1.16 Hmisc_4.0-3 permute_0.9-4 ape_4.1
[71] stringi_1.1.5 Rcpp_0.12.11 geneplotter_1.54.0 rpart_4.1-11 acepack_1.4.1

Any input would be greatly appreciated (sorry for the length).

Thanks,

Ben

deseq2 • 762 views

ADD COMMENT • link 6.9 years ago bkrajacich • 0

0

Entering edit mode

Hi Michael, thanks for checking in.

I think something is weird though with the betaprior T/F call. i.e. if I do a contrast of SahelDry to SahelWet with betaPrior=FALSE or betaPrior=TRUE

I get:

	baseMean	log2FoldChange	stat	pvalue	padj
Tatumella w/ betaPrior=FALSE	2.62663926	0.3387139	-7.43407692	1.053007e-13	4.043546e-11
Tatumella w/ betaPrior=TRUE	2.62663926	0.3491179	-0.58953541	0.5555022	0.9709815

This genera is a really low abundance one in these samples (i.e 19 reads in two samples of SahelWet to 30 reads in 1 sample of SahelDry)---I'm not sure why it comes up so differently.

I've also played with the fitTypes, but it doesn't seem to change things greatly.

Any thoughts?

-Ben

ADD REPLY • link 6.9 years ago bkrajacich • 0

0

Entering edit mode

I couldn't find this gene when running the code on my machine. Anyway here is the code I would recommend

dds <- estimateSizeFactors(dds, type="poscounts")
dds <- DESeq(dds, minReplicates=FALSE) # replacement procedure probably not good for zero inflated data
res <- results(dds, contrast=c("SeasonLoc","SahelDry","SahelWet"))

ADD REPLY • link 6.9 years ago Michael Love 41k

score 1 · Answer 1 · 2017-06-14

Re: "I can't compare "SahelDry" to "SahelWet"" You can, just use the contrast argument, this works regardless of betaPrior's value. As for some results not being significant but you feel they should be, it could be that the model is not well specified. Newer methods designed for metagenomics may outperform, such as metagenomeSeq or method called zingeR.