Hi Michael-
I've been trying to get some microbiome analysis through DESeq2, and while I've had some success, I've also been running into some snags and want to make sure I'm on the right track.
The study design of my samples is overly complicated (they were done before I was here), but are as follows:
I have field (2 locations) and laboratory (1 location) samples with some pooled (3 samples in one run) and individual samples for each group.
Additionally for my field samples I have two different timepoints from different seasons (dry and wet).
Total this is 53 samples (see tables below).
Field Samples: | Wet Season | Dry Season |
Location 1 (Sahel) | 8 Pooled, 6 Individuals | 5 Pooled, 6 Individuals |
Location 2 (Perm) | 8 Pooled, 6 Individuals | 3 Pooled, 5 Individuals |
Lab: | Samples: |
Location 1 (Lab) | 4 Pooled, 2 Individual |
The principle comparisons I want are:
- Field Location 1: Wet Season vs. Dry Season
- Field Location 2: Wet Season vs. Dry Season
- Field (overall) vs. Lab
As far as the pooled vs. individual comparison I don't believe they make much difference as from PCAs, diversity metrics, and qualitative composition I don't see differences, but maybe I can control for this in my design.
My initial run through was with me dropping laboratory samples from the analysis first, and then based on your discussion here: A: DESEq2 comparison with mulitple cell types under 2 conditions, I made a joint variable "SeasonLoc" so I had my two locations merged with their season for 4 total (full code).:
#pull in DESeq Matrix/Info setwd("C:/Acronis/microbiomedata/DEseqMatrix/") cts <- as.data.frame(read.csv("deseqmatrix.csv",row.names=1, check.names=FALSE, stringsAsFactors = FALSE)) coldata <- read.csv("coldata.csv", row.names=1) #make DESeqDataSet, perform abundance analysis dds <- DESeqDataSetFromMatrix(cts, coldata, ~SeasonLoc) dds <- estimateSizeFactors(dds, type="poscounts") dds <- DESeq(dds, fitType="parametric",betaPrior = TRUE) resultsNames(dds) [1] "Intercept" "SeasonLocPermDry" "SeasonLocPermWet" "SeasonLocSahelDry" "SeasonLocSahelWet"
And then I do my contrasts (I think now with the right format after the changes to results/lfcShrink in 1.16.1):
resT <- results(dds, contrast=c("SeasonLoc", "PermDry", "PermWet")) resT <- lfcShrink(dds, contrast=c("SeasonLoc", "PermDry", "PermWet"), res=resT) df <- as.data.frame(resT[order(resT$padj, na.last=NA), ])
My problem then is that I only find one genera to be differentially expressed at p-adj < 0.05, and it something seems off with the log2fold changes. I.E. one taxa that has 8.44x higher abundance in the Dry Season based on raw count data (5.63 higher when accounting for sizeFactors) says it is only 1.50 fold higher in the results table and is found non-significant. Maybe I'm not understanding how the baseMeans are calculated? It's just counts / size factor for each gene, correct? The basemean does or doesn't account for the samples defined by the contrast?
Furthermore, If I try to use the betaPrior=FALSE flag it changesthe resultsNames, giving:
> resultsNames(dds) [1] "Intercept" "SeasonLoc_PermWet_vs_PermDry" "SeasonLoc_SahelDry_vs_PermDry" "SeasonLoc_SahelWet_vs_PermDry"
So then I can't compare "SahelDry" to "SahelWet". (i.e. Dry vs. Wet Season in Location 1).
Because of all this, I feel like I'm off with something (maybe related to the design formula?), and I'm not sure how to make the betaPrior=FALSE implementation work with my contrasts.
Here is a link to my matrix and coldata (I'm doing my analysis from phyloseq, but this was easier to share):
https://www.dropbox.com/s/jz2r4dvcrh9nhba/DEseqMatrix.zip?dl=0
Here also is my sessionInfo() if it helps:
> sessionInfo()
R version 3.4.0 Patched (2017-06-09 r72776)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C LC_TIME=English_United States.1252
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] ggplot2_2.2.1 dplyr_0.7.0 DESeq2_1.16.1 phyloseq_1.20.0
[5] SummarizedExperiment_1.6.3 DelayedArray_0.2.7 matrixStats_0.52.2 Biobase_2.36.2
[9] GenomicRanges_1.28.3 GenomeInfoDb_1.12.2 IRanges_2.10.2 S4Vectors_0.14.3
[13] BiocGenerics_0.22.0
loaded via a namespace (and not attached):
[1] jsonlite_1.5 splines_3.4.0 foreach_1.4.3 assertthat_0.2.0 Formula_1.2-1
[6] latticeExtra_0.6-28 GenomeInfoDbData_0.99.0 RSQLite_1.1-2 backports_1.1.0 lattice_0.20-35
[11] glue_1.0.0 digest_0.6.12 RColorBrewer_1.1-2 XVector_0.16.0 checkmate_1.8.2
[16] colorspace_1.3-2 htmltools_0.3.6 Matrix_1.2-10 plyr_1.8.4 XML_3.98-1.7
[21] genefilter_1.58.1 zlibbioc_1.22.0 xtable_1.8-2 scales_0.4.1 BiocParallel_1.10.1
[26] htmlTable_1.9 tibble_1.3.3 annotate_1.54.0 mgcv_1.8-17 nnet_7.3-12
[31] lazyeval_0.2.0 survival_2.41-3 magrittr_1.5 memoise_1.1.0 nlme_3.1-131
[36] MASS_7.3-47 foreign_0.8-68 vegan_2.4-3 tools_3.4.0 data.table_1.10.4
[41] stringr_1.2.0 locfit_1.5-9.1 munsell_0.4.3 cluster_2.0.6 AnnotationDbi_1.38.1
[46] Biostrings_2.44.1 ade4_1.7-6 compiler_3.4.0 rlang_0.1.1 rhdf5_2.20.0
[51] grid_3.4.0 RCurl_1.95-4.8 iterators_1.0.8 biomformat_1.4.0 htmlwidgets_0.8
[56] igraph_1.0.1 labeling_0.3 bitops_1.0-6 base64enc_0.1-3 gtable_0.2.0
[61] codetools_0.2-15 multtest_2.32.0 DBI_0.6-1 R6_2.2.1 reshape2_1.4.2
[66] gridExtra_2.2.1 knitr_1.16 Hmisc_4.0-3 permute_0.9-4 ape_4.1
[71] stringi_1.1.5 Rcpp_0.12.11 geneplotter_1.54.0 rpart_4.1-11 acepack_1.4.1
Any input would be greatly appreciated (sorry for the length).
Thanks,
Ben
Hi Michael, thanks for checking in.
I think something is weird though with the betaprior T/F call. i.e. if I do a contrast of SahelDry to SahelWet with betaPrior=FALSE or betaPrior=TRUE
I get:
This genera is a really low abundance one in these samples (i.e 19 reads in two samples of SahelWet to 30 reads in 1 sample of SahelDry)---I'm not sure why it comes up so differently.
I've also played with the fitTypes, but it doesn't seem to change things greatly.
Any thoughts?
-Ben
I couldn't find this gene when running the code on my machine. Anyway here is the code I would recommend