I've been following DESeq2 vignette, manuals, and different posts/blog from DESeq2 community that I came up with my analysis. Apologies as this will be a lengthy post
I have a 2x2 factorial experiment on drought tolerance in rice under reproductive stage with four biological replicates
I have 2 genotypes:
- Swarna - elite rice mega variety but drought susceptible
- Near-Isogenic Lines (NIL-drought tolerant) which resulted from a cross between a donor parent and an elite mega variety Swarna.
NIL is basically of Swarna background but with an introgressed segment on chromosome 1. There is a major QTL identified on chromosome 1 which confers drought tolerance at reproductive stage.
I have 2 conditions obviously; Drought and Control (it's a greenhouse set-up for both conditions).
I have two tissues under consideration, flag leaf, and panicles but I'm analyzing them independently at this stage.
The goal then is, to identify genes that co-localizes in QTL region which might explain the drought tolerant phenotype of NIL.
To do this, I used Salmon for transcript abundance quantification followed by tximport package and import gene-level values (as suggested by the authors). The scripts for that is as follow;
tx.all <- tximport(myfiles, type = "salmon", tx2gene = tx2gene, reader = read_tsv)
I then constructed the deseq table
colData <- data.frame(geno=rep(c("NIL","Swarna"),each=8),condition=rep(rep(c("Control","Drought"),each=4),times=2)) rownames(colData) <- colnames(tx.all$counts) dds <- DESeqDataSetFromTximport(tx.all, colData, formula(~geno+condition+geno:condition))
In order to extract multiple condition effects, I combined the factors of interest into a single factor
dds$group<-factor(paste0(dds$geno, dds$condition)) design(dds) <- ~group
I applied the most minimal filtering
nrow(dds) dds <- dds[ rowSums(counts(dds)) > 1, ] nrow(dds)
And then I ran differential expression analysis
dds<-DESeq(dds, parallel = TRUE) resultsNames(dds)  "Intercept" "groupNILControl" "groupNILDrought" "groupSwarnaControl"  "groupSwarnaDrought"
The comes the result generation. Since we wanted to identify genes that could most likely explain the drought tolerant in NIL compared to Swarna (susceptible), I set-up different contrast test. I have four contrast to be exact
contrast=c("group","NILControl","SwarnaControl") contrast=c("group","NILDrought","NILControl") contrast=c("group","SwarnaDrought","SwarnaControl") contrast=c("group","NILDrought","SwarnaDrought")
I extracted results for each contrast at the default parameter. The most interesting contrast for us would be contrast=c("group","NILDrought","SwarnaDrought"). Thus for illustration purposes, Ill be focusing on this contrast.
res<-results(dds, contrast=c("group","NILDrought","SwarnaDrought"), parallel = TRUE) res summary(res)
At the default parameter which is at adjusted p-value of 0.1 we identified genes within the QTL region in all specified contrast.
To be more strict on which genes are considered significant, I lowered the false discovery rate for all contrast as follows;
res.05 <- results(dds, contrast=c("group","NILDrought","SwarnaDrought"), alpha=.05, parallel = TRUE) table(res.05$padj < 0.05) summary(res.05) sum(res.05$padj < 0.05, na.rm=TRUE)
Again, we got a list of DEGs within the QTL region on all of the four contrast.
And then I raised the log2 fold change
resLFC1 <- results(dds, contrast=c("group","NILDrought","SwarnaDrought"), lfcThreshold = 1, alpha = 0.05, parallel = TRUE) table(resLFC1$padj < 0.05) summary(resLFC1) sum( resLFC1$padj < 0.05, na.rm=TRUE )
From this additional parameter, we only got a result from two contrast (below) that would explain treatment difference only regardless of the genotype
contrast=c("group","NILDrought","NILControl") and contrast=c("group","SwarnaDrought","SwarnaControl")
We did not get any result from the other two contrast (below) that would explain genotypic difference which for us is the most important contrast. I did lower the alpha =0.1 from this parameter but we still did not get any genes within the QTL region.
contrast=c("group","NILControl","SwarnaControl") and contrast=c("group","NILDrought","SwarnaDrought")
1. Was there something wrong in the way I set-up the design formula and was my grouping appropriate?
2. Was my approach to achieved the goal using the specific contrast method did not address the question of identifying DEGs between NIL and Swarna under drought stress? Would there be a better approach that I should have done to handle this?
3. In defining which genes are differentially express, is it strictly that we should invoke the log2 fold change (LFC=1) parameter with the specified alpha = 0.1 or 0.05? What is the implication in result generation if we did not invoke the log2 fold change (PFC=1) parameter?
4. Is it possible to say that the results generated from the default setting of adjusted p-value = 0.1 or even at 0.05 differentially express without invoking the LFC parameter? Can we use only this parameter for DEGs generation?
5. To say that it is biologically and statistically significant, what are the ultimate parameters that we should use?
Please and kindly enlighten me on these issues and if possible provide suggestions
My deepest and sincerest gratitude,
R version 3.3.2 (2016-10-31) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows 7 x64 (build 7601) Service Pack 1 locale:  LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252  LC_MONETARY=English_United States.1252 LC_NUMERIC=C  LC_TIME=English_United States.1252 attached base packages:  parallel stats4 stats graphics grDevices utils datasets methods base other attached packages:  readr_1.1.0 tximport_1.2.0 genefilter_1.56.0  pheatmap_1.0.8 RColorBrewer_1.1-2 ggplot2_2.2.1  gplots_3.0.1 DESeq2_1.14.1 SummarizedExperiment_1.4.0  Biobase_2.34.0 GenomicRanges_1.26.4 GenomeInfoDb_1.10.3  IRanges_2.8.1 S4Vectors_0.12.1 BiocGenerics_0.20.0 loaded via a namespace (and not attached):  Rcpp_0.12.10 locfit_1.5-9.1 lattice_0.20-34 snow_0.4-2  gtools_3.5.0 assertthat_0.1 digest_0.6.12 R6_2.2.0  plyr_1.8.4 backports_1.0.5 acepack_1.4.1 RSQLite_1.1-2  zlibbioc_1.20.0 lazyeval_0.2.0 data.table_1.10.4 annotate_1.52.1  gdata_2.17.0 rpart_4.1-10 Matrix_1.2-7.1 checkmate_1.8.2  labeling_0.3 splines_3.3.2 BiocParallel_1.8.1 geneplotter_1.52.0  stringr_1.2.0 foreign_0.8-67 htmlwidgets_0.8 RCurl_1.95-4.8  munsell_0.4.3 base64enc_0.1-3 htmltools_0.3.5 nnet_7.3-12  tibble_1.2 gridExtra_2.2.1 htmlTable_1.9 Hmisc_4.0-2  XML_3.98-1.5 bitops_1.0-6 grid_3.3.2 xtable_1.8-2  gtable_0.2.0 DBI_0.6 magrittr_1.5 scales_0.4.1  KernSmooth_2.23-15 stringi_1.1.3 XVector_0.14.1 latticeExtra_0.6-28  Formula_1.2-1 tools_3.3.2 hms_0.3 survival_2.39-5  AnnotationDbi_1.36.2 colorspace_1.3-2 cluster_2.0.5 caTools_1.17.1  memoise_1.0.0 knitr_1.15.1