DESeq2 offers and suggested gene-level differential expression methods. Using the tximport vignette to import transcript level estimates, is it okay not to sum them to the gene level and use the transcript level estimates for DESeq2 standard workflow? This would be an additional question in the line of my previous question.
I have a 2x2 factorial reproductive-stage drought stress experiment in rice. I have 2 genotypes (NIL-tolerant, Swarna-sensitive) and 2 conditions (Drought and Control). I have 4 biological replicates. We sampled two different tissues; flag leaf and panicle at maximum booting. I'm trying to analyze the data per tissue first.
The codes I used for transcript level differential expression methods are as follows:
tx.all <- tximport(myfiles, type = "salmon", txOut = TRUE, reader = read_tsv) colData <- data.frame(geno=rep(c("NIL","Swarna"),each=8),condition=rep(rep(c("Control","Drought"),each=4),times=2)) rownames(colData) <- colnames(tx.all$counts) dds <- DESeqDataSetFromTximport(tx.all, colData, formula(~geno+condition+geno:condition)) dds$group<-factor(paste0(dds$geno, dds$condition)) design(dds) <- ~group nrow(dds) dds <- dds[ rowSums(counts(dds)) > 1, ] nrow(dds) dds<-DESeq(dds) > resultsNames(dds) [1] "Intercept" "groupNILControl" "groupNILDrought" "groupSwarnaControl" [5] "groupSwarnaDrought" res<-results(dds, contrast=c("group","NILDrought","SwarnaDrought")) res mcols(res, use.names=TRUE) summary(res)
Please comment of the codes that I used as I'm very new to Bioconductor and RNA-Seq analysis. Do I have to sum them up to the gene level before doing differential expression analysis using the standard DESeq2 workflow?
Please do advise.
Thanks and best regards.
Sincerely,
Asher
R version 3.3.2 (2016-10-31) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows 7 x64 (build 7601) Service Pack 1 locale: [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 [3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C [5] LC_TIME=English_United States.1252 attached base packages: [1] parallel stats4 stats graphics grDevices utils datasets methods base other attached packages: [1] readr_1.1.0 tximport_1.2.0 genefilter_1.56.0 [4] pheatmap_1.0.8 RColorBrewer_1.1-2 ggplot2_2.2.1 [7] gplots_3.0.1 DESeq2_1.14.1 SummarizedExperiment_1.4.0 [10] Biobase_2.34.0 GenomicRanges_1.26.4 GenomeInfoDb_1.10.3 [13] IRanges_2.8.1 S4Vectors_0.12.1 BiocGenerics_0.20.0 loaded via a namespace (and not attached): [1] Rcpp_0.12.10 locfit_1.5-9.1 lattice_0.20-34 gtools_3.5.0 [5] assertthat_0.1 digest_0.6.12 R6_2.2.0 plyr_1.8.4 [9] backports_1.0.5 acepack_1.4.1 RSQLite_1.1-2 zlibbioc_1.20.0 [13] lazyeval_0.2.0 data.table_1.10.4 annotate_1.52.1 gdata_2.17.0 [17] rpart_4.1-10 Matrix_1.2-7.1 checkmate_1.8.2 splines_3.3.2 [21] BiocParallel_1.8.1 geneplotter_1.52.0 stringr_1.2.0 foreign_0.8-67 [25] htmlwidgets_0.8 RCurl_1.95-4.8 munsell_0.4.3 base64enc_0.1-3 [29] htmltools_0.3.5 nnet_7.3-12 tibble_1.2 gridExtra_2.2.1 [33] htmlTable_1.9 Hmisc_4.0-2 XML_3.98-1.5 bitops_1.0-6 [37] grid_3.3.2 xtable_1.8-2 gtable_0.2.0 DBI_0.6 [41] magrittr_1.5 scales_0.4.1 KernSmooth_2.23-15 stringi_1.1.3 [45] XVector_0.14.1 latticeExtra_0.6-28 Formula_1.2-1 tools_3.3.2 [49] hms_0.3 survival_2.39-5 AnnotationDbi_1.36.2 colorspace_1.3-2 [53] cluster_2.0.5 caTools_1.17.1 memoise_1.0.0 knitr_1.15.1
I can;t honestly figure out how to "examine the plotCounts() for the DE transcripts." I saw in some of the workflows about quickly visualizing the counts for a particular gene using these codes that I tailored based on my dataset:
topGene <- rownames(res)[which.min(res$padj)]
x11(15,15)
plotCounts(dds, gene=topGene, intgroup=c("group"))
I'm not sure if this how to go by examining the plotCounts for DE transcripts.
Please advise.
Thanks and best regards.
Sincerely,
Asher
Here '
gene
' just looks up the row number or row name of the dds. So even if you had transcripts, you could put the row number or transcript name there and it would plot the counts. I could have named that argument 'row
' and it would be more accurate, but it's too late to change (would break user's code).This is a different topic to ask.
First, do you need at least seven replicates in order to invoke the "minReplicatesForReplace" parameter in dds or when replacing the outlier value using "ddsClean <- replaceOutliersWithTrimmedMean(dds)?"
I have four biological replicates and I'm getting few hundreds of outliers. I'm thinking to replace the outliers but I'm not sure if that's allowed as my replication is less than 7.
Lastly, as DESeq2 "chooses an optimal threshold that maximizes the number of genes found at a user-specified target FDR," so it is better to have the independentFiltering = TRUE in order to detect more differentially express genes?
Please advise.
Thank you so much for patiently helping me go through the workflow.
Sincerely,
Asher
minReplicatesForReplace (default of 7) dictates when DESeq() will automatically replace outliers.
I would not recommend outlier replacement with 4 biological replicates.
Yes, independentFiltering=TRUE will give you as many or more DE genes as FALSE. That is how the procedure is defined.
Hi Michael, I was wondering if you have made any progress on testing differential expression at the transcript level with DESeq2.
Nothing that's ready for release beyond the above. It works, but can be made even better, but that work is in progress.
I'm also currently benchmarking differential transcript usage or splicing event methods, like DRIMSeq, SUPPA, IsoformSwitchAnalyzeR, and others.