Question: Transcript level differential expression analysis
0
23 months ago by
tarun20
tarun20 wrote:

DESeq2 offers and suggested gene-level differential expression methods. Using the tximport vignette to import transcript level estimates, is it okay not to sum them to the gene level and use the transcript level estimates for DESeq2 standard workflow? This would be an additional question in the line of my previous question.

I have a 2x2 factorial reproductive-stage drought stress experiment in rice. I have 2 genotypes (NIL-tolerant, Swarna-sensitive) and 2 conditions (Drought and Control). I have 4 biological replicates. We sampled two different tissues; flag leaf and panicle at maximum booting. I'm trying to analyze the data per tissue first.

The codes I used for transcript level differential expression methods are as follows:

tx.all <- tximport(myfiles, type = "salmon", txOut = TRUE, reader = read_tsv)

colData <- data.frame(geno=rep(c("NIL","Swarna"),each=8),condition=rep(rep(c("Control","Drought"),each=4),times=2))

rownames(colData) <- colnames(tx.all$counts) dds <- DESeqDataSetFromTximport(tx.all, colData, formula(~geno+condition+geno:condition)) dds$group<-factor(paste0(dds$geno, dds$condition))

design(dds) <- ~group

nrow(dds)
dds <- dds[ rowSums(counts(dds)) > 1, ]
nrow(dds)

dds<-DESeq(dds)
> resultsNames(dds)
[1] "Intercept"          "groupNILControl"    "groupNILDrought"    "groupSwarnaControl"

res

mcols(res, use.names=TRUE)

summary(res)

Please comment of the codes that I used as I'm very new to Bioconductor and RNA-Seq analysis. Do I have to sum them up to the gene level before doing differential expression analysis using the standard DESeq2 workflow?

Thanks and best regards.

Sincerely,
Asher

R version 3.3.2 (2016-10-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[4] pheatmap_1.0.8             RColorBrewer_1.1-2         ggplot2_2.2.1
[7] gplots_3.0.1               DESeq2_1.14.1              SummarizedExperiment_1.4.0
[10] Biobase_2.34.0             GenomicRanges_1.26.4       GenomeInfoDb_1.10.3
[13] IRanges_2.8.1              S4Vectors_0.12.1           BiocGenerics_0.20.0

loaded via a namespace (and not attached):
[1] Rcpp_0.12.10         locfit_1.5-9.1       lattice_0.20-34      gtools_3.5.0
[5] assertthat_0.1       digest_0.6.12        R6_2.2.0             plyr_1.8.4
[9] backports_1.0.5      acepack_1.4.1        RSQLite_1.1-2        zlibbioc_1.20.0
[13] lazyeval_0.2.0       data.table_1.10.4    annotate_1.52.1      gdata_2.17.0
[17] rpart_4.1-10         Matrix_1.2-7.1       checkmate_1.8.2      splines_3.3.2
[21] BiocParallel_1.8.1   geneplotter_1.52.0   stringr_1.2.0        foreign_0.8-67
[25] htmlwidgets_0.8      RCurl_1.95-4.8       munsell_0.4.3        base64enc_0.1-3
[29] htmltools_0.3.5      nnet_7.3-12          tibble_1.2           gridExtra_2.2.1
[33] htmlTable_1.9        Hmisc_4.0-2          XML_3.98-1.5         bitops_1.0-6
[37] grid_3.3.2           xtable_1.8-2         gtable_0.2.0         DBI_0.6
[41] magrittr_1.5         scales_0.4.1         KernSmooth_2.23-15   stringi_1.1.3
[45] XVector_0.14.1       latticeExtra_0.6-28  Formula_1.2-1        tools_3.3.2
[49] hms_0.3              survival_2.39-5      AnnotationDbi_1.36.2 colorspace_1.3-2
[53] cluster_2.0.5        caTools_1.17.1       memoise_1.0.0        knitr_1.15.1       
modified 23 months ago by Michael Love22k • written 23 months ago by tarun20
Answer: Transcript level differential expression analysis
1
23 months ago by
Michael Love22k
United States
Michael Love22k wrote:

Asher,

I have started examining the behavior of DESeq2 at the transcript-level, and while not being designed for this level of analysis, it seems to perform fairly well. We're working on some modifications which should make it perform even better, but it's not far behind the methods designed for transcript-level in my testing. You could also try methods designed explicitly for transcript-level like BitSeq and sleuth. For DESeq2, as with gene-level analysis, I would recommend after running that you examine the plotCounts() for the DE transcripts.

I can;t honestly figure out how to "examine the plotCounts() for the DE transcripts." I saw in some of the workflows about quickly visualizing the counts for a particular gene using these codes that I tailored based on my dataset:

x11(15,15)
plotCounts(dds, gene=topGene, intgroup=c("group"))

I'm not sure if this how to go by examining the plotCounts for DE transcripts.

Thanks and best regards.

Sincerely,

Asher

Here 'gene' just looks up the row number or row name of the dds. So even if you had transcripts, you could put the row number or transcript name there and it would plot the counts. I could have named that argument 'row' and it would be more accurate, but it's too late to change (would break user's code).

This is a different topic to ask.

First, do you need at least seven replicates in order to invoke the "minReplicatesForReplace" parameter in dds or when replacing the outlier value using "ddsClean <- replaceOutliersWithTrimmedMean(dds)?"

I have four biological replicates and I'm getting few hundreds of outliers. I'm thinking to replace the outliers but I'm not sure if that's allowed as my replication is less than 7.

Lastly, as DESeq2 "chooses an optimal threshold that maximizes the number of genes found at a user-specified target FDR," so it is better to have the independentFiltering = TRUE in order to detect more differentially express genes?

Thank you so much for patiently helping me go through the workflow.

Sincerely,

Asher

minReplicatesForReplace (default of 7) dictates when DESeq() will automatically replace outliers.

I would not recommend outlier replacement with 4 biological replicates.

Yes, independentFiltering=TRUE will give you as many or more DE genes as FALSE. That is how the procedure is defined.

Hi Michael, I was wondering if you have made any progress on testing differential expression at the transcript level with DESeq2.

Nothing that's ready for release beyond the above. It works, but can be made even better, but that work is in progress.

I'm also currently benchmarking differential transcript usage or splicing event methods, like DRIMSeq, SUPPA, IsoformSwitchAnalyzeR, and others.