Question: Transcript level differential expression analysis
gravatar for tarun2
2.6 years ago by
tarun20 wrote:

DESeq2 offers and suggested gene-level differential expression methods. Using the tximport vignette to import transcript level estimates, is it okay not to sum them to the gene level and use the transcript level estimates for DESeq2 standard workflow? This would be an additional question in the line of my previous question.

I have a 2x2 factorial reproductive-stage drought stress experiment in rice. I have 2 genotypes (NIL-tolerant, Swarna-sensitive) and 2 conditions (Drought and Control). I have 4 biological replicates. We sampled two different tissues; flag leaf and panicle at maximum booting. I'm trying to analyze the data per tissue first.

The codes I used for transcript level differential expression methods are as follows:

tx.all <- tximport(myfiles, type = "salmon", txOut = TRUE, reader = read_tsv)

colData <- data.frame(geno=rep(c("NIL","Swarna"),each=8),condition=rep(rep(c("Control","Drought"),each=4),times=2))

rownames(colData) <- colnames(tx.all$counts)

dds <- DESeqDataSetFromTximport(tx.all, colData, formula(~geno+condition+geno:condition))

dds$group<-factor(paste0(dds$geno, dds$condition))

design(dds) <- ~group

dds <- dds[ rowSums(counts(dds)) > 1, ]

> resultsNames(dds)
[1] "Intercept"          "groupNILControl"    "groupNILDrought"    "groupSwarnaControl"
[5] "groupSwarnaDrought"

res<-results(dds, contrast=c("group","NILDrought","SwarnaDrought"))


mcols(res, use.names=TRUE)


Please comment of the codes that I used as I'm very new to Bioconductor and RNA-Seq analysis. Do I have to sum them up to the gene level before doing differential expression analysis using the standard DESeq2 workflow?

Please do advise.

Thanks and best regards.



R version 3.3.2 (2016-10-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] readr_1.1.0                tximport_1.2.0             genefilter_1.56.0         
 [4] pheatmap_1.0.8             RColorBrewer_1.1-2         ggplot2_2.2.1             
 [7] gplots_3.0.1               DESeq2_1.14.1              SummarizedExperiment_1.4.0
[10] Biobase_2.34.0             GenomicRanges_1.26.4       GenomeInfoDb_1.10.3       
[13] IRanges_2.8.1              S4Vectors_0.12.1           BiocGenerics_0.20.0       

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.10         locfit_1.5-9.1       lattice_0.20-34      gtools_3.5.0        
 [5] assertthat_0.1       digest_0.6.12        R6_2.2.0             plyr_1.8.4          
 [9] backports_1.0.5      acepack_1.4.1        RSQLite_1.1-2        zlibbioc_1.20.0     
[13] lazyeval_0.2.0       data.table_1.10.4    annotate_1.52.1      gdata_2.17.0        
[17] rpart_4.1-10         Matrix_1.2-7.1       checkmate_1.8.2      splines_3.3.2       
[21] BiocParallel_1.8.1   geneplotter_1.52.0   stringr_1.2.0        foreign_0.8-67      
[25] htmlwidgets_0.8      RCurl_1.95-4.8       munsell_0.4.3        base64enc_0.1-3     
[29] htmltools_0.3.5      nnet_7.3-12          tibble_1.2           gridExtra_2.2.1     
[33] htmlTable_1.9        Hmisc_4.0-2          XML_3.98-1.5         bitops_1.0-6        
[37] grid_3.3.2           xtable_1.8-2         gtable_0.2.0         DBI_0.6             
[41] magrittr_1.5         scales_0.4.1         KernSmooth_2.23-15   stringi_1.1.3       
[45] XVector_0.14.1       latticeExtra_0.6-28  Formula_1.2-1        tools_3.3.2         
[49] hms_0.3              survival_2.39-5      AnnotationDbi_1.36.2 colorspace_1.3-2    
[53] cluster_2.0.5        caTools_1.17.1       memoise_1.0.0        knitr_1.15.1       
ADD COMMENTlink modified 2.6 years ago by Michael Love26k • written 2.6 years ago by tarun20
Answer: Transcript level differential expression analysis
gravatar for Michael Love
2.6 years ago by
Michael Love26k
United States
Michael Love26k wrote:


I have started examining the behavior of DESeq2 at the transcript-level, and while not being designed for this level of analysis, it seems to perform fairly well. We're working on some modifications which should make it perform even better, but it's not far behind the methods designed for transcript-level in my testing. You could also try methods designed explicitly for transcript-level like BitSeq and sleuth. For DESeq2, as with gene-level analysis, I would recommend after running that you examine the plotCounts() for the DE transcripts.

ADD COMMENTlink written 2.6 years ago by Michael Love26k

I can;t honestly figure out how to "examine the plotCounts() for the DE transcripts." I saw in some of the workflows about quickly visualizing the counts for a particular gene using these codes that I tailored based on my dataset:

topGene <- rownames(res)[which.min(res$padj)]
plotCounts(dds, gene=topGene, intgroup=c("group"))

I'm not sure if this how to go by examining the plotCounts for DE transcripts.

Please advise.

Thanks and best regards.




ADD REPLYlink written 2.6 years ago by tarun20

Here 'gene' just looks up the row number or row name of the dds. So even if you had transcripts, you could put the row number or transcript name there and it would plot the counts. I could have named that argument 'row' and it would be more accurate, but it's too late to change (would break user's code).

ADD REPLYlink written 2.6 years ago by Michael Love26k

This is a different topic to ask. 

First, do you need at least seven replicates in order to invoke the "minReplicatesForReplace" parameter in dds or when replacing the outlier value using "ddsClean <- replaceOutliersWithTrimmedMean(dds)?"

I have four biological replicates and I'm getting few hundreds of outliers. I'm thinking to replace the outliers but I'm not sure if that's allowed as my replication is less than 7.

Lastly, as DESeq2 "chooses an optimal threshold that maximizes the number of genes found at a user-specified target FDR," so it is better to have the independentFiltering = TRUE in order to detect more differentially express genes? 

Please advise.

Thank you so much for patiently helping me go through the workflow.



ADD REPLYlink written 2.6 years ago by tarun20

minReplicatesForReplace (default of 7) dictates when DESeq() will automatically replace outliers.

I would not recommend outlier replacement with 4 biological replicates.

Yes, independentFiltering=TRUE will give you as many or more DE genes as FALSE. That is how the procedure is defined.

ADD REPLYlink written 2.6 years ago by Michael Love26k

Hi Michael, I was wondering if you have made any progress on testing differential expression at the transcript level with DESeq2.

ADD REPLYlink written 19 months ago by swebb10

Nothing that's ready for release beyond the above. It works, but can be made even better, but that work is in progress.

I'm also currently benchmarking differential transcript usage or splicing event methods, like DRIMSeq, SUPPA, IsoformSwitchAnalyzeR, and others.

ADD REPLYlink written 19 months ago by Michael Love26k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 205 users visited in the last hour