Transcript level differential expression analysis
1
0
Entering edit mode
tarun2 • 0
@tarun2-11885
Last seen 3.0 years ago
United States

DESeq2 offers and suggested gene-level differential expression methods. Using the tximport vignette to import transcript level estimates, is it okay not to sum them to the gene level and use the transcript level estimates for DESeq2 standard workflow? This would be an additional question in the line of my previous question.

I have a 2x2 factorial reproductive-stage drought stress experiment in rice. I have 2 genotypes (NIL-tolerant, Swarna-sensitive) and 2 conditions (Drought and Control). I have 4 biological replicates. We sampled two different tissues; flag leaf and panicle at maximum booting. I'm trying to analyze the data per tissue first.

The codes I used for transcript level differential expression methods are as follows:

tx.all <- tximport(myfiles, type = "salmon", txOut = TRUE, reader = read_tsv)

colData <- data.frame(geno=rep(c("NIL","Swarna"),each=8),condition=rep(rep(c("Control","Drought"),each=4),times=2))

rownames(colData) <- colnames(tx.all$counts)

dds <- DESeqDataSetFromTximport(tx.all, colData, formula(~geno+condition+geno:condition))

dds$group<-factor(paste0(dds$geno, dds$condition))

design(dds) <- ~group

nrow(dds)
dds <- dds[ rowSums(counts(dds)) > 1, ]
nrow(dds)

dds<-DESeq(dds)
> resultsNames(dds)
[1] "Intercept"          "groupNILControl"    "groupNILDrought"    "groupSwarnaControl"
[5] "groupSwarnaDrought"

res<-results(dds, contrast=c("group","NILDrought","SwarnaDrought"))

res

mcols(res, use.names=TRUE)

summary(res)

Please comment of the codes that I used as I'm very new to Bioconductor and RNA-Seq analysis. Do I have to sum them up to the gene level before doing differential expression analysis using the standard DESeq2 workflow?

Please do advise.

Thanks and best regards.

Sincerely,
Asher

 

R version 3.3.2 (2016-10-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] readr_1.1.0                tximport_1.2.0             genefilter_1.56.0         
 [4] pheatmap_1.0.8             RColorBrewer_1.1-2         ggplot2_2.2.1             
 [7] gplots_3.0.1               DESeq2_1.14.1              SummarizedExperiment_1.4.0
[10] Biobase_2.34.0             GenomicRanges_1.26.4       GenomeInfoDb_1.10.3       
[13] IRanges_2.8.1              S4Vectors_0.12.1           BiocGenerics_0.20.0       

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.10         locfit_1.5-9.1       lattice_0.20-34      gtools_3.5.0        
 [5] assertthat_0.1       digest_0.6.12        R6_2.2.0             plyr_1.8.4          
 [9] backports_1.0.5      acepack_1.4.1        RSQLite_1.1-2        zlibbioc_1.20.0     
[13] lazyeval_0.2.0       data.table_1.10.4    annotate_1.52.1      gdata_2.17.0        
[17] rpart_4.1-10         Matrix_1.2-7.1       checkmate_1.8.2      splines_3.3.2       
[21] BiocParallel_1.8.1   geneplotter_1.52.0   stringr_1.2.0        foreign_0.8-67      
[25] htmlwidgets_0.8      RCurl_1.95-4.8       munsell_0.4.3        base64enc_0.1-3     
[29] htmltools_0.3.5      nnet_7.3-12          tibble_1.2           gridExtra_2.2.1     
[33] htmlTable_1.9        Hmisc_4.0-2          XML_3.98-1.5         bitops_1.0-6        
[37] grid_3.3.2           xtable_1.8-2         gtable_0.2.0         DBI_0.6             
[41] magrittr_1.5         scales_0.4.1         KernSmooth_2.23-15   stringi_1.1.3       
[45] XVector_0.14.1       latticeExtra_0.6-28  Formula_1.2-1        tools_3.3.2         
[49] hms_0.3              survival_2.39-5      AnnotationDbi_1.36.2 colorspace_1.3-2    
[53] cluster_2.0.5        caTools_1.17.1       memoise_1.0.0        knitr_1.15.1       
deseq2 deseqdataset transcripts • 4.1k views
ADD COMMENT
2
Entering edit mode
@mikelove
Last seen 1 day ago
United States

Asher,

I have started examining the behavior of DESeq2 at the transcript-level, and while not being designed for this level of analysis, it seems to perform fairly well. We're working on some modifications which should make it perform even better, but it's not far behind the methods designed for transcript-level in my testing. You could also try methods designed explicitly for transcript-level like BitSeq and sleuth. For DESeq2, as with gene-level analysis, I would recommend after running that you examine the plotCounts() for the DE transcripts.

ADD COMMENT
0
Entering edit mode

I can;t honestly figure out how to "examine the plotCounts() for the DE transcripts." I saw in some of the workflows about quickly visualizing the counts for a particular gene using these codes that I tailored based on my dataset:

topGene <- rownames(res)[which.min(res$padj)]
x11(15,15)
plotCounts(dds, gene=topGene, intgroup=c("group"))

I'm not sure if this how to go by examining the plotCounts for DE transcripts.

Please advise.

Thanks and best regards.

Sincerely,

Asher

 

ADD REPLY
0
Entering edit mode

Here 'gene' just looks up the row number or row name of the dds. So even if you had transcripts, you could put the row number or transcript name there and it would plot the counts. I could have named that argument 'row' and it would be more accurate, but it's too late to change (would break user's code).

ADD REPLY
0
Entering edit mode

This is a different topic to ask. 

First, do you need at least seven replicates in order to invoke the "minReplicatesForReplace" parameter in dds or when replacing the outlier value using "ddsClean <- replaceOutliersWithTrimmedMean(dds)?"

I have four biological replicates and I'm getting few hundreds of outliers. I'm thinking to replace the outliers but I'm not sure if that's allowed as my replication is less than 7.

Lastly, as DESeq2 "chooses an optimal threshold that maximizes the number of genes found at a user-specified target FDR," so it is better to have the independentFiltering = TRUE in order to detect more differentially express genes? 

Please advise.

Thank you so much for patiently helping me go through the workflow.

Sincerely,

Asher

ADD REPLY
0
Entering edit mode

minReplicatesForReplace (default of 7) dictates when DESeq() will automatically replace outliers.

I would not recommend outlier replacement with 4 biological replicates.

Yes, independentFiltering=TRUE will give you as many or more DE genes as FALSE. That is how the procedure is defined.

ADD REPLY
0
Entering edit mode

Hi Michael, I was wondering if you have made any progress on testing differential expression at the transcript level with DESeq2.

ADD REPLY
1
Entering edit mode

Nothing that's ready for release beyond the above. It works, but can be made even better, but that work is in progress.

I'm also currently benchmarking differential transcript usage or splicing event methods, like DRIMSeq, SUPPA, IsoformSwitchAnalyzeR, and others.

ADD REPLY

Login before adding your answer.

Traffic: 771 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6