Hi,
I am very new to RNAseq, bioconductor, and R. So far, I have successfully established my workflow for DGE of my data. I used Salmon to generate an index against the reference transcriptome of the purple sea urchin, S. purpuratus. I further used Salmon to create transcript-level count tables, then tximport, and DESeq2 to aggregate and perform gene level analysis. I am now trying to run a DTU analysis,following the published workflow rnaseqDTU.
I was successful in getting DRIMSeq to work:
> d
An object of class dmDStest
with 3176 genes and 6 samples
* data accessors: counts(), samples()
design()
mean_expression(), common_precision(), genewise_precision()
proportions(), coefficients()
results()
The following code closely follows the workflow, but removes any rows with NA p-values:
res <- DRIMSeq::results(d)
res.txp <- DRIMSeq::results(d, level="feature")
no.na <- function(x) ifelseis.na(x), 1, x)
res$pvalue <- no.na(res$pvalue)
res.txp$pvalue <- no.na(res.txp$pvalue)
pScreen <- res$pvalue
names(pScreen) <- res$gene_id
pConfirmation <- matrix(res.txp$pvalue, ncol=1)
rownames(pConfirmation) <- res.txp$feature_id
tx2gene <- res.txp[,c("feature_id", "gene_id")]
stageRObj <- stageRTx(pScreen=pScreen, pConfirmation=pConfirmation,
pScreenAdjusted=FALSE, tx2gene=tx2gene)
At this point, here is how the various objects look:
> head(pScreen)
WHL22.100887 WHL22.100947 WHL22.101051 WHL22.101602 WHL22.101608 WHL22.101638
0.9086200 0.3521248 0.8200199 0.9546744 1.0000000 0.4342966
>head(pConfirmation)
[,1]
WHL22.100887.2 0.9086200
WHL22.100887.0 0.9086200
WHL22.100947.10 0.1502162
WHL22.100947.5 0.3915644
WHL22.100947.1 0.3868770
WHL22.101051.3 0.7987938
> head(tx2gene)
feature_id gene_id
1 WHL22.100887.2 WHL22.100887
2 WHL22.100887.0 WHL22.100887
3 WHL22.100947.10 WHL22.100947
4 WHL22.100947.5 WHL22.100947
5 WHL22.100947.1 WHL22.100947
6 WHL22.101051.3 WHL22.101051
> stageRObj
stageRTx object, containing:
- 3176 screening hypothesis p-values
- 1 confirmation hypothesis for 7178 transcripts
So far, so good. However, my next step
stageRObj <- stageWiseAdjustment(stageRObj, method="dtu", alpha=0.05)
results in an error:
> stageRObj <- stageWiseAdjustment(stageRObj, method="dtu", alpha=0.05)
Error in `[<-`(`*tmp*`, idCon, 1, value = unlist(txLevelAdjustments)) :
subscript out of bounds
> traceback()
4: .stageWiseTest(pScreen = pScreen, pConfirmation = pConfirmation,
alpha = alpha, method = method, pScreenAdjusted = pScreenAdjusted,
tx2gene = tx2gene, ...)
3: .local(object, method, alpha, ...)
2: stageWiseAdjustment(stageRObj, method = "dtu", alpha = 0.05)
1: stageWiseAdjustment(stageRObj, method = "dtu", alpha = 0.05)
I can't figure out what the problem is. Any advice on how to resolve this issue is appreciated.
Update: Following the alternative workflow using DEXSeq,
sample.data <- DRIMSeq::samples(d)
count.data <- round(as.matrix(counts(d)[,-c(1:2)]))
dxd <- DEXSeqDataSet(countData=count.data,
sampleData=sample.data,
design=~sample + exon + condition:exon,
featureID=counts(d)$feature_id,
groupID=counts(d)$gene_id)
dxr <- DEXSeqResults(dxd, independentFiltering=FALSE)
qval <- perGeneQValue(dxr)
dxr.g <- data.frame(gene=names(qval),qval)
columns <- c("featureID","groupID","pvalue")
dxr <- as.data.frame(dxr[,columns])
head(dxr)
pConfirmation <- matrix(dxr$pvalue,ncol=1)
dimnames(pConfirmation) <- list(dxr$featureID,"transcript")
pScreen <- qval
names(pScreen) <- names(pScreen)
tx2gene <- as.data.frame(dxr[,c("featureID", "groupID")])
for (i in 1:2) tx2gene[,i] <- tx2gene[,I]
I arrive at
> head(pConfirmation)
transcript
WHL22.100887.2 0.901867665
WHL22.100887.0 0.895193392
WHL22.100947.10 0.005544253
WHL22.100947.5 0.168976435
WHL22.100947.1 0.076028153
WHL22.101051.3 0.560764702
> head(pScreen)
WHL22.100887 WHL22.100947 WHL22.101051 WHL22.101602 WHL22.101608 WHL22.101638
1.000000e+00 8.381301e-02 1.000000e+00 1.000000e+00 9.301416e-06 1.000000e+00
> head(tx2gene)
featureID groupID
WHL22.100887:WHL22.100887.2 WHL22.100887.2 WHL22.100887
WHL22.100887:WHL22.100887.0 WHL22.100887.0 WHL22.100887
WHL22.100947:WHL22.100947.10 WHL22.100947.10 WHL22.100947
WHL22.100947:WHL22.100947.5 WHL22.100947.5 WHL22.100947
WHL22.100947:WHL22.100947.1 WHL22.100947.1 WHL22.100947
WHL22.101051:WHL22.101051.3 WHL22.101051.3 WHL22.101051
From there, I run into the same issue:
> stageRObj <- stageRTx(pScreen=pScreen, pConfirmation=pConfirmation,
+ pScreenAdjusted=TRUE, tx2gene=tx2gene)
> stageRObj <- stageWiseAdjustment(stageRObj, method="dtu", alpha=0.05)
Error in `[<-`(`*tmp*`, idCon, 1, value = unlist(txLevelAdjustments)) :
subscript out of bounds
> traceback()
4: .stageWiseTest(pScreen = pScreen, pConfirmation = pConfirmation,
alpha = alpha, method = method, pScreenAdjusted = pScreenAdjusted,
tx2gene = tx2gene, ...)
3: .local(object, method, alpha, ...)
2: stageWiseAdjustment(stageRObj, method = "dtu", alpha = 0.05)
1: stageWiseAdjustment(stageRObj, method = "dtu", alpha = 0.05)
Thank you,
Derk
> sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS 10.14.2
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Users/derk/anaconda3/envs/salmon/lib/R/lib/libRblas.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods
[9] base
other attached packages:
[1] stageR_1.4.0 DRIMSeq_1.10.1
[3] GenomicFeatures_1.34.3 AnnotationDbi_1.44.0
[5] fission_1.2.0 apeglm_1.4.2
[7] PoiClaClu_1.0.2.1 RColorBrewer_1.1-2
[9] pheatmap_1.0.12 hexbin_1.27.2
[11] ggplot2_3.1.0 dplyr_0.7.8
[13] tximportData_1.10.0 readr_1.3.1
[15] tximport_1.10.1 DESeq2_1.22.2
[17] SummarizedExperiment_1.12.0 DelayedArray_0.8.0
[19] BiocParallel_1.16.5 matrixStats_0.54.0
[21] Biobase_2.42.0 GenomicRanges_1.34.0
[23] GenomeInfoDb_1.18.1 IRanges_2.16.0
[25] S4Vectors_0.20.1 BiocGenerics_0.28.0
loaded via a namespace (and not attached):
[1] bitops_1.0-6 bit64_0.9-7 httr_1.4.0
[4] progress_1.2.0 numDeriv_2016.8-1 tools_3.5.1
[7] backports_1.1.3 utf8_1.1.4 R6_2.3.0
[10] rpart_4.1-13 Hmisc_4.2-0 DBI_1.0.0
[13] lazyeval_0.2.1 colorspace_1.4-0 nnet_7.3-12
[16] withr_2.1.2 prettyunits_1.0.2 tidyselect_0.2.5
[19] gridExtra_2.3 bit_1.1-14 compiler_3.5.1
[22] cli_1.0.1 htmlTable_1.13.1 rtracklayer_1.42.1
[25] labeling_0.3 scales_1.0.0 checkmate_1.9.1
[28] genefilter_1.64.0 Rsamtools_1.34.0 stringr_1.3.1
[31] digest_0.6.18 foreign_0.8-71 XVector_0.22.0
[34] base64enc_0.1-3 pkgconfig_2.0.2 htmltools_0.3.6
[37] limma_3.38.3 bbmle_1.0.20 htmlwidgets_1.3
[40] rlang_0.3.1 rstudioapi_0.9.0 RSQLite_2.1.1
[43] bindr_0.1.1 jsonlite_1.6 acepack_1.4.1
[46] RCurl_1.95-4.11 magrittr_1.5 GenomeInfoDbData_1.2.0
[49] Formula_1.2-3 Matrix_1.2-15 Rcpp_1.0.0
[52] munsell_0.5.0 fansi_0.4.0 edgeR_3.24.3
[55] stringi_1.2.4 yaml_2.2.0 MASS_7.3-51.1
[58] zlibbioc_1.28.0 plyr_1.8.4 grid_3.5.1
[61] blob_1.1.1 crayon_1.3.4 lattice_0.20-38
[64] Biostrings_2.50.2 splines_3.5.1 annotate_1.60.0
[67] hms_0.4.2 locfit_1.5-9.1 knitr_1.21
[70] pillar_1.3.1 biomaRt_2.38.0 geneplotter_1.60.0
[73] reshape2_1.4.3 XML_3.98-1.16 glue_1.3.0
[76] latticeExtra_0.6-28 data.table_1.12.0 BiocManager_1.30.4
[79] gtable_0.2.0 purrr_0.3.0 assertthat_0.2.0
[82] emdbook_1.3.10 xfun_0.4 xtable_1.8-3
[85] coda_0.19-2 survival_2.43-3 tibble_2.0.1
[88] GenomicAlignments_1.18.1 memoise_1.1.0 bindrcpp_0.2.2
[91] cluster_2.0.7-1
It turns out I should have paid more attention to the following:
My geneIDs and transcript IDs use "." as a separator, and dont't have a fixed number of characters'. Therefore, I used a sub() and gsub() to replace the offending characters with a "p": old:
new:
With this substitution,
now works fine.
I would have like to add this as an "answer", but for some reason it is nearly impossible to enter text using the Add Answer function.
Yes, the different annotations have been a problem for
stageR
, and I apologize for this inconvenience. This should be fixed in the following major upgrade of the package. Let us know, on the BioC support site, if you have any additional questions.Yes, the different annotations have been a problem for
stageR
, and I apologize for this inconvenience. This should be fixed in the following major upgrade of the package. Let us know, on the BioC support site, if you have any additional questions.