stageR: Error in stageWiseAdjustment
1
1
Entering edit mode
d-joester ▴ 20
@d-joester-19665
Last seen 5.1 years ago

Hi,

I am very new to RNAseq, bioconductor, and R. So far, I have successfully established my workflow for DGE of my data. I used Salmon to generate an index against the reference transcriptome of the purple sea urchin, S. purpuratus. I further used Salmon to create transcript-level count tables, then tximport, and DESeq2 to aggregate and perform gene level analysis. I am now trying to run a DTU analysis,following the published workflow rnaseqDTU.

I was successful in getting DRIMSeq to work:

> d
An object of class dmDStest 
with 3176 genes and 6 samples
* data accessors: counts(), samples()
  design()
  mean_expression(), common_precision(), genewise_precision()
  proportions(), coefficients()
  results()

The following code closely follows the workflow, but removes any rows with NA p-values:

res <- DRIMSeq::results(d)
res.txp <- DRIMSeq::results(d, level="feature")

no.na <- function(x) ifelseis.na(x), 1, x)
res$pvalue <- no.na(res$pvalue)
res.txp$pvalue <- no.na(res.txp$pvalue)

pScreen <- res$pvalue
names(pScreen) <- res$gene_id
pConfirmation <- matrix(res.txp$pvalue, ncol=1)
rownames(pConfirmation) <- res.txp$feature_id
tx2gene <- res.txp[,c("feature_id", "gene_id")]

stageRObj <- stageRTx(pScreen=pScreen, pConfirmation=pConfirmation,
                      pScreenAdjusted=FALSE, tx2gene=tx2gene)

At this point, here is how the various objects look:

> head(pScreen)
WHL22.100887 WHL22.100947 WHL22.101051 WHL22.101602 WHL22.101608 WHL22.101638 
   0.9086200    0.3521248    0.8200199    0.9546744    1.0000000    0.4342966 

>head(pConfirmation)
                     [,1]
WHL22.100887.2  0.9086200
WHL22.100887.0  0.9086200
WHL22.100947.10 0.1502162
WHL22.100947.5  0.3915644
WHL22.100947.1  0.3868770
WHL22.101051.3  0.7987938

> head(tx2gene)
       feature_id      gene_id
1  WHL22.100887.2 WHL22.100887
2  WHL22.100887.0 WHL22.100887
3 WHL22.100947.10 WHL22.100947
4  WHL22.100947.5 WHL22.100947
5  WHL22.100947.1 WHL22.100947
6  WHL22.101051.3 WHL22.101051

> stageRObj
stageRTx object, containing: 
- 3176 screening hypothesis p-values 
- 1 confirmation hypothesis for 7178 transcripts

So far, so good. However, my next step

stageRObj <- stageWiseAdjustment(stageRObj, method="dtu", alpha=0.05)

results in an error:

> stageRObj <- stageWiseAdjustment(stageRObj, method="dtu", alpha=0.05)
Error in `[<-`(`*tmp*`, idCon, 1, value = unlist(txLevelAdjustments)) : 
  subscript out of bounds

> traceback()
4: .stageWiseTest(pScreen = pScreen, pConfirmation = pConfirmation, 
       alpha = alpha, method = method, pScreenAdjusted = pScreenAdjusted, 
       tx2gene = tx2gene, ...)
3: .local(object, method, alpha, ...)
2: stageWiseAdjustment(stageRObj, method = "dtu", alpha = 0.05)
1: stageWiseAdjustment(stageRObj, method = "dtu", alpha = 0.05)

I can't figure out what the problem is. Any advice on how to resolve this issue is appreciated.

Update: Following the alternative workflow using DEXSeq,

sample.data <- DRIMSeq::samples(d)
count.data <- round(as.matrix(counts(d)[,-c(1:2)]))
dxd <- DEXSeqDataSet(countData=count.data,
                     sampleData=sample.data,
                     design=~sample + exon + condition:exon,
                     featureID=counts(d)$feature_id,
                     groupID=counts(d)$gene_id)

dxr <- DEXSeqResults(dxd, independentFiltering=FALSE)
qval <- perGeneQValue(dxr)
dxr.g <- data.frame(gene=names(qval),qval)

columns <- c("featureID","groupID","pvalue")
dxr <- as.data.frame(dxr[,columns])
head(dxr)

pConfirmation <- matrix(dxr$pvalue,ncol=1)
dimnames(pConfirmation) <- list(dxr$featureID,"transcript")
pScreen <- qval
names(pScreen) <- names(pScreen)
tx2gene <- as.data.frame(dxr[,c("featureID", "groupID")])
for (i in 1:2) tx2gene[,i] <- tx2gene[,I]

I arrive at

> head(pConfirmation)
                 transcript
WHL22.100887.2  0.901867665
WHL22.100887.0  0.895193392
WHL22.100947.10 0.005544253
WHL22.100947.5  0.168976435
WHL22.100947.1  0.076028153
WHL22.101051.3  0.560764702

> head(pScreen)
WHL22.100887 WHL22.100947 WHL22.101051 WHL22.101602 WHL22.101608 WHL22.101638 
1.000000e+00 8.381301e-02 1.000000e+00 1.000000e+00 9.301416e-06 1.000000e+00 

> head(tx2gene)
                                   featureID      groupID
WHL22.100887:WHL22.100887.2   WHL22.100887.2 WHL22.100887
WHL22.100887:WHL22.100887.0   WHL22.100887.0 WHL22.100887
WHL22.100947:WHL22.100947.10 WHL22.100947.10 WHL22.100947
WHL22.100947:WHL22.100947.5   WHL22.100947.5 WHL22.100947
WHL22.100947:WHL22.100947.1   WHL22.100947.1 WHL22.100947
WHL22.101051:WHL22.101051.3   WHL22.101051.3 WHL22.101051

From there, I run into the same issue:

> stageRObj <- stageRTx(pScreen=pScreen, pConfirmation=pConfirmation,
+                       pScreenAdjusted=TRUE, tx2gene=tx2gene)

> stageRObj <- stageWiseAdjustment(stageRObj, method="dtu", alpha=0.05)
Error in `[<-`(`*tmp*`, idCon, 1, value = unlist(txLevelAdjustments)) : 
  subscript out of bounds
> traceback()
4: .stageWiseTest(pScreen = pScreen, pConfirmation = pConfirmation, 
       alpha = alpha, method = method, pScreenAdjusted = pScreenAdjusted, 
       tx2gene = tx2gene, ...)
3: .local(object, method, alpha, ...)
2: stageWiseAdjustment(stageRObj, method = "dtu", alpha = 0.05)
1: stageWiseAdjustment(stageRObj, method = "dtu", alpha = 0.05)

Thank you,

Derk

> sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS  10.14.2

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Users/derk/anaconda3/envs/salmon/lib/R/lib/libRblas.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods  
[9] base     

other attached packages:
 [1] stageR_1.4.0                DRIMSeq_1.10.1             
 [3] GenomicFeatures_1.34.3      AnnotationDbi_1.44.0       
 [5] fission_1.2.0               apeglm_1.4.2               
 [7] PoiClaClu_1.0.2.1           RColorBrewer_1.1-2         
 [9] pheatmap_1.0.12             hexbin_1.27.2              
[11] ggplot2_3.1.0               dplyr_0.7.8                
[13] tximportData_1.10.0         readr_1.3.1                
[15] tximport_1.10.1             DESeq2_1.22.2              
[17] SummarizedExperiment_1.12.0 DelayedArray_0.8.0         
[19] BiocParallel_1.16.5         matrixStats_0.54.0         
[21] Biobase_2.42.0              GenomicRanges_1.34.0       
[23] GenomeInfoDb_1.18.1         IRanges_2.16.0             
[25] S4Vectors_0.20.1            BiocGenerics_0.28.0        

loaded via a namespace (and not attached):
 [1] bitops_1.0-6             bit64_0.9-7              httr_1.4.0              
 [4] progress_1.2.0           numDeriv_2016.8-1        tools_3.5.1             
 [7] backports_1.1.3          utf8_1.1.4               R6_2.3.0                
[10] rpart_4.1-13             Hmisc_4.2-0              DBI_1.0.0               
[13] lazyeval_0.2.1           colorspace_1.4-0         nnet_7.3-12             
[16] withr_2.1.2              prettyunits_1.0.2        tidyselect_0.2.5        
[19] gridExtra_2.3            bit_1.1-14               compiler_3.5.1          
[22] cli_1.0.1                htmlTable_1.13.1         rtracklayer_1.42.1      
[25] labeling_0.3             scales_1.0.0             checkmate_1.9.1         
[28] genefilter_1.64.0        Rsamtools_1.34.0         stringr_1.3.1           
[31] digest_0.6.18            foreign_0.8-71           XVector_0.22.0          
[34] base64enc_0.1-3          pkgconfig_2.0.2          htmltools_0.3.6         
[37] limma_3.38.3             bbmle_1.0.20             htmlwidgets_1.3         
[40] rlang_0.3.1              rstudioapi_0.9.0         RSQLite_2.1.1           
[43] bindr_0.1.1              jsonlite_1.6             acepack_1.4.1           
[46] RCurl_1.95-4.11          magrittr_1.5             GenomeInfoDbData_1.2.0  
[49] Formula_1.2-3            Matrix_1.2-15            Rcpp_1.0.0              
[52] munsell_0.5.0            fansi_0.4.0              edgeR_3.24.3            
[55] stringi_1.2.4            yaml_2.2.0               MASS_7.3-51.1           
[58] zlibbioc_1.28.0          plyr_1.8.4               grid_3.5.1              
[61] blob_1.1.1               crayon_1.3.4             lattice_0.20-38         
[64] Biostrings_2.50.2        splines_3.5.1            annotate_1.60.0         
[67] hms_0.4.2                locfit_1.5-9.1           knitr_1.21              
[70] pillar_1.3.1             biomaRt_2.38.0           geneplotter_1.60.0      
[73] reshape2_1.4.3           XML_3.98-1.16            glue_1.3.0              
[76] latticeExtra_0.6-28      data.table_1.12.0        BiocManager_1.30.4      
[79] gtable_0.2.0             purrr_0.3.0              assertthat_0.2.0        
[82] emdbook_1.3.10           xfun_0.4                 xtable_1.8-3            
[85] coda_0.19-2              survival_2.43-3          tibble_2.0.1            
[88] GenomicAlignments_1.18.1 memoise_1.1.0            bindrcpp_0.2.2          
[91] cluster_2.0.7-1
stageR rnaseqDTU DRIMSeq DEXSeq • 1.6k views
ADD COMMENT
0
Entering edit mode

It turns out I should have paid more attention to the following:

Because of how the stageR package will combine transcript and gene names, we need to strip the gene and transcript version numbers from their Ensembl IDs (this is done by keeping only the first 15 characters of the gene and transcript IDs).

My geneIDs and transcript IDs use "." as a separator, and dont't have a fixed number of characters'. Therefore, I used a sub() and gsub() to replace the offending characters with a "p": old:

pScreen <- res$pvalue
names(pScreen) <- res$gene_id
pConfirmation <- matrix(res.txp$pvalue, ncol=1)
rownames(pConfirmation) <- res.txp$feature_id
tx2gene <- res.txp[,c("feature_id", "gene_id")]

new:

pScreen <- res$pvalue
names(pScreen) <- sub("(\\.)","p",res$gene_id,perl=TRUE) 
pConfirmation <- matrix(res.txp$pvalue, ncol=1)
rownames(pConfirmation) <- gsub("(\\.)","p",res.txp$feature_id,perl=TRUE) 
tx2gene <- res.txp[,c("feature_id", "gene_id")]
for (i in 1:2) tx2gene[,i] <- gsub("(\\.)","p",tx2gene[,i],perl=TRUE)

With this substitution,

stageRObj <- stageRTx(pScreen=pScreen, pConfirmation=pConfirmation,
                      pScreenAdjusted=FALSE, tx2gene=tx2gene)
stageRObj <- stageWiseAdjustment(stageRObj, method="dtu", alpha=0.05)

now works fine.

I would have like to add this as an "answer", but for some reason it is nearly impossible to enter text using the Add Answer function.

ADD REPLY
0
Entering edit mode

Yes, the different annotations have been a problem for stageR, and I apologize for this inconvenience. This should be fixed in the following major upgrade of the package. Let us know, on the BioC support site, if you have any additional questions.

ADD REPLY
0
Entering edit mode

Yes, the different annotations have been a problem for stageR, and I apologize for this inconvenience. This should be fixed in the following major upgrade of the package. Let us know, on the BioC support site, if you have any additional questions.

ADD REPLY
1
Entering edit mode
d-joester ▴ 20
@d-joester-19665
Last seen 5.1 years ago

See answer in comment. "Add Answer" text entry is broken.

ADD COMMENT

Login before adding your answer.

Traffic: 992 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6