Question: Number of IRanges elements in TCGA HiSeqV2 dataset
biomiha wrote, 16 months ago:

I am trying to analyse certain aspects of the TCGA RNA-seq dataset. I downloaded a RangedSummarizedExperiment file (.Rdata) from the recount2 website:

I am now trying to filter the dataset to include only protein-coding genes. If I look at `rowData(rse_gene)$symbol`, I get an IRanges object with 58037 elements, one per gene (row). But if I unlist this object, it suddenly becomes a vector of length 58716. Subsetting is then impossible because I get indices that are out of bounds.

Can anyone clarify this discrepancy please?

modified 16 months ago by Leonardo Collado Torres • written 16 months ago by biomiha
Leonardo Collado Torres (United States) wrote, 16 months ago:


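The `symbol` column is a CharacterList (from the IRanges package), not a plain character vector, and some genes have more than one symbol, as shown below. `unlist()` returns one element per symbol rather than one per gene, which is why the length grows from 58037 to 58716.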



> library(recount)

> rowData(rse_gene_SRP009615)$symbol
CharacterList of length 58037
[["ENSG00000000003"]] TSPAN6
[["ENSG00000000005"]] TNMD
[["ENSG00000000419"]] DPM1
[["ENSG00000000457"]] SCYL3
[["ENSG00000000460"]] C1orf112
[["ENSG00000000938"]] FGR
[["ENSG00000000971"]] CFH
[["ENSG00000001036"]] FUCA2
[["ENSG00000001084"]] GCLC
[["ENSG00000001167"]] NFYA
<58027 more elements>

> table(elementNROWS(rowData(rse_gene_SRP009615)$symbol))

    1     2     3     4     5     6     7     8 
57460   517    44     4     4     3     4     1 

> sum(table(elementNROWS(rowData(rse_gene_SRP009615)$symbol)) * 1:8)
[1] 58716

> sessionInfo()
R version 3.4.0 (2017-04-21)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Sierra 10.12.5

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] recount_1.2.1              SummarizedExperiment_1.6.3 DelayedArray_0.2.7        
 [4] matrixStats_0.52.2         Biobase_2.36.2             GenomicRanges_1.28.3      
 [7] GenomeInfoDb_1.12.1        IRanges_2.10.2             S4Vectors_0.14.3          
[10] BiocGenerics_0.22.0       

loaded via a namespace (and not attached):
 [1] httr_1.2.1               jsonlite_1.5             splines_3.4.0            foreach_1.4.3           
 [5] GenomicFiles_1.12.0      Formula_1.2-1            bumphunter_1.17.2        latticeExtra_0.6-28     
 [9] doRNG_1.6.6              derfinder_1.10.4         BSgenome_1.44.0          GenomeInfoDbData_0.99.0 
[13] Rsamtools_1.28.0         RSQLite_1.1-2            backports_1.1.0          lattice_0.20-35         
[17] downloader_0.4           digest_0.6.12            RColorBrewer_1.1-2       XVector_0.16.0          
[21] checkmate_1.8.2          qvalue_2.8.0             colorspace_1.3-2         htmltools_0.3.6         
[25] Matrix_1.2-10            plyr_1.8.4               GEOquery_2.42.0          XML_3.98-1.7            
[29] biomaRt_2.32.0           zlibbioc_1.22.0          xtable_1.8-2             scales_0.4.1            
[33] BiocParallel_1.10.1      htmlTable_1.9            tibble_1.3.3             pkgmaker_0.22           
[37] ggplot2_2.2.1            GenomicFeatures_1.28.2   nnet_7.3-12              lazyeval_0.2.0          
[41] survival_2.41-3          magrittr_1.5             memoise_1.1.0            foreign_0.8-68          
[45] tools_3.4.0              registry_0.3             data.table_1.10.4        stringr_1.2.0           
[49] munsell_0.4.3            locfit_1.5-9.1           cluster_2.0.6            rngtools_1.2.4          
[53] AnnotationDbi_1.38.1     Biostrings_2.44.1        compiler_3.4.0           rlang_0.1.1             
[57] grid_3.4.0               RCurl_1.95-4.8           iterators_1.0.8          VariantAnnotation_1.22.1
[61] htmlwidgets_0.8          bitops_1.0-6             base64enc_0.1-3          rentrez_1.1.0           
[65] derfinderHelper_1.10.0   gtable_0.2.0             codetools_0.2-15         DBI_0.6-1               
[69] reshape2_1.4.2           R6_2.2.1                 GenomicAlignments_1.12.1 gridExtra_2.2.1         
[73] knitr_1.16               rtracklayer_1.36.3       Hmisc_4.0-3              stringi_1.1.5           
[77] Rcpp_0.12.11             rpart_4.1-11             acepack_1.4.1           
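
To filter genes without running into out-of-bounds indices, the filter has to be evaluated per gene (per list element) rather than on the unlisted vector. Below is a minimal base-R sketch of that idea, assuming a plain named list as a toy stand-in for `rowData(rse_gene)$symbol`. The IDs `ENSG_HYPOTHETICAL1`, the symbols `GENE_A`/`GENE_B`, and the `protein_coding` symbol set are hypothetical placeholders; a real protein-coding list would have to come from an annotation source.

```r
# Toy stand-in for rowData(rse_gene)$symbol: each gene ID maps to one or
# more symbols. "ENSG_HYPOTHETICAL1", "GENE_A" and "GENE_B" are made up.
symbols <- list(
  ENSG00000000003    = "TSPAN6",
  ENSG00000000005    = "TNMD",
  ENSG_HYPOTHETICAL1 = c("GENE_A", "GENE_B"),
  ENSG00000000419    = "DPM1"
)

n_genes   <- length(symbols)          # one element per gene (row)
n_symbols <- length(unlist(symbols))  # one element per symbol
stopifnot(n_symbols == sum(lengths(symbols)))

# Map every unlisted symbol back to the row (gene) it came from.
row_of_symbol <- rep(seq_along(symbols), lengths(symbols))

# Hypothetical set of protein-coding symbols (placeholder, not real data).
protein_coding <- c("TSPAN6", "GENE_B", "DPM1")

# Gene-level filter: keep a row if ANY of its symbols is protein coding.
keep <- vapply(symbols, function(s) any(s %in% protein_coding), logical(1))

# Same result via the index map: rows hit by at least one matching symbol.
hit_rows <- row_of_symbol[unlist(symbols) %in% protein_coding]
keep2 <- seq_along(symbols) %in% hit_rows
stopifnot(identical(unname(keep), keep2))
```

Because `keep` has one entry per gene, it has the same length as the number of rows and could be used as `rse_gene[keep, ]`. Applying the `vapply()` step to `as.list(rowData(rse_gene)$symbol)` is an untested assumption here; the `rep(..., lengths(...))` map mirrors what `elementNROWS()` reports in the answer above.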
modified 16 months ago • written 16 months ago by Leonardo Collado Torres

Awesome! Thank you sir.

written 16 months ago by biomiha