Number of IRanges elements in TCGA HiSeqV2 dataset
biomiha ▴ 20
@biomiha-11346
Last seen 3.4 years ago
UK/Cambridge

I am trying to analyse certain aspects of the TCGA RNA-seq dataset. I downloaded a RangedSummarizedExperiment file (.Rdata) from the recount2 website: https://jhubiostatistics.shinyapps.io/recount/

I am now trying to filter the dataset to include only protein-coding regions. If I look at `rowData(rse_gene)$symbol`, I get an IRanges object with 58037 elements corresponding to the transcripts, but if I unlist this object it suddenly becomes a vector of length 58716. Subsetting is then impossible because I get indexes that are out of bounds.

Can anyone clarify this discrepancy please?

r tcga recount iranges • 1.4k views
@lcolladotor
Last seen 3 days ago
United States

Hi,

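The `symbol` column is a CharacterList, not an IRanges object. It has one element per gene, but some genes have more than one symbol, as shown below. That is why `unlist()` returns 58716 symbols for 58037 genes.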

Best,

Leonardo

> library(recount)

> rowData(rse_gene_SRP009615)$symbol
CharacterList of length 58037
[["ENSG00000000003"]] TSPAN6
[["ENSG00000000005"]] TNMD
[["ENSG00000000419"]] DPM1
[["ENSG00000000457"]] SCYL3
[["ENSG00000000460"]] C1orf112
[["ENSG00000000938"]] FGR
[["ENSG00000000971"]] CFH
[["ENSG00000001036"]] FUCA2
[["ENSG00000001084"]] GCLC
[["ENSG00000001167"]] NFYA
...
<58027 more elements>

> table(elementNROWS(rowData(rse_gene_SRP009615)$symbol))

    1     2     3     4     5     6     7     8 
57460   517    44     4     4     3     4     1 

> sum(table(elementNROWS(rowData(rse_gene_SRP009615)$symbol)) * 1:8)
[1] 58716
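
The same mismatch can be reproduced on a small scale. The sketch below is only an illustration: the gene names and symbols are made up, and the toy `symbol` object stands in for `rowData(rse_gene)$symbol`. It also shows one possible way to map each unlisted symbol back to the gene it came from, so that indexes stay within bounds.

```r
library(IRanges)  # provides CharacterList(); attaches S4Vectors for elementNROWS()

# Toy stand-in for rowData(rse_gene)$symbol: three genes, one with two symbols.
# The gene names and symbols are hypothetical, chosen only for illustration.
symbol <- CharacterList(
    geneA = "SYM1",
    geneB = c("SYM2", "SYM3"),
    geneC = "SYM4"
)

n_genes   <- length(symbol)          # number of list elements (genes)
n_symbols <- length(unlist(symbol))  # number of symbols across all genes

# For each unlisted symbol, the index of the gene (row) it belongs to
gene_index <- rep(seq_len(n_genes), elementNROWS(symbol))
```

Here `n_genes` and `n_symbols` differ because `geneB` contributes two symbols, and `gene_index` always holds values between 1 and `n_genes`, so it can be used to index rows.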

> sessionInfo()
R version 3.4.0 (2017-04-21)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Sierra 10.12.5

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] recount_1.2.1              SummarizedExperiment_1.6.3 DelayedArray_0.2.7        
 [4] matrixStats_0.52.2         Biobase_2.36.2             GenomicRanges_1.28.3      
 [7] GenomeInfoDb_1.12.1        IRanges_2.10.2             S4Vectors_0.14.3          
[10] BiocGenerics_0.22.0       

loaded via a namespace (and not attached):
 [1] httr_1.2.1               jsonlite_1.5             splines_3.4.0            foreach_1.4.3           
 [5] GenomicFiles_1.12.0      Formula_1.2-1            bumphunter_1.17.2        latticeExtra_0.6-28     
 [9] doRNG_1.6.6              derfinder_1.10.4         BSgenome_1.44.0          GenomeInfoDbData_0.99.0 
[13] Rsamtools_1.28.0         RSQLite_1.1-2            backports_1.1.0          lattice_0.20-35         
[17] downloader_0.4           digest_0.6.12            RColorBrewer_1.1-2       XVector_0.16.0          
[21] checkmate_1.8.2          qvalue_2.8.0             colorspace_1.3-2         htmltools_0.3.6         
[25] Matrix_1.2-10            plyr_1.8.4               GEOquery_2.42.0          XML_3.98-1.7            
[29] biomaRt_2.32.0           zlibbioc_1.22.0          xtable_1.8-2             scales_0.4.1            
[33] BiocParallel_1.10.1      htmlTable_1.9            tibble_1.3.3             pkgmaker_0.22           
[37] ggplot2_2.2.1            GenomicFeatures_1.28.2   nnet_7.3-12              lazyeval_0.2.0          
[41] survival_2.41-3          magrittr_1.5             memoise_1.1.0            foreign_0.8-68          
[45] tools_3.4.0              registry_0.3             data.table_1.10.4        stringr_1.2.0           
[49] munsell_0.4.3            locfit_1.5-9.1           cluster_2.0.6            rngtools_1.2.4          
[53] AnnotationDbi_1.38.1     Biostrings_2.44.1        compiler_3.4.0           rlang_0.1.1             
[57] grid_3.4.0               RCurl_1.95-4.8           iterators_1.0.8          VariantAnnotation_1.22.1
[61] htmlwidgets_0.8          bitops_1.0-6             base64enc_0.1-3          rentrez_1.1.0           
[65] derfinderHelper_1.10.0   gtable_0.2.0             codetools_0.2-15         DBI_0.6-1               
[69] reshape2_1.4.2           R6_2.2.1                 GenomicAlignments_1.12.1 gridExtra_2.2.1         
[73] knitr_1.16               rtracklayer_1.36.3       Hmisc_4.0-3              stringi_1.1.5           
[77] Rcpp_0.12.11             rpart_4.1-11             acepack_1.4.1           
> 
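
For subsetting, it may help to work with something that has exactly one entry per gene. The following is a minimal sketch under the same assumptions as before (a made-up toy `symbol` object instead of the real `rowData(rse_gene)$symbol`, and a hypothetical `wanted` vector of symbols). It shows two options: collapsing each gene's symbols into a single string, and building a logical vector that flags genes with at least one matching symbol.

```r
library(IRanges)  # provides CharacterList(); attaches S4Vectors for unstrsplit()

# Hypothetical toy data standing in for rowData(rse_gene)$symbol
symbol <- CharacterList(
    geneA = "SYM1",
    geneB = c("SYM2", "SYM3"),
    geneC = "SYM4"
)

# Option 1: one string per gene, multiple symbols joined with ";"
collapsed <- unstrsplit(symbol, sep = ";")

# Option 2: one logical per gene, TRUE if any of its symbols is wanted
wanted <- c("SYM3", "SYM4")  # hypothetical symbols of interest
keep <- any(symbol %in% wanted)
```

Both `collapsed` and `keep` have the same length as `symbol`, so with the real data something like `rse_gene[keep, ]` should select rows without out-of-bounds indexes.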

Awesome! Thank you sir.
