Question: Number of IRanges elements in TCGA HiSeqV2 dataset
Asked 5 months ago by biomiha (UK/Cambridge):

I am trying to analyse certain aspects of the TCGA RNA seq dataset. I downloaded a RangedSummarizedExperiment file (.Rdata) from the recount2 website: https://jhubiostatistics.shinyapps.io/recount/

I am now trying to subset the dataset to include only protein-coding regions. If I look at `rowData(rse_gene)$symbol`, I get an IRanges object with 58037 elements corresponding to the transcripts, but if I unlist this object it becomes a vector of length 58716. Subsetting with the unlisted vector is then impossible because I get indexes that are out of bounds.

Can anyone clarify this discrepancy please?

Answered 5 months ago by Leonardo Collado Torres (United States):

Hi,

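The `symbol` column is a CharacterList (a list-like class from the IRanges package), not a flat vector. Some genes have more than one symbol, as shown below, so unlisting it yields 58716 symbols from 58037 list elements.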

Best,

Leonardo

> library(recount)

> rowData(rse_gene_SRP009615)$symbol
CharacterList of length 58037
[["ENSG00000000003"]] TSPAN6
[["ENSG00000000005"]] TNMD
[["ENSG00000000419"]] DPM1
[["ENSG00000000457"]] SCYL3
[["ENSG00000000460"]] C1orf112
[["ENSG00000000938"]] FGR
[["ENSG00000000971"]] CFH
[["ENSG00000001036"]] FUCA2
[["ENSG00000001084"]] GCLC
[["ENSG00000001167"]] NFYA
...
<58027 more elements>

> table(elementNROWS(rowData(rse_gene_SRP009615)$symbol))

    1     2     3     4     5     6     7     8 
57460   517    44     4     4     3     4     1 

> sum(table(elementNROWS(rowData(rse_gene_SRP009615)$symbol)) * 1:8)
[1] 58716

> sessionInfo()
R version 3.4.0 (2017-04-21)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Sierra 10.12.5

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] recount_1.2.1              SummarizedExperiment_1.6.3 DelayedArray_0.2.7        
 [4] matrixStats_0.52.2         Biobase_2.36.2             GenomicRanges_1.28.3      
 [7] GenomeInfoDb_1.12.1        IRanges_2.10.2             S4Vectors_0.14.3          
[10] BiocGenerics_0.22.0       

loaded via a namespace (and not attached):
 [1] httr_1.2.1               jsonlite_1.5             splines_3.4.0            foreach_1.4.3           
 [5] GenomicFiles_1.12.0      Formula_1.2-1            bumphunter_1.17.2        latticeExtra_0.6-28     
 [9] doRNG_1.6.6              derfinder_1.10.4         BSgenome_1.44.0          GenomeInfoDbData_0.99.0 
[13] Rsamtools_1.28.0         RSQLite_1.1-2            backports_1.1.0          lattice_0.20-35         
[17] downloader_0.4           digest_0.6.12            RColorBrewer_1.1-2       XVector_0.16.0          
[21] checkmate_1.8.2          qvalue_2.8.0             colorspace_1.3-2         htmltools_0.3.6         
[25] Matrix_1.2-10            plyr_1.8.4               GEOquery_2.42.0          XML_3.98-1.7            
[29] biomaRt_2.32.0           zlibbioc_1.22.0          xtable_1.8-2             scales_0.4.1            
[33] BiocParallel_1.10.1      htmlTable_1.9            tibble_1.3.3             pkgmaker_0.22           
[37] ggplot2_2.2.1            GenomicFeatures_1.28.2   nnet_7.3-12              lazyeval_0.2.0          
[41] survival_2.41-3          magrittr_1.5             memoise_1.1.0            foreign_0.8-68          
[45] tools_3.4.0              registry_0.3             data.table_1.10.4        stringr_1.2.0           
[49] munsell_0.4.3            locfit_1.5-9.1           cluster_2.0.6            rngtools_1.2.4          
[53] AnnotationDbi_1.38.1     Biostrings_2.44.1        compiler_3.4.0           rlang_0.1.1             
[57] grid_3.4.0               RCurl_1.95-4.8           iterators_1.0.8          VariantAnnotation_1.22.1
[61] htmlwidgets_0.8          bitops_1.0-6             base64enc_0.1-3          rentrez_1.1.0           
[65] derfinderHelper_1.10.0   gtable_0.2.0             codetools_0.2-15         DBI_0.6-1               
[69] reshape2_1.4.2           R6_2.2.1                 GenomicAlignments_1.12.1 gridExtra_2.2.1         
[73] knitr_1.16               rtracklayer_1.36.3       Hmisc_4.0-3              stringi_1.1.5           
[77] Rcpp_0.12.11             rpart_4.1-11             acepack_1.4.1           
> 
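
As a rough sketch of how one might work around the length mismatch, the toy example below builds a small CharacterList by hand and derives one value per gene, so the result lines up with the rows again. The gene names and symbols are made up for illustration, and taking the first symbol (or collapsing all of them) is just one possible choice, not something prescribed by recount.

```r
library(IRanges)  # provides CharacterList; attaches S4Vectors for elementNROWS()

# Toy stand-in for rowData(rse_gene)$symbol: three genes, one with two symbols.
# These gene names and symbols are hypothetical.
symbol <- CharacterList(
    geneA = "SYM1",
    geneB = c("SYM2", "SYM2B"),
    geneC = "SYM3"
)

length(symbol)          # 3 list elements, one per gene
length(unlist(symbol))  # 4 symbols in total, so positions no longer match rows
elementNROWS(symbol)    # number of symbols per gene

# Option 1: keep only the first symbol of each gene (one value per row).
first_symbol <- vapply(as.list(symbol), function(x) x[1], character(1))

# Option 2: collapse all symbols of a gene into a single string.
collapsed <- vapply(as.list(symbol), paste, character(1), collapse = ";")

# Both vectors have one entry per gene, so a logical index built from them
# has the right length for subsetting rows, e.g. rse_gene[keep, ].
keep <- first_symbol %in% c("SYM2", "SYM3")
```

With the real object, the same idea would apply to `rowData(rse_gene)$symbol` in place of the toy `symbol`; genes without a symbol would need separate handling.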

Awesome! Thank you sir.

Reply written 5 months ago by biomiha