Number of IRanges elements in TCGA HiSeqV2 dataset
        @biomiha-11346
        
UK/Cambridge
I am trying to analyse certain aspects of the TCGA RNA-seq dataset. I downloaded a RangedSummarizedExperiment file (.Rdata) from the recount2 website: https://jhubiostatistics.shinyapps.io/recount/
I am now trying to subset the dataset to include only protein-coding regions. If I look at `rowData(rse_gene)$symbol`, I get an IRanges object with 58037 elements corresponding to the transcripts, but if I unlist this object it becomes a vector of length 58716. Subsetting with it is then impossible because I get indexes that are out of bounds.
Can anyone clarify this discrepancy, please?
        @lcolladotor
        
United States
Hi,
The `symbol` column is a CharacterList, not an IRanges object. Some genes have more than one symbol, as shown below, which is why unlisting it gives 58716 values rather than 58037.
Best,
Leonardo
> library(recount)
> rowData(rse_gene_SRP009615)$symbol
CharacterList of length 58037
[["ENSG00000000003"]] TSPAN6
[["ENSG00000000005"]] TNMD
[["ENSG00000000419"]] DPM1
[["ENSG00000000457"]] SCYL3
[["ENSG00000000460"]] C1orf112
[["ENSG00000000938"]] FGR
[["ENSG00000000971"]] CFH
[["ENSG00000001036"]] FUCA2
[["ENSG00000001084"]] GCLC
[["ENSG00000001167"]] NFYA
...
<58027 more elements>
> table(elementNROWS(rowData(rse_gene_SRP009615)$symbol))
    1     2     3     4     5     6     7     8 
57460   517    44     4     4     3     4     1 
> sum(table(elementNROWS(rowData(rse_gene_SRP009615)$symbol)) * 1:8)
[1] 58716
> sessionInfo()
R version 3.4.0 (2017-04-21)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Sierra 10.12.5
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     
other attached packages:
 [1] recount_1.2.1              SummarizedExperiment_1.6.3 DelayedArray_0.2.7        
 [4] matrixStats_0.52.2         Biobase_2.36.2             GenomicRanges_1.28.3      
 [7] GenomeInfoDb_1.12.1        IRanges_2.10.2             S4Vectors_0.14.3          
[10] BiocGenerics_0.22.0       
loaded via a namespace (and not attached):
 [1] httr_1.2.1               jsonlite_1.5             splines_3.4.0            foreach_1.4.3           
 [5] GenomicFiles_1.12.0      Formula_1.2-1            bumphunter_1.17.2        latticeExtra_0.6-28     
 [9] doRNG_1.6.6              derfinder_1.10.4         BSgenome_1.44.0          GenomeInfoDbData_0.99.0 
[13] Rsamtools_1.28.0         RSQLite_1.1-2            backports_1.1.0          lattice_0.20-35         
[17] downloader_0.4           digest_0.6.12            RColorBrewer_1.1-2       XVector_0.16.0          
[21] checkmate_1.8.2          qvalue_2.8.0             colorspace_1.3-2         htmltools_0.3.6         
[25] Matrix_1.2-10            plyr_1.8.4               GEOquery_2.42.0          XML_3.98-1.7            
[29] biomaRt_2.32.0           zlibbioc_1.22.0          xtable_1.8-2             scales_0.4.1            
[33] BiocParallel_1.10.1      htmlTable_1.9            tibble_1.3.3             pkgmaker_0.22           
[37] ggplot2_2.2.1            GenomicFeatures_1.28.2   nnet_7.3-12              lazyeval_0.2.0          
[41] survival_2.41-3          magrittr_1.5             memoise_1.1.0            foreign_0.8-68          
[45] tools_3.4.0              registry_0.3             data.table_1.10.4        stringr_1.2.0           
[49] munsell_0.4.3            locfit_1.5-9.1           cluster_2.0.6            rngtools_1.2.4          
[53] AnnotationDbi_1.38.1     Biostrings_2.44.1        compiler_3.4.0           rlang_0.1.1             
[57] grid_3.4.0               RCurl_1.95-4.8           iterators_1.0.8          VariantAnnotation_1.22.1
[61] htmlwidgets_0.8          bitops_1.0-6             base64enc_0.1-3          rentrez_1.1.0           
[65] derfinderHelper_1.10.0   gtable_0.2.0             codetools_0.2-15         DBI_0.6-1               
[69] reshape2_1.4.2           R6_2.2.1                 GenomicAlignments_1.12.1 gridExtra_2.2.1         
[73] knitr_1.16               rtracklayer_1.36.3       Hmisc_4.0-3              stringi_1.1.5           
[77] Rcpp_0.12.11             rpart_4.1-11             acepack_1.4.1           
>  
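To make the length mismatch concrete, here is a minimal sketch on a toy CharacterList. The gene names and symbols are made up for illustration and do not come from the recount2 data. It assumes the IRanges package is installed, and it shows two possible ways (not necessarily the recommended ones) to get back one value per gene so that the result lines up with the rows again.

```r
library(IRanges)  # provides CharacterList; attaches S4Vectors for elementNROWS()

# Toy stand-in for rowData(rse_gene)$symbol: three genes, one with two symbols.
# These IDs and symbols are hypothetical.
symbol <- CharacterList(
    geneA = "SYM1",
    geneB = c("SYM2", "SYM2B"),
    geneC = "SYM3"
)

length(symbol)          # number of genes (list elements)
length(unlist(symbol))  # total number of symbols; larger when a gene has several

# Option 1: keep only the first symbol of each gene (one value per gene).
first_symbol <- vapply(as.list(symbol), function(x) x[1], character(1))

# Option 2: collapse all symbols of a gene into a single string.
collapsed <- unstrsplit(symbol, sep = ";")

# A logical index with one entry per gene, which is safe for subsetting rows.
single <- elementNROWS(symbol) == 1
```

Because `first_symbol`, `collapsed` and `single` each have one entry per gene, indexes derived from them should stay within the bounds of the original object, unlike indexes computed on the unlisted vector.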
                    
                 
                 
                
                
                 
                
                
 
             
            
            
         
     
 
         
        
 
    
    
        
            
Awesome! Thank you, sir.