duplicate exonrank information in library("Homo.sapiens")
@biojl-25058 (Spain)

Hi, I am trying to extract some basic information using the Homo.sapiens package. One of the variables I want is EXONRANK. When I select a particular transcript (only one is shown here for simplicity), I get several rows with different EXONRANK values for the same transcript (REFSEQ), which I do not understand. Is this working as intended, or am I missing something obvious?

Thanks in advance

library("Homo.sapiens")
library(tidyr)
library(dplyr) # dplyr::select() masks AnnotationDbi::select(), so the latter is called explicitly below
keys <- "NM_000341"

# Extract the relevant information from the database
raw_data <- AnnotationDbi::select(Homo.sapiens, keys = keys,
                                  columns = c("EXONCHROM", "SYMBOL", "REFSEQ", "EXONRANK",
                                              "EXONSTART", "EXONEND", "EXONSTRAND"),
                                  keytype = "REFSEQ")

raw_data
      REFSEQ SYMBOL EXONCHROM EXONSTRAND EXONSTART  EXONEND EXONRANK
1  NM_000341 SLC3A1      chr2          +  44502597 44503104        1
2  NM_000341 SLC3A1      chr2          +  44507855 44508034        2
3  NM_000341 SLC3A1      chr2          +  44508526 44508680        3
4  NM_000341 SLC3A1      chr2          +  44513171 44513296        4
5  NM_000341 SLC3A1      chr2          +  44527110 44527229        5
6  NM_000341 SLC3A1      chr2          +  44528142 44528556        6
7  NM_000341 SLC3A1      chr2          +  44528142 44528266        6
8  NM_000341 SLC3A1      chr2          +  44531282 44531477        7
9  NM_000341 SLC3A1      chr2          +  44539725 44539929        8
10 NM_000341 SLC3A1      chr2          +  44539725 44539892        8
11 NM_000341 SLC3A1      chr2          +  44540974 44542382        9
12 NM_000341 SLC3A1      chr2          +  44540974 44541090        9
13 NM_000341 SLC3A1      chr2          +  44545257 44545894       10
14 NM_000341 SLC3A1      chr2          +  44547338 44547962       10
15 NM_000341 SLC3A1      chr2          +  44512222 44513296        1
16 NM_000341 SLC3A1      chr2          +  44527110 44527229        2
17 NM_000341 SLC3A1      chr2          +  44528142 44528266        3
18 NM_000341 SLC3A1      chr2          +  44531282 44531477        4
19 NM_000341 SLC3A1      chr2          +  44539725 44539892        5
20 NM_000341 SLC3A1      chr2          +  44540974 44541090        6
21 NM_000341 SLC3A1      chr2          +  44547338 44547962        7
22 NM_000341 SLC3A1      chr2          +  44530945 44531477        1
23 NM_000341 SLC3A1      chr2          +  44539725 44539892        2
24 NM_000341 SLC3A1      chr2          +  44540974 44541090        3
25 NM_000341 SLC3A1      chr2          +  44547338 44547962        4

sessionInfo()
R version 3.6.3 (2020-02-29)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.5 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1

locale:
 [1] LC_CTYPE=es_ES.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=es_ES.UTF-8        LC_COLLATE=es_ES.UTF-8    
 [5] LC_MONETARY=es_ES.UTF-8    LC_MESSAGES=es_ES.UTF-8   
 [7] LC_PAPER=es_ES.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=es_ES.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] dplyr_1.0.5                            
 [2] tidyr_1.1.2                            
 [3] Homo.sapiens_1.3.1                     
 [4] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2
 [5] org.Hs.eg.db_3.10.0                    
 [6] GO.db_3.10.0                           
 [7] OrganismDbi_1.28.0                     
 [8] GenomicFeatures_1.38.2                 
 [9] GenomicRanges_1.38.0                   
[10] GenomeInfoDb_1.22.1                    
[11] AnnotationDbi_1.48.0                   
[12] IRanges_2.20.2                         
[13] S4Vectors_0.24.4                       
[14] Biobase_2.46.0                         
[15] BiocGenerics_0.32.0                    

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.6                  lattice_0.20-41            
 [3] prettyunits_1.1.1           Rsamtools_2.2.3            
 [5] Biostrings_2.54.0           assertthat_0.2.1           
 [7] utf8_1.1.4                  BiocFileCache_1.10.2       
 [9] R6_2.5.0                    RSQLite_2.2.4              
[11] httr_1.4.2                  pillar_1.5.0               
[13] zlibbioc_1.32.0             rlang_0.4.10               
[15] progress_1.2.2              curl_4.2                   
[17] blob_1.2.1                  Matrix_1.3-2               
[19] BiocParallel_1.20.1         stringr_1.4.0              
[21] RCurl_1.98-1.3              bit_4.0.4                  
[23] biomaRt_2.42.1              DelayedArray_0.12.3        
[25] compiler_3.6.3              rtracklayer_1.46.0         
[27] pkgconfig_2.0.3             askpass_1.1                
[29] openssl_1.4.3               tidyselect_1.1.0           
[31] SummarizedExperiment_1.16.1 tibble_3.1.0               
[33] GenomeInfoDbData_1.2.2      matrixStats_0.58.0         
[35] XML_3.99-0.3                fansi_0.4.2                
[37] crayon_1.4.1                dbplyr_2.1.0               
[39] GenomicAlignments_1.22.1    bitops_1.0-6               
[41] rappdirs_0.3.3              RBGL_1.62.1                
[43] grid_3.6.3                  lifecycle_1.0.0            
[45] DBI_1.1.1                   magrittr_2.0.1             
[47] graph_1.64.0                stringi_1.5.3              
[49] cachem_1.0.4                XVector_0.26.0             
[51] ellipsis_0.3.1              generics_0.1.0             
[53] vctrs_0.3.6                 tools_3.6.3                
[55] bit64_4.0.5                 glue_1.4.2                 
[57] purrr_0.3.4                 hms_1.0.0                  
[59] fastmap_1.1.0               BiocManager_1.30.10        
[61] memoise_2.0.0

@james-w-macdonald-5106 (United States)

I think you are assuming that the underlying TxDb object uses RefSeq IDs for the transcripts, which is incorrect. For hg19, Homo.sapiens uses TxDb.Hsapiens.UCSC.hg19.knownGene, whose transcripts carry UCSC's own IDs (e.g. uc002rty.3), and several of those map to the same RefSeq ID. So the exons are ranked within each UCSC transcript, not within the RefSeq transcript, as you can see by adding TXNAME to the columns:

> AnnotationDbi::select(Homo.sapiens, keys=keys, columns=c("EXONCHROM","SYMBOL","REFSEQ",
"EXONRANK", "EXONSTART","EXONEND", "EXONSTRAND","TXNAME"), keytype="REFSEQ")
'select()' returned 1:many mapping between keys and columns
      REFSEQ SYMBOL EXONCHROM EXONSTRAND EXONSTART  EXONEND EXONRANK     TXNAME
1  NM_000341 SLC3A1      chr2          +  44502597 44503104        1 uc002rty.3
2  NM_000341 SLC3A1      chr2          +  44507855 44508034        2 uc002rty.3
3  NM_000341 SLC3A1      chr2          +  44508526 44508680        3 uc002rty.3
4  NM_000341 SLC3A1      chr2          +  44513171 44513296        4 uc002rty.3
5  NM_000341 SLC3A1      chr2          +  44527110 44527229        5 uc002rty.3
6  NM_000341 SLC3A1      chr2          +  44528142 44528556        6 uc002rty.3
7  NM_000341 SLC3A1      chr2          +  44502597 44503104        1 uc002rtz.2
8  NM_000341 SLC3A1      chr2          +  44507855 44508034        2 uc002rtz.2
9  NM_000341 SLC3A1      chr2          +  44508526 44508680        3 uc002rtz.2
10 NM_000341 SLC3A1      chr2          +  44513171 44513296        4 uc002rtz.2
11 NM_000341 SLC3A1      chr2          +  44527110 44527229        5 uc002rtz.2
12 NM_000341 SLC3A1      chr2          +  44528142 44528266        6 uc002rtz.2
13 NM_000341 SLC3A1      chr2          +  44531282 44531477        7 uc002rtz.2
14 NM_000341 SLC3A1      chr2          +  44539725 44539929        8 uc002rtz.2
15 NM_000341 SLC3A1      chr2          +  44502597 44503104        1 uc002rua.3
16 NM_000341 SLC3A1      chr2          +  44507855 44508034        2 uc002rua.3
17 NM_000341 SLC3A1      chr2          +  44508526 44508680        3 uc002rua.3
18 NM_000341 SLC3A1      chr2          +  44513171 44513296        4 uc002rua.3
19 NM_000341 SLC3A1      chr2          +  44527110 44527229        5 uc002rua.3
20 NM_000341 SLC3A1      chr2          +  44528142 44528266        6 uc002rua.3
21 NM_000341 SLC3A1      chr2          +  44531282 44531477        7 uc002rua.3
22 NM_000341 SLC3A1      chr2          +  44539725 44539892        8 uc002rua.3
23 NM_000341 SLC3A1      chr2          +  44540974 44542382        9 uc002rua.3
24 NM_000341 SLC3A1      chr2          +  44502597 44503104        1 uc002rub.2
25 NM_000341 SLC3A1      chr2          +  44507855 44508034        2 uc002rub.2
26 NM_000341 SLC3A1      chr2          +  44508526 44508680        3 uc002rub.2
27 NM_000341 SLC3A1      chr2          +  44513171 44513296        4 uc002rub.2
28 NM_000341 SLC3A1      chr2          +  44527110 44527229        5 uc002rub.2
29 NM_000341 SLC3A1      chr2          +  44528142 44528266        6 uc002rub.2
30 NM_000341 SLC3A1      chr2          +  44531282 44531477        7 uc002rub.2
31 NM_000341 SLC3A1      chr2          +  44539725 44539892        8 uc002rub.2
32 NM_000341 SLC3A1      chr2          +  44540974 44541090        9 uc002rub.2
33 NM_000341 SLC3A1      chr2          +  44545257 44545894       10 uc002rub.2
34 NM_000341 SLC3A1      chr2          +  44502597 44503104        1 uc002ruc.4
35 NM_000341 SLC3A1      chr2          +  44507855 44508034        2 uc002ruc.4
36 NM_000341 SLC3A1      chr2          +  44508526 44508680        3 uc002ruc.4
37 NM_000341 SLC3A1      chr2          +  44513171 44513296        4 uc002ruc.4
38 NM_000341 SLC3A1      chr2          +  44527110 44527229        5 uc002ruc.4
39 NM_000341 SLC3A1      chr2          +  44528142 44528266        6 uc002ruc.4
40 NM_000341 SLC3A1      chr2          +  44531282 44531477        7 uc002ruc.4
41 NM_000341 SLC3A1      chr2          +  44539725 44539892        8 uc002ruc.4
42 NM_000341 SLC3A1      chr2          +  44540974 44541090        9 uc002ruc.4
43 NM_000341 SLC3A1      chr2          +  44547338 44547962       10 uc002ruc.4
44 NM_000341 SLC3A1      chr2          +  44512222 44513296        1 uc002rud.4
45 NM_000341 SLC3A1      chr2          +  44527110 44527229        2 uc002rud.4
46 NM_000341 SLC3A1      chr2          +  44528142 44528266        3 uc002rud.4
47 NM_000341 SLC3A1      chr2          +  44531282 44531477        4 uc002rud.4
48 NM_000341 SLC3A1      chr2          +  44539725 44539892        5 uc002rud.4
49 NM_000341 SLC3A1      chr2          +  44540974 44541090        6 uc002rud.4
50 NM_000341 SLC3A1      chr2          +  44547338 44547962        7 uc002rud.4
51 NM_000341 SLC3A1      chr2          +  44530945 44531477        1 uc002rue.4
52 NM_000341 SLC3A1      chr2          +  44539725 44539892        2 uc002rue.4
53 NM_000341 SLC3A1      chr2          +  44540974 44541090        3 uc002rue.4
54 NM_000341 SLC3A1      chr2          +  44547338 44547962        4 uc002rue.4
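
To see the pattern without rerunning the query, here is a minimal base-R sketch using a few rows copied from the output above (three of the UCSC transcripts only): EXONRANK values repeat across the whole table, but within each TXNAME they form a complete 1, 2, ..., n sequence. The data frame and the check are illustrative only; they do not call Homo.sapiens.

```r
# Exon rows copied from the hg19 knownGene select() output above,
# restricted to three of the UCSC transcripts for brevity.
exons <- data.frame(
  EXONSTART = c(44502597, 44507855, 44508526, 44513171, 44527110, 44528142,
                44512222, 44527110, 44528142, 44531282, 44539725, 44540974, 44547338,
                44530945, 44539725, 44540974, 44547338),
  EXONRANK  = c(1:6, 1:7, 1:4),
  TXNAME    = rep(c("uc002rty.3", "uc002rud.4", "uc002rue.4"), times = c(6, 7, 4)),
  stringsAsFactors = FALSE
)

# Across the whole table the same rank occurs several times ...
rank_counts <- table(exons$EXONRANK)
print(rank_counts)

# ... but within each UCSC transcript the ranks run 1, 2, ..., n exactly once.
ranks_ok <- tapply(exons$EXONRANK, exons$TXNAME,
                   function(r) identical(sort(r), seq_along(r)))
print(ranks_ok)
```

In other words, the "duplicated" ranks are not an error: each one belongs to a different transcript that happens to share the same RefSeq ID.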

If you want to use RefSeq transcripts, you need a different TxDb package, built from the UCSC refGene table:

## I did this in steps, but you can do it in one shot using makeTxDbPackageFromUCSC()
> z <- makeTxDbFromUCSC(tablename = "refGene")
Download the refGene table ... OK
Download the hgFixed.refLink table ... OK
Extract the 'transcripts' data frame ... OK
Extract the 'splicings' data frame ... OK
Download and preprocess the 'chrominfo' data frame ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... OK
> makeTxDbPackage(z, "0.01", "me <me@mine.org>", "me", ".", "Artistic-2.0")
Creating package in ./TxDb.Hsapiens.UCSC.hg19.refGene 
> install.packages("TxDb.Hsapiens.UCSC.hg19.refGene/", repos = NULL, type = "source")
Installing package into 'C:/Users/jmacdon/AppData/Roaming/R/win-library/4.0'
(as 'lib' is unspecified)
* installing *source* package 'TxDb.Hsapiens.UCSC.hg19.refGene' ...
** using staged installation
** R
<snip>

## Now that we have the new TxDb, we can put it into the Homo.sapiens package:

> library(TxDb.Hsapiens.UCSC.hg19.refGene)
> TxDb(Homo.sapiens) <- TxDb.Hsapiens.UCSC.hg19.refGene

## and do the query again

> AnnotationDbi::select(Homo.sapiens, keys=keys, columns=c("EXONCHROM","SYMBOL","REFSEQ",
"EXONRANK", "EXONSTART","EXONEND", "EXONSTRAND","TXNAME"), keytype="REFSEQ")
'select()' returned 1:many mapping between keys and columns
      REFSEQ SYMBOL EXONCHROM EXONSTRAND EXONSTART  EXONEND EXONRANK    TXNAME
1  NM_000341 SLC3A1      chr2          +  44502619 44503104        1 NM_000341
2  NM_000341 SLC3A1      chr2          +  44507855 44508034        2 NM_000341
3  NM_000341 SLC3A1      chr2          +  44508526 44508680        3 NM_000341
4  NM_000341 SLC3A1      chr2          +  44513171 44513296        4 NM_000341
5  NM_000341 SLC3A1      chr2          +  44527110 44527229        5 NM_000341
6  NM_000341 SLC3A1      chr2          +  44528142 44528266        6 NM_000341
7  NM_000341 SLC3A1      chr2          +  44531282 44531477        7 NM_000341
8  NM_000341 SLC3A1      chr2          +  44539725 44539892        8 NM_000341
9  NM_000341 SLC3A1      chr2          +  44540974 44541090        9 NM_000341
10 NM_000341 SLC3A1      chr2          +  44547338 44548631       10 NM_000341

I see! Thanks for the information :)

I am trying to apply your approach to hg38, and I have successfully built and loaded TxDb.Hsapiens.UCSC.hg38.refGene. However, I think the Homo.sapiens package only supports hg19, so this line may not be doing what I expect:

keys="NM_004937"
TxDb(Homo.sapiens) <- TxDb.Hsapiens.UCSC.hg38.refGene

#Extract the relevant information from the database
raw_data <- AnnotationDbi::select(Homo.sapiens, keys=keys, columns=c("EXONCHROM","SYMBOL","REFSEQ",
"EXONRANK", "EXONSTART","EXONEND", "EXONSTRAND","TXNAME"), keytype="REFSEQ")

TXNAME and REFSEQ do not coincide for some genes (like CTNS). Is there a way to use Homo.sapiens with hg38?

raw_data
      REFSEQ SYMBOL EXONCHROM EXONSTRAND EXONSTART EXONEND EXONRANK       TXNAME
1  NM_004937   CTNS     chr17          +   3636468 3636708        1 NM_001031681
2  NM_004937   CTNS     chr17          +   3637107 3637316        2 NM_001031681
3  NM_004937   CTNS     chr17          +   3640188 3640267        3 NM_001031681
4  NM_004937   CTNS     chr17          +   3647444 3647522        4 NM_001031681
5  NM_004937   CTNS     chr17          +   3648847 3648931        5 NM_001031681
6  NM_004937   CTNS     chr17          +   3654998 3655101        6 NM_001031681
7  NM_004937   CTNS     chr17          +   3655221 3655352        7 NM_001031681
8  NM_004937   CTNS     chr17          +   3656487 3656586        8 NM_001031681
9  NM_004937   CTNS     chr17          +   3656676 3656795        9 NM_001031681
10 NM_004937   CTNS     chr17          +   3658005 3658175       10 NM_001031681
11 NM_004937   CTNS     chr17          +   3659858 3659975       11 NM_001031681
12 NM_004937   CTNS     chr17          +   3660236 3660350       12 NM_001031681
13 NM_004937   CTNS     chr17          +   3660617 3663103       13 NM_001031681
14 NM_004937   CTNS     chr17          +   3636760 3636831        1 NM_001374492
15 NM_004937   CTNS     chr17          +   3637107 3637316        2 NM_001374492
16 NM_004937   CTNS     chr17          +   3640188 3640267        3 NM_001374492
17 NM_004937   CTNS     chr17          +   3647444 3647522        4 NM_001374492
18 NM_004937   CTNS     chr17          +   3648847 3648931        5 NM_001374492
19 NM_004937   CTNS     chr17          +   3654998 3655101        6 NM_001374492
20 NM_004937   CTNS     chr17          +   3655221 3655352        7 NM_001374492
21 NM_004937   CTNS     chr17          +   3656487 3656586        8 NM_001374492
22 NM_004937   CTNS     chr17          +   3656676 3656795        9 NM_001374492
23 NM_004937   CTNS     chr17          +   3658005 3658175       10 NM_001374492
24 NM_004937   CTNS     chr17          +   3659858 3659975       11 NM_001374492
25 NM_004937   CTNS     chr17          +   3660236 3660350       12 NM_001374492
26 NM_004937   CTNS     chr17          +   3660617 3663103       13 NM_001374492
27 NM_004937   CTNS     chr17          +   3636760 3636831        1 NM_001374493
28 NM_004937   CTNS     chr17          +   3637107 3637316        2 NM_001374493
29 NM_004937   CTNS     chr17          +   3640188 3640267        3 NM_001374493
30 NM_004937   CTNS     chr17          +   3647444 3647522        4 NM_001374493
31 NM_004937   CTNS     chr17          +   3654998 3655101        5 NM_001374493
32 NM_004937   CTNS     chr17          +   3655221 3655352        6 NM_001374493
33 NM_004937   CTNS     chr17          +   3656487 3656586        7 NM_001374493
34 NM_004937   CTNS     chr17          +   3656676 3656795        8 NM_001374493
35 NM_004937   CTNS     chr17          +   3658005 3658175        9 NM_001374493
36 NM_004937   CTNS     chr17          +   3659858 3659975       10 NM_001374493
37 NM_004937   CTNS     chr17          +   3660236 3663103       11 NM_001374493
38 NM_004937   CTNS     chr17          +   3636760 3636831        1 NM_001374494
39 NM_004937   CTNS     chr17          +   3637107 3637316        2 NM_001374494
40 NM_004937   CTNS     chr17          +   3647444 3647522        3 NM_001374494
41 NM_004937   CTNS     chr17          +   3648847 3648931        4 NM_001374494
42 NM_004937   CTNS     chr17          +   3654998 3655101        5 NM_001374494
43 NM_004937   CTNS     chr17          +   3655221 3655352        6 NM_001374494
44 NM_004937   CTNS     chr17          +   3656487 3656586        7 NM_001374494
45 NM_004937   CTNS     chr17          +   3656676 3656795        8 NM_001374494
46 NM_004937   CTNS     chr17          +   3658005 3658175        9 NM_001374494
47 NM_004937   CTNS     chr17          +   3659858 3659975       10 NM_001374494
48 NM_004937   CTNS     chr17          +   3660236 3663103       11 NM_001374494
49 NM_004937   CTNS     chr17          +   3636760 3636831        1 NM_001374495
50 NM_004937   CTNS     chr17          +   3637107 3637316        2 NM_001374495
51 NM_004937   CTNS     chr17          +   3640188 3640267        3 NM_001374495
52 NM_004937   CTNS     chr17          +   3654998 3655101        4 NM_001374495
53 NM_004937   CTNS     chr17          +   3655221 3655352        5 NM_001374495
54 NM_004937   CTNS     chr17          +   3656487 3656586        6 NM_001374495
55 NM_004937   CTNS     chr17          +   3656676 3656795        7 NM_001374495
56 NM_004937   CTNS     chr17          +   3658005 3658175        8 NM_001374495
57 NM_004937   CTNS     chr17          +   3659858 3659975        9 NM_001374495
58 NM_004937   CTNS     chr17          +   3660236 3663103       10 NM_001374495
59 NM_004937   CTNS     chr17          +   3636760 3636831        1 NM_001374496
60 NM_004937   CTNS     chr17          +   3637107 3637316        2 NM_001374496
61 NM_004937   CTNS     chr17          +   3647444 3647522        3 NM_001374496
62 NM_004937   CTNS     chr17          +   3654998 3655101        4 NM_001374496
63 NM_004937   CTNS     chr17          +   3655221 3655352        5 NM_001374496
64 NM_004937   CTNS     chr17          +   3656487 3656586        6 NM_001374496
65 NM_004937   CTNS     chr17          +   3656676 3656795        7 NM_001374496
66 NM_004937   CTNS     chr17          +   3658005 3658175        8 NM_001374496
67 NM_004937   CTNS     chr17          +   3659858 3659975        9 NM_001374496
68 NM_004937   CTNS     chr17          +   3660236 3663103       10 NM_001374496
69 NM_004937   CTNS     chr17          +   3636760 3636831        1    NM_004937
70 NM_004937   CTNS     chr17          +   3637107 3637316        2    NM_004937
71 NM_004937   CTNS     chr17          +   3640188 3640267        3    NM_004937
72 NM_004937   CTNS     chr17          +   3647444 3647522        4    NM_004937
73 NM_004937   CTNS     chr17          +   3648847 3648931        5    NM_004937
74 NM_004937   CTNS     chr17          +   3654998 3655101        6    NM_004937
75 NM_004937   CTNS     chr17          +   3655221 3655352        7    NM_004937
76 NM_004937   CTNS     chr17          +   3656487 3656586        8    NM_004937
77 NM_004937   CTNS     chr17          +   3656676 3656795        9    NM_004937
78 NM_004937   CTNS     chr17          +   3658005 3658175       10    NM_004937
79 NM_004937   CTNS     chr17          +   3659858 3659975       11    NM_004937
80 NM_004937   CTNS     chr17          +   3660236 3663103       12    NM_004937

I found a solution using the biomaRt package, but I would like to understand how to do this with a TxDb.

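The main issue here is that you are using select(), which is a valid thing to do, but with multiple columns you can get back more than you expected. Homo.sapiens links org.Hs.eg.db to the TxDb through the Entrez Gene ID, so your REFSEQ key is first mapped to the CTNS gene and then to every transcript of that gene; that is why you see the exons of all seven CTNS transcripts. An alternative is to use exonsBy() instead, which groups the exons by transcript: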

> z <- exonsBy(Homo.sapiens, "tx", use.names = TRUE)
Warning message:
In .set_group_names(grl, use.names, txdb, by) :
  some group names are NAs or duplicated
> z
GRangesList object of length 88816:
$NR_046018
GRanges object with 3 ranges and 3 metadata columns:
      seqnames      ranges strand |   exon_id   exon_name exon_rank
         <Rle>   <IRanges>  <Rle> | <integer> <character> <integer>
  [1]     chr1 11874-12227      + |         1        <NA>         1
  [2]     chr1 12613-12721      + |         2        <NA>         2
  [3]     chr1 13221-14409      + |         3        <NA>         3
  -------
  seqinfo: 595 sequences (1 circular) from hg38 genome

$NR_036051
GRanges object with 1 range and 3 metadata columns:
      seqnames      ranges strand |   exon_id   exon_name exon_rank
         <Rle>   <IRanges>  <Rle> | <integer> <character> <integer>
  [1]     chr1 30366-30503      + |         4        <NA>         1
  -------
  seqinfo: 595 sequences (1 circular) from hg38 genome

$NR_036266
GRanges object with 1 range and 3 metadata columns:
      seqnames      ranges strand |   exon_id   exon_name exon_rank
         <Rle>   <IRanges>  <Rle> | <integer> <character> <integer>
  [1]     chr1 30366-30503      + |         4        <NA>         1
  -------
  seqinfo: 595 sequences (1 circular) from hg38 genome

...
<88813 more elements>

> z[['NM_004937']]
GRanges object with 12 ranges and 3 metadata columns:
       seqnames          ranges strand |   exon_id   exon_name exon_rank
          <Rle>       <IRanges>  <Rle> | <integer> <character> <integer>
   [1]    chr17 3636760-3636831      + |    211669        <NA>         1
   [2]    chr17 3637107-3637316      + |    211670        <NA>         2
   [3]    chr17 3640188-3640267      + |    211671        <NA>         3
   [4]    chr17 3647444-3647522      + |    211672        <NA>         4
   [5]    chr17 3648847-3648931      + |    211673        <NA>         5
   ...      ...             ...    ... .       ...         ...       ...
   [8]    chr17 3656487-3656586      + |    211676        <NA>         8
   [9]    chr17 3656676-3656795      + |    211677        <NA>         9
  [10]    chr17 3658005-3658175      + |    211678        <NA>        10
  [11]    chr17 3659858-3659975      + |    211679        <NA>        11
  [12]    chr17 3660236-3663103      + |    211681        <NA>        12
  -------
  seqinfo: 595 sequences (1 circular) from hg38 genome

You could then coerce this to something else if you like:

> as(z[['NM_004937']], "data.frame")
   seqnames   start     end width strand exon_id exon_name exon_rank
1     chr17 3636760 3636831    72      +  211669      <NA>         1
2     chr17 3637107 3637316   210      +  211670      <NA>         2
3     chr17 3640188 3640267    80      +  211671      <NA>         3
4     chr17 3647444 3647522    79      +  211672      <NA>         4
5     chr17 3648847 3648931    85      +  211673      <NA>         5
6     chr17 3654998 3655101   104      +  211674      <NA>         6
7     chr17 3655221 3655352   132      +  211675      <NA>         7
8     chr17 3656487 3656586   100      +  211676      <NA>         8
9     chr17 3656676 3656795   120      +  211677      <NA>         9
10    chr17 3658005 3658175   171      +  211678      <NA>        10
11    chr17 3659858 3659975   118      +  211679      <NA>        11
12    chr17 3660236 3663103  2868      +  211681      <NA>        12
> as(z[['NM_004937']], "DataFrame")
DataFrame with 12 rows and 4 columns
                          X   exon_id   exon_name exon_rank
                  <GRanges> <integer> <character> <integer>
1   chr17:3636760-3636831:+    211669          NA         1
2   chr17:3637107-3637316:+    211670          NA         2
3   chr17:3640188-3640267:+    211671          NA         3
4   chr17:3647444-3647522:+    211672          NA         4
5   chr17:3648847-3648931:+    211673          NA         5
...                     ...       ...         ...       ...
8   chr17:3656487-3656586:+    211676          NA         8
9   chr17:3656676-3656795:+    211677          NA         9
10  chr17:3658005-3658175:+    211678          NA        10
11  chr17:3659858-3659975:+    211679          NA        11
12  chr17:3660236-3663103:+    211681          NA        12
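
If you would rather stay with select(), a minimal base-R sketch of what happens and how to undo it: the toy tables below mimic the gene-level join (the gene ID "GENE_CTNS" is a hypothetical placeholder, and the per-transcript exon counts are taken from the hg38 output above), and the final step keeps only the rows whose TXNAME equals the RefSeq ID that was queried. This illustrates the 1:many join; it is not how AnnotationDbi implements select() internally.

```r
# Toy stand-ins for the two databases that Homo.sapiens joins.
# "GENE_CTNS" is a hypothetical placeholder for the Entrez Gene ID of CTNS.
org_table <- data.frame(REFSEQ = "NM_004937", GENEID = "GENE_CTNS",
                        stringsAsFactors = FALSE)

# One row per exon, per transcript of the gene (exon counts from the hg38 output).
tx_names   <- c("NM_001031681", "NM_001374492", "NM_001374493",
                "NM_001374494", "NM_001374495", "NM_001374496", "NM_004937")
exon_count <- c(13, 13, 11, 11, 10, 10, 12)
txdb_table <- data.frame(
  GENEID   = "GENE_CTNS",
  TXNAME   = rep(tx_names, times = exon_count),
  EXONRANK = unlist(lapply(exon_count, seq_len)),
  stringsAsFactors = FALSE
)

# Joining on the gene ID returns the exons of every CTNS transcript (80 rows) ...
raw_data <- merge(org_table, txdb_table, by = "GENEID")
print(nrow(raw_data))

# ... so keep only the rows whose transcript is the RefSeq ID you queried,
# ordered by exon rank.
wanted <- raw_data[raw_data$TXNAME == raw_data$REFSEQ, ]
wanted <- wanted[order(wanted$EXONRANK), ]
print(nrow(wanted))
print(wanted$EXONRANK)
```

With the real hg38 result, the same filter would be `subset(raw_data, TXNAME == REFSEQ)`, which should leave the 12 exons of NM_004937, matching what exonsBy() returns for that transcript.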

Much clearer now, thanks a lot!
