rtracklayer accessing UCSC data table not working
@68a21322
Last seen 11 days ago
Germany

Hi,

I am following the "Example 1: the RepeatMasker Track" given by Michael Lawrence in chapter 5.1 of the documentation for the package 'rtracklayer':

library(rtracklayer)
mySession = browserSession("UCSC")
genome(mySession) <- "hg19"
e2f3.tss.grange <- GRanges("chr6", IRanges(20400587, 20403336))
tbl.rmsk <- getTable(
  ucscTableQuery(mySession, track="rmsk",
                  range=e2f3.tss.grange, table="rmsk"))

Running this code, I get the error message: "error in evaluating the argument 'object' in selecting a method for function 'getTable': error in evaluating the argument 'table' in selecting a method for function '%in%': Unknown track: rmsk"

Listing all available tracks reveals that the name for the track might be: "RepeatMasker".

# List all available tracks in the session
available_tracks <- trackNames(mySession)
print("Available Tracks:")
print(sort(available_tracks))

However, when I replace the track name "rmsk" with "RepeatMasker" (see below), I still get an error message:

library(rtracklayer)
mySession = browserSession("UCSC")
genome(mySession) <- "hg19"
e2f3.tss.grange <- GRanges("chr6", IRanges(20400587, 20403336))
tbl.rmsk <- getTable(
  ucscTableQuery(mySession, track="RepeatMasker",
                  range=e2f3.tss.grange, table="rmsk"))

The call tbl.rmsk <- getTable(...) now fails with a different error message: "Error in h(simpleError(msg, call)) : error in evaluating the argument 'x' in selecting a method for function 'as.data.frame': second argument must be a list"

How can I get the example to work?

Kind regards, Andreas

sessionInfo()

R version 4.4.1 (2024-06-14)
Platform: aarch64-apple-darwin20
Running under: macOS Sonoma 14.7

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Europe/Berlin
tzcode source: internal

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] rtracklayer_1.64.0   GenomicRanges_1.56.1 GenomeInfoDb_1.40.1  IRanges_2.38.1       S4Vectors_0.42.1     BiocGenerics_0.50.0 

loaded via a namespace (and not attached):
 [1] utf8_1.2.4                  generics_0.1.3              SparseArray_1.4.8           bitops_1.0-8                lattice_0.22-6              digest_0.6.37               magrittr_2.0.3             
 [8] grid_4.4.1                  evaluate_1.0.0              fastmap_1.2.0               Matrix_1.7-0                jsonlite_1.8.8              restfulr_0.0.15             httr_1.4.7                 
[15] fansi_1.0.6                 UCSC.utils_1.0.0            XML_3.99-0.17               Biostrings_2.72.1           codetools_0.2-20            abind_1.4-8                 cli_3.6.3                  
[22] rlang_1.1.4                 crayon_1.5.3                XVector_0.44.0              Biobase_2.64.0              DelayedArray_0.30.1         yaml_2.3.10                 S4Arrays_1.4.1             
[29] tools_4.4.1                 parallel_4.4.1              BiocParallel_1.38.0         dplyr_1.1.4                 GenomeInfoDbData_1.2.12     Rsamtools_2.20.0            SummarizedExperiment_1.34.0
[36] curl_5.2.2                  vctrs_0.6.5                 R6_2.5.1                    matrixStats_1.4.1           BiocIO_1.14.0               lifecycle_1.0.4             zlibbioc_1.50.0            
[43] pkgconfig_2.0.3             pillar_1.9.0                glue_1.7.0                  xfun_0.47                   tibble_3.2.1                GenomicAlignments_1.40.0    tidyselect_1.2.1           
[50] MatrixGenerics_1.16.0       rstudioapi_0.16.0           knitr_1.48                  rjson_0.2.23                htmltools_0.5.8.1           rmarkdown_2.28              compiler_4.4.1             
[57] RCurl_1.98-1.16
UCSC rtracklayer getTable • 233 views
@james-w-macdonald-5106
Last seen 1 day ago
United States

The track argument is deprecated now, so you can omit it.

> tbl.rmsk <- getTable(ucscTableQuery(mySession, table = "rmsk", range = e2f3.tss.grange))

> head(tbl.rmsk)
  bin swScore milliDiv milliDel milliIns genoName genoStart  genoEnd   genoLeft strand repName       repClass
1 740      25        0        0        0     chr6  20401747 20401772 -150713295      + GC_rich Low_complexity
2 740     231       36        0        0     chr6  20402594 20402622 -150712445      +  (CCG)n  Simple_repeat
3 740     213       39        0        0     chr6  20402824 20402850 -150712217      +  (CGG)n  Simple_repeat
       repFamily repStart repEnd repLeft id
1 Low_complexity        1     25       0  3
2  Simple_repeat        2     29       0  3
3  Simple_repeat        3     28       0  3

Dear James,

thank you very much for your response! Your solution is working.

However, it turned out that neither the command 'tbl.rmsk <- getTable(ucscTableQuery(mySession, table = "rmsk", range = e2f3.tss.grange))' nor 'tbl.rmsk <- getTable(ucscTableQuery(mySession, track="RepeatMasker", range=e2f3.tss.grange, table="rmsk"))' was working in the first place, due to a problem with the company proxy server.
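
For anyone who sees the same symptoms behind a proxy, here is a minimal sketch of one common workaround: setting the proxy environment variables before opening the UCSC session. The proxy address below is a hypothetical placeholder, and I am assuming (not confirming from the rtracklayer documentation) that the underlying libcurl-based HTTP client picks up these variables, as libcurl-based clients generally do.

```r
# Hypothetical proxy address: replace with the host and port of your own proxy.
proxy.url <- "http://proxy.example.com:8080"

# libcurl-based clients conventionally read these environment variables.
# Setting them here only affects the current R session.
Sys.setenv(http_proxy = proxy.url, https_proxy = proxy.url)

# Confirm what the session will use before retrying the UCSC query.
proxy.settings <- Sys.getenv(c("http_proxy", "https_proxy"))
print(proxy.settings)
```

After this, the getTable(ucscTableQuery(...)) call can be retried in the same session. If the proxy requires authentication or a different scheme, the settings above will not be sufficient.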

Kind regards, Andreas

Robert Castelo ★ 3.4k
@rcastelo
Last seen 5 days ago
Barcelona/Universitat Pompeu Fabra

Hi, just for completeness: the UCSC RepeatMasker tracks are also available from AnnotationHub as GRanges objects; see the vignette of the RepeatMasker annotation package for further details:

> library(AnnotationHub)

> ah <- AnnotationHub()
> query(ah, c("UCSC", "RepeatMasker", "Homo sapiens"))
AnnotationHub with 3 records
# snapshotDate(): 2024-04-30
# $dataprovider: UCSC
# $species: Homo sapiens
# $rdataclass: GRanges
# additional mcols(): taxonomyid, genome, description,
#   coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
#   rdatapath, sourceurl, sourcetype 
# retrieve records with, e.g., 'object[["AH99002"]]' 

             title                                                   
  AH99002  | UCSC RepeatMasker annotations (Mar2020) for Human (hg19)
  AH99003  | UCSC RepeatMasker annotations (Sep2021) for Human (hg38)
  AH111333 | UCSC RepeatMasker annotations (Oct2022) for Human (hg38)
> rmskhg19 <- ah[["AH99002"]]
downloading 1 resources
retrieving 1 resource
  |======================================================================| 100%

loading from cache
> rmskhg19
GRanges object with 5481341 ranges and 11 metadata columns:
                      seqnames      ranges strand |   swScore  milliDiv
                         <Rle>   <IRanges>  <Rle> | <integer> <numeric>
        [1]               chr1 10001-10468      + |      1504        13
        [2]               chr1 16713-16749      + |       203       162
        [3]               chr1 18907-19048      + |       239       338
        [4]               chr1 19948-20405      + |       652       346
        [5]               chr1 20531-20679      + |       270       331
        ...                ...         ...    ... .       ...       ...
  [5481337] chr22_kb663609_alt 68801-69424      - |      2599       234
  [5481338] chr22_kb663609_alt 69546-69792      - |      2599       234
  [5481339] chr22_kb663609_alt 69793-70091      - |      2344        94
  [5481340] chr22_kb663609_alt 72052-72215      - |       225       327
  [5481341] chr22_kb663609_alt 72185-72376      - |       268       282
             milliDel  milliIns   genoLeft     repName      repClass
            <numeric> <numeric>  <integer> <character>   <character>
        [1]         4        13 -249240153   (CCCTAA)n Simple_repeat
        [2]         0         0 -249233872      (TGG)n Simple_repeat
        [3]       148         0 -249231573         L2a          LINE
        [4]        85        42 -249230216          L3          LINE
        [5]         7        27 -249229942     Plat_L3          LINE
        ...       ...       ...        ...         ...           ...
  [5481337]        73        50      -4589        L1M4          LINE
  [5481338]        73        50      -4221        L1M4          LINE
  [5481339]         0         3      -3922      AluSq2          SINE
  [5481340]       126        30      -1798         L2b          LINE
  [5481341]        73        78      -1637        MIR3          SINE
                repFamily  repStart    repEnd   repLeft
              <character> <integer> <integer> <integer>
        [1] Simple_repeat         1       463         0
        [2] Simple_repeat         1        37         0
        [3]            L2      2942      3104      -322
        [4]           CR1      3042      3519      -970
        [5]           CR1      2802      2947      -639
        ...           ...       ...       ...       ...
  [5481337]            L1     -6290      1227       596
  [5481338]            L1     -6922       595       335
  [5481339]           Alu       -15       298         1
  [5481340]            L2        -2      3373      3192
  [5481341]           MIR       -15       193         3
  -------
  seqinfo: 298 sequences (2 circular) from hg19 genome
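
To get from the genome-wide object back to the region in the original question, the GRanges can be restricted with subsetByOverlaps(). The sketch below uses a small toy GRanges as a stand-in for rmskhg19 so that it runs without any download; the first three ranges are the repeats from the getTable() output earlier in this thread, with starts shifted by one on the assumption that the UCSC table is 0-based and GRanges is 1-based, and the fourth range is a made-up entry outside the region.

```r
library(GenomicRanges)

# Region from the original question (chr6, hg19).
e2f3.tss.grange <- GRanges("chr6", IRanges(20400587, 20403336))

# Toy stand-in for the AnnotationHub object `rmskhg19` (assumption: a real
# session would use the downloaded object instead). The last range is
# hypothetical and lies far outside the query region.
rmsk.toy <- GRanges(
  seqnames = "chr6",
  ranges = IRanges(
    start = c(20401748, 20402595, 20402825, 30000001),
    end   = c(20401772, 20402622, 20402850, 30000100)
  ),
  strand = "+",
  repName = c("GC_rich", "(CCG)n", "(CGG)n", "hypothetical")
)

# Keep only the repeats that overlap the query region.
rmsk.e2f3 <- subsetByOverlaps(rmsk.toy, e2f3.tss.grange)
print(rmsk.e2f3$repName)
```

With the real object, the same call would be subsetByOverlaps(rmskhg19, e2f3.tss.grange).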

Dear Robert,

thank you for sharing the information. The script for accessing the table 'RepeatMasker' was part of the official documentation for the package "rtracklayer" and used as an example only. But the package "AnnotationHub" sounds interesting for accessing UCSC-related data in general!

Kind regards, Andreas
