Importing a bed file
Skylar • 0
@d27434d9

fname <- file.choose() #C:\Users\reach\Downloads\CpGislands.Hsapiens.hg38.UCSC.bed.gz
file.exists(fname)
[1] TRUE
cpg <- import(fname)
Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
  scan() expected 'an integer', got 'chr1'

https://genome.ucsc.edu/cgi-bin/hgTables?hgsid=578954849_wF1QP81SIHdfr8b0kmZUOcsZcHYr&clade=mammal&org=Human&db=hg38&hgta_group=regulation&hgta_track=knownGene&hgta_table=0&hgta_regionType=genome&position=chr9%3A133252000-133280861&hgta_outputType=primaryTable&hgta_outFileName=

rtracklayer • 2.1k views
@james-w-macdonald-5106

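You need one more step: set the output format field to BED on the UCSC Table Browser before you download the table. The link you posted has hgta_outputType=primaryTable, which suggests the file you saved is the primary table rather than a BED file, whatever its .bed.gz name says. Choosing BED opens one more page that asks a few questions, after which import() works as advertised: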

> import("../Downloads/CpGislands.Hsapiens.hg38.UCSC.bed.gz")
UCSC track 'tb_cpgIslandExt'
UCSCData object with 31144 ranges and 1 metadata column:
                        seqnames              ranges strand |        name
                           <Rle>           <IRanges>  <Rle> | <character>
      [1]                   chr1 155188537-155192004      * |    CpG:_361
      [2]                   chr1     2226774-2229734      * |    CpG:_366
      [3]                   chr1   36306230-36307408      * |    CpG:_110
      [4]                   chr1   47708823-47710847      * |    CpG:_164
      [5]                   chr1   53737730-53739637      * |    CpG:_221
      ...                    ...                 ...    ... .         ...
  [31140] chr22_KI270734v1_ran..       131010-132049      * |    CpG:_102
  [31141] chr22_KI270734v1_ran..       161257-161626      * |     CpG:_55
  [31142] chr22_KI270735v1_ran..         17221-18098      * |    CpG:_100
  [31143] chr22_KI270738v1_ran..           4413-5280      * |     CpG:_80
  [31144] chr22_KI270738v1_ran..           6226-6467      * |     CpG:_34
  -------
  seqinfo: 332 sequences from an unspecified genome; no seqlengths
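
Before calling import(), it can help to peek at the first lines of the download and see which format you actually got. Below is a minimal base-R sketch of that check. The helper name looks_like_bed is hypothetical, and the test it applies (tab-separated fields, with columns 2 and 3 of the first data line being integers) is only a heuristic I am assuming here, not what rtracklayer does internally.

```r
# Hypothetical helper: guess whether a (possibly gzipped) file is BED-like.
looks_like_bed <- function(path, n = 20) {
  con <- gzfile(path, "rt")          # gzfile() also reads uncompressed files
  on.exit(close(con))
  lines <- readLines(con, n = n)
  # Skip empty lines and the comment/track/browser lines a file may start with.
  data <- lines[nzchar(lines) & !grepl("^(#|track|browser)", lines)]
  if (length(data) == 0) return(FALSE)
  # Assumption: fields are tab-separated, as in UCSC Table Browser output.
  fields <- strsplit(data[1], "\t", fixed = TRUE)[[1]]
  # BED needs chrom, start, end; start and end must be integers.
  length(fields) >= 3 && all(grepl("^[0-9]+$", fields[2:3]))
}

# Two small files written on the fly, mimicking the two download formats.
bed <- tempfile(fileext = ".bed")
writeLines(c("track name=x",
             "chr1\t155188536\t155192004\tCpG:_361"), bed)
tab <- tempfile(fileext = ".txt")
writeLines(c("#bin\tchrom\tchromStart\tchromEnd\tname",
             "27\tchr1\t155188536\t155192004\tCpG: 361"), tab)

looks_like_bed(bed)   # BED-like: columns 2 and 3 are integers
looks_like_bed(tab)   # primary table: column 2 is the chromosome name
```

In a primary-table download the first column is bin and the second is chrom, so a BED parser that expects an integer start in column 2 finds 'chr1' instead, which matches the scan() error in the question.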

An alternative that might be more useful would be to download directly and make a GRanges object.

> session <- browserSession("UCSC")
> genome(session) <- "hg38"
> z <- getTable(ucscTableQuery(session, track = "CpG Islands", table = "cpgIslandExt"))
> head(z)
  bin chrom chromStart  chromEnd     name length cpgNum gcNum perCpg perGc
1  27  chr1  155188536 155192004 CpG: 361   3468    361  2761   20.8  79.6
2  75  chr1    2226773   2229734 CpG: 366   2961    366  1999   24.7  67.5
3 107  chr1   36306229  36307408 CpG: 110   1179    110   824   18.7  69.9
4 118  chr1   47708822  47710847 CpG: 164   2025    164  1268   16.2  62.6
5 124  chr1   53737729  53739637 CpG: 221   1908    221  1347   23.2  70.6
6 210  chr1  144179071 144179313  CpG: 20    242     20   172   16.5  71.1
  obsExp
1   0.73
2   1.08
3   0.77
4   0.83
5   0.93
6   0.68

> # chromStart in UCSC tables is 0-based; GRanges starts are 1-based, so add 1
> zz <- GRanges(z[,2], IRanges(z[,3] + 1, z[,4]), name = z[,5], cpgNum = z[,7], gcNum = z[,8], perCpg = z[,9], perGc = z[,10])
> zz
GRanges object with 31144 ranges and 5 metadata columns:
          seqnames              ranges strand |        name    cpgNum     gcNum
             <Rle>           <IRanges>  <Rle> | <character> <numeric> <numeric>
      [1]     chr1 155188537-155192004      * |    CpG: 361       361      2761
      [2]     chr1     2226774-2229734      * |    CpG: 366       366      1999
      [3]     chr1   36306230-36307408      * |    CpG: 110       110       824
      [4]     chr1   47708823-47710847      * |    CpG: 164       164      1268
      [5]     chr1   53737730-53739637      * |    CpG: 221       221      1347
      ...      ...                 ...    ... .         ...       ...       ...
  [31140]     chr2 242003256-242004412      * |     CpG: 79        79       749
  [31141]     chr2 242006590-242010686      * |    CpG: 263       263      2483
  [31142]     chr2 242045491-242045723      * |     CpG: 16        16       150
  [31143]     chr2 242046615-242047706      * |    CpG: 170       170       848
  [31144]     chr2 242088150-242089411      * |    CpG: 149       149       875
             perCpg     perGc
          <numeric> <numeric>
      [1]      20.8      79.6
      [2]      24.7      67.5
      [3]      18.7      69.9
      [4]      16.2      62.6
      [5]      23.2      70.6
      ...       ...       ...
  [31140]      13.7      64.7
  [31141]      12.8      60.6
  [31142]      13.7      64.4
  [31143]      31.1      77.7
  [31144]      23.6      69.3
  -------
  seqinfo: 332 sequences from an unspecified genome; no seqlengths

Which of the two approaches you use depends on whether you care about the extra columns.
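
One detail to keep in mind when building ranges from the table yourself: UCSC tables store chromStart 0-based and chromEnd as an exclusive end, which is why head(z) shows 155188536 for the first island while import() reports 155188537. The sketch below checks this on the first three rows copied from the head(z) output above, then shows one way to let GenomicRanges do the shift. The makeGRangesFromDataFrame() call is guarded so the snippet also runs without GenomicRanges installed; the variable names start1 and zz are just for illustration.

```r
# First three rows of the cpgIslandExt table, copied from head(z) above.
z <- data.frame(
  chrom      = c("chr1", "chr1", "chr1"),
  chromStart = c(155188536, 2226773, 36306229),
  chromEnd   = c(155192004, 2229734, 36307408),
  name       = c("CpG: 361", "CpG: 366", "CpG: 110"),
  length     = c(3468, 2961, 1179)
)

# With 0-based, half-open coordinates the width is simply end - start,
# and it agrees with the table's own length column.
stopifnot(all(z$chromEnd - z$chromStart == z$length))

# 1-based, closed starts (what GRanges expects) are chromStart + 1.
start1 <- z$chromStart + 1
start1

# If GenomicRanges is available, it can apply the same shift for you.
if (requireNamespace("GenomicRanges", quietly = TRUE)) {
  zz <- GenomicRanges::makeGRangesFromDataFrame(
    z[, c("chrom", "chromStart", "chromEnd", "name")],
    keep.extra.columns      = TRUE,
    seqnames.field          = "chrom",
    start.field             = "chromStart",
    end.field               = "chromEnd",
    starts.in.df.are.0based = TRUE   # adds 1 to every start
  )
}
```

Using starts.in.df.are.0based keeps the conversion in one place, so there is no stray + 1 to forget when the code gets reused on another table.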


This worked! Thanks!
