Question: using the "promoters" function with an "OrganismDb" to generate "GRanges" with "REFSEQ" rather than UCSC gene names
1
gravatar for efoss
3.8 years ago by
efoss10
United States
efoss10 wrote:

 

I would like to use the "promoters" function in the "GenomicRanges" package to generate "GRanges" objects, but I would like these objects to have "REFSEQ" names rather than UCSC-style names. 

This gives me UCSC-style names: 

library(OrganismDbi)

a1 <- promoters(Mus.musculus)

Trying the following sort of approach, based on looking at the columns in Mus.musculus, allows me to change some of the metadata columns, but I quickly end up with error messages saying that the columns I'm trying to add are unavailable: 

a2 <- promoters(Mus.musculus, columns = c("TXID", "TXSTART", "GENEID"))

Thank you. 

Eric

ADD COMMENTlink modified 3.8 years ago by Robert Castelo2.3k • written 3.8 years ago by efoss10
Answer: using the "promoters" function with an "OrganismDb" to generate "GRanges" with "
1
gravatar for Robert Castelo
3.8 years ago by
Robert Castelo2.3k
Spain/Barcelona/Universitat Pompeu Fabra
Robert Castelo2.3k wrote:

hi Eric,

I would say you need to use the GenomicFeatures package that allows one to fetch and build custom annotation TxDb objects. You can, for instance, build the TxDb package from the 'refGene' track at the UCSC using the function  makeTxDbFromUCSC() that will return you a TxDb object, or makeTxDbPackageFromUCSC() if you want to have an actual package file and install it. Look up the documentation to see how to call them. You can then use the promoters() function on the TxDb objects. The vignette of GenomicFeatures has also specific examples.

Another route that may be interesting for you is to go directly to the source of refseq annotations which is at NCBI, download the GFF and create a TxDb object from the GFF file with the function makeTxDbFromGFF(). Here is an example:

library(GenomicFeatures)

url <- "ftp://ftp.ncbi.nlm.nih.gov/genomes/M_musculus/GFF/ref_GRCm38.p3_top_level.gff3.gz"
download.file(url, "ref_GRCm38.p3_top_level.gff3.gz", method="curl")

txdbMm38 <- makeTxDbFromGFF(file=gzfile("ref_GRCm38.p3_top_level.gff3.gz"))
                                      
a1 <- promoters(txdbMm38)

head(a1)
GRanges object with 6 ranges and 2 metadata columns:
         seqnames             ranges strand |     tx_id     tx_name
            <Rle>          <IRanges>  <Rle> | <integer> <character>
  [1] NC_000067.6 [3355323, 3357522]      + |         1 XR_865166.1
  [2] NC_000067.6 [4520817, 4523016]      + |         2      Gm7357
  [3] NC_000067.6 [4593021, 4595220]      + |         3 XR_865167.1
  [4] NC_000067.6 [4593021, 4595220]      + |         4 XR_865168.1
  [5] NC_000067.6 [4607962, 4610161]      + |         5      Gm7369
  [6] NC_000067.6 [4614900, 4617099]      + |         6 XR_373196.1
  -------
  seqinfo: 163 sequences from an unspecified genome; no seqlengths

 

cheers,

robert.

ADD COMMENTlink written 3.8 years ago by Robert Castelo2.3k

Thanks so very much, Robert! That really helps. 

Eric

ADD REPLYlink written 3.8 years ago by efoss10
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 199 users visited in the last hour