Does biovizBase contain a genesymbol data set for other model organisms?
Entering edit mode
moldach ▴ 20
Last seen 3.9 years ago
Canada/Montreal/Douglas Mental Health I…

I would like to use ggbio for C elegans and am following along with this tutorial for biovizBase.

When adding a gene model track the code looks like this:

#load gene symbol : GRanges, one gene/row
data(genesymbol, package = "biovizBase")

Looking at the help for ?genesymbol it looks like this is a dataset for humans. The package Homo.sapiens only contains data for Homo sapiens. OrganismDb, which hosts Homo.sapiens, looks like it only has information for two other models: Mus musculus, and Rattus norvegicus.

Is it possible to get a gene model track for C elegans with ggbio?

Adding the reference track doesn't seem to be an issue; I searched BSgenome.Hsapiens.UCSC.hg19 on Bioconductor and under Packages found under BSgenome there is a package called BSgenome.Celegans.UCSC.ce11.

biovizBase ggbio • 740 views
Entering edit mode

Just to clarify, the genesymbol dataset is really just a convenience for demonstration purposes. Jim is right: ggbio should directly support TxDb and other annotation objects.

Entering edit mode
Last seen 7 hours ago
United States

You can do that on the fly, using the TxDb and the OrgDb for that organism. Here is a function that I wrote recently to create a gene plot, followed by coverage plots for zero or more BAM files, that I use for non-model organisms.

makeGGplot <- function(gene, bams = NULL, txdb, orgdb, bamnames, geom = "gapped.pair"){
    ## gene is gene symbol
    ## bams is a vector of bamfile paths. Ensure they are indexed
    ## txdb and orgdb are obvious
    ## bamnames is a vector of names for resulting ggplot
    ## geom = "line" is in general faster
    require("biovizBase", quietly = TRUE, character.only = TRUE)
    require("ggbio", quietly = TRUE, character.only = TRUE)
    egid <- mapIds(orgdb, gene, "ENTREZID", "SYMBOL")
    gnrng <- genes(txdb, single.strand.genes.only = FALSE)[[egid]]
    if(length(gnrng) > 1) warning("Doubledog!")
    tx <- crunch(txdb, which = gnrng)
    tx <- split(tx, tx$tx_id)
    lbs <- select(txdb, names(tx), c("TXNAME","TXID","GENEID"), "TXID")
    lbs$SYMBOL <- mapIds(orgdb, lbs$GENEID, "SYMBOL", "ENTREZID")
    ind <- lbs$GENEID %in% egid
    lbs <- lbs[ind,]
    tx <- tx[ind]
    names(tx) <- lbs$SYMBOL
    gnplot <- ggplot() +, c(list(data = tx, names.expr = "SYMBOL")))
    } else {
        gnplot <- list(gnplot)
        gnrng <- keepSeqlevels(gnrng, as.character(seqnames(gnrng)))
        bl <- BamFileList(bams)
        bamplot <- lapply(bl, autoplot, which = gnrng, geom=geom)
        names(bamplot) <- bamnames
        tracks(c(gnplot, bamplot), heights = c(4, rep(1, length(bams))), label.text.angle = 0, xlim = gnrng,
               label.text.cex = 0.75)

Doing say

> hub <- AnnotationHub()
  |======================================================================| 100%

snapshotDate(): 2019-10-29
> query(hub, c("elegans","txdb"))
AnnotationHub with 8 records
# snapshotDate(): 2019-10-29 
# $dataprovider: UCSC
# $species: Caenorhabditis elegans
# $rdataclass: TxDb
# additional mcols(): taxonomyid, genome, description,
#   coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
#   rdatapath, sourceurl, sourcetype 
# retrieve records with, e.g., 'object[["AH52249"]]' 

  AH52249 | TxDb.Celegans.UCSC.ce11.refGene.sqlite
  AH52250 | TxDb.Celegans.UCSC.ce6.ensGene.sqlite 
  AH53778 | TxDb.Celegans.UCSC.ce11.refGene.sqlite
  AH57984 | TxDb.Celegans.UCSC.ce11.refGene.sqlite
  AH61790 | TxDb.Celegans.UCSC.ce11.refGene.sqlite
  AH66167 | TxDb.Celegans.UCSC.ce11.refGene.sqlite
  AH70583 | TxDb.Celegans.UCSC.ce11.ensGene.sqlite
  AH70584 | TxDb.Celegans.UCSC.ce11.refGene.sqlite
> txdb <- hub[["AH70584"]]
> query(hub, c("elegans","orgdb"))
AnnotationHub with 2 records
# snapshotDate(): 2019-10-29 
# $dataprovider:
# $species: Drosophila elegans, Caenorhabditis elegans
# $rdataclass: OrgDb
# additional mcols(): taxonomyid, genome, description,
#   coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
#   rdatapath, sourceurl, sourcetype 
# retrieve records with, e.g., 'object[["AH75747"]]' 

  AH75747 |             
  AH76581 |
> orgdb <- hub[["AH75747"]]
> makeGGplot("homt-1", NULL, txdb, orgdb, "nobams")

Seems to work OK.


Login before adding your answer.

Traffic: 615 users visited in the last hour
Help About
Access RSS

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6