Question

about library(Homo.sapiens) from TxDb.Hsapiens.UCSC.hg18.knownGene


Bogdan ▴ 670

@bogdan-2367


Palo Alto, CA, USA

Dear all,

Could anyone suggest how I could use library(Homo.sapiens) with hg18 rather than
hg19 (which seems to be the default)? Thank you very much,

-- bogdan

PS: I followed the example given in the vignette at http://www.bioconductor.org/packages/release/bioc/vignettes/OrganismDbi/inst/doc/OrganismDbi.pdf, although it does not fully work. Thanks!


score 1 · Answer 1 · 2014-11-20

I think you would have to define what 'it does not fully work' means. I just built a Homo.sapiens package using the hg18 TxDb package:

> library(OrganismDbi)
> destination <- tempfile()
> dir.create(destination)
> gd <- list(join1 = c(GO.db = "GOID", org.Hs.eg.db = "GO"),
+            join2 = c(org.Hs.eg.db = "ENTREZID", TxDb.Hsapiens.UCSC.hg18.knownGene = "GENEID"))
> makeOrganismPackage(pkgname = "Homo.sapiens.old", graphData = gd, organism = "Homo sapiens", version = "0.0.1", maintainer = "me <me@mine.org>", author = "me", destDir = destination, license = "Artistic-2.0")
Creating package in /data3/tmp/Rtmp8b7rmk/file735f3070ed94/Homo.sapiens.old
> install.packages("/data3/tmp/Rtmp8b7rmk/file735f3070ed94/Homo.sapiens.old", repos=NULL)

And then

> library(Homo.sapiens.old)
> select(Homo.sapiens.old, "1", c("TXID", "GOID", "CHR","CHRLOC"), "ENTREZID")
   ENTREZID CHR    CHRLOC CHRLOCCHR EVIDENCE ONTOLOGY  TXID       GOID
1         1  19 -58858172        19       ND       MF 58334 GO:0003674
2         1  19 -58858172        19       ND       MF 58335 GO:0003674
3         1  19 -58858172        19      IDA       CC 58334 GO:0005576
4         1  19 -58858172        19      IDA       CC 58335 GO:0005576
5         1  19 -58858172        19      IDA       CC 58334 GO:0005615
6         1  19 -58858172        19      IDA       CC 58335 GO:0005615
7         1  19 -58858172        19       ND       BP 58334 GO:0008150
8         1  19 -58858172        19       ND       BP 58335 GO:0008150
9         1  19 -58858172        19      IDA       CC 58334 GO:0070062
10        1  19 -58858172        19      IDA       CC 58335 GO:0070062
11        1  19 -58858172        19      IDA       CC 58334 GO:0072562
12        1  19 -58858172        19      IDA       CC 58335 GO:0072562

> library(Homo.sapiens)
Loading required package: TxDb.Hsapiens.UCSC.hg19.knownGene
> select(Homo.sapiens, "1", c("TXID", "GOID", "CHR","CHRLOC"), "ENTREZID")
   ENTREZID CHR    CHRLOC CHRLOCCHR EVIDENCE ONTOLOGY  TXID       GOID
1         1  19 -58858172        19       ND       MF 70455 GO:0003674
2         1  19 -58858172        19       ND       MF 70456 GO:0003674
3         1  19 -58858172        19      IDA       CC 70455 GO:0005576
4         1  19 -58858172        19      IDA       CC 70456 GO:0005576
5         1  19 -58858172        19      IDA       CC 70455 GO:0005615
6         1  19 -58858172        19      IDA       CC 70456 GO:0005615
7         1  19 -58858172        19       ND       BP 70455 GO:0008150
8         1  19 -58858172        19       ND       BP 70456 GO:0008150
9         1  19 -58858172        19      IDA       CC 70455 GO:0070062
10        1  19 -58858172        19      IDA       CC 70456 GO:0070062
11        1  19 -58858172        19      IDA       CC 70455 GO:0072562
12        1  19 -58858172        19      IDA       CC 70456 GO:0072562

So I am getting TXIDs from the old TxDb.Hsapiens.UCSC.hg18.knownGene, and by definition this does 'work', if by 'work' we mean 'extract data from the databases we intend to draw it from'. The problem is that the org.Hs.eg.db and GO.db data are all current, so things like CHR and CHRLOC, which come from org.Hs.eg.db, will be based on hg19, not hg18. Compare the CHRLOC value of 58858172 above with the hg18 coordinates stored in the TxDb itself:

> z <- transcriptsBy(TxDb.Hsapiens.UCSC.hg18.knownGene)
> z[[1]]
GRanges object with 2 ranges and 2 metadata columns:
      seqnames               ranges strand |     tx_id     tx_name
         <Rle>            <IRanges>  <Rle> | <integer> <character>
  [1]    chr19 [63549984, 63556677]      - |     58334  uc002qsd.2
  [2]    chr19 [63551644, 63565932]      - |     58335  uc002qsf.1
  -------
  seqinfo: 49 sequences (1 circular) from hg18 genome
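
If hg18 coordinates are what you need, one option is to skip CHRLOC and query the TxDb directly. The following is only a minimal sketch, assuming TxDb.Hsapiens.UCSC.hg18.knownGene is installed; the variable names (txdb, build, hg18_tx) are just for illustration, and the TXID values you get back may depend on the package version.

```r
# Sketch: read transcript coordinates straight from the hg18 TxDb,
# rather than relying on CHRLOC from org.Hs.eg.db.
library(TxDb.Hsapiens.UCSC.hg18.knownGene)

txdb <- TxDb.Hsapiens.UCSC.hg18.knownGene

# Check which genome build the TxDb reports for its sequences.
build <- unique(genome(txdb))

# Transcript-level coordinates for Entrez Gene ID "1".
# These columns are stored in the TxDb, so they are on the TxDb's own build.
hg18_tx <- select(txdb,
                  keys = "1",
                  columns = c("TXID", "TXNAME", "TXCHROM",
                              "TXSTRAND", "TXSTART", "TXEND"),
                  keytype = "GENEID")

print(build)
print(hg18_tx)
```

Because everything here comes from a single package, there is no mixing of genome builds; the GO terms would still have to come from the current GO.db and org.Hs.eg.db.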

I don't know how you could create a purely hg18-based Homo.sapiens package without going back to some pre-2009 version of Bioconductor to get all the other annotation data. I guess you could try to download really old data from NCBI and build outdated org.Hs.eg.db and GO.db packages, but it all seems rather pointless. We have updated information about the genome and where genes are located on it, so why ignore all that and annotate things based on what we thought was true circa 2007?

score 0 · Answer 2 · 2014-11-20
