Querying AnnotationHub for a common TF in a common cell line returns no results
@chrisclarkson100-11114
United Kingdom

I am trying to query AnnotationHub for all the ENCODE datasets on CTCF in the mouse mm9 genome. I would have thought this was a fairly standard query, but it returns no results:

library(AnnotationHub)
hub <- AnnotationHub()

ctcf_imr90 <- query(hub, c("ENCODE", "CTCF", "Mus Musculus", "mm9"))

This returns no results. When I omit "CTCF", the query returns many results. This isn't the end of the world, because I can download the datasets I'm interested in directly from ENCODE, but that is a bit of a pain and I would prefer to do it the Bioconductor way. Is it normal for a query as common as this one to return no results?

shepherl
@lshep
United States

Could you provide a little more detail about what you were trying to accomplish and what type of file or output you expected?

The AnnotationHub::query() function only searches the resources hosted in Bioconductor's AnnotationHub; it does not interactively query web-based hosts such as ENCODE.

Your query is too specific: query() searches the metadata describing each resource (such as its title and description), not the contents of the resource itself, which is why adding "CTCF" produced no results. There are currently 10 resources matching "ENCODE", "Mus Musculus", and "mm9", as shown by running your query without "CTCF":

> hub = AnnotationHub()
snapshotDate(): 2017-09-07
> mm = query(hub, c("ENCODE", "Mus Musculus", "mm9"))
> length(mm)
[1] 10
> mm
AnnotationHub with 10 records
# snapshotDate(): 2017-09-07 
# $dataprovider: UCSC
# $species: Mus musculus
# $rdataclass: GRanges
# additional mcols(): taxonomyid, genome, description,
#   coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
#   rdatapath, sourceurl, sourcetype 
# retrieve records with, e.g., 'object[["AH6119"]]' 

           title            
  AH6119 | Caltech Histone  
  AH6120 | Caltech TFBS     
  AH6121 | LICR Histone     
  AH6122 | NHGRI BiP        
  AH6123 | PSU DNaseI HS    
  AH6124 | PSU Histone      
  AH6125 | PSU TFBS         
  AH6126 | Stan/Yale Histone
  AH6128 | UW DNaseI DGF    
  AH6129 | UW DNaseI HS     

You can get more information about these resources by inspecting the metadata columns that AnnotationHub provides through mcols():

> names(mcols(mm))
 [1] "title"              "dataprovider"       "species"           
 [4] "taxonomyid"         "genome"             "description"       
 [7] "coordinate_1_based" "maintainer"         "rdatadateadded"    
[10] "preparerclass"      "tags"               "rdataclass"        
[13] "rdatapath"          "sourceurl"          "sourcetype"        
> mm$description
 [1] "GRanges object from UCSC track 'Caltech Histone'"  
 [2] "GRanges object from UCSC track 'Caltech TFBS'"     
 [3] "GRanges object from UCSC track 'LICR Histone'"     
 [4] "GRanges object from UCSC track 'NHGRI BiP'"        
 [5] "GRanges object from UCSC track 'PSU DNaseI HS'"    
 [6] "GRanges object from UCSC track 'PSU Histone'"      
 [7] "GRanges object from UCSC track 'PSU TFBS'"         
 [8] "GRanges object from UCSC track 'Stan/Yale Histone'"
 [9] "GRanges object from UCSC track 'UW DNaseI DGF'"    
[10] "GRanges object from UCSC track 'UW DNaseI HS'"   

> mm$sourceurl
 [1] "rtracklayer://hgdownload.cse.ucsc.edu/goldenpath/mm9/database/wgEncodeCaltechHist"
 [2] "rtracklayer://hgdownload.cse.ucsc.edu/goldenpath/mm9/database/wgEncodeCaltechTfbs"
 [3] "rtracklayer://hgdownload.cse.ucsc.edu/goldenpath/mm9/database/wgEncodeLicrHistone"
 [4] "rtracklayer://hgdownload.cse.ucsc.edu/goldenpath/mm9/database/wgEncodeNhgriBip"   
 [5] "rtracklayer://hgdownload.cse.ucsc.edu/goldenpath/mm9/database/wgEncodePsuDnase"   
 [6] "rtracklayer://hgdownload.cse.ucsc.edu/goldenpath/mm9/database/wgEncodePsuHistone" 
 [7] "rtracklayer://hgdownload.cse.ucsc.edu/goldenpath/mm9/database/wgEncodePsuTfbs"    
 [8] "rtracklayer://hgdownload.cse.ucsc.edu/goldenpath/mm9/database/wgEncodeSydhHist"   
 [9] "rtracklayer://hgdownload.cse.ucsc.edu/goldenpath/mm9/database/wgEncodeUwDgf"      
[10] "rtracklayer://hgdownload.cse.ucsc.edu/goldenpath/mm9/database/wgEncodeUwDnase"
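To make the metadata-versus-contents point concrete, here is a minimal base-R sketch of the matching rule (not AnnotationHub's actual code), assuming query() keeps a record only when every pattern is found, case-insensitively, somewhere in that record's metadata. The `meta` data frame is a hypothetical stand-in built from the IDs, titles, species, and source URLs printed above, and `match_all()` is a hypothetical helper. Including the source URL is an assumption made so that "ENCODE" and "mm9" have something to match; the real query() may match them through other fields such as tags.

```r
# Hypothetical stand-in for the metadata of the 10 mm9 ENCODE records shown above.
ids <- c("AH6119", "AH6120", "AH6121", "AH6122", "AH6123",
         "AH6124", "AH6125", "AH6126", "AH6128", "AH6129")
titles <- c("Caltech Histone", "Caltech TFBS", "LICR Histone", "NHGRI BiP",
            "PSU DNaseI HS", "PSU Histone", "PSU TFBS", "Stan/Yale Histone",
            "UW DNaseI DGF", "UW DNaseI HS")
tracks <- c("wgEncodeCaltechHist", "wgEncodeCaltechTfbs", "wgEncodeLicrHistone",
            "wgEncodeNhgriBip", "wgEncodePsuDnase", "wgEncodePsuHistone",
            "wgEncodePsuTfbs", "wgEncodeSydhHist", "wgEncodeUwDgf",
            "wgEncodeUwDnase")

meta <- data.frame(
  title       = titles,
  species     = "Mus musculus",
  description = sprintf("GRanges object from UCSC track '%s'", titles),
  # Assumption: the source URL is searched too; it supplies "Encode" and "mm9".
  sourceurl   = paste0("rtracklayer://hgdownload.cse.ucsc.edu/goldenpath/mm9/database/",
                       tracks),
  row.names   = ids,
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep a record only if EVERY pattern occurs
# (case-insensitively) somewhere in its concatenated metadata text.
match_all <- function(meta, patterns) {
  text <- apply(meta, 1, paste, collapse = " ")
  keep <- rep(TRUE, nrow(meta))
  for (p in patterns) {
    keep <- keep & grepl(p, text, ignore.case = TRUE)
  }
  rownames(meta)[keep]
}

match_all(meta, c("ENCODE", "CTCF", "Mus Musculus", "mm9"))  # character(0)
match_all(meta, c("ENCODE", "Mus Musculus", "mm9"))          # all 10 ids
match_all(meta, c("ENCODE", "Mus Musculus", "mm9", "TFBS"))  # "AH6120" "AH6125"
```

Because no record's metadata mentions "CTCF", the combined query is empty even though a TF binding-site track could still contain CTCF peaks. In the real hub, the analogous narrowing step would be query(hub, c("ENCODE", "Mus musculus", "mm9", "TFBS")); the two TFBS tracks are the natural candidates to retrieve (for example with mm[["AH6125"]]) and inspect, although the metadata shown here does not confirm that they include CTCF data.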

If there is an annotation or file format you think would be useful, we can certainly look at adding it to the resources provided by Bioconductor's AnnotationHub. That is why I asked what file or format you expected.

It may not be what you are looking for, but it is also worth mentioning that AnnotationHub and Bioconductor provide OrgDb and TxDb objects and packages, which hold gene identifier mappings and transcript annotation that you can query and manipulate.
Since you are interested in CTCF, something like the following:

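library(AnnotationHub)
library(AnnotationDbi)   # provides select() for OrgDb objects

ah <- AnnotationHub()
# keep only the OrgDb (gene identifier mapping) resources
orgs <- subset(ah, ah$rdataclass == "OrgDb")
mouse <- query(orgs, "Mus musculus")[[1]]
mouse
# 13018 is the Entrez Gene ID of mouse Ctcf
AnnotationDbi::select(mouse, keys = "13018", columns = c("SYMBOL", "REFSEQ"), keytype = "ENTREZID")

# transcript annotation for mm9 (UCSC knownGene track)
library(TxDb.Mmusculus.UCSC.mm9.knownGene)
txdb <- TxDb.Mmusculus.UCSC.mm9.knownGene
txdb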