Question: Querying AnnotationHub for a common TF in a common cell line returns no results
chrisclarkson100 wrote, 14 days ago:

I am trying to query ENCODE for all the datasets it has on CTCF in the mm9 genome. I would have thought this was a fairly standard request, but it returns no results:


ctcf_imr90 <- query(hub, c("ENCODE","CTCF", "Mus Musculus", "mm9"))


This produces no results, so I tried omitting CTCF and it returned many results. This isn't the end of the world, because I can get the datasets I'm interested in directly from ENCODE, but that is a bit of a pain and I would prefer to get them the Bioconductor way. Is it normal for a query this common to return no results?

shepherl wrote, 11 days ago:

Could you provide a little detail about what you were trying to accomplish and what type of file or output you expected?

The AnnotationHub::query() function only searches the AnnotationHub resources provided through Bioconductor; it is not an interactive query against web-based hosts.

The query you provided is too specific: query() searches the information recorded about each resource (its metadata), not the contents of the resource itself, which is why adding "CTCF" produced no results. There are currently 10 resources related to "ENCODE" and "Mus Musculus"/"mm9", as shown by running your query without "CTCF":

> hub = AnnotationHub()
snapshotDate(): 2017-09-07
> mm = query(hub, c("ENCODE", "Mus Musculus", "mm9"))
> length(mm)
[1] 10
> mm
AnnotationHub with 10 records
# snapshotDate(): 2017-09-07 
# $dataprovider: UCSC
# $species: Mus musculus
# $rdataclass: GRanges
# additional mcols(): taxonomyid, genome, description,
#   coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
#   rdatapath, sourceurl, sourcetype 
# retrieve records with, e.g., 'object[["AH6119"]]' 

  AH6119 | Caltech Histone  
  AH6120 | Caltech TFBS     
  AH6121 | LICR Histone     
  AH6122 | NHGRI BiP        
  AH6123 | PSU DNaseI HS    
  AH6124 | PSU Histone      
  AH6125 | PSU TFBS         
  AH6126 | Stan/Yale Histone
  AH6128 | UW DNaseI DGF    
  AH6129 | UW DNaseI HS     
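To illustrate why the extra term empties the result, here is a minimal base R sketch of all-terms-must-match filtering over a metadata table. It is not the AnnotationHub implementation; the helper name match_all_terms and the three-row toy table (modelled on the titles and descriptions in this thread) are assumptions made for the example.

```r
# Toy metadata table standing in for hub metadata (illustrative values only).
meta <- data.frame(
  title = c("Caltech TFBS", "PSU TFBS", "UW DNaseI HS"),
  description = c(
    "GRanges object from UCSC track 'Caltech TFBS'",
    "GRanges object from UCSC track 'PSU TFBS'",
    "GRanges object from UCSC track 'UW DNaseI HS'"
  ),
  genome = c("mm9", "mm9", "mm9"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep the rows in which every search term matches
# somewhere in the row's combined metadata text (case-insensitive).
match_all_terms <- function(meta, terms) {
  row_text <- do.call(paste, c(meta, sep = " "))
  hits <- vapply(
    terms,
    function(term) grepl(term, row_text, ignore.case = TRUE),
    logical(nrow(meta))
  )
  keep <- rowSums(matrix(hits, nrow = nrow(meta))) == length(terms)
  meta[keep, , drop = FALSE]
}

nrow(match_all_terms(meta, c("mm9", "TFBS")))  # 2 rows match both terms
nrow(match_all_terms(meta, c("mm9", "CTCF")))  # 0 rows: no metadata mentions CTCF
```

No row's metadata mentions "CTCF", so requiring that term removes every record, even though a TFBS track may well contain CTCF data inside it.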

You can get further information about these resources by looking at the mcols provided by AnnotationHub:

> names(mcols(mm))
 [1] "title"              "dataprovider"       "species"           
 [4] "taxonomyid"         "genome"             "description"       
 [7] "coordinate_1_based" "maintainer"         "rdatadateadded"    
[10] "preparerclass"      "tags"               "rdataclass"        
[13] "rdatapath"          "sourceurl"          "sourcetype"        
> mm$description
 [1] "GRanges object from UCSC track 'Caltech Histone'"  
 [2] "GRanges object from UCSC track 'Caltech TFBS'"     
 [3] "GRanges object from UCSC track 'LICR Histone'"     
 [4] "GRanges object from UCSC track 'NHGRI BiP'"        
 [5] "GRanges object from UCSC track 'PSU DNaseI HS'"    
 [6] "GRanges object from UCSC track 'PSU Histone'"      
 [7] "GRanges object from UCSC track 'PSU TFBS'"         
 [8] "GRanges object from UCSC track 'Stan/Yale Histone'"
 [9] "GRanges object from UCSC track 'UW DNaseI DGF'"    
[10] "GRanges object from UCSC track 'UW DNaseI HS'"   

> mm$sourceurl
 [1] "rtracklayer://"
 [2] "rtracklayer://"
 [3] "rtracklayer://"
 [4] "rtracklayer://"   
 [5] "rtracklayer://"   
 [6] "rtracklayer://" 
 [7] "rtracklayer://"    
 [8] "rtracklayer://"   
 [9] "rtracklayer://"      
[10] "rtracklayer://"
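Because these records are whole UCSC tracks, one possible route to CTCF is to retrieve a track and filter its contents rather than the hub metadata. The sketch below shows only the filtering step, on a toy table. The helper find_label_hits and the column names name and score are assumptions; the actual columns of these resources are not shown in this thread.

```r
# Hypothetical helper: flag rows where any text column matches the pattern.
find_label_hits <- function(md, pattern) {
  is_text <- vapply(
    md,
    function(col) is.character(col) || is.factor(col),
    logical(1)
  )
  hits <- lapply(
    md[is_text],
    function(col) grepl(pattern, col, ignore.case = TRUE)
  )
  # Combine per-column matches with OR; all FALSE if there are no text columns.
  Reduce(`|`, hits, rep(FALSE, nrow(md)))
}

# Toy stand-in for the per-range metadata of a retrieved track (assumed layout).
toy <- data.frame(
  name = c("CTCF_peak_1", "Pol2_peak_1", "ctcf_peak_2"),
  score = c(900, 450, 700),
  stringsAsFactors = FALSE
)

find_label_hits(toy, "CTCF")  # TRUE FALSE TRUE
```

Against the hub, the same helper might be applied as tfbs <- mm[["AH6125"]] followed by tfbs[find_label_hits(as.data.frame(mcols(tfbs)), "CTCF")]. Check names(mcols(tfbs)) first to see which columns the track actually carries.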

If there is an annotation or file format you feel would be useful, we can certainly look at adding it to the resources provided by Bioconductor's AnnotationHub. That is why I was curious about what file or format you expected.

It may not be what you are looking for, but it is also worth mentioning that AnnotationHub and Bioconductor provide OrgDb and TxDb objects/packages as well, with certain genomic information that could be manipulated.
Since you were interested in "CTCF", something like the following:

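library(AnnotationHub)
library(TxDb.Mmusculus.UCSC.mm9.knownGene)

ah <- AnnotationHub()
# Restrict the hub to OrgDb resources, then take the first mouse match
orgs <- subset(ah, ah$rdataclass == "OrgDb")
mouse <- query(orgs, "Mus Musculus")[[1]]
# Entrez Gene ID 13018 is mouse Ctcf
select(mouse, keys="13018", columns=c("SYMBOL","REFSEQ"), keytype="ENTREZID")

# mm9 transcript annotations from the TxDb package loaded above
txdb <- TxDb.Mmusculus.UCSC.mm9.knownGene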