Are AH52264, AH66175, and AH70594 duplicates?
Aditya ▴ 160
@aditya-7667
Germany

Are AH52264, AH66175 and AH70594 duplicates that arose from automated snapshotting of the same resource on different dates?

library(AnnotationHub)
ah <- AnnotationHub()            # all records in the hub
ucsc <- query(ah, 'UCSC')        # records provided by UCSC
txdbs <- query(ucsc, 'TxDb')     # UCSC records that are TxDb objects
txdbs
       title
     AH52263 | TxDb.Mmusculus.UCSC.mm10.ensGene.sqlite
     AH52264 | TxDb.Mmusculus.UCSC.mm10.knownGene.sqlite
     AH52265 | TxDb.Mmusculus.UCSC.mm9.knownGene.sqlite
     AH66175 | TxDb.Mmusculus.UCSC.mm10.knownGene.sqlite
     AH70594 | TxDb.Mmusculus.UCSC.mm10.knownGene.sqlite
AnnotationHub • 1.4k views
shepherl 4.1k
@lshep
United States

We rebuild the TxDb each release cycle to account for any changes, so each of these records is a new version for the same genome build. The most recent one is likely what you want, but we don't strictly replace older records, so that people can replicate older analyses.


Thank you, Lori :-)


Also note that this information is available, so you can check for yourself.

> library(AnnotationHub)
> hub <- AnnotationHub()
> z <- query(hub, c("UCSC","TxDb","mus musculus"))
> z
AnnotationHub with 5 records
# snapshotDate(): 2019-10-17 
# $dataprovider: UCSC
# $species: Mus musculus
# $rdataclass: TxDb
# additional mcols(): taxonomyid, genome, description,
#   coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
#   rdatapath, sourceurl, sourcetype 
# retrieve records with, e.g., 'object[["AH52263"]]' 

            title                                    
  AH52263 | TxDb.Mmusculus.UCSC.mm10.ensGene.sqlite  
  AH52264 | TxDb.Mmusculus.UCSC.mm10.knownGene.sqlite
  AH52265 | TxDb.Mmusculus.UCSC.mm9.knownGene.sqlite 
  AH66175 | TxDb.Mmusculus.UCSC.mm10.knownGene.sqlite
  AH70594 | TxDb.Mmusculus.UCSC.mm10.knownGene.sqlite

> names(mcols(z))
 [1] "title"              "dataprovider"       "species"           
 [4] "taxonomyid"         "genome"             "description"       
 [7] "coordinate_1_based" "maintainer"         "rdatadateadded"    
[10] "preparerclass"      "tags"               "rdataclass"        
[13] "rdatapath"          "sourceurl"          "sourcetype"        

> mcols(z)[,c("title","rdatadateadded")]
DataFrame with 5 rows and 2 columns
                                            title rdatadateadded
                                      <character>    <character>
AH52263   TxDb.Mmusculus.UCSC.mm10.ensGene.sqlite     2016-12-22
AH52264 TxDb.Mmusculus.UCSC.mm10.knownGene.sqlite     2016-12-22
AH52265  TxDb.Mmusculus.UCSC.mm9.knownGene.sqlite     2016-12-22
AH66175 TxDb.Mmusculus.UCSC.mm10.knownGene.sqlite     2018-10-22
AH70594 TxDb.Mmusculus.UCSC.mm10.knownGene.sqlite     2019-05-01
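
If you only want the newest record for each title, the dates above are enough to pick it. The following is a minimal base R sketch, not an AnnotationHub feature: it works on a plain data frame whose values are copied from the output above, and the names `meta` and `latest` are just illustrative. The same idea should carry over to `mcols(z)[, c("title", "rdatadateadded")]` from a live hub, though I have not shown that here.

```r
# Titles and dates copied from the mcols(z) output above; 'meta' is a
# stand-in for the real metadata so the example runs without a hub.
meta <- data.frame(
  id = c("AH52263", "AH52264", "AH52265", "AH66175", "AH70594"),
  title = c("TxDb.Mmusculus.UCSC.mm10.ensGene.sqlite",
            "TxDb.Mmusculus.UCSC.mm10.knownGene.sqlite",
            "TxDb.Mmusculus.UCSC.mm9.knownGene.sqlite",
            "TxDb.Mmusculus.UCSC.mm10.knownGene.sqlite",
            "TxDb.Mmusculus.UCSC.mm10.knownGene.sqlite"),
  rdatadateadded = as.Date(c("2016-12-22", "2016-12-22", "2016-12-22",
                             "2018-10-22", "2019-05-01")),
  stringsAsFactors = FALSE
)

# Sort oldest to newest, then keep the last (newest) row for each title.
meta <- meta[order(meta$rdatadateadded), ]
latest <- meta[!duplicated(meta$title, fromLast = TRUE), ]
latest[, c("id", "title", "rdatadateadded")]
```

For a reproducible analysis you would do the opposite: record the exact ID you used and keep retrieving that one, which is why the older records stay in the hub.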

Thank you, James. Knowing that it is policy (for backward compatibility) rather than an artifact helps me appreciate the redundancy :-)
