Question: Are AH52264, AH66175, and AH70594 duplicates?
Aditya120 (Germany) wrote, 4 weeks ago:

Are AH52264, AH66175, and AH70594 duplicates that arose from automated snapshotting of the same resource at different dates?

library(AnnotationHub)
ah <- AnnotationHub()            # all AnnotationHub records
ucsc <- query(ah, 'UCSC')        # records matching 'UCSC'
txdbs <- query(ucsc, 'TxDb')     # of those, records matching 'TxDb'
txdbs
       title                                    
     AH52263 | TxDb.Mmusculus.UCSC.mm10.ensGene.sqlite  
     AH52264 | TxDb.Mmusculus.UCSC.mm10.knownGene.sqlite
     AH52265 | TxDb.Mmusculus.UCSC.mm9.knownGene.sqlite 
     AH66175 | TxDb.Mmusculus.UCSC.mm10.knownGene.sqlite
     AH70594 | TxDb.Mmusculus.UCSC.mm10.knownGene.sqlite
Answer: Are AH52264, AH66175, and AH70594 duplicates?
shepherl (United States) wrote, 4 weeks ago:

We rebuild the TxDb per release cycle to account for any changes, so each record is a new version for the same genome build. The most recent one is likely what you want, but we don't replace older versions, so that people can replicate older analyses.
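One way to act on "the most recent will likely be what you want" is to keep, for each title, the record with the latest `rdatadateadded` value. Below is a minimal base-R sketch, assuming the metadata has already been pulled into a plain data frame whose row names are the AH IDs. The IDs, titles, and dates are copied from the `mcols()` output posted in this thread, and `latest_per_title()` is a hypothetical helper, not an AnnotationHub function.

```r
# Hypothetical helper (not part of AnnotationHub): given metadata with
# columns `title` and `rdatadateadded` and AH IDs as row names, return the
# ID of the most recently added record for each title.
latest_per_title <- function(md) {
  dates <- as.Date(md$rdatadateadded)
  # Sort rows newest first, then keep the first (newest) row per title
  newest <- md[order(dates, decreasing = TRUE), , drop = FALSE]
  newest <- newest[!duplicated(newest$title), , drop = FALSE]
  out <- rownames(newest)
  names(out) <- newest$title
  out
}

# Metadata copied from the mcols(z) output shown in this thread
md <- data.frame(
  title = c("TxDb.Mmusculus.UCSC.mm10.ensGene.sqlite",
            "TxDb.Mmusculus.UCSC.mm10.knownGene.sqlite",
            "TxDb.Mmusculus.UCSC.mm9.knownGene.sqlite",
            "TxDb.Mmusculus.UCSC.mm10.knownGene.sqlite",
            "TxDb.Mmusculus.UCSC.mm10.knownGene.sqlite"),
  rdatadateadded = c("2016-12-22", "2016-12-22", "2016-12-22",
                     "2018-10-22", "2019-05-01"),
  row.names = c("AH52263", "AH52264", "AH52265", "AH66175", "AH70594"),
  stringsAsFactors = FALSE
)

latest <- latest_per_title(md)
latest[["TxDb.Mmusculus.UCSC.mm10.knownGene.sqlite"]]  # "AH70594"
```

Against a live hub object such as `z`, the input could presumably be built with `as.data.frame(mcols(z)[, c("title", "rdatadateadded")])`, and the chosen record retrieved with `z[["AH70594"]]`. Pinning an older ID instead is what keeps an earlier analysis reproducible.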


Thank you, Lori :-)

Reply written 4 weeks ago by Aditya120

Also note that this information is available in the record metadata, so you can check it yourself:

> z <- query(hub, c("UCSC","TxDb","mus musculus"))
> z
AnnotationHub with 5 records
# snapshotDate(): 2019-10-17 
# $dataprovider: UCSC
# $species: Mus musculus
# $rdataclass: TxDb
# additional mcols(): taxonomyid, genome, description,
#   coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
#   rdatapath, sourceurl, sourcetype 
# retrieve records with, e.g., 'object[["AH52263"]]' 

            title                                    
  AH52263 | TxDb.Mmusculus.UCSC.mm10.ensGene.sqlite  
  AH52264 | TxDb.Mmusculus.UCSC.mm10.knownGene.sqlite
  AH52265 | TxDb.Mmusculus.UCSC.mm9.knownGene.sqlite 
  AH66175 | TxDb.Mmusculus.UCSC.mm10.knownGene.sqlite
  AH70594 | TxDb.Mmusculus.UCSC.mm10.knownGene.sqlite

> names(mcols(z))
 [1] "title"              "dataprovider"       "species"           
 [4] "taxonomyid"         "genome"             "description"       
 [7] "coordinate_1_based" "maintainer"         "rdatadateadded"    
[10] "preparerclass"      "tags"               "rdataclass"        
[13] "rdatapath"          "sourceurl"          "sourcetype"        

> mcols(z)[,c("title","rdatadateadded")]
DataFrame with 5 rows and 2 columns
                                            title rdatadateadded
                                      <character>    <character>
AH52263   TxDb.Mmusculus.UCSC.mm10.ensGene.sqlite     2016-12-22
AH52264 TxDb.Mmusculus.UCSC.mm10.knownGene.sqlite     2016-12-22
AH52265  TxDb.Mmusculus.UCSC.mm9.knownGene.sqlite     2016-12-22
AH66175 TxDb.Mmusculus.UCSC.mm10.knownGene.sqlite     2018-10-22
AH70594 TxDb.Mmusculus.UCSC.mm10.knownGene.sqlite     2019-05-01
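The same metadata also answers the original question programmatically. Group the IDs by title, sorted by the date they were added. Any title with more than one ID is a rebuilt version of the same resource rather than a distinct one. Below is a minimal base-R sketch with the values copied from the output above rather than fetched from the hub. Against a live hub, `md` could presumably come from `as.data.frame(mcols(z)[, c("title", "rdatadateadded")])` instead.

```r
# Metadata copied from the mcols(z) output above (AH IDs as row names)
md <- data.frame(
  title = c("TxDb.Mmusculus.UCSC.mm10.ensGene.sqlite",
            "TxDb.Mmusculus.UCSC.mm10.knownGene.sqlite",
            "TxDb.Mmusculus.UCSC.mm9.knownGene.sqlite",
            "TxDb.Mmusculus.UCSC.mm10.knownGene.sqlite",
            "TxDb.Mmusculus.UCSC.mm10.knownGene.sqlite"),
  rdatadateadded = c("2016-12-22", "2016-12-22", "2016-12-22",
                     "2018-10-22", "2019-05-01"),
  row.names = c("AH52263", "AH52264", "AH52265", "AH66175", "AH70594"),
  stringsAsFactors = FALSE
)

# Sort oldest first, then collect the IDs belonging to each title
md <- md[order(as.Date(md$rdatadateadded)), , drop = FALSE]
versions <- split(rownames(md), md$title)

# Titles with several IDs are successive rebuilds of the same TxDb
rebuilt <- versions[lengths(versions) > 1]
rebuilt
```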
Reply written 4 weeks ago by James W. MacDonald

Thank you, James. Knowing that this is policy (for backward compatibility) rather than an artifact helps me appreciate the redundancy :-)

Reply written 29 days ago by Aditya120