Genome versions of the VcfFiles from AnnotationHub
sskimb ▴ 10
@sskimb-10162

The genome build of the VcfFile resources downloaded from AnnotationHub is unclear: the genome and tags metadata fields disagree, as shown below.

> library(AnnotationHub)
> ah <- AnnotationHub()
> vfs <- query(ah, 'VcfFile')
> mcols(vfs)[, c(1,5,7)]
DataFrame with 8 rows and 3 columns
                                                 title      genome               tags
                                           <character> <character>        <character>
AH50420                        clinvar_20160203.vcf.gz        hg19 dbSNP, GRCh37, VCF
AH50421                   clinvar_20160203_papu.vcf.gz        hg19 dbSNP, GRCh38, VCF
AH50422            common_and_clinical_20160203.vcf.gz        hg19 dbSNP, GRCh37, VCF
AH50423 common_no_known_medical_impact_20160203.vcf.gz        hg19 dbSNP, GRCh38, VCF
AH50424                        clinvar_20160203.vcf.gz        hg19 dbSNP, GRCh37, VCF
AH50425                   clinvar_20160203_papu.vcf.gz        hg19 dbSNP, GRCh38, VCF
AH50426            common_and_clinical_20160203.vcf.gz        hg19 dbSNP, GRCh37, VCF
AH50427 common_no_known_medical_impact_20160203.vcf.gz        hg19 dbSNP, GRCh38, VCF
> sessionInfo()
R version 3.2.3 (2015-12-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.4 LTS

locale:
 [1] LC_CTYPE=ko_KR.UTF-8       LC_NUMERIC=C               LC_TIME=ko_KR.UTF-8        LC_COLLATE=ko_KR.UTF-8     LC_MONETARY=ko_KR.UTF-8    LC_MESSAGES=ko_KR.UTF-8   
 [7] LC_PAPER=ko_KR.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=ko_KR.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] VariantAnnotation_1.16.4   Rsamtools_1.22.0           Biostrings_2.38.4          XVector_0.10.0             SummarizedExperiment_1.0.2 GenomicRanges_1.22.4      
 [7] GenomeInfoDb_1.6.3         AnnotationHub_2.2.5        org.Hs.eg.db_3.2.3         RSQLite_1.0.0              DBI_0.3.1                  AnnotationDbi_1.32.3      
[13] IRanges_2.4.8              S4Vectors_0.8.11           Biobase_2.30.0             BiocGenerics_0.16.1       

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.4                  BiocInstaller_1.20.1         futile.logger_1.4.1          GenomicFeatures_1.22.13      bitops_1.0-6                
 [6] futile.options_1.0.0         tools_3.2.3                  zlibbioc_1.16.0              biomaRt_2.26.1               digest_0.6.9                
[11] BSgenome_1.38.0              shiny_0.13.2                 curl_0.9.7                   rtracklayer_1.30.4           httr_1.1.0                  
[16] R6_2.1.2                     XML_3.98-1.4                 BiocParallel_1.4.3           lambda.r_1.1.7               codetools_0.2-14            
[21] GenomicAlignments_1.6.3      htmltools_0.3.5              mime_0.4                     interactiveDisplayBase_1.8.0 xtable_1.8-2                
[26] httpuv_1.3.3                 RCurl_1.95-4.8  
@valerie-obenchain-4275
United States

Hi,

Thanks for the report. There were two bugs in the generation of the metadata. First, 'genome' was hard-coded as hg19. Second, the 'tags' values c("GRCh37", "GRCh38") were being recycled across the 8 files (alternating GRCh37, GRCh38, ...) instead of being rep'd out (i.e., 4 GRCh37 files followed by 4 GRCh38).
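
To illustrate the second bug, here is a minimal base R sketch of how recycling differs from rep(..., each = ). It is not the actual metadata-generation code; the names ids, builds, recycled and blocked are made up for the example, and the IDs are simply the eight shown in the question.

```r
# Illustration only: hypothetical names, not the real metadata script.
ids    <- paste0("AH", 50420:50427)   # the 8 resource IDs from the question
builds <- c("GRCh37", "GRCh38")

# Recycling: a length-2 vector placed next to 8 rows is repeated as a
# whole, so the builds alternate (GRCh37, GRCh38, GRCh37, ...).
recycled <- data.frame(id = ids, tags = builds, stringsAsFactors = FALSE)

# Intended layout: each build repeated for a block of 4 files.
blocked <- data.frame(id = ids, tags = rep(builds, each = 4),
                      stringsAsFactors = FALSE)

print(cbind(recycled, intended = blocked$tags))
```

The recycled column reproduces the alternating pattern seen in the original tags, while the blocked column matches the intended 4 + 4 layout.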

The resources themselves are fine; they are saved as VCF files in S3, so no changes are needed there. I've fixed the metadata, and a snapshot date of

> hub <- AnnotationHub()
updating metadata: retrieving 1 resource
  |======================================================================| 100%
snapshotDate(): 2016-04-22


should produce the correct results:

> mcols(query(hub, 'VcfFile'))[c("genome", "tags")]
DataFrame with 8 rows and 2 columns
             genome               tags
        <character>        <character>
AH50420      GRCh37 dbSNP, GRCh37, VCF
AH50421      GRCh37 dbSNP, GRCh37, VCF
AH50422      GRCh37 dbSNP, GRCh37, VCF
AH50423      GRCh37 dbSNP, GRCh37, VCF
AH50424      GRCh38 dbSNP, GRCh38, VCF
AH50425      GRCh38 dbSNP, GRCh38, VCF
AH50426      GRCh38 dbSNP, GRCh38, VCF
AH50427      GRCh38 dbSNP, GRCh38, VCF
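
For anyone who wants to check this kind of consistency programmatically, a rough sketch follows. It assumes the two metadata columns have been copied into a plain data.frame (typed in by hand from the outputs in this thread, so no hub access is needed); genome_in_tags is a hypothetical helper, not part of AnnotationHub, and it only does a literal match, so it does not know that hg19 and GRCh37 are closely related builds.

```r
# Hypothetical helper: TRUE for each row whose genome value appears
# literally among the comma-separated tags.
genome_in_tags <- function(md) {
  mapply(function(g, t) g %in% trimws(strsplit(t, ",", fixed = TRUE)[[1]]),
         md$genome, md$tags, USE.NAMES = FALSE)
}

tag_37 <- "dbSNP, GRCh37, VCF"
tag_38 <- "dbSNP, GRCh38, VCF"

# Metadata as originally reported: genome fixed at hg19, tags alternating.
original <- data.frame(genome = "hg19",
                       tags = rep(c(tag_37, tag_38), times = 4),
                       stringsAsFactors = FALSE)

# Metadata after the fix: 4 GRCh37 rows followed by 4 GRCh38 rows.
fixed <- data.frame(genome = rep(c("GRCh37", "GRCh38"), each = 4),
                    tags = rep(c(tag_37, tag_38), each = 4),
                    stringsAsFactors = FALSE)

print(genome_in_tags(original))
print(genome_in_tags(fixed))
```

With real hub metadata, the same helper could presumably be applied to as.data.frame(mcols(query(hub, 'VcfFile'))[c("genome", "tags")]), though that has not been run here.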


Valerie
