Question: Genome versions of the VcfFiles from AnnotationHub
1
gravatar for sskimb
3.6 years ago by
sskimb10
sskimb10 wrote:

I found that the genome build of the VcfFiles downloaded from AnnotationHub is unclear: the genome and tags fields disagree, as shown below.

> library(AnnotationHub)
> ah <- AnnotationHub()
> vfs <- query(ah, 'VcfFile')
> mcols(vfs)[, c(1,5,7)]
DataFrame with 8 rows and 3 columns
                                                 title      genome               tags
                                           <character> <character>        <character>
AH50420                        clinvar_20160203.vcf.gz        hg19 dbSNP, GRCh37, VCF
AH50421                   clinvar_20160203_papu.vcf.gz        hg19 dbSNP, GRCh38, VCF
AH50422            common_and_clinical_20160203.vcf.gz        hg19 dbSNP, GRCh37, VCF
AH50423 common_no_known_medical_impact_20160203.vcf.gz        hg19 dbSNP, GRCh38, VCF
AH50424                        clinvar_20160203.vcf.gz        hg19 dbSNP, GRCh37, VCF
AH50425                   clinvar_20160203_papu.vcf.gz        hg19 dbSNP, GRCh38, VCF
AH50426            common_and_clinical_20160203.vcf.gz        hg19 dbSNP, GRCh37, VCF
AH50427 common_no_known_medical_impact_20160203.vcf.gz        hg19 dbSNP, GRCh38, VCF
> sessionInfo()
R version 3.2.3 (2015-12-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.4 LTS

locale:
 [1] LC_CTYPE=ko_KR.UTF-8       LC_NUMERIC=C               LC_TIME=ko_KR.UTF-8        LC_COLLATE=ko_KR.UTF-8     LC_MONETARY=ko_KR.UTF-8    LC_MESSAGES=ko_KR.UTF-8   
 [7] LC_PAPER=ko_KR.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=ko_KR.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] VariantAnnotation_1.16.4   Rsamtools_1.22.0           Biostrings_2.38.4          XVector_0.10.0             SummarizedExperiment_1.0.2 GenomicRanges_1.22.4      
 [7] GenomeInfoDb_1.6.3         AnnotationHub_2.2.5        org.Hs.eg.db_3.2.3         RSQLite_1.0.0              DBI_0.3.1                  AnnotationDbi_1.32.3      
[13] IRanges_2.4.8              S4Vectors_0.8.11           Biobase_2.30.0             BiocGenerics_0.16.1       

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.4                  BiocInstaller_1.20.1         futile.logger_1.4.1          GenomicFeatures_1.22.13      bitops_1.0-6                
 [6] futile.options_1.0.0         tools_3.2.3                  zlibbioc_1.16.0              biomaRt_2.26.1               digest_0.6.9                
[11] BSgenome_1.38.0              shiny_0.13.2                 curl_0.9.7                   rtracklayer_1.30.4           httr_1.1.0                  
[16] R6_2.1.2                     XML_3.98-1.4                 BiocParallel_1.4.3           lambda.r_1.1.7               codetools_0.2-14            
[21] GenomicAlignments_1.6.3      htmltools_0.3.5              mime_0.4                     interactiveDisplayBase_1.8.0 xtable_1.8-2                
[26] httpuv_1.3.3                 RCurl_1.95-4.8  
Answer: Genome versions of the VcfFiles from AnnotationHub
3.6 years ago, Valerie Obenchain (United States) wrote:

Hi,

Thanks for the report. There were two bugs in the generation of the metadata. First, 'genome' was hard-coded as hg19. Second, the 'tags' values c("GRCh37", "GRCh38") were being recycled across the 8 files instead of being rep'd out (i.e., 4 GRCh37 files followed by 4 GRCh38).
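To illustrate the second bug, here is a minimal base-R sketch of the difference between recycling and rep'ing out. It is not the actual metadata-generation code; the variable names (builds, n_files, recycled, blocked) are made up for the example, and only the counts (8 files, 2 builds) come from this thread.

```r
# Hypothetical reconstruction of the tag bug; not the real metadata script.
builds  <- c("GRCh37", "GRCh38")
n_files <- 8

# Recycling: a length-2 vector stretched over 8 rows alternates the builds,
# which matches the alternating tags in the original listing.
recycled <- rep(builds, length.out = n_files)

# Intended layout: each build repeated for its own block of 4 files.
blocked <- rep(builds, each = n_files / length(builds))

# Side-by-side view, keyed by the hub IDs shown in the thread.
meta <- data.frame(id = paste0("AH", 50420:50427),
                   recycled = recycled,
                   blocked  = blocked)
print(meta)
```

The recycled column alternates GRCh37/GRCh38 row by row, while the blocked column gives 4 GRCh37 rows followed by 4 GRCh38 rows.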

The resources themselves are fine: they are saved as VCF files in S3, so no changes are needed there. I've fixed the metadata, and a snapshot date of

> hub <- AnnotationHub()
updating metadata: retrieving 1 resource
  |======================================================================| 100%
snapshotDate(): 2016-04-22


should produce the correct results:

> mcols(query(hub, 'VcfFile'))[c("genome", "tags")]
DataFrame with 8 rows and 2 columns
             genome               tags
        <character>        <character>
AH50420      GRCh37 dbSNP, GRCh37, VCF
AH50421      GRCh37 dbSNP, GRCh37, VCF
AH50422      GRCh37 dbSNP, GRCh37, VCF
AH50423      GRCh37 dbSNP, GRCh37, VCF
AH50424      GRCh38 dbSNP, GRCh38, VCF
AH50425      GRCh38 dbSNP, GRCh38, VCF
AH50426      GRCh38 dbSNP, GRCh38, VCF
AH50427      GRCh38 dbSNP, GRCh38, VCF
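If you want to double-check that the two fields agree, something along these lines should work. This is only a sketch in base R on a data frame typed in by hand from the listing above; mismatched_ids is a made-up helper name, not part of AnnotationHub. With a live hub, the same two columns would come from mcols(query(hub, 'VcfFile')).

```r
# Values copied by hand from the corrected listing in this thread.
meta <- data.frame(
  id     = paste0("AH", 50420:50427),
  genome = rep(c("GRCh37", "GRCh38"), each = 4),
  tags   = rep(c("dbSNP, GRCh37, VCF", "dbSNP, GRCh38, VCF"), each = 4),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the IDs whose 'genome' value does not
# appear among the comma-separated entries of 'tags'.
mismatched_ids <- function(df) {
  ok <- mapply(
    function(g, t) g %in% strsplit(t, ",\\s*")[[1]],
    df$genome, df$tags
  )
  df$id[!unname(ok)]
}

print(mismatched_ids(meta))
```

An empty result means every record's genome is also listed in its tags.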


Valerie
