Question: Metadata Missing from vcf when reading in with readVcfAsVRanges
0
gravatar for summerela
4.7 years ago by
summerela0
Seattle, WA, United States
summerela0 wrote:

Hello- 

It's me again. I am trying to read in the kaviar database (vcf format) using VariantAnnotation()'s readVcfAsVRanges. It reads in the file, but gives me an error and does not include any metadata information. 

Here is what the VCF looks like: 

##fileformat=VCFv4.1
##fileDate=20150211
##source=bin/makeVCF.pl
##reference=Kaviar-141127 (hg19)
##INFO=<ID=AF,Number=A,Type=Float,Description="Allele Frequency">
##INFO=<ID=AC,Number=A,Type=Integer,Description="Allele Count">
##INFO=<ID=AN,Number=1,Type=Integer,Description="Total number of alleles in data sources">
##INFO=<ID=END,Number=.,Type=Integer,Description="End position">
##INFO=<ID=DS,Number=A,Type=String,Description="Data Sources containing allele">
##contig=<ID=1,assembly="hg19">
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO
1       39998059        rs79139830      A       G       .       .       AF=0.0003573;AC=6;AN=16794;DS=phase3-ASW|phase3-LWK|phase3-PUR|phase3-YRI
1       39998084        rs115898201     C       A       .       .       AF=0.0014291;AC=24;AN=16794;DS=phase3-ACB|phase3-ASW|phase3-ESN|phase3-GWD|phase3-MSL|phase3-YRI
1       39998085        rs189533208     G       A       .       .       AF=5.95e-05;AC=1;AN=16794;DS=phase3-MXL
1       39998121        1:39998121_G/A  G       A       .       .       AF=5.95e-05;AC=1;AN=16794;DS=GS000012713
1       39998137        rs555254194     T       C       .       .       AF=5.95e-05;AC=1;AN=16794;DS=phase3-YRI
1       39998147        1:39998147_T/TA T       TA      .       .       AF=5.95e-05;AC=1;AN=16794;DS=GS000015708
1       39998158        1:39998158_C/CA C       CA      .       .       AF=5.95e-05;AC=1;AN=16794;DS=GS000009931
1       39998160        rs182185201     A       C       .       .       AF=0.0007741;AC=13;AN=16794;DS=Malay|UK10K|phase3-CDX|phase3-CHS|phase3-KHV|phase3-STU
1       39998190        1:39998190_G/A  G       A       .       .       AF=5.95e-05;AC=1;AN=16794;DS=UK10K

Here is my command, and the outcome: 

> ##read in edited kaviar vcf 
> kaviar <- readVcfAsVRanges(kaviar, humie_ref)
Warning message:
In .vcf_usertag(map, tag, ...) :
  ScanVcfParam ‘geno’ fields not present: ‘AD’
> kaviar
VRanges object with 242 ranges and 0 metadata columns:
        seqnames               ranges strand         ref              alt     totalDepth       refDepth       altDepth   sampleNames softFilterMatrix
           <Rle>            <IRanges>  <Rle> <character> <characterOrRle> <integerOrRle> <integerOrRle> <integerOrRle> <factorOrRle>         <matrix>
    [1]        1 [39998059, 39998059]      +           A                G           <NA>           <NA>           <NA>          <NA>                 
    [2]        1 [39998084, 39998084]      +           C                A           <NA>           <NA>           <NA>          <NA>                 
    [3]        1 [39998085, 39998085]      +           G                A           <NA>           <NA>           <NA>          <NA>                 
    [4]        1 [39998121, 39998121]      +           G                A           <NA>           <NA>           <NA>          <NA>                 
    [5]        1 [39998137, 39998137]      +           T                C           <NA>           <NA>           <NA>          <NA>                 
    ...      ...                  ...    ...         ...              ...            ...            ...            ...           ...              ...
  [238]        1 [39999275, 39999275]      +           T                A           <NA>           <NA>           <NA>          <NA>                 
  [239]        1 [39999288, 39999288]      +           A                T           <NA>           <NA>           <NA>          <NA>                 
  [240]        1 [39999318, 39999319]      +          AG                A           <NA>           <NA>           <NA>          <NA>                 
  [241]        1 [39999319, 39999320]      +          GG                G           <NA>           <NA>           <NA>          <NA>                 
  [242]        1 [39999330, 39999330]      +           A                G           <NA>           <NA>           <NA>          <NA>                 
  -------
  seqinfo: 1 sequence from hg19 genome; no seqlengths
  hardFilters: NULL

Thanks in advance for your help! 

> sessionInfo()
R version 3.1.2 (2014-10-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252    

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] stringi_0.3-1            BiocInstaller_1.16.1     VariantAnnotation_1.12.9 Rsamtools_1.18.2         Biostrings_2.34.1        XVector_0.6.0           
 [7] GenomicRanges_1.18.4     GenomeInfoDb_1.2.4       IRanges_2.0.1            S4Vectors_0.4.0          BiocGenerics_0.12.1     

loaded via a namespace (and not attached):
 [1] AnnotationDbi_1.28.1    base64enc_0.1-2         BatchJobs_1.5           BBmisc_1.9              Biobase_2.26.0          BiocParallel_1.0.3     
 [7] biomaRt_2.22.0          bitops_1.0-6            brew_1.0-6              BSgenome_1.34.1         checkmate_1.5.1         codetools_0.2-10       
[13] DBI_0.3.1               digest_0.6.8            fail_1.2                foreach_1.4.2           GenomicAlignments_1.2.1 GenomicFeatures_1.18.3 
[19] iterators_1.0.7         packrat_0.4.3           RCurl_1.95-4.5          RSQLite_1.0.0           rtracklayer_1.26.2      sendmailR_1.2-1        
[25] stringr_0.6.2           tools_3.1.2             XML_3.98-1.1            zlibbioc_1.12.0        

 

ADD COMMENTlink modified 4.7 years ago • written 4.7 years ago by summerela0
Answer: Metadata Missing from vcf when reading in with readVcfAsVRanges
1
gravatar for Valerie Obenchain
4.7 years ago by
United States
Valerie Obenchain6.7k wrote:

The warning says there is no 'AD' field in your file. You can import metadata fields by providing param=ScanVcfParam(). See ?ScanVcfParam for details on specifying fields and ?ScanVcfHeader for exploring header info.

Valerie

ADD COMMENTlink written 4.7 years ago by Valerie Obenchain6.7k
Answer: Metadata Missing from vcf when reading in with readVcfAsVRanges
1
gravatar for summerela
4.7 years ago by
summerela0
Seattle, WA, United States
summerela0 wrote:

Hi Valerie-

Thanks for your help. Using mcols() instead of info() for the VRanges object worked: 

##set reference 
humie_ref = "hg19"

#########################
## Read in Kaviar File ##
#########################
##add kaviar info fields
kav_params = VRangesScanVcfParam(info= c("AF", "AC", "AN", "DS"))

##read in kaviar vcf as VRange object
kaviar <- readVcfAsVRanges(Kaviar.vcf.gz, humie_ref, param=kav_params)

##rename chromosomes to match with vcf files
kaviar <- renameSeqlevels(kaviar, c("1"="chr1"))

##################################
## Gather VCF files to process  ## 
##################################
##read in everything but sample data for speediness
vcf_param = VRangesScanVcfParam(samples=NA)
vcf <- readVcfAsVRanges(test.vcf, humie_ref, param=vcf_param)

#################
## Match SNP's ##
#################
##create data frames of info to match on
vcf.df = data.frame(chr =as.character(seqnames(vcf)), start = start(vcf), end = end(vcf), ref = as.character(ref(vcf)), alt=alt(vcf), mcols(vcf), stringsAsFactors=FALSE)

kav.df = data.frame(chr =as.character(seqnames(kaviar)), start = start(kaviar), end = end(kaviar), ref = as.character(ref(kaviar)), alt=alt(kaviar), mcols(kaviar), stringsAsFactors=FALSE)
ADD COMMENTlink modified 4.7 years ago • written 4.7 years ago by summerela0

Hi,

Great. It looks like you have everything sorted out. Just a couple of FYIs,

- VRangesScanVcfParam() is deprecated so it's best to use ScanVcfParam()

- In a previous post you were using 'metadata' as an argument to ScanVcfParam which is not a valid. Only 'info', 'geno', 'samples' and 'fixed' are valid arguments. The man page ?ScanVcfParam describes the valid fields. Above you are using 'info' in VRangesScanVcfParam() so you've probably figured that out, I just wanted to confirm.

- 'AD' (allelic depth) is a genotype or FORMAT field. The warning you saw with readVcfAsVRanges() was a warning, not an error. The VRanges class is designed to hold variants and by default readVcfAsVRanges() tries to read in the AD field. It's ok if the file doesn't have the field.

- If you haven't tried it yet, scanVcfHeader(file) is useful to get a listing of all fields in the file. If you do

geno(scanVcfHeader(Kav_ranged.vcf.gz))

you'll see a DataFrame of geno information and 'AD' will not be listed.

Valerie

 

 

ADD REPLYlink written 4.7 years ago by Valerie Obenchain6.7k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 308 users visited in the last hour