Question: Add data to Vcf Info Field
I have a VCF file to which I'm trying to add annotations. How can I add fields to the info DataFrame? Is there a helper function that does this in a straightforward manner, or do I need to manipulate the info DataFrame directly?



> sessionInfo()
R version 3.1.1 (2014-07-10)
Platform: x86_64-suse-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                 

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] RMySQL_0.9-3                            AnnotationForge_1.8.1                   human.db0_3.0.0                        
 [4] Homo.sapiens_1.1.2                      TxDb.Hsapiens.UCSC.hg19.knownGene_3.0.0                     
 [7] GO.db_3.0.0                             RSQLite_0.11.4                          DBI_0.3.1                              
[10] OrganismDbi_1.8.0                       GenomicFeatures_1.18.1                  AnnotationDbi_1.28.0                   
[13] Biobase_2.26.0                          VariantAnnotation_1.12.1                Rsamtools_1.18.0                       
[16] Biostrings_2.34.0                       XVector_0.6.0                           GenomicRanges_1.18.1                   
[19] GenomeInfoDb_1.2.0                      IRanges_2.0.0                           S4Vectors_0.4.0                        
[22] BiocGenerics_0.12.0                     BiocInstaller_1.16.0                   

loaded via a namespace (and not attached):
 [1] base64enc_0.1-2         BatchJobs_1.4           BBmisc_1.7              BiocParallel_1.0.0      biomaRt_2.22.0         
 [6] bitops_1.0-6            brew_1.0-6              BSgenome_1.34.0         checkmate_1.5.0         codetools_0.2-8        
[11] digest_0.6.4            evaluate_0.5.5          fail_1.2                foreach_1.4.2           formatR_1.0            
[16] GenomicAlignments_1.2.0 graph_1.44.0            iterators_1.0.7         knitr_1.7               RBGL_1.42.0            
[21] RCurl_1.95-4.3          rtracklayer_1.26.1      sendmailR_1.2-1         stringr_0.6.2           tools_3.1.1            
[26] XML_3.98-1.1            yaml_2.1.13             zlibbioc_1.12.0


written 3.4 years ago by Moiz Bootwalla 50
3.4 years ago by
Valerie Obenchain ♦♦ 6.4k
You can use the 'info' getter and setter. A list of all getters and setters for the VCF class is on the ?VCF man page.

> library(VariantAnnotation)
> fl <- system.file("extdata", "ex2.vcf", package="VariantAnnotation")
> vcf <- readVcf(fl, "hg19")

> names(info(vcf))
[1] "NS" "DP" "AF" "AA" "DB" "H2"

Use the standard '$' on info(vcf) to add a variable. You'll see a warning because the header has no corresponding entry for it.

> info(vcf)$newVar <- 1:5
Warning message:
info fields with no header: newVar 

> names(info(vcf))
[1] "NS"     "DP"     "AF"     "AA"     "DB"     "H2"     "newVar"

You can add a line to the header DataFrame for 'newVar'. The header is accessed with header(), and its INFO lines with info(header(vcf)):

> info(header(vcf))
DataFrame with 6 rows and 3 columns
        Number        Type                 Description
   <character> <character>                 <character>
NS           1     Integer Number of Samples With Data
DP           1     Integer                 Total Depth
AF           A       Float            Allele Frequency
AA           1      String            Ancestral Allele
DB           0        Flag dbSNP membership, build 129
H2           0        Flag          HapMap2 membership
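
A minimal sketch of adding such a header line for 'newVar', assuming a reasonably current VariantAnnotation in which the VCFHeader object supports the info() replacement method. The Description text is a made-up placeholder, and the Number and Type values are assumptions chosen to match the integer column added above. Adding the header entry before the data column should avoid the "no header" warning.

```r
library(VariantAnnotation)

fl <- system.file("extdata", "ex2.vcf", package = "VariantAnnotation")
vcf <- readVcf(fl, "hg19")

# One-row DataFrame describing the new INFO field; the row name is the field ID.
# The Description string is a hypothetical placeholder.
newInfo <- DataFrame(Number = "1",
                     Type = "Integer",
                     Description = "Example annotation (placeholder)",
                     row.names = "newVar")

# Append the new line to the existing INFO header lines.
info(header(vcf)) <- rbind(info(header(vcf)), newInfo)

# Add the matching data column, one value per variant.
info(vcf)$newVar <- 1:5

rownames(info(header(vcf)))
```

The same pattern applies to other field types: change Number and Type to match the values being stored (for example "A" and "Float" for a per-allele frequency).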

To remove variables rather than add them, use '[':

> info(vcf) <- info(vcf)[,1:2]
> info(vcf)
DataFrame with 5 rows and 2 columns
                      NS        DP
               <integer> <integer>
rs6054257              3        14
20:17330_T/A           3        11
rs6040355              2        10
20:1230237_T/.         3        13
microsat1              3         9
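
Selecting columns by position depends on their order in the file. As a sketch of the same removal done by name, assuming a freshly read copy of the example file (the variable name 'keep' is only illustrative):

```r
library(VariantAnnotation)

fl <- system.file("extdata", "ex2.vcf", package = "VariantAnnotation")
vcf <- readVcf(fl, "hg19")

# Names of the INFO fields to retain; everything else is dropped.
keep <- c("NS", "DP")

# drop = FALSE keeps the result a DataFrame even if only one column is selected.
info(vcf) <- info(vcf)[, keep, drop = FALSE]

names(info(vcf))
```

This only changes the data columns; the header lines for the dropped fields are left as they were.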







Thanks Valerie. This is exactly what I was looking for.

written 3.4 years ago by Moiz Bootwalla 50
