Question: Add data to VCF INFO field

Moiz Bootwalla (United States) wrote, 3.9 years ago:

I have a VCF file to which I'm trying to add annotations. How can I add fields to the info DataFrame? Is there a helper function that does this in a straightforward way, or do I need to manipulate the info DataFrame directly?

Thanks,

Moiz

> sessionInfo()
R version 3.1.1 (2014-07-10)
Platform: x86_64-suse-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] RMySQL_0.9-3                            AnnotationForge_1.8.1                   human.db0_3.0.0                        
 [4] Homo.sapiens_1.1.2                      TxDb.Hsapiens.UCSC.hg19.knownGene_3.0.0 org.Hs.eg.db_3.0.0                     
 [7] GO.db_3.0.0                             RSQLite_0.11.4                          DBI_0.3.1                              
[10] OrganismDbi_1.8.0                       GenomicFeatures_1.18.1                  AnnotationDbi_1.28.0                   
[13] Biobase_2.26.0                          VariantAnnotation_1.12.1                Rsamtools_1.18.0                       
[16] Biostrings_2.34.0                       XVector_0.6.0                           GenomicRanges_1.18.1                   
[19] GenomeInfoDb_1.2.0                      IRanges_2.0.0                           S4Vectors_0.4.0                        
[22] BiocGenerics_0.12.0                     BiocInstaller_1.16.0                   

loaded via a namespace (and not attached):
 [1] base64enc_0.1-2         BatchJobs_1.4           BBmisc_1.7              BiocParallel_1.0.0      biomaRt_2.22.0         
 [6] bitops_1.0-6            brew_1.0-6              BSgenome_1.34.0         checkmate_1.5.0         codetools_0.2-8        
[11] digest_0.6.4            evaluate_0.5.5          fail_1.2                foreach_1.4.2           formatR_1.0            
[16] GenomicAlignments_1.2.0 graph_1.44.0            iterators_1.0.7         knitr_1.7               RBGL_1.42.0            
[21] RCurl_1.95-4.3          rtracklayer_1.26.1      sendmailR_1.2-1         stringr_0.6.2           tools_3.1.1            
[26] XML_3.98-1.1            yaml_2.1.13             zlibbioc_1.12.0

 

Valerie Obenchain (United States) answered, 3.9 years ago:

Hi,

You can use the info() getter and setter. All getters and setters for the VCF class are listed on the ?VCF man page.

library(VariantAnnotation)
fl <- system.file("extdata", "ex2.vcf", package="VariantAnnotation") 

vcf <- readVcf(fl, "hg19")

> names(info(vcf))
[1] "NS" "DP" "AF" "AA" "DB" "H2"

Use the standard '$' operator to add a variable. You'll get a warning that the new field has no corresponding header line.

> info(vcf)$newVar <- 1:5
Warning message:
info fields with no header: newVar 

> names(info(vcf))
[1] "NS"     "DP"     "AF"     "AA"     "DB"     "H2"     "newVar"
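
A new INFO column must supply exactly one value per variant (one per row of the VCF; ex2.vcf has 5). The sketch below derives a field from existing ones instead of using a constant vector. The field name `meanDP` and the choice of DP / NS are hypothetical, picked only for illustration. Expect the same "no header" warning as above.

```r
library(VariantAnnotation)

fl <- system.file("extdata", "ex2.vcf", package = "VariantAnnotation")
vcf <- readVcf(fl, "hg19")

# Existing per-variant INFO values (both are Number=1 Integer fields)
dp <- info(vcf)$DP
ns <- info(vcf)$NS

# The new column needs one value per variant (row)
stopifnot(length(dp) == nrow(vcf))

# Hypothetical field: total depth divided by number of samples with data.
# This triggers the "info fields with no header" warning.
info(vcf)$meanDP <- dp / ns

names(info(vcf))
```

Because the result is a ratio, a matching header line for `meanDP` would use Type "Float" rather than "Integer".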

You can also add a row for 'newVar' to the header. The header is accessed with header(), and its INFO lines with info(header(vcf)):

> info(header(vcf))
DataFrame with 6 rows and 3 columns
        Number        Type                 Description
   <character> <character>                 <character>
NS           1     Integer Number of Samples With Data
DP           1     Integer                 Total Depth
AF           A       Float            Allele Frequency
AA           1      String            Ancestral Allele
DB           0        Flag dbSNP membership, build 129
H2           0        Flag          HapMap2 membership
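
Adding that header row can be sketched as follows. This assumes a VariantAnnotation release that provides an `info<-` setter for VCFHeader objects and a `header<-` setter for VCF objects (current releases document these on ?VCFHeader and ?VCF; they may not be present in 1.12.1 from the sessionInfo above). It also assumes the INFO header DataFrame has exactly the three columns Number, Type and Description shown above. The Description text is a placeholder.

```r
library(VariantAnnotation)

fl <- system.file("extdata", "ex2.vcf", package = "VariantAnnotation")
vcf <- readVcf(fl, "hg19")
info(vcf)$newVar <- 1:5  # warns: no header line yet

# One header row per INFO field; the row name must equal the field name.
# Number "1" = one value per variant; Type "Integer" matches 1:5.
# Columns are character, like the existing header DataFrame.
newInfo <- DataFrame(Number = "1", Type = "Integer",
                     Description = "Example field added by hand",
                     row.names = "newVar")

# Assumes info<- exists for VCFHeader and header<- for VCF
hdr <- header(vcf)
info(hdr) <- rbind(info(hdr), newInfo)
header(vcf) <- hdr

info(header(vcf))
```

Declaring the field this way keeps the object self-describing, so a later writeVcf() call can emit a matching ##INFO line for newVar.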

To remove variables instead of adding them, subset with '[':

> info(vcf) <- info(vcf)[,1:2]
> info(vcf)
DataFrame with 5 rows and 2 columns
                      NS        DP
               <integer> <integer>
rs6054257              3        14
20:17330_T/A           3        11
rs6040355              2        10
20:1230237_T/.         3        13
microsat1              3         9
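
Subsetting by position, as above, depends on column order. The sketch below drops fields by name instead (DB and H2 are chosen arbitrarily). `drop = FALSE` keeps the result a DataFrame even if only one column survives. This sketch changes only the data; it does not touch the corresponding ##INFO rows in info(header(vcf)).

```r
library(VariantAnnotation)

fl <- system.file("extdata", "ex2.vcf", package = "VariantAnnotation")
vcf <- readVcf(fl, "hg19")

# Fields to remove, selected by name rather than by position
drop_fields <- c("DB", "H2")
keep <- setdiff(names(info(vcf)), drop_fields)

# drop = FALSE so a single remaining column is not collapsed to a vector
info(vcf) <- info(vcf)[, keep, drop = FALSE]

names(info(vcf))
```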

 

Valerie

Moiz Bootwalla replied, 3.9 years ago:

Thanks Valerie. This is exactly what I was looking for.