I am working in R with the VariantAnnotation package and the vcf file from the ExAC consortium, ExAC.r0.3.1.sites.vep.vcf (release 0.3). My aim is to extract simple data frames for the "AC_Adj" and "AN_Adj" info columns. I have used the below code to achieve this:
# define paramaters on how to filter the vcf file AC.adj.param <- ScanVcfParam(info="AC_Adj") AN.adj.param <- ScanVcfParam(info="AN_Adj") # load ALL allele counts (AN) and alt allele counts (AC) from exac vcf file raw.exac.AC.adj. <- readVcf("/Volumes/Emilly\ 1TB/ExAC.r0.3.1.sites.vep.vcf", "hg19", param=AC.adj.param) raw.exac.AN.adj. <- readVcf("/Volumes/Emilly\ 1TB/ExAC.r0.3.1.sites.vep.vcf", "hg19", param=AN.adj.param) # extract just the allele counts and corressponding chr location with allele tags from exac vcf file - in dataframe/s4 class sclass.exac.AC.adj <- (info(raw.exac.AC.adj.)) sclass.exac.AN.adj <- (info(raw.exac.AN.adj.)) # make the allele counts into a plain data frame - chr positions are only row names and not their own column df.exac.AC.adj <- as.data.frame(sclass.exac.AC.adj) df.exac.AN.adj <- as.data.frame(sclass.exac.AN.adj) # make the chr positions an actual column and remove the row names to make it cleaner df.exac.AC.adj$chr.position <- row.names(df.exac.AC.adj) df.exac.AN.adj$chr.position <- row.names(df.exac.AN.adj) row.names(df.exac.AC.adj) <- NULL row.names(df.exac.AN.adj) <- NULL
However, while most of the row names are correct, some of the row names are returned are rsIDs rather than chr:position.
Essentially, a subset of the data returned looks like this:
AC_Adj chr.position
70 2 1:69395
71 9 1:69409
72 1985 rs140739101
73 4 1:69438
74 7 rs142004627
75 1 1:69474
76 1 1:69478
77 1 1:69486
78 2 1:69487
79 10 1:69489
80 59 rs150690004
While I would like the returned data frame to look like this:
AC_Adj chr.position
70 2 1:69395
71 9 1:69409
72 1985 1:69428
73 4 1:69438
74 7 1:69453
75 1 1:69474
76 1 1:69478
77 1 1:69486
78 2 1:69487
79 10 1:69489
80 59 1:69496
I believe there is some error in my code which is introducing the bug but have not been able to identify it and would greatly appreciate any input as to why this is happening. There may also be a more efficient way to achieve my goal that I have overlooked. Many thanks for any help anyone can provide.
Thank you , that very elegantly solves my issue. I will keep that in mind in future.