Is it possible to speed up VariantAnnotation's rowRanges function when converting to a dataframe?
Ahdee ▴ 50
@ahdee-8938

Hi, I'm trying to convert a VCF to a data frame using the rowRanges function.

df <- data.frame(rowRanges(vcf))

It works; however, it's extremely slow, taking about 1-3 minutes per VCF file, whereas opening the VCF file itself takes only seconds.

Does anyone know of a more efficient way to do this?

Thank you.

@james-w-macdonald-5106

Do you actually need a data.frame? Conversion to a DataFrame is much faster, and it's a very similar object from the user's perspective.

> library(VariantAnnotation)
> fl <- system.file("extdata", "chr22.vcf.gz", package="VariantAnnotation")
> vcf <- readVcf(fl, "hg19")
> system.time(data.frame(rowRanges(vcf)))
   user  system elapsed 
 18.122   0.327  18.447 
> system.time(as(rowRanges(vcf), "DataFrame"))
   user  system elapsed 
  0.011   0.000   0.011 

> as(rowRanges(vcf), "DataFrame")
DataFrame with 10376 rows and 6 columns
                      X paramRangeID            REF                ALT
              <GRanges>     <factor> <DNAStringSet> <DNAStringSetList>
rs7410291   22:50300078           NA              A                  G
rs147922003 22:50300086           NA              C                  T
rs114143073 22:50300101           NA              G                  A
rs141778433 22:50300113           NA              C                  T
rs182170314 22:50300166           NA              C                  T
...                 ...          ...            ...                ...
rs187302552 22:50999536           NA              A                  G
rs9628178   22:50999538           NA              A                  G
rs5770892   22:50999681           NA              A                  G
rs144055359 22:50999830           NA              G                  A
rs114526001 22:50999964           NA              G                  C
                 QUAL      FILTER
            <numeric> <character>
rs7410291         100        PASS
rs147922003       100        PASS
rs114143073       100        PASS
rs141778433       100        PASS
rs182170314       100        PASS
...               ...         ...
rs187302552       100        PASS
rs9628178         100        PASS
rs5770892         100        PASS
rs144055359       100        PASS
rs114526001       100        PASS
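
If a base data.frame really is required, one possible workaround is to build it column by column from the accessors instead of calling data.frame() on the whole GRanges. The following is only a sketch, not something from the answer above: the column names are my own choice, multi-allelic ALT entries are collapsed into a comma-separated string (an assumption about what is wanted downstream), and I have not timed it against the example file.

```r
library(VariantAnnotation)

fl  <- system.file("extdata", "chr22.vcf.gz", package = "VariantAnnotation")
vcf <- readVcf(fl, "hg19")
rr  <- rowRanges(vcf)

# Pull each column out with its accessor and convert it to a plain
# atomic vector before handing it to data.frame().
df <- data.frame(
  id       = names(rr),                  # kept as a column, since IDs may repeat
  seqnames = as.character(seqnames(rr)),
  start    = start(rr),
  end      = end(rr),
  REF      = as.character(rr$REF),
  # ALT is a list-like column; collapse each element to "A,G"-style text
  # (assumption: a single string per variant is acceptable)
  ALT      = as.character(unstrsplit(rr$ALT, sep = ",")),
  QUAL     = rr$QUAL,
  FILTER   = rr$FILTER,
  stringsAsFactors = FALSE
)
```

Wrapping the data.frame() call in system.time() shows whether this helps on your own files.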