Question: How to correct VCF IRanges from width=1 to the correct end coordinate
4.5 years ago by
tiffanie.moss0 wrote:

I need to extract the range coordinates from a VCF file. I've been working with a very large VCF file and have filtered it on several parameters, so that it is now a refined dataset. However, now that I want to extract the coordinates of the deletions, I find that the IRanges show up with a width of 1, almost as though they were SNPs rather than large deletion events. How can I get IRanges to recognize the true end coordinate of the deletions?

Here is a sample of the data, followed by my session info:

> rowData(delly.no11.depth10_50.377_5044.precise.pass)
GRanges object with 373 ranges and 5 metadata columns:
              seqnames                 ranges strand   | paramRangeID            REF             ALT      QUAL      FILTER
                 <Rle>              <IRanges>  <Rle>   |     <factor> <DNAStringSet> <CharacterList> <numeric> <character>
  DEL00119550        1 [  9903713,   9903713]      *   |         <NA>              N           <DEL>      <NA>        PASS
  DEL00139228        1 [ 11524865,  11524865]      *   |         <NA>              N           <DEL>      <NA>        PASS
  DEL00085052        1 [ 20398921,  20398921]      *   |         <NA>              N           <DEL>      <NA>        PASS
  DEL00051725        1 [117858481, 117858481]      *   |         <NA>              N           <DEL>      <NA>        PASS
  DEL00033442        1 [130125517, 130125517]      *   |         <NA>              N           <DEL>      <NA>        PASS

  seqinfo: 33 sequences from the genome; no seqlengths
> info(delly.no11.depth10_50.377_5044.precise.pass)
DataFrame with 373 rows and 15 columns
                    CIEND         CIPOS        CHR2       END        PE      MAPQ        SR       SRQ
            <IntegerList> <IntegerList> <character> <integer> <integer> <integer> <integer> <numeric>
DEL00119550        -18,18        -18,18           1   9905275        12        60         3  0.975248
DEL00139228      -121,121      -121,121           1  11525275        17        45         7  0.897321
DEL00085052        -10,10        -10,10           1  20399442        28        60         6  1.000000
DEL00051725          -8,8          -8,8           1 117861275        20        60         2  0.974227
DEL00033442      -103,103      -103,103           1 130126131        23        28        11  0.938710
                     CT IMPRECISE   PRECISE     SVLEN      SVTYPE         SVMETHOD
            <character> <logical> <logical> <integer> <character>      <character>
DEL00119550        3to5     FALSE      TRUE      1562         DEL EMBL.DELLYv0.5.6
DEL00139228        3to5     FALSE      TRUE       410         DEL EMBL.DELLYv0.5.6
DEL00085052        3to5     FALSE      TRUE       521         DEL EMBL.DELLYv0.5.6
DEL00051725        3to5     FALSE      TRUE      2794         DEL EMBL.DELLYv0.5.6
DEL00033442        3to5     FALSE      TRUE       614         DEL EMBL.DELLYv0.5.6
> sessionInfo()
R version 3.1.2 (2014-10-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)

[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252    

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] GenomicFeatures_1.18.2   AnnotationDbi_1.28.1     Biobase_2.26.0           ggplot2_1.0.0           
 [5] VariantAnnotation_1.12.4 Rsamtools_1.18.2         GenomicRanges_1.18.3     GenomeInfoDb_1.2.3      
 [9] Biostrings_2.34.0        XVector_0.6.0            IRanges_2.0.0            S4Vectors_0.4.0         
[13] BiocGenerics_0.12.1     

loaded via a namespace (and not attached):
 [1] base64enc_0.1-2         BatchJobs_1.5           BBmisc_1.8              BiocParallel_1.0.0      biomaRt_2.22.0         
 [6] bitops_1.0-6            brew_1.0-6              BSgenome_1.34.0         checkmate_1.5.0         codetools_0.2-9        
[11] colorspace_1.2-4        DBI_0.3.1               digest_0.6.4            fail_1.2                foreach_1.4.2          
[16] GenomicAlignments_1.2.1 grid_3.1.2              gtable_0.1.2            iterators_1.0.7         MASS_7.3-35            
[21] munsell_0.4.2           plyr_1.8.1              proto_0.3-10            Rcpp_0.11.3             RCurl_1.95-4.3         
[26] reshape2_1.4            RSQLite_1.0.0           rtracklayer_1.26.2      scales_0.2.4            sendmailR_1.2-1        
[31] stringr_0.6.2           tools_3.1.2             XML_3.98-1.1            zlibbioc_1.12.0        

Can you also show the code that you used to create/import the SV calls as a VCF in R?

written 4.5 years ago by Julian Gehring

Hi Tiffanie, I see deletions of width 1 in your VCF object. Why do you think the end coordinates are wrong and need to be corrected? What do you mean by "true end coordinate of the deletion"? Thanks.  H.

written 4.4 years ago by Hervé Pagès
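
For what it's worth, the output in the question suggests where the missing coordinate lives: the ranges come from the POS column, while the deletion end is stored in the END field of info(), and in every row shown END minus the range start equals SVLEN. Below is a minimal sketch, not a confirmed answer from this thread, that rebuilds the first three records by hand and overwrites the range ends with END. The object names (pos, info_end, svlen, gr, del) are made up for illustration, and the commented-out lines at the bottom assume a VCF object named vcf.

```r
library(GenomicRanges)  # also attaches IRanges

# Toy data copied from the first three records in the question.
ids      <- c("DEL00119550", "DEL00139228", "DEL00085052")
pos      <- c(9903713L, 11524865L, 20398921L)   # start of each range (from POS)
info_end <- c(9905275L, 11525275L, 20399442L)   # END column of info()
svlen    <- c(1562L, 410L, 521L)                # SVLEN column of info()

# Width-1 ranges, mimicking what rowData() reports for these calls.
gr <- GRanges(seqnames = "1",
              ranges   = IRanges(start = pos, width = 1L))
names(gr) <- ids

# Replace the end coordinates with END; the start coordinates are untouched.
del <- gr
end(del) <- info_end

# Ranges are closed intervals, so width = end - start + 1 = SVLEN + 1 here.
print(width(del))

# With a real VCF object (assumed to be called `vcf`) the same idea would be:
#   gr <- rowData(vcf)        # rowRanges(vcf) in newer VariantAnnotation releases
#   end(gr) <- info(vcf)$END
```

Whether the resulting interval should be shifted by one base at either end depends on how the caller defines POS and END for a deletion, so that is worth checking against the DELLY documentation before using the ranges downstream.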
