VariantAnnotation CollapsedVCF trouble
2
0
Entering edit mode
@fabian-grammes-6591
Last seen 4.2 years ago

Dear All

I'm trying to access columns from the CollapsedVCF object generated by VariantAnnotation::readVcf. I have data for 3847 SNPs in the object, but when sub-setting I get 2 additional entries which I don't know where they come from. Looking at the CollapsedVCF object I can see that there seems to be inconsistency in the length listData (see below). 

Help would be much appreciated!

cheers, Fabian 

> dim(my_vcf)
[1] 3847  190
> dim(info(my_vcf))
[1] 3847   20
> length(unlist(info(my_vcf)$AF))
[1] 3849

str(info(my_vcf))
Formal class 'DataFrame' [package "S4Vectors"] with 6 slots
  ..@ rownames       : NULL
  ..@ nrows          : int 3847
  ..@ listData       :List of 20
  .. ..$ AC             :Formal class 'CompressedIntegerList' [package "IRanges"] with 5 slots
  .. .. .. ..@ elementType    : chr "integer"
  .. .. .. ..@ elementMetadata: NULL
  .. .. .. ..@ metadata       : list()
  .. .. .. ..@ partitioning   :Formal class 'PartitioningByEnd' [package "IRanges"] with 5 slots
  .. .. .. .. .. ..@ end            : int [1:3847] 1 2 3 4 5 6 7 8 9 10 ...
  .. .. .. .. .. ..@ NAMES          : NULL
  .. .. .. .. .. ..@ elementType    : chr "integer"
  .. .. .. .. .. ..@ elementMetadata: NULL
  .. .. .. .. .. ..@ metadata       : list()
  .. .. .. ..@ unlistData     : int [1:3849] 234 27 15 195 96 89 121 79 6 3 ...
  .. ..$ AF             :Formal class 'CompressedNumericList' [package "IRanges"] with 5 slots
  .. .. .. ..@ elementType    : chr "numeric"
  .. .. .. ..@ elementMetadata: NULL
  .. .. .. ..@ metadata       : list()
  .. .. .. ..@ partitioning   :Formal class 'PartitioningByEnd' [package "IRanges"] with 5 slots
  .. .. .. .. .. ..@ end            : int [1:3847] 1 2 3 4 5 6 7 8 9 10 ...
  .. .. .. .. .. ..@ NAMES          : NULL
  .. .. .. .. .. ..@ elementType    : chr "integer"
  .. .. .. .. .. ..@ elementMetadata: NULL
  .. .. .. .. .. ..@ metadata       : list()
  .. .. .. ..@ unlistData     : num [1:3849] 0.616 0.082 0.04 0.595 0.471 ...
  .. ..$ AN             : int [1:3847] 380 330 374 328 204 200 140 314 348 324 ...
  .. ..$ BaseQRankSum   : num [1:3847] 0.337 0.715 0.361 0 0 0.727 -0.684 0.55 0.054 0.406 ...
  .. ..$ ClippingRankSum: num [1:3847] 0.06 -0.528 0.358 0.278 0.358 0.72 0.48 0.322 0.369 0.736 ...
  .. ..$ DP             : int [1:3847] 3962 5270 1444 958 342 321 249 609 4623 2274 ...
  .. ..$ DS             : logi [1:3847] FALSE FALSE FALSE FALSE FALSE FALSE ...
  .. ..$ END            : int [1:3847] NA NA NA NA NA NA NA NA NA NA ...
  .. ..$ ExcessHet      : num [1:3847] 9.3468 2.9037 4.4809 0.0002 0 ...
  .. ..$ FS             : num [1:3847] 0 2.51 3.32 0 6.41 ...
  .. ..$ HaplotypeScore : num [1:3847] NA NA NA NA NA NA NA NA NA NA ...
  .. ..$ InbreedingCoeff: num [1:3847] -0.0929 -0.0686 -0.0657 0.205 0.3577 ...
  .. ..$ MLEAC          :Formal class 'CompressedIntegerList' [package "IRanges"] with 5 slots
  .. .. .. ..@ elementType    : chr "integer"
  .. .. .. ..@ elementMetadata: NULL
  .. .. .. ..@ metadata       : list()
  .. .. .. ..@ partitioning   :Formal class 'PartitioningByEnd' [package "IRanges"] with 5 slots
  .. .. .. .. .. ..@ end            : int [1:3847] 1 2 3 4 5 6 7 8 9 10 ...
  .. .. .. .. .. ..@ NAMES          : NULL
  .. .. .. .. .. ..@ elementType    : chr "integer"
  .. .. .. .. .. ..@ elementMetadata: NULL
  .. .. .. .. .. ..@ metadata       : list()
  .. .. .. ..@ unlistData     : int [1:3849] 234 29 16 209 108 105 122 88 6 3 ...
  .. ..$ MLEAF          :Formal class 'CompressedNumericList' [package "IRanges"] with 5 slots
  .. .. .. ..@ elementType    : chr "numeric"
  .. .. .. ..@ elementMetadata: NULL
  .. .. .. ..@ metadata       : list()
  .. .. .. ..@ partitioning   :Formal class 'PartitioningByEnd' [package "IRanges"] with 5 slots
  .. .. .. .. .. ..@ end            : int [1:3847] 1 2 3 4 5 6 7 8 9 10 ...
  .. .. .. .. .. ..@ NAMES          : NULL
  .. .. .. .. .. ..@ elementType    : chr "integer"
  .. .. .. .. .. ..@ elementMetadata: NULL
  .. .. .. .. .. ..@ metadata       : list()
  .. .. .. ..@ unlistData     : num [1:3849] 0.616 0.088 0.043 0.637 0.529 ...
  .. ..$ MQ             : num [1:3847] 60 60 60 59.5 55.1 ...
  .. ..$ MQRankSum      : num [1:3847] -0.025 -0.082 0 0.421 0.358 0.358 0.48 0.55 0.406 -0.718 ...
  .. ..$ QD             : num [1:3847] 18.3 21.4 15 25.7 28.1 ...
  .. ..$ RAW_MQ         : num [1:3847] NA NA NA NA NA NA NA NA NA NA ...
  .. ..$ ReadPosRankSum : num [1:3847] 0.127 0.736 0.406 0 0.736 0.358 0.736 0.55 -0.553 -0.042 ...
  .. ..$ SOR            : num [1:3847] 0.669 0.663 0.87 0.632 0.306 0.768 0.916 0.563 0.564 1.19 ...
  ..@ elementType    : chr "ANY"
  ..@ elementMetadata: NULL
  ..@ metadata       : list()

 

sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.9.5 (Mavericks)

locale:
[1] C

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets
[8] methods   base     

other attached packages:
 [1] lme4_1.1-12                snpStats_1.22.0           
 [3] Matrix_1.2-6               survival_2.39-5           
 [5] VariantAnnotation_1.18.7   Rsamtools_1.24.0          
 [7] Biostrings_2.40.2          XVector_0.12.1            
 [9] SummarizedExperiment_1.2.3 Biobase_2.32.0            
[11] GenomicRanges_1.24.2       GenomeInfoDb_1.8.3        
[13] IRanges_2.6.1              S4Vectors_0.10.2          
[15] BiocGenerics_0.18.0        Ssa.RefSeq.db_1.2         
[17] RSQLite_1.0.0              DBI_0.4-1                 
[19] limma_3.28.17              dplyr_0.5.0               
[21] plyr_1.8.4                 openxlsx_3.0.0            

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.6             nloptr_1.0.4            compiler_3.3.1         
 [4] GenomicFeatures_1.24.5  bitops_1.0-6            tools_3.3.1            
 [7] zlibbioc_1.18.0         biomaRt_2.28.0          nlme_3.1-128           
[10] tibble_1.1              BSgenome_1.40.1         lattice_0.20-33        
[13] rtracklayer_1.32.2      grid_3.3.1              R6_2.1.2               
[16] AnnotationDbi_1.34.4    XML_3.98-1.4            BiocParallel_1.6.3     
[19] minqa_1.2.4             magrittr_1.5            MASS_7.3-45            
[22] GenomicAlignments_1.8.4 splines_3.3.1           assertthat_0.1         
[25] RCurl_1.95-4.8          lazyeval_0.2.0   

 

 

variantannotation subsetting • 1.1k views
ADD COMMENT
1
Entering edit mode
@michael-lawrence-3846
Last seen 3.0 years ago
United States

Presumably there are 2 rows with 2 alts and thus 2 AF values.

You can check for this with code like:

which(lengths(info(my_vcf)$AF) > 1)

And you could use expand(my_vcf) to get an object where there is only one alt per row.

ADD COMMENT
0
Entering edit mode
@fabian-grammes-6591
Last seen 4.2 years ago

Thanks that solves it ! Found two entries with 2 AF values. Really gave me a headache


 

ADD COMMENT

Login before adding your answer.

Traffic: 619 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6