Question: VariantAnnotation CollapsedVCF trouble
0
gravatar for Fabian Grammes
3.1 years ago by
Fabian Grammes20 wrote:

Dear All

I'm trying to access columns from the CollapsedVCF object generated by VariantAnnotation::readVcf. I have data for 3847 SNPs in the object, but when sub-setting I get 2 additional entries which I don't know where they come from. Looking at the CollapsedVCF object I can see that there seems to be inconsistency in the length listData (see below). 

Help would be much appreciated!

cheers, Fabian 

> dim(my_vcf)
[1] 3847  190
> dim(info(my_vcf))
[1] 3847   20
> length(unlist(info(my_vcf)$AF))
[1] 3849

str(info(my_vcf))
Formal class 'DataFrame' [package "S4Vectors"] with 6 slots
  ..@ rownames       : NULL
  ..@ nrows          : int 3847
  ..@ listData       :List of 20
  .. ..$ AC             :Formal class 'CompressedIntegerList' [package "IRanges"] with 5 slots
  .. .. .. ..@ elementType    : chr "integer"
  .. .. .. ..@ elementMetadata: NULL
  .. .. .. ..@ metadata       : list()
  .. .. .. ..@ partitioning   :Formal class 'PartitioningByEnd' [package "IRanges"] with 5 slots
  .. .. .. .. .. ..@ end            : int [1:3847] 1 2 3 4 5 6 7 8 9 10 ...
  .. .. .. .. .. ..@ NAMES          : NULL
  .. .. .. .. .. ..@ elementType    : chr "integer"
  .. .. .. .. .. ..@ elementMetadata: NULL
  .. .. .. .. .. ..@ metadata       : list()
  .. .. .. ..@ unlistData     : int [1:3849] 234 27 15 195 96 89 121 79 6 3 ...
  .. ..$ AF             :Formal class 'CompressedNumericList' [package "IRanges"] with 5 slots
  .. .. .. ..@ elementType    : chr "numeric"
  .. .. .. ..@ elementMetadata: NULL
  .. .. .. ..@ metadata       : list()
  .. .. .. ..@ partitioning   :Formal class 'PartitioningByEnd' [package "IRanges"] with 5 slots
  .. .. .. .. .. ..@ end            : int [1:3847] 1 2 3 4 5 6 7 8 9 10 ...
  .. .. .. .. .. ..@ NAMES          : NULL
  .. .. .. .. .. ..@ elementType    : chr "integer"
  .. .. .. .. .. ..@ elementMetadata: NULL
  .. .. .. .. .. ..@ metadata       : list()
  .. .. .. ..@ unlistData     : num [1:3849] 0.616 0.082 0.04 0.595 0.471 ...
  .. ..$ AN             : int [1:3847] 380 330 374 328 204 200 140 314 348 324 ...
  .. ..$ BaseQRankSum   : num [1:3847] 0.337 0.715 0.361 0 0 0.727 -0.684 0.55 0.054 0.406 ...
  .. ..$ ClippingRankSum: num [1:3847] 0.06 -0.528 0.358 0.278 0.358 0.72 0.48 0.322 0.369 0.736 ...
  .. ..$ DP             : int [1:3847] 3962 5270 1444 958 342 321 249 609 4623 2274 ...
  .. ..$ DS             : logi [1:3847] FALSE FALSE FALSE FALSE FALSE FALSE ...
  .. ..$ END            : int [1:3847] NA NA NA NA NA NA NA NA NA NA ...
  .. ..$ ExcessHet      : num [1:3847] 9.3468 2.9037 4.4809 0.0002 0 ...
  .. ..$ FS             : num [1:3847] 0 2.51 3.32 0 6.41 ...
  .. ..$ HaplotypeScore : num [1:3847] NA NA NA NA NA NA NA NA NA NA ...
  .. ..$ InbreedingCoeff: num [1:3847] -0.0929 -0.0686 -0.0657 0.205 0.3577 ...
  .. ..$ MLEAC          :Formal class 'CompressedIntegerList' [package "IRanges"] with 5 slots
  .. .. .. ..@ elementType    : chr "integer"
  .. .. .. ..@ elementMetadata: NULL
  .. .. .. ..@ metadata       : list()
  .. .. .. ..@ partitioning   :Formal class 'PartitioningByEnd' [package "IRanges"] with 5 slots
  .. .. .. .. .. ..@ end            : int [1:3847] 1 2 3 4 5 6 7 8 9 10 ...
  .. .. .. .. .. ..@ NAMES          : NULL
  .. .. .. .. .. ..@ elementType    : chr "integer"
  .. .. .. .. .. ..@ elementMetadata: NULL
  .. .. .. .. .. ..@ metadata       : list()
  .. .. .. ..@ unlistData     : int [1:3849] 234 29 16 209 108 105 122 88 6 3 ...
  .. ..$ MLEAF          :Formal class 'CompressedNumericList' [package "IRanges"] with 5 slots
  .. .. .. ..@ elementType    : chr "numeric"
  .. .. .. ..@ elementMetadata: NULL
  .. .. .. ..@ metadata       : list()
  .. .. .. ..@ partitioning   :Formal class 'PartitioningByEnd' [package "IRanges"] with 5 slots
  .. .. .. .. .. ..@ end            : int [1:3847] 1 2 3 4 5 6 7 8 9 10 ...
  .. .. .. .. .. ..@ NAMES          : NULL
  .. .. .. .. .. ..@ elementType    : chr "integer"
  .. .. .. .. .. ..@ elementMetadata: NULL
  .. .. .. .. .. ..@ metadata       : list()
  .. .. .. ..@ unlistData     : num [1:3849] 0.616 0.088 0.043 0.637 0.529 ...
  .. ..$ MQ             : num [1:3847] 60 60 60 59.5 55.1 ...
  .. ..$ MQRankSum      : num [1:3847] -0.025 -0.082 0 0.421 0.358 0.358 0.48 0.55 0.406 -0.718 ...
  .. ..$ QD             : num [1:3847] 18.3 21.4 15 25.7 28.1 ...
  .. ..$ RAW_MQ         : num [1:3847] NA NA NA NA NA NA NA NA NA NA ...
  .. ..$ ReadPosRankSum : num [1:3847] 0.127 0.736 0.406 0 0.736 0.358 0.736 0.55 -0.553 -0.042 ...
  .. ..$ SOR            : num [1:3847] 0.669 0.663 0.87 0.632 0.306 0.768 0.916 0.563 0.564 1.19 ...
  ..@ elementType    : chr "ANY"
  ..@ elementMetadata: NULL
  ..@ metadata       : list()

 

sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.9.5 (Mavericks)

locale:
[1] C

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets
[8] methods   base     

other attached packages:
 [1] lme4_1.1-12                snpStats_1.22.0           
 [3] Matrix_1.2-6               survival_2.39-5           
 [5] VariantAnnotation_1.18.7   Rsamtools_1.24.0          
 [7] Biostrings_2.40.2          XVector_0.12.1            
 [9] SummarizedExperiment_1.2.3 Biobase_2.32.0            
[11] GenomicRanges_1.24.2       GenomeInfoDb_1.8.3        
[13] IRanges_2.6.1              S4Vectors_0.10.2          
[15] BiocGenerics_0.18.0        Ssa.RefSeq.db_1.2         
[17] RSQLite_1.0.0              DBI_0.4-1                 
[19] limma_3.28.17              dplyr_0.5.0               
[21] plyr_1.8.4                 openxlsx_3.0.0            

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.6             nloptr_1.0.4            compiler_3.3.1         
 [4] GenomicFeatures_1.24.5  bitops_1.0-6            tools_3.3.1            
 [7] zlibbioc_1.18.0         biomaRt_2.28.0          nlme_3.1-128           
[10] tibble_1.1              BSgenome_1.40.1         lattice_0.20-33        
[13] rtracklayer_1.32.2      grid_3.3.1              R6_2.1.2               
[16] AnnotationDbi_1.34.4    XML_3.98-1.4            BiocParallel_1.6.3     
[19] minqa_1.2.4             magrittr_1.5            MASS_7.3-45            
[22] GenomicAlignments_1.8.4 splines_3.3.1           assertthat_0.1         
[25] RCurl_1.95-4.8          lazyeval_0.2.0   

 

 

ADD COMMENTlink modified 3.1 years ago • written 3.1 years ago by Fabian Grammes20
Answer: VariantAnnotation CollapsedVCF trouble
1
gravatar for Michael Lawrence
3.1 years ago by
United States
Michael Lawrence11k wrote:

Presumably there are 2 rows with 2 alts and thus 2 AF values.

You can check for this with code like:

which(lengths(info(my_vcf)$AF) > 1)

And you could use expand(my_vcf) to get an object where there is only one alt per row.

ADD COMMENTlink written 3.1 years ago by Michael Lawrence11k
Answer: VariantAnnotation CollapsedVCF trouble
0
gravatar for Fabian Grammes
3.1 years ago by
Fabian Grammes20 wrote:

Thanks that solves it ! Found two entries with 2 AF values. Really gave me a headache


 

ADD COMMENTlink written 3.1 years ago by Fabian Grammes20
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 205 users visited in the last hour