how to make sure PureCN gets somatic status from VCF?


I'm using PureCN. I've tried both the release version (1.2.3) and the development version (1.5.9). Each time I run it, regardless of the version, I get the message: "VCF does not contain somatic status and no SNP blacklist provided." When I first saw this, I looked at the example VCF shipped with PureCN ("example_vcf.vcf", in the extdata directory of the package directory) and saw "SOMATIC;VT=SNP" in the INFO column. Based on that observation, I added "SOMATIC;VT=SNP" to the INFO fields of the VCF I am using, so the values in the INFO field now look like this: "SOMATIC;VT=SNP;SOMETHING;SOMETHING_ELSE". I cannot post the VCF here; "SOMETHING" and "SOMETHING_ELSE" stand for private data. Despite adding that to the INFO fields, I still get the same message. So my question is: how do I make sure PureCN gets the somatic status, so that the message no longer appears?

By the way, I am running R (and PureCN) inside a Docker container, using an image I built myself. Is there by any chance an official PureCN image, which would make analyses easier to reproduce?

Below is my sessionInfo() in case it might be helpful.

Thanks to anyone for any help!






R version 3.3.1 (2016-06-21)

Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux stretch/sid

 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            

attached base packages:
[1] stats4    parallel  methods   stats     graphics  grDevices utils    
[8] datasets  base     

other attached packages:
 [1] PureCN_1.5.9               VariantAnnotation_1.20.1  
 [3] Rsamtools_1.26.1           Biostrings_2.42.0         
 [5] XVector_0.14.0             SummarizedExperiment_1.4.0
 [7] Biobase_2.34.0             GenomicRanges_1.26.1      
 [9] GenomeInfoDb_1.10.1        IRanges_2.8.1             
[11] S4Vectors_0.12.0           BiocGenerics_0.20.0       
[13] DNAcopy_1.48.0            

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.8              AnnotationDbi_1.36.0     GenomicAlignments_1.10.0
 [4] zlibbioc_1.20.0          BiocParallel_1.8.1       BSgenome_1.42.0         
 [7] lattice_0.20-34          tools_3.3.1              grid_3.3.1              
[10] data.table_1.9.8         DBI_0.5-1                digest_0.6.10           
[13] Matrix_1.2-7.1           RColorBrewer_1.1-2       rtracklayer_1.34.1      
[16] bitops_1.0-6             biomaRt_2.30.0           RCurl_1.95-4.8          
[19] memoise_1.0.0            RSQLite_1.1              GenomicFeatures_1.26.0  
[22] XML_3.98-1.5            


Hi Eddie,

I see that this warning is confusing. It is not an error, and there is no need to change the VCF. If you used MuTect to generate the VCF, this probably means you ran it without a matched normal. That is fine, but to get optimal results, PureCN needs to know whether some of the heterozygous SNPs in the VCF have significant mapping biases, i.e. allelic fractions significantly different from 0.5 in normal diploid regions.
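To make the idea of mapping bias concrete, here is a minimal Python sketch. It is not how PureCN itself estimates mapping bias; it only illustrates the concept. It assumes you already have, for each SNP, the alt and total read counts pooled over normal samples that are heterozygous at that position. The function names, the `counts` layout and the `alpha` cutoff are all hypothetical. A SNP is flagged when its allelic fraction deviates from 0.5 according to an exact two-sided binomial test.

```python
from math import exp, lgamma, log


def binom_pmf_half(n):
    """Binomial(n, 0.5) probabilities for 0..n alt reads, computed in log space."""
    log_half_n = n * log(0.5)
    return [
        exp(lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1) + log_half_n)
        for i in range(n + 1)
    ]


def two_sided_p(alt, total):
    """Exact two-sided p-value: total probability of outcomes no more likely than `alt`."""
    pmf = binom_pmf_half(total)
    cutoff = pmf[alt] * (1 + 1e-9)  # small tolerance for floating-point ties
    return min(1.0, sum(p for p in pmf if p <= cutoff))


def flag_biased_snps(counts, alpha=0.001):
    """Return {snp_id: (allelic_fraction, p_value)} for SNPs deviating from 0.5.

    counts: hypothetical dict mapping SNP id -> (alt_reads, total_reads),
            pooled over normal samples heterozygous at that SNP.
    alpha:  hypothetical significance cutoff, not a PureCN default.
    """
    flagged = {}
    for snp_id, (alt, total) in counts.items():
        if total == 0:
            continue  # no coverage, nothing to test
        p_value = two_sided_p(alt, total)
        if p_value < alpha:
            flagged[snp_id] = (alt / total, p_value)
    return flagged
```

Calling `flag_biased_snps({"snpA": (50, 100), "snpB": (20, 100)})` would flag only `snpB`, whose allelic fraction of 0.2 is far from the 0.5 expected for a heterozygous SNP in a diploid region.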

Most of the recent work in the developer version went into correcting mapping biases using a pool of normals. All you need to do is create a panel-of-normals VCF (for example with the GATK CombineVariants tool after running MuTect on the normals) and then pass this combined VCF to runAbsoluteCN, as described in the developer vignette.

Without a panel-of-normals VCF, you can instead provide a BED file to exclude regions with poor mappability.
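PureCN applies such a BED file for you, so nothing below is needed to run it. As an illustration of what excluding regions means, here is a small Python sketch that drops VCF records falling inside BED intervals. All function names are hypothetical, and the sketch assumes plain-text, tab-separated input. The one detail worth noting is the coordinate convention: BED intervals are 0-based and half-open, while VCF positions are 1-based.

```python
def parse_bed(lines):
    """Parse BED lines into {chrom: [(start, end), ...]} (0-based, half-open)."""
    regions = {}
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank lines, comments and UCSC header lines
        chrom, start, end = line.split()[:3]
        regions.setdefault(chrom, []).append((int(start), int(end)))
    return regions


def in_excluded_region(chrom, pos, regions):
    """True if 1-based VCF position `pos` lies inside any BED interval.

    A 1-based position p corresponds to 0-based p - 1, so it is inside
    [start, end) exactly when start < p <= end.
    """
    return any(start < pos <= end for start, end in regions.get(chrom, []))


def filter_vcf_lines(vcf_lines, regions):
    """Keep header lines and every record outside the excluded regions."""
    kept = []
    for line in vcf_lines:
        if line.startswith("#"):
            kept.append(line)  # header lines pass through unchanged
            continue
        chrom, pos = line.split("\t")[:2]
        if not in_excluded_region(chrom, int(pos), regions):
            kept.append(line)
    return kept
```

A linear scan over the intervals is fine for a short blacklist; for a large BED file, sorting the intervals and using binary search would be the more sensible design.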

No plans for official Docker images yet, but it is definitely something I'm interested in.

Thanks for your report, I will clarify the documentation and warning.


