AllelicImbalance with VCFs?
1
0
Entering edit mode
@tim-triche-jr-6322
Last seen 10.2 years ago
Hi all, I'm looking to verify some RNAseq results using (in some cases phased) SNP calls from the CEU trio and from primary samples in-house (case and matched control). AllelicImbalance looks like a terrific library for this, however my BCFs can't be read by Rsamtools::scanBcf. (I get an error "Error: scanBcf: failed to find fmt encoded as '16977' path: /scratch/BAMs/LIU1675A8.hg19.bcf") VariantAnnotation::scanVcf, on the other hand, works fine. So, is there any particular reason why I couldn't hack up the impBcfGL/impBcfGRL method to use VCFs? Also, would it make sense to have a default argument for searchArea that falls back to the assembly in the metadata, if one is specified? The bizarre thing with the BCFs is that I can scan their headers just fine... ! Thanks, Timothy J. Triche, Jr., PhD Jane Anne Nohl Division of Hematology USC/Norris Comprehensive Cancer Center [[alternative HTML version deleted]]
RNASeq Cancer trio AllelicImbalance RNASeq Cancer trio AllelicImbalance • 1.6k views
ADD COMMENT
0
Entering edit mode
@jesper-robert-gadin-6326
Last seen 9.5 years ago
European Union
Hi Tim, Glad to hear you find the package useful also for validation! 1) There is no implementation of scanVcf, because I didn't want the package to depend on too many other packages. Didn't have problems with the scanBcf myself, and therefore saw no reason to include scanVcf. That was the reason, but of course I can add an scanVcf option in the import- method! 2) Did you mean that the default argument for searchArea could include all the ranges in the provided bcf/vcf-file? If so I agree; will include that asap. Expect the new features before next week. /J.R. On Mon, Jan 13, 2014 at 8:21 PM, Tim Triche, Jr. <tim.triche@usc.edu> wrote: > Hi all, > > I'm looking to verify some RNAseq results using (in some cases phased) SNP > calls from the CEU trio and from primary samples in-house (case and matched > control). AllelicImbalance looks like a terrific library for this, however > my BCFs can't be read by Rsamtools::scanBcf. (I get an error "Error: > scanBcf: failed to find fmt encoded as '16977' path: > /scratch/BAMs/LIU1675A8.hg19.bcf") > > VariantAnnotation::scanVcf, on the other hand, works fine. > > So, is there any particular reason why I couldn't hack up > the impBcfGL/impBcfGRL method to use VCFs? Also, would it make sense to > have a default argument for searchArea that falls back to the assembly in > the metadata, if one is specified? > > The bizarre thing with the BCFs is that I can scan their headers just > fine... ! > > Thanks, > > Timothy J. Triche, Jr., PhD > Jane Anne Nohl Division of Hematology > USC/Norris Comprehensive Cancer Center > [[alternative HTML version deleted]]
ADD COMMENT
0
Entering edit mode
Hey again, I experimented a little using scanVcf as import-function for ".bcf files", but unfortunately it did not work as well as I hoped. Actually not sure if scanVcf is meant to be used on .bcf files... #took out ex2.vcf that is the example file in the VariantAnnotation-package. Processed it as follows: > asBcf("ex2.vcf",c("20"),"~/Desktop/ex2",overwrite=TRUE) > b <- scanBcf("ex2.bcf") Error: scanBcf: failed to find fmt encoded as '18513' path: ex2.bcf #which gives an error similar to yours #then tried s <- scanVcf("ex2.bcf") #No error! #but the output is to me not really meaningful #My guess is that your bcf file is corrupt in some way. Possibly because you used the asBcf() function. #If you instead use 'Bcftools' for this, the code below is something I know works (important to use same reference as was used when mapping): samtools mpileup -ugSf ../References/Bowtie2Hg19/hg19.fa -r chr1 AllSamples/*.bam | bcftools view -bvcg - > samples-chr1.bcf #Alternatively you can use readVcf on the original .vcf file and then create the ASEset-object like this: VCF <- readVcf("pathToFile") gr <- rowData(VCF) countList <- getAlleleCount(reads, gr, verbose=FALSE) a <- ASEsetFromCountList(rowData = gr, countList) /J.R. On Tue, Jan 14, 2014 at 1:24 PM, Jesper Robert Gadin <j.r.gadin@gmail.com>wrote: > Hi Tim, > > Glad to hear you find the package useful also for validation! > > 1) There is no implementation of scanVcf, because I didn't want the > package to depend on too many other packages. Didn't have problems with the > scanBcf myself, and therefore saw no reason to include scanVcf. That was > the reason, but of course I can add an scanVcf option in the import- method! > > 2) Did you mean that the default argument for searchArea could include > all the ranges in the provided bcf/vcf-file? If so I agree; will include > that asap. > > Expect the new features before next week. > > /J.R. > > > On Mon, Jan 13, 2014 at 8:21 PM, Tim Triche, Jr. <tim.triche@usc.edu>wrote: > >> Hi all, >> >> I'm looking to verify some RNAseq results using (in some cases phased) >> SNP calls from the CEU trio and from primary samples in-house (case and >> matched control). AllelicImbalance looks like a terrific library for this, >> however my BCFs can't be read by Rsamtools::scanBcf. (I get an error >> "Error: scanBcf: failed to find fmt encoded as '16977' path: >> /scratch/BAMs/LIU1675A8.hg19.bcf") >> >> VariantAnnotation::scanVcf, on the other hand, works fine. >> >> So, is there any particular reason why I couldn't hack up >> the impBcfGL/impBcfGRL method to use VCFs? Also, would it make sense to >> have a default argument for searchArea that falls back to the assembly in >> the metadata, if one is specified? >> >> The bizarre thing with the BCFs is that I can scan their headers just >> fine... ! >> >> Thanks, >> >> Timothy J. Triche, Jr., PhD >> Jane Anne Nohl Division of Hematology >> USC/Norris Comprehensive Cancer Center >> > > [[alternative HTML version deleted]]
ADD REPLY
0
Entering edit mode
Hi Tim, Jesper, No, scanVcf() wasn't intended for reading bcf. It sounds like this is a bug in scanBcf(). I'll look into it and get back to you. Thanks. Valerie On 01/14/2014 11:54 AM, Jesper Robert Gadin wrote: > Hey again, > > I experimented a little using scanVcf as import-function for ".bcf files", > but unfortunately it did not work as well as I hoped. Actually not sure if > scanVcf is meant to be used on .bcf files... > > #took out ex2.vcf that is the example file in the > VariantAnnotation-package. Processed it as follows: >> asBcf("ex2.vcf",c("20"),"~/Desktop/ex2",overwrite=TRUE) >> b <- scanBcf("ex2.bcf") > Error: scanBcf: failed to find fmt encoded as '18513' > path: ex2.bcf > #which gives an error similar to yours > #then tried > s <- scanVcf("ex2.bcf") > #No error! > #but the output is to me not really meaningful > > #My guess is that your bcf file is corrupt in some way. Possibly because > you used the asBcf() function. > #If you instead use 'Bcftools' for this, the code below is something I know > works (important to use same reference as was used when mapping): > samtools mpileup -ugSf ../References/Bowtie2Hg19/hg19.fa -r chr1 > AllSamples/*.bam | bcftools view -bvcg - > samples-chr1.bcf > > #Alternatively you can use readVcf on the original .vcf file and then > create the ASEset-object like this: > VCF <- readVcf("pathToFile") > gr <- rowData(VCF) > countList <- getAlleleCount(reads, gr, verbose=FALSE) > a <- ASEsetFromCountList(rowData = gr, countList) > > /J.R. > > > On Tue, Jan 14, 2014 at 1:24 PM, Jesper Robert Gadin <j.r.gadin at="" gmail.com="">wrote: > >> Hi Tim, >> >> Glad to hear you find the package useful also for validation! >> >> 1) There is no implementation of scanVcf, because I didn't want the >> package to depend on too many other packages. Didn't have problems with the >> scanBcf myself, and therefore saw no reason to include scanVcf. That was >> the reason, but of course I can add an scanVcf option in the import-method! >> >> 2) Did you mean that the default argument for searchArea could include >> all the ranges in the provided bcf/vcf-file? If so I agree; will include >> that asap. >> >> Expect the new features before next week. >> >> /J.R. >> >> >> On Mon, Jan 13, 2014 at 8:21 PM, Tim Triche, Jr. <tim.triche at="" usc.edu="">wrote: >> >>> Hi all, >>> >>> I'm looking to verify some RNAseq results using (in some cases phased) >>> SNP calls from the CEU trio and from primary samples in-house (case and >>> matched control). AllelicImbalance looks like a terrific library for this, >>> however my BCFs can't be read by Rsamtools::scanBcf. (I get an error >>> "Error: scanBcf: failed to find fmt encoded as '16977' path: >>> /scratch/BAMs/LIU1675A8.hg19.bcf") >>> >>> VariantAnnotation::scanVcf, on the other hand, works fine. >>> >>> So, is there any particular reason why I couldn't hack up >>> the impBcfGL/impBcfGRL method to use VCFs? Also, would it make sense to >>> have a default argument for searchArea that falls back to the assembly in >>> the metadata, if one is specified? >>> >>> The bizarre thing with the BCFs is that I can scan their headers just >>> fine... ! >>> >>> Thanks, >>> >>> Timothy J. Triche, Jr., PhD >>> Jane Anne Nohl Division of Hematology >>> USC/Norris Comprehensive Cancer Center >>> >> >> > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor > -- Valerie Obenchain Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B155 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: vobencha at fhcrc.org Phone: (206) 667-3158 Fax: (206) 667-1319
ADD REPLY
0
Entering edit mode
It looks like the bcf functions need substantial work. Development has been focused on vcf and bcf has been left in the dust. Unfortunately we don't have the resources to dedicate to this right now. If someone else wants to pick it up that would be great, if not, it has to stay on the back burner. The current bcf code handles 6 geno fields (PL, DP, GQ, SP, GT and GL. (See Rsamtools::.bcf_template). AllelicImbalance must only need a subset of these fields (or none?). If you are starting with a vcf you can selectively import one or more of these geno fields with a param. param <- ScanVcfParam(geno=c("PL", "DP", "GT")) readVcf(fl, "genome", param=param) In this example it looks like the ASEset only needs the rowData (ranges) from the vcf: >>> #Alternatively you can use readVcf on the original .vcf file and then >>> create the ASEset-object like this: >>> VCF <- readVcf("pathToFile") >>> gr <- rowData(VCF) >>> countList <- getAlleleCount(reads, gr, verbose=FALSE) >>> a <- ASEsetFromCountList(rowData = gr, countList) To import just ranges with no info or geno use this param setup: param <- ScanVcfParam(info=NA, geno=NA) readVcf(fl, "genome", param=param) Valerie On 01/15/2014 08:55 AM, Valerie Obenchain wrote: > Hi Tim, Jesper, > > No, scanVcf() wasn't intended for reading bcf. It sounds like this is a > bug in scanBcf(). I'll look into it and get back to you. > > Thanks. > Valerie > > > On 01/14/2014 11:54 AM, Jesper Robert Gadin wrote: >> Hey again, >> >> I experimented a little using scanVcf as import-function for ".bcf >> files", >> but unfortunately it did not work as well as I hoped. Actually not >> sure if >> scanVcf is meant to be used on .bcf files... >> >> #took out ex2.vcf that is the example file in the >> VariantAnnotation-package. Processed it as follows: >>> asBcf("ex2.vcf",c("20"),"~/Desktop/ex2",overwrite=TRUE) >>> b <- scanBcf("ex2.bcf") >> Error: scanBcf: failed to find fmt encoded as '18513' >> path: ex2.bcf >> #which gives an error similar to yours >> #then tried >> s <- scanVcf("ex2.bcf") >> #No error! >> #but the output is to me not really meaningful >> >> #My guess is that your bcf file is corrupt in some way. Possibly because >> you used the asBcf() function. >> #If you instead use 'Bcftools' for this, the code below is something I >> know >> works (important to use same reference as was used when mapping): >> samtools mpileup -ugSf ../References/Bowtie2Hg19/hg19.fa -r chr1 >> AllSamples/*.bam | bcftools view -bvcg - > samples-chr1.bcf >> >> #Alternatively you can use readVcf on the original .vcf file and then >> create the ASEset-object like this: >> VCF <- readVcf("pathToFile") >> gr <- rowData(VCF) >> countList <- getAlleleCount(reads, gr, verbose=FALSE) >> a <- ASEsetFromCountList(rowData = gr, countList) >> >> /J.R. >> >> >> On Tue, Jan 14, 2014 at 1:24 PM, Jesper Robert Gadin >> <j.r.gadin at="" gmail.com="">wrote: >> >>> Hi Tim, >>> >>> Glad to hear you find the package useful also for validation! >>> >>> 1) There is no implementation of scanVcf, because I didn't want the >>> package to depend on too many other packages. Didn't have problems >>> with the >>> scanBcf myself, and therefore saw no reason to include scanVcf. That was >>> the reason, but of course I can add an scanVcf option in the >>> import-method! >>> >>> 2) Did you mean that the default argument for searchArea could include >>> all the ranges in the provided bcf/vcf-file? If so I agree; will include >>> that asap. >>> >>> Expect the new features before next week. >>> >>> /J.R. >>> >>> >>> On Mon, Jan 13, 2014 at 8:21 PM, Tim Triche, Jr. >>> <tim.triche at="" usc.edu="">wrote: >>> >>>> Hi all, >>>> >>>> I'm looking to verify some RNAseq results using (in some cases phased) >>>> SNP calls from the CEU trio and from primary samples in-house (case and >>>> matched control). AllelicImbalance looks like a terrific library >>>> for this, >>>> however my BCFs can't be read by Rsamtools::scanBcf. (I get an error >>>> "Error: scanBcf: failed to find fmt encoded as '16977' path: >>>> /scratch/BAMs/LIU1675A8.hg19.bcf") >>>> >>>> VariantAnnotation::scanVcf, on the other hand, works fine. >>>> >>>> So, is there any particular reason why I couldn't hack up >>>> the impBcfGL/impBcfGRL method to use VCFs? Also, would it make >>>> sense to >>>> have a default argument for searchArea that falls back to the >>>> assembly in >>>> the metadata, if one is specified? >>>> >>>> The bizarre thing with the BCFs is that I can scan their headers just >>>> fine... ! >>>> >>>> Thanks, >>>> >>>> Timothy J. Triche, Jr., PhD >>>> Jane Anne Nohl Division of Hematology >>>> USC/Norris Comprehensive Cancer Center >>>> >>> >>> >> >> [[alternative HTML version deleted]] >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> > > -- Valerie Obenchain Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B155 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: vobencha at fhcrc.org Phone: (206) 667-3158 Fax: (206) 667-1319
ADD REPLY

Login before adding your answer.

Traffic: 603 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6