Hi all,
I'm looking to verify some RNAseq results using (in some cases phased)
SNP
calls from the CEU trio and from primary samples in-house (case and
matched
control). AllelicImbalance looks like a terrific library for this,
however
my BCFs can't be read by Rsamtools::scanBcf. (I get an error "Error:
scanBcf: failed to find fmt encoded as '16977' path:
/scratch/BAMs/LIU1675A8.hg19.bcf")
VariantAnnotation::scanVcf, on the other hand, works fine.
So, is there any particular reason why I couldn't hack up
the impBcfGL/impBcfGRL method to use VCFs? Also, would it make sense
to
have a default argument for searchArea that falls back to the assembly
in
the metadata, if one is specified?
The bizarre thing with the BCFs is that I can scan their headers just
fine... !
Thanks,
Timothy J. Triche, Jr., PhD
Jane Anne Nohl Division of Hematology
USC/Norris Comprehensive Cancer Center
[[alternative HTML version deleted]]
Hi Tim,
Glad to hear you find the package useful also for validation!
1) There is no implementation of scanVcf, because I didn't want the
package
to depend on too many other packages. Didn't have problems with the
scanBcf
myself, and therefore saw no reason to include scanVcf. That was the
reason, but of course I can add an scanVcf option in the import-
method!
2) Did you mean that the default argument for searchArea could
include all
the ranges in the provided bcf/vcf-file? If so I agree; will include
that
asap.
Expect the new features before next week.
/J.R.
On Mon, Jan 13, 2014 at 8:21 PM, Tim Triche, Jr. <tim.triche@usc.edu>
wrote:
> Hi all,
>
> I'm looking to verify some RNAseq results using (in some cases
phased) SNP
> calls from the CEU trio and from primary samples in-house (case and
matched
> control). AllelicImbalance looks like a terrific library for this,
however
> my BCFs can't be read by Rsamtools::scanBcf. (I get an error
"Error:
> scanBcf: failed to find fmt encoded as '16977' path:
> /scratch/BAMs/LIU1675A8.hg19.bcf")
>
> VariantAnnotation::scanVcf, on the other hand, works fine.
>
> So, is there any particular reason why I couldn't hack up
> the impBcfGL/impBcfGRL method to use VCFs? Also, would it make
sense to
> have a default argument for searchArea that falls back to the
assembly in
> the metadata, if one is specified?
>
> The bizarre thing with the BCFs is that I can scan their headers
just
> fine... !
>
> Thanks,
>
> Timothy J. Triche, Jr., PhD
> Jane Anne Nohl Division of Hematology
> USC/Norris Comprehensive Cancer Center
>
[[alternative HTML version deleted]]
Hey again,
I experimented a little using scanVcf as import-function for ".bcf
files",
but unfortunately it did not work as well as I hoped. Actually not
sure if
scanVcf is meant to be used on .bcf files...
#took out ex2.vcf that is the example file in the
VariantAnnotation-package. Processed it as follows:
> asBcf("ex2.vcf",c("20"),"~/Desktop/ex2",overwrite=TRUE)
> b <- scanBcf("ex2.bcf")
Error: scanBcf: failed to find fmt encoded as '18513'
path: ex2.bcf
#which gives an error similar to yours
#then tried
s <- scanVcf("ex2.bcf")
#No error!
#but the output is to me not really meaningful
#My guess is that your bcf file is corrupt in some way. Possibly
because
you used the asBcf() function.
#If you instead use 'Bcftools' for this, the code below is something I
know
works (important to use same reference as was used when mapping):
samtools mpileup -ugSf ../References/Bowtie2Hg19/hg19.fa -r chr1
AllSamples/*.bam | bcftools view -bvcg - > samples-chr1.bcf
#Alternatively you can use readVcf on the original .vcf file and then
create the ASEset-object like this:
VCF <- readVcf("pathToFile")
gr <- rowData(VCF)
countList <- getAlleleCount(reads, gr, verbose=FALSE)
a <- ASEsetFromCountList(rowData = gr, countList)
/J.R.
On Tue, Jan 14, 2014 at 1:24 PM, Jesper Robert Gadin
<j.r.gadin@gmail.com>wrote:
> Hi Tim,
>
> Glad to hear you find the package useful also for validation!
>
> 1) There is no implementation of scanVcf, because I didn't want the
> package to depend on too many other packages. Didn't have problems
with the
> scanBcf myself, and therefore saw no reason to include scanVcf. That
was
> the reason, but of course I can add an scanVcf option in the import-
method!
>
> 2) Did you mean that the default argument for searchArea could
include
> all the ranges in the provided bcf/vcf-file? If so I agree; will
include
> that asap.
>
> Expect the new features before next week.
>
> /J.R.
>
>
> On Mon, Jan 13, 2014 at 8:21 PM, Tim Triche, Jr.
<tim.triche@usc.edu>wrote:
>
>> Hi all,
>>
>> I'm looking to verify some RNAseq results using (in some cases
phased)
>> SNP calls from the CEU trio and from primary samples in-house (case
and
>> matched control). AllelicImbalance looks like a terrific library
for this,
>> however my BCFs can't be read by Rsamtools::scanBcf. (I get an
error
>> "Error: scanBcf: failed to find fmt encoded as '16977' path:
>> /scratch/BAMs/LIU1675A8.hg19.bcf")
>>
>> VariantAnnotation::scanVcf, on the other hand, works fine.
>>
>> So, is there any particular reason why I couldn't hack up
>> the impBcfGL/impBcfGRL method to use VCFs? Also, would it make
sense to
>> have a default argument for searchArea that falls back to the
assembly in
>> the metadata, if one is specified?
>>
>> The bizarre thing with the BCFs is that I can scan their headers
just
>> fine... !
>>
>> Thanks,
>>
>> Timothy J. Triche, Jr., PhD
>> Jane Anne Nohl Division of Hematology
>> USC/Norris Comprehensive Cancer Center
>>
>
>
[[alternative HTML version deleted]]
Hi Tim, Jesper,
No, scanVcf() wasn't intended for reading bcf. It sounds like this is
a
bug in scanBcf(). I'll look into it and get back to you.
Thanks.
Valerie
On 01/14/2014 11:54 AM, Jesper Robert Gadin wrote:
> Hey again,
>
> I experimented a little using scanVcf as import-function for ".bcf
files",
> but unfortunately it did not work as well as I hoped. Actually not
sure if
> scanVcf is meant to be used on .bcf files...
>
> #took out ex2.vcf that is the example file in the
> VariantAnnotation-package. Processed it as follows:
>> asBcf("ex2.vcf",c("20"),"~/Desktop/ex2",overwrite=TRUE)
>> b <- scanBcf("ex2.bcf")
> Error: scanBcf: failed to find fmt encoded as '18513'
> path: ex2.bcf
> #which gives an error similar to yours
> #then tried
> s <- scanVcf("ex2.bcf")
> #No error!
> #but the output is to me not really meaningful
>
> #My guess is that your bcf file is corrupt in some way. Possibly
because
> you used the asBcf() function.
> #If you instead use 'Bcftools' for this, the code below is something
I know
> works (important to use same reference as was used when mapping):
> samtools mpileup -ugSf ../References/Bowtie2Hg19/hg19.fa -r chr1
> AllSamples/*.bam | bcftools view -bvcg - > samples-chr1.bcf
>
> #Alternatively you can use readVcf on the original .vcf file and
then
> create the ASEset-object like this:
> VCF <- readVcf("pathToFile")
> gr <- rowData(VCF)
> countList <- getAlleleCount(reads, gr, verbose=FALSE)
> a <- ASEsetFromCountList(rowData = gr, countList)
>
> /J.R.
>
>
> On Tue, Jan 14, 2014 at 1:24 PM, Jesper Robert Gadin <j.r.gadin at="" gmail.com="">wrote:
>
>> Hi Tim,
>>
>> Glad to hear you find the package useful also for validation!
>>
>> 1) There is no implementation of scanVcf, because I didn't want the
>> package to depend on too many other packages. Didn't have problems
with the
>> scanBcf myself, and therefore saw no reason to include scanVcf.
That was
>> the reason, but of course I can add an scanVcf option in the
import-method!
>>
>> 2) Did you mean that the default argument for searchArea could
include
>> all the ranges in the provided bcf/vcf-file? If so I agree; will
include
>> that asap.
>>
>> Expect the new features before next week.
>>
>> /J.R.
>>
>>
>> On Mon, Jan 13, 2014 at 8:21 PM, Tim Triche, Jr. <tim.triche at="" usc.edu="">wrote:
>>
>>> Hi all,
>>>
>>> I'm looking to verify some RNAseq results using (in some cases
phased)
>>> SNP calls from the CEU trio and from primary samples in-house
(case and
>>> matched control). AllelicImbalance looks like a terrific library
for this,
>>> however my BCFs can't be read by Rsamtools::scanBcf. (I get an
error
>>> "Error: scanBcf: failed to find fmt encoded as '16977' path:
>>> /scratch/BAMs/LIU1675A8.hg19.bcf")
>>>
>>> VariantAnnotation::scanVcf, on the other hand, works fine.
>>>
>>> So, is there any particular reason why I couldn't hack up
>>> the impBcfGL/impBcfGRL method to use VCFs? Also, would it make
sense to
>>> have a default argument for searchArea that falls back to the
assembly in
>>> the metadata, if one is specified?
>>>
>>> The bizarre thing with the BCFs is that I can scan their headers
just
>>> fine... !
>>>
>>> Thanks,
>>>
>>> Timothy J. Triche, Jr., PhD
>>> Jane Anne Nohl Division of Hematology
>>> USC/Norris Comprehensive Cancer Center
>>>
>>
>>
>
> [[alternative HTML version deleted]]
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
http://news.gmane.org/gmane.science.biology.informatics.conductor
>
--
Valerie Obenchain
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B155
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: vobencha at fhcrc.org
Phone: (206) 667-3158
Fax: (206) 667-1319
It looks like the bcf functions need substantial work. Development has
been focused on vcf and bcf has been left in the dust. Unfortunately
we
don't have the resources to dedicate to this right now. If someone
else
wants to pick it up that would be great, if not, it has to stay on the
back burner.
The current bcf code handles 6 geno fields (PL, DP, GQ, SP, GT and GL.
(See Rsamtools::.bcf_template). AllelicImbalance must only need a
subset
of these fields (or none?).
If you are starting with a vcf you can selectively import one or more
of
these geno fields with a param.
param <- ScanVcfParam(geno=c("PL", "DP", "GT"))
readVcf(fl, "genome", param=param)
In this example it looks like the ASEset only needs the rowData
(ranges)
from the vcf:
>>> #Alternatively you can use readVcf on the original .vcf file and
then
>>> create the ASEset-object like this:
>>> VCF <- readVcf("pathToFile")
>>> gr <- rowData(VCF)
>>> countList <- getAlleleCount(reads, gr, verbose=FALSE)
>>> a <- ASEsetFromCountList(rowData = gr, countList)
To import just ranges with no info or geno use this param setup:
param <- ScanVcfParam(info=NA, geno=NA)
readVcf(fl, "genome", param=param)
Valerie
On 01/15/2014 08:55 AM, Valerie Obenchain wrote:
> Hi Tim, Jesper,
>
> No, scanVcf() wasn't intended for reading bcf. It sounds like this
is a
> bug in scanBcf(). I'll look into it and get back to you.
>
> Thanks.
> Valerie
>
>
> On 01/14/2014 11:54 AM, Jesper Robert Gadin wrote:
>> Hey again,
>>
>> I experimented a little using scanVcf as import-function for ".bcf
>> files",
>> but unfortunately it did not work as well as I hoped. Actually not
>> sure if
>> scanVcf is meant to be used on .bcf files...
>>
>> #took out ex2.vcf that is the example file in the
>> VariantAnnotation-package. Processed it as follows:
>>> asBcf("ex2.vcf",c("20"),"~/Desktop/ex2",overwrite=TRUE)
>>> b <- scanBcf("ex2.bcf")
>> Error: scanBcf: failed to find fmt encoded as '18513'
>> path: ex2.bcf
>> #which gives an error similar to yours
>> #then tried
>> s <- scanVcf("ex2.bcf")
>> #No error!
>> #but the output is to me not really meaningful
>>
>> #My guess is that your bcf file is corrupt in some way. Possibly
because
>> you used the asBcf() function.
>> #If you instead use 'Bcftools' for this, the code below is
something I
>> know
>> works (important to use same reference as was used when mapping):
>> samtools mpileup -ugSf ../References/Bowtie2Hg19/hg19.fa -r chr1
>> AllSamples/*.bam | bcftools view -bvcg - > samples-chr1.bcf
>>
>> #Alternatively you can use readVcf on the original .vcf file and
then
>> create the ASEset-object like this:
>> VCF <- readVcf("pathToFile")
>> gr <- rowData(VCF)
>> countList <- getAlleleCount(reads, gr, verbose=FALSE)
>> a <- ASEsetFromCountList(rowData = gr, countList)
>>
>> /J.R.
>>
>>
>> On Tue, Jan 14, 2014 at 1:24 PM, Jesper Robert Gadin
>> <j.r.gadin at="" gmail.com="">wrote:
>>
>>> Hi Tim,
>>>
>>> Glad to hear you find the package useful also for validation!
>>>
>>> 1) There is no implementation of scanVcf, because I didn't want
the
>>> package to depend on too many other packages. Didn't have problems
>>> with the
>>> scanBcf myself, and therefore saw no reason to include scanVcf.
That was
>>> the reason, but of course I can add an scanVcf option in the
>>> import-method!
>>>
>>> 2) Did you mean that the default argument for searchArea could
include
>>> all the ranges in the provided bcf/vcf-file? If so I agree; will
include
>>> that asap.
>>>
>>> Expect the new features before next week.
>>>
>>> /J.R.
>>>
>>>
>>> On Mon, Jan 13, 2014 at 8:21 PM, Tim Triche, Jr.
>>> <tim.triche at="" usc.edu="">wrote:
>>>
>>>> Hi all,
>>>>
>>>> I'm looking to verify some RNAseq results using (in some cases
phased)
>>>> SNP calls from the CEU trio and from primary samples in-house
(case and
>>>> matched control). AllelicImbalance looks like a terrific library
>>>> for this,
>>>> however my BCFs can't be read by Rsamtools::scanBcf. (I get an
error
>>>> "Error: scanBcf: failed to find fmt encoded as '16977' path:
>>>> /scratch/BAMs/LIU1675A8.hg19.bcf")
>>>>
>>>> VariantAnnotation::scanVcf, on the other hand, works fine.
>>>>
>>>> So, is there any particular reason why I couldn't hack up
>>>> the impBcfGL/impBcfGRL method to use VCFs? Also, would it make
>>>> sense to
>>>> have a default argument for searchArea that falls back to the
>>>> assembly in
>>>> the metadata, if one is specified?
>>>>
>>>> The bizarre thing with the BCFs is that I can scan their headers
just
>>>> fine... !
>>>>
>>>> Thanks,
>>>>
>>>> Timothy J. Triche, Jr., PhD
>>>> Jane Anne Nohl Division of Hematology
>>>> USC/Norris Comprehensive Cancer Center
>>>>
>>>
>>>
>>
>> [[alternative HTML version deleted]]
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>
>
--
Valerie Obenchain
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B155
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: vobencha at fhcrc.org
Phone: (206) 667-3158
Fax: (206) 667-1319