Question

VariantTools callVariants - binomial likelihood ratio

0

Entering edit mode

heyi xiao ▴ 360

@heyi-xiao-3308

Last seen 9.2 years ago

United States

Hi, I try to use VariantTools package package and I am working on the vignette examples. I have some questions on the variant calling procedure. The vignette says: ?The callVariants function uses a binomial likelihood ratio test for this purpose. The ratio is P(D|p = plower)=P(D|p = perror).? To understand this statement, I located the two internal functions that carry out this step. > VariantTools:::lrtFreqCutoff function (p0, p1, n = if (C == 1L) 1L else n, C = 1L) { num <- (1/n) * log(C) + log(1 - p0) - log(1 - p1) denom <- log(p1) - log(p0) + log(1 - p0) - log(1 - p1) num/denom } > VariantTools:::BinomialLRFilter function (p.lower = 0.2, p.error = 1/1000) { function(x) { freq.cutoff <- lrtFreqCutoff(p.error, p.lower) sample.freq <- altDepth(x)/totalDepth(x) passed <- sample.freq >= freq.cutoff passed[is.na(passed)] <- FALSE passed } } However, I get even more confused. Here are my questions: The binomial likelihood ratio should be p1^k*(1-p1)^(n-k)/(p0^k*(1-p0)^(n-k)), here n and k are total trial and number of hits. I don?t understand why it becomes num/denom in the lrtFreqCutoff function? What?s n and C in lrtFreqCutoff? These arguments are not used when BinomialLRFilter calls lrtFreqCutoff, why? In BinomialLRFilter, the criterion used is sample.freq >= freq.cutoff. But sample.freq <- altDepth(x)/totalDepth(x) is a frequency as defined, yet freq.cutoff seems to be the binomial likelihood ratio. How a frequency can be compared to likelihood ratio? In BinomialLRFilter, p0=perror = 0.001 and p1=plower = 0.2, the default p0 and p1 values are assumed, why not used the actual sequencing error rate and lowest variant frequency? Hope someone can explain these problems to me. Thank you!

VariantTools VariantTools • 1.7k views

ADD COMMENT • link updated 12.1 years ago by Michael Lawrence ★ 11k • written 12.1 years ago by heyi xiao ▴ 360

score 0 · Answer 1 · 2013-11-17

0

Entering edit mode

Michael Lawrence ★ 11k

@michael-lawrence-3846

Last seen 4.1 years ago

United States

On Sun, Nov 17, 2013 at 3:03 PM, heyi xiao <xiaoheyiyh@yahoo.com> wrote: > Hi, > I try to use VariantTools package package and I am working on the vignette > examples. I have some questions on the variant calling procedure. The > vignette says: The callVariants function uses a binomial likelihood ratio > test for this purpose. The ratio is P(D|p = plower)=P(D|p = perror). > > To understand this statement, I located the two internal functions that > carry out this step. > > > VariantTools:::lrtFreqCutoff > function (p0, p1, n = if (C == 1L) 1L else n, C = 1L) > { > num <- (1/n) * log(C) + log(1 - p0) - log(1 - p1) > denom <- log(p1) - log(p0) + log(1 - p0) - log(1 - p1) > num/denom > } > > > VariantTools:::BinomialLRFilter > function (p.lower = 0.2, p.error = 1/1000) > { > function(x) { > freq.cutoff <- lrtFreqCutoff(p.error, p.lower) > sample.freq <- altDepth(x)/totalDepth(x) > passed <- sample.freq >= freq.cutoff > passed[is.na(passed)] <- FALSE > passed > } > } > > However, I get even more confused. Here are my questions: > The binomial likelihood ratio should be > p1^k*(1-p1)^(n-k)/(p0^k*(1-p0)^(n-k)), here n and k are total trial and > number of hits. I dont understand why it becomes num/denom in the > lrtFreqCutoff function? > The lrtFreqCutoff is not calculating the likelihood ratio: it's calculating the frequency (k/n) at which the ratio becomes greater than C. It takes some algebra to work this out. 'n' is the coverage, but it cancels out when C = 1. That's what the filter always uses. If it didn't we would need to consider the coverage at each position. As it is now, it is only a function of the two probabilities and so is constant (~ 4% using the defaults) for every position. > > Whats n and C in lrtFreqCutoff? These arguments are not used when > BinomialLRFilter calls lrtFreqCutoff, why? > > In BinomialLRFilter, the criterion used is sample.freq >= freq.cutoff. But > sample.freq <- altDepth(x)/totalDepth(x) is a frequency as defined, yet > freq.cutoff seems to be the binomial likelihood ratio. How a frequency can > be compared to likelihood ratio? > > In BinomialLRFilter, p0=perror = 0.001 and p1=plower = 0.2, the default p0 > and p1 values are assumed, why not used the actual sequencing error rate > and lowest variant frequency? > > The user can change the defaults. How do you propose we figure out the correct values automatically? We could try to estimate the error rate from e.g. overlapping read ends, but that's just a heuristic. > Hope someone can explain these problems to me. Thank you! > > [[alternative HTML version deleted]]

ADD COMMENT • link 12.1 years ago Michael Lawrence ★ 11k

0

Entering edit mode

Hi Michael, Thanks for your answers. Now I understand the lrtFreqCutoff step. I think the the values of p0 (=perror = 0.001) and p1 (=plower = 0.2) should be important for snp calling. Sequencing error, p0/perror, can be derived from the raw read data, or even better can be recalibrated like SOAPsnp or GATK. But that?s a bit too much of work. On the other hand, why you choose to use p1=plower = 0.2? I appreciate the simplicity of your method much, i.e. map the binomial likelihood ratio to the cutoff frequency. Have you tried to use the Bayesian probability score as in other method? I know it is more complicated but you get the significance level or p value this way. Thank you! -------------------------------------------- On Sun, 11/17/13, Michael Lawrence <lawrence.michael at="" gene.com=""> wrote: Subject: Re: VariantTools callVariants - binomial likelihood ratio Cc: "bioconductor at r-project.org" <bioconductor at="" r-project.org="">, "Michael Lawrence" <lawrence.michael at="" gene.com=""> Date: Sunday, November 17, 2013, 10:29 PM On Sun, Nov 17, 2013 wrote: Hi, I try to use VariantTools package package and I am working on the vignette examples. I have some questions on the variant calling procedure. The vignette says: ?The callVariants function uses a binomial likelihood ratio test for this purpose. The ratio is P(D|p = plower)=P(D|p = perror).? To understand this statement, I located the two internal functions that carry out this step. > VariantTools:::lrtFreqCutoff function (p0, p1, n = if (C == 1L) 1L else n, C = 1L) { ? ? num <- (1/n) * log(C) + log(1 - p0) - log(1 - p1) ? ? denom <- log(p1) - log(p0) + log(1 - p0) - log(1 - p1) ? ? num/denom } > VariantTools:::BinomialLRFilter function (p.lower = 0.2, p.error = 1/1000) { ? ? function(x) { ? ? ? ? freq.cutoff <- lrtFreqCutoff(p.error, p.lower) ? ? ? ? sample.freq <- altDepth(x)/totalDepth(x) ? ? ? ? passed <- sample.freq >= freq.cutoff ? ? ? ? passed[is.na(passed)] <- FALSE ? ? ? ? passed ? ? } } However, I get even more confused. Here are my questions: The binomial likelihood ratio should be p1^k*(1-p1)^(n-k)/(p0^k*(1-p0)^(n-k)), here n and k are total trial and number of hits. I don?t understand why it becomes num/denom in the lrtFreqCutoff function? The lrtFreqCutoff is not calculating the likelihood ratio: it's calculating the frequency (k/n) at which the ratio becomes greater than C. It takes some algebra to work this out. 'n' is the coverage, but it cancels out when C = 1. That's what the filter always uses. If it didn't we would need to consider the coverage at each position. As it is now, it is only a function of the two probabilities and so is constant (~ 4% using the defaults) for every position. ? What?s n and C in lrtFreqCutoff? These arguments are not used when BinomialLRFilter calls lrtFreqCutoff, why? In BinomialLRFilter, the criterion used is sample.freq >= freq.cutoff. But sample.freq <- altDepth(x)/totalDepth(x) is a frequency as defined, yet freq.cutoff seems to be the binomial likelihood ratio. How a frequency can be compared to likelihood ratio? In BinomialLRFilter, p0=perror = 0.001 and p1=plower = 0.2, the default p0 and p1 values are assumed, why not used the actual sequencing error rate and lowest variant frequency? The user can change the defaults. How do you propose we figure out the correct values automatically?? We could try to estimate the error rate from e.g. overlapping read ends, but that's just a heuristic. ? [[elided Yahoo spam]]

ADD REPLY • link 12.1 years ago heyi xiao ▴ 360

0

Entering edit mode

On Tue, Nov 19, 2013 at 10:33 AM, heyi xiao <xiaoheyiyh@yahoo.com> wrote: > Hi Michael, > Thanks for your answers. Now I understand the lrtFreqCutoff step. > I think the the values of p0 (=perror = 0.001) and p1 (=plower = 0.2) > should be important for snp calling. > Sequencing error, p0/perror, can be derived from the raw read data, or > even better can be recalibrated like SOAPsnp or GATK. But thats a bit too > much of work. On the other hand, why you choose to use p1=plower = 0.2? > This is somewhat arbitrary. It's just a matter of what you are trying to find in the data. We wanted to be reasonably confident that we would detect a variant if present at 20% frequency. > I appreciate the simplicity of your method much, i.e. map the binomial > likelihood ratio to the cutoff frequency. Have you tried to use the > Bayesian probability score as in other method? I know it is more > complicated but you get the significance level or p value this way. Thank > you! > > We have of course considered other methods. A p-value does not necessarily make the problem any easier. > -------------------------------------------- > On Sun, 11/17/13, Michael Lawrence <lawrence.michael@gene.com> wrote: > > Subject: Re: VariantTools callVariants - binomial likelihood ratio > To: "heyi xiao" <xiaoheyiyh@yahoo.com> > Cc: "bioconductor@r-project.org" <bioconductor@r-project.org>, "Michael > Lawrence" <lawrence.michael@gene.com> > Date: Sunday, November 17, 2013, 10:29 PM > > > > > On Sun, Nov 17, 2013 > at 3:03 PM, heyi xiao <xiaoheyiyh@yahoo.com> > wrote: > > Hi, > > I try to use VariantTools package package and I am working > on the vignette examples. I have some questions on the > variant calling procedure. The vignette says: The > callVariants function uses a binomial likelihood ratio test > for this purpose. The ratio is P(D|p = plower)=P(D|p = > perror). > > > > > To understand this statement, I located the two internal > functions that carry out this step. > > > > > VariantTools:::lrtFreqCutoff > > function (p0, p1, n = if (C == 1L) 1L else n, C = 1L) > > { > > num <- (1/n) * log(C) + log(1 - p0) - log(1 - p1) > > denom <- log(p1) - log(p0) + log(1 - p0) - log(1 - > p1) > > num/denom > > } > > > > > VariantTools:::BinomialLRFilter > > function (p.lower = 0.2, p.error = 1/1000) > > { > > function(x) { > > freq.cutoff <- lrtFreqCutoff(p.error, > p.lower) > > sample.freq <- altDepth(x)/totalDepth(x) > > passed <- sample.freq >= freq.cutoff > > passed[is.na(passed)] <- FALSE > > passed > > } > > } > > > > However, I get even more confused. Here are my questions: > > The binomial likelihood ratio should be > p1^k*(1-p1)^(n-k)/(p0^k*(1-p0)^(n-k)), here n and k are > total trial and number of hits. I dont understand why it > becomes num/denom in the lrtFreqCutoff function? > > > The lrtFreqCutoff is not calculating the > likelihood ratio: it's calculating the frequency (k/n) > at which the ratio becomes greater than C. It takes some > algebra to work this out. 'n' is the coverage, but > it cancels out when C = 1. That's what the filter always > uses. If it didn't we would need to consider the > coverage at each position. As it is now, it is only a > function of the two probabilities and so is constant (~ 4% > using the defaults) for every position. > > > > > > Whats n and C in lrtFreqCutoff? These arguments are not > used when BinomialLRFilter calls lrtFreqCutoff, why? > > > > In BinomialLRFilter, the criterion used is sample.freq >= > freq.cutoff. But sample.freq <- altDepth(x)/totalDepth(x) > is a frequency as defined, yet freq.cutoff seems to be the > binomial likelihood ratio. How a frequency can be compared > to likelihood ratio? > > > > > In BinomialLRFilter, p0=perror = 0.001 and p1=plower = 0.2, > the default p0 and p1 values are assumed, why not used the > actual sequencing error rate and lowest variant frequency? > > > > The user can change the defaults. How do you > propose we figure out the correct values automatically? We > could try to estimate the error rate from e.g. overlapping > read ends, but that's just a heuristic. > > > Hope someone can explain these problems to me. Thank you! > > > > > > > [[alternative HTML version deleted]]

ADD REPLY • link 12.1 years ago Michael Lawrence ★ 11k