Question: Strand information for dbSNP packages
Alex Gutteridge (United States) wrote 7.8 years ago:
I notice that the GRanges returned by the dbSNP packages have strand '*'. Does anyone know how safe I am in assuming that the variant alleles also given by the package actually correspond to the '+' strand?

I ask this in the context of trying to use predictCoding in the VariantAnnotation package to find coding SNPs. For SNPs in genes on the '-' strand, I have found that I have to complement the alleles given by dbSNP to get the correct result. I just want to make sure that assuming the alleles are from the '+' strand is reasonable in the vast majority (>99%) of cases.

I realise from my reading of the SNPlocs.Hsapiens.dbSNP.20110815 manual that some SNPs will be incorrect anyway (it mentions ~0.1% of SNPs not mapping to the reference at all). That level of failure is acceptable, but anything higher would be a worry.

-- Alex Gutteridge
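The complementing step described in the question can be sketched as follows. This is a minimal Python illustration, not the VariantAnnotation API: the function name `alleles_for_gene_strand` and the list-of-strings input are hypothetical, and it assumes single-base alleles that are given relative to the '+' strand.

```python
# Watson-Crick complement for plain bases.
# Hypothetical helper for illustration; not part of any Bioconductor package.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def alleles_for_gene_strand(alleles, gene_strand):
    """Return '+'-strand single-base alleles as read on the gene's strand."""
    if gene_strand not in ("+", "-"):
        raise ValueError("gene_strand must be '+' or '-'")
    if gene_strand == "+":
        # Alleles are already expressed on the gene's strand.
        return list(alleles)
    # For a single base, complement and reverse complement are the same.
    return [allele.translate(COMPLEMENT) for allele in alleles]


print(alleles_for_gene_strand(["A", "G"], "-"))  # ['T', 'C']
```

For alleles longer than one base the sequence would also need to be reversed, which this sketch deliberately does not handle.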
Answer: Strand information for dbSNP packages
Valerie Obenchain (United States) wrote 7.8 years ago:
Hi Alex,

On 02/28/2012 03:03 AM, Alex Gutteridge wrote:
> I notice the GRanges returned by the dbSNP packages have strand '*'.
> Does anyone know how safe I am in assuming that the variant alleles
> also given by the package actually correspond to the '+' strand?

The dbSNP packages don't contain any strand information, so it isn't safe to assume one strand or the other.

> I ask this in the context of trying to use predictCoding in the
> VariantAnnotation package to find coding SNPs. For SNPs in genes on
> the '-' strand I have found that I have to complement the alleles
> given by dbSNP to get the correct result. I just want to make sure
> that assuming the alleles are from the '+' strand is a reasonable
> assumption in the vast majority (>99%) of cases.

This does not seem right; I'll look into it.

Valerie
Hi Alex,

On 02/28/2012 03:50 PM, Valerie Obenchain wrote:
> On 02/28/2012 03:03 AM, Alex Gutteridge wrote:
>> I notice the GRanges returned by the dbSNP packages have strand '*'.
>> Does anyone know how safe I am in assuming that the variant alleles
>> also given by the package actually correspond to the '+' strand?

Yes, the alleles always correspond to the '+' strand. I should clarify this in the man page. dbSNP reports the strand and alleles for a given SNP, and the alleles they give are relative to the reported strand. However, when the SNPlocs.Hsapiens.dbSNP.20110815 package is made, the alleles for SNPs on the minus strand are complemented so that they correspond to the '+' strand. So all SNPs are considered to be on the '+' strand, and everything is reported with respect to that strand.

Hope this helps and sorry for the confusion.

Cheers,
H.

> The dbSNP packages don't contain any strand information so it isn't safe
> to assume one strand or the other.

--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
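The normalization described in this reply (complementing the alleles of SNPs that dbSNP reports on the minus strand, so that everything is expressed relative to the '+' strand) might look roughly like the sketch below. It is written in Python for illustration only and is not the code used to build the SNPlocs package. The record layout `(rsid, strand, alleles)`, the slash-separated allele string, the placeholder identifiers, and the name `to_plus_strand` are all assumptions.

```python
# Complement table for plain single-base alleles (assumed input format).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def to_plus_strand(strand, alleles):
    """Express slash-separated single-base alleles relative to the '+' strand."""
    if strand == "+":
        return alleles
    if strand == "-":
        # Complement each allele reported on the minus strand.
        return "/".join(a.translate(COMPLEMENT) for a in alleles.split("/"))
    raise ValueError("strand must be '+' or '-'")


# Hypothetical records: (identifier, reported strand, reported alleles).
records = [
    ("rs_example1", "+", "A/G"),
    ("rs_example2", "-", "A/G"),
]

# After normalization every record is treated as lying on the '+' strand.
normalized = [
    (rsid, "+", to_plus_strand(strand, alleles))
    for rsid, strand, alleles in records
]
print(normalized)
# [('rs_example1', '+', 'A/G'), ('rs_example2', '+', 'T/C')]
```

Once the alleles are stored this way, the original reported strand is no longer needed, which is consistent with the packaged ranges carrying strand '*'. In Bioconductor itself, Biostrings provides `complement()` for the complementing step.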
Thanks Hervé. I think that explains it.

Valerie