Strand information for dbSNP packages
@alex-gutteridge-2935
United States
I notice the GRanges returned by the dbSNP packages have strand '*'. Does anyone know how safe I am in assuming that the variant alleles also given by the package actually correspond to the '+' strand?

I ask this in the context of trying to use predictCoding in the VariantAnnotation package to find coding SNPs. For SNPs in genes on the '-' strand, I have found that I have to complement the alleles given by dbSNP to get the correct result. I just want to make sure that assuming the alleles are from the '+' strand is reasonable in the vast majority (>99%) of cases.

I realise from my reading of the SNPlocs.Hsapiens.dbSNP.20110815 manual that some SNPs will be incorrect anyway (it mentions ~0.1% of SNPs not mapping to the reference at all). That level of failure is acceptable, but anything higher would be a worry.

--
Alex Gutteridge
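Alex's workaround, and his concern about the failure rate, can both be sketched in a few lines. This is a minimal Python illustration, not SNPlocs or VariantAnnotation code: it assumes each SNP's alleles are available as a single IUPAC ambiguity letter relative to the '+' strand, and all function names are hypothetical. For a single-base SNP, reading it on a '-' strand gene only requires complementing the bases (there is nothing to reverse), and checking whether the '+' strand reference base is among the SNP's alleles gives a rough way to count records that disagree with the reference.

```python
# IUPAC ambiguity codes and the bases each one stands for.
IUPAC_SETS = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
BASE_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
# Reverse lookup: sorted base set -> IUPAC code.
CODE_FOR_SET = {"".join(sorted(bases)): code for code, bases in IUPAC_SETS.items()}


def complement_code(code: str) -> str:
    """Complement an IUPAC code, e.g. R (A/G) -> Y (C/T)."""
    bases = IUPAC_SETS[code.upper()]
    return CODE_FOR_SET["".join(sorted(BASE_COMPLEMENT[b] for b in bases))]


def allele_on_gene_strand(plus_code: str, gene_strand: str) -> str:
    """Express a '+'-strand allele code on the strand the gene is read from."""
    if gene_strand == "+":
        return plus_code.upper()
    if gene_strand == "-":
        # Single base: complementing is enough, no reversal needed.
        return complement_code(plus_code)
    raise ValueError(f"unexpected gene strand: {gene_strand!r}")


def matches_reference(plus_code: str, ref_base: str) -> bool:
    """True if the '+'-strand reference base is one of the SNP's alleles."""
    return ref_base.upper() in IUPAC_SETS[plus_code.upper()]


if __name__ == "__main__":
    print(allele_on_gene_strand("R", "-"))  # R (A/G) on '-' reads as Y (C/T)
    print(matches_reference("R", "G"))
```

Note that S (C/G) and W (A/T) are their own complements, so a reference check like this cannot reveal a strand error for those SNPs.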
@valerie-obenchain-4275
United States
Hi Alex,

On 02/28/2012 03:03 AM, Alex Gutteridge wrote:
> I notice the GRanges returned by the dbSNP packages have strand '*'.
> Does anyone know how safe I am in assuming that the variant alleles
> also given by the package actually correspond to the '+' strand?

The dbSNP packages don't contain any strand information, so it isn't safe to assume one strand or the other.

> I ask this in the context of trying to use predictCoding in the
> VariantAnnotation package to find coding SNPs. For SNPs in genes on
> the '-' strand, I have found that I have to complement the alleles
> given by dbSNP to get the correct result. I just want to make sure
> that assuming the alleles are from the '+' strand is reasonable
> in the vast majority (>99%) of cases.

This does not seem right; I'll look into it.

Valerie
Hi Alex,

On 02/28/2012 03:50 PM, Valerie Obenchain wrote:
> Hi Alex,
>
> On 02/28/2012 03:03 AM, Alex Gutteridge wrote:
>> I notice the GRanges returned by the dbSNP packages have strand '*'.
>> Does anyone know how safe I am in assuming that the variant alleles
>> also given by the package actually correspond to the '+' strand?

Yes, the alleles always correspond to the '+' strand. I should clarify this in the man page. dbSNP reports the strand and alleles for a given SNP, and the alleles it gives are relative to the reported strand. However, when the SNPlocs.Hsapiens.dbSNP.20110815 package is made, the alleles for SNPs on the minus strand are complemented so that they correspond to the '+' strand. So all SNPs are considered to be on the '+' strand, and everything is reported with respect to that strand.

Hope this helps, and sorry for the confusion.

Cheers,
H.

--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319
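The normalization Hervé describes can be sketched as follows. This is a minimal Python illustration, not the code actually used to build SNPlocs.Hsapiens.dbSNP.20110815: it assumes dbSNP alleles arrive as slash-separated strings of A/C/G/T together with the strand dbSNP reports, and the function names are hypothetical. After this step every allele is relative to '+', which is what Hervé means by all SNPs being considered on the '+' strand.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Reverse-complement a DNA string made of A/C/G/T."""
    return "".join(COMPLEMENT[b] for b in reversed(seq.upper()))


def alleles_to_plus_strand(alleles: str, reported_strand: str) -> str:
    """Re-express a dbSNP-style allele string such as 'C/T' on the '+' strand.

    Alleles reported on '-' are reverse-complemented (for single bases this
    is just the complement); alleles reported on '+' pass through unchanged.
    """
    parts = alleles.split("/")
    if reported_strand == "+":
        return "/".join(p.upper() for p in parts)
    if reported_strand == "-":
        return "/".join(reverse_complement(p) for p in parts)
    raise ValueError(f"unexpected strand: {reported_strand!r}")


if __name__ == "__main__":
    # A C/T SNP reported on '-' becomes G/A on '+'.
    print(alleles_to_plus_strand("C/T", "-"))
```

Reverse-complementing each allele (rather than only complementing) keeps the sketch correct for multi-base alleles as well; for single-base SNPs the two operations give the same result.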
Thanks Hervé. I think that explains it.

Valerie

On 02/28/12 17:35, Hervé Pagès wrote:
> Yes the alleles actually always correspond to the + strand. I should
> clarify this in the man page. dbSNP reports the strand and alleles for
> a given SNP and the alleles they give is relative to the reported
> strand. However, when the SNPlocs.Hsapiens.dbSNP.20110815 package is
> made the alleles for SNPs on the minus strand are complemented so
> they correspond to the '+' strand. So all SNPs are considered to be
> on the + strand and everything is reported with respect to that strand.
