Question: About the bioconductor package SNPlocs.Hsapiens.dbSNP.20071016.
9.2 years ago by
Hervé Pagès ♦♦ 13k
Hi Praveen,

Praveen Surendran wrote:
> Dear Herve Pages,
>
> I am working on the identification of non-synonymous snp's in humans
> from an Affymetrix Data Source.
> Currently I am using a version of bioconductor package which will fetch
> the information on these snps with the variation and provides
> the information on whether a snp is non-synonymous or not.
> But I just found that the database does not have enough information on
> all the non-synonymous snp's and would like to use this
> bioconductor package.
>
> Please have your comments on whether I will be able to use the package
> to get information on whether the snp is non-synonymous from dbsnp using
> this package.

Please post to the Bioconductor mailing list (I'm cc'ing it right now). You'll benefit from a wider audience, and the answers you get will be archived so other people can find them and refer to them in the future.

If I understand correctly, you want to be able to determine whether the SNPs stored in SNPlocs.Hsapiens.dbSNP.20071016 are synonymous or not. Note that you give very little information about which BioC package you are currently using to fetch the information for the Human SNPs, where the package is fetching them from, and why it "does not have enough information".

SNPlocs.Hsapiens.dbSNP.20071016 only provides the locations and alleles of a SNP (see this recent thread for the details: https://stat.ethz.ch/pipermail/bioconductor/2009-February/026231.html), so it's unlikely that you will get more information by using this package than by "fetching SNPs" directly from a public database like dbSNP.

The information in SNPlocs.Hsapiens.dbSNP.20071016 was retrieved from dbSNP, from this location to be precise: ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606/ASN1_flat/ (note that the content of this folder has been updated since SNPlocs.Hsapiens.dbSNP.20071016 was made).
My understanding is that in order to determine whether a SNP is synonymous or not you need to know the context of the SNP, i.e. does it occur in a gene? If yes, which strand does the gene belong to? Does it occur in a codon, and where in the codon, i.e. at position 1, 2 or 3? Also, a SNP can have more than 1 alternate allele; some of them can be synonymous to the reference allele, others not.

For example, the SNP with RefSNP id 6474828 (in chr9) has alleles C, G and T:

  > library(Biostrings)
  > library(SNPlocs.Hsapiens.dbSNP.20071016)
  > chr9snps <- getSNPlocs("chr9")
  > subset(chr9snps, RefSNP_id=="6474828")
        RefSNP_id alleles_as_ambig      loc
  61589   6474828                B 14279138
  > IUPAC_CODE_MAP[["B"]]
  [1] "CGT"

The reference allele (T) can be determined by looking at the reference genome:

  > library(BSgenome.Hsapiens.UCSC.hg18)
  > dna <- subseq(unmasked(Hsapiens$chr9), 14279138-2, 14279138+2)
  > dna
    5-letter "DNAString" instance
  seq: TATAC

The UCSC genome browser will confirm this and will also show that this chromosome location is inside a gene (NFIB) that belongs to the minus strand. So the coding DNA is:

  > codingdna <- reverseComplement(dna)
  > codingdna
    5-letter "DNAString" instance
  seq: GTATA

Note that the letters in this short sequence are on the minus strand of chr9, now at positions 14279138+2 to 14279138-2, in this order. The SNP is at position 14279138 (letter A), and the set of alleles originally reported for the plus strand (C, G, T) now becomes G, C and A.

I don't know if the SNP belongs to a codon (this would need to be checked), but in case it did, I would also need to know its position in the codon.

If it's at position 1:

  GTATA
    123
  > GENETIC_CODE[c("ATA", "CTA", "GTA")]
  ATA CTA GTA
  "I" "L" "V"

so no alternate allele is synonymous to the reference allele for this SNP.

If it's at position 2:

  GTATA
   123
  > GENETIC_CODE[c("TAT", "TCT", "TGT")]
  TAT TCT TGT
  "Y" "S" "C"

same conclusion.
But if it's at position 3:

  GTATA
  123
  > GENETIC_CODE[c("GTA", "GTC", "GTG")]
  GTA GTC GTG
  "V" "V" "V"

then all alleles are synonymous.

Hope this helps,
H.

>
> Appreciate your kind attention on this query.
>
> Kind Regards,
>
> Praveen Surendran
> Shields Lab.
> School of Medicine & Medical Science.
> Complex & Adaptive Systems Laboratory (CASL).
> 8 Belfield Office.
> University College Dublin (UCD).
> Dublin 4, Ireland.
> Mob : +353 8793 13071
> Off : +353 171 65334

--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319
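The three codon-position checks in the reply can be folded into one small helper. What follows is only a sketch under stated assumptions: snpAminoAcids() is a hypothetical function written for this illustration (it is not part of Biostrings or of any SNPlocs package), the 5-letter context is hard-coded from the example instead of being read from BSgenome.Hsapiens.UCSC.hg18, and the codon position is still something you would have to obtain from a gene annotation. Only DNAString(), complement(), reverseComplement(), IUPAC_CODE_MAP and GENETIC_CODE come from Biostrings.

```r
library(Biostrings)

# Hypothetical helper, written for this example only.
#   coding5  : 5-letter context on the coding strand, SNP in the middle
#   alleles  : SNP alleles, already expressed on the coding strand
#   codonpos : assumed position of the SNP inside its codon (1, 2 or 3)
# Returns the amino acid encoded by each allele, named by codon.
snpAminoAcids <- function(coding5, alleles, codonpos) {
    coding5 <- as.character(coding5)
    stopifnot(nchar(coding5) == 5L, codonpos %in% 1:3)
    # The SNP sits at window position 3, so the codon starts at 4 - codonpos.
    start <- 4L - codonpos
    codon <- substr(coding5, start, start + 2L)
    codons <- vapply(alleles, function(a) {
        substr(codon, codonpos, codonpos) <- a  # substitute one allele
        codon
    }, character(1), USE.NAMES = FALSE)
    GENETIC_CODE[codons]
}

# Plus-strand context and alleles, copied from the example in the reply.
plus_context <- DNAString("TATAC")
plus_alleles <- DNAString(IUPAC_CODE_MAP[["B"]])   # "CGT"

# NFIB is on the minus strand, so flip both the context and the alleles.
coding_context <- reverseComplement(plus_context)
coding_alleles <- strsplit(as.character(complement(plus_alleles)), "")[[1]]

# Try each possible codon position and report whether all alleles agree.
for (pos in 1:3) {
    aa <- snpAminoAcids(coding_context, coding_alleles, pos)
    cat("codon position", pos, ":", paste(aa, collapse = " "),
        if (length(unique(aa)) == 1L) "(all synonymous)" else "(not all synonymous)",
        "\n")
}
```

As in the worked example, only codon position 3 gives the same amino acid (V) for all three alleles. The helper deliberately takes the strand-corrected alleles as input, so the strand flip stays an explicit, visible step.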