Using getSequence in biomaRt to obtain SNPs
@kenmcgarry-9730

Hi,

There does not seem to be a simple way to obtain the full sequence of SNP data using getSequence(). I have read similar requests here, and the solution was to retrieve 10 or 20 bp of upstream/downstream flanking sequence using the filters for getBM(). However, the error messages from getSequence() seem to imply that it can't be used with ENSEMBL_MART_SNP.

Basically, I want to download the full cDNA associated with a gene (and SNP). I'm wondering if I should just download the sequence for the protein I'm interested in and then modify the position affected by the SNP accordingly?

Many thanks

Ken


It would help if you said exactly what you are looking for. Unfortunately, 'the full sequence of SNP data' and 'full cdna associated with a gene (and SNP)' are rather inexact. What would your desired output look like?
 

@kenmcgarry-9730

My apologies. What I had in mind was obtaining the full SNP sequence, similar to how we obtain a protein's sequence.

mart <- useMart(host="www.ensembl.org", biomart="ENSEMBL_MART_ENSEMBL", dataset="hsapiens_gene_ensembl")
ENSG <- getBM(mart=mart, attributes="ensembl_gene_id", filters="hgnc_symbol", values="CTNS")

# The two lines above work OK: they set up "mart" to query human genes, and ENSG holds the Ensembl ID for the "CTNS" gene, which is involved in the disease of interest. The line below sets up "snpmart" to point to the SNP database.

snpmart <- useMart(host="www.ensembl.org", biomart="ENSEMBL_MART_SNP", dataset="hsapiens_snp")

# The line below is where I get totally lost.

seq1 = getSequence(id="CTNS",type="entrezgene",seqType="cdna", mart=snpmart)

# It may be that you can't get cDNA sequences from the SNP database. However, I have had a bit of luck by modifying some code written by Steffen...

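# 'snp' is assumed to be a data frame whose first column holds rs IDs (its definition is not shown)
getBM(attributes = c("refsnp_id", "snp"),
      filters = c("snp_filter", "downstream_flank", "upstream_flank"),
      values = list(snp[1:5, 1], 20, 20),
      mart = snpmart, checkFilters = FALSE)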

This extracts a fixed length of flanking sequence (here 20 bp on each side) around each SNP. I'm not sure which filters and attributes to use to figure out how big each SNP is, and thus get the entire sequence.
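One way to see how big each variant is, sketched under assumptions: query the SNP mart for the allele string and position of each rs ID. The attribute names refsnp_id, allele, chrom_start and chrom_strand appear in the biomaRt vignette's SNP example, but it is worth confirming them with listAttributes(snpmart) for the current Ensembl release. The names fetch_snp_positions and is_single_base are hypothetical, and the size check assumes Ensembl's allele notation, where "-" marks a missing (deleted/inserted) base.

```r
# Hypothetical wrapper: position and allele string for each rs ID.
# Requires the biomaRt package and network access to Ensembl when called.
fetch_snp_positions <- function(rs_ids, snp_mart) {
  biomaRt::getBM(
    attributes = c("refsnp_id", "allele", "chrom_start", "chrom_strand"),
    filters = "snp_filter",
    values = rs_ids,
    mart = snp_mart
  )
}

# Hypothetical helper: TRUE when every allele in a string such as "A/G" is a
# single base; "-" or multi-base alleles indicate an indel rather than an SNV.
is_single_base <- function(allele) {
  parts <- strsplit(allele, "/", fixed = TRUE)
  vapply(parts, function(p) all(nchar(p) == 1 & p != "-"), logical(1))
}

# Usage (network required):
# pos <- fetch_snp_positions(snp[1:5, 1], snpmart)
# pos[is_single_base(pos$allele), ]
```

Variants whose allele strings pass is_single_base() span exactly one base, so only the position and the alternative allele are needed to edit a downloaded sequence.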


I'm still confused by the term 'full SNP sequence'. A SNP is a single nucleotide polymorphism, so it's just one base, not a sequence. Do you want to get the coding sequence for a gene that contains the SNP?
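A minimal sketch of that route, assuming the biomaRt package and network access to Ensembl: getSequence() is pointed at the gene mart (hsapiens_gene_ensembl) rather than the SNP mart, and the gene is identified by its HGNC symbol via type = "hgnc_symbol". The wrapper name fetch_cdna is hypothetical. Passing seqType = "coding" instead of "cdna" should return only the coding sequence (CDS).

```r
# Hypothetical wrapper: fetch cDNA sequences for a gene symbol from the
# *gene* mart; getSequence() is not being used with ENSEMBL_MART_SNP here.
fetch_cdna <- function(symbol, gene_mart, seq_type = "cdna") {
  # type = "hgnc_symbol" tells getSequence() that the IDs are HGNC symbols;
  # expect one row per transcript that has a sequence of this type.
  biomaRt::getSequence(id = symbol, type = "hgnc_symbol",
                       seqType = seq_type, mart = gene_mart)
}

# Usage (network required):
# mart <- biomaRt::useMart(host = "www.ensembl.org",
#                          biomart = "ENSEMBL_MART_ENSEMBL",
#                          dataset = "hsapiens_gene_ensembl")
# ctns_cdna <- fetch_cdna("CTNS", mart)
# ctns_cds  <- fetch_cdna("CTNS", mart, seq_type = "coding")
```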


I guess that would be the best option: get the coding sequence, and then change the base at the position defined by the SNP. Many thanks for the advice.
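The substitution step can be sketched in base R. This assumes the SNP's 1-based position within the downloaded sequence is already known; converting a genomic chrom_start into a cDNA or CDS position means accounting for exon boundaries and strand (alleles reported on the forward strand need complementing for reverse-strand genes), which is not shown. apply_snv is a hypothetical helper name, and it only handles single-base substitutions.

```r
# Hypothetical helper: replace the reference base at a 1-based position with
# an alternative allele, after checking that the reference base matches.
apply_snv <- function(seq, pos, ref, alt) {
  stopifnot(is.character(seq), length(seq) == 1,
            pos >= 1, pos <= nchar(seq),
            nchar(ref) == 1, nchar(alt) == 1)
  observed <- substr(seq, pos, pos)
  # Case-insensitive comparison, since sequences may be soft-masked
  if (toupper(observed) != toupper(ref)) {
    stop(sprintf("Reference mismatch at %d: expected %s, found %s",
                 pos, ref, observed))
  }
  substr(seq, pos, pos) <- alt
  seq
}
```

For example, apply_snv("ATGGCC", 4, "G", "A") returns "ATGACC". The reference check is there so that an off-by-one or strand error stops with a message instead of silently producing a wrong sequence.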
