Question: Find human SNP orthologous flanking sequence
1
4.3 years ago by
jfertaj20
United Kingdom
jfertaj20 wrote:

Hi all,

I would like to ask if it would be possible to retrieve orthologous sequences to a human SNP flanking sequence automatically using biomaRt or another R package.

Basically I have a list of 4 human SNPs and I have the flanking sequences (60 kb each side) for each of them. I would like to obtain the orthologous sequences for several species (not only primates)

Juan

biomart bioconductor • 759 views
modified 4.3 years ago by Marc Carlson7.2k • written 4.3 years ago by jfertaj20
Answer: Find human SNP orthologous flanking sequence
1
4.3 years ago by
Marc Carlson7.2k
United States
Marc Carlson7.2k wrote:

Hi Juan,

There are several possible answers to this depending on exactly what you are starting with and what you want to find at the end.  I will make one guess about what you wanted here but others may have other ideas.  If your starting point is a range then you could do this:

First lets assume you know where your SNP is and that you can make a GRanges object to represent that

gr <- GRanges(seqnames='chr5', IRanges(174152000, width=200), strand='+')

Then lets see which genes this region overlaps with...

library(VariantAnnotation)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
locateVariants(gr, txdb, AllVariants())

Notice that it overlaps with gene 4488 (entrez gene ID), so how can we look up homologs?
Well one approach is to use the inparanoid resources in the AnnotationHub:

library(AnnotationHub)
ah = AnnotationHub()
ahp = query(ah, 'Inparanoid8Db')

At this point we have 268 different species to choose from, lets look at human since that is what we started with

ahph = query(ahp, 'Homo sapiens')
human = ahph[[1]]

Now lets see what kind of keys we can use to query this resource:

head(keys(human, 'HOMO_SAPIENS'))

Those are uniprot IDs.  We can look those up using an OrgDb.  Lets get the OrgDb for human and translate the entrez gene ID 4488 into a uniprot ID:

Org.Hs = query(ah, c('OrgDb','Homo sapiens'))[[1]]
ids <- mapIds(Org.Hs, '4488', 'UNIPROT', 'ENTREZID')

Now we can look up the matching entrez gene IDs from mouse (for example) using select() like so:

select(human, ids, 'MUS_MUSCULUS', 'HOMO_SAPIENS')

Hope this helps,

Marc