Question: getBM. Get genome location for probes that do not map to genes
Asked by Aedin Culhane (United States), 2.7 years ago:

Hi

I am trying to get annotation for affy_hugene probes. When I look them up in EnsEMBL, I can see that they map uniquely to introns of genes. Can I use getBM (biomaRt) to retrieve the genomic locations of the mapped probes? Then I could query other databases to get annotation for those genomic regions.

library(biomaRt)
mart <- useDataset("hsapiens_gene_ensembl", useMart("ensembl"))

getBM(attributes = c("chromosome_name", "band", "start_position", "end_position"),
      filters = "affy_hugene_1_0_st_v1", values = "7893529", mart = mart)

http://useast.ensembl.org/Homo_sapiens/Location/Genome?fdb=funcgen;ftype=ProbeFeature;id=7893529;ptype=pset

Thanks

Aedin

Answer from Aedin Culhane (United States), 2.7 years ago:

By the way, I also tried

library(hugene10sttranscriptcluster.db)
select(hugene10sttranscriptcluster.db, keys = "7893529", keytype = "PROBEID", columns = "MAP")

Answer from James W. MacDonald (United States), 2.7 years ago:

You can't use any of the regular annotation packages for that, because they are just what Affymetrix gives us, repackaged, and in general that annotation says nothing about the intronic probes.

The pdInfo package (pd.hugene.1.0.st.v1) tells the same story; there are no genomic coordinates for this probe set:

> library(pd.hugene.1.0.st.v1)
> library(DBI)
> con <- db(pd.hugene.1.0.st.v1)
> dbGetQuery(con, "select * from featureSet where transcript_cluster_id='7893529';")
   fsetid strand start stop transcript_cluster_id exon_id crosshyb_type level chrom type
1 7893529     NA    NA   NA               7893529       0             0    NA    NA   10

You can get the probe sequences from the probe sequence FASTA file and then align them against the human genome using, say, matchPDict() from Biostrings, or just use BLAT at UCSC. Pasting the following into BLAT brings up the gene ANAPC5:

>probe:HuGene-1_0-st-v1:814225;474:775; ProbeSetID=7893529; Sense; ProbeSetType=normgene->intron
AAATGTAAAGAGCCGCTATTCATAA
>probe:HuGene-1_0-st-v1:533338;987:507; ProbeSetID=7893529; Sense; ProbeSetType=normgene->intron
CAGAAATGTAAAGAGCCGCTATTCA
>probe:HuGene-1_0-st-v1:28669;318:27; ProbeSetID=7893529; Sense; ProbeSetType=normgene->intron
GAAATGTAAAGAGCCGCTATTCATA
>probe:HuGene-1_0-st-v1:699266;1015:665; ProbeSetID=7893529; Sense; ProbeSetType=normgene->intron
TAAAGAGCCGCTATTCATAACAGCC
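For readers who want to see what BLAT or Biostrings::matchPDict() actually compute here, the following is a minimal base-R sketch of exact probe matching on both strands. It is only an illustration and would be far too slow for a whole genome. revcomp() and find_hits() are hypothetical helpers written for this example, and ref is a toy sequence, NOT real genome sequence: it is the 33-bp stretch that the four overlapping probes above tile, padded with G's.

```r
# Probe sequences from the FASTA records above, named by probe ID
probes <- c(
  "814225" = "AAATGTAAAGAGCCGCTATTCATAA",
  "533338" = "CAGAAATGTAAAGAGCCGCTATTCA",
  "28669"  = "GAAATGTAAAGAGCCGCTATTCATA",
  "699266" = "TAAAGAGCCGCTATTCATAACAGCC"
)

# Reverse complement of a DNA string (A/C/G/T only)
revcomp <- function(x) {
  paste(rev(strsplit(chartr("ACGT", "TGCA", x), "")[[1]]), collapse = "")
}

# Exact hits of each probe on both strands of `ref` (hypothetical helper).
# A "-" hit means the probe's reverse complement occurs in `ref`; start/end
# are always 1-based forward-strand coordinates. gregexpr() reports
# non-overlapping matches only, which is usually acceptable for 25-mers.
find_hits <- function(probes, ref) {
  hits <- list()
  for (id in names(probes)) {
    for (strand in c("+", "-")) {
      pat <- if (strand == "+") probes[[id]] else revcomp(probes[[id]])
      pos <- gregexpr(pat, ref, fixed = TRUE)[[1]]
      if (pos[1] != -1) {
        hits[[length(hits) + 1]] <- data.frame(
          probe = id, strand = strand,
          start = as.integer(pos),
          end = as.integer(pos) + nchar(pat) - 1L,
          stringsAsFactors = FALSE
        )
      }
    }
  }
  do.call(rbind, hits)
}

# Toy reference (NOT real genome sequence): the stretch the four probes tile,
# padded with G's on both sides
ref <- paste0(strrep("G", 10), "CAGAAATGTAAAGAGCCGCTATTCATAACAGCC", strrep("G", 10))
find_hits(probes, ref)
# four "+" hits, starting at 14, 11, 13 and 19 (in the order of `probes`)
```

With a real genome you would loop over the chromosomes of a BSgenome package and let matchPDict() do the searching; the need to check both strands stays the same.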

Answer from Aedin Culhane (United States), 2.7 years ago:

Thanks, James.

EnsEMBL has already mapped all of the sequences to the genome (see http://useast.ensembl.org/Homo_sapiens/Location/Genome?fdb=funcgen;ftype=ProbeFeature;id=7893529;ptype=pset), so really I just want to pull their mappings. But I couldn't work out how to do this with biomaRt. Do you know?

Aedin

James W. MacDonald replied:

As far as I can tell, you can't get that information from BioMart. As you note, Ensembl has done the mapping, and you can see it by searching their site. They say you can also get it from BioMart, but that appears to be true only if the probes hit an exon. So I don't know of any easy way to get at it without hitting the Ensembl database directly. They do have a Perl API that you can use if you want to get your Perl on.

I tried playing around with a direct query to their MySQL database, but A) they kick you off in like three nanoseconds if you aren't doing something, and B) doing direct queries would require some knowledge of their DB schemas, which I don't have.

If it were my project, I would probably just read the probe fasta file into a DNAStringSet, subset to just the intronic controls, and then align to the human genome using something similar to what Herve does in section 8 of this vignette. That seems like the less 'teeth gnashy' way to proceed.
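A minimal sketch of the first two steps (read the probe FASTA, keep only the intronic controls), written in base R so it runs without Bioconductor. It is a stand-in for Biostrings::readDNAStringSet(), not the method from the vignette. parse_probe_fasta() is a hypothetical helper, and it assumes every header looks like the ones quoted earlier in this thread (with ProbeSetID= and ProbeSetType= fields) and that the file starts with a header line.

```r
# Parse FASTA lines into one row per record, pulling ProbeSetID and
# ProbeSetType out of Affymetrix-style headers (hypothetical helper).
# Assumes the first non-empty line is a header and every record has
# at least one sequence line.
parse_probe_fasta <- function(lines) {
  lines <- trimws(lines)
  lines <- lines[nzchar(lines)]
  is_header <- startsWith(lines, ">")
  record <- cumsum(is_header)                 # record index for every line
  headers <- sub("^>", "", lines[is_header])
  # Join (possibly wrapped) sequence lines belonging to the same record
  seqs <- vapply(split(lines[!is_header], record[!is_header]),
                 paste, character(1), collapse = "")
  data.frame(
    probe_set_id   = sub(".*ProbeSetID=([^;]+).*", "\\1", headers),
    probe_set_type = sub(".*ProbeSetType=([^;]+).*", "\\1", headers),
    sequence       = unname(seqs),
    header         = headers,
    stringsAsFactors = FALSE
  )
}

# The records quoted above; for the real file you would use
# fasta_lines <- readLines(path_to_probe_fasta)   # path is up to you
fasta_lines <- c(
  ">probe:HuGene-1_0-st-v1:814225;474:775; ProbeSetID=7893529; Sense; ProbeSetType=normgene->intron",
  "AAATGTAAAGAGCCGCTATTCATAA",
  ">probe:HuGene-1_0-st-v1:533338;987:507; ProbeSetID=7893529; Sense; ProbeSetType=normgene->intron",
  "CAGAAATGTAAAGAGCCGCTATTCA",
  ">probe:HuGene-1_0-st-v1:28669;318:27; ProbeSetID=7893529; Sense; ProbeSetType=normgene->intron",
  "GAAATGTAAAGAGCCGCTATTCATA",
  ">probe:HuGene-1_0-st-v1:699266;1015:665; ProbeSetID=7893529; Sense; ProbeSetType=normgene->intron",
  "TAAAGAGCCGCTATTCATAACAGCC"
)

probe_tab <- parse_probe_fasta(fasta_lines)
# Keep only the intronic control probes
intronic <- probe_tab[probe_tab$probe_set_type == "normgene->intron", ]
intronic$sequence
```

The resulting sequences can then be turned into a DNAStringSet with Biostrings::DNAStringSet(intronic$sequence) and aligned against the genome along the lines of the vignette mentioned above.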