getBM. Get genome location for probes that do not map to genes
Aedin Culhane ▴ 510
@aedin-culhane-1526
Last seen 4.6 years ago
United States

Hi

I am trying to get annotation for affy_hugene probes. When I go to EnsEMBL, I can see that these map uniquely to introns of genes. Can I use getBM (biomaRt) to retrieve the genome location of the mapped probes? Then I can query other databases to get annotation for that genomic region.

library(biomaRt)

mart <- useDataset("hsapiens_gene_ensembl", useMart("ensembl"))

getBM(attributes = c("chromosome_name", "band", "start_position", "end_position"), filters = "affy_hugene_1_0_st_v1", values = "7893529", mart = mart)

 

http://useast.ensembl.org/Homo_sapiens/Location/Genome?fdb=funcgen;ftype=ProbeFeature;id=7893529;ptype=pset

 

Thanks

Aedin

 

Aedin Culhane ▴ 510
@aedin-culhane-1526
Last seen 4.6 years ago
United States

BTW, I also tried 

select(hugene10sttranscriptcluster.db, keys = "7893529", keytype = "PROBEID", columns="MAP")

@james-w-macdonald-5106
Last seen 3 hours ago
United States

You can't use any of the regular annotation packages to do that, as they are just what Affy gives us, repackaged. And they don't say anything about the intronic probes, in general.

This is echoed in the pdInfoPackage:

> dbGetQuery(con, "select * from featureSet where transcript_cluster_id='7893529';")
   fsetid strand start stop transcript_cluster_id exon_id crosshyb_type level
1 7893529     NA    NA   NA               7893529       0             0    NA
  chrom type
1    NA   10
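
The post does not show how `con` was created. A minimal sketch, assuming `con` is a connection to the SQLite database inside the pd.hugene.1.0.st.v1 platform design package, obtained with `db()` after loading oligo (that setup is my assumption, not something stated above):

```r
library(oligo)                 # attaches oligoClasses, which provides db()
library(pd.hugene.1.0.st.v1)   # platform design package for this array (assumed)
library(DBI)

# Assumption: the connection used in the query above came from db().
con <- db(pd.hugene.1.0.st.v1)

# List the tables available in the package database.
dbListTables(con)

# Same query as in the post: the location columns are all NA for this probeset.
res <- dbGetQuery(
  con,
  "select * from featureSet where transcript_cluster_id='7893529';"
)
res
```

The connection belongs to the package, so it should not be closed by hand.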

You can get the probe sequences from the probeset fasta file, and you could then align them against the human genome using, say, matchPDict from Biostrings, or just use BLAT at UCSC. Pasting the following into BLAT brings up the gene ANAPC5:

>probe:HuGene-1_0-st-v1:814225;474:775; ProbeSetID=7893529; Sense; ProbeSetType=normgene->intron
AAATGTAAAGAGCCGCTATTCATAA
>probe:HuGene-1_0-st-v1:533338;987:507; ProbeSetID=7893529; Sense; ProbeSetType=normgene->intron
CAGAAATGTAAAGAGCCGCTATTCA
>probe:HuGene-1_0-st-v1:28669;318:27; ProbeSetID=7893529; Sense; ProbeSetType=normgene->intron
GAAATGTAAAGAGCCGCTATTCATA
>probe:HuGene-1_0-st-v1:699266;1015:665; ProbeSetID=7893529; Sense; ProbeSetType=normgene->intron
TAAAGAGCCGCTATTCATAACAGCC

 

Aedin Culhane ▴ 510
@aedin-culhane-1526
Last seen 4.6 years ago
United States

Thanks James

EnsEMBL has already mapped all of the sequences to the genome (see http://useast.ensembl.org/Homo_sapiens/Location/Genome?fdb=funcgen;ftype=ProbeFeature;id=7893529;ptype=pset), so really I just want to pull their mappings. But I couldn't work out how to do this with biomaRt. Do you know how?

Aedin


As far as I can tell, you can't get that information from BioMart. As you note, Ensembl has already done the mapping, and you can find it by searching their site. They say you can also get it from BioMart, but that appears to be true only if the probes hit an exon, so I don't know of an easy way to get at it without hitting the Ensembl DB directly. They do have a Perl API that you can use if you want to get your Perl on.

I tried playing around with a direct query to their MySQL database, but A) they kick you off in about three nanoseconds if you aren't actively doing something, and B) direct queries require some knowledge of their DB schemas, which I don't have.
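
For anyone who wants to try the direct route anyway, here is a rough sketch of a first step: connecting to Ensembl's public MySQL server as the anonymous user and listing the human funcgen databases (the Ensembl link earlier in the thread uses `fdb=funcgen`, which suggests that is where the probe features live). The helper name `list_funcgen_databases` is made up for illustration, the use of the DBI and RMariaDB packages is my assumption, and nothing here touches the schema itself.

```r
# Hypothetical helper: open a short-lived connection to the public Ensembl
# MySQL server and list databases whose names start with
# "homo_sapiens_funcgen". Requires the DBI and RMariaDB packages.
list_funcgen_databases <- function(host = "ensembldb.ensembl.org",
                                   port = 3306) {
  con <- DBI::dbConnect(
    RMariaDB::MariaDB(),
    host     = host,
    port     = port,
    username = "anonymous"   # public read-only account, no password
  )
  # Disconnect as soon as the query returns.
  on.exit(DBI::dbDisconnect(con), add = TRUE)
  DBI::dbGetQuery(con, "SHOW DATABASES LIKE 'homo_sapiens_funcgen%'")
}

# Not run here, since it needs network access:
# list_funcgen_databases()
```

Opening the connection inside a function and closing it straight away sidesteps the idle timeout, but finding the right tables would still require reading the schema documentation.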

If it were my project, I would probably just read the probe fasta file into a DNAStringSet, subset to just the intronic controls, and then align to the human genome using something similar to what Herve does in section 8 of this vignette. That seems like the less 'teeth gnashy' way to proceed.
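
A minimal sketch of that approach, under a few assumptions: the four probe records quoted earlier in the thread stand in for the real probe fasta file, the helper `align_probes` is a made-up name, and only exact matches are reported. It is not the vignette's code, just the same general idea using PDict and matchPDict from Biostrings.

```r
library(Biostrings)

# Stand-in for the Affymetrix probe fasta file: the four records quoted
# earlier in this thread, written to a temporary file so the sketch runs
# as-is. In practice `probe_file` would point at the real download.
fasta_lines <- c(
  ">probe:HuGene-1_0-st-v1:814225;474:775; ProbeSetID=7893529; Sense; ProbeSetType=normgene->intron",
  "AAATGTAAAGAGCCGCTATTCATAA",
  ">probe:HuGene-1_0-st-v1:533338;987:507; ProbeSetID=7893529; Sense; ProbeSetType=normgene->intron",
  "CAGAAATGTAAAGAGCCGCTATTCA",
  ">probe:HuGene-1_0-st-v1:28669;318:27; ProbeSetID=7893529; Sense; ProbeSetType=normgene->intron",
  "GAAATGTAAAGAGCCGCTATTCATA",
  ">probe:HuGene-1_0-st-v1:699266;1015:665; ProbeSetID=7893529; Sense; ProbeSetType=normgene->intron",
  "TAAAGAGCCGCTATTCATAACAGCC"
)
probe_file <- tempfile(fileext = ".fa")
writeLines(fasta_lines, probe_file)

# Read into a DNAStringSet and keep only the intronic controls,
# identified here by the ProbeSetType field in the fasta header.
probes   <- readDNAStringSet(probe_file)
intronic <- probes[grepl("normgene->intron", names(probes), fixed = TRUE)]

# Hypothetical helper: exact matches of every probe against one sequence,
# on both strands, reported in plus-strand coordinates.
align_probes <- function(probes, subject) {
  one_strand <- function(dict_seqs, strand) {
    hits <- matchPDict(PDict(dict_seqs), subject)
    n    <- elementNROWS(hits)
    data.frame(
      probe  = rep(names(probes), n),
      start  = as.integer(unlist(startIndex(hits), use.names = FALSE)),
      end    = as.integer(unlist(endIndex(hits), use.names = FALSE)),
      strand = rep(strand, sum(n)),
      stringsAsFactors = FALSE
    )
  }
  rbind(
    one_strand(probes, "+"),
    one_strand(reverseComplement(probes), "-")
  )
}

# Usage against a real genome (not run here; assumes the
# BSgenome.Hsapiens.UCSC.hg19 package is installed):
# library(BSgenome.Hsapiens.UCSC.hg19)
# chroms <- paste0("chr", c(1:22, "X", "Y"))
# hits <- do.call(rbind, lapply(chroms, function(chr) {
#   res <- align_probes(intronic, Hsapiens[[chr]])
#   cbind(chrom = rep(chr, nrow(res)), res)
# }))
```

Matching the reverse complement of each probe against the plus strand keeps all coordinates on one strand, which makes it easier to overlap the hits with other annotation afterwards.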
