Find Affy probes within a particular region
2
0
Entering edit mode
Daniel Brewer ★ 1.9k
@daniel-brewer-1791
Last seen 9.6 years ago
Hi, I was wondering what the best way to find which Affymetrix probes are within a specific genomic region (chromosome, start, stop). I am not sure if Biomart nor the annotation.db can do this as they both go to some common ID first. The annotation.db stuff seems to only have one position information too. The other option is to dwonload the annotation file from Affymetrix and load that in, but I would prefer to avoid that if at all possible. Has anyone got any ideas. Many thanks -- ************************************************************** Daniel Brewer, Ph.D. Institute of Cancer Research Molecular Carcinogenesis Email: daniel.brewer at icr.ac.uk ************************************************************** The Institute of Cancer Research: Royal Cancer Hospital, a charitable Company Limited by Guarantee, Registered in England under Company No. 534147 with its Registered Office at 123 Old Brompton Road, London SW7 3RP. This e-mail message is confidential and for use by the a...{{dropped:2}}
Annotation GO Cancer biomaRt Annotation GO Cancer biomaRt • 893 views
ADD COMMENT
0
Entering edit mode
@martin-morgan-1513
Last seen 5 days ago
United States
Daniel Brewer <daniel.brewer at="" icr.ac.uk=""> writes: > Hi, > > I was wondering what the best way to find which Affymetrix probes are > within a specific genomic region (chromosome, start, stop). I am not > sure if Biomart nor the annotation.db can do this as they both go to > some common ID first. The annotation.db stuff seems to only have one > position information too. The other option is to dwonload the > annotation file from Affymetrix and load that in, but I would prefer to > avoid that if at all possible. > > Has anyone got any ideas. Not sure whether this is a good idea or not, but... ## create a data frame of probe genomic location makeLookup <- function(pkg) { filt <- function(x) !is.null(names(x)) # some w/out names, hence czomes lst <- Filter(filt, as.list(getAnnMap("CHRLOC", pkg))) data.frame(id=rep(names(lst), sapply(lst, length)), pos=unlist(lst, use.names=FALSE), chr=unlist(lapply(lst, names), use.names=FALSE), row.names=NULL) } this gives us > lookup <- makeLookup("hgu95av2.db") > head(lookup) id pos chr 1 1000_at -30032926 16 2 1001_at 43539250 1 3 1002_f_at 96512452 10 4 1003_s_at 118269310 11 5 1003_s_at 118259776 11 6 1004_at 118269310 11 then... ## find probes in a single region contains <- function(chr, start, end, table) { apos <- abs(table$pos) idx <- table$chr == chr & apos >= start & apos <=end table[idx,] } > contains(10, 96000000, 97000000, lookup) id pos chr 3 1002_f_at 96512452 10 525 1455_f_at 96688429 10 550 1477_s_at 96433367 10 4321 34078_s_at 96512452 10 6798 36320_at 96152175 10 7509 36937_s_at -96987321 10 9367 38548_at -96786519 10 One could use 'contains' with mapply to get multiple regions, but perhaps there's a more efficient way for such bulk queries. Not sure about your concerns about just 'location'; the probes are a common length, so you could incorporate this into the 'idx' calculation in contains(). Probably someone else will offer up a ready-made solution. Martin > Many thanks > > -- > ************************************************************** > Daniel Brewer, Ph.D. > > Institute of Cancer Research > Molecular Carcinogenesis > Email: daniel.brewer at icr.ac.uk > ************************************************************** > > The Institute of Cancer Research: Royal Cancer Hospital, a charitable Company Limited by Guarantee, Registered in England under Company No. 534147 with its Registered Office at 123 Old Brompton Road, London SW7 3RP. > > This e-mail message is confidential and for use by the a...{{dropped:2}} > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor -- Martin Morgan Computational Biology / Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: Arnold Building M2 B169 Phone: (206) 667-2793
ADD COMMENT
0
Entering edit mode
@james-w-macdonald-5106
Last seen 14 minutes ago
United States
Hi Daniel, Daniel Brewer wrote: > Hi, > > I was wondering what the best way to find which Affymetrix probes are > within a specific genomic region (chromosome, start, stop). I am not > sure if Biomart nor the annotation.db can do this as they both go to > some common ID first. The annotation.db stuff seems to only have one > position information too. The other option is to dwonload the > annotation file from Affymetrix and load that in, but I would prefer to > avoid that if at all possible. So it depends on exactly what you are trying to do here. The annotation package will give you the chromosome and start location for the probeset. You can use biomaRt to get the chromosome, start, stop for the probeset as well: > getBM(c("chromosome_name", "start_position","end_position"), "affy_hg_u133_plus_2", "1007_s_at", mart) chromosome_name start_position end_position 1 6 30956784 30975912 Of course the question posed is to find things within a specific region, not find the region given the probeset ID. If this is what you want, then an annotation.db package might be what you want. Something like library(mouse4302.db) tab <- toTable(mouse4302CHRLOC) ind <- tab[,3] == 6 & abs(tab[,2]) <= 53043660 & abs(tab[,2]) >= 50000000 tab[ind,] > head(tab[ind,]) probe_id start_location Chromosome 1481 1416884_at 51420614 6 1482 1432628_at 51420614 6 1483 1439421_x_at 51420614 6 1484 1448504_a_at 51420614 6 1485 1454305_at 51420614 6 2397 1422483_a_at -50512561 6 If you then want the probeset start, end, you could feed the Affy IDs to biomaRt. But you specifically said probes, so you might actually want probes instead of probesets. If you want to get that specific then you could use the Biostrings package along with one of the genome packages (warning - HUGE download) and a probe package to get the locations of these probes. I'm not going to give code for that. Instead see the vignette in the BSgenome package, which has lots of examples. Best, Jim > > Has anyone got any ideas. > > Many thanks > -- James W. MacDonald, M.S. Biostatistician Affymetrix and cDNA Microarray Core University of Michigan Cancer Center 1500 E. Medical Center Drive 7410 CCGC Ann Arbor MI 48109 734-647-5623
ADD COMMENT
0
Entering edit mode
On Tue, Jun 17, 2008 at 12:35 PM, James W. MacDonald <jmacdon at="" med.umich.edu=""> wrote: > Hi Daniel, > > Daniel Brewer wrote: >> >> Hi, >> >> I was wondering what the best way to find which Affymetrix probes are >> within a specific genomic region (chromosome, start, stop). I am not >> sure if Biomart nor the annotation.db can do this as they both go to >> some common ID first. The annotation.db stuff seems to only have one >> position information too. The other option is to dwonload the >> annotation file from Affymetrix and load that in, but I would prefer to >> avoid that if at all possible. > > So it depends on exactly what you are trying to do here. The annotation > package will give you the chromosome and start location for the probeset. > You can use biomaRt to get the chromosome, start, stop for the probeset as > well: > >> getBM(c("chromosome_name", "start_position","end_position"), >> "affy_hg_u133_plus_2", "1007_s_at", mart) > chromosome_name start_position end_position > 1 6 30956784 30975912 > > Of course the question posed is to find things within a specific region, not > find the region given the probeset ID. If this is what you want, then an > annotation.db package might be what you want. Something like > > library(mouse4302.db) > tab <- toTable(mouse4302CHRLOC) > ind <- tab[,3] == 6 & abs(tab[,2]) <= 53043660 & > abs(tab[,2]) >= 50000000 > tab[ind,] > >> head(tab[ind,]) > probe_id start_location Chromosome > 1481 1416884_at 51420614 6 > 1482 1432628_at 51420614 6 > 1483 1439421_x_at 51420614 6 > 1484 1448504_a_at 51420614 6 > 1485 1454305_at 51420614 6 > 2397 1422483_a_at -50512561 6 > > If you then want the probeset start, end, you could feed the Affy IDs to > biomaRt. > > But you specifically said probes, so you might actually want probes instead > of probesets. If you want to get that specific then you could use the > Biostrings package along with one of the genome packages (warning - HUGE > download) and a probe package to get the locations of these probes. I'm not > going to give code for that. Instead see the vignette in the BSgenome > package, which has lots of examples. Another alternative is to use the Ensembl core databases directly rather than biomaRt. This is a bit tedious, but there is a treasure trove of information embedded therein. To get the probe locations for all mapped affy probes on the HG_U133A array, you could do: > library(RMySQL) > con <- dbConnect(MySQL(),user='anonymous',host='ensembldb.ensembl.or g',dbname='homo_sapiens_core_47_36i') > ### Warning--BIG SQL command coming.... > a=dbGetQuery(con,sprintf("select op.probeset as probeset,op.name as probe_name,s.name as chromosome,of.seq_region_start as start,of.seq_region_end as end,of.seq_region_strand as strand from oligo_feature of inner join oligo_probe op on op.oligo_probe_id=of.oligo_probe_id inner join oligo_array oa on oa.oligo_array_id=op.oligo_array_id inner join seq_region s on s.seq_region_id=of.seq_region_id where oa.name='%s' order by probeset",'HG_U133A')) > a[1:5,] probeset probe_name chromosome start end strand 1 1007_s_at 467:181; 6 30975290 30975314 1 2 1007_s_at 365:115; 6 30975523 30975547 1 3 1007_s_at 532:139; 6 30975672 30975696 1 4 1007_s_at 86:557; 6 30975472 30975496 1 5 1007_s_at 308:15; 6 30975838 30975862 1 This takes a minute or so to complete (at least from the US). You can then use normal R to do filtering on this dataframe. Hope that helps somewhat. Sean >> >> Has anyone got any ideas. >> >> Many thanks >> > > -- > James W. MacDonald, M.S. > Biostatistician > Affymetrix and cDNA Microarray Core > University of Michigan Cancer Center > 1500 E. Medical Center Drive > 7410 CCGC > Ann Arbor MI 48109 > 734-647-5623 > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor >
ADD REPLY

Login before adding your answer.

Traffic: 968 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6