Hi,
I was wondering what the best way to find which Affymetrix probes are
within a specific genomic region (chromosome, start, stop). I am not
sure if Biomart nor the annotation.db can do this as they both go to
some common ID first. The annotation.db stuff seems to only have one
position information too. The other option is to dwonload the
annotation file from Affymetrix and load that in, but I would prefer
to
avoid that if at all possible.
Has anyone got any ideas.
Many thanks
--
**************************************************************
Daniel Brewer, Ph.D.
Institute of Cancer Research
Molecular Carcinogenesis
Email: daniel.brewer at icr.ac.uk
**************************************************************
The Institute of Cancer Research: Royal Cancer Hospital, a charitable
Company Limited by Guarantee, Registered in England under Company No.
534147 with its Registered Office at 123 Old Brompton Road, London SW7
3RP.
This e-mail message is confidential and for use by the
a...{{dropped:2}}
Daniel Brewer <daniel.brewer at="" icr.ac.uk=""> writes:
> Hi,
>
> I was wondering what the best way to find which Affymetrix probes
are
> within a specific genomic region (chromosome, start, stop). I am
not
> sure if Biomart nor the annotation.db can do this as they both go to
> some common ID first. The annotation.db stuff seems to only have
one
> position information too. The other option is to dwonload the
> annotation file from Affymetrix and load that in, but I would prefer
to
> avoid that if at all possible.
>
> Has anyone got any ideas.
Not sure whether this is a good idea or not, but...
## create a data frame of probe genomic location
makeLookup <- function(pkg) {
filt <- function(x) !is.null(names(x)) # some w/out names, hence
czomes
lst <- Filter(filt, as.list(getAnnMap("CHRLOC", pkg)))
data.frame(id=rep(names(lst), sapply(lst, length)),
pos=unlist(lst, use.names=FALSE),
chr=unlist(lapply(lst, names), use.names=FALSE),
row.names=NULL)
}
this gives us
> lookup <- makeLookup("hgu95av2.db")
> head(lookup)
id pos chr
1 1000_at -30032926 16
2 1001_at 43539250 1
3 1002_f_at 96512452 10
4 1003_s_at 118269310 11
5 1003_s_at 118259776 11
6 1004_at 118269310 11
then...
## find probes in a single region
contains <- function(chr, start, end, table) {
apos <- abs(table$pos)
idx <- table$chr == chr & apos >= start & apos <=end
table[idx,]
}
> contains(10, 96000000, 97000000, lookup)
id pos chr
3 1002_f_at 96512452 10
525 1455_f_at 96688429 10
550 1477_s_at 96433367 10
4321 34078_s_at 96512452 10
6798 36320_at 96152175 10
7509 36937_s_at -96987321 10
9367 38548_at -96786519 10
One could use 'contains' with mapply to get multiple regions, but
perhaps there's a more efficient way for such bulk queries.
Not sure about your concerns about just 'location'; the probes are a
common length, so you could incorporate this into the 'idx'
calculation in contains().
Probably someone else will offer up a ready-made solution.
Martin
> Many thanks
>
> --
> **************************************************************
> Daniel Brewer, Ph.D.
>
> Institute of Cancer Research
> Molecular Carcinogenesis
> Email: daniel.brewer at icr.ac.uk
> **************************************************************
>
> The Institute of Cancer Research: Royal Cancer Hospital, a
charitable Company Limited by Guarantee, Registered in England under
Company No. 534147 with its Registered Office at 123 Old Brompton
Road, London SW7 3RP.
>
> This e-mail message is confidential and for use by the
a...{{dropped:2}}
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
http://news.gmane.org/gmane.science.biology.informatics.conductor
--
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
Location: Arnold Building M2 B169
Phone: (206) 667-2793
Hi Daniel,
Daniel Brewer wrote:
> Hi,
>
> I was wondering what the best way to find which Affymetrix probes
are
> within a specific genomic region (chromosome, start, stop). I am
not
> sure if Biomart nor the annotation.db can do this as they both go to
> some common ID first. The annotation.db stuff seems to only have
one
> position information too. The other option is to dwonload the
> annotation file from Affymetrix and load that in, but I would prefer
to
> avoid that if at all possible.
So it depends on exactly what you are trying to do here. The
annotation
package will give you the chromosome and start location for the
probeset. You can use biomaRt to get the chromosome, start, stop for
the
probeset as well:
> getBM(c("chromosome_name", "start_position","end_position"),
"affy_hg_u133_plus_2", "1007_s_at", mart)
chromosome_name start_position end_position
1 6 30956784 30975912
Of course the question posed is to find things within a specific
region,
not find the region given the probeset ID. If this is what you want,
then an annotation.db package might be what you want. Something like
library(mouse4302.db)
tab <- toTable(mouse4302CHRLOC)
ind <- tab[,3] == 6 & abs(tab[,2]) <= 53043660 &
abs(tab[,2]) >= 50000000
tab[ind,]
> head(tab[ind,])
probe_id start_location Chromosome
1481 1416884_at 51420614 6
1482 1432628_at 51420614 6
1483 1439421_x_at 51420614 6
1484 1448504_a_at 51420614 6
1485 1454305_at 51420614 6
2397 1422483_a_at -50512561 6
If you then want the probeset start, end, you could feed the Affy IDs
to
biomaRt.
But you specifically said probes, so you might actually want probes
instead of probesets. If you want to get that specific then you could
use the Biostrings package along with one of the genome packages
(warning - HUGE download) and a probe package to get the locations of
these probes. I'm not going to give code for that. Instead see the
vignette in the BSgenome package, which has lots of examples.
Best,
Jim
>
> Has anyone got any ideas.
>
> Many thanks
>
--
James W. MacDonald, M.S.
Biostatistician
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
1500 E. Medical Center Drive
7410 CCGC
Ann Arbor MI 48109
734-647-5623
On Tue, Jun 17, 2008 at 12:35 PM, James W. MacDonald
<jmacdon at="" med.umich.edu=""> wrote:
> Hi Daniel,
>
> Daniel Brewer wrote:
>>
>> Hi,
>>
>> I was wondering what the best way to find which Affymetrix probes
are
>> within a specific genomic region (chromosome, start, stop). I am
not
>> sure if Biomart nor the annotation.db can do this as they both go
to
>> some common ID first. The annotation.db stuff seems to only have
one
>> position information too. The other option is to dwonload the
>> annotation file from Affymetrix and load that in, but I would
prefer to
>> avoid that if at all possible.
>
> So it depends on exactly what you are trying to do here. The
annotation
> package will give you the chromosome and start location for the
probeset.
> You can use biomaRt to get the chromosome, start, stop for the
probeset as
> well:
>
>> getBM(c("chromosome_name", "start_position","end_position"),
>> "affy_hg_u133_plus_2", "1007_s_at", mart)
> chromosome_name start_position end_position
> 1 6 30956784 30975912
>
> Of course the question posed is to find things within a specific
region, not
> find the region given the probeset ID. If this is what you want,
then an
> annotation.db package might be what you want. Something like
>
> library(mouse4302.db)
> tab <- toTable(mouse4302CHRLOC)
> ind <- tab[,3] == 6 & abs(tab[,2]) <= 53043660 &
> abs(tab[,2]) >= 50000000
> tab[ind,]
>
>> head(tab[ind,])
> probe_id start_location Chromosome
> 1481 1416884_at 51420614 6
> 1482 1432628_at 51420614 6
> 1483 1439421_x_at 51420614 6
> 1484 1448504_a_at 51420614 6
> 1485 1454305_at 51420614 6
> 2397 1422483_a_at -50512561 6
>
> If you then want the probeset start, end, you could feed the Affy
IDs to
> biomaRt.
>
> But you specifically said probes, so you might actually want probes
instead
> of probesets. If you want to get that specific then you could use
the
> Biostrings package along with one of the genome packages (warning -
HUGE
> download) and a probe package to get the locations of these probes.
I'm not
> going to give code for that. Instead see the vignette in the
BSgenome
> package, which has lots of examples.
Another alternative is to use the Ensembl core databases directly
rather than biomaRt. This is a bit tedious, but there is a treasure
trove of information embedded therein. To get the probe locations for
all mapped affy probes on the HG_U133A array, you could do:
> library(RMySQL)
> con <- dbConnect(MySQL(),user='anonymous',host='ensembldb.ensembl.or
g',dbname='homo_sapiens_core_47_36i')
> ### Warning--BIG SQL command coming....
> a=dbGetQuery(con,sprintf("select op.probeset as probeset,op.name as
probe_name,s.name as chromosome,of.seq_region_start as
start,of.seq_region_end as end,of.seq_region_strand as strand from
oligo_feature of inner join oligo_probe op on
op.oligo_probe_id=of.oligo_probe_id inner join oligo_array oa on
oa.oligo_array_id=op.oligo_array_id inner join seq_region s on
s.seq_region_id=of.seq_region_id where oa.name='%s' order by
probeset",'HG_U133A'))
> a[1:5,]
probeset probe_name chromosome start end strand
1 1007_s_at 467:181; 6 30975290 30975314 1
2 1007_s_at 365:115; 6 30975523 30975547 1
3 1007_s_at 532:139; 6 30975672 30975696 1
4 1007_s_at 86:557; 6 30975472 30975496 1
5 1007_s_at 308:15; 6 30975838 30975862 1
This takes a minute or so to complete (at least from the US). You can
then use normal R to do filtering on this dataframe. Hope that helps
somewhat.
Sean
>>
>> Has anyone got any ideas.
>>
>> Many thanks
>>
>
> --
> James W. MacDonald, M.S.
> Biostatistician
> Affymetrix and cDNA Microarray Core
> University of Michigan Cancer Center
> 1500 E. Medical Center Drive
> 7410 CCGC
> Ann Arbor MI 48109
> 734-647-5623
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
>