Question: Retrieving probes mapping to gene ID in microarray analysis
n.lu wrote (6 weeks ago):

Hi,

How can one retrieve all the probes that map to a specific gene ID?

My data are from the Affymetrix GeneChip™ Mouse Gene 2.0 ST Array.

Mike Smith (EMBL Heidelberg / de.NBI) wrote (6 weeks ago):

How about something like this, using biomaRt? It may only be valid for the Mouse Gene 2.1 ST array, though, as Ensembl only seems to have probe mappings for the 1.0 and 2.1 arrays.

library(biomaRt)

## set up biomart
ensembl_mart <- useMart("ensembl")
ensembl_mart <- useDataset("mmusculus_gene_ensembl", mart = ensembl_mart)

## this is the MGI symbol for the gene we're interested in
gene_id <- "Hoxa5"
 
## return the Ensembl gene ID, Affymetrix probeset ID and MGI gene symbol
getBM(mart = ensembl_mart,
      filters = "mgi_symbol",
      values = gene_id,
      attributes = c("ensembl_gene_id", "affy_mogene_2_1_st_v1", "mgi_symbol"))
     ensembl_gene_id affy_mogene_2_1_st_v1 mgi_symbol
1 ENSMUSG00000038253              17466833      Hoxa5
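getBM() returns one row per combination of the requested attributes, so a symbol that maps to more than one transcript cluster (or a query that passes several symbols in values) comes back as several rows. A minimal base-R sketch of grouping such a result by gene, assuming a data.frame with the same columns as the output above; the first row copies the Hoxa5 result, while the second row holds a hypothetical probeset ID invented purely for illustration:

```r
## stand-in for a getBM() result with the same columns as above;
## row 1 copies the Hoxa5 output, row 2 is a HYPOTHETICAL probeset ID
bm <- data.frame(
  ensembl_gene_id       = c("ENSMUSG00000038253", "ENSMUSG00000038253"),
  affy_mogene_2_1_st_v1 = c("17466833", "10000001"),
  mgi_symbol            = c("Hoxa5", "Hoxa5"),
  stringsAsFactors      = FALSE
)

## one vector of probeset IDs per MGI symbol
probesets_by_gene <- split(bm$affy_mogene_2_1_st_v1, bm$mgi_symbol)
probesets_by_gene$Hoxa5
```

split() keeps the original row order within each gene, so the IDs come back in the order getBM() returned them.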

Thank you, that gets me the probeset ID. Is there a way to go one level deeper and get all the individual probes that make up the probeset?

I have the Mouse Gene 2.0 ST array, and I am particularly interested in the Fas gene. Its probeset ID is 17358797, and the Affymetrix annotation .csv file shows that 30 probes in total map to it. Is it possible to identify them (and access their values in the raw data)?

Written 6 weeks ago by n.lu
> library(pd.mogene.2.0.st)
> library(DBI)
> con <- db(pd.mogene.2.0.st)

> fid <- dbGetQuery(con, "select fid from pmfeature inner join featureSet using(fsetid) where transcript_cluster_id='17358797';")[,1]
> fid
 [1] 1452618 2211945  112562   36741 1116444 1689770  174852 2501386 1925766
[10]  366696  915875  839135 1957419 1096619  336595 1593664 1584687 1086107
[19]  519674 2269721 1008448  535023 1770936 1218704  575790  144930  503433
[28]  231489 1517387 2216974
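The query above joins two tables of the platform-design database, pmfeature (one row per perfect-match probe, identified by fid) and featureSet (one row per fsetid, carrying its transcript_cluster_id), on their shared fsetid column, and then keeps only the rows for one transcript cluster. A minimal base-R sketch of the same join-then-filter logic, using two tiny invented data.frames rather than the real database contents:

```r
## INVENTED miniature tables; the column names follow the SQL query above
pmfeature <- data.frame(fid    = c(11L, 12L, 13L, 14L, 15L),
                        fsetid = c(1L, 1L, 2L, 3L, 3L))
featureSet <- data.frame(fsetid                = c(1L, 2L, 3L),
                         transcript_cluster_id = c(17358797L, 17358797L, 10000001L))

## inner join on fsetid, like "inner join featureSet using(fsetid)"
joined <- merge(pmfeature, featureSet, by = "fsetid")

## keep a single transcript cluster, like the "where" clause
fid <- joined$fid[joined$transcript_cluster_id == 17358797L]
fid
```

Here two fsetids belong to the same transcript cluster, which is why the join is needed: filtering pmfeature alone cannot tell which probes belong to the cluster.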

The fid is the row index into the exprs slot of your GeneFeatureSet, so you can get the raw probe intensities with:

library(oligo)
dat <- read.celfiles(list.celfiles())

fas.prbs <- exprs(dat)[fid,]

Written 6 weeks ago by James W. MacDonald
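Because each fid is a row number in the feature-level matrix, exprs(dat)[fid, ] is plain integer indexing, and the rows come back in the order of fid. A minimal sketch with a small random matrix standing in for exprs(dat) and invented fids; drop = FALSE is an addition here that keeps the result a matrix even if a probeset had only a single probe:

```r
## random stand-in for exprs(dat): 10 features (rows) x 2 arrays (columns)
set.seed(1)
raw <- matrix(round(runif(20, min = 50, max = 5000)), nrow = 10,
              dimnames = list(NULL, c("array1.CEL", "array2.CEL")))

## INVENTED fids for one probeset
fid <- c(7L, 2L, 9L)

## integer row indexing; drop = FALSE keeps a matrix even for a single fid
prb <- raw[fid, , drop = FALSE]
prb

## a single fid still gives a 1 x 2 matrix rather than a plain vector
dim(raw[fid[1], , drop = FALSE])
```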

Thank you!!

Is it also possible to access their sequences?

Written 6 weeks ago by n.lu

For that you need to download the array's .pgf library file from Affymetrix and read it with affxparser:

> library(affxparser)
> z <- readPgf("/data/BioC/annotation_packages/mogene/MoGene-2_0-st.v1.pgf")

> data.frame(probeid = z$probeId[z$probeId %in% fid], sequence = z$probeSequence[z$probeId %in% fid])
   probeid                  sequence
1  1452618 TACGACACCTAGACCCGACAGGACG
2  2211945 ACGACACCTAGACCCGACAGGACGG
3   112562 ACACCTAGACCCGACAGGACGGAGA
4    36741 TCCTCCGGGTAAAACGACAGTTGGT
5  1116444 GACGAGTCTTCCTAATATAGTTCCT
6  1689770 CTCCGCCCAAGCACTTTGACTATTT
7   174852 CGAGTGTCAATTCTCAAGTATGAGT
8  2501386 CCATGATTATCGTAGAGGCTCTCAA
9  1925766 TCGGGCAACCTCACTAAGTTGAAAG
10  366696 CTGTATGAATACCTCTCTGAGAAAT
11  915875 CGACTCGTAAAACCTCTGGTAGTCT
12  839135 ACCCACCTACGTTGAAATTACTTCT
13 1957419 GGTACGTGTCTTCCCTTCCTCATGT
14 1096619 ATGTACCTGTTCTTGGTAATACGAC
15  336595 TGACGCTAAGAGGACCGACACTTGT
16 1593664 CACTTGTGACACAAGCGACGCGGAG
17 1584687 ACTGGGTCTTATGGTTCACGTTCAC
18 1086107 TGTACCTTGGGAACTCGGCACGTGT
19  519674 CGTTGGTCGTTATGTTTGACGTCCT
20 2269721 ACACTTGTACCTTGGGAACTCGGCA
21 1008448 GTCTTTAGCGGATACCAACAACTGG
22  535023 GGGTCTTTAGCGGATACCAACAACT
23 1770936 CTGGTAGGAACAAAACAATTAAGGT
24 1218704 CTCTGTCCTACTGGGACTTAGATCT
25  575790 ATATATAGCTTTCATGGCCTTTTCT
26  144930 CTTTGGTATGGTTACTTACGGAGTT
27  503433 CGGCTTACAGCGTCTTGGAATCTAT
28  231489 ACGAGGACACGACCATGGTTAGAGT
29 1517387 CGTCAACGACTCTACTTGGTAAAAG
30 2216974 ATGAGCGTCAATTAACTTCTGGAAG
Written 6 weeks ago by James W. MacDonald
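Subsetting with %in% returns the probes in the order they appear in the pgf file; in the output above that happens to coincide with the order of fid, but this is not guaranteed in general. If you want the sequences to line up row for row with fas.prbs, match() is a safer choice. A minimal sketch on a toy list that mimics only the probeId and probeSequence elements of the readPgf() result, with invented IDs and sequences:

```r
## stand-in for the list returned by affxparser::readPgf(); only the probeId and
## probeSequence elements used above are mimicked, with INVENTED values
z <- list(probeId       = c(101L, 102L, 103L, 104L),
          probeSequence = c("AAAACCCCGGGGTTTTAAAACCCCG",
                            "CCCCGGGGTTTTAAAACCCCGGGGT",
                            "GGGGTTTTAAAACCCCGGGGTTTTA",
                            "TTTTAAAACCCCGGGGTTTTAAAAC"))

## INVENTED fids, in the order returned by the database query
fid <- c(104L, 101L, 103L)

## %in% gives a logical mask, so the result follows the pgf order
z$probeId[z$probeId %in% fid]

## match() gives positions in the order of fid (NA if a fid is absent)
idx  <- match(fid, z$probeId)
seqs <- data.frame(probeid  = z$probeId[idx],
                   sequence = z$probeSequence[idx],
                   stringsAsFactors = FALSE)
seqs
```

With this ordering, row i of seqs describes the same probe as row i of exprs(dat)[fid, ].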