How about something like this using biomaRt? It might only be valid for the Mouse Gene ST 2.1 array, though; Ensembl only seems to have data for 1.0 and 2.1.
library(biomaRt)
## set up biomart
ensembl_mart <- useMart("ensembl")
ensembl_mart <- useDataset("mmusculus_gene_ensembl", mart = ensembl_mart)
## this is the MGI symbol for the gene we're interested in
gene_id <- "Hoxa5"
## return the ENSEMBL gene ID, AFFY probe ID, & MGI gene symbol
getBM(mart = ensembl_mart,
      filters = "mgi_symbol",
      values = gene_id,
      attributes = c("ensembl_gene_id", "affy_mogene_2_1_st_v1", "mgi_symbol"))
Thank you. That helps me get the probeset ID. Do you know if there is a way to go deeper than that and get all the probes that map to the probeset?
I have Mouse Gene ST 2.0 arrays and am particularly interested in the "Fas" gene. The probeset ID is 17358797, and I can see from the Affymetrix .csv file that 30 probes in total map to it. Is it possible to identify them (and access them in the raw data)?
And the fid is the row index into the exprs slot of your GeneFeatureSet, so you can get the probe values by subsetting that matrix on the fid values.
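A minimal sketch of that row indexing, in case it helps. A toy matrix stands in here for what exprs() would return on a real GeneFeatureSet, and the fid values are made up for illustration; with real data you would read the CEL files with oligo and use the fid values for your probeset instead.

```r
## Toy stand-in for exprs(raw): one row per probe (feature), one column per
## array. With real data this would be something like
##   raw <- oligo::read.celfiles(list.celfiles())
##   raw_exprs <- exprs(raw)
raw_exprs <- matrix(seq_len(30), nrow = 10,
                    dimnames = list(as.character(1:10), c("s1", "s2", "s3")))

## Hypothetical fid values; the real ones are the row indices of the probes
## belonging to the probeset of interest
fid <- c(2, 5, 9)

## Pull out the raw intensities for just those probes.
## drop = FALSE keeps the result a matrix even if fid has length 1.
probe_values <- raw_exprs[fid, , drop = FALSE]
probe_values
```

Each row of probe_values is one probe and each column is one array, so downstream summaries (for example per-probe means across arrays) can work on it directly.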
Thank you!!
Is it also possible to access their sequences?
You need to download the pgf file for that:
> library(affxparser)
> z <- readPgf("/data/BioC/annotation_packages/mogene/MoGene-2_0-st.v1.pgf")
> data.frame(probeid = z$probeId[z$probeId %in% fid], sequence = z$probeSequence[z$probeId %in% fid])
   probeid                  sequence
1  1452618 TACGACACCTAGACCCGACAGGACG
2  2211945 ACGACACCTAGACCCGACAGGACGG
3   112562 ACACCTAGACCCGACAGGACGGAGA
4    36741 TCCTCCGGGTAAAACGACAGTTGGT
5  1116444 GACGAGTCTTCCTAATATAGTTCCT
6  1689770 CTCCGCCCAAGCACTTTGACTATTT
7   174852 CGAGTGTCAATTCTCAAGTATGAGT
8  2501386 CCATGATTATCGTAGAGGCTCTCAA
9  1925766 TCGGGCAACCTCACTAAGTTGAAAG
10  366696 CTGTATGAATACCTCTCTGAGAAAT
11  915875 CGACTCGTAAAACCTCTGGTAGTCT
12  839135 ACCCACCTACGTTGAAATTACTTCT
13 1957419 GGTACGTGTCTTCCCTTCCTCATGT
14 1096619 ATGTACCTGTTCTTGGTAATACGAC
15  336595 TGACGCTAAGAGGACCGACACTTGT
16 1593664 CACTTGTGACACAAGCGACGCGGAG
17 1584687 ACTGGGTCTTATGGTTCACGTTCAC
18 1086107 TGTACCTTGGGAACTCGGCACGTGT
19  519674 CGTTGGTCGTTATGTTTGACGTCCT
20 2269721 ACACTTGTACCTTGGGAACTCGGCA
21 1008448 GTCTTTAGCGGATACCAACAACTGG
22  535023 GGGTCTTTAGCGGATACCAACAACT
23 1770936 CTGGTAGGAACAAAACAATTAAGGT
24 1218704 CTCTGTCCTACTGGGACTTAGATCT
25  575790 ATATATAGCTTTCATGGCCTTTTCT
26  144930 CTTTGGTATGGTTACTTACGGAGTT
27  503433 CGGCTTACAGCGTCTTGGAATCTAT
28  231489 ACGAGGACACGACCATGGTTAGAGT
29 1517387 CGTCAACGACTCTACTTGGTAAAAG
30 2216974 ATGAGCGTCAATTAACTTCTGGAAG