Annotation using the affy package
Guest User ★ 13k
@guest-user-4897
I am using the affy package to analyze a set of GSM files downloaded from GEO. In addition to providing a table with probe IDs, expression levels and p-values, I would like to have the Ensembl IDs associated with the probe IDs.

I loaded the corresponding platform data (in my case mouse4302), but I am not quite sure how to connect it to the expression data. Here is how I am building the analysis table:

-- output of sessionInfo():

    source("http://bioconductor.org/biocLite.R")
    library(affy)

    filenames <- c("1.CEL", "2.CEL")

    affy.data <- ReadAffy(filenames = as.character(filenames))
    # Build the annotation package name, e.g. "mouse4302.db"
    platform <- paste0(annotation(affy.data), ".db")
    biocLite(platform)
    # character.only = TRUE is needed because 'platform' holds the package name as a string
    library(platform, character.only = TRUE)

    eset_rma <- rma(affy.data)
    eset_pma <- mas5calls(affy.data)
    my_frame <- data.frame(exprs(eset_rma), assayDataElement(eset_pma, "se.exprs"))
    my_frame <- my_frame[, sort(names(my_frame))]
    write.table(my_frame, file = "export.tsv", sep = "\t", col.names = NA)

-- Sent via the guest posting facility at bioconductor.org.
GO probe affy • 1.6k views
@james-w-macdonald-5106
Hi Jens,

On 10/23/2012 3:48 PM, Jens Lichtenberg [guest] wrote:
> I loaded in the corresponding platform data (in my case mouse4302) but I am not quite sure how to go about the connection of the data.

    ens <- select(mouse4302.db, featureNames(eset_pma), "ENSEMBL")

If all the probeset IDs in 'ens' and 'my_frame' match up, you can simply cbind() 'ens' to my_frame. I assume they will, but I would check to be sure. Otherwise you can just merge().

Best,

Jim

--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
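The check-then-combine step suggested above can be sketched with small made-up data frames standing in for the real 'ens' and 'my_frame'. The probeset IDs, Ensembl IDs and values below are invented for illustration, and the fallback merge() call is one possible way to write the join, not necessarily what was intended in the reply:

```r
# Toy stand-ins for the real objects; all IDs and values are invented.
my_frame <- data.frame(sample1 = c(7.1, 8.3, 5.9),
                       row.names = c("p1_at", "p2_at", "p3_at"))
ens <- data.frame(PROBEID = c("p1_at", "p2_at", "p3_at"),
                  ENSEMBL = c("ENSMUSG01", NA, "ENSMUSG03"),
                  stringsAsFactors = FALSE)

# cbind() is only safe when both tables list the same IDs in the same order.
same_order <- identical(ens$PROBEID, rownames(my_frame))

if (same_order) {
  annotated <- cbind(ens, my_frame)
} else {
  # Otherwise join explicitly on the probeset ID (row names of my_frame).
  annotated <- merge(ens, my_frame,
                     by.x = "PROBEID", by.y = "row.names", all.y = TRUE)
}
```

If the number of rows differs between the two tables, identical() returns FALSE and the sketch falls through to the key-based merge() instead of silently misaligning rows.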
Hi James,

Thank you so much for your help. I am successfully building ens (I had to update to 2.15 to use select), but for some reason I am having problems merging/binding the data into the same frame:

    > data.frame(ens, exprs(eset_rma), assayDataElement(eset_pma, "se.exprs"))
    Error in data.frame(ens, exprs(eset_rma), assayDataElement(eset_pma, "se.exprs")) :
      arguments imply differing number of rows: 46603, 45101
    > merge(ens, exprs(eset_rma))
    Error in rep.int(rep.int(seq_len(nx), rep.int(rep.fac, nx)), orep) :
      cannot allocate vector of length 2101841903

Any idea how I could resolve this issue?

Jens
Hi Jens,

On 10/24/2012 9:52 AM, Jens Lichtenberg wrote:
> > data.frame(ens, exprs(eset_rma), assayDataElement(eset_pma, "se.exprs"))
> Error in data.frame(ens, exprs(eset_rma), assayDataElement(eset_pma, "se.exprs")) :
>   arguments imply differing number of rows: 46603, 45101

Indeed. When you first generated your ens data.frame, you got this message:

    > ens <- select(mouse4302.db, Lkeys(mouse4302ENSEMBL), "ENSEMBL")
    Warning message:
    In .generateExtraRows(tab, keys, jointype) :
      'select' resulted in 1:many mapping between keys and return rows

This means that there are multiple probesetID -> ENSEMBL mappings for some probesets, so now you have to decide what you want to do with these multiply mapped probesets. You could decide that a single unique mapping is sufficient, and do this:

    > ens2 <- ens[!duplicated(ens$PROBEID),]
    > nrow(ens2)
    [1] 45101

You can then test whether ens2 can be cbind()ed to eset_pma:

    all.equal(ens2$PROBEID, featureNames(eset_pma), check.attributes = FALSE)

and if TRUE, cbind() away.

Or, if you want all of the ENSEMBL IDs, you can collapse them to comma-separated strings and then incorporate them:

    ens3 <- tapply(ens$ENSEMBL, ens[,1], paste, collapse = ",")
    data.frame(ens3[featureNames(eset_pma)], <other args go here>)

> > merge(ens,exprs(eset_rma))
> Error in rep.int(rep.int(seq_len(nx), rep.int(rep.fac, nx)), orep) :
>   cannot allocate vector of length 2101841903
>
> Any idea how I could resolve this issue?

Note that you need to read the help page for the function you are using. What do you think happens with merge() if you don't specify the columns upon which you intend to merge? You are trying to merge two things, each of which has fewer than 47K rows, but the error says something about a vector that is over 2.1 billion items long. That should make you say 'Wait, WHAT? What did I do?' and then investigate. See ?merge.

Best,

Jim
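The two options above, plus a keyed merge(), can be tried on toy data. This is only a sketch: the IDs and values are invented, and 'expr' is a hypothetical stand-in for exprs(eset_rma). As a hint for the merge() question: when the two inputs share no column names, merge() returns every combination of rows, and 46603 x 45101 = 2101841903, which is exactly the vector length in the error.

```r
# Toy annotation table with a 1:many mapping (IDs invented for illustration).
ens <- data.frame(PROBEID = c("p1_at", "p2_at", "p2_at", "p3_at"),
                  ENSEMBL = c("ENSMUSG01", "ENSMUSG02", "ENSMUSG09", "ENSMUSG03"),
                  stringsAsFactors = FALSE)

# Toy expression matrix, one row per probeset (stand-in for exprs(eset_rma)).
expr <- matrix(c(7.1, 8.3, 5.9, 6.8, 8.0, 6.1), ncol = 2,
               dimnames = list(c("p1_at", "p2_at", "p3_at"), c("s1", "s2")))

# Option 1: keep only the first ENSEMBL ID listed for each probeset.
ens2 <- ens[!duplicated(ens$PROBEID), ]

# Option 2: collapse all IDs for a probeset into one comma-separated string.
ens3 <- tapply(ens$ENSEMBL, ens$PROBEID, paste, collapse = ",")

# Index by row name so the annotation follows the matrix's row order.
annotated <- data.frame(ENSEMBL = as.vector(ens3[rownames(expr)]), expr)

# merge() with an explicit key. Without 'by', merge() joins on shared column
# names; with none in common it builds the full row-by-row cross product.
merged <- merge(ens, as.data.frame(expr),
                by.x = "PROBEID", by.y = "row.names")
```

Here 'annotated' has one row per probeset, while 'merged' keeps one row per probeset/ENSEMBL pair, so the multiply mapped probeset appears twice.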