Question: getBM returns shorter vectors than values
Lescai, Francesco (Denmark) wrote, 4.9 years ago:
Hi,

I have the same problem, and I'd say it has been this way ever since I started using biomaRt. Is there any way to force getBM to return NA when the attribute corresponding to the filter cannot be found? Then, when annotating your results, you would at least get vectors of the same length, which would make it much easier to work with data.frames.

Thanks for any suggestions,
cheers,
Francesco

On 29 Aug 2013, at 05:40, Atul <atulkakrana@outlook.com> wrote:

Hi All,

I am using the oligo package to analyse samples generated with the HuEx 1.0 ST v2 chip. The problem I am facing is with annotating the results. Here is my code (simplified):

```r
library(oligo)

celFilesA   <- list.celfiles()
AF_data.A   <- read.celfiles(celFilesA, pkgname = 'pd.huex.1.0.st.v2')
AF.eset.RMA <- rma(AF_data.A, target = 'core')

dim(exprs(AF.eset.RMA))
## [1] 22011    10

## Attempt to annotate
library(biomaRt)
ID      <- rownames(AF.eset.RMA)
ensembl <- useMart('ensembl', dataset = 'hsapiens_gene_ensembl')
Anno    <- getBM(attributes = c("strand", "transcript_start", "chromosome_name", "hgnc_symbol"),
                 filters = c("affy_huex_1_0_st_v2"), values = ID, mart = ensembl)

dim(Anno)
## [1] 1635    4
```

As you can see, out of 22011 genes/probe sets in total I can annotate only 1635. Is there any way to get annotations for all of the genes/probe sets and add them back to my expression set (AF.eset.RMA), so that the annotations are included in the final results?

Usually, with other chips I do this:

```r
library(annotate)   # getSYMBOL(), lookUp()

ID     <- featureNames(AF.eset.RMA)
Symbol <- getSYMBOL(ID, 'mouse4302.db')
Name   <- as.character(lookUp(ID, "mouse4302.db", "GENENAME"))
tmp    <- data.frame(ID = ID, Symbol = Symbol, Name = Name, stringsAsFactors = FALSE)
tmp[tmp == "NA"] <- NA
fData(AF.eset.RMA) <- tmp
```

And this is what I want to achieve in the present case. I would appreciate your help.
Thanks,
AK

_______________________________________________
Bioconductor mailing list
Bioconductor@r-project.org
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
modified 4.9 years ago by Steffen Durinck • written 4.9 years ago by Lescai, Francesco

I know how you feel. I guess it is complicated to change this on the BioMart side. But hey,

dplyr::left_join should take care of the missing values: as mentioned, write a wrapper that constructs a data frame from the input parameter `values` and left-joins the getBM result onto it, so that IDs without a result get NA.
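A minimal base-R sketch of that left join, using `merge(all.x = TRUE)` in place of `dplyr::left_join()` so it needs no extra package. The IDs and symbols are made-up placeholders, not real annotation; `res` stands in for a getBM() result that includes the filter column as an attribute, and the column names `probe` and `symbol` are assumptions for illustration.

```r
# Hypothetical input IDs (the `values` passed to getBM) and a hand-made
# stand-in for the getBM() result, which includes the filter column.
ids <- c("id3", "id1", "id4", "id2")
res <- data.frame(
  probe  = c("id1", "id2", "id2"),        # id2 has two hits; id3, id4 have none
  symbol = c("GENE_A", "GENE_B", "GENE_C"),
  stringsAsFactors = FALSE
)

# Build a data frame from the input values, remembering each ID's position.
query <- data.frame(probe = ids, input_order = seq_along(ids),
                    stringsAsFactors = FALSE)

# Left join: every input ID is kept; IDs without a hit get NA.
joined <- merge(query, res, by = "probe", all.x = TRUE)

# merge() sorts by the key, so restore the order of `ids` explicitly.
joined <- joined[order(joined$input_order), c("probe", "symbol")]
rownames(joined) <- NULL
joined
```

Unlike a `match()` lookup, a join keeps every hit when one ID maps to several rows (`id2` above), so the result can be longer than the input; carrying the input position along is what lets the original order be restored after `merge()` sorts.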

Steffen Durinck wrote, 4.9 years ago:
Hi Francesco,

That is correct: biomaRt doesn't return anything if it can't find a result. It is designed to work just like the BioMart web services at www.biomart.org, which behave the same way. I usually add the filter as an attribute so I can match things up and figure out which values did return a result. Your query would be:

```r
Anno <- getBM(attributes = c("affy_huex_1_0_st_v2", "strand", "transcript_start",
                             "chromosome_name", "hgnc_symbol"),
              filters = c("affy_huex_1_0_st_v2"), values = ID, mart = ensembl)
```

If you want a vector back with the same length as ID and with NAs where you didn't get a result, you could write a wrapper function around getBM that does that for you.

Best,
Steffen

On Wed, Sep 11, 2013 at 6:15 AM, Francesco Lescai <francesco.lescai@hum-gen.au.dk> wrote:
> [...]
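One possible shape for the wrapper Steffen describes, offered as a sketch rather than anything that ships with biomaRt: `align_to_values()` and `getBM_aligned()` are hypothetical names. The helper uses `match()`, so the output has exactly one row per input value, in input order, with NA where nothing came back; if a value has several hits, only the first is kept. Only `biomaRt::getBM()` is a real API call here, and the wrapper assumes the filter name is also valid as an attribute, as with `affy_huex_1_0_st_v2` above.

```r
# Hypothetical helper: one row per element of `values`, in input order,
# NA where the query returned nothing. With several hits per value, only
# the first one is kept (match() semantics).
align_to_values <- function(res, values, filter) {
  idx <- match(values, res[[filter]])
  out <- res[idx, , drop = FALSE]
  out[[filter]] <- values        # restore the key for rows without a hit
  rownames(out) <- NULL
  out
}

# Hypothetical wrapper: adds the filter as an attribute so the result can be
# matched back, then aligns it to `values`. Needs biomaRt and network access.
getBM_aligned <- function(attributes, filter, values, mart) {
  res <- biomaRt::getBM(attributes = unique(c(filter, attributes)),
                        filters = filter, values = values, mart = mart)
  align_to_values(res, values, filter)
}

# Usage with the objects from the question (not run here):
# Anno <- getBM_aligned(c("strand", "transcript_start", "chromosome_name", "hgnc_symbol"),
#                       filter = "affy_huex_1_0_st_v2", values = ID, mart = ensembl)
# stopifnot(nrow(Anno) == length(ID))
```

Keeping the alignment in its own function means it can be checked offline on any data frame, independently of the BioMart server.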
Hi Steffen,

Thanks for your reply; yes, it works this way :-)

However, getBM doesn't seem to return results in the same order as the input. Here's a simple test:

```r
> tesgenes
[1] "ENSMUSG00000027255" "ENSMUSG00000020472" "ENSMUSG00000020807" "ENSMUSG00000086769" "ENSMUSG00000016024"
> getBM(filters = c("ensembl_gene_id"), attributes = c("ensembl_gene_id", "external_gene_id"),
+       values = tesgenes, mart = ensembl)
     ensembl_gene_id external_gene_id
1 ENSMUSG00000016024              Lbp
2 ENSMUSG00000020472         Zkscan17
3 ENSMUSG00000020807    4933427D14Rik
4 ENSMUSG00000027255          Arfgap2
5 ENSMUSG00000086769          Gm15587
```

Therefore, if I have a data.frame of gene IDs and simply cbind the result, the rows don't match. I solved it by merging the two data.frames on the ID columns, like this:

```r
MyResults <- merge(
  MyResults,
  getBM(filters = c("ensembl_gene_id"), attributes = c("ensembl_gene_id", "external_gene_id"),
        values = MyResults$geneID, mart = ensembl),
  by.x = "geneID",
  by.y = "ensembl_gene_id"
)
```

Is there any way to make getBM() return data in the same order as the vector of values, or is this behaviour due to the way the query works?

Thanks for your prompt reply,
Francesco

On 11 Sep 2013, at 17:46, Steffen Durinck <durinck.steffen@gene.com> wrote:
> [...]
written 4.9 years ago by Lescai, Francesco

Hi Francesco,

This is due to the actual BioMart server, which is accessed by the Bioconductor package biomaRt. Unless there has been a recent change to the BioMart server that I am not aware of, there is no way to preserve the order of the input (or keep duplicates, or indicate which ID does not have a result, etc.). Of course, there is a quick and dirty (and bad) solution: loop over your gene IDs and make an individual request for each gene.

Regards,
Hans-Rudolf

On 09/12/2013 11:21 AM, Francesco Lescai wrote:
> [...]
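To make this concrete with the data Francesco posted: the sketch below rebuilds his `tesgenes` vector and the sorted table getBM() returned, then attaches the symbols to `MyResults` with a single `match()` lookup instead of `cbind()`. Unlike the `merge()` call above, which by default is an inner join sorted by key, this keeps every input row in its original order and leaves NA for any ID without a result, so one batched query is enough and no per-gene loop is needed. Building `MyResults` from `tesgenes` is an assumption for the example; in Francesco's code it already exists with a `geneID` column.

```r
# Francesco's input IDs, in his original order
tesgenes <- c("ENSMUSG00000027255", "ENSMUSG00000020472", "ENSMUSG00000020807",
              "ENSMUSG00000086769", "ENSMUSG00000016024")

# The result getBM() returned in the thread: sorted by ID, not by input order
res <- data.frame(
  ensembl_gene_id  = c("ENSMUSG00000016024", "ENSMUSG00000020472", "ENSMUSG00000020807",
                       "ENSMUSG00000027255", "ENSMUSG00000086769"),
  external_gene_id = c("Lbp", "Zkscan17", "4933427D14Rik", "Arfgap2", "Gm15587"),
  stringsAsFactors = FALSE
)

# Stand-in for Francesco's results table (assumed to have a geneID column)
MyResults <- data.frame(geneID = tesgenes, stringsAsFactors = FALSE)

# Look each input ID up in the result: same length and order as MyResults,
# NA for IDs that getBM() did not return
MyResults$external_gene_id <-
  res$external_gene_id[match(MyResults$geneID, res$ensembl_gene_id)]
MyResults
```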