getHomolog in biomaRt
@steve-pederson-2103
Hi,

I'm still on a steep learning curve with R and am trying to convert a large batch of mouse Entrez IDs to their homologous human Entrez IDs. When I send the IDs to biomaRt as a batch, the search result doesn't contain the query ID (could this be considered as a suggestion for the next update?), so the result can't be matched back to the original. For example:

    > getHomolog( id = c("73663","66645","74855"), to.type = "entrezgene", from.type = "entrezgene", from.mart = mouse, to.mart=human )
         V1
    1 55269

As a result, I'm sending the IDs one at a time via a quick function that I set up. The batch regularly seems to fail and I get the following error message:

    Error in read.table(con, sep = "\t", header = FALSE, quote = "", comment.char = "", :
      no lines available in input

This is an example of the exact code that causes it:

    library(biomaRt)
    human <- useMart("ensembl","hsapiens_gene_ensembl")
    mouse <- useMart("ensembl","mmusculus_gene_ensembl")
    getHomolog( id = "380768", to.type = "entrezgene", from.type = "entrezgene", from.mart = mouse, to.mart=human )

The response is not NULL; my code is already set up to handle a NULL response.

My main question is: does anyone know how I can stop the loop aborting when I receive this error message, which I think is external? If I can record which specific IDs are causing the error, I could exclude them from the original batch, but the error handling described in the R help is a bit murky to me. My actual function is included below (biomaRt.conversion).

Unfortunately, I don't have any MySQL experience (yet), so that isn't an option for me as an alternative.

The list is derived from the IDs that could not be matched by ProbeMatchDB2.0, as that database maps via Unigene:
http://brainarray.mbni.med.umich.edu/Brainarray/Database/ProbeMatchDB/ncbi_probmatch_para_step1.asp

Thanks,

Steve

    biomaRt.conversion <- function(x,from.id,to.id,from.sp,to.sp)
    {
      # x is the initial list of ids
      # from.id & to.id are the type of codes (e.g entrez or unigene)
      # from.sp & to.sp can only be "human" or "mouse"
      # Warnings will need to be suppressed in the case of no match existing
      homologs <- c()
      no.homolog <- c()
      if (from.sp=="human") mart1 <- useMart("ensembl","hsapiens_gene_ensembl")
      if (to.sp=="human") mart2 <- useMart("ensembl","hsapiens_gene_ensembl")
      if (from.sp=="mouse") mart1 <- useMart("ensembl","mmusculus_gene_ensembl")
      if (to.sp=="mouse") mart2 <- useMart("ensembl","mmusculus_gene_ensembl")
      for (i in 1:length(x))
      {
        suppressWarnings(hum <- getHomolog( id = x[i], to.type=to.id, from.type=from.id, from.mart = mart1, to.mart = mart2))
        if (is.null(hum)==FALSE) # if a homolog was found
        {
          # A duplicate removal stage
          if(dim(hum)[1]>1)
          {
            j=1 # the first entry in hum to check for duplicates
            k=dim(hum)[1]
            while(j<k)
            {
              if(length(which(hum==hum[j]))>1) # if there is a duplicate
              {
                hum <- hum[-(which(hum==hum[j])[-1]),] # removes all the duplicates except the first
                # reset the values
                if(is.null(dim(hum)[1])==TRUE)
                {
                  k=0 # this will exit the loop if "hum" is now a single value
                }
                else
                {
                  k=dim(hum)[1]
                  j=j+1
                }
              }
            }
          }
          for (j in 1:length(hum))
          {
            homologs <- rbind(homologs,c(x[i],hum[j]))
          }
        }
        else # if no homolog was found
        {
          no.homolog <- c(no.homolog,x[i])
        }
      }
      colnames(homologs) <- c(paste(from.sp,"ID",sep="."),paste(to.sp,"ID",sep="."))
      list(homologs=data.frame(homologs),no.homolog=no.homolog)
    }
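On the error-handling question above, base R's tryCatch() can wrap each call so that a failing ID is recorded instead of aborting the loop. The following is only a minimal sketch and was not part of the original post: `safe.lookup` and `fake.lookup` are hypothetical names, and `fake.lookup` is a stand-in for the real getHomolog() call so that the example runs without a network connection.

```r
# Hypothetical helper: apply `lookup` to each ID and keep going when one fails.
safe.lookup <- function(ids, lookup) {
  results <- list()          # successful lookups, keyed by query ID
  failed <- character(0)     # IDs whose lookup raised an error
  for (id in ids) {
    # Return the condition object instead of letting the error propagate
    res <- tryCatch(lookup(id), error = function(e) e)
    if (inherits(res, "error")) {
      failed <- c(failed, id)
      next
    }
    if (!is.null(res)) results[[id]] <- res
  }
  list(results = results, failed = failed)
}

# Stand-in for getHomolog(): errors on one ID, returns NULL when nothing matches.
# The error text and the ID pairs are copied from the thread for illustration only.
fake.lookup <- function(id) {
  if (id == "380768") stop("no lines available in input")
  known <- c("66645" = "55269", "64058" = "64065")
  if (id %in% names(known)) unname(known[id]) else NULL
}

out <- safe.lookup(c("66645", "380768", "64058", "73663"), fake.lookup)
out$failed           # IDs that raised an error
names(out$results)   # IDs that returned a homolog
```

In real use, the second argument would presumably be a small wrapper such as `function(id) getHomolog(id = id, ...)` with the same arguments as in the post; IDs collected in `failed` can then be excluded from the batch.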
convert biomaRt
@steffen-durinck-1780
Hi Steve,

Which version of biomaRt are you using? I would recommend using the devel version, as this will return both the query ID and its homolog ID.

    > human = useMart("ensembl", dataset="hsapiens_gene_ensembl")
    > mouse = useMart("ensembl", dataset="mmusculus_gene_ensembl")
    > getHomolog( id = c("66645","64058"), to.type = "entrezgene", from.type = "entrezgene", from.mart = mouse, to.mart=human )
         V1    V2
    1 64058 64065
    2 66645 55269

    > sessionInfo()
    R version 2.4.0 (2006-10-03)
    x86_64-unknown-linux-gnu

    locale:
    LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=en_US.UTF-8;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C

    attached base packages:
    [1] "methods"   "stats"     "graphics"  "grDevices" "utils"     "datasets"
    [7] "base"

    other attached packages:
     biomaRt    RCurl      XML
    "1.9.22"  "0.8-0"  "1.4-1"

Cheers,
Steffen

--
Steffen Durinck, Ph.D.
Oncogenomics Section, Pediatric Oncology Branch
National Cancer Institute, National Institutes of Health
URL: http://home.ccr.cancer.gov/oncology/oncogenomics/
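With a two-column result like the one shown above, matching homologs back to the query and listing the IDs that returned nothing can be done without a loop. This is a minimal sketch in base R under stated assumptions: the data frame `res` is typed in by hand from the output above rather than fetched from biomaRt, one row is repeated on purpose to show duplicate removal, and the query vector, the variable names, and the column names `mouse.ID` / `human.ID` are illustrative.

```r
# Illustrative query: two IDs from the reply plus two from the original question
query <- c("73663", "66645", "64058", "74855")

# Hand-built stand-in for the two-column result (V1 = query ID, V2 = homolog ID)
res <- data.frame(
  V1 = c("64058", "66645", "66645"),
  V2 = c("64065", "55269", "55269"),
  stringsAsFactors = FALSE
)

# Drop repeated rows in one step instead of a manual while loop
homologs <- unique(res)
colnames(homologs) <- c("mouse.ID", "human.ID")

# Query IDs that do not appear in the result at all
no.homolog <- setdiff(query, homologs$mouse.ID)

homologs
no.homolog
```

Because the query ID travels with each row, `unique()` and `setdiff()` cover the duplicate-removal and no-match bookkeeping that the per-ID loop in the question handled by hand.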
Hi Steffen,

Thanks for the response; that sorted my problem out rather well. I had been using biomaRt 1.8.2.

Cheers,
Steve