Question: show a dash for genes that do not have an orthologue
4 months ago by
Bioinformatics30 wrote:

I want to convert a set of mouse gene symbols to their human orthologues, which I can do with biomaRt:


    library(biomaRt)

    musGenes <- c("Hmmr", "Tlx3", "STSRAAA1", "Cpeb4")

    ## Map mouse MGI symbols to human HGNC symbols via linked BioMart datasets
    convertMouseGeneList <- function(x) {
        human <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")
        mouse <- useMart("ensembl", dataset = "mmusculus_gene_ensembl")
        genesV2 <- getLDS(attributes = c("mgi_symbol"), filters = "mgi_symbol",
                          values = x, mart = mouse,
                          attributesL = c("hgnc_symbol"), martL = human,
                          uniqueRows = TRUE)
        ## Keep only the human symbols; unmatched inputs are silently dropped
        humanx <- unique(genesV2[, 2])
        return(humanx)
    }


This returns only the genes that are found in the database and does not show which ones are missing. Is there a function in biomaRt that keeps the unmatched inputs in the output? For example, in this case it should return an empty value for "STSRAAA1".


For example, the desired output should look like this:

    Hmmr 
    Tlx3
    - 
    Cpeb4

 

4 months ago by
Mike Smith2.7k (EMBL Heidelberg / de.NBI) wrote:

Unfortunately, you can't do this directly in biomaRt; the BioMart service gives no indication of input values that don't return anything. However, you can do it yourself if you make sure the query values are returned in your result. Then you can see which are missing.

First off we'll set up the same query you did before, but we won't select only the human genes:

    library(biomaRt)

    musGenes <- c("Hmmr", "Tlx3", "STSRAAA1", "Cpeb4")

    human <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")
    mouse <- useMart("ensembl", dataset = "mmusculus_gene_ensembl")
    ## Keep both columns (mouse query symbol and human symbol) in the result
    genesV2 <- getLDS(attributes = c("mgi_symbol"),
                      filters = "mgi_symbol",
                      values = musGenes,
                      mart = mouse,
                      attributesL = c("hgnc_symbol"),
                      martL = human,
                      uniqueRows = TRUE)

From here, the strategy is to identify whether any of the original mouse genes are missing from the result. If there are, we make a second data.frame containing those mouse symbols paired with "-", and then bind this to the original results.

    ## Find any of our original values not in the results
    musMissing <- setdiff(musGenes, genesV2[, "MGI.symbol"])

    ## Create data.frame with missing genes paired with "-"
    mapMissing <- data.frame(MGI.symbol = musMissing,
                             HGNC.symbol = rep("-", length(musMissing)))

    ## Combine
    results <- rbind(genesV2, mapMissing)

    > results
      MGI.symbol HGNC.symbol
    1      Cpeb4       CPEB4
    2       Tlx3        TLX3
    3       Hmmr        HMMR
    4   STSRAAA1           -

One thing to note is that the results are not in the same order as the original query - this is true even with the original approach; BioMart doesn't preserve the order of the query values.  If you want to return things in the same order you can do so like this:

    ## order same as original query
    > results[ match(musGenes, results[,1]), ]
      MGI.symbol HGNC.symbol
    3       Hmmr        HMMR
    2       Tlx3        TLX3
    4   STSRAAA1           -
    1      Cpeb4       CPEB4
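
If you need this more than once, the steps above could be wrapped up roughly as follows. This is only a sketch: `pad_missing()` is a hypothetical helper name, not a biomaRt function, and the `genesV2` data.frame here is a hand-written stand-in for the `getLDS()` result so that the example runs without contacting BioMart. It also sorts with `order(match(...))` rather than indexing with `match()` alone, because `match()` returns only the first hit per query value and would drop extra rows if a mouse gene mapped to several human genes.

```r
## Hypothetical helper (not part of biomaRt): pad a two-column mapping with
## `fill` for unmatched query values and return the rows in query order.
pad_missing <- function(query, mapping, fill = "-") {
  stopifnot(ncol(mapping) == 2)
  ## Work with character columns so rbind() is not affected by factor levels
  mapping[] <- lapply(mapping, as.character)
  missing <- setdiff(query, mapping[[1]])
  padding <- data.frame(missing, rep(fill, length(missing)),
                        stringsAsFactors = FALSE)
  names(padding) <- names(mapping)
  combined <- rbind(mapping, padding)
  ## order(match(...)) keeps every row, so one-to-many mappings survive
  combined <- combined[order(match(combined[[1]], query)), ]
  rownames(combined) <- NULL
  combined
}

## Stand-in for the getLDS() result shown above (assumed, no network needed)
genesV2 <- data.frame(MGI.symbol  = c("Cpeb4", "Tlx3", "Hmmr"),
                      HGNC.symbol = c("CPEB4", "TLX3", "HMMR"),
                      stringsAsFactors = FALSE)
musGenes <- c("Hmmr", "Tlx3", "STSRAAA1", "Cpeb4")

results <- pad_missing(musGenes, genesV2)
```

With the stand-in data, `results$HGNC.symbol` comes back as HMMR, TLX3, -, CPEB4, i.e. one entry per input gene in the order of `musGenes`. With a real query you would pass the data.frame returned by `getLDS()` instead.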

@Mike Smith thanks, this is definitely helpful. But what if the gene names exist and are similar in both species? Do you see what I am trying to do? I am trying to find the orthologue of a mouse gene. Is it possible to evaluate orthologues based on chromosome or DNA-level information? How do people decide whether an orthologue is a true one?

written 4 months ago by Bioinformatics30