Show a dash for genes that do not have an orthologue
@bioinformatics-10931

I want to convert a set of mouse gene symbols to their human orthologues, which I can do with biomaRt:


    library(biomaRt)

    musGenes <- c("Hmmr", "Tlx3", "STSRAAA1", "Cpeb4")

    convertMouseGeneList <- function(x) {
      human <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")
      mouse <- useMart("ensembl", dataset = "mmusculus_gene_ensembl")
      genesV2 <- getLDS(attributes = c("mgi_symbol"), filters = "mgi_symbol",
                        values = x, mart = mouse,
                        attributesL = c("hgnc_symbol"), martL = human,
                        uniqueRows = TRUE)
      ## Column 2 holds the human (HGNC) symbols
      humanx <- unique(genesV2[, 2])
      return(humanx)
    }


This returns only the genes that are found in the database, but it does not show which ones are missing. Is there a function in biomaRt that keeps a placeholder for genes with no match, so the output stays aligned with the input? For example, in this case it should return an empty entry for "STSRAAA1".


For example, the desired output should look like this:

    Hmmr 
    Tlx3
    - 
    Cpeb4

 

Mike Smith ★ 6.6k
@mike-smith

Unfortunately, you can't do this directly in biomaRt; the BioMart service gives no indication of which input values return nothing. However, you can do it yourself if you make sure the query values are included in your result. Then you can see which ones are missing.

First off we'll set up the same query you did before, but we won't select only the human genes:

    library(biomaRt)

    musGenes <- c("Hmmr", "Tlx3", "STSRAAA1", "Cpeb4")

    human = useMart("ensembl", dataset = "hsapiens_gene_ensembl")
    mouse = useMart("ensembl", dataset = "mmusculus_gene_ensembl")
    genesV2 = getLDS(attributes = c("mgi_symbol"),
                     filters = "mgi_symbol",
                     values = musGenes,
                     mart = mouse,
                     attributesL = c("hgnc_symbol"),
                     martL = human, uniqueRows = TRUE)

From here, the strategy is to identify whether there are any of the original mouse genes missing in the result.  If there are then we make a second data.frame containing those mouse symbols paired with "-", and then bind this with the original results.

    ## Find any of our original values not in the results
    musMissing <- setdiff(musGenes, genesV2[, 'MGI.symbol'])

    ## Create data.frame with missing genes paired with "-"
    mapMissing <- data.frame(MGI.symbol = musMissing,
                             HGNC.symbol = rep("-", length(musMissing)))

    ## Combine
    results <- rbind(genesV2, mapMissing)

    > results
      MGI.symbol HGNC.symbol
    1      Cpeb4       CPEB4
    2       Tlx3        TLX3
    3       Hmmr        HMMR
    4   STSRAAA1           -

One thing to note is that the results are not in the same order as the original query - this is true even with the original approach; BioMart doesn't preserve the order of the query values.  If you want to return things in the same order you can do so like this:

    ## order same as original query
    > results[ match(musGenes, results[,1]), ]
      MGI.symbol HGNC.symbol
    3       Hmmr        HMMR
    2       Tlx3        TLX3
    4   STSRAAA1           -
    1      Cpeb4       CPEB4
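
If you need this more than once, the padding and reordering steps above could be wrapped in a small helper. This is only a sketch under some assumptions: `padOrthologues` is a hypothetical name (it is not a biomaRt function), it assumes the first column of the `getLDS()` result holds the query symbols and the second holds the orthologues, and the `mock` data.frame below just copies the result shown above so the example runs without contacting BioMart.

```r
## Hypothetical helper, not part of biomaRt: make sure every queried symbol
## appears in the mapping, using a placeholder where nothing was returned,
## and sort the rows into the order of the original query.
padOrthologues <- function(query, mapping, placeholder = "-") {
  query <- unique(query)

  ## Query values that did not come back from BioMart
  missing <- setdiff(query, mapping[[1]])

  ## Pair each missing symbol with the placeholder, reusing the column names
  padding <- data.frame(missing, rep(placeholder, length(missing)),
                        stringsAsFactors = FALSE)
  names(padding) <- names(mapping)

  out <- rbind(mapping, padding)

  ## Sort by position in the query; unlike subsetting with match(), this
  ## keeps every row when a symbol maps to more than one orthologue
  out <- out[order(match(out[[1]], query)), , drop = FALSE]
  rownames(out) <- NULL
  out
}

## Mock stand-in for the getLDS() result shown above (assumption: same
## column names and values as in this thread)
mock <- data.frame(MGI.symbol  = c("Cpeb4", "Tlx3", "Hmmr"),
                   HGNC.symbol = c("CPEB4", "TLX3", "HMMR"),
                   stringsAsFactors = FALSE)

padded <- padOrthologues(c("Hmmr", "Tlx3", "STSRAAA1", "Cpeb4"), mock)
print(padded)
```

With a live query you would pass `genesV2` in place of `mock`. One design note: `results[match(musGenes, results[,1]), ]` returns only the first row for each query value, so if a mouse gene has several human orthologues the sort-based version here is probably the safer choice.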

@Mike Smith thanks, this is definitely helpful. But what about the case where the gene names exist and are similar in both species? To explain what I am trying to do: I am trying to find the orthologue of a mouse gene. Is there a way to evaluate orthologues at the chromosome or DNA level? How do people decide that an orthologue is a true one?
