I am trying to fetch gene symbols from coordinate ranges (in good old hg19 space). It works perfectly except for one region, where I get FIP1L1 in first position, before the expected PDGFRA, although the two genes do not even overlap. In fact, FIP1L1 is located quite far downstream from PDGFRA.
Could someone explain this behavior and how to avoid it?
Thanks
You can reproduce my result with:
library("biomaRt")
ensembl = useMart(host='feb2014.archive.ensembl.org',
                  biomart='ENSEMBL_MART_ENSEMBL',
                  dataset="hsapiens_gene_ensembl")
getBM(attributes = "hgnc_symbol",
      filters = c('chromosome_name','start','end'),
      values = list("4","55127193","55127663"),
      mart=ensembl)
# hgnc_symbol
# 1 FIP1L1
# 2 PDGFRA
This is true for several intervals as seen in IGV
UCSC confirms the 'fraud' (disclaimer: this was a joke, and I apologize if I hurt anyone's feelings)
Just to echo this, you can clearly see in the Ensembl browser that at least one isoform of FIP1L1 spans PDGFRA.
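To illustrate why a gene can be returned for a region that lies well away from most of its transcripts, here is a minimal sketch of an interval overlap test in base R. It assumes that the region filters report any gene whose annotated span overlaps the query interval; the gene names and coordinates are made up for illustration and are not real annotation.

```r
# Hypothetical gene spans on one chromosome (made-up coordinates, not real annotation).
genes <- data.frame(
  symbol = c("GENE_A", "GENE_B", "GENE_C"),
  start  = c(100, 500, 900),
  end    = c(800, 700, 1000),
  stringsAsFactors = FALSE
)

# A gene overlaps the query when it starts at or before the query end
# and ends at or after the query start.
overlapping <- function(genes, q_start, q_end) {
  hit <- genes$start <= q_end & genes$end >= q_start
  genes$symbol[hit]
}

# GENE_A has one long span that covers GENE_B entirely, so both are reported.
hits <- overlapping(genes, 550, 600)
print(hits)
```

Under this assumption, a single long isoform is enough to extend a gene's span over its neighbour, so both symbols come back for any query inside the shared stretch.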
Dear James,
Thanks very much for answering my post, and I am very sorry that you did not get my joke about 'fraude' (of course I did not mean to be rude; I was just trying to put it in a funny way).
Concerning your evidence that Ensembl reports a transcript spanning PDGFRA: it is nice, but I rather doubt this transcript is real (putting PDGFRA out of the picture at once is a bit hard for the biologist I am; again joking, for the record :-)). I read that gene fusion events exist between these two genes, which might explain its presence (hypereosinophilic syndrome in http://www.uniprot.org/uniprot/Q6UN15).
Thank you very much for the code to query UCSC. I am replacing my biomaRt code with it, and it should meet my needs once I succeed in muting the many <'select()' returned 1:1 mapping between keys and columns> messages.
I found this potential exception by looking at 300 locations in IGV, and I wonder how many others are not in agreement with UCSC/NCBI.
Best regards,
Stephane
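On muting the 'select()' messages mentioned above: a minimal sketch, assuming the notice is emitted through R's message() mechanism, is to wrap the call in suppressMessages(). The function noisy_lookup below is a hypothetical stand-in for the real lookup, used only so that the example runs without any annotation package installed.

```r
# Hypothetical stand-in for a lookup that emits a message() on every call.
noisy_lookup <- function(keys) {
  message("'select()' returned 1:1 mapping between keys and columns")
  toupper(keys)
}

# suppressMessages() silences message() output; the return value is unchanged.
symbols <- suppressMessages(noisy_lookup(c("pdgfra", "fip1l1")))
print(symbols)
```

Warnings and errors are not affected by suppressMessages(), so genuine problems in the lookup would still surface.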
But it's not so simple as all that. If you choose to use the UCSC mappings, they perpetrate the exact opposite 'fraude' on you, saying that PDGFRA overlaps FIP1L1:
Whereas Ensembl doesn't.
So it comes down (in this instance) to what the two different groups want to call the gene fusion transcript. One says PDGFRA, and the other says FIP1L1.