Off topic: Missing/inconsistent data with biomaRt getBM() query
yg246 • 0

Hello,

I am trying to get the locations of the 3' UTRs of all protein-coding Ensembl transcripts, but I noticed that some transcripts are missing from the output, and that different queries give inconsistent results.

library(biomaRt)
grch37 = useMart(biomart="ENSEMBL_MART_ENSEMBL", host="grch37.ensembl.org", path="/biomart/martservice", dataset="hsapiens_gene_ensembl")

I run the following query to get all the 3' UTR regions of protein-coding transcripts:

chromosomes=c(1:22, "X", "Y")

utr3.coding=getBM(attributes=c('ensembl_gene_id', 'ensembl_transcript_id', 'chromosome_name', '3_utr_start', '3_utr_end', "transcript_biotype"), filters=c("transcript_biotype","chromosome_name"),values=list(c("protein_coding"), chromosomes), mart=grch37)

This query returned 115988 regions:

dim(utr3.coding)
[1] 115988      6

When I query without the "transcript_biotype" filter and instead filter the query output myself, I get 130984 regions:

utr3.all=getBM(attributes=c('ensembl_gene_id', 'ensembl_transcript_id', 'chromosome_name', '3_utr_start', '3_utr_end', 'transcript_biotype'), filters=c("chromosome_name"),values=chromosomes, mart=grch37)

utr3.all.coding=utr3.all[utr3.all$transcript_biotype=="protein_coding",]

dim(utr3.all.coding)
[1] 130984      6

When I query a specific chromosome, I again get a different list of regions:

utr3.ch17.coding=getBM(attributes=c('ensembl_gene_id', 'ensembl_transcript_id', 'chromosome_name', '3_utr_start', '3_utr_end', "transcript_biotype"), filters=c("transcript_biotype","chromosome_name"),values=list(c("protein_coding"), 17), mart=grch37)

dim(utr3.ch17.coding)
[1] 9535    6

When I extract the chromosome 17 regions from the first two queries, I find that many of the transcripts are missing:

utr3.coding.17=utr3.coding[utr3.coding$chromosome_name==17,]
dim(utr3.coding.17)
[1] 6834    6

utr3.all.coding.17=utr3.all.coding[utr3.all.coding$chromosome_name==17,]
dim(utr3.all.coding.17)
[1] 8538    6
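
Row counts alone do not show which transcripts differ, so one way to narrow this down is to compare the sets of transcript IDs directly. Below is a minimal sketch in base R, using toy data frames as stand-ins for two getBM() results. The transcript IDs and the object names (res.filtered, res.unfiltered) are made up for illustration and are not from the queries above.

```r
# Toy stand-ins for two getBM() results; the transcript IDs are hypothetical.
res.filtered <- data.frame(
  ensembl_transcript_id = c("ENST_A", "ENST_B", "ENST_B"),
  chromosome_name = c("17", "17", "17"),
  stringsAsFactors = FALSE
)
res.unfiltered <- data.frame(
  ensembl_transcript_id = c("ENST_A", "ENST_B", "ENST_C", "ENST_C"),
  chromosome_name = c("17", "17", "17", "17"),
  stringsAsFactors = FALSE
)

# Compare unique transcript IDs rather than row counts, since a transcript
# may contribute more than one row to the result.
ids.filtered <- unique(res.filtered$ensembl_transcript_id)
ids.unfiltered <- unique(res.unfiltered$ensembl_transcript_id)

# IDs present in one result but absent from the other.
missing.from.filtered <- setdiff(ids.unfiltered, ids.filtered)
missing.from.unfiltered <- setdiff(ids.filtered, ids.unfiltered)

print(missing.from.filtered)
print(missing.from.unfiltered)
```

Applied to the real results (for example utr3.coding.17 and utr3.all.coding.17), the same setdiff() calls would list the specific transcripts that appear in one query but not the other.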

Any ideas on what is causing this discrepancy? Thank you.
