Question: Missing/inconsistent data with biomaRt getBM() query
Posted 4.1 years ago by yg2460 (Singapore):

Hello,

I am trying to get the locations of the 3'UTRs of all protein-coding Ensembl transcripts, but I noticed that some transcripts are missing from the output and that different queries give inconsistent results.

library(biomaRt)

grch37 = useMart(biomart="ENSEMBL_MART_ENSEMBL", host="grch37.ensembl.org", path="/biomart/martservice", dataset="hsapiens_gene_ensembl")

I use the following query to get all 3'UTR regions of protein-coding transcripts:

chromosomes=c(1:22, "X", "Y")

utr3.coding=getBM(attributes=c('ensembl_gene_id', 'ensembl_transcript_id', 'chromosome_name', '3_utr_start', '3_utr_end', "transcript_biotype"), filters=c("transcript_biotype","chromosome_name"),values=list(c("protein_coding"), chromosomes), mart=grch37)

This query returned 115988 regions

dim(utr3.coding)
[1] 115988      6

When I query without the "transcript_biotype" filter and instead filter the output myself, I get 130984 regions:

utr3.all=getBM(attributes=c('ensembl_gene_id', 'ensembl_transcript_id', 'chromosome_name', '3_utr_start', '3_utr_end', 'transcript_biotype'), filters=c("chromosome_name"),values=chromosomes, mart=grch37)

utr3.all.coding=utr3.all[utr3.all$transcript_biotype=="protein_coding",]

dim(utr3.all.coding)
[1] 130984      6
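Because each row is one 3'UTR region rather than one transcript, row counts alone can hide what actually differs between the two results. A minimal sketch for summarising a getBM() result at both levels; the helper `summarise_utr` is hypothetical, and it assumes the result keeps the requested attribute names (e.g. `3_utr_start`) as column names:

```r
# Hypothetical helper: summarise a getBM() result at row and transcript level.
# Assumes columns are named after the requested attributes.
summarise_utr <- function(df,
                          tx_col = "ensembl_transcript_id",
                          start_col = "3_utr_start",
                          end_col = "3_utr_end") {
  # Rows where either 3'UTR coordinate is NA
  missing_coords <- is.na(df[[start_col]]) | is.na(df[[end_col]])
  c(rows = nrow(df),
    transcripts = length(unique(df[[tx_col]])),
    rows_missing_coords = sum(missing_coords))
}

# Usage on the objects above (requires the queries to have been run):
# summarise_utr(utr3.coding)
# summarise_utr(utr3.all.coding)
```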

 

When I query a single chromosome (17), I again get a different set of regions:

utr3.ch17.coding=getBM(attributes=c('ensembl_gene_id', 'ensembl_transcript_id', 'chromosome_name', '3_utr_start', '3_utr_end', "transcript_biotype"), filters=c("transcript_biotype","chromosome_name"),values=list(c("protein_coding"), 17), mart=grch37)

dim(utr3.ch17.coding)
[1] 9535    6

When I extract the chromosome 17 regions from the previous two queries, I find that many transcripts are missing:

utr3.coding.17=utr3.coding[utr3.coding$chromosome_name==17,]
dim(utr3.coding.17)
[1] 6834    6

utr3.all.coding.17=utr3.all.coding[utr3.all.coding$chromosome_name==17,]
dim(utr3.all.coding.17)
[1] 8538    6
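To see exactly which transcripts drop out, the transcript IDs of two results can be compared with base R's `setdiff()`. A minimal sketch; the helper `missing_transcripts` is hypothetical:

```r
# Hypothetical helper: transcript IDs present in `reference` but absent from `query`.
missing_transcripts <- function(reference, query, tx_col = "ensembl_transcript_id") {
  setdiff(unique(reference[[tx_col]]), unique(query[[tx_col]]))
}

# Usage on the chromosome 17 subsets above (requires the queries to have been run):
# lost <- missing_transcripts(utr3.all.coding.17, utr3.coding.17)
# length(lost)
# head(lost)
```

Comparing at the transcript level distinguishes "whole transcripts missing" from "some UTR rows of a transcript missing".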

Any ideas on what is causing this discrepancy? Thank you.
