Gviz: feature protein_coding not working
3
0
Entering edit mode
Simon B. • 0
@simon-b-11155
Last seen 7.8 years ago
University of Dundee

Hi,

I'm trying to visualise the following transcripts: http://www.ensembl.org/Homo_sapiens/Transcript/Summary?db=core;g=ENSG00000172572;r=12:20369245-20684381;t=ENST00000542675

The feautre protein_coding doesn't colour the first and only protein coding transcript. One of the transcript is a processed transcript but has no protein, while the shortest is non-coding. It appears that the function wrongly identified the coding sequences. Any hint why?

Here's the image:

And the code:

library(Gviz)
library(biomaRt)

mart <- biomaRt::useMart(
  biomart="ensembl",
  dataset="hsapiens_gene_ensembl")

PDE3A_track <- c("ENST00000359062","ENST00000542675","ENST00000544307")

biomartTrack <- Gviz::BiomartGeneRegionTrack(
  filters = list(
    ensembl_transcript_id = PDE3A_track),
  protein_coding = "red",
  transcriptAnnotation="transcript",
  name="Ensembl Transcript IDs",
  background.title="darkgrey",
  biomart = mart)

axisTrack <- Gviz::GenomeAxisTrack()
displayPars(axisTrack) <- list(
  add53 = TRUE,
  labelPos = "below",
  cex = 1.5)

idxTrack <- IdeogramTrack(
  genome="hg38",
  chromosome="chr12")

Gviz::plotTracks(
  c(idxTrack, biomartTrack, axisTrack),
  showId= TRUE,
  reverseStrand = FALSE,
  chromosome = "chr12")

 

 

Thanks,

Simon

gviz biomart • 2.0k views
ADD COMMENT
0
Entering edit mode
@florianhahnenovartiscom-3784
Last seen 5.6 years ago
Switzerland

Hi Simon,

this is not really an issue with Gviz, but rather with the Ensembl Biomart. They seem to store the function on the level of the gene, not transcript. So if there is a single protein coding transcript, the gene will be annotated as protein_coding.  And that is exactly what you are seeing in the plot. You can try that out yourself by downloading the data through the Ensembl Biomart: (http://www.ensembl.org/biomart/martview)

The attribute that stores the type is called "gene_biotype". One could consider a more evolved algorithm to figure out whether a transcript is coding or non-coding by looking at the CDS start and end locations if they are available, however that will take a bit of restructuring of code. 

 

ADD COMMENT
0
Entering edit mode

Looking at the available annotation features in Biomart it looks like we could use the CDS length field as an indicator whether a transcript is indeed protein coding. That should be empty for these cases. Will look at this once I find a couple of free minutes and provide a patch.

Florian

ADD REPLY
0
Entering edit mode

Ok, that seems to do the trick. Should become available with the next package update in a couple of days. 

 

ADD REPLY
0
Entering edit mode

Hi,

so in my case, which argumentI should use after the update to pull the info about the transcripts? Will it be a lot to change in the current code? I assume I can delete protein_coding for sure.

 

Thanks for fast response,

Simon

ADD REPLY
0
Entering edit mode
Simon B. • 0
@simon-b-11155
Last seen 7.8 years ago
University of Dundee

Hi,

so in my case, which argumentI should use after the update to pull the info about the transcripts? Will it be a lot to change in the current code? I assume I can delete protein_coding for sure.

 

Thanks for fast response,

Simon

ADD COMMENT
0
Entering edit mode

You won't have to change anything. Just wait for version 1.16.4 to appear on the package server and update. It should happen sometime later today. The last SVN snapshot on the build system was taken two days ago, so a new version must be building right now.

ADD REPLY
0
Entering edit mode

Hi,

do you know when the new build will appear on Bioconductor? It's still 1.16.1.

 

Thanks,

Simon

ADD REPLY
0
Entering edit mode

Looks like there was a corrupted file checked in to the svn. The package builds locally but failed in the Bioc build system. Just committed a fix (I hope). Will keep an eye on this and let you know.

Florian

ADD REPLY
0
Entering edit mode

The new version is now available.

ADD REPLY
0
Entering edit mode

Ok, installed (also had to install missing Hmisc too) but the problem persists- the non-coding transcripts are still coloured by coding regions!

Is there a way to get around this with a specific argument?

Simon

ADD REPLY
0
Entering edit mode
@florianhahnenovartiscom-3784
Last seen 5.6 years ago
Switzerland

No, they are not. 

For your particular example I can see that only the longest transcript has coding regions. The two smaller ones are fully non-coding, and the long one also has 3' and 5' UTRs:

 table(paste(transcript(biomartTrack), ranges(biomartTrack)$feature))

ENST00000359062 protein_coding           ENST00000359062 utr3 

                            16                              1 

          ENST00000359062 utr5     ENST00000542675 non_coding 

                             1                              2 

    ENST00000544307 non_coding 

                            15 

Are you sure that your are using Gviz_1.16.4?

 

ADD COMMENT
0
Entering edit mode

Ok, worked. I had another missing package so it never updated the new version.

Thanks!

Simon

ADD REPLY

Login before adding your answer.

Traffic: 742 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6