Question

Gviz: feature protein_coding not working

0

Entering edit mode

Simon B. • 0

@simon-b-11155

Last seen 7.8 years ago

University of Dundee

Hi,

I'm trying to visualise the following transcripts: http://www.ensembl.org/Homo_sapiens/Transcript/Summary?db=core;g=ENSG00000172572;r=12:20369245-20684381;t=ENST00000542675

The feautre protein_coding doesn't colour the first and only protein coding transcript. One of the transcript is a processed transcript but has no protein, while the shortest is non-coding. It appears that the function wrongly identified the coding sequences. Any hint why?

Here's the image:

And the code:

library(Gviz)
library(biomaRt)

mart <- biomaRt::useMart(
  biomart="ensembl",
  dataset="hsapiens_gene_ensembl")

PDE3A_track <- c("ENST00000359062","ENST00000542675","ENST00000544307")

biomartTrack <- Gviz::BiomartGeneRegionTrack(
  filters = list(
    ensembl_transcript_id = PDE3A_track),
  protein_coding = "red",
  transcriptAnnotation="transcript",
  name="Ensembl Transcript IDs",
  background.title="darkgrey",
  biomart = mart)

axisTrack <- Gviz::GenomeAxisTrack()
displayPars(axisTrack) <- list(
  add53 = TRUE,
  labelPos = "below",
  cex = 1.5)

idxTrack <- IdeogramTrack(
  genome="hg38",
  chromosome="chr12")

Gviz::plotTracks(
  c(idxTrack, biomartTrack, axisTrack),
  showId= TRUE,
  reverseStrand = FALSE,
  chromosome = "chr12")

Thanks,

Simon

gviz biomart • 2.0k views

ADD COMMENT • link updated 7.7 years ago by florian.hahne@novartis.com ★ 1.6k • written 7.8 years ago by Simon B. • 0

score 0 · Answer 1 · 2016-07-21

0

Entering edit mode

florian.hahne@novartis.com ★ 1.6k

@florianhahnenovartiscom-3784

Last seen 5.6 years ago

Switzerland

Hi Simon,

this is not really an issue with Gviz, but rather with the Ensembl Biomart. They seem to store the function on the level of the gene, not transcript. So if there is a single protein coding transcript, the gene will be annotated as protein_coding. And that is exactly what you are seeing in the plot. You can try that out yourself by downloading the data through the Ensembl Biomart: (http://www.ensembl.org/biomart/martview)

The attribute that stores the type is called "gene_biotype". One could consider a more evolved algorithm to figure out whether a transcript is coding or non-coding by looking at the CDS start and end locations if they are available, however that will take a bit of restructuring of code.

ADD COMMENT • link 7.8 years ago florian.hahne@novartis.com ★ 1.6k

0

Entering edit mode

Looking at the available annotation features in Biomart it looks like we could use the CDS length field as an indicator whether a transcript is indeed protein coding. That should be empty for these cases. Will look at this once I find a couple of free minutes and provide a patch.

Florian

ADD REPLY • link 7.8 years ago florian.hahne@novartis.com ★ 1.6k

0

Entering edit mode

Ok, that seems to do the trick. Should become available with the next package update in a couple of days.

ADD REPLY • link 7.8 years ago florian.hahne@novartis.com ★ 1.6k

0

Entering edit mode

Hi,

so in my case, which argumentI should use after the update to pull the info about the transcripts? Will it be a lot to change in the current code? I assume I can delete protein_coding for sure.

Thanks for fast response,

Simon

ADD REPLY • link 7.8 years ago Simon B. • 0

score 0 · Answer 2 · 2016-07-22

0

Entering edit mode

Simon B. • 0

@simon-b-11155

Last seen 7.8 years ago

University of Dundee

Hi,

so in my case, which argumentI should use after the update to pull the info about the transcripts? Will it be a lot to change in the current code? I assume I can delete protein_coding for sure.

Thanks for fast response,

Simon

ADD COMMENT • link 7.8 years ago Simon B. • 0

0

Entering edit mode

You won't have to change anything. Just wait for version 1.16.4 to appear on the package server and update. It should happen sometime later today. The last SVN snapshot on the build system was taken two days ago, so a new version must be building right now.

ADD REPLY • link 7.8 years ago florian.hahne@novartis.com ★ 1.6k

0

Entering edit mode

Hi,

do you know when the new build will appear on Bioconductor? It's still 1.16.1.

Thanks,

Simon

ADD REPLY • link 7.7 years ago Simon B. • 0

0

Entering edit mode

Looks like there was a corrupted file checked in to the svn. The package builds locally but failed in the Bioc build system. Just committed a fix (I hope). Will keep an eye on this and let you know.

Florian

ADD REPLY • link 7.7 years ago florian.hahne@novartis.com ★ 1.6k

0

Entering edit mode

The new version is now available.

ADD REPLY • link 7.7 years ago florian.hahne@novartis.com ★ 1.6k

0

Entering edit mode

Ok, installed (also had to install missing Hmisc too) but the problem persists- the non-coding transcripts are still coloured by coding regions!

Is there a way to get around this with a specific argument?

Simon

ADD REPLY • link 7.7 years ago Simon B. • 0

score 0 · Answer 3 · 2016-08-08

0

Entering edit mode

florian.hahne@novartis.com ★ 1.6k

@florianhahnenovartiscom-3784

Last seen 5.6 years ago

Switzerland

No, they are not.

For your particular example I can see that only the longest transcript has coding regions. The two smaller ones are fully non-coding, and the long one also has 3' and 5' UTRs:

 table(paste(transcript(biomartTrack), ranges(biomartTrack)$feature))

ENST00000359062 protein_coding           ENST00000359062 utr3 

                            16                              1 

          ENST00000359062 utr5     ENST00000542675 non_coding 

                             1                              2 

    ENST00000544307 non_coding 

                            15

Are you sure that your are using Gviz_1.16.4?

ADD COMMENT • link 7.7 years ago florian.hahne@novartis.com ★ 1.6k

0

Entering edit mode

Ok, worked. I had another missing package so it never updated the new version.

Thanks!

Simon

ADD REPLY • link 7.7 years ago Simon B. • 0