Question: Generate 3' UTR and 5' UTR ranges from a gff file
hwu1210 wrote:

I am working on a GFF file that has no 5' UTR or 3' UTR features. For example:

```
ctg123 . gene      1050  9000  .  +  .  ID=gene00001;Name=EDEN
ctg123 . mRNA      1050  9000  .  +  .  ID=mRNA00001;Parent=gene00001;Name=EDEN.1
ctg123 . exon      1050  1500  .  +  .  ID=exon00002;Parent=mRNA00001
ctg123 . exon      3000  3902  .  +  .  ID=exon00003;Parent=mRNA00001
ctg123 . exon      5000  5500  .  +  .  ID=exon00004;Parent=mRNA00001
ctg123 . exon      7000  9000  .  +  .  ID=exon00005;Parent=mRNA00001
ctg123 . CDS       1201  1500  .  +  0  ID=cds00001;Parent=mRNA00001;Name=edenprotein.1
ctg123 . CDS       3000  3902  .  +  0  ID=cds00001;Parent=mRNA00001;Name=edenprotein.1
ctg123 . CDS       5000  5300  .  +  0  ID=cds00001;Parent=mRNA00001;Name=edenprotein.1
```

Is there a way to generate rows giving the 5' UTR and 3' UTR ranges? Many thanks!

Answer: Generate 3' UTR and 5' UTR ranges from a gff file
Michael Lawrence wrote:

To get the ranges of the UTRs, as a GRangesList or GRanges object:

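```r
library(rtracklayer)
gtf <- import.gff3("tmp.gtf")
tx <- subset(gtf, type == "mRNA")
cds <- subset(gtf, type == "CDS")
# Per-transcript CDS span (multisplit() because Parent may list several IDs)
cds <- range(multisplit(cds, cds$Parent))
# Transcript span minus CDS span; a spliced UTR keeps its introns
utrs <- psetdiff(tx, cds[tx$ID])
```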
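To see what the last step returns, here is a base-R sketch of the same subtraction for a single transcript. The helper `span_diff` is hypothetical (it is not part of rtracklayer or GenomicRanges) and assumes one transcript range and one CDS span, which is what `range()` produces per transcript. The coordinates are taken from the question: mRNA00001 spans 1050-9000 and its CDS rows span 1201-5300.

```r
# Hypothetical helper: subtract the closed interval [cds_start, cds_end]
# from [tx_start, tx_end] (1-based, closed coordinates as in GFF3).
span_diff <- function(tx_start, tx_end, cds_start, cds_end) {
  starts <- integer(0)
  ends <- integer(0)
  if (cds_start > tx_start) {  # transcript bases before the CDS
    starts <- c(starts, tx_start)
    ends <- c(ends, cds_start - 1L)
  }
  if (cds_end < tx_end) {      # transcript bases after the CDS
    starts <- c(starts, cds_end + 1L)
    ends <- c(ends, tx_end)
  }
  data.frame(start = starts, end = ends)
}

# mRNA00001: transcript 1050-9000, CDS span 1201-5300
utr_spans <- span_diff(1050L, 9000L, 1201L, 5300L)
print(utr_spans)
```

This gives 1050-1200 and 5301-9000. The second span runs across the intron between the exons at 5000-5500 and 7000-9000, so these are UTR spans rather than exon-level UTR features.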


Thanks so much, Michael. This method generates the UTR ranges efficiently. However, is it possible to split them further into 5' UTRs and 3' UTRs?
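One way to do the split is to work exon by exon: the exonic bases on the low-coordinate side of a transcript's CDS span are 5' UTR on the `+` strand and 3' UTR on the `-` strand, and the reverse holds for the high-coordinate side. The sketch below does this in base R so it runs without Bioconductor. The function `split_utrs` is hypothetical, assumes each exon and CDS row has a single `Parent`, and skips transcripts without CDS rows. Because it works on exons, introns are excluded.

```r
# Example rows from the question (the gene row is not needed)
gff_text <- "
ctg123 . mRNA 1050 9000 . + . ID=mRNA00001;Parent=gene00001;Name=EDEN.1
ctg123 . exon 1050 1500 . + . ID=exon00002;Parent=mRNA00001
ctg123 . exon 3000 3902 . + . ID=exon00003;Parent=mRNA00001
ctg123 . exon 5000 5500 . + . ID=exon00004;Parent=mRNA00001
ctg123 . exon 7000 9000 . + . ID=exon00005;Parent=mRNA00001
ctg123 . CDS 1201 1500 . + 0 ID=cds00001;Parent=mRNA00001;Name=edenprotein.1
ctg123 . CDS 3000 3902 . + 0 ID=cds00001;Parent=mRNA00001;Name=edenprotein.1
ctg123 . CDS 5000 5300 . + 0 ID=cds00001;Parent=mRNA00001;Name=edenprotein.1
"
gff <- read.table(text = gff_text, quote = "", stringsAsFactors = FALSE,
                  col.names = c("seqid", "source", "type", "start", "end",
                                "score", "strand", "phase", "attributes"))

split_utrs <- function(gff) {
  # First Parent ID only; multi-parent features would need more care
  gff$parent <- sub(".*Parent=([^;,]+).*", "\\1", gff$attributes)
  exons <- gff[gff$type == "exon", ]
  cds <- gff[gff$type == "CDS", ]
  out <- list()
  for (tx in unique(exons$parent)) {
    ex <- exons[exons$parent == tx, ]
    ex <- ex[order(ex$start), ]
    cd <- cds[cds$parent == tx, ]
    if (nrow(cd) == 0) next  # non-coding transcript: no UTRs to report
    cds_lo <- min(cd$start)
    cds_hi <- max(cd$end)
    for (i in seq_len(nrow(ex))) {
      # Exonic bases on the low-coordinate side of the CDS
      if (ex$start[i] < cds_lo) {
        out[[length(out) + 1]] <- data.frame(
          seqid = ex$seqid[i], parent = tx, side = "low", strand = ex$strand[i],
          start = ex$start[i], end = min(ex$end[i], cds_lo - 1L),
          stringsAsFactors = FALSE)
      }
      # Exonic bases on the high-coordinate side of the CDS
      if (ex$end[i] > cds_hi) {
        out[[length(out) + 1]] <- data.frame(
          seqid = ex$seqid[i], parent = tx, side = "high", strand = ex$strand[i],
          start = max(ex$start[i], cds_hi + 1L), end = ex$end[i],
          stringsAsFactors = FALSE)
      }
    }
  }
  if (length(out) == 0) return(NULL)
  utr <- do.call(rbind, out)
  # On "+" the low-coordinate side is the 5' end; on "-" it is the 3' end
  utr$type <- ifelse((utr$side == "low") == (utr$strand == "+"),
                     "five_prime_UTR", "three_prime_UTR")
  utr[, c("seqid", "parent", "type", "start", "end", "strand")]
}

utrs <- split_utrs(gff)
# Emit tab-separated GFF3-style rows for the new features
writeLines(paste(utrs$seqid, ".", utrs$type, utrs$start, utrs$end, ".",
                 utrs$strand, ".", paste0("Parent=", utrs$parent), sep = "\t"))
```

For the example in the question this prints three rows: a five_prime_UTR at 1050-1200 and three_prime_UTR rows at 5301-5500 and 7000-9000.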

Answer: Generate 3' UTR and 5' UTR ranges from a gff file
arfranco130 wrote:

It depends on whether you are working with a model organism. If so, try BioMart, which lets you export gene and transcript structure attributes, including UTR coordinates.

Another possibility is to convert this GFF file to BED and use bedtools to get the same answer. The bedtools tutorials and documentation explain how to do this.
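The conversion step matters because the coordinate systems differ: GFF uses 1-based, closed intervals, while BED uses a 0-based start and an exclusive end, so only the start shifts by one. Below is a minimal base-R sketch of that conversion. The helper `gff_to_bed` is hypothetical, and the data frames hold the exon and CDS rows of mRNA00001 from the question.

```r
# Hypothetical helper: convert GFF-style rows (1-based, closed intervals)
# into BED6 rows (0-based start, exclusive end).
gff_to_bed <- function(gff, name = gff$type) {
  data.frame(
    chrom = gff$seqid,
    chromStart = gff$start - 1L,  # shift the start to 0-based
    chromEnd = gff$end,           # exclusive BED end equals the closed GFF end
    name = name,
    score = 0L,
    strand = gff$strand,
    stringsAsFactors = FALSE
  )
}

# Exon and CDS rows of mRNA00001 from the question
exons <- data.frame(seqid = "ctg123", type = "exon",
                    start = c(1050L, 3000L, 5000L, 7000L),
                    end = c(1500L, 3902L, 5500L, 9000L),
                    strand = "+", stringsAsFactors = FALSE)
cds <- data.frame(seqid = "ctg123", type = "CDS",
                  start = c(1201L, 3000L, 5000L),
                  end = c(1500L, 3902L, 5300L),
                  strand = "+", stringsAsFactors = FALSE)

exon_bed <- gff_to_bed(exons)
cds_bed <- gff_to_bed(cds)
# Tab-separated, no header; file = "" prints to the console
write.table(exon_bed, file = "", sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)
```

If the two tables were written to files (say `exons.bed` and `cds.bed`, hypothetical names), `bedtools subtract -a exons.bed -b cds.bed -s` would report the exonic pieces not covered by a CDS on the same strand. bedtools does not label those pieces as 5' or 3', and it subtracts every CDS in the file at once, so overlapping isoforms may hide part of each other's UTRs.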