Question: Extracting UTRs from exon and CDS data
10 months ago, rubi wrote:

Hi,

I have a data.frame of exon intervals for each transcript and a corresponding data.frame of the CDS intervals:

exon.df <- data.frame(id=c(rep("id1",4),rep("id2",3),rep("id3",5)),
                      start=c(10,20,30,40,100,200,300,1000,2000,3000,4000,5000),
                      end=c(15,25,35,45,150,250,350,1500,2500,3500,4500,5500))


cds.df <- data.frame(id=c(rep("id1",3),rep("id2",3),rep("id3",3)),
                      start=c(20,30,40,125,200,300,2250,3000,4000),
                      end=c(25,35,45,150,250,325,2500,3500,4250))

I would like to extract the UTRs from these data for each transcript. For this example, the expected output would be:

utr5.df <- data.frame(id=c("id1","id2","id3","id3"),
                     start=c(10,100,1000,2000),
                     end=c(15,124,1500,2249))

utr3.df <- data.frame(id=c("id2","id3","id3"),
                     start=c(326,4251,5000),
                     end=c(350,4500,5500))

Can GenomicRanges, or any other package, be used for this?

10 months ago, Michael Lawrence (United States) wrote:

One way would be to add a dummy "chr" variable and call GRanges() on both your exon.df and cds.df to get GRanges objects. Then, split() them by "id" into GRangesList objects. Call range() on the exons to get the transcript bounds, then subtract the CDS regions from those to get the UTRs.

Something like (untested):

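library(GenomicRanges)

# A dummy chromosome lets GRanges() build ranges from the data.frames.
exon.df$chr <- "foo"
cds.df$chr <- "foo"
exon.gr <- GRanges(exon.df)
cds.gr <- GRanges(cds.df)
# Group the ranges by transcript id.
exon.grl <- split(exon.gr, ~ id)
cds.grl <- split(cds.gr, ~ id)
# Transcript bounds minus the CDS, one list element per transcript.
utr.grl <- psetdiff(unlist(range(exon.grl)), cds.grl)
# Flatten back to a single GRanges with an "id" column.
stack(utr.grl, "id")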
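
One thing to check against the expected output: subtracting the CDS from the full transcript bounds also keeps the intronic gaps. For id1 the bounds are 10-45, so the difference would be 10-19, 26-29 and 36-39 rather than just 10-15. As a cross-check, here is a minimal base R sketch that clips each exon to the region before the first CDS base and after the last one. It assumes closed integer intervals, plus-strand transcripts (on the minus strand the 5' and 3' labels would swap), and that every transcript has at least one CDS row. clip_exons() and extract_utrs() are hypothetical helper names, not functions from any package.

```r
# Example data from the question.
exon.df <- data.frame(id = c(rep("id1", 4), rep("id2", 3), rep("id3", 5)),
                      start = c(10, 20, 30, 40, 100, 200, 300,
                                1000, 2000, 3000, 4000, 5000),
                      end = c(15, 25, 35, 45, 150, 250, 350,
                              1500, 2500, 3500, 4500, 5500))
cds.df <- data.frame(id = c(rep("id1", 3), rep("id2", 3), rep("id3", 3)),
                     start = c(20, 30, 40, 125, 200, 300, 2250, 3000, 4000),
                     end = c(25, 35, 45, 150, 250, 325, 2500, 3500, 4250))

# Hypothetical helper: clip every exon to the window [lo, hi] and keep
# only the pieces that are non-empty after clipping.
clip_exons <- function(exons, lo, hi) {
  s <- pmax(exons$start, lo)
  e <- pmin(exons$end, hi)
  # Exons of transcripts without CDS rows get NA bounds and are dropped.
  keep <- !is.na(s) & !is.na(e) & s <= e
  data.frame(id = as.character(exons$id)[keep], start = s[keep], end = e[keep])
}

# Hypothetical helper: exonic bases before the first CDS base (utr5)
# and after the last CDS base (utr3), assuming the plus strand.
extract_utrs <- function(exon.df, cds.df) {
  cds.ids <- as.character(cds.df$id)
  cds.start <- tapply(cds.df$start, cds.ids, min)
  cds.end <- tapply(cds.df$end, cds.ids, max)
  # Look up the CDS bounds of the transcript each exon belongs to.
  m <- match(as.character(exon.df$id), names(cds.start))
  first <- as.numeric(cds.start)[m]
  last <- as.numeric(cds.end)[m]
  list(utr5 = clip_exons(exon.df, -Inf, first - 1),
       utr3 = clip_exons(exon.df, last + 1, Inf))
}

utrs <- extract_utrs(exon.df, cds.df)
print(utrs$utr5)
print(utrs$utr3)
```

The same clipping could presumably be done with GRanges objects, but the base R version makes the interval arithmetic explicit and easy to compare with utr5.df and utr3.df from the question.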