Question: Extracting UTRs from exon and CDS data
3 months ago, rubi70 wrote:

Hi,

I have a data.frame of the exons of each transcript and a corresponding data.frame of the CDS intervals:

exon.df <- data.frame(id=c(rep("id1",4),rep("id2",3),rep("id3",5)),
                      start=c(10,20,30,40,100,200,300,1000,2000,3000,4000,5000),
                      end=c(15,25,35,45,150,250,350,1500,2500,3500,4500,5500))


cds.df <- data.frame(id=c(rep("id1",3),rep("id2",3),rep("id3",3)),
                      start=c(20,30,40,125,200,300,2250,3000,4000),
                      end=c(25,35,45,150,250,325,2500,3500,4250))

I would like to extract the 5' and 3' UTRs of each transcript from these data. For this example, the expected results are:

utr5.df <- data.frame(id=c("id1","id2","id3","id3"),
                     start=c(10,100,1000,2000),
                     end=c(15,124,1500,2249))

utr3.df <- data.frame(id=c("id2","id3","id3"),
                     start=c(326,4251,5000),
                     end=c(350,4500,5500))

Can GenomicRanges, or any other package, be used for this?

3 months ago, Michael Lawrence (United States) wrote:

One way would be to add a dummy "chr" column and call GRanges() on both exon.df and cds.df to get GRanges objects. Then split() them by "id" into GRangesList objects. Call range() on the exons to get the transcript bounds, then subtract the CDS regions from those to get the UTRs.

Something like (untested):

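library(GenomicRanges)

# GRanges() needs a sequence name, so add a placeholder chromosome column
exon.df$chr <- "foo"
cds.df$chr <- "foo"
exon.gr <- GRanges(exon.df)
cds.gr <- GRanges(cds.df)
# one list element per transcript id
exon.grl <- split(exon.gr, ~ id)
cds.grl <- split(cds.gr, ~ id)
# transcript span (first exon start to last exon end) minus the CDS
utr.grl <- psetdiff(unlist(range(exon.grl)), cds.grl)
# flatten back to a single GRanges with an "id" column
stack(utr.grl, "id")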
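For comparison, here is a dependency-free base R sketch that also separates 5' from 3' UTRs. One thing to watch with the range-based approach: subtracting the CDS from the whole transcript span also keeps the introns. For id1, the span [10,45] minus the CDS leaves [10,19], [26,29] and [36,39], whereas the expected 5' UTR is [10,15]. The sketch below instead clips each exon to the part lying before the first coding base (5' UTR) or after the last coding base (3' UTR). It assumes, as the expected output in the question implies, that all transcripts are on the + strand (for - strand transcripts the two labels swap), that coordinates are 1-based and closed, and that every exon base between the first and last coding base is coding. The function name extract_utrs is hypothetical.

```r
# Example data from the question
exon.df <- data.frame(id=c(rep("id1",4),rep("id2",3),rep("id3",5)),
                      start=c(10,20,30,40,100,200,300,1000,2000,3000,4000,5000),
                      end=c(15,25,35,45,150,250,350,1500,2500,3500,4500,5500))
cds.df <- data.frame(id=c(rep("id1",3),rep("id2",3),rep("id3",3)),
                     start=c(20,30,40,125,200,300,2250,3000,4000),
                     end=c(25,35,45,150,250,325,2500,3500,4250))

# Hypothetical helper. Assumes + strand, 1-based closed intervals, and that
# every exon base between the first and last coding base is coding.
extract_utrs <- function(exons, cds) {
  # first and last coding base of each transcript, named by id
  cds_first <- tapply(cds$start, as.character(cds$id), min)
  cds_last  <- tapply(cds$end,   as.character(cds$id), max)
  # look up those bounds for every exon row (NA if the transcript has no CDS)
  first <- as.vector(cds_first[as.character(exons$id)])
  last  <- as.vector(cds_last[as.character(exons$id)])
  # 5' UTR: the part of each exon before the first coding base
  utr5 <- data.frame(id = exons$id, start = exons$start,
                     end = pmin(exons$end, first - 1))
  # 3' UTR: the part of each exon after the last coding base
  utr3 <- data.frame(id = exons$id, start = pmax(exons$start, last + 1),
                     end = exons$end)
  # drop empty pieces; which() also drops the NA rows of non-coding transcripts
  utr5 <- utr5[which(utr5$start <= utr5$end), ]
  utr3 <- utr3[which(utr3$start <= utr3$end), ]
  rownames(utr5) <- NULL
  rownames(utr3) <- NULL
  list(utr5 = utr5, utr3 = utr3)
}

res <- extract_utrs(exon.df, cds.df)
res$utr5
res$utr3
```

On the question's data this returns the same intervals as utr5.df and utr3.df. Clipping against two boundary coordinates, rather than subtracting every CDS interval, is what keeps the introns out and splits 5' from 3' in one step; if the CDS could contain non-coding exonic gaps, a full per-exon interval subtraction would be needed instead.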