Question: Extracting UTRs from exon and CDS data
16 months ago, rubi90 wrote:

Hi,

I have a data.frame of exons for each transcript and a corresponding data.frame of the CDS intervals:

exon.df <- data.frame(id=c(rep("id1",4),rep("id2",3),rep("id3",5)),
                      start=c(10,20,30,40,100,200,300,1000,2000,3000,4000,5000),
                      end=c(15,25,35,45,150,250,350,1500,2500,3500,4500,5500))


cds.df <- data.frame(id=c(rep("id1",3),rep("id2",3),rep("id3",3)),
                      start=c(20,30,40,125,200,300,2250,3000,4000),
                      end=c(25,35,45,150,250,325,2500,3500,4250))

 

I would like to extract the UTRs from these data for each transcript. For this example, the expected output would be:

utr5.df <- data.frame(id=c("id1","id2","id3","id3"),
                     start=c(10,100,1000,2000),
                     end=c(15,124,1500,2249))

utr3.df <- data.frame(id=c("id2","id3","id3"),
                     start=c(326,4251,5000),
                     end=c(350,4500,5500))

Can GenomicRanges or any other package be used for this?

 

 

16 months ago, Michael Lawrence (United States) wrote:

One way would be to add a dummy "chr" variable and call GRanges() on both your exon.df and cds.df to get GRanges objects. Then, split() them by "id" into GRangesList objects. Call range() on the exons to get the transcript bounds, then subtract the CDS regions from those to get the UTRs.

Something like (untested):

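library(GenomicRanges)

# GRanges needs a sequence name, so add a placeholder chromosome
exon.df$chr <- "foo"
cds.df$chr <- "foo"
exon.gr <- GRanges(exon.df)
cds.gr <- GRanges(cds.df)
# One list element per transcript id
exon.grl <- split(exon.gr, ~ id)
cds.grl <- split(cds.gr, ~ id)
# Transcript span minus the CDS, computed per transcript
utr.grl <- psetdiff(unlist(range(exon.grl)), cds.grl)
# Flatten back to a single GRanges with an "id" column
stack(utr.grl, "id")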
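
One caveat: subtracting the CDS from the full transcript span also leaves the intronic gaps in the result, and it does not label the pieces as 5' or 3'. As a cross-check, here is a minimal base R sketch (no packages) that clips each exon against the outer CDS bounds of its transcript. It assumes plus-strand transcripts, 1-based inclusive coordinates, and that every transcript has at least one CDS interval; none of that is stated in the question. The names utr5 and utr3 are just illustrative.

```r
exon.df <- data.frame(id = c(rep("id1", 4), rep("id2", 3), rep("id3", 5)),
                      start = c(10, 20, 30, 40, 100, 200, 300, 1000, 2000, 3000, 4000, 5000),
                      end = c(15, 25, 35, 45, 150, 250, 350, 1500, 2500, 3500, 4500, 5500))
cds.df <- data.frame(id = c(rep("id1", 3), rep("id2", 3), rep("id3", 3)),
                     start = c(20, 30, 40, 125, 200, 300, 2250, 3000, 4000),
                     end = c(25, 35, 45, 150, 250, 325, 2500, 3500, 4250))

# Outer CDS bounds per transcript (named by id)
cds.start <- tapply(cds.df$start, cds.df$id, min)
cds.end <- tapply(cds.df$end, cds.df$id, max)

# Look up the CDS bounds for the transcript of every exon
ex.cds.start <- as.vector(cds.start[as.character(exon.df$id)])
ex.cds.end <- as.vector(cds.end[as.character(exon.df$id)])

# 5' UTR (plus strand assumed): exonic bases before the first CDS base
i5 <- which(exon.df$start < ex.cds.start)
utr5 <- data.frame(id = exon.df$id[i5],
                   start = exon.df$start[i5],
                   end = pmin(exon.df$end[i5], ex.cds.start[i5] - 1))

# 3' UTR (plus strand assumed): exonic bases after the last CDS base
i3 <- which(exon.df$end > ex.cds.end)
utr3 <- data.frame(id = exon.df$id[i3],
                   start = pmax(exon.df$start[i3], ex.cds.end[i3] + 1),
                   end = exon.df$end[i3])

print(utr5)
print(utr3)
```

On the example data this gives the same rows as the utr5.df and utr3.df in the question. For minus-strand transcripts the two labels would need to be swapped.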