Question

How to make a combined annotation of 5'UTRs and CDSs?

anmej

Hello everyone.

I want to extract the annotation of the 5'UTR+CDS region of every transcript in the hg19 annotation, to search for alternative ORFs. This is what I've managed to do so far:

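library(GenomicFeatures)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)

txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene

# 5'UTR exons and CDS exons, grouped by transcript name
fiveUTRs = fiveUTRsByTranscript(txdb, use.names = TRUE)
names5UTR = names(fiveUTRs)
cds = cdsBy(txdb, "tx", use.names = TRUE)
namesCDS = names(cds)
# Keep only transcripts that have both a 5'UTR and a CDS
names5UTRCDS = intersect(namesCDS, names5UTR)

# Subset both lists by the same names, so they are in the same order
fiveUTRs = fiveUTRs[names5UTRCDS]
cds = cds[names5UTRCDS]

# Concatenate element i of each list, one transcript at a time
fiveUTRCDS = GRangesList()
for (i in seq_along(names5UTRCDS)) {
    x = GRangesList(c(unlist(fiveUTRs[i]), unlist(cds[i])))
    names(x) = names(fiveUTRs[i])
    fiveUTRCDS = c(fiveUTRCDS, x)
}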

I'm basically looping over both lists and concatenating them element by element. It works, but it is very slow and inelegant. Surely there must be a better, functional way to do it? Some way to "zip" the two lists?
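For ordinary R lists, the "zip" being asked about is what base R's Map() does: it walks several lists in parallel and applies a function to the i-th elements of each. The sketch below uses hypothetical toy integer vectors in place of GRanges objects, so it only illustrates the idea; it is not how GenomicRanges implements it.

```r
# Hypothetical toy data: two named lists in the same order, standing in
# for the 5'UTR and CDS lists (plain integer vectors, not GRanges).
utr <- list(tx1 = c(100L, 150L), tx2 = c(500L))
cds <- list(tx1 = c(200L, 300L), tx2 = c(600L, 700L))

# Element-wise ("zipped") concatenation: Map() walks both lists in
# parallel and calls c(utr[[i]], cds[[i]]) for each i.
zipped <- Map(c, utr, cds)

# The names of the result are taken from the first list.
print(zipped)
```

Because Map() matches elements by position, not by name, both lists must already be in the same order, which the subsetting by names5UTRCDS in the question guarantees.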

Thanks.


score 1 · Accepted Answer · 2019-03-22


I found the answer, and it is embarrassingly simple. Somehow I failed to notice the pairwise ("parallel") functions: pc() concatenates two list-like objects element by element.

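# Parallel concatenation: element i is c(fiveUTRs[[i]], cds[[i]])
fiveUTRCDS = pc(fiveUTRs, cds)
# Merge overlapping or adjacent ranges within each transcript, e.g. an exon
# split between the 5'UTR and the CDS (metadata columns are dropped)
fiveUTRCDS = reduce(fiveUTRCDS)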
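The reduce() step is what merges a 5'UTR piece and a CDS piece that sit in the same exon. As a rough illustration only, the base-R sketch below merges overlapping or directly adjacent intervals for a single transcript, which is approximately what reduce() does to each element with its default min.gapwidth = 1. The coordinates and the helper merge_intervals() are hypothetical, all intervals are assumed to be on the same strand, and the real reduce() is vectorized rather than looping like this.

```r
# Hypothetical toy intervals for one transcript (1-based, closed, same strand):
# a 5'UTR piece ending at 199 and the CDS starting at 200 in the same exon,
# plus a second CDS exon further downstream.
starts <- c(100L, 200L, 400L)
ends   <- c(199L, 300L, 450L)

# Hypothetical helper: merge overlapping or directly adjacent intervals.
merge_intervals <- function(starts, ends) {
  o <- order(starts)
  starts <- starts[o]
  ends <- ends[o]
  out_s <- starts[1]
  out_e <- ends[1]
  for (i in seq_along(starts)[-1]) {
    last <- length(out_e)
    if (starts[i] <= out_e[last] + 1L) {
      # Overlapping or adjacent: extend the current merged interval
      out_e[last] <- max(out_e[last], ends[i])
    } else {
      # Gap: start a new merged interval
      out_s <- c(out_s, starts[i])
      out_e <- c(out_e, ends[i])
    }
  }
  data.frame(start = out_s, end = out_e)
}

merged <- merge_intervals(starts, ends)
print(merged)
```

In the real pipeline there is no need for such a helper: reduce(fiveUTRCDS) applies the merge to every transcript at once.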
