Question: How to make combined annotation of 5'UTRs and CDS's?
5 months ago by anmej0

Hello everyone.

I want to extract the annotation of the combined 5'UTR+CDS region of every transcript in the hg19 annotation, in order to search for alternative ORFs. This is what I've managed to do so far:

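library(TxDb.Hsapiens.UCSC.hg19.knownGene)  # also loads GenomicFeatures
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene

# 5'UTRs and CDS grouped by transcript, named by transcript
fiveUTRs = fiveUTRsByTranscript(txdb, use.names = TRUE)
names5UTR = names(fiveUTRs)
cds = cdsBy(txdb, "tx", use.names = TRUE)
namesCDS = names(cds)

# keep only transcripts that have both a 5'UTR and a CDS
names5UTRCDS = intersect(namesCDS, names5UTR)
fiveUTRs = fiveUTRs[names5UTRCDS]
cds = cds[names5UTRCDS]

# concatenate the two lists one transcript at a time (slow)
fiveUTRCDS = GRangesList()
for (i in seq_along(names5UTRCDS)) {
  x = GRangesList(c(unlist(fiveUTRs[i]), unlist(cds[i])))
  names(x) = names(fiveUTRs[i])
  fiveUTRCDS = c(fiveUTRCDS, x)
}
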


I'm basically looping over both lists and concatenating every pair of elements. It works, but it is very slow and inelegant. Surely there must be a better, functional way to do it? Some way to "zip" the two lists?

Thanks.

Answer: How to make combined annotation of 5'UTRs and CDS's?
5 months ago by anmej0

I found the answer, and it is embarrassingly simple. Somehow I failed to notice the existence of the pair-wise functions, in this case pc() (parallel combine).

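# combine the i-th 5'UTR element with the i-th CDS element
fiveUTRCDS = pc(fiveUTRs, cds)
# merge overlapping or adjacent ranges within each transcript
fiveUTRCDS = reduce(fiveUTRCDS)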
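
To see what the two calls do without loading the whole hg19 annotation, here is a minimal sketch on toy data. The transcript names (txA, txB) and all coordinates are made up for illustration; only pc(), reduce() and elementNROWS() are taken from the Bioconductor packages. Names are set explicitly after pc() so the sketch does not depend on how pc() propagates them.

```r
library(GenomicRanges)  # also attaches IRanges and S4Vectors

# Toy data: two hypothetical transcripts with invented coordinates.
utr <- GRangesList(
  txA = GRanges("chr1", IRanges(start = c(1, 21), end = c(10, 25)), strand = "+"),
  txB = GRanges("chr2", IRanges(start = 101, end = 110), strand = "+")
)
cds <- GRangesList(
  txA = GRanges("chr1", IRanges(start = c(26, 41), end = c(30, 50)), strand = "+"),
  txB = GRanges("chr2", IRanges(start = 111, end = 150), strand = "+")
)

# pc() concatenates the i-th element of utr with the i-th element of cds.
combined <- pc(utr, cds)
names(combined) <- names(utr)

# reduce() merges overlapping or adjacent ranges within each list element,
# so a 5'UTR piece and a CDS piece that share an exon become one range.
merged <- reduce(combined)

elementNROWS(combined)  # ranges per transcript before merging
elementNROWS(merged)    # ranges per transcript after merging
```

In this toy case txA goes from four ranges to three (21-25 and 26-30 are adjacent and get merged) and txB collapses to the single range 101-150. One thing to check on real data: reduce() returns the ranges of each element sorted by coordinate, so for minus-strand transcripts the result is probably not in transcript (5' to 3') order and may need reversing before sequence extraction.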