Question: Combine 5' leaders and 1st exon of cds into GRangesList
1
8 months ago by
hauken_heyken40 wrote:

Ok, so the essence of the question is: I have a big GRangesList of open reading frames in the 5' leaders, and I want to convert them from genomic to transcript coordinates.

I know I can use :

mapToTranscripts

but!!

Ok, so my problem: I used redefined leaders, extended at both start and end, and now I need the transcript coordinates back, because I don't have the redefined leaders anymore. This is a lot of data, so I don't want to recompute the tx ranges from scratch; I want to be clever.

Usually I could have done something like this:

#ORFs: a list of orfs in the 5' leader

#fiveUTRs: the GRangesList of 5' leaders

txRanges = mapToTranscripts(x = ORFs, transcripts = fiveUTRs)

But I extended my ORFs into the first CDS exon, so I need to redefine fiveUTRs to include the first exon for each gene. The plan is now like this:

# Append the first CDS exon of the matching transcript to one 5' leader
insertFirstCDS = function(fiveTemp, x) {
  firstExon = unlist(cds[names(cds) == names(fiveUTRs[x])])[1]
  return(sort(c(fiveTemp, firstExon))) # return sorted combination
}

fiveUTRsWithExon = lapply(seq_along(fiveUTRs), function(x) insertFirstCDS(unlist(fiveUTRs[x]), x))

This is terribly slow even for just a few hundred MB of data, and I need to process several TB, so any ideas?

modified 8 months ago • written 8 months ago by hauken_heyken40
4
8 months ago by
United States
Michael Lawrence10k wrote:

Subscripting into cds should work by name, and you can do that in a vectorized way:

cdsForUTRs <- cds[names(fiveUTRs)]

You can use the phead() function to select the first exons without looping:

firstExons <- phead(cdsForUTRs, 1L)

Then combine them using the element-wise pc():

fiveUTRsWithExon <- pc(fiveUTRs, firstExons)
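
As a rough illustration of the three steps above, here is a minimal sketch on toy data. The transcript names, coordinates, and the deliberately shuffled order of cds are made up for the example, and it assumes each cds element is ordered 5' to 3' (for example by exon rank), so that the first range is the first coding exon:

```r
library(GenomicRanges)  # also attaches S4Vectors and IRanges (pc, phead)

# Toy 5' leaders, one list element per transcript (made-up coordinates)
fiveUTRs <- GRangesList(
  tx1 = GRanges("chr1", IRanges(c(1, 21), c(10, 30)), strand = "+"),
  tx2 = GRanges("chr1", IRanges(101, 120), strand = "+")
)

# Toy CDS exons, stored in a different order than fiveUTRs on purpose
cds <- GRangesList(
  tx2 = GRanges("chr1", IRanges(c(121, 201), c(150, 250)), strand = "+"),
  tx1 = GRanges("chr1", IRanges(c(31, 61), c(50, 90)), strand = "+")
)

# 1. Reorder cds by name so it is parallel to fiveUTRs
cdsForUTRs <- cds[names(fiveUTRs)]

# 2. Keep only the first range of every list element
firstExons <- phead(cdsForUTRs, 1L)

# 3. Combine element-wise: leader ranges followed by the first CDS exon
fiveUTRsWithExon <- pc(fiveUTRs, firstExons)

elementNROWS(fiveUTRsWithExon)
```

Note that pc() appends rather than sorts, so the result keeps the leader ranges first and the CDS exon last within each element.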

1
8 months ago by
hauken_heyken40 wrote:

WOW!!! Sir, you are brilliant. I just needed to remove some meta-columns from the cds, and it worked instantly: total runtime less than 5 seconds. Amazing!
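
For anyone hitting the same snag, one possible way to drop the per-range meta-columns from a GRangesList is sketched below. The helper name dropInnerMcols and the toy columns cds_id and exon_rank are hypothetical, and this is not necessarily how it was done in the thread; the idea is simply to unlist, clear mcols, and relist against the original list:

```r
library(GenomicRanges)

# Hypothetical helper: remove metadata columns from the ranges inside a GRangesList
dropInnerMcols <- function(grl) {
  flat <- unlist(grl, use.names = FALSE)  # one flat GRanges
  mcols(flat) <- NULL                     # drop all metadata columns
  relist(flat, grl)                       # restore the original list structure
}

# Toy cds with made-up metadata columns
cds <- GRangesList(
  tx1 = GRanges("chr1", IRanges(c(31, 61), c(50, 90)), strand = "+",
                cds_id = 1:2, exon_rank = 1:2)
)

cdsClean <- dropInnerMcols(cds)
ncol(mcols(unlist(cdsClean)))
```

Stripping the columns this way would presumably let the leader and CDS ranges be combined even when their metadata columns differ.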