Hi,
I am trying to get the concatenated GRanges of ORFs for each gene.
I thought I could just get a list of CDS from the txdb database and collapse each element of that list into one range running from the first to the last CDS position. But I got stuck.
What I have done so far is:
cds <- cds(txdb, columns=c("TXNAME","EXONRANK"))
cds_grl <- multisplit(cds, cds$TXNAME)
which returns a list of GRanges objects.
Do you know how I could concatenate the elements of each GRanges in the list, to get a single range from the first to the last CDS for each gene?
Maybe there is another method?
Best, Quentin

Thinking about this further, I am not sure that either my answer or Michael's is 100% correct. For example, there is SAMD11, which has many transcripts:
And it's here:
> cds_grl[5]
GRangesList object of length 1:
$uc001abw.1
GRanges object with 13 ranges and 2 metadata columns:
       seqnames        ranges strand |                                TXNAME
          <Rle>     <IRanges>  <Rle> |                       <CharacterList>
   [1]     chr1 861322-861393      + | uc001abv.1,uc001abw.1,uc031pjl.1,...
   [2]     chr1 865535-865716      + | uc001abv.1,uc001abw.1,uc031pjl.1,...
   [3]     chr1 866419-866469      + | uc001abv.1,uc001abw.1,uc031pjl.1,...
   [4]     chr1 871152-871276      + | uc001abw.1,uc031pjl.1,uc031pjm.1,...
   [5]     chr1 874420-874509      + | uc001abw.1,uc031pjl.1,uc031pjm.1,...
   ...      ...           ...    ... .                                   ...
   [9]     chr1 877790-877868      + | uc001abw.1,uc031pjl.1,uc031pjm.1,...
  [10]     chr1 877939-878438      + | uc001abw.1,uc031pjl.1,uc031pjm.1,...
  [11]     chr1 878633-878757      + | uc001abw.1,uc031pjl.1,uc031pjm.1,...
  [12]     chr1 879078-879188      + | uc001abw.1,uc031pjm.1,uc001abx.2,...
  [13]     chr1 879288-879533      + | uc001abw.1,uc001abx.2,uc031pjp.1,...
And here:
So you don't have a single GRanges item for all the CDS for this gene. We can use reduce to combine transcripts:
> cds_grl3 <- reduce(cds_grl2)
> cds_grl3[cds_grl3 %over% gns["148398",],]
GRanges object with 1 range and 0 metadata columns:
      seqnames        ranges strand
         <Rle>     <IRanges>  <Rle>
  [1]     chr1 861322-879706      +
  -------
  seqinfo: 93 sequences (1 circular) from hg19 genome
Which is cool, but reduce will also take any overlapping, unrelated transcripts and reduce them to a single range as well, which isn't cool. AND, this still won't give you a single GRanges item for a single gene if the CDS aren't overlapping, which is a thing.
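To make the last point concrete, here is a minimal sketch on toy data (not taken from the txdb above; the gene name `geneA` and the coordinates are made up for illustration). It assumes only that the GenomicRanges package is installed, and compares `reduce()`, which merges only overlapping or adjacent ranges, with `range()`, which spans from the first to the last range on each seqname/strand within each list element.

```r
library(GenomicRanges)

# Toy CDS for one hypothetical gene: three blocks that do not overlap
cds_toy <- GRanges("chr1",
                   IRanges(start = c(100, 300, 500), end = c(150, 350, 550)),
                   strand = "+")
cds_by_gene <- GRangesList(geneA = cds_toy)

# reduce() merges only overlapping/adjacent ranges, so the gaps survive
reduced <- reduce(cds_by_gene)

# range() returns one span per seqname/strand within each list element
spanned <- range(cds_by_gene)

lengths(reduced)   # number of ranges left per gene after reduce()
lengths(spanned)   # number of ranges left per gene after range()
```

So if the goal is one first-to-last range per list element, `range()` on the GRangesList may be closer to what is wanted than `reduce()`, and because it works element by element it should not merge unrelated neighbours.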
I assumed by "gene" the OP meant "transcript". To do it by gene, you would need to use the gene IDs instead. Trans-splicing is another complication though, where there will be multiple ranges per transcript, on different sequences and/or strands. But it's easy to restrict to length one elements before doing the unlist().
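A rough sketch of that restriction, again on toy data rather than the real annotation: the GRangesList below is a hand-made stand-in for what something like `cdsBy(txdb, by = "gene")` from GenomicFeatures would return (or `by = "tx"` for transcripts), and the names `geneA` and `geneB` are hypothetical. `geneB` imitates the trans-splicing case by having CDS on two sequences and strands.

```r
library(GenomicRanges)

# Toy stand-in for a per-gene CDS list; gene names are hypothetical
cds_by_gene <- GRangesList(
  geneA = GRanges("chr1", IRanges(c(100, 300), c(150, 350)), strand = "+"),
  geneB = GRanges(c("chr1", "chr2"), IRanges(c(700, 50), c(750, 90)),
                  strand = c("+", "-"))
)

# One span per seqname/strand within each gene
cds_span <- range(cds_by_gene)

# Keep only genes whose CDS collapse to exactly one range
is_single <- lengths(cds_span) == 1L

# Safe to flatten now: each remaining gene contributes exactly one range
cds_span_gr <- unlist(cds_span[is_single])
names(cds_span_gr)
```

Genes dropped by the filter (here `geneB`) can be inspected separately with `cds_span[!is_single]` instead of being silently discarded.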