Hello Luna.
The findUORFs function in ORFik identifies upstream open reading frames by extracting the exon sequences of each element in the fiveUTRs GRangesList from the provided FASTA genome file and concatenating them. This gives one continuous spliced sequence per 5' UTR, so open reading frames can span splice junctions. Stop codons that cross exon boundaries are detected because the search runs on the spliced sequence, not on individual exons.
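To make the splice-junction point concrete, here is a toy sketch in base R. It is not ORFik code: the two exon sequences are made up, and first_orf is a hypothetical helper that only looks at the first ATG in the sequence.

```r
# Toy exon sequences, invented for illustration (not from any real genome).
exon1 <- "CCATGGCTT"
exon2 <- "AGCCC"

# Hypothetical helper: return the first ORF (first start codon through the
# first in-frame stop codon) in a sequence, or NA if no in-frame stop exists.
first_orf <- function(seq, start = "ATG", stops = c("TAA", "TAG", "TGA")) {
  pos <- as.integer(regexpr(start, seq, fixed = TRUE))
  if (pos == -1L) return(NA_character_)
  # Cut the sequence into in-frame codons, beginning at the start codon
  n_codons <- (nchar(seq) - pos + 1L) %/% 3L
  codon_starts <- pos + 3L * (seq_len(n_codons) - 1L)
  codons <- substring(seq, codon_starts, codon_starts + 2L)
  stop_idx <- which(codons %in% stops)
  if (length(stop_idx) == 0L) return(NA_character_)
  paste(codons[seq_len(stop_idx[1])], collapse = "")
}

orf_spliced <- first_orf(paste0(exon1, exon2))  # exons joined 5' to 3'
orf_exon1   <- first_orf(exon1)                 # first exon on its own
```

In this toy case the TAG stop codon is split across the junction (T at the end of exon1, AG at the start of exon2), so orf_spliced is "ATGGCTTAG" while orf_exon1 is NA.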
If findUORFs returns open reading frames that stop at the end of one exon instead of extending to a stop codon in the next exon, the exons in your fiveUTRs GRangesList may not be ordered correctly. Exons must be in 5' to 3' transcript order: for transcripts on the positive strand, sorted by increasing start position; for transcripts on the negative strand, sorted by decreasing end position. If the order is wrong, the exon sequences are concatenated in the wrong order, which shifts the reading frame and hides or misplaces stop codons.
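The ordering rule can be sketched in base R as follows. This is only an illustration of the rule, not how ORFik implements it: the exon coordinates are invented and order_exons is a hypothetical helper for a single transcript.

```r
# Hypothetical exon table for one minus-strand 5' UTR; coordinates are invented.
exons <- data.frame(
  start  = c(100, 500, 300),
  end    = c(150, 560, 340),
  strand = c("-", "-", "-")
)

# Put the exons of one transcript in 5' to 3' order:
# increasing start on "+", decreasing end on "-".
order_exons <- function(df) {
  if (all(df$strand == "+")) {
    df[order(df$start), ]
  } else if (all(df$strand == "-")) {
    df[order(df$end, decreasing = TRUE), ]
  } else {
    stop("all exons of one transcript must be on the same strand")
  }
}

sorted <- order_exons(exons)
```

For this minus-strand example, the exon starting at 500 comes first and the one starting at 100 comes last, which is the order in which their sequences should be concatenated.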
To address this, apply the sortPerGroup function from ORFik to your fiveUTRs object before calling findUORFs. Here is an example:
library(ORFik)
# fiveUTRs: your GRangesList of 5' UTRs; fa: your genome FASTA (path or FaFile)
fiveUTRs_sorted <- sortPerGroup(fiveUTRs)
# Search the sorted 5' UTRs; ORFik takes stop codons as one "|"-separated string
uORFs <- findUORFs(fiveUTRs_sorted, fa,
                   startCodon = "ATG",
                   stopCodon = "TAA|TAG|TGA",
                   cds = your_cds_object)
Regarding your attempt with findMapORFs, this function also requires properly ordered exons in the input GRangesList. It includes an argument grl_is_sorted (default FALSE), which triggers internal sorting if set to FALSE. However, explicitly sorting with sortPerGroup beforehand ensures consistency. If you provide the sequences manually to findMapORFs, confirm that they match the concatenated spliced sequences.
To verify the issue for a specific transcript, extract its sequence using txSeqsFromFa(fiveUTRs_sorted[transcript_id], fa) and manually inspect the frame and stop codons.
Kevin
Please provide a self-contained example so people can reproduce what you are seeing. Also include the output of
sessionInfo()