ORFik: spliced 5' UTR regions
@0012ceb9
Italy

Hi, I'm currently using ORFik's findUORFs function to search for ORFs in the 5' UTRs of human transcripts. However, I noticed that for some ORFs located in 5' UTRs with spliced exons, it returns ORFs whose stop codon is at the end of one exon, rather than the actual stop codon in the following exon. I also tried reordering and joining the spliced exons and using findMapORFs, but it doesn't change anything. Can anyone help me? Thanks in advance,

Luna


Please provide a self-contained example so that others can reproduce what you are seeing. Also include the output of sessionInfo().

Kevin Blighe
@kevin
The Cave, 181 Longwood Avenue, Boston, …

Hello Luna.

The findUORFs function in ORFik identifies upstream open reading frames by extracting the sequences of the exons in each element of the fiveUTRs GRangesList from the genome FASTA file you provide (fa) and concatenating them. This produces one continuous, spliced sequence per 5' UTR, so open reading frames can span splice junctions. Stop codons located in a downstream exon, or even split across an exon boundary, are therefore detected, because the search runs on the spliced sequence rather than on each exon separately.
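To make the splicing point concrete, here is a minimal base-R sketch of why the search has to run on the concatenated sequence. It is not ORFik's implementation; the exon sequences and the helper find_first_orf() are hypothetical, made up for illustration. The ATG sits in exon 1, the codon CCC straddles the junction, and the in-frame TAA lies in exon 2, so scanning exon 1 on its own finds no stop at all.

```r
# Minimal sketch, base R only: hypothetical helper, not part of ORFik.
# Returns the 1-based start/end of the first ATG-initiated ORF that has an
# in-frame stop codon, or NULL if there is none.
find_first_orf <- function(dna, start_codon = "ATG",
                           stop_codons = c("TAA", "TAG", "TGA")) {
  start <- as.integer(regexpr(start_codon, dna, fixed = TRUE))
  if (start < 0 || start + 2 > nchar(dna)) return(NULL)
  codon_starts <- seq(start, nchar(dna) - 2, by = 3)  # codons in the ATG's frame
  codons <- substring(dna, codon_starts, codon_starts + 2)
  hit <- which(codons %in% stop_codons)
  if (length(hit) == 0) return(NULL)
  c(start = start, end = codon_starts[hit[1]] + 2)
}

# Hypothetical 5' UTR on the + strand made of two exons
exon1 <- "CCATGAAACC"
exon2 <- "CTAAGG"
spliced <- paste0(exon1, exon2)  # "CCATGAAACCCTAAGG"

find_first_orf(exon1)    # NULL: no in-frame stop before the end of exon 1
find_first_orf(spliced)  # start = 3, end = 14: the TAA lies in exon 2
```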

If findUORFs returns open reading frames that stop at the end of one exon instead of extending to a stop codon in the next exon, the exons in your fiveUTRs GRangesList may not be in transcript order. For transcripts on the positive strand, exons must be sorted by increasing start position; for transcripts on the negative strand, they must be sorted by decreasing end position. If the order is wrong, the exon sequences are concatenated in the wrong order, which shifts the reading frame downstream of the junction and can create spurious stop codons or hide the true one.
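The ordering rule can be sketched in base R as well. Again this is not ORFik code; genome, revcomp(), spliced_seq() and naive_concat() are hypothetical names for illustration. On the + strand the exons are pasted by increasing start; on the - strand they are taken by decreasing end and each piece is reverse-complemented, which gives the same result as reverse-complementing the +-ordered concatenation.

```r
# Minimal sketch, base R only; all names here are hypothetical.
genome <- "AAACCCGGGTTTACGTACGT"  # toy chromosome, 1-based coordinates

revcomp <- function(x) {
  paste(rev(strsplit(chartr("ACGT", "TGCA", x), "")[[1]]), collapse = "")
}

# Concatenate exons in transcript (5' to 3') order
spliced_seq <- function(genome, starts, ends, strand) {
  ord <- if (strand == "+") order(starts) else order(ends, decreasing = TRUE)
  pieces <- substring(genome, starts[ord], ends[ord])
  if (strand == "-") {
    pieces <- vapply(pieces, revcomp, character(1), USE.NAMES = FALSE)
  }
  paste(pieces, collapse = "")
}

# What you get if exons are simply pasted in the order they were supplied
naive_concat <- function(genome, starts, ends) {
  paste(substring(genome, starts, ends), collapse = "")
}

# + strand transcript whose exons were supplied out of order
naive_concat(genome, c(10, 1), c(12, 3))      # "TTTAAA" (wrong order)
spliced_seq(genome, c(10, 1), c(12, 3), "+")  # "AAATTT"

# - strand transcript: the exon at 13-15 is the 5'-most one
spliced_seq(genome, c(4, 13), c(6, 15), "-")  # "CGTGGG"
```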

To address this, apply the sortPerGroup function from ORFik to your fiveUTRs object before calling findUORFs. Here is an example:

library(ORFik)

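# Assume fiveUTRs is your GRangesList of 5' UTRs (one group per transcript)
# and fa is the genome FASTA (for example an Rsamtools FaFile)
fiveUTRs_sorted <- sortPerGroup(fiveUTRs)

# Now use the sorted object; ORFik expects several codons as one "|"-separated string
# your_cds_object is a placeholder for your CDS GRangesList (optional)
uORFs <- findUORFs(fiveUTRs_sorted, fa, startCodon = "ATG", stopCodon = "TAA|TAG|TGA", cds = your_cds_object)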

Regarding your attempt with findMapORFs: this function also requires the exons in the input GRangesList to be in transcript order. It has an argument grl_is_sorted (default FALSE); when it is FALSE, the function sorts the exons internally. Sorting explicitly with sortPerGroup beforehand is still a simple way to keep everything consistent. Because findMapORFs takes the sequences as a separate argument, confirm that they are the spliced sequences of the same exons in the same order, for example as returned by txSeqsFromFa.
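As its name suggests, findMapORFs also maps the ORFs it finds back onto genomic coordinates, and that projection is another place where exon order matters. Below is a minimal + strand-only sketch of the idea with a hypothetical helper, map_to_genome(), which is not ORFik's code. The ORF at transcript positions 3 to 14 and the exon coordinates are made-up numbers.

```r
# Minimal sketch, + strand only; map_to_genome() is a hypothetical helper.
# Projects transcript positions tx_start..tx_end onto exons listed in
# transcript order and returns one genomic piece per exon touched.
map_to_genome <- function(tx_start, tx_end, exon_starts, exon_ends) {
  widths <- exon_ends - exon_starts + 1
  offsets <- cumsum(c(0, head(widths, -1)))  # transcript bases before each exon
  out <- NULL
  for (i in seq_along(widths)) {
    lo <- max(tx_start, offsets[i] + 1)
    hi <- min(tx_end, offsets[i] + widths[i])
    if (lo <= hi) {
      out <- rbind(out, data.frame(start = exon_starts[i] + lo - offsets[i] - 1,
                                   end   = exon_starts[i] + hi - offsets[i] - 1))
    }
  }
  out
}

# Exons in the correct order: pieces 103-110 and 201-204
map_to_genome(3, 14, c(101, 201), c(110, 206))

# Same exons supplied out of order: pieces 203-206 and 101-108, which is wrong
map_to_genome(3, 14, c(201, 101), c(206, 110))
```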

To verify the issue for a specific transcript, extract its sequence using txSeqsFromFa(fiveUTRs_sorted[transcript_id], fa) and manually inspect the frame and stop codons.
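For that manual inspection, a small base-R helper that lists stop codon positions in each of the three frames may help. stops_by_frame() is hypothetical and not part of ORFik. It works on a plain character string; if txSeqsFromFa gives you a DNAStringSet, as.character() should convert it. An ORF whose start codon is at position p is in frame ((p - 1) %% 3) + 1.

```r
# Minimal sketch, base R only; stops_by_frame() is a hypothetical helper.
# Returns the 1-based start positions of stop codons in frames 1, 2 and 3.
stops_by_frame <- function(dna, stop_codons = c("TAA", "TAG", "TGA")) {
  lapply(1:3, function(frame) {
    if (nchar(dna) < frame + 2) return(integer(0))
    starts <- seq(frame, nchar(dna) - 2, by = 3)
    starts[substring(dna, starts, starts + 2) %in% stop_codons]
  })
}

# Hypothetical spliced 5' UTR with an ATG at position 3
utr <- "CCATGAAACCCTAAGG"
atg_frame <- ((3 - 1) %% 3) + 1  # frame 3
stops_by_frame(utr)
# frame 1: 4 (TGA), frame 2: none, frame 3: 12 (TAA), so the ORF ends at 14
```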

Kevin
