Hi,
I am having a bit of trouble to get the introns and exons from a transcript. Have a look at this snippet.
library(GenomicFeatures) file <- "test.gff3" print(readLines(file)) txdb <- makeTxDbFromGFF(file) print(transcripts(txdb)) print(intronsByTranscript(txdb)) print(fiveUTRsByTranscript(txdb)) print(threeUTRsByTranscript(txdb))
intronsByTranscript returns to many results, because it includes the 5'/3' UTR regions, whereas fiveUTRsByTranscript and threeUTRsByTranscript do not contain any data. What would be the problem using this style of getting the data?
On the bigger picture: I want to shift coordinates from mapped reads so that the intron sequences are removed and global coordinates are converted to local transcript reads. I achieved this in part using the code:
transcript <- transcripts(txdb, filter = list(tx_name = "YAL030W_mRNA")) names(transcript) <- GenomeInfoDb::seqnames(transcript) data <- GAlignments() # this is read in from BamFile reads <- as(data,"GRanges") hits <- mapToTranscripts(reads,transcript) reads <- reads[hits$xHits,] IRanges::ranges(reads) <- IRanges::ranges(hits)
The missing bit is to shift the reads, so that the intron gap is closed. Is there any build function for this I missed?
Thanks for any help and advice,
Best regards,
Felix
test.gff3 contains the following text:
##gff-version 3
##source-version rtracklayer 1.38.0
##date 2018-02-02
chrI SGD chromosome 1 230218 . - . ID=chrI;Name=chrI;
chrI SGD gene 87263 87857 . + . ID=YAL030W;Name=YAL030W;gene=SNC1
chrI SGD mRNA 87263 87857 . + . ID=YAL030W_mRNA;Name=YAL030W_mRNA;Parent=YAL030W;
chrI SGD five_prime_UTR 87263 87285 . + . ID=YAL030W_5UTR;Name=YAL030W_5UTR;Parent=YAL030W_mRNA;
chrI SGD CDS 87286 87387 . + 0 Name=YAL030W_CDS;Parent=YAL030W_mRNA;
chrI SGD intron 87388 87500 . + . Name=YAL030W_intron;Parent=YAL030W_mRNA;
chrI SGD CDS 87501 87752 . + 0 Name=YAL030W_CDS;Parent=YAL030W_mRNA;
chrI SGD three_prime_UTR 87753 87857 . + . ID=YAL030W_3UTR;Name=YAL030W_3UTR;Parent=YAL030W_mRNA;