Find Intron Length from Annotation File?
1
0
Entering edit mode
smurfblack • 0
@smurfblack-9905
Last seen 8.1 years ago

I'm working with model species Arabidopsis thaliana and want to figure out the length of introns from the annotation file (or something similar?). Since only the exons are annotated I wanted to ask what a good approach would be on getting sequences (length in particular) of introns? I try to filter the largest introns of corresponding genes.

Thanks.

annotation arabidopsis thaliana Tutorial • 1.6k views
ADD COMMENT
1
Entering edit mode
@martin-morgan-1513
Last seen 4 days ago
United States

Use the TxDb.Athaliana.BioMart.plantsmart28 package

library(TxDb.Athaliana.BioMart.plantsmart28)
txdb = TxDb.Athaliana.BioMart.plantsmart28

Get the introns grouped by transcript

inByTr = intronsByTranscript(TxDb.Athaliana.BioMart.plantsmart28, use.names=TRUE)

and the relationship between transcript and gene

geneid = mapIds(txdb, names(inByTr), "GENEID", "TXNAME")

Its not clear from your question what you want to do to 'filter' the introns; here are the widths

> wd = width(inByTr)
> wd
IntegerList of length 41671
[["AT1G01010.1"]] 82 209 100 78 112
[["AT1G01040.1"]] 90 96 78 88 81 83 88 90 85 86 90 84 89 276 84 79 81 98 85
[["AT1G01040.2"]] 90 96 78 88 81 83 88 90 85 86 90 81 89 276 84 79 81 98 85
[["AT1G01046.1"]] integer(0)
[["AT1G01073.1"]] integer(0)
[["AT1G01110.2"]] 87 207 300 78
[["AT1G01110.1"]] 300 78
[["AT1G01115.1"]] integer(0)
[["AT1G01160.1"]] 417 123 92 81
[["AT1G01160.2"]] 245 70 123 92 81
...
<41661 more elements>

And here are the 'introns' grouped by gene

> splitAsList(unlist(unname(inByTr)), rep(geneid, lengths(inByTr)))
GRangesList object of length 22523:
$AT1G01010 
GRanges object with 5 ranges and 0 metadata columns:
      seqnames       ranges strand
         <Rle>    <IRanges>  <Rle>
  [1]        1 [3914, 3995]      +
  [2]        1 [4277, 4485]      +
  [3]        1 [4606, 4705]      +
  [4]        1 [5096, 5173]      +
  [5]        1 [5327, 5438]      +

...
<22522 more elements>
-------
seqinfo: 7 sequences (1 circular) from an unspecified genome

 

ADD COMMENT

Login before adding your answer.

Traffic: 649 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6