Search
Question: Find Intron Length from Annotation File?
0
gravatar for smurfblack
20 months ago by
smurfblack0
smurfblack0 wrote:

I'm working with model species Arabidopsis thaliana and want to figure out the length of introns from the annotation file (or something similar?). Since only the exons are annotated I wanted to ask what a good approach would be on getting sequences (length in particular) of introns? I try to filter the largest introns of corresponding genes.

Thanks.

ADD COMMENTlink modified 20 months ago by Martin Morgan ♦♦ 20k • written 20 months ago by smurfblack0
1
gravatar for Martin Morgan
20 months ago by
Martin Morgan ♦♦ 20k
United States
Martin Morgan ♦♦ 20k wrote:

Use the TxDb.Athaliana.BioMart.plantsmart28 package

library(TxDb.Athaliana.BioMart.plantsmart28)
txdb = TxDb.Athaliana.BioMart.plantsmart28

Get the introns grouped by transcript

inByTr = intronsByTranscript(TxDb.Athaliana.BioMart.plantsmart28, use.names=TRUE)

and the relationship between transcript and gene

geneid = mapIds(txdb, names(inByTr), "GENEID", "TXNAME")

Its not clear from your question what you want to do to 'filter' the introns; here are the widths

> wd = width(inByTr)
> wd
IntegerList of length 41671
[["AT1G01010.1"]] 82 209 100 78 112
[["AT1G01040.1"]] 90 96 78 88 81 83 88 90 85 86 90 84 89 276 84 79 81 98 85
[["AT1G01040.2"]] 90 96 78 88 81 83 88 90 85 86 90 81 89 276 84 79 81 98 85
[["AT1G01046.1"]] integer(0)
[["AT1G01073.1"]] integer(0)
[["AT1G01110.2"]] 87 207 300 78
[["AT1G01110.1"]] 300 78
[["AT1G01115.1"]] integer(0)
[["AT1G01160.1"]] 417 123 92 81
[["AT1G01160.2"]] 245 70 123 92 81
...
<41661 more elements>

And here are the 'introns' grouped by gene

> splitAsList(unlist(unname(inByTr)), rep(geneid, lengths(inByTr)))
GRangesList object of length 22523:
$AT1G01010 
GRanges object with 5 ranges and 0 metadata columns:
      seqnames       ranges strand
         <Rle>    <IRanges>  <Rle>
  [1]        1 [3914, 3995]      +
  [2]        1 [4277, 4485]      +
  [3]        1 [4606, 4705]      +
  [4]        1 [5096, 5173]      +
  [5]        1 [5327, 5438]      +

...
<22522 more elements>
-------
seqinfo: 7 sequences (1 circular) from an unspecified genome

 

ADD COMMENTlink written 20 months ago by Martin Morgan ♦♦ 20k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 355 users visited in the last hour