Question

How to find index position of desired feature_by in TxDb granges?

0

Entering edit mode

vinod.acear ▴ 50

@vinodacear-8884

Last seen 3.6 years ago

India

I have granges (feat_ranges) i want to find index of row corresponding to a gene in this range. Example given below

library(GenomicFeatures)
library("biomaRt")
library(GenomeInfoDb)
txdb <- makeTxDbFromBiomart(biomart = "ensembl", dataset = "scerevisiae_gene_ensembl")
myfeatures <- c("tx_type", "promoter", "intron", "exon", "cds", "intergenic")
feat_ranges <- genFeatures(txdb, featuretype=myfeatures, reduce_ranges=FALSE, upstream=1000, downstream=0)

I want to find index row corresponding to "YAL066W" (feature_by metadata column) as value 4.

thanks

> feat_ranges
GRangesList object of length 12:
$protein_coding 
GRanges object with 6692 ranges and 3 metadata columns:
      seqnames         ranges strand   |      feature_by featuretype_id    featuretype
         <Rle>      <IRanges>  <Rle>   | <CharacterList>    <character>    <character>
             1 [  335,   649]      +   |         YAL069W        YAL069W protein_coding
             1 [  538,   792]      +   |       YAL068W-A      YAL068W-A protein_coding
             1 [ 2480,  2707]      +   |       YAL067W-A      YAL067W-A protein_coding
             1 [10091, 10399]      +   |         YAL066W        YAL066W protein_coding
             1 [12046, 12426]      +   |       YAL064W-B      YAL064W-B protein_coding
  ...      ...            ...    ... ...             ...            ...            ...
            17 [65770, 66174]      +   |           Q0182          Q0182 protein_coding
            17 [73758, 74513]      +   |           Q0250          Q0250 protein_coding
            17 [74495, 75984]      +   |           Q0255          Q0255 protein_coding
            17 [79213, 80022]      +   |           Q0275          Q0275 protein_coding
            17 [85554, 85709]      +   |           Q0297          Q0297 protein_coding

...
<11 more elements>
-------
seqinfo: 17 sequences (1 circular) from an unspecified genome

granges txdb genomicfeatures • 1.4k views

ADD COMMENT • link updated 8.1 years ago by Martin Morgan 25k • written 8.1 years ago by vinod.acear ▴ 50

score 0 · Answer 1 · 2016-03-05

Here's some simpler data

library(GenomicRanges)
gr <- GRanges("A", IRanges(1:5, 20),
    f=CharacterList(list("a", "b", c("c", "d"), "e", c("g", "h", "i"))))

It looks like this

> gr
GRanges object with 5 ranges and 1 metadata column:
      seqnames    ranges strand |               f
         <Rle> <IRanges>  <Rle> | <CharacterList>
  [1]        A   [1, 20]      * |               a
  [2]        A   [2, 20]      * |               b
  [3]        A   [3, 20]      * |             c,d
  [4]        A   [4, 20]      * |               e
  [5]        A   [5, 20]      * |           g,h,i
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths

Access the metadata column, a 'CharacterList' (list of character vectors), and test for identity

> gr$f
CharacterList of length 5
[[1]] a
[[2]] b
[[3]] c d
[[4]] e
[[5]] g h i
> gr$f == "d"
LogicalList of length 5
[[1]] FALSE
[[2]] FALSE
[[3]] FALSE TRUE
[[4]] FALSE
[[5]] FALSE FALSE FALSE

Summarize each list element, checking to see if any match

> any(gr$f == "d")
[1] FALSE FALSE  TRUE FALSE FALSE

and calculate the index

> which(any(gr$f == "d"))
[1] 3

A more complicated operation would be to use 'grepl', which requires that the CharacterList be unlisted and relisted

which(any(relist(grepl("c", unlist(gr$f)), gr$f)))

or indexing into the geometry of the unlisted vector

rep(seq_along(gr), lengths(gr))[grep("c", unlist(gr$f))]