How to find index position of desired feature_by in TxDb granges?
1
0
Entering edit mode
vinod.acear ▴ 50
@vinodacear-8884
Last seen 3.6 years ago
India

I have granges (feat_ranges) i want to find index of row corresponding to a gene in this range. Example given below 

library(GenomicFeatures)
library("biomaRt")
library(GenomeInfoDb)
txdb <- makeTxDbFromBiomart(biomart = "ensembl", dataset = "scerevisiae_gene_ensembl")
myfeatures <- c("tx_type", "promoter", "intron", "exon", "cds", "intergenic")
feat_ranges <- genFeatures(txdb, featuretype=myfeatures, reduce_ranges=FALSE, upstream=1000, downstream=0)

I want to find index row corresponding to "YAL066W" (feature_by metadata column) as value 4.

thanks

> feat_ranges
GRangesList object of length 12:
$protein_coding 
GRanges object with 6692 ranges and 3 metadata columns:
      seqnames         ranges strand   |      feature_by featuretype_id    featuretype
         <Rle>      <IRanges>  <Rle>   | <CharacterList>    <character>    <character>
             1 [  335,   649]      +   |         YAL069W        YAL069W protein_coding
             1 [  538,   792]      +   |       YAL068W-A      YAL068W-A protein_coding
             1 [ 2480,  2707]      +   |       YAL067W-A      YAL067W-A protein_coding
             1 [10091, 10399]      +   |         YAL066W        YAL066W protein_coding
             1 [12046, 12426]      +   |       YAL064W-B      YAL064W-B protein_coding
  ...      ...            ...    ... ...             ...            ...            ...
            17 [65770, 66174]      +   |           Q0182          Q0182 protein_coding
            17 [73758, 74513]      +   |           Q0250          Q0250 protein_coding
            17 [74495, 75984]      +   |           Q0255          Q0255 protein_coding
            17 [79213, 80022]      +   |           Q0275          Q0275 protein_coding
            17 [85554, 85709]      +   |           Q0297          Q0297 protein_coding

...
<11 more elements>
-------
seqinfo: 17 sequences (1 circular) from an unspecified genome

 

granges txdb genomicfeatures • 1.4k views
ADD COMMENT
0
Entering edit mode
@martin-morgan-1513
Last seen 4 hours ago
United States

Here's some simpler data

library(GenomicRanges)
gr <- GRanges("A", IRanges(1:5, 20),
    f=CharacterList(list("a", "b", c("c", "d"), "e", c("g", "h", "i"))))

It looks like this

> gr
GRanges object with 5 ranges and 1 metadata column:
      seqnames    ranges strand |               f
         <Rle> <IRanges>  <Rle> | <CharacterList>
  [1]        A   [1, 20]      * |               a
  [2]        A   [2, 20]      * |               b
  [3]        A   [3, 20]      * |             c,d
  [4]        A   [4, 20]      * |               e
  [5]        A   [5, 20]      * |           g,h,i
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths

Access the metadata column, a 'CharacterList' (list of character vectors), and test for identity

> gr$f
CharacterList of length 5
[[1]] a
[[2]] b
[[3]] c d
[[4]] e
[[5]] g h i
> gr$f == "d"
LogicalList of length 5
[[1]] FALSE
[[2]] FALSE
[[3]] FALSE TRUE
[[4]] FALSE
[[5]] FALSE FALSE FALSE

Summarize each list element, checking to see if any match

> any(gr$f == "d")
[1] FALSE FALSE  TRUE FALSE FALSE

and calculate the index

> which(any(gr$f == "d"))
[1] 3

A more complicated operation would be to use 'grepl', which requires that the CharacterList be unlisted and relisted

which(any(relist(grepl("c", unlist(gr$f)), gr$f)))

or indexing into the geometry of the unlisted vector

rep(seq_along(gr), lengths(gr))[grep("c", unlist(gr$f))]

 

 

ADD COMMENT

Login before adding your answer.

Traffic: 718 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6