Question: Accessing GRangesList rows in Rcpp
gravatar for hauken_heyken
4 months ago by
hauken_heyken40 wrote:

So a question that could have been asked in rcpp forum, but maybe you have already made a wrapper for it, so I ask here.

I want to access the ith row of a GRangesList:

#include <Rcpp.h>

using namespace Rcpp;

Environment envGRanges("package:GenomicRanges");

Function GRangesList = envGRanges["GRangesList"];

S4  getGRL_byIndex(S4  fiveUTR, int i ){
  S4 grangesObj("GRangesList");  
  grangesObj = as<GRangesList>fiveUTRs[i];

 return grangesObj;


Error is: no match for operator = (types: s4 and unresolved overload function type)

Is this possible to do ?


ADD COMMENTlink modified 4 months ago by Martin Morgan ♦♦ 21k • written 4 months ago by hauken_heyken40

FWIW, It will not be faster to manipulate a GRanges object in C than in R. If there were some operations I wanted to perform on a GRanges object that were not supported in R (it is challenging to know what these operations are -- I wonder what you are trying to accomplish?) then my strategy would be to extract the relevant information from the GRanges object in R, and pass as simple vectors to / from C.

ADD REPLYlink written 4 months ago by Martin Morgan ♦♦ 21k

This is not my goal, it's only a part of the program, my whole program takes a GRangeslist and fastaSequences, and returns a GRangesList with all orfs of the original GRangeslist.

I need this to access each GRangesList by transcript

ADD REPLYlink written 4 months ago by hauken_heyken40

I'm thinking about the simple vector possibility, but I still need a grouped vector by transcripts, makes it harder..


if grangeslist have 2 transcript with two exons each, I would need to make a class or something for this, would be easier to use GRangesList API if it exists..

ADD REPLYlink written 4 months ago by hauken_heyken40
gravatar for Martin Morgan
4 months ago by
Martin Morgan ♦♦ 21k
United States
Martin Morgan ♦♦ 21k wrote:

Very roughly, I would write my C++ code to rely on base R vectors as inputs, implement the body using standard C++ data structures, and return the output as a list of base R vectors

#include <Rcpp.h>

using namespace Rcpp;

// [[Rcpp::export]]
List orfs(
    CharacterVector seqname,
    IntegerVector start,
    IntegerVector end,
    IntegerVector input_index,
    CharacterVector fasta)
    std::vector<int> result_index;         // index of input that output belongs to
    std::vector<std::string> result_value; // output values
    // create result in C++
    // ...
    // then return to R, e.g., allowing many orf per input
    return List::create(
        Named("index") = wrap(result_index),
        Named("orf") = wrap(result_value));

and would write an R wrapper that works on the class at the R level

rorfs <- function(grl) {
    gr <- unlist(grl, use.names=FALSE)
    result <- orfs(
        rep(seq_along(grl), lengths(grl)),
    ## manipulate result to final form
    orf <- NA_character_
    orf[result$index] <- result$orf
    gr$orf <- orf
    relist(gr, grl)


ADD COMMENTlink written 4 months ago by Martin Morgan ♦♦ 21k

This will work, and make it a lot easier, and more like how C++ should be, since I'm polluting my code with a lot of R API right now, which makes it slower I think.

Thank you, I will accept your answer.

Another point you make that is important, is that if you vectorize R code efficiently, then it is not slower than Rcpp. Therefor my question is not really useful, because you can group granges by relist(), which is already Cpp code I guess ?

ADD REPLYlink written 4 months ago by hauken_heyken40

In general terms, R is efficient when it drops to C (most of R and many packages are written in C, not C++) for the implementation. The 'art' is identifying when an operation requires new C / C++ code, versus using convenient, flexible, and TESTED R code.

relist() in the GenomicRanges world is 'fast' because of clever object design -- generally, one gets an object that has 'list-like semantics', but is actually implemented as a single underlying object (GRanges) and a 'partitioning' vector that indicates how the elements of the underlying object are to be grouped -- 'relist()'ing just adds a vector to an object, and 'unlist()'  just removes the vector. There is certainly C code involved in these operations, but unlist() and relist() are not implemented directly in C.

ADD REPLYlink written 4 months ago by Martin Morgan ♦♦ 21k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 140 users visited in the last hour