Search
Question: Accessing GRangesList rows in Rcpp
0
gravatar for hauken_heyken
19 days ago by
hauken_heyken40 wrote:

So a question that could have been asked in rcpp forum, but maybe you have already made a wrapper for it, so I ask here.

I want to access the ith row of a GRangesList:

#include <Rcpp.h>

using namespace Rcpp;

Environment envGRanges("package:GenomicRanges");

Function GRangesList = envGRanges["GRangesList"];


S4  getGRL_byIndex(S4  fiveUTR, int i ){
  S4 grangesObj("GRangesList");  
  grangesObj = as<GRangesList>fiveUTRs[i];

 return grangesObj;
}

 

Error is: no match for operator = (types: s4 and unresolved overload function type)

Is this possible to do ?

 

ADD COMMENTlink modified 17 days ago by Martin Morgan ♦♦ 20k • written 19 days ago by hauken_heyken40
1

FWIW, It will not be faster to manipulate a GRanges object in C than in R. If there were some operations I wanted to perform on a GRanges object that were not supported in R (it is challenging to know what these operations are -- I wonder what you are trying to accomplish?) then my strategy would be to extract the relevant information from the GRanges object in R, and pass as simple vectors to / from C.

ADD REPLYlink written 19 days ago by Martin Morgan ♦♦ 20k

This is not my goal, it's only a part of the program, my whole program takes a GRangeslist and fastaSequences, and returns a GRangesList with all orfs of the original GRangeslist.

I need this to access each GRangesList by transcript

ADD REPLYlink written 17 days ago by hauken_heyken40

I'm thinking about the simple vector possibility, but I still need a grouped vector by transcripts, makes it harder..

i.g

if grangeslist have 2 transcript with two exons each, I would need to make a class or something for this, would be easier to use GRangesList API if it exists..

ADD REPLYlink written 17 days ago by hauken_heyken40
2
gravatar for Martin Morgan
17 days ago by
Martin Morgan ♦♦ 20k
United States
Martin Morgan ♦♦ 20k wrote:

Very roughly, I would write my C++ code to rely on base R vectors as inputs, implement the body using standard C++ data structures, and return the output as a list of base R vectors

#include <Rcpp.h>

using namespace Rcpp;

// [[Rcpp::export]]
List orfs(
    CharacterVector seqname,
    IntegerVector start,
    IntegerVector end,
    IntegerVector input_index,
    CharacterVector fasta)
{
    std::vector<int> result_index;         // index of input that output belongs to
    std::vector<std::string> result_value; // output values
    // create result in C++
    // ...
    // then return to R, e.g., allowing many orf per input
    return List::create(
        Named("index") = wrap(result_index),
        Named("orf") = wrap(result_value));
}

and would write an R wrapper that works on the class at the R level

rorfs <- function(grl) {
    gr <- unlist(grl, use.names=FALSE)
    result <- orfs(
        as.character(seqnames(gr)),
        start(gr),
        end(gr),
        rep(seq_along(grl), lengths(grl)),
        "fasta"
    )
    ## manipulate result to final form
    orf <- NA_character_
    orf[result$index] <- result$orf
    gr$orf <- orf
    relist(gr, grl)
}

 

ADD COMMENTlink written 17 days ago by Martin Morgan ♦♦ 20k

This will work, and make it a lot easier, and more like how C++ should be, since I'm polluting my code with a lot of R API right now, which makes it slower I think.

Thank you, I will accept your answer.

Another point you make that is important, is that if you vectorize R code efficiently, then it is not slower than Rcpp. Therefor my question is not really useful, because you can group granges by relist(), which is already Cpp code I guess ?

ADD REPLYlink written 16 days ago by hauken_heyken40

In general terms, R is efficient when it drops to C (most of R and many packages are written in C, not C++) for the implementation. The 'art' is identifying when an operation requires new C / C++ code, versus using convenient, flexible, and TESTED R code.

relist() in the GenomicRanges world is 'fast' because of clever object design -- generally, one gets an object that has 'list-like semantics', but is actually implemented as a single underlying object (GRanges) and a 'partitioning' vector that indicates how the elements of the underlying object are to be grouped -- 'relist()'ing just adds a vector to an object, and 'unlist()'  just removes the vector. There is certainly C code involved in these operations, but unlist() and relist() are not implemented directly in C.

ADD REPLYlink written 16 days ago by Martin Morgan ♦♦ 20k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 129 users visited in the last hour