Accessing GRangesList rows in Rcpp
1
0
Entering edit mode
@hauken_heyken-13992
Last seen 2.1 years ago
Bergen

So a question that could have been asked in rcpp forum, but maybe you have already made a wrapper for it, so I ask here.

I want to access the ith row of a GRangesList:

#include <Rcpp.h>

using namespace Rcpp;

Environment envGRanges("package:GenomicRanges");

Function GRangesList = envGRanges["GRangesList"];


S4  getGRL_byIndex(S4  fiveUTR, int i ){
  S4 grangesObj("GRangesList");  
  grangesObj = as<GRangesList>fiveUTRs[i];

 return grangesObj;
}

 

Error is: no match for operator = (types: s4 and unresolved overload function type)

Is this possible to do ?

 

Rcpp GRangesList • 1.7k views
ADD COMMENT
1
Entering edit mode

FWIW, It will not be faster to manipulate a GRanges object in C than in R. If there were some operations I wanted to perform on a GRanges object that were not supported in R (it is challenging to know what these operations are -- I wonder what you are trying to accomplish?) then my strategy would be to extract the relevant information from the GRanges object in R, and pass as simple vectors to / from C.

ADD REPLY
0
Entering edit mode

This is not my goal, it's only a part of the program, my whole program takes a GRangeslist and fastaSequences, and returns a GRangesList with all orfs of the original GRangeslist.

I need this to access each GRangesList by transcript

ADD REPLY
0
Entering edit mode

I'm thinking about the simple vector possibility, but I still need a grouped vector by transcripts, makes it harder..

i.g

if grangeslist have 2 transcript with two exons each, I would need to make a class or something for this, would be easier to use GRangesList API if it exists..

ADD REPLY
2
Entering edit mode
@martin-morgan-1513
Last seen 4 months ago
United States

Very roughly, I would write my C++ code to rely on base R vectors as inputs, implement the body using standard C++ data structures, and return the output as a list of base R vectors

#include <Rcpp.h>

using namespace Rcpp;

// [[Rcpp::export]]
List orfs(
    CharacterVector seqname,
    IntegerVector start,
    IntegerVector end,
    IntegerVector input_index,
    CharacterVector fasta)
{
    std::vector<int> result_index;         // index of input that output belongs to
    std::vector<std::string> result_value; // output values
    // create result in C++
    // ...
    // then return to R, e.g., allowing many orf per input
    return List::create(
        Named("index") = wrap(result_index),
        Named("orf") = wrap(result_value));
}

and would write an R wrapper that works on the class at the R level

rorfs <- function(grl) {
    gr <- unlist(grl, use.names=FALSE)
    result <- orfs(
        as.character(seqnames(gr)),
        start(gr),
        end(gr),
        rep(seq_along(grl), lengths(grl)),
        "fasta"
    )
    ## manipulate result to final form
    orf <- NA_character_
    orf[result$index] <- result$orf
    gr$orf <- orf
    relist(gr, grl)
}

 

ADD COMMENT
0
Entering edit mode

This will work, and make it a lot easier, and more like how C++ should be, since I'm polluting my code with a lot of R API right now, which makes it slower I think.

Thank you, I will accept your answer.

Another point you make that is important, is that if you vectorize R code efficiently, then it is not slower than Rcpp. Therefor my question is not really useful, because you can group granges by relist(), which is already Cpp code I guess ?

ADD REPLY
0
Entering edit mode

In general terms, R is efficient when it drops to C (most of R and many packages are written in C, not C++) for the implementation. The 'art' is identifying when an operation requires new C / C++ code, versus using convenient, flexible, and TESTED R code.

relist() in the GenomicRanges world is 'fast' because of clever object design -- generally, one gets an object that has 'list-like semantics', but is actually implemented as a single underlying object (GRanges) and a 'partitioning' vector that indicates how the elements of the underlying object are to be grouped -- 'relist()'ing just adds a vector to an object, and 'unlist()'  just removes the vector. There is certainly C code involved in these operations, but unlist() and relist() are not implemented directly in C.

ADD REPLY

Login before adding your answer.

Traffic: 812 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6