Question: Splitting lines of a GRanges object based on character list
stephen.williams wrote (13 months ago):

I have a GRanges object that was generated using some of the really nice info from this page (Mapping genome regions to gene symbols). I'm finding overlaps between my query GRanges and my subject GRanges (Homo.sapiens) and assigning gene symbols to each locus. However, when two genes overlap the same locus, I get something like this:

 

     seqnames                 ranges strand |     numBC    SYMBOL
         <Rle>              <IRanges>  <Rle> | <integer>    <CharacterList>
  [1]    chr12 [122692988, 122693157]      * |       174    DIABLO,VPS33A
  [2]    chr12 [122693161, 122693336]      * |       167    DIABLO,VPS33A
  [3]    chr12 [122694166, 122694413]      * |       133    DIABLO,VPS33A

 

I assign the symbols with this code:

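library(GenomicRanges)  # makeGRangesFromDataFrame(), findOverlaps(), splitAsList()
# bc_test: data.frame of regions; hs: GRanges of genes with a SYMBOL column
grange_test <- makeGRangesFromDataFrame(bc_test, keep.extra.columns=TRUE)
symInCnv_test <- splitColumnByOverlap(hs, grange_test, "SYMBOL")
grange_test$SYMBOL <- symInCnv_test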

 

However, the function 

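splitColumnByOverlap <-
    function(query, subject, column="ENTREZID", ...)
{
    ## all query/subject overlap pairs
    olaps <- findOverlaps(query, subject, ...)
    ## one level per subject range, so ranges without hits get an empty element
    f1 <- factor(subjectHits(olaps),
                 levels=seq_len(subjectLength(olaps)))
    ## group the overlapping query values by subject range
    splitAsList(mcols(query)[[column]][queryHits(olaps)], f1)
}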

creates a CharacterList for the gene symbols. For a variety of reasons, I actually need each gene on its own row, as shown below:

     seqnames                 ranges strand |     numBC    SYMBOL
         <Rle>              <IRanges>  <Rle> | <integer>    <character>
  [1]    chr12 [122692988, 122693157]      * |       174    DIABLO
  [2]    chr12 [122692988, 122693157]      * |       174    VPS33A
  [3]    chr12 [122693161, 122693336]      * |       167    DIABLO
  [4]    chr12 [122693161, 122693336]      * |       167    VPS33A
  [5]    chr12 [122694166, 122694413]      * |       133    DIABLO
  [6]    chr12 [122694166, 122694413]      * |       133    VPS33A
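The paired symbols come from the grouping step in splitColumnByOverlap(). As a rough base-R illustration (not the Bioconductor code itself), the sketch below swaps splitAsList() for base split() and uses made-up overlap indices that stand in for queryHits() and subjectHits(). Any locus hit by two genes ends up with a two-element entry, and because the factor levels cover every subject index, a locus with no hits still gets an empty entry.

```r
# Hypothetical overlap indices standing in for queryHits()/subjectHits():
# 2 genes (query) overlapping 3 loci (subject).
gene_symbol  <- c("DIABLO", "VPS33A")   # query values
query_hits   <- c(1L, 2L, 1L, 2L)       # which gene overlapped
subject_hits <- c(1L, 1L, 2L, 2L)       # which locus it overlapped
n_subject    <- 3L                      # locus 3 has no overlaps

# One level per subject range, so locus 3 yields an empty element.
f1 <- factor(subject_hits, levels = seq_len(n_subject))

# Base-R analogue of splitAsList(): group gene symbols by locus.
by_locus <- split(gene_symbol[query_hits], f1)
print(by_locus)
```

Loci 1 and 2 each get c("DIABLO", "VPS33A"), which is the comma-joined value shown in the first printout.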

Can anyone think of a way to do this (with GenomicRanges, a fixed splitColumnByOverlap(), the tidyverse, or otherwise)?

I've tried converting my final GRanges to a data.frame and splitting it in a variety of ways, but nothing gets me where I need to be. Any help would be greatly appreciated.

Thanks.

Answer:
Michael Lawrence (United States) wrote (13 months ago):
expand(grange_test, "SYMBOL")
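To make the reshaping concrete, here is a rough base-R equivalent on a plain data.frame (an illustration, not the S4Vectors implementation): each row is repeated once per symbol with rep() and lengths(), and the symbol list is flattened with unlist(). The rows are copied from the question's printout; holding SYMBOL as an ordinary R list column is an assumption standing in for the CharacterList.

```r
# Rows copied from the question; SYMBOL is a list column standing in
# for the CharacterList (an assumption for this base-R sketch).
df <- data.frame(
  seqnames = "chr12",
  start    = c(122692988L, 122693161L, 122694166L),
  end      = c(122693157L, 122693336L, 122694413L),
  numBC    = c(174L, 167L, 133L)
)
df$SYMBOL <- list(c("DIABLO", "VPS33A"),
                  c("DIABLO", "VPS33A"),
                  c("DIABLO", "VPS33A"))

# Repeat each row once per gene symbol ...
n_sym <- lengths(df$SYMBOL)
long  <- df[rep(seq_len(nrow(df)), n_sym), c("seqnames", "start", "end", "numBC")]
# ... and flatten the list so each row carries a single symbol.
long$SYMBOL <- unlist(df$SYMBOL, use.names = FALSE)
rownames(long) <- NULL
print(long)
```

The result has the six-row, one-symbol-per-row layout requested in the question.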

Thanks for the reply, but this does not work:

grange_test <- as.data.frame(grange_test) 
expand(grange_test, "SYMBOL")

Gives

# A tibble: 1 x 1
  `"SYMBOL"`
  <chr>     
1 SYMBOL    
(stephen.williams, 13 months ago)

Why are you coercing to a data frame first?

(Michael Lawrence, 13 months ago)

expand does not seem to work with GRanges:

expand(grange_test, "SYMBOL")
Error in UseMethod("expand_") : 
  no applicable method for 'expand_' applied to an object of class "c('GRanges', 'GenomicRanges', 'GRanges_OR_NULL', 'GRangesOrIRanges', 'Vector', 'GenomicRanges_OR_missing', 'GenomicRanges_OR_GRangesList', 'GenomicRanges_OR_GenomicRangesList', 'Annotated')"
(stephen.williams, 13 months ago)

I've gotten fairly close using:

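library(dplyr)  # mutate(), %>%
library(tidyr)  # unnest()
# split the comma-joined symbols and give each one its own row
grange_test <-
  as.data.frame(grange_test) %>%
  mutate(SYMBOL = strsplit(as.character(SYMBOL), ",")) %>%
  unnest(SYMBOL)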

But the resulting SYMBOL column has leftover characters (the c( and quote marks) that I'm having a hard time removing:

seqnames     start       end    numBC    SYMBOL
chr3     150601398    150601565   168    c("CLRN1-AS1"
chr3     150601398    150601565   168    "CLRN1")
(stephen.williams, 13 months ago)
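The leftover c( and quote characters are what you would expect if SYMBOL reaches mutate() as a list column: as.character() deparses each list element into one string such as 'c("CLRN1-AS1", "CLRN1")', and splitting that string on commas keeps the R syntax. A minimal base-R reproduction, assuming a list column (the thread does not show the exact column type that as.data.frame() produced):

```r
# A list column like the one SYMBOL is assumed to become after as.data.frame().
symbols <- list(c("CLRN1-AS1", "CLRN1"))

# as.character() deparses the element instead of joining its strings ...
deparsed <- as.character(symbols)
# ... so splitting on "," leaves the deparsing syntax behind.
bad <- strsplit(deparsed, ",")[[1]]
print(bad)

# Flattening the list directly gives clean symbols.
good <- unlist(symbols, use.names = FALSE)
print(good)
```

In the dplyr pipeline this suggests skipping the as.character()/strsplit() step and unnesting the list column directly, although the thread itself settled on S4Vectors::expand.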

Success! Your method works, but you have to use

S4Vectors::expand

not

Matrix::expand

or

tidyr::expand

since those packages also export a function named expand.
(stephen.williams, 13 months ago)

Depending on the context, of course.

(Michael Lawrence, 13 months ago)