Concatenate individual genomic intervals into population-level regions
@vinicius-henrique-da-silva-6713
Brazil

I would like to concatenate (merge) the overlapping genomic intervals of individual samples into common regions.

My input: 

    dfin <- "chr start end sample type
            1   10    20   NE1    loss
            1   5     15   NE2    gain
            1   25    30   NE1    gain
            2   40    50   NE1    loss
            2   40    60   NE2    loss
            3   20    30   NE1    gain"
    dfin <- read.table(text=dfin, header=TRUE)

My expected output: 

    dfout <- "chr start end samples type
            1   5     20   NE1-NE2  both
            1   25    30   NE1      gain
            2   40    60   NE1-NE2  loss
            3   20    30   NE1      gain"
    dfout <- read.table(text=dfout, header=TRUE)

The intervals in dfin never overlap within the same animal, only between animals (the sample column in dfin, which becomes the samples column in dfout). The type column has two levels in dfin (loss and gain) and should have three in dfout: loss, gain and both, where both marks a merged region built from at least one loss interval and at least one gain interval.
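The merging rule above can be sketched in base R, without Bioconductor, as a sort-and-sweep over each chromosome. This is only a minimal illustration, not the GenomicRanges approach: merge_regions() and its min_gapwidth argument are hypothetical names, and the default min_gapwidth = 1 merges touching intervals as well as overlapping ones, which mirrors the default behaviour of reduce() in GenomicRanges.

```r
# Minimal base-R sketch (no Bioconductor): merge overlapping or adjacent
# intervals per chromosome, then summarise samples and types per region.
# merge_regions() and min_gapwidth are hypothetical names.
merge_regions <- function(df, min_gapwidth = 1L) {
  df <- df[order(df$chr, df$start), ]
  region <- integer(nrow(df))
  id <- 0L
  cur_chr <- NA
  cur_end <- -Inf
  for (k in seq_len(nrow(df))) {
    # Gap between this interval and the running end of the current region:
    # negative when they overlap, 0 when they touch.
    gap <- df$start[k] - cur_end - 1
    if (is.na(cur_chr) || df$chr[k] != cur_chr || gap >= min_gapwidth) {
      id <- id + 1L                        # start a new region
      cur_chr <- df$chr[k]
      cur_end <- df$end[k]
    } else {
      cur_end <- max(cur_end, df$end[k])   # extend the current region
    }
    region[k] <- id
  }
  rows <- split(seq_len(nrow(df)), region)
  out <- do.call(rbind, lapply(rows, function(i) {
    types <- unique(as.character(df$type[i]))
    data.frame(chr     = df$chr[i[1]],
               start   = min(df$start[i]),
               end     = max(df$end[i]),
               samples = paste(sort(unique(as.character(df$sample[i]))),
                               collapse = "-"),
               # "both" only when the region mixes gain and loss intervals
               type    = if (length(types) > 1L) "both" else types,
               stringsAsFactors = FALSE)
  }))
  rownames(out) <- NULL
  out
}

dfin <- read.table(text = "chr start end sample type
1 10 20 NE1 loss
1  5 15 NE2 gain
1 25 30 NE1 gain
2 40 50 NE1 loss
2 40 60 NE2 loss
3 20 30 NE1 gain", header = TRUE)
res <- merge_regions(dfin)
res
```

On the example input this yields the four rows of the expected dfout (5-20 NE1-NE2 both, 25-30 NE1 gain, 40-60 NE1-NE2 loss, 20-30 NE1 gain). Passing min_gapwidth = 0 would merge only intervals that truly overlap.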

Any ideas on how to do this?

genomicranges
@herve-pages-1542
Seattle, WA, United States
Hi,
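    library(GenomicRanges)

    ## Coerce the data frame to a GRanges; the extra columns
    ## (sample, type) become metadata columns.
    gr <- as(dfin, "GRanges")

    ## Merge overlapping (and adjacent) ranges; 'revmap' records which
    ## input ranges were merged into each output range.
    out_gr <- reduce(gr, with.revmap=TRUE)
    revmap <- mcols(out_gr)$revmap

    ## Samples contributing to each region, deduplicated and sorted.
    out_samples <- extractList(as.character(mcols(gr)$sample), revmap)
    out_samples <- as.factor(unstrsplit(sort(unique(out_samples)), sep="-"))

    ## Type: deduplicate first so that two losses stay "loss";
    ## only a region mixing gain and loss becomes "both".
    out_type <- unique(extractList(as.character(mcols(gr)$type), revmap))
    out_type <- ifelse(lengths(out_type) > 1L, "both", unstrsplit(out_type))
    out_type <- factor(out_type, levels=c("gain", "loss", "both"))

    mcols(out_gr) <- DataFrame(samples=out_samples, type=out_type)
    dfout <- as.data.frame(out_gr)
    dfout
    #   seqnames start end width strand samples type
    # 1        1     5  20    16      * NE1-NE2 both
    # 2        1    25  30     6      *     NE1 gain
    # 3        2    40  60    21      * NE1-NE2 loss
    # 4        3    20  30    11      *     NE1 gain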
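To see what with.revmap=TRUE and extractList() contribute, the same bookkeeping can be mimicked in base R with a plain list of index vectors. The revmap list below is written by hand for the example input (an assumption standing in for mcols(out_gr)$revmap, which reduce() computes for you); it shows why deduplicating the types matters: a region built from two losses must stay "loss", and only a gain/loss mix becomes "both".

```r
# Hand-written stand-in for mcols(out_gr)$revmap on the example input:
# element j lists the rows of dfin merged into output region j.
# (Assumption for illustration; reduce(with.revmap=TRUE) computes this.)
revmap     <- list(c(1L, 2L), 3L, c(4L, 5L), 6L)
sample_ids <- c("NE1", "NE2", "NE1", "NE1", "NE2", "NE1")
cnv_type   <- c("loss", "gain", "gain", "loss", "loss", "gain")

# Base-R analogue of extractList(): one vector of values per region.
samples_per_region <- lapply(revmap, function(i) sample_ids[i])
types_per_region   <- lapply(revmap, function(i) cnv_type[i])

# Samples: deduplicate, sort, join with "-".
samples <- vapply(samples_per_region,
                  function(s) paste(sort(unique(s)), collapse = "-"),
                  character(1))

# Types: deduplicate first, so two losses stay "loss";
# only a region mixing gain and loss becomes "both".
types <- vapply(types_per_region, function(t) {
  u <- unique(t)
  if (length(u) > 1L) "both" else u
}, character(1))

data.frame(samples = samples, type = types)
```

Counting elements instead of distinct values (for example, labelling every region with two input intervals as "both") would wrongly turn the chromosome 2 region, which is made of two losses, into "both".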

Cheers,

H.
