Question

Merge GRanges, ignore strand

Marcus • 0

Hello Everybody!

I have two GRanges objects. One is my own data, containing 2000 sequence coordinates and metadata. The other one is from a database, containing 50,000,000 coordinates and metadata. However, the database object does not have any strand information (which makes sense in this case). Now I want to merge my data with their data to get the information they provide for my genome positions, but for that I have to ignore the strand information. How do I do that?

I found "mergeByOverlaps", but it returns every overlap rather than only the best one, so the result is not useful here.

#Database GRanges
GRanges object with 66426332 ranges and 1 metadata column:
             seqnames            ranges strand | Score
                <Rle>         <IRanges>  <Rle> |     <numeric>
         [1]     chr1       69091-69092      * |   0.000597936
         [2]     chr1       69092-69093      * |   0.004839474
         [3]     chr1       69093-69094      * |   0.271235400
         [4]     chr1       69094-69095      * |   0.000220117
         [5]     chr1       69095-69096      * |   0.000752375
         ...      ...               ...    ... .           ...
# my GRanges object
GRanges object with 2020 ranges and 1 metadata column:
         seqnames              ranges strand |           exp
            <Rle>           <IRanges>  <Rle> | <character>
     [1]     chr1   53947981-53947982      + |         yes
     [2]     chr1   66585848-66585849      + |         yes
     [3]     chr1   98738803-98738804      + |         yes
     [4]     chr1 117456206-117456207      + |         no
     [5]     chr1 154262226-154262225      - |         yes

# trying my best to merge them
merge(Database, myGR)

GRanges object with 0 ranges and 2 metadata columns:
   seqnames    ranges strand |    Score           exp
      <Rle> <IRanges>  <Rle> |     <numeric> <character>

mergeByOverlaps(Database, myGR)

DataFrame with 2828 rows and 4 columns
                   Database             Score                        myGR                          exp
                    <GRanges>     <numeric>                  <GRanges> <character>
1        chr1:1106649-1106650   2.11417e-06     chr1:1106650-1106649:-          no
2        chr1:1301987-1301988   2.24527e-05     chr1:1301988-1301987:-          no
3        chr1:1309602-1309603   6.46944e-05     chr1:1309603-1309604:+          no
4        chr1:1309603-1309604   9.95149e-01     chr1:1309603-1309604:+          no
5        chr1:1309604-1309605   4.53109e-05     chr1:1309603-1309604:+          no

I have been trying to find a solution for a week, and soon my brain will explode. So maybe someone can help :)

Sequencing SummarizedExperiment GRanges • 1.4k views

ADD COMMENT • link 20 months ago Marcus • 0

What do you mean by 'best overlap'? Are all of the ranges in both datasets of length 2? Or are these actually meant to be a single base position, but 0-start, half-open counting?

ADD REPLY • link 20 months ago James W. MacDonald 65k

If you look at the result of mergeByOverlaps(Database, myGR) and compare rows 3, 4 and 5, I would only want row 4, where both nucleotides overlap. So I actually want an exact merge, but it is not working because of the missing strand info :( All ranges are of length 2, and that is also what I am looking for. Can I maybe somehow delete the strand info in myGR?

ADD REPLY • link 20 months ago Marcus • 0

I actually thought that I had found a way.

However, it does not work because the ranges in myGR that are on the negative strand are descending (e.g. 839444-839443), while those on the positive strand are increasing, of course. So even if I remove the strand info, only those on the + strand get merged.

That's what I did:

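# Convert to a data.frame so the strand column can be overwritten
myData <- as.data.frame(myGR)
myData$strand <- "*"

# Rebuild the GRanges object, keeping the metadata columns
newGR <- makeGRangesFromDataFrame(myData, keep.extra.columns = TRUE, seqnames.field = "seqnames", start.field = "start", end.field = "end")

merge(Database, newGR)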

ADD REPLY • link 20 months ago Marcus • 0
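
For readers following along, the strand can also be reset directly on the GRanges object, without the data.frame round-trip. The sketch below uses a toy object `gr` (a hypothetical stand-in for myGR, with made-up coordinates) and also checks the range widths: IRanges prints a zero-width range as start-(start-1), so a "descending" range such as 839444-839443 is most likely a range of width 0 rather than a reversed range of width 2. Whether that is what happened in the poster's data is an assumption, not something the thread confirms.

```r
library(GenomicRanges)

# Toy stand-in for myGR (hypothetical coordinates, not the poster's data):
# one width-2 range on "+" and one width-0 range on "-"
gr <- GRanges(rep("chr1", 2),
              IRanges(start = c(100, 200), width = c(2, 0)),
              strand = c("+", "-"))

# Reset the strand in place; metadata columns are left untouched
strand(gr) <- "*"

# Inspect the widths: the second range prints as 200-199 and has width 0
gr
width(gr)
```

If `width()` reports 0 for the negative-strand entries, their coordinates would need to be fixed upstream before an exact match against width-2 database ranges can succeed.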

score 2 · Accepted Answer · 2022-09-01

Does this help?

> gr1 <- GRanges(rep("chr1", 5), IRanges(1:5, width = 2), rep("+", 5))
> gr2 <- GRanges(rep("chr1", 5), IRanges(c(1,3,5,7,9), width = 2), rep("-", 5))
> mergeByOverlaps(gr1, gr2)
DataFrame with 0 rows and 2 columns
> mergeByOverlaps(gr1, gr2, ignore.strand = TRUE)
DataFrame with 7 rows and 2 columns
         gr1        gr2
   <GRanges>  <GRanges>
1 chr1:1-2:+ chr1:1-2:-
2 chr1:2-3:+ chr1:1-2:-
3 chr1:2-3:+ chr1:3-4:-
4 chr1:3-4:+ chr1:3-4:-
5 chr1:4-5:+ chr1:3-4:-
6 chr1:4-5:+ chr1:5-6:-
7 chr1:5-6:+ chr1:5-6:-
> mergeByOverlaps(gr1, gr2, ignore.strand = TRUE, minoverlap = 2)
DataFrame with 3 rows and 2 columns
         gr1        gr2
   <GRanges>  <GRanges>
1 chr1:1-2:+ chr1:1-2:-
2 chr1:3-4:+ chr1:3-4:-
3 chr1:5-6:+ chr1:5-6:-
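
Since the question asks for an exact match rather than a minimum overlap, a possible variant is to request identical start and end coordinates with `type = "equal"`, which `findOverlaps()` supports alongside `ignore.strand = TRUE`. This is only a sketch on the toy objects from the answer above; the `exp` column and its values are invented for illustration and stand in for the poster's metadata.

```r
library(GenomicRanges)

# Same toy objects as in the answer above
gr1 <- GRanges(rep("chr1", 5), IRanges(1:5, width = 2), rep("+", 5))
gr2 <- GRanges(rep("chr1", 5), IRanges(c(1, 3, 5, 7, 9), width = 2), rep("-", 5))

# Hypothetical metadata column, standing in for the poster's 'exp'
gr2$exp <- c("yes", "no", "yes", "no", "yes")

# Keep only pairs with identical start and end, regardless of strand
hits <- findOverlaps(gr1, gr2, type = "equal", ignore.strand = TRUE)

# Carry the metadata from gr2 over to the matching ranges of gr1
matched <- gr1[queryHits(hits)]
matched$exp <- gr2$exp[subjectHits(hits)]
matched
```

For width-2 ranges this should select the same pairs as `minoverlap = 2`, but it does not depend on every range having the same width.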