Merge GRanges, ignore strand
Marcus • 0

Hello Everybody!

I have two GRanges objects. One is my own data, containing 2000 sequence coordinates and metadata. The other one is from a database, containing 50,000,000 coordinates and metadata. However, the database object does not have any strand information (which makes sense in this case). Now I want to merge my data with the database data to get the information it provides for my genome positions, but for that I have to ignore the strand information. How do I do that?

I found "mergeByOverlaps", but it returns every overlap rather than only the best one, so the result is not what I need.


#Database GRanges
GRanges object with 66426332 ranges and 1 metadata column:
             seqnames            ranges strand | Score
                <Rle>         <IRanges>  <Rle> |     <numeric>
         [1]     chr1       69091-69092      * |   0.000597936
         [2]     chr1       69092-69093      * |   0.004839474
         [3]     chr1       69093-69094      * |   0.271235400
         [4]     chr1       69094-69095      * |   0.000220117
         [5]     chr1       69095-69096      * |   0.000752375
         ...      ...               ...    ... .           ...
# my GRanges object
GRanges object with 2020 ranges and 1 metadata column:
         seqnames              ranges strand |           exp
            <Rle>           <IRanges>  <Rle> | <character>
     [1]     chr1   53947981-53947982      + |         yes
     [2]     chr1   66585848-66585849      + |         yes
     [3]     chr1   98738803-98738804      + |         yes
     [4]     chr1 117456206-117456207      + |         no
     [5]     chr1 154262226-154262225      - |         yes

# trying my best to merge them
merge(Database, myGR)

GRanges object with 0 ranges and 2 metadata columns:
   seqnames    ranges strand |    Score           exp
      <Rle> <IRanges>  <Rle> |     <numeric> <character>

mergeByOverlaps(Database, myGR)

DataFrame with 2828 rows and 4 columns
                   Database             Score                        myGR                          exp
                    <GRanges>     <numeric>                  <GRanges> <character>
1        chr1:1106649-1106650   2.11417e-06     chr1:1106650-1106649:-          no
2        chr1:1301987-1301988   2.24527e-05     chr1:1301988-1301987:-          no
3        chr1:1309602-1309603   6.46944e-05     chr1:1309603-1309604:+          no
4        chr1:1309603-1309604   9.95149e-01     chr1:1309603-1309604:+          no
5        chr1:1309604-1309605   4.53109e-05     chr1:1309603-1309604:+          no

I have been trying to find a solution for a week, and soon my brain will explode. So maybe someone can help :)


What do you mean by 'best overlap'? Are all of the ranges in both datasets of length 2? Or are these actually meant to be single base positions, written with 0-based, half-open coordinates?


If you look at the result of mergeByOverlaps(Database, myGR) and compare rows 3, 4 and 5, I only want row 4, where both nucleotides overlap. So I actually want an exact merge, but it is not working because of the missing strand info :( All ranges are of length 2, and that is also what I am looking for. Can I maybe somehow delete the strand info in myGR?


I actually thought that I had found a way.

However, it does not work because the ranges in myGR that are on the negative strand are descending (e.g. 839444 - 839443), while those on the positive strand are ascending, of course. So even if I remove the strand info, only the ranges on the + strand get merged.

This is what I did:

myData <- data.frame(myGR)

myData$strand <- c("*")

newGR <- makeGRangesFromDataFrame(myData, start.field="start", end.field = "end", keep.extra.columns = TRUE, seqnames.field="seqnames")

merge(Database, newGR)
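
A minimal sketch of one way around the descending coordinates, assuming that a reversed pair such as 839444 - 839443 is meant to cover the same two bases as 839443 - 839444. That interpretation is an assumption: GRanges itself stores a pair with start = end + 1 as a zero-width range. The toy `myGR` below and the names `lo`, `hi` and `fixedGR` are made up for illustration.

```r
library(GenomicRanges)

# Toy stand-in for myGR: the second range is "descending" (start = end + 1),
# which GRanges stores as a zero-width range.
myGR <- GRanges(
  seqnames = c("chr1", "chr1"),
  ranges   = IRanges(start = c(1309603, 1106650), end = c(1309604, 1106649)),
  strand   = c("+", "-"),
  exp      = c("no", "no")
)

# Assumption: reversed coordinates denote the same two bases, so reorder
# them so that start <= end.
lo <- pmin(start(myGR), end(myGR))
hi <- pmax(start(myGR), end(myGR))

# Rebuild the object without strand information and carry the metadata over.
fixedGR <- GRanges(seqnames(myGR), IRanges(lo, hi),
                   strand = rep("*", length(myGR)))
mcols(fixedGR) <- mcols(myGR)

fixedGR
```

If only the strand needs to be dropped and the coordinates are already ascending, `strand(myGR) <- "*"` does that directly, without the round trip through a data frame.
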
@james-w-macdonald-5106

Does this help?

> gr1 <- GRanges(rep("chr1", 5), IRanges(1:5, width = 2), rep("+", 5))
> gr2 <- GRanges(rep("chr1", 5), IRanges(c(1,3,5,7,9), width = 2), rep("-", 5))
> mergeByOverlaps(gr1, gr2)
DataFrame with 0 rows and 2 columns
> mergeByOverlaps(gr1, gr2, ignore.strand = TRUE)
DataFrame with 7 rows and 2 columns
         gr1        gr2
   <GRanges>  <GRanges>
1 chr1:1-2:+ chr1:1-2:-
2 chr1:2-3:+ chr1:1-2:-
3 chr1:2-3:+ chr1:3-4:-
4 chr1:3-4:+ chr1:3-4:-
5 chr1:4-5:+ chr1:3-4:-
6 chr1:4-5:+ chr1:5-6:-
7 chr1:5-6:+ chr1:5-6:-
> mergeByOverlaps(gr1, gr2, ignore.strand = TRUE, minoverlap = 2)
DataFrame with 3 rows and 2 columns
         gr1        gr2
   <GRanges>  <GRanges>
1 chr1:1-2:+ chr1:1-2:-
2 chr1:3-4:+ chr1:3-4:-
3 chr1:5-6:+ chr1:5-6:-
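
As a hedged alternative to `minoverlap = 2`, which relies on every range having width 2, the sketch below asks for identical start and end coordinates with `type = "equal"` and then copies a metadata column across by hit index. The `Score` and `exp` values are invented for illustration, and `annotated` is a hypothetical name. This is not necessarily what was run above.

```r
library(GenomicRanges)

# Same toy ranges as above, with made-up metadata columns.
gr1 <- GRanges(rep("chr1", 5), IRanges(1:5, width = 2), rep("+", 5),
               Score = c(0.1, 0.2, 0.3, 0.4, 0.5))
gr2 <- GRanges(rep("chr1", 5), IRanges(c(1, 3, 5, 7, 9), width = 2), rep("-", 5),
               exp = c("yes", "no", "yes", "no", "yes"))

# Keep only pairs with identical coordinates, regardless of strand.
hits <- findOverlaps(gr2, gr1, type = "equal", ignore.strand = TRUE)

# Annotate the matching gr2 ranges with the Score of their gr1 partner.
annotated <- gr2[queryHits(hits)]
annotated$Score <- gr1$Score[subjectHits(hits)]

annotated
```

Because the result is still a GRanges object rather than a DataFrame, it can be passed on to other range-based functions directly.
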

That does help. I also didn't know that I could use ignore.strand = TRUE when merging GRanges. Thanks!
