Hello Everybody!
I have two GRanges objects. One is my own data, containing 2000 sequence coordinates and metadata. The other one is from a database, containing 50,000,000 coordinates and metadata. However, this one does not have any strand information (which makes sense in this case). Now I want to merge my data with their data to get the info that they provide for my genome positions. For that I have to ignore the strand info. How do I do that?
I found "mergeByOverlaps", but it returns every overlap rather than only the best one, so it does not give me what I need.
```
# Database GRanges
GRanges object with 66426332 ranges and 1 metadata column:
      seqnames      ranges strand |       Score
         <Rle>   <IRanges>  <Rle> |   <numeric>
  [1]     chr1 69091-69092      * | 0.000597936
  [2]     chr1 69092-69093      * | 0.004839474
  [3]     chr1 69093-69094      * | 0.271235400
  [4]     chr1 69094-69095      * | 0.000220117
  [5]     chr1 69095-69096      * | 0.000752375
  ...      ...         ...    ... .         ...

# my GRanges object
GRanges object with 2020 ranges and 1 metadata column:
      seqnames              ranges strand |         exp
         <Rle>           <IRanges>  <Rle> | <character>
  [1]     chr1   53947981-53947982      + |         yes
  [2]     chr1   66585848-66585849      + |         yes
  [3]     chr1   98738803-98738804      + |         yes
  [4]     chr1 117456206-117456207      + |          no
  [5]     chr1 154262226-154262225      - |         yes

# trying my best to merge them
merge(Database, myGR)
GRanges object with 0 ranges and 2 metadata columns:
   seqnames    ranges strand |     Score         exp
      <Rle> <IRanges>  <Rle> | <numeric> <character>

mergeByOverlaps(Database, myGR)
DataFrame with 2828 rows and 4 columns
              Database       Score                   myGR         exp
             <GRanges>   <numeric>              <GRanges> <character>
1 chr1:1106649-1106650 2.11417e-06 chr1:1106650-1106649:-          no
2 chr1:1301987-1301988 2.24527e-05 chr1:1301988-1301987:-          no
3 chr1:1309602-1309603 6.46944e-05 chr1:1309603-1309604:+          no
4 chr1:1309603-1309604 9.95149e-01 chr1:1309603-1309604:+          no
5 chr1:1309604-1309605 4.53109e-05 chr1:1309603-1309604:+          no
```
I have been trying to find a solution for a week, and soon my brain will explode. So maybe someone can help :)
What do you mean by 'best overlap'? Are all of the ranges in both datasets of length 2? Or are these actually meant to be a single base position, but 0-start, half-open counting?
If you look at the result of mergeByOverlaps(Database, myGR) and compare rows 3, 4 and 5, I would only want row 4, where both nucleotides overlap. So I actually want an exact merge, but it is not working because of the missing strand info :( All ranges are of length 2, and that is also what I am looking for. Can I maybe somehow delete the strand info in myGR?
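For the exact-match part of the question, one option that may help is to pass `type = "equal"` through `mergeByOverlaps()` to `findOverlaps()`, so that only ranges with identical start and end are paired. The sketch below is a minimal illustration on toy data copied from rows 3 to 5 of the output above; the toy objects are assumptions for the example, not the real data. `ignore.strand = TRUE` is passed only to make the intent explicit, since a `*` strand already matches both `+` and `-` in `findOverlaps()`.

```r
library(GenomicRanges)

# Toy stand-ins for the two objects (values taken from rows 3-5 above)
Database <- GRanges("chr1",
                    IRanges(start = 1309602:1309604, width = 2),
                    Score = c(6.46944e-05, 9.95149e-01, 4.53109e-05))
myGR <- GRanges("chr1",
                IRanges(start = 1309603, width = 2),
                strand = "+",
                exp = "no")

# type = "equal" keeps only pairs with identical start and end;
# extra arguments are forwarded to findOverlaps()
exact <- mergeByOverlaps(Database, myGR,
                         type = "equal", ignore.strand = TRUE)
exact
```

If the strand column itself should be dropped from `myGR`, `strand(myGR) <- "*"` or `unstrand(myGR)` would do that, although it should not be required for the overlap call above.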
I actually thought that I had found a way.
However, it does not work because the ranges in myGR that are on the negative strand are descending (e.g. 839444-839443), while those on the positive strand are of course ascending. So even if I remove the strand info, only the ranges on the + strand get merged.
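A range printed as descending, such as 1106650-1106649, is stored by IRanges as a zero-width range (end = start - 1), so it can never be "equal" to a width-2 range in the database. A possible workaround, sketched below on toy data, is to rebuild the ranges with the smaller coordinate as start and the larger as end before matching. This assumes that each descending entry is meant to cover the same two bases as its ascending counterpart; the toy objects and the names `fixed`, `lo`, `hi` and `score` are illustrative only.

```r
library(GenomicRanges)

# Toy stand-ins; the "-" entry mimics a descending (zero-width) range
Database <- GRanges("chr1",
                    IRanges(start = c(1106649, 1309603), width = 2),
                    Score = c(2.11417e-06, 9.95149e-01))
myGR <- GRanges("chr1",
                IRanges(start = c(1106650, 1309603),
                        end   = c(1106649, 1309604)),
                strand = c("-", "+"),
                exp = c("no", "no"))

# Put coordinates in ascending order (assumption: same two bases intended)
lo <- pmin(start(myGR), end(myGR))
hi <- pmax(start(myGR), end(myGR))
fixed <- GRanges(seqnames(myGR), IRanges(lo, hi), strand = strand(myGR))
mcols(fixed) <- mcols(myGR)

# Exact coordinate match, strand ignored
hits <- findOverlaps(fixed, Database, type = "equal", ignore.strand = TRUE)

# Attach the database score to each matched position (NA if no match)
score <- rep(NA_real_, length(fixed))
score[queryHits(hits)] <- Database$Score[subjectHits(hits)]
fixed$Score <- score
fixed
```

Keeping the original strand in `fixed` and ignoring it only at matching time means the strand information is still available afterwards.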
That's what I did: