I have 2 GRanges objects:
> cnv1
GRanges object with 3 ranges and 7 metadata columns:
seqnames ranges strand | T10229 T10380 T10713 T2523
<Rle> <IRanges> <Rle> | <factor> <factor> <factor> <factor>
[1] 1 [ 65509, 721942] * | CN1 CN2 CN2 CN2
[2] 1 [ 762280, 3417947] * | CN1 CN1 CN1 CN1
[3] 1 [7289685, 7737801] * | CN2 CN1 CN2 CN2
T2378 T2501 T2556
<logical> <logical> <logical>
[1] <NA> <NA> <NA>
[2] <NA> <NA> <NA>
[3] <NA> <NA> <NA>
-------
seqinfo: 24 sequences from an unspecified genome; no seqlengths
> cnv2
GRanges object with 5 ranges and 7 metadata columns:
seqnames ranges strand | T10229 T10380 T10713
<Rle> <IRanges> <Rle> | <logical> <logical> <logical>
[1] 1 [ 14620, 783151] * | <NA> <NA> <NA>
[2] 1 [1374455, 1438931] * | <NA> <NA> <NA>
[3] 1 [1593783, 1596048] * | <NA> <NA> <NA>
[4] 1 [1601455, 1630886] * | <NA> <NA> <NA>
[5] 1 [1634934, 1683496] * | <NA> <NA> <NA>
T2523 T2378 T2501 T2556
<logical> <factor> <factor> <factor>
[1] <NA> CN2 CN2 CN2
[2] <NA> CN2 CN2 CN2
[3] <NA> CN2 CN2 CN2
[4] <NA> CN2 CN2 CN2
[5] <NA> CN3 CN1 CN1
-------
seqinfo: 24 sequences from an unspecified genome; no seqlengths
If I combine them:
cnv = c(cnv1, cnv2)
> cnv
GRanges object with 8 ranges and 7 metadata columns:
seqnames ranges strand | T10229 T10380 T10713 T2523
<Rle> <IRanges> <Rle> | <factor> <factor> <factor> <factor>
[1] 1 [ 65509, 721942] * | CN1 CN2 CN2 CN2
[2] 1 [ 762280, 3417947] * | CN1 CN1 CN1 CN1
[3] 1 [7289685, 7737801] * | CN2 CN1 CN2 CN2
[4] 1 [ 14620, 783151] * | <NA> <NA> <NA> <NA>
[5] 1 [1374455, 1438931] * | <NA> <NA> <NA> <NA>
[6] 1 [1593783, 1596048] * | <NA> <NA> <NA> <NA>
[7] 1 [1601455, 1630886] * | <NA> <NA> <NA> <NA>
[8] 1 [1634934, 1683496] * | <NA> <NA> <NA> <NA>
T2378 T2501 T2556
<factor> <factor> <factor>
[1] <NA> <NA> <NA>
[2] <NA> <NA> <NA>
[3] <NA> <NA> <NA>
[4] CN2 CN2 CN2
[5] CN2 CN2 CN2
[6] CN2 CN2 CN2
[7] CN2 CN2 CN2
[8] CN3 CN1 CN1
( In the end, all NAs will be replaced by CN2)
If I apply reduce function on this, I lose all elementMetadata:
> reduce(cnv)
GRanges object with 2 ranges and 0 metadata columns:
seqnames ranges strand
<Rle> <IRanges> <Rle>
[1] 1 [ 14620, 3417947] *
[2] 1 [7289685, 7737801] *
-------
seqinfo: 24 sequences from an unspecified genome; no seqlengths
-----------------------------------------------------------------------------------------------------------------
Two problems here:
(1) This is too much collpased. I need to use same level of collapsing as cn.mops. I could not find in source code what cn.mops is using for "min.gapwidth".
(2) I need to create a new elementMetadata for cnv. The only thing I can figure out is comparing the ranges of each row in cnv1 and cnv2 with the ranges of cnv. Is there any easier way of doing the whole thing?
--------------------------------------------------------------------------------------------------------------------
Also, originally cnv1 and cnv2 did not have compatible elementMetadata, so I did:
n.col.1 = ncol(mcols(cnv1))
n.col.2 = ncol(mcols(cnv2))
n.col = n.col.1 + n.col.2
sam.names = c(names(mcols(cnv1)), names(mcols(cnv2)))
cn1 = as.data.frame(matrix(NA, nrow=length(seq_along(cnv1)), ncol=n.col))
names(cn1) = sam.names
cn1[ ,1:n.col.1] = as.data.frame(mcols(cnv1))
mcols(cnv1) = cn1
cn2 = as.data.frame(matrix(NA, nrow=length(seq_along(cnv2)), ncol=n.col))
names(cn2) = sam.names
cn2[ ,(n.col.1+1):n.col] = as.data.frame(mcols(cnv2))
mcols(cnv2) = cn2
Any short cut for this?
--------------------------------------------------------------------------------------------------------
Even when 2 GRanges objects are completely compatible, if I combine them with:
cnv = c(cnv1, cnv2)
cnv is a list of 2 GRanges object.
Thank you very much for your reply.
(1) I need 2 ways of combining. One is merging, one is concatenating for 2 GRanges objects with compatible elementMetadata.
For the second case, if I run cnv = c(cnv1, cnv2) on GRanges objects created by cn.mops, this gives GRangesList object. If I run cnv = c(cnv1, cnv2) on GRanges objects created by myself, this gives GRanges object. The only difference I can say is that in my GRanges objects, the columns in elementMetadata are characters, while those in the other case are factors.
(2) By "too much collapsed", I meant there are too few rows after merging. Probably "min.gapwidth" can adjust this? I have to give same value to this parameter as cn.mops.
Please post the actual code that produces that inconsistency. That could happen if you pass named arguments to c(). The min.gapwidth could make it so that only overlapping (not adjacent) ranges are merged.
> class(CNVs)
[1] "CNVDetectionResult"
attr(,"package")
[1] "cn.mops"
> cnv = cnvs(CNVs)[1:4, ]
> cnv
GRanges object with 4 ranges and 4 metadata columns:
seqnames ranges strand | sampleName median mean
<Rle> <IRanges> <Rle> | <factor> <numeric> <numeric>
[1] 1 [ 65509, 721942] * | T10229 -0.9529843 -1.5198
[2] 1 [ 1643790, 1645523] * | T10229 -4.1718849 -3.5928
[3] 1 [21806582, 21809808] * | T10229 -3.0625941 -2.9810
[4] 1 [24172257, 24192026] * | T10229 -0.9788206 -0.9915
CN
<character>
[1] CN1
[2] CN1
[3] CN1
[4] CN1
-------
seqinfo: 24 sequences from an unspecified genome; no seqlengths
> c(c(), cnv)
[[1]]
GRanges object with 4 ranges and 4 metadata columns:
seqnames ranges strand | sampleName median mean
<Rle> <IRanges> <Rle> | <factor> <numeric> <numeric>
[1] 1 [ 65509, 721942] * | T10229 -0.9529843 -1.5198
[2] 1 [ 1643790, 1645523] * | T10229 -4.1718849 -3.5928
[3] 1 [21806582, 21809808] * | T10229 -3.0625941 -2.9810
[4] 1 [24172257, 24192026] * | T10229 -0.9788206 -0.9915
CN
<character>
[1] CN1
[2] CN1
[3] CN1
[4] CN1
-------
seqinfo: 24 sequences from an unspecified genome; no seqlengths
I'm not sure that code is relevant, but anyway, that happens because
c()
producesNULL
andNULL
is not a GRanges, so it won't make a GRangesList. One could argue thatc()
should always dropNULL
and dispatch on the first non-NULL
argument. Btw,c()
is a bit weird in that it is a variadic primitive, so dispatch happens only on the first argument, so maybe we should fix that, too. That would require changes in R though, so I would just not do what you're doing.I tried, and it works now. Concatenating is working correctly now.
For merging with the function "reduce", min.gapwidth cannot do what I am trying to do:
Ranges separated by a gap of at least
min.gapwidth
positions are not merged.Is there any way to set a restriction about at least how much overlap the ranges should have in order to be merged?
Thank you very much for your help.