Question

reduce function after combining 2 GRanges objects from cn.mops output

0

Entering edit mode

Haiying.Kong ▴ 110

@haiyingkong-9254

Last seen 5.0 years ago

Germany

I have 2 GRanges objects:

> cnv1
GRanges object with 3 ranges and 7 metadata columns:
seqnames ranges strand | T10229 T10380 T10713 T2523
<Rle> <IRanges> <Rle> | <factor> <factor> <factor> <factor>
[1] 1 [ 65509, 721942] * | CN1 CN2 CN2 CN2
[2] 1 [ 762280, 3417947] * | CN1 CN1 CN1 CN1
[3] 1 [7289685, 7737801] * | CN2 CN1 CN2 CN2
T2378 T2501 T2556
<logical> <logical> <logical>
[1] <NA> <NA> <NA>
[2] <NA> <NA> <NA>
[3] <NA> <NA> <NA>
-------
seqinfo: 24 sequences from an unspecified genome; no seqlengths
> cnv2
GRanges object with 5 ranges and 7 metadata columns:
seqnames ranges strand | T10229 T10380 T10713
<Rle> <IRanges> <Rle> | <logical> <logical> <logical>
[1] 1 [ 14620, 783151] * | <NA> <NA> <NA>
[2] 1 [1374455, 1438931] * | <NA> <NA> <NA>
[3] 1 [1593783, 1596048] * | <NA> <NA> <NA>
[4] 1 [1601455, 1630886] * | <NA> <NA> <NA>
[5] 1 [1634934, 1683496] * | <NA> <NA> <NA>
T2523 T2378 T2501 T2556
<logical> <factor> <factor> <factor>
[1] <NA> CN2 CN2 CN2
[2] <NA> CN2 CN2 CN2
[3] <NA> CN2 CN2 CN2
[4] <NA> CN2 CN2 CN2
[5] <NA> CN3 CN1 CN1
-------
seqinfo: 24 sequences from an unspecified genome; no seqlengths

If I combine them:

cnv = c(cnv1, cnv2)

> cnv

GRanges object with 8 ranges and 7 metadata columns:
seqnames ranges strand | T10229 T10380 T10713 T2523
<Rle> <IRanges> <Rle> | <factor> <factor> <factor> <factor>
[1] 1 [ 65509, 721942] * | CN1 CN2 CN2 CN2
[2] 1 [ 762280, 3417947] * | CN1 CN1 CN1 CN1
[3] 1 [7289685, 7737801] * | CN2 CN1 CN2 CN2
[4] 1 [ 14620, 783151] * | <NA> <NA> <NA> <NA>
[5] 1 [1374455, 1438931] * | <NA> <NA> <NA> <NA>
[6] 1 [1593783, 1596048] * | <NA> <NA> <NA> <NA>
[7] 1 [1601455, 1630886] * | <NA> <NA> <NA> <NA>
[8] 1 [1634934, 1683496] * | <NA> <NA> <NA> <NA>
T2378 T2501 T2556
<factor> <factor> <factor>
[1] <NA> <NA> <NA>
[2] <NA> <NA> <NA>
[3] <NA> <NA> <NA>
[4] CN2 CN2 CN2
[5] CN2 CN2 CN2
[6] CN2 CN2 CN2
[7] CN2 CN2 CN2
[8] CN3 CN1 CN1

( In the end, all NAs will be replaced by CN2)

If I apply reduce function on this, I lose all elementMetadata:

> reduce(cnv)
GRanges object with 2 ranges and 0 metadata columns:
seqnames ranges strand
<Rle> <IRanges> <Rle>
[1] 1 [ 14620, 3417947] *
[2] 1 [7289685, 7737801] *
-------
seqinfo: 24 sequences from an unspecified genome; no seqlengths

-----------------------------------------------------------------------------------------------------------------

Two problems here:

(1) This is too much collpased. I need to use same level of collapsing as cn.mops. I could not find in source code what cn.mops is using for "min.gapwidth".

(2) I need to create a new elementMetadata for cnv. The only thing I can figure out is comparing the ranges of each row in cnv1 and cnv2 with the ranges of cnv. Is there any easier way of doing the whole thing?

--------------------------------------------------------------------------------------------------------------------

Also, originally cnv1 and cnv2 did not have compatible elementMetadata, so I did:

n.col.1 = ncol(mcols(cnv1))
n.col.2 = ncol(mcols(cnv2))
n.col = n.col.1 + n.col.2
sam.names = c(names(mcols(cnv1)), names(mcols(cnv2)))

cn1 = as.data.frame(matrix(NA, nrow=length(seq_along(cnv1)), ncol=n.col))
names(cn1) = sam.names
cn1[ ,1:n.col.1] = as.data.frame(mcols(cnv1))
mcols(cnv1) = cn1

cn2 = as.data.frame(matrix(NA, nrow=length(seq_along(cnv2)), ncol=n.col))
names(cn2) = sam.names
cn2[ ,(n.col.1+1):n.col] = as.data.frame(mcols(cnv2))
mcols(cnv2) = cn2

Any short cut for this?

--------------------------------------------------------------------------------------------------------

Even when 2 GRanges objects are completely compatible, if I combine them with:

cnv = c(cnv1, cnv2)

cnv is a list of 2 GRanges object.

GRanges cn.mops • 1.8k views

ADD COMMENT • link updated 6.8 years ago by Michael Lawrence ★ 11k • written 6.8 years ago by Haiying.Kong ▴ 110

score 2 · Accepted Answer · 2017-06-27

2

Entering edit mode

Michael Lawrence ★ 11k

@michael-lawrence-3846

Last seen 2.4 years ago

United States

See the with.revmap= argument for reduce(). It gives you the subscripts for merging the original input into the reduced output. I'm not sure what you mean by "too much collapsed". How much collapsed do you want it to be? For merging the tables, you could do this:

mcols(cnv1)$dummy <- seq_along(cnv1)
mcols(cnv2)$dummy <- seq_along(cnv2) + length(cnv1)
merge(mcols(cnv1), mcols(cnv2), by="dummy", all=TRUE)

ADD COMMENT • link 6.8 years ago Michael Lawrence ★ 11k

0

Entering edit mode

Thank you very much for your reply.

(1) I need 2 ways of combining. One is merging, one is concatenating for 2 GRanges objects with compatible elementMetadata.

For the second case, if I run cnv = c(cnv1, cnv2) on GRanges objects created by cn.mops, this gives GRangesList object. If I run cnv = c(cnv1, cnv2) on GRanges objects created by myself, this gives GRanges object. The only difference I can say is that in my GRanges objects, the columns in elementMetadata are characters, while those in the other case are factors.

(2) By "too much collapsed", I meant there are too few rows after merging. Probably "min.gapwidth" can adjust this? I have to give same value to this parameter as cn.mops.

ADD REPLY • link 6.8 years ago Haiying.Kong ▴ 110

0

Entering edit mode

Please post the actual code that produces that inconsistency. That could happen if you pass named arguments to c(). The min.gapwidth could make it so that only overlapping (not adjacent) ranges are merged.

ADD REPLY • link 6.8 years ago Michael Lawrence ★ 11k

0

Entering edit mode

> class(CNVs)
[1] "CNVDetectionResult"
attr(,"package")
[1] "cn.mops"
> cnv = cnvs(CNVs)[1:4, ]
> cnv
GRanges object with 4 ranges and 4 metadata columns:
seqnames ranges strand | sampleName median mean
<Rle> <IRanges> <Rle> | <factor> <numeric> <numeric>
[1] 1 [ 65509, 721942] * | T10229 -0.9529843 -1.5198
[2] 1 [ 1643790, 1645523] * | T10229 -4.1718849 -3.5928
[3] 1 [21806582, 21809808] * | T10229 -3.0625941 -2.9810
[4] 1 [24172257, 24192026] * | T10229 -0.9788206 -0.9915
CN
<character>
[1] CN1
[2] CN1
[3] CN1
[4] CN1
-------
seqinfo: 24 sequences from an unspecified genome; no seqlengths
> c(c(), cnv)
[[1]]
GRanges object with 4 ranges and 4 metadata columns:
seqnames ranges strand | sampleName median mean
<Rle> <IRanges> <Rle> | <factor> <numeric> <numeric>
[1] 1 [ 65509, 721942] * | T10229 -0.9529843 -1.5198
[2] 1 [ 1643790, 1645523] * | T10229 -4.1718849 -3.5928
[3] 1 [21806582, 21809808] * | T10229 -3.0625941 -2.9810
[4] 1 [24172257, 24192026] * | T10229 -0.9788206 -0.9915
CN
<character>
[1] CN1
[2] CN1
[3] CN1
[4] CN1
-------
seqinfo: 24 sequences from an unspecified genome; no seqlengths

ADD REPLY • link 6.8 years ago Haiying.Kong ▴ 110

1

Entering edit mode

I'm not sure that code is relevant, but anyway, that happens because c() produces NULL and NULL is not a GRanges, so it won't make a GRangesList. One could argue that c() should always drop NULL and dispatch on the first non-NULL argument. Btw, c() is a bit weird in that it is a variadic primitive, so dispatch happens only on the first argument, so maybe we should fix that, too. That would require changes in R though, so I would just not do what you're doing.

ADD REPLY • link 6.8 years ago Michael Lawrence ★ 11k

0

Entering edit mode

I tried, and it works now. Concatenating is working correctly now.

For merging with the function "reduce", min.gapwidth cannot do what I am trying to do:

Ranges separated by a gap of at least min.gapwidth positions are not merged.

Is there any way to set a restriction about at least how much overlap the ranges should have in order to be merged?

Thank you very much for your help.

ADD REPLY • link 6.8 years ago Haiying.Kong ▴ 110

score 2 · Accepted Answer · 2017-06-28

2

Entering edit mode

Michael Lawrence ★ 11k

@michael-lawrence-3846

Last seen 2.4 years ago

United States

No, there is no way to do that with reduce() directly. You could use findOverlaps() and specify a minoverlap. Then get the Hits into a graph and find connected components like this:Select from Overlapping Genomic Ranges based on Score or other Metadata.

ADD COMMENT • link 6.8 years ago Michael Lawrence ★ 11k

0

Entering edit mode

Thanks again for your help.

findOverlaps() needs 2 GRanges objects, query and subject, and find overlaps between them. This is slightly different from what I am trying to do. I have only one GRanges object, and the ranges in this object has overlapping genomic regions. I am trying to collapse those with much overlapping regions.

This probably is possible that I do a loop. First, I compare the first row and the second row, and save the result, then compare the 3rd row with the result, and update the result, and 4th row, ........ Is there any easier way of doing this?

ADD REPLY • link 6.8 years ago Haiying.Kong ▴ 110

0

Entering edit mode

Check out that link in my answer. It shows the basics of this approach, including that you can call findOverlaps() with only a single argument.

ADD REPLY • link 6.8 years ago Michael Lawrence ★ 11k