Hi Dagmar,
It is very interesting. It should be designed to show the peak names in the output of peaklist. If there are duplicated names or NA names, it will convert to peak number. I am not sure I fully understand your case. If you got peak names in GRanges1 and GRanges2, but not got peak names in overlapping peaks, this should be a bug. If this is the case, could you kindly send me your data and the code to repeat that? Thank you.
> library(ChIPpeakAnno)
> packageVersion("ChIPpeakAnno")
[1] ‘3.2.0’
> bed <- system.file("extdata", "MACS_output.bed", package="ChIPpeakAnno")
> gr1 <- toGRanges(bed, format="BED", header=FALSE)
> gff <- system.file("extdata", "GFF_peaks.gff", package="ChIPpeakAnno")
> gr2 <- toGRanges(gff, format="GFF", header=FALSE, skip=3)
> head(gr1)
GRanges object with 6 ranges and 1 metadata column:
seqnames ranges strand | score
<Rle> <IRanges> <Rle> | <numeric>
MACS_peak_1 chr1 [ 28341, 29610] * | 160.81
MACS_peak_2 chr1 [ 90821, 91234] * | 133.12
MACS_peak_3 chr1 [134974, 135538] * | 138.99
MACS_peak_4 chr1 [136331, 137068] * | 106.17
MACS_peak_5 chr1 [137277, 137847] * | 124.9
MACS_peak_6 chr1 [326732, 327221] * | 190.74
-------
seqinfo: 1 sequence from an unspecified genome; no seqlengths
> head(gr2)
GRanges object with 6 ranges and 4 metadata columns:
seqnames ranges strand | source score frame group
<Rle> <IRanges> <Rle> | <factor> <integer> <factor> <factor>
region_0 chr1 [713893, 714747] + | bed2gff 0 . region_0;
region_1 chr1 [715023, 715578] + | bed2gff 0 . region_1;
region_2 chr1 [724851, 725445] + | bed2gff 0 . region_2;
region_3 chr1 [839467, 840090] + | bed2gff 0 . region_3;
region_4 chr1 [856361, 856999] + | bed2gff 0 . region_4;
region_5 chr1 [859315, 859903] + | bed2gff 0 . region_5;
-------
seqinfo: 1 sequence from an unspecified genome; no seqlengths
> head(peaklist[[3]])
GRanges object with 6 ranges and 1 metadata column:
seqnames ranges strand | peakNames
<Rle> <IRanges> <Rle> | <CharacterList>
[1] chr1 [713791, 715578] * | gr1__MACS_peak_13,gr2__region_0,gr2__region_1
[2] chr1 [724851, 727191] * | gr2__region_2,gr1__MACS_peak_14
[3] chr1 [839467, 840090] * | gr1__MACS_peak_16,gr2__region_3
[4] chr1 [856361, 856999] * | gr1__MACS_peak_17,gr2__region_4
[5] chr1 [859315, 860144] * | gr2__region_5,gr1__MACS_peak_18
[6] chr1 [870970, 871568] * | gr2__region_7,gr1__MACS_peak_19
-------
seqinfo: 1 sequence from an unspecified genome; no seqlengths
> ol <- findOverlapsOfPeaks(gr1, gr2)
> peaklist <- ol$peaklist
> names(peaklist)
[1] "gr2" "gr1" "gr1///gr2"
> head(peaklist[[1]])
GRanges object with 6 ranges and 1 metadata column:
seqnames ranges strand | peakNames
<Rle> <IRanges> <Rle> | <CharacterList>
[1] chr1 [ 860248, 860833] + | gr2__region_6
[2] chr1 [ 905647, 906230] + | gr2__region_12
[3] chr1 [ 908528, 909096] + | gr2__region_13
[4] chr1 [ 918145, 918733] + | gr2__region_16
[5] chr1 [ 986902, 987370] + | gr2__region_23
[6] chr1 [1004125, 1004714] + | gr2__region_26
-------
seqinfo: 1 sequence from an unspecified genome; no seqlengths
Hi,
thanks for your reply. I found out why it did not keep the names. I only had them as a metadata column gr1$names not as names(gr1).
I changed this and now it works.
However, any suggestions how to keep the metadata?
Hi,
To keep metadata is difficult. When I develop the package, I should consider about the metadata format. If all the peaks keep same structure of metadata, it is much easy. However, not every time it will work like that. When then peaks contain different metadata, I could not suppose the same column name indicates same info. Even more, if the metadata is contain complex data such as list, it will be time consuming to merge them.
Here, once you got the peakNames as CharacterList, you can extract the metadata as you want. My codes may be
Hi,
Regarding on your data table, in particular, the score column, how can I get correct pvalue format on it, and add it as new slot on GRanges objects? Thanks
Best regards
Jurat
Maybe you can have a try the following code to understand how to add original metadata into the overlapping peaks: