Question: ChIPpeakAnno: findOverlapsOfPeaks keep GRanges metadata
0
gravatar for da.de
4.4 years ago by
da.de10
Austria
da.de10 wrote:

Hi,

I am using the findOverlapsOfPeaks function from the ChIPpeakAnno_3.2.0 package.

Is there a possibility to keep the metadata of the GRanges objects used for the overlap? Would be nice to have it in the output (peaklist GRanges).

Or to at least keep the peak names from the input. For me it seems like findOverlapsOfPeaks is using the name of the GRanges + number for the peak names in the output peaklist.

GRanges1 peak=protA_antibody_1212
GRanges2 peak=protB_antibody_5894
names used in peaklist GRanges1///GRanges2:   GRanges1__1, GRanges2__1
would like to have: protA_antibody_1212, protB_antibody_5894

Thanks for your help!
Dagmar

ADD COMMENTlink modified 4.3 years ago by Hari Easwaran240 • written 4.4 years ago by da.de10
Answer: ChIPpeakAnno: findOverlapsOfPeaks keep GRanges metadata
1
gravatar for Ou, Jianhong
4.3 years ago by
Ou, Jianhong1.1k
United States
Ou, Jianhong1.1k wrote:

Hi Hari,

Please update your R into 3.2.0 or above.

findOverlapsOfPeaks function is introduced in ChIPpeakAnno v3.2.0 (happened to be same as R version) or above.
ADD COMMENTlink written 4.3 years ago by Ou, Jianhong1.1k
Answer: ChIPpeakAnno: findOverlapsOfPeaks keep GRanges metadata
0
gravatar for Guangchuang Yu
4.4 years ago by
Guangchuang Yu1.1k
China/Guangzhou/Southern Medical University
Guangchuang Yu1.1k wrote:

You may interested in ChIPseeker, http://bioinformatics.oxfordjournals.org/cgi/content/abstract/btv145

ADD COMMENTlink written 4.4 years ago by Guangchuang Yu1.1k
Answer: ChIPpeakAnno: findOverlapsOfPeaks keep GRanges metadata
0
gravatar for Ou, Jianhong
4.4 years ago by
Ou, Jianhong1.1k
United States
Ou, Jianhong1.1k wrote:

Hi Dagmar,

It is very interesting. It should be designed to show the peak names in the output of peaklist. If there are duplicated names or NA names, it will convert to peak number. I am not sure I fully understand your case. If you got peak names in GRanges1 and GRanges2, but not got peak names in overlapping peaks, this should be a bug. If this is the case, could you kindly send me your data and the code to repeat that? Thank you.

 

> library(ChIPpeakAnno)
> packageVersion("ChIPpeakAnno")
[1] ‘3.2.0’
> bed <- system.file("extdata", "MACS_output.bed", package="ChIPpeakAnno")
> gr1 <- toGRanges(bed, format="BED", header=FALSE)
> gff <- system.file("extdata", "GFF_peaks.gff", package="ChIPpeakAnno")
> gr2 <- toGRanges(gff, format="GFF", header=FALSE, skip=3)
> head(gr1)
GRanges object with 6 ranges and 1 metadata column:
              seqnames           ranges strand |     score
                 <Rle>        <IRanges>  <Rle> | <numeric>
  MACS_peak_1     chr1 [ 28341,  29610]      * |    160.81
  MACS_peak_2     chr1 [ 90821,  91234]      * |    133.12
  MACS_peak_3     chr1 [134974, 135538]      * |    138.99
  MACS_peak_4     chr1 [136331, 137068]      * |    106.17
  MACS_peak_5     chr1 [137277, 137847]      * |     124.9
  MACS_peak_6     chr1 [326732, 327221]      * |    190.74
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths
> head(gr2)
GRanges object with 6 ranges and 4 metadata columns:
           seqnames           ranges strand |   source     score    frame     group
              <Rle>        <IRanges>  <Rle> | <factor> <integer> <factor>  <factor>
  region_0     chr1 [713893, 714747]      + |  bed2gff         0        . region_0;
  region_1     chr1 [715023, 715578]      + |  bed2gff         0        . region_1;
  region_2     chr1 [724851, 725445]      + |  bed2gff         0        . region_2;
  region_3     chr1 [839467, 840090]      + |  bed2gff         0        . region_3;
  region_4     chr1 [856361, 856999]      + |  bed2gff         0        . region_4;
  region_5     chr1 [859315, 859903]      + |  bed2gff         0        . region_5;
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths
> head(peaklist[[3]])
GRanges object with 6 ranges and 1 metadata column:
      seqnames           ranges strand |                                     peakNames
         <Rle>        <IRanges>  <Rle> |                               <CharacterList>
  [1]     chr1 [713791, 715578]      * | gr1__MACS_peak_13,gr2__region_0,gr2__region_1
  [2]     chr1 [724851, 727191]      * |               gr2__region_2,gr1__MACS_peak_14
  [3]     chr1 [839467, 840090]      * |               gr1__MACS_peak_16,gr2__region_3
  [4]     chr1 [856361, 856999]      * |               gr1__MACS_peak_17,gr2__region_4
  [5]     chr1 [859315, 860144]      * |               gr2__region_5,gr1__MACS_peak_18
  [6]     chr1 [870970, 871568]      * |               gr2__region_7,gr1__MACS_peak_19
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths

> ol <- findOverlapsOfPeaks(gr1, gr2)
> peaklist <- ol$peaklist
> names(peaklist)
[1] "gr2"       "gr1"       "gr1///gr2"
> head(peaklist[[1]])
GRanges object with 6 ranges and 1 metadata column:
      seqnames             ranges strand |       peakNames
         <Rle>          <IRanges>  <Rle> | <CharacterList>
  [1]     chr1 [ 860248,  860833]      + |   gr2__region_6
  [2]     chr1 [ 905647,  906230]      + |  gr2__region_12
  [3]     chr1 [ 908528,  909096]      + |  gr2__region_13
  [4]     chr1 [ 918145,  918733]      + |  gr2__region_16
  [5]     chr1 [ 986902,  987370]      + |  gr2__region_23
  [6]     chr1 [1004125, 1004714]      + |  gr2__region_26
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths

ADD COMMENTlink written 4.4 years ago by Ou, Jianhong1.1k

Hi,
thanks for your reply. I found out why it did not keep the names. I only had them as a metadata column gr1$names not as names(gr1).
I changed this and now it works.
However, any suggestions how to keep the metadata?

ADD REPLYlink written 4.4 years ago by da.de10

Hi,

To keep metadata is difficult. When I develop the package, I should consider about the metadata format. If all the peaks keep same structure of metadata, it is much easy. However, not every time it will work like that. When then peaks contain different metadata, I could not suppose the same column name indicates same info. Even more, if the metadata is contain complex data such as list, it will be time consuming to merge them.

Here, once you got the peakNames as CharacterList, you can extract the metadata as you want. My codes may be

peakNames <- peaklist[[3]]$peakNames
peakNames <- lapply(peakNames, function(.ele) as.data.frame(do.call(rbind, strsplit(.ele, "__"))))
names(peakNames) <- paste("ol", formatC(1:length(peakNames), flag="0", width=nchar(length(peakNames))), sep="")
peakNamesGP <- do.call(rbind, peakNames)
gp1 <- peakNamesGP[peakNamesGP$V1=="gr1",]
gp1.grs <- gr1[gp1$V2]
gp1.gps <- gsub("\\.\\d+$", "", rownames(gp1))
 
ADD REPLYlink written 4.4 years ago by Ou, Jianhong1.1k

Hi,

Regarding on your data table, in particular, the score column, how can I get correct pvalue format on it, and add it as new slot on GRanges objects? Thanks

Best regards

Jurat

 

ADD REPLYlink modified 3.6 years ago • written 3.6 years ago by Jurat Shahidin70

Maybe you can have a try the following code to understand how to add original metadata into the overlapping peaks:

library(ChIPpeakAnno)
path <- system.file("extdata", package="ChIPpeakAnno")
files <- dir(path, "broadPeak")
data <- sapply(file.path(path, files), toGRanges, format="broadPeak")
names(data) <- gsub(".broadPeak", "", files)
ol <- findOverlapsOfPeaks(data)
olpeaks <- ol$peaklist[[length(ol$peaklist)]]
data.GR <- unlist(GRangesList(data), use.names=TRUE)
olpeaks$score <- sapply(olpeaks$peakNames, function(.ele) data.GR[gsub("__", ".", .ele)]$score)
head(olpeaks$score)
ADD REPLYlink written 3.6 years ago by Ou, Jianhong1.1k
Answer: ChIPpeakAnno: findOverlapsOfPeaks keep GRanges metadata
0
gravatar for Hari Easwaran
4.3 years ago by
Hari Easwaran240
United States
Hari Easwaran240 wrote:

I am facing a weird problem with ChIPpeakAnno. I am  to able to fine the function findOverlapsOfPeaks. I get the following:

> findOverlapsOfPeaks
Error: object 'findOverlapsOfPeaks' not found

However, other functions like annotatePeakInBatch can be found.

Any idea what could be wrong. Thanks for your help.

 

> R.Version()
$platform
[1] "x86_64-apple-darwin10.8.0"

$arch
[1] "x86_64"

$os
[1] "darwin10.8.0"

$system
[1] "x86_64, darwin10.8.0"

$status
[1] ""

$major
[1] "3"

$minor
[1] "1.3"

$year
[1] "2015"

$month
[1] "03"

$day
[1] "09"

$`svn rev`
[1] "67962"

$language
[1] "R"

$version.string
[1] "R version 3.1.3 (2015-03-09)"

$nickname
[1] "Smooth Sidewalk"
ADD COMMENTlink modified 4.3 years ago • written 4.3 years ago by Hari Easwaran240
Answer: ChIPpeakAnno: findOverlapsOfPeaks keep GRanges metadata
0
gravatar for Hari Easwaran
4.3 years ago by
Hari Easwaran240
United States
Hari Easwaran240 wrote:

Hi Jianhong, Thanks for your message. It works now. Earlier I had a problem with R3.2 in that it could not find BiocParallel, and I could not install it. Had to restart R and reinstall Bioconductor, ChIPpeakAnno, BiocParallel.... it works now.

Thanks.

 

ADD COMMENTlink modified 4.3 years ago • written 4.3 years ago by Hari Easwaran240
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 321 users visited in the last hour