Question: ChIPseeker question: enrichPeakOverlap to find enrichment of genomic annotations in peaks from GRanges
0
gravatar for francesca casalino
3.6 years ago by
United States
francesca casalino50 wrote:

Hi,

I am using ChIPseeker to find the enrichment of genomic annotations in several Chip peak experiments. I would like to specify my own TxDB and I have used GenomicFeatures to create this (function: makeTxDbFromGRanges). Also, I have used readPeakFile to read all my peaks and save them in a list as the Vignette shows. 

Everything seems to work fine until I try to find enrichment of different genomic annotations within the peaks. I get the following error when I try the command copied below, so it seems as the txdb is not specified correctly (I am trying both the file path for the database and the txdb that I had created). I tried to specify the path both as a file where the sqlite is saved and after loading it using loadDb, but neither works. What am I doing wrong? txdb worked fine when I used it in annotatePeak, so I don't know why it is not working here...

Also, the Vignette shows the use of GRanges directly for other functions in this package (so for example peaks as a list of GRange object), can I also specify GRanges directly in this function too, instead of BED files for this function? 

Thank you.

 

     enrichPeakOverlap(queryPeak=genomicAnnotationsBed, targetPeak=pathBedPeaks,TxDB=txdb, pAdjustMethod = "BH", nShuffle =1000, chainFile= NULL, verbose = FALSE)

# Error using the self-specified txdb:

Error in enrichOverlap.peak.internalquery.gr, target.gr, TxDb, nShuffle,  : 

  unused argument (TxDB = <S4 object of class "TxDb">)

In addition: Warning message:

In loadTxDb(TxDb) :

  >> TxDb is not specified, use 'TxDb.Hsapiens.UCSC.hg19.knownGene' by default...

 

# Error when trying the sqlite path for the txdb:

Error in enrichOverlap.peak.internalquery.gr, target.gr, TxDb, nShuffle,  : 

  unused argument (TxDB = "txdb.sqlite")

In addition: Warning message:

In loadTxDb(TxDb) :

  >> TxDb is not specified, use 'TxDb.Hsapiens.UCSC.hg19.knownGene' by default...

 

 

> sessionInfo()  

R version 3.1.2 (2014-10-31)

Platform: x86_64-unknown-linux-gnu (64-bit)

 

locale:

[1] C

 

attached base packages:

[1] stats4    parallel  stats     graphics  grDevices utils     datasets 

[8] methods   base     

 

other attached packages:

 [1] XVector_0.6.0          ChIPseeker_1.5.11      GenomicFeatures_1.18.7

 [4] AnnotationDbi_1.28.2   Biobase_2.28.0         GenomicRanges_1.18.4  

 [7] GenomeInfoDb_1.2.5     IRanges_2.0.1          S4Vectors_0.4.0       

[10] BiocGenerics_0.12.1   

 

loaded via a namespace (and not attached):

 [1] BBmisc_1.9                             

 [2] BatchJobs_1.6                          

 [3] BiocParallel_1.0.3                     

 [4] Biostrings_2.34.1                      

 [5] DBI_0.3.1                              

 [6] GenomicAlignments_1.2.2                

 [7] KernSmooth_2.23-13                     

 [8] MASS_7.3-35                            

 [9] R6_2.1.1                               

[10] RColorBrewer_1.1-2                     

[11] RCurl_1.95-4.7                         

[12] RSQLite_1.0.0                          

[13] Rcpp_0.12.1                            

[14] Rsamtools_1.18.3                       

[15] TxDb.Hsapiens.UCSC.hg19.knownGene_3.0.0

[16] UpSetR_0.0.5                           

[17] XML_3.98-1.3                           

[18] assertthat_0.1                         

[19] base64enc_0.1-3                        

[20] biomaRt_2.24.1                         

[21] bitops_1.0-6                           

[22] boot_1.3-13                            

[23] brew_1.0-6                             

[24] caTools_1.17.1                         

[25] checkmate_1.6.2                        

[26] codetools_0.2-9                        

[27] colorspace_1.2-6                       

[28] digest_0.6.8                           

[29] dplyr_0.4.3                            

[30] fail_1.3                               

[31] foreach_1.4.2                          

[32] gdata_2.17.0                           

[33] ggplot2_1.0.1                          

[34] gplots_2.17.0                          

[35] grid_3.1.2                             

[36] gridBase_0.4-7                         

[37] gridExtra_2.0.0                        

[38] gtable_0.1.2                           

[39] gtools_3.5.0                           

[40] iterators_1.0.7                        

[41] magrittr_1.5                           

[42] munsell_0.4.2                          

[43] plotrix_3.5-12                         

[44] plyr_1.8.3                             

[45] proto_0.3-10                           

[46] reshape2_1.4.1                         

[47] rtracklayer_1.26.3                     

[48] scales_0.3.0                           

[49] sendmailR_1.2-1                        

[50] stringi_0.5-5                          

[51] stringr_1.0.0                          

[52] tools_3.1.2                            

[53] zlibbioc_1.12.0                       

chipseeker • 2.0k views
ADD COMMENTlink modified 3.6 years ago by Guangchuang Yu1.1k • written 3.6 years ago by francesca casalino50
Answer: ChIPseeker question: enrichPeakOverlap to find enrichment of genomic annotations
2
gravatar for Hervé Pagès
3.6 years ago by
Hervé Pagès ♦♦ 13k
United States
Hervé Pagès ♦♦ 13k wrote:

Hi Francesca,

The error message is pretty clear:

  Error in enrichOverlap.peak.internalquery.gr, target.gr, TxDb, nShuffle,  : 

    unused argument (TxDB = <S4 object of class "TxDb">)

The enrichPeakOverlap() function has no TxDB argument. The correct spelling for this argument is TxDb.

@Guangchuang Yu: Do you really need the ellipsis at the end of your argument list? The man page for enrichPeakOverlap() only says "additional parameter" which leaves the user with no clue about what kind of additional parameter can be specified. Furthermore it seems to me that any additional parameter is actually ignored. Would you consider getting rid of the ellipsis? Then the user would get a more straightforward error in case s/he misspells an argument. Thanks!

H.

ADD COMMENTlink written 3.6 years ago by Hervé Pagès ♦♦ 13k

The ellipsis parameter is only used to specify mc.cores and whether verbose info will be printed.

I will remove the ellipsis parameter as you suggested.

 

Thanks!

Guangchuang
 

ADD REPLYlink written 3.6 years ago by Guangchuang Yu1.1k
Answer: ChIPseeker question: enrichPeakOverlap to find enrichment of genomic annotations
0
gravatar for francesca casalino
3.6 years ago by
United States
francesca casalino50 wrote:

Thank you Herve', it is working now with the BED files and the TxDb argument in the correct spelling (I only get a warning copied below which I am not sure what it means...)

But I was wondering if I could specify GRanges directly instead of having to save the BED files or a TxDb object, or whether I need to change the function to be able to use the GRanges instead of BED files.

I am asking because the annotations I use for queryPeak are GRanges and the peaks list for targetPeak is a list of GRange objects as shown in the Vignette. To use this function, I need to save the annotations I am using in the targetPeak as BED files first, so I was just wondering whether there is a simpler way I am not seeing...

And for the TxDb, I am trying to modify the functions for example of "getPromoters" to be able to apply these directly to GRange objects since also my TxDb is in Grange...

Thank you!!

 

Warning message:

In mclapply(seq_along(idx), function(j) { :

  all scheduled cores encountered errors in user code

ADD COMMENTlink modified 3.6 years ago • written 3.6 years ago by francesca casalino50
1

almost all functions in ChIPseeker support both bed file name and GRanges object. Unfortunately enrichPeakOverlap() currently only support bed file since I believe people will use it to compare their own file with other bed files downloaded from GEO.

It's easy to extend it to support both BED and GRanges. This feature will be available in next release.

For getPromoters(), it use transcriptsBy(TxDb) or genes(TxDb) at transcript or gene level respectively, which is actually a GRanges object. But I will not extend it to support GRanges. TxDb contains more information needed for ChIP annotation and all functions in ChIPseeker is consistently support TxDb object. You can just apply makeTxDbFromGRanges() function to convert it as a TxDb object.

 

For the warning msg, ChIPseeker internally use mclapply for parallel, but unfortunately sqlite doesn't support parallel. I think you should specify mc.cores=1 to disable the parallel. I still figuring how to parallel enrichPeakOverlap in a safe way.

 

Guangchuang

 

 

 

 

 

 

ADD REPLYlink written 3.6 years ago by Guangchuang Yu1.1k
Answer: ChIPseeker question: enrichPeakOverlap to find enrichment of genomic annotations
0
gravatar for francesca casalino
3.6 years ago by
United States
francesca casalino50 wrote:

Thank you very much!

 

My reasoning for asking about getPromoters is because when I use this function after subsetting my GRange object to only include exons and converting to TxDb (using the function you suggest), and then apply getTagMatrix using the result of getPromoters, I get an error, is it because I need to split the GRange object first by something before creating the TxDb?

Thanks again very much for all the help, and for the brilliant package.

 

getTagMatrix(peak, windows=promoter)

 

Error in split.default(1:length(windows), as.factor(seqnames(windows))) : 

  group length is 0 but data length > 0

In addition: Warning message:

In .Seqinfo.mergexy(x, y) :

  The 2 combined objects have no sequence levels in common. (Use

 

  suppressWarnings() to suppress this warning.)

ADD COMMENTlink modified 3.6 years ago • written 3.6 years ago by francesca casalino50

you should provide an example to reproduce this error for debugging what's going on.

ADD REPLYlink written 3.6 years ago by Guangchuang Yu1.1k
Answer: ChIPseeker question: enrichPeakOverlap to find enrichment of genomic annotations
0
gravatar for francesca casalino
3.6 years ago by
United States
francesca casalino50 wrote:

Yes sorry, I will try to explain better what I am trying to do with the code, I think my error comes from converting the incorrect TxDb from GenomicFeatures that needs to be used to getPromoter function...

    library("GenomicFeatures")

    # Download the version 19 gencode gtf file from gencode website (ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_19/gencode.v19.annotation.gtf.gz)

    gencode = makeTxDbFromGFF("path/gtf/gencode.gtf")
    # Select exons
    exons.gencode = exons(gencode)

    # Add a "type" so that can make TxDb
    exons.gencode$type="exon_id"

    txdb=makeTxDbFromGRanges(exons.gencode)

    saveDb(gencode, file="exon.gencode_human_v19.sqlite")


    # Now use ChIPseeker to find promoters and overlap with peaks
    library(ChIPseeker)
    files <- getSampleFiles()
    peak <- readPeakFile(files[[4]])
    tagMatrix <- tagMatrixList[[4]]


    txdb = loadDb("exon.gencode_human_v19.sqlite")
   promoter <- getPromoters(TxDb=txdb, upstream=3000, downstream=3000)
   tagMatrix <- getTagMatrix(peak, windows=promoter)

ADD COMMENTlink modified 3.6 years ago • written 3.6 years ago by francesca casalino50

As I mentioned above, getPromoters internally call transcriptsBy(txdb) or genes(txdb).

If you check with your exon txdb, you will find that transcriptsBy() and genes()  return nothing.

 

> genes(txdb)
GRanges object with 0 ranges and 1 metadata column:
   seqnames    ranges strand |     gene_id
      <Rle> <IRanges>  <Rle> | <character>
  -------
  seqinfo: 25 sequences (1 circular) from an unspecified genome; no seqlengths
> transcriptsBy(txdb)
GRangesList object of length 0:
<0 elements>

-------
seqinfo: 25 sequences (1 circular) from an unspecified genome; no seqlengths

 

Even exons() can't return anything.
> exons(txdb)
GRanges object with 0 ranges and 1 metadata column:
   seqnames    ranges strand |   exon_id
      <Rle> <IRanges>  <Rle> | <integer>
  -------
  seqinfo: 25 sequences (1 circular) from an unspecified genome; no seqlengths

 

This is why getTagMatrix throw error of 'length of 0'.

 

With the gencode which build from gtf file, it all works fine. You should use this gencode one. I can modify the getPromoters() to accept 'by="exon"' to specify using exon instead of gene/transcript, so that you can use

 

   promoter <- getPromoters(TxDb=gencode, upstream=3000, downstream=3000, by="exon")
   tagMatrix <- getTagMatrix(peak, windows=promoter)

 

to get the tagMatrix at exon level.

 

As new release is approaching and BioC is now feature freeze, I will update the source code in devel branch after Bioc 3.2 release.

 

 

 

 

ADD REPLYlink modified 3.6 years ago • written 3.6 years ago by Guangchuang Yu1.1k
Answer: ChIPseeker question: enrichPeakOverlap to find enrichment of genomic annotations
0
gravatar for francesca casalino
3.6 years ago by
United States
francesca casalino50 wrote:

Oh I see, sorry... it makes sense now. 

Thank you so much for your help and the quick replies!

I look forward to extensive use of this package!

ADD COMMENTlink modified 3.6 years ago • written 3.6 years ago by francesca casalino50
Answer: ChIPseeker question: enrichPeakOverlap to find enrichment of genomic annotations
0
gravatar for Guangchuang Yu
3.6 years ago by
Guangchuang Yu1.1k
China/Guangzhou/Southern Medical University
Guangchuang Yu1.1k wrote:

centering all peaks to the start region of exon is now supported, see https://github.com/GuangchuangYu/ChIPseeker/issues/16

ADD COMMENTlink written 3.6 years ago by Guangchuang Yu1.1k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 109 users visited in the last hour