Question

ChIPseeker question: enrichPeakOverlap to find enrichment of genomic annotations in peaks from GRanges

0

Entering edit mode

francesca casalino ▴ 50

@francesca-casalino-4984

Last seen 12 months ago

United States

Hi,

I am using ChIPseeker to find the enrichment of genomic annotations in several Chip peak experiments. I would like to specify my own TxDB and I have used GenomicFeatures to create this (function: makeTxDbFromGRanges). Also, I have used readPeakFile to read all my peaks and save them in a list as the Vignette shows.

Everything seems to work fine until I try to find enrichment of different genomic annotations within the peaks. I get the following error when I try the command copied below, so it seems as the txdb is not specified correctly (I am trying both the file path for the database and the txdb that I had created). I tried to specify the path both as a file where the sqlite is saved and after loading it using loadDb, but neither works. What am I doing wrong? txdb worked fine when I used it in annotatePeak, so I don't know why it is not working here...

Also, the Vignette shows the use of GRanges directly for other functions in this package (so for example peaks as a list of GRange object), can I also specify GRanges directly in this function too, instead of BED files for this function?

Thank you.

enrichPeakOverlap(queryPeak=genomicAnnotationsBed, targetPeak=pathBedPeaks,TxDB=txdb, pAdjustMethod = "BH", nShuffle =1000, chainFile= NULL, verbose = FALSE)

# Error using the self-specified txdb:

Error in enrichOverlap.peak.internalquery.gr, target.gr, TxDb, nShuffle, :

unused argument (TxDB = <S4 object of class "TxDb">)

In addition: Warning message:

In loadTxDb(TxDb) :

>> TxDb is not specified, use 'TxDb.Hsapiens.UCSC.hg19.knownGene' by default...

# Error when trying the sqlite path for the txdb:

Error in enrichOverlap.peak.internalquery.gr, target.gr, TxDb, nShuffle, :

unused argument (TxDB = "txdb.sqlite")

In addition: Warning message:

In loadTxDb(TxDb) :

>> TxDb is not specified, use 'TxDb.Hsapiens.UCSC.hg19.knownGene' by default...

> sessionInfo()

R version 3.1.2 (2014-10-31)

Platform: x86_64-unknown-linux-gnu (64-bit)

locale:

[1] C

attached base packages:

[1] stats4 parallel stats graphics grDevices utils datasets

[8] methods base

other attached packages:

[1] XVector_0.6.0 ChIPseeker_1.5.11 GenomicFeatures_1.18.7

[4] AnnotationDbi_1.28.2 Biobase_2.28.0 GenomicRanges_1.18.4

[7] GenomeInfoDb_1.2.5 IRanges_2.0.1 S4Vectors_0.4.0

[10] BiocGenerics_0.12.1

loaded via a namespace (and not attached):

[1] BBmisc_1.9

[2] BatchJobs_1.6

[3] BiocParallel_1.0.3

[4] Biostrings_2.34.1

[5] DBI_0.3.1

[6] GenomicAlignments_1.2.2

[7] KernSmooth_2.23-13

[8] MASS_7.3-35

[9] R6_2.1.1

[10] RColorBrewer_1.1-2

[11] RCurl_1.95-4.7

[12] RSQLite_1.0.0

[13] Rcpp_0.12.1

[14] Rsamtools_1.18.3

[15] TxDb.Hsapiens.UCSC.hg19.knownGene_3.0.0

[16] UpSetR_0.0.5

[17] XML_3.98-1.3

[18] assertthat_0.1

[19] base64enc_0.1-3

[20] biomaRt_2.24.1

[21] bitops_1.0-6

[22] boot_1.3-13

[23] brew_1.0-6

[24] caTools_1.17.1

[25] checkmate_1.6.2

[26] codetools_0.2-9

[27] colorspace_1.2-6

[28] digest_0.6.8

[29] dplyr_0.4.3

[30] fail_1.3

[31] foreach_1.4.2

[32] gdata_2.17.0

[33] ggplot2_1.0.1

[34] gplots_2.17.0

[35] grid_3.1.2

[36] gridBase_0.4-7

[37] gridExtra_2.0.0

[38] gtable_0.1.2

[39] gtools_3.5.0

[40] iterators_1.0.7

[41] magrittr_1.5

[42] munsell_0.4.2

[43] plotrix_3.5-12

[44] plyr_1.8.3

[45] proto_0.3-10

[46] reshape2_1.4.1

[47] rtracklayer_1.26.3

[48] scales_0.3.0

[49] sendmailR_1.2-1

[50] stringi_0.5-5

[51] stringr_1.0.0

[52] tools_3.1.2

[53] zlibbioc_1.12.0

chipseeker • 4.5k views

ADD COMMENT • link updated 8.5 years ago by Guangchuang Yu ★ 1.2k • written 8.6 years ago by francesca casalino ▴ 50

score 2 · Answer 1 · 2015-10-07

Hi Francesca,

The error message is pretty clear:

  Error in enrichOverlap.peak.internalquery.gr, target.gr, TxDb, nShuffle,  : 

    unused argument (TxDB = <S4 object of class "TxDb">)

The enrichPeakOverlap() function has no TxDB argument. The correct spelling for this argument is TxDb.

@Guangchuang Yu: Do you really need the ellipsis at the end of your argument list? The man page for enrichPeakOverlap() only says "additional parameter" which leaves the user with no clue about what kind of additional parameter can be specified. Furthermore it seems to me that any additional parameter is actually ignored. Would you consider getting rid of the ellipsis? Then the user would get a more straightforward error in case s/he misspells an argument. Thanks!

H.

score 0 · Answer 2 · 2015-10-07

0

Entering edit mode

francesca casalino ▴ 50

@francesca-casalino-4984

Last seen 12 months ago

United States

Thank you Herve', it is working now with the BED files and the TxDb argument in the correct spelling (I only get a warning copied below which I am not sure what it means...)

But I was wondering if I could specify GRanges directly instead of having to save the BED files or a TxDb object, or whether I need to change the function to be able to use the GRanges instead of BED files.

I am asking because the annotations I use for queryPeak are GRanges and the peaks list for targetPeak is a list of GRange objects as shown in the Vignette. To use this function, I need to save the annotations I am using in the targetPeak as BED files first, so I was just wondering whether there is a simpler way I am not seeing...

And for the TxDb, I am trying to modify the functions for example of "getPromoters" to be able to apply these directly to GRange objects since also my TxDb is in Grange...

Thank you!!

Warning message:

In mclapply(seq_along(idx), function(j) { :

all scheduled cores encountered errors in user code

ADD COMMENT • link 8.6 years ago francesca casalino ▴ 50

1

Entering edit mode

almost all functions in ChIPseeker support both bed file name and GRanges object. Unfortunately enrichPeakOverlap() currently only support bed file since I believe people will use it to compare their own file with other bed files downloaded from GEO.

It's easy to extend it to support both BED and GRanges. This feature will be available in next release.

For getPromoters(), it use transcriptsBy(TxDb) or genes(TxDb) at transcript or gene level respectively, which is actually a GRanges object. But I will not extend it to support GRanges. TxDb contains more information needed for ChIP annotation and all functions in ChIPseeker is consistently support TxDb object. You can just apply makeTxDbFromGRanges() function to convert it as a TxDb object.

For the warning msg, ChIPseeker internally use mclapply for parallel, but unfortunately sqlite doesn't support parallel. I think you should specify mc.cores=1 to disable the parallel. I still figuring how to parallel enrichPeakOverlap in a safe way.

Guangchuang

ADD REPLY • link 8.6 years ago Guangchuang Yu ★ 1.2k

score 0 · Answer 3 · 2015-10-08

Thank you very much!

My reasoning for asking about getPromoters is because when I use this function after subsetting my GRange object to only include exons and converting to TxDb (using the function you suggest), and then apply getTagMatrix using the result of getPromoters, I get an error, is it because I need to split the GRange object first by something before creating the TxDb?

Thanks again very much for all the help, and for the brilliant package.

getTagMatrix(peak, windows=promoter)

Error in split.default(1:length(windows), as.factor(seqnames(windows))) :

group length is 0 but data length > 0

In addition: Warning message:

In .Seqinfo.mergexy(x, y) :

The 2 combined objects have no sequence levels in common. (Use

suppressWarnings() to suppress this warning.)

score 0 · Answer 4 · 2015-10-09

0

Entering edit mode

francesca casalino ▴ 50

@francesca-casalino-4984

Last seen 12 months ago

United States

Yes sorry, I will try to explain better what I am trying to do with the code, I think my error comes from converting the incorrect TxDb from GenomicFeatures that needs to be used to getPromoter function...

library("GenomicFeatures")

# Download the version 19 gencode gtf file from gencode website (ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_19/gencode.v19.annotation.gtf.gz)

gencode = makeTxDbFromGFF("path/gtf/gencode.gtf")
# Select exons
exons.gencode = exons(gencode)

# Add a "type" so that can make TxDb
exons.gencode$type="exon_id"

txdb=makeTxDbFromGRanges(exons.gencode)

saveDb(gencode, file="exon.gencode_human_v19.sqlite")

# Now use ChIPseeker to find promoters and overlap with peaks
library(ChIPseeker)
files <- getSampleFiles()
peak <- readPeakFile(files[[4]])
tagMatrix <- tagMatrixList[[4]]

txdb = loadDb("exon.gencode_human_v19.sqlite")
promoter <- getPromoters(TxDb=txdb, upstream=3000, downstream=3000)
tagMatrix <- getTagMatrix(peak, windows=promoter)

ADD COMMENT • link 8.6 years ago francesca casalino ▴ 50

0

Entering edit mode

As I mentioned above, getPromoters internally call transcriptsBy(txdb) or genes(txdb).

If you check with your exon txdb, you will find that transcriptsBy() and genes() return nothing.

> genes(txdb)
GRanges object with 0 ranges and 1 metadata column:
seqnames ranges strand | gene_id
<Rle> <IRanges> <Rle> | <character>
-------
seqinfo: 25 sequences (1 circular) from an unspecified genome; no seqlengths
> transcriptsBy(txdb)
GRangesList object of length 0:
<0 elements>

-------
seqinfo: 25 sequences (1 circular) from an unspecified genome; no seqlengths

Even exons() can't return anything.
> exons(txdb)
GRanges object with 0 ranges and 1 metadata column:
seqnames ranges strand | exon_id
<Rle> <IRanges> <Rle> | <integer>
-------
seqinfo: 25 sequences (1 circular) from an unspecified genome; no seqlengths

This is why getTagMatrix throw error of 'length of 0'.

With the gencode which build from gtf file, it all works fine. You should use this gencode one. I can modify the getPromoters() to accept 'by="exon"' to specify using exon instead of gene/transcript, so that you can use

promoter <- getPromoters(TxDb=gencode, upstream=3000, downstream=3000, by="exon")
tagMatrix <- getTagMatrix(peak, windows=promoter)

to get the tagMatrix at exon level.

As new release is approaching and BioC is now feature freeze, I will update the source code in devel branch after Bioc 3.2 release.

ADD REPLY • link 8.5 years ago Guangchuang Yu ★ 1.2k

score 0 · Answer 5 · 2015-10-11

0

Entering edit mode

francesca casalino ▴ 50

@francesca-casalino-4984

Last seen 12 months ago

United States

Oh I see, sorry... it makes sense now.

Thank you so much for your help and the quick replies!

I look forward to extensive use of this package!

ADD COMMENT • link 8.5 years ago francesca casalino ▴ 50

score 0 · Answer 6 · 2015-10-19

0

Entering edit mode

Guangchuang Yu ★ 1.2k

@guangchuang-yu-5419

Last seen 1 day ago

China/Guangzhou/Southern Medical Univer…

centering all peaks to the start region of exon is now supported, see https://github.com/GuangchuangYu/ChIPseeker/issues/16

ADD COMMENT • link 8.5 years ago Guangchuang Yu ★ 1.2k