ChIPQC gives Error subscript contains out-of-bounds ranges with custom transcript database
1
0
Entering edit mode
tofukaj • 0
@tofukaj-9928
Last seen 4.5 years ago

Dear Great Helpers,

I've tried 'ChIPQC' package with chip-seq data of the cow. I've created the annotation TxDb from 'gtf' file acquired from ensembl. The codes and results are as following:

   R
    library(GenomicRanges)
    library(GenomicFeatures)
    library(ChIPQC)
    
    # Create TxDb annotation
    btaurus_txdb <- makeTxDbFromGFF('Bos_taurus.UMD3.1.89.gtf.gz')     
    
    # Determine locations of bam and bed files
    btaurus_txdb <- loadDb(file="btaurus_txdb.sqlite")
    bamFile <- file.path(maindir, "macs2/bowtie2/SRR1977480.sorted.bam")
    bedFile <- file.path(maindir, "macs2/bowtie2/SRRSRR1977480K4peak_summits.bed")
    
    # ChIPQC analysis
    exampleExp <- ChIPQCsample(bamFile, peaks=bedFile, annotation=list(btaurus_txdb), chromosomes=NULL)

By the script provided above, I've got the results and errors as following:

   Removing 2 chromosomes with length less than cross-coverage shift
    Bam file has 3315 contigs
    Calculating coverage histogram for 1
    
    Calculating SSD for 1
    
    Calculating unique positions per strand for 1
    
    Calculating shift for 1
    
     300 / 300
    Counting reads in features for 1
    
    Signal over peaks for 1
    
    done
    Calculating coverage
    Calculating Summits on  1  ..Calculating coverage histogram for 10
    
    Calculating SSD for 10
    
    Calculating unique positions per strand for 10
    
    Calculating shift for 10
    
     300 / 300
    Counting reads in features for 10
    
    Signal over peaks for 10
    
    done
    Calculating coverage
    Calculating Summits on  10  ..Calculating coverage histogram for 11
    
    Calculating SSD for 11
    
    Calculating unique positions per strand for 11
    
    Calculating shift for 11
    
     300 / 300
    Counting reads in features for 11
    
    Signal over peaks for 11
    
    done
    Calculating coverage
    Calculating Summits on  11  ..Error: subscript contains out-of-bounds ranges
    In addition: Warning messages:
    1: In BioC 3.5, the 'force' argument was replaced by the more flexible
      'pruning.mode' argument, and is deprecated. See ?seqinfo for the
      supported pruning modes. Note that 'force=TRUE' is equivalent to
      'pruning.mode="coarse"'. 
    2: In BioC 3.5, the 'force' argument was replaced by the more flexible
      'pruning.mode' argument, and is deprecated. See ?seqinfo for the
      supported pruning modes. Note that 'force=TRUE' is equivalent to
      'pruning.mode="coarse"'. 
    3: In BioC 3.5, the 'force' argument was replaced by the more flexible
      'pruning.mode' argument, and is deprecated. See ?seqinfo for the
      supported pruning modes. Note that 'force=TRUE' is equivalent to
      'pruning.mode="coarse"'. 
    4: In BioC 3.5, the 'force' argument was replaced by the more flexible
      'pruning.mode' argument, and is deprecated. See ?seqinfo for the
      supported pruning modes. Note that 'force=TRUE' is equivalent to
      'pruning.mode="coarse"'.

Would you please help me figure out what is happening and how to fix it properly? I must confess that I'm just the starter in this field with very poor experience. If my question is not properly managed, I would like to apologize you all in advance.

Best Regards,
Kaj

    sessionInfo()
    R version 3.4.0 (2017-04-21)
    Platform: x86_64-pc-linux-gnu (64-bit)
    Running under: Ubuntu 16.04.2 LTS
    
    Matrix products: default
    BLAS: /usr/lib/libblas/libblas.so.3.6.0
    LAPACK: /usr/lib/lapack/liblapack.so.3.6.0
    
    locale:
     [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
     [3] LC_TIME=th_TH.UTF-8        LC_COLLATE=en_US.UTF-8    
     [5] LC_MONETARY=th_TH.UTF-8    LC_MESSAGES=en_US.UTF-8   
     [7] LC_PAPER=th_TH.UTF-8       LC_NAME=C                 
     [9] LC_ADDRESS=C               LC_TELEPHONE=C            
    [11] LC_MEASUREMENT=th_TH.UTF-8 LC_IDENTIFICATION=C       
    
    attached base packages:
    [1] parallel  stats4    stats     graphics  grDevices utils     datasets 
    [8] methods   base     
    
    other attached packages:
     [1] rtracklayer_1.36.2         ChIPQC_1.12.0             
     [3] DiffBind_2.4.2             SummarizedExperiment_1.6.1
     [5] DelayedArray_0.2.4         matrixStats_0.52.2        
     [7] ggplot2_2.2.1              GenomicFeatures_1.28.0    
     [9] AnnotationDbi_1.38.0       Biobase_2.36.2            
    [11] GenomicRanges_1.28.2       GenomeInfoDb_1.12.0       
    [13] IRanges_2.10.1             S4Vectors_0.14.1          
    [15] BiocGenerics_0.22.0
chipqc • 2.3k views
ADD COMMENT
1
Entering edit mode
@thomas-carroll-7019
Last seen 2.1 years ago
United States/New York/The Rockefeller …
hi Kaj, I think there is too problems here actually. As of yet, ChIPQC does not accept TxDB directly, please see this thread to create your own annotation. creating custom genome annotation for the ChIPQC package. <https: support.bioconductor.org="" p="" 70893=""/>Not entirely sure how your version is working but will test this out. We have the code to use TxDB directly and will include in the next version. Hopefully the instructions in thread will explain. The second problem i think is unrelated to the warnings but in summit identification. This looks too have occurred when ChIP is looking for peak summits but here you are providing the MACs output for peak summits directly which could have caused a problem. Does the problem occur when using the actual peaks.bed file? Could you share the *summits.bed and *peaks.bed with me privately at tc.infomatics@gmail.com and i can take look and try and recreate the issue. best, tom On Thu, Jun 1, 2017 at 6:50 PM, tofukaj [bioc] <noreply@bioconductor.org> wrote: > Activity on a post you are following on support.bioconductor.org > > User tofukaj <https: support.bioconductor.org="" u="" 9928=""/> wrote Question: > ChIPQC gives Error subscript contains out-of-bounds ranges with custom > transcript database <https: support.bioconductor.org="" p="" 96555=""/>: > > Dear Great Helpers, > > I've tried 'ChIPQC' package with chip-seq data of the cow. I've created > the annotation TxDb from 'gtf' file acquired from ensembl. The codes and > results are as following: > > R > library(GenomicRanges) > library(GenomicFeatures) > library(ChIPQC) > > # Create TxDb annotation > btaurus_txdb <- makeTxDbFromGFF('Bos_taurus.UMD3.1.89.gtf.gz') > > # Determine locations of bam and bed files > btaurus_txdb <- loadDb(file="btaurus_txdb.sqlite") > bamFile <- file.path(maindir, "macs2/bowtie2/SRR1977480.sorted.bam") > bedFile <- file.path(maindir, "macs2/bowtie2/SRRSRR1977480K4peak_summits.bed") > > # ChIPQC analysis > exampleExp <- ChIPQCsample(bamFile, peaks=bedFile, annotation=list(btaurus_txdb), chromosomes=NULL) > > By the script provided above, I've got the results and errors as following: > > Removing 2 chromosomes with length less than cross-coverage shift > Bam file has 3315 contigs > Calculating coverage histogram for 1 > > Calculating SSD for 1 > > Calculating unique positions per strand for 1 > > Calculating shift for 1 > > 300 / 300 > Counting reads in features for 1 > > Signal over peaks for 1 > > done > Calculating coverage > Calculating Summits on 1 ..Calculating coverage histogram for 10 > > Calculating SSD for 10 > > Calculating unique positions per strand for 10 > > Calculating shift for 10 > > 300 / 300 > Counting reads in features for 10 > > Signal over peaks for 10 > > done > Calculating coverage > Calculating Summits on 10 ..Calculating coverage histogram for 11 > > Calculating SSD for 11 > > Calculating unique positions per strand for 11 > > Calculating shift for 11 > > 300 / 300 > Counting reads in features for 11 > > Signal over peaks for 11 > > done > Calculating coverage > Calculating Summits on 11 ..Error: subscript contains out-of-bounds ranges > In addition: Warning messages: > 1: In BioC 3.5, the 'force' argument was replaced by the more flexible > 'pruning.mode' argument, and is deprecated. See ?seqinfo for the > supported pruning modes. Note that 'force=TRUE' is equivalent to > 'pruning.mode="coarse"'. > 2: In BioC 3.5, the 'force' argument was replaced by the more flexible > 'pruning.mode' argument, and is deprecated. See ?seqinfo for the > supported pruning modes. Note that 'force=TRUE' is equivalent to > 'pruning.mode="coarse"'. > 3: In BioC 3.5, the 'force' argument was replaced by the more flexible > 'pruning.mode' argument, and is deprecated. See ?seqinfo for the > supported pruning modes. Note that 'force=TRUE' is equivalent to > 'pruning.mode="coarse"'. > 4: In BioC 3.5, the 'force' argument was replaced by the more flexible > 'pruning.mode' argument, and is deprecated. See ?seqinfo for the > supported pruning modes. Note that 'force=TRUE' is equivalent to > 'pruning.mode="coarse"'. > > Would you please help me figure out what is happening and how to fix it > properly? I must confess that I'm just the starter in this field with very > poor experience. If my question is not properly managed, I would like to > apologize you all in advance. > > Best Regards, > Kaj > > sessionInfo() > R version 3.4.0 (2017-04-21) > Platform: x86_64-pc-linux-gnu (64-bit) > Running under: Ubuntu 16.04.2 LTS > > Matrix products: default > BLAS: /usr/lib/libblas/libblas.so.3.6.0 > LAPACK: /usr/lib/lapack/liblapack.so.3.6.0 > > locale: > [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C > [3] LC_TIME=th_TH.UTF-8 LC_COLLATE=en_US.UTF-8 > [5] LC_MONETARY=th_TH.UTF-8 LC_MESSAGES=en_US.UTF-8 > [7] LC_PAPER=th_TH.UTF-8 LC_NAME=C > [9] LC_ADDRESS=C LC_TELEPHONE=C > [11] LC_MEASUREMENT=th_TH.UTF-8 LC_IDENTIFICATION=C > > attached base packages: > [1] parallel stats4 stats graphics grDevices utils datasets > [8] methods base > > other attached packages: > [1] rtracklayer_1.36.2 ChIPQC_1.12.0 > [3] DiffBind_2.4.2 SummarizedExperiment_1.6.1 > [5] DelayedArray_0.2.4 matrixStats_0.52.2 > [7] ggplot2_2.2.1 GenomicFeatures_1.28.0 > [9] AnnotationDbi_1.38.0 Biobase_2.36.2 > [11] GenomicRanges_1.28.2 GenomeInfoDb_1.12.0 > [13] IRanges_2.10.1 S4Vectors_0.14.1 > [15] BiocGenerics_0.22.0 > > ------------------------------ > > Post tags: chipqc > > You may reply via email or visit ChIPQC gives Error subscript contains out-of-bounds ranges with custom transcript database >
ADD COMMENT
0
Entering edit mode

Dear Thomus,

Thank you very much for your quick response. Since I'm an extreme newbie here, I really have no idea when you talk about peak.bed. All I have from MACS2 result is *.narrowpeak file. Here I provide you the link to all files I acquired from MACS.

https://drive.google.com/file/d/0B4_V0v1OpPRKNThwZDVJSGZ5NW8/view?usp=sharing
https://drive.google.com/file/d/0B4_V0v1OpPRKVlBKTkIxMDR1VlE/view?usp=sharing
https://drive.google.com/file/d/0B4_V0v1OpPRKUWlXTXlUTkhibHM/view?usp=sharing
https://drive.google.com/file/d/0B4_V0v1OpPRKMFBKTm5jUVVkcXc/view?usp=sharing

I have read 'Specifying custom annotation' topic from http://bioconductor.org/help/course-materials/2014/BioC2014/Bioc2014_ChIPQC_Practical.pdf, but still have no idea how could I create such custom annotation from the TxDb created by GenomicFeatures. I do apologize for my poor experience, but can you provide me example to create custom annotation from TxDb.

Really grateful for your support in advance,

Kaj

 

ADD REPLY
1
Entering edit mode

hi Kaj,

Here is an example to make custom annotation for ChIPQC.

source("https://raw.githubusercontent.com/ThomasCarroll/blank/master/chipqcAnnotation.R")

library(GenomicRanges) 

library(GenomicFeatures)

library(ChIPQC) 

# Create TxDb annotation 

btaurus_anno <- ChIPQCAnnotationFromGFF3('Bos_taurus.UMD3.1.89.gtf.gz') 

bamFile <- file.path(maindir, "macs2/bowtie2/SRR1977480.sorted.bam") 

bedFile <- file.path(maindir, "macs2/bowtie2/SRRSRR1977480K4peak_summits.bed")

#ChIPQC analysis

exampleExp <- ChIPQCsample(bamFile, peaks=bedFile, annotation=btaurus_anno, chromosomes=NULL) 

best,

tom

 
ADD REPLY
0
Entering edit mode

Thank you very very very much!!!!!

ADD REPLY
0
Entering edit mode

hi Kaj,

Sorry for the delay. I will look at this tomorrow am and provide an example.

best,

tom

ADD REPLY

Login before adding your answer.

Traffic: 438 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6