ChIPQC: issues with report generation and QC metrics
0
0
Entering edit mode
@ryleehackley-21854
Last seen 2.1 years ago
United States

Hello,

I am trying to calculate the Fragment Length and Relative Cross-Coverage for a sequencing experiment using a non-model microbe. DNA was sheared to ~300bp during library prep, used single-end library sequencing, and my peaks were called using Mosaics in R (fragment length 300, bin size 200). I used the information here and here to make a custom annotation list from my organisms .gff file and that appears to work.

txdb <- GenomicFeatures::makeTxDbFromGFF("20181113_hcah_GCF_000223905.1_ASM22390v1_genomic.gff", format = "gff")
txn <- GenomicFeatures::transcripts(txdb)
gene <- unlist(GenomicFeatures::cdsBy(txdb, "tx"))
pro500 <- GenomicFeatures::promoters(txdb, upstream = 500, downstream = 0)
pro250 <- GenomicFeatures::promoters(txdb, upstream = 250, downstream = 0)

hha <- list(version="",
            gff.features = txn, #all features in  gff
            genes = gene, #specifically CDS
            promoters250 = pro250, #250bp upstream of any feature
            promoters500 = pro500 #500bp upstream of any feature

However, when I run either ChIPQCsample() or ChIPQC() my output contains only Tissue, Factor, Condition, Control, Replicate, Peaks. It is not returning fragment length calculations or cross-correlation scores. Is it because my data are not paired-end? For brevity, I've only included the error information for ChIPQCsample(). The output is the same for ChIPQC(), just over my 16 samples.

###test on single file
bam <- "data/bams/4774_C5_S74_L004_R1_001_trimmed_sorted.bam"
peeks <- "data/peakcalls/HCA_trmB_gc_0glu_A.bed"
peek <- rtracklayer::import(peeks, format = "bed")
tmp2 = ChIPQCsample(reads = bam, peaks = peek, annotation = hha)

The output:

I haven't been able to identify the source of the NA integer warning. It is also present if I do not include the annotation file.

the condition has length > 1 and only the first element will be usedBam file has 3 contigs Calculating coverage histogram for NC015948.1 Calculating SSD for NC015948.1 Calculating unique positions per strand for NC015948.1 Calculating shift for NC015948.1 300 / 300 removing1peaks within 400bp of chromosome edges Counting reads in features for NC015948.1 Signal over peaks for NC015948.1 done Calculating coverage Calculating Summits on NC015948.1 ..Calculating coverage histogram for NC015943.1 Calculating SSD for NC015943.1 Calculating unique positions per strand for NC015943.1 Calculating shift for NC015943.1 300 / 300 removing1peaks within 400bp of chromosome edges Counting reads in features for NC015943.1 Signal over peaks for NC015943.1 done Calculating coverage Calculating Summits on NC015943.1 ..Calculating coverage histogram for NC015944.1 Calculating SSD for NC015944.1 Calculating unique positions per strand for NC015944.1 Calculating shift for NC015944.1 300 / 300 removing1peaks within 400bp of chromosome edges Counting reads in features for NC015944.1 Signal over peaks for NC015944.1 done Calculating coverage Calculating Summits on NC_015944.1 .. [1] 1 [1] 1 [1] 1 NAs produced by integer overflowNAs produced by integer overflowNAs produced by integer overflowNAs produced by integer overflowNAs produced by integer overflowNAs produced by integer overflowNAs produced by integer overflowNAs produced by integer overflowNAs produced by integer overflowNAs produced by integer overflowNAs produced by integer overflowNAs produced by integer overflowNAs produced by integer overflowNAs produced by integer overflowNAs produced by integer overflowNAs produced by integer overflowNAs produced by integer overflowNAs produced by integer overflowNAs produced by integer overflowNAs produced by integer overflowNAs produced by integer overflowNAs produced by integer overflow.....

tmp2

ChIPQCsample Number of Mapped reads: 2167362 Number of Mapped reads passing MapQ filter: 2136201 Percentage Of Reads as Non-Duplicates (NRF): 100(0) Percentage Of Reads in Blacklisted Regions: NA SSD: 22.8916269433196 Fragment Length Cross-Coverage: Relative Cross-Coverage: Percentage Of Reads in GenomicFeature: Percentage Of Reads in Peaks: 9.28 Number of Peaks: 192

ProportionOfCounts Peaks 0.09284473
gff.features 0.89566946
genes 0.88093443
promoters250 0.28362640
promoters500 0.47769943

A separate error (perhaps a bug - others have similar issues that haven't been resolved, see below for links) is preventing the generation of a complete report. All plots are generated, but the final html is not compiled. The CC_score plot is empty.

QCmetrics(tmp2)

QCmetrics(tmp2) Error in names(res) <- c("Reads", "Map%", "Filt%", "Dup%", "ReadL", "FragL", : 'names' attribute [9] must be the same length as the vector [7]

ChIPQCreport(tmp2)

ChIPQCreport(tmp2) Saving 7 x 7 in image Error in names(res) <- c("Reads", "Map%", "Filt%", "Dup%", "ReadL", "FragL", : 'names' attribute [9] must be the same length as the vector [7]

Here is a list of others reporting the same issues: https://support.bioconductor.org/p/117577/ https://support.bioconductor.org/p/122402/ https://support.bioconductor.org/p/122763/ https://support.bioconductor.org/p/121941/ https://www.biostars.org/p/377937/

It's difficult to troubleshoot because I am not sure if these are independent issues or related. Any help would be appreciated. Some users have suggested reverting to an older version of ChIPQC, but I am having issues with dependencies for older versions of DiffBind. If that's the best solution, I'll make another post.

Session info:

> sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 17763)

Matrix products: default

Random number generation:
 RNG:     Mersenne-Twister 
 Normal:  Inversion 
 Sample:  Rounding 

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252    

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] ChIPQC_1.20.0               DiffBind_2.12.0             SummarizedExperiment_1.14.1 DelayedArray_0.10.0         BiocParallel_1.18.1        
 [6] matrixStats_0.54.0          Biobase_2.44.0              GenomicRanges_1.36.1        GenomeInfoDb_1.20.0         IRanges_2.18.2             
[11] S4Vectors_0.22.0            BiocGenerics_0.30.0         forcats_0.4.0               stringr_1.4.0               dplyr_0.8.3                
[16] purrr_0.3.2                 readr_1.3.1                 tidyr_0.8.3                 tibble_2.1.3                ggplot2_3.2.1              
[21] tidyverse_1.2.1            

loaded via a namespace (and not attached):
  [1] amap_0.8-17                               colorspace_1.4-1                          rjson_0.2.20                             
  [4] hwriter_1.3.2                             XVector_0.24.0                            rstudioapi_0.10                          
  [7] ggrepel_0.8.1                             bit64_0.9-7                               AnnotationDbi_1.46.1                     
 [10] lubridate_1.7.4                           xml2_1.2.2                                splines_3.6.1                            
 [13] TxDb.Rnorvegicus.UCSC.rn4.ensGene_3.2.2   knitr_1.24                                zeallot_0.1.0                            
 [16] Nozzle.R1_1.1-1                           jsonlite_1.6                              Rsamtools_2.0.0                          
 [19] broom_0.5.2                               annotate_1.62.0                           GO.db_3.8.2                              
 [22] pheatmap_1.0.12                           graph_1.62.0                              TxDb.Hsapiens.UCSC.hg18.knownGene_3.2.2  
 [25] BiocManager_1.30.4                        compiler_3.6.1                            httr_1.4.1                               
 [28] GOstats_2.50.0                            backports_1.1.4                           assertthat_0.2.1                         
 [31] Matrix_1.2-17                             lazyeval_0.2.2                            TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2  
 [34] limma_3.40.6                              cli_1.1.0                                 prettyunits_1.0.2                        
 [37] tools_3.6.1                               gtable_0.3.0                              glue_1.3.1                               
 [40] GenomeInfoDbData_1.2.1                    Category_2.50.0                           reshape2_1.4.3                           
 [43] systemPipeR_1.18.2                        rappdirs_0.3.1                            batchtools_0.9.11                        
 [46] ShortRead_1.42.0                          Rcpp_1.0.2                                TxDb.Dmelanogaster.UCSC.dm3.ensGene_3.2.2
 [49] TxDb.Mmusculus.UCSC.mm9.knownGene_3.2.2   cellranger_1.1.0                          vctrs_0.2.0                              
 [52] Biostrings_2.52.0                         gdata_2.18.0                              nlme_3.1-140                             
 [55] rtracklayer_1.44.4                        TxDb.Mmusculus.UCSC.mm10.knownGene_3.4.7  xfun_0.9                                 
 [58] rvest_0.3.4                               gtools_3.8.1                              XML_3.98-1.20                            
 [61] edgeR_3.26.8                              zlibbioc_1.30.0                           scales_1.0.0                             
 [64] BSgenome_1.52.0                           VariantAnnotation_1.30.1                  hms_0.5.1                                
 [67] RBGL_1.60.0                               RColorBrewer_1.1-2                        yaml_2.2.0                               
 [70] memoise_1.1.0                             biomaRt_2.40.4                            latticeExtra_0.6-28                      
 [73] stringi_1.4.3                             RSQLite_2.1.2                             genefilter_1.66.0                        
 [76] checkmate_1.9.4                           GenomicFeatures_1.36.4                    caTools_1.17.1.2                         
 [79] chipseq_1.34.0                            rlang_0.4.0                               pkgconfig_2.0.2                          
 [82] bitops_1.0-6                              TxDb.Celegans.UCSC.ce6.ensGene_3.2.2      lattice_0.20-38                          
 [85] labeling_0.3                              GenomicAlignments_1.20.1                  bit_1.1-14                               
 [88] tidyselect_0.2.5                          GSEABase_1.46.0                           AnnotationForge_1.26.0                   
 [91] plyr_1.8.4                                magrittr_1.5                              R6_2.4.0                                 
 [94] gplots_3.0.1.1                            generics_0.0.2                            base64url_1.4                            
 [97] DBI_1.0.0                                 pillar_1.4.2                              haven_2.1.1                              
[100] withr_2.1.2                               survival_2.44-1.1                         RCurl_1.95-4.12                          
[103] modelr_0.1.5                              crayon_1.3.4                              KernSmooth_2.23-15                       
[106] progress_1.2.2                            locfit_1.5-9.1                            grid_3.6.1                               
[109] readxl_1.3.1                              data.table_1.12.2                         blob_1.2.0                               
[112] Rgraphviz_2.28.0                          digest_0.6.20                             xtable_1.8-4                             
[115] brew_1.0-6                                munsell_0.5.0
ChIPQC software error • 1.3k views
ADD COMMENT
1
Entering edit mode

You are correct. The problem is caused by single end data which would not return fragment length. I fixed it in my fork project. But I hope the author will fix it in future.

https://github.com/shengqh/ChIPQC

ADD REPLY

Login before adding your answer.

Traffic: 954 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6