I am trying to calculate the Fragment Length and Relative Cross-Coverage for a sequencing experiment using a non-model microbe. DNA was sheared to ~300bp during library prep, used single-end library sequencing, and my peaks were called using Mosaics in R (fragment length 300, bin size 200). I used the information here and here to make a custom annotation list from my organisms .gff file and that appears to work.
txdb <- GenomicFeatures::makeTxDbFromGFF("20181113_hcah_GCF_000223905.1_ASM22390v1_genomic.gff", format = "gff") txn <- GenomicFeatures::transcripts(txdb) gene <- unlist(GenomicFeatures::cdsBy(txdb, "tx")) pro500 <- GenomicFeatures::promoters(txdb, upstream = 500, downstream = 0) pro250 <- GenomicFeatures::promoters(txdb, upstream = 250, downstream = 0) hha <- list(version="", gff.features = txn, #all features in gff genes = gene, #specifically CDS promoters250 = pro250, #250bp upstream of any feature promoters500 = pro500 #500bp upstream of any feature
However, when I run either ChIPQCsample() or ChIPQC() my output contains only Tissue, Factor, Condition, Control, Replicate, Peaks. It is not returning fragment length calculations or cross-correlation scores. Is it because my data are not paired-end? For brevity, I've only included the error information for ChIPQCsample(). The output is the same for ChIPQC(), just over my 16 samples.
###test on single file bam <- "data/bams/4774_C5_S74_L004_R1_001_trimmed_sorted.bam" peeks <- "data/peakcalls/HCA_trmB_gc_0glu_A.bed" peek <- rtracklayer::import(peeks, format = "bed") tmp2 = ChIPQCsample(reads = bam, peaks = peek, annotation = hha)
I haven't been able to identify the source of the NA integer warning. It is also present if I do not include the annotation file.
the condition has length > 1 and only the first element will be usedBam file has 3 contigs Calculating coverage histogram for NC015948.1 Calculating SSD for NC015948.1 Calculating unique positions per strand for NC015948.1 Calculating shift for NC015948.1 300 / 300 removing1peaks within 400bp of chromosome edges Counting reads in features for NC015948.1 Signal over peaks for NC015948.1 done Calculating coverage Calculating Summits on NC015948.1 ..Calculating coverage histogram for NC015943.1 Calculating SSD for NC015943.1 Calculating unique positions per strand for NC015943.1 Calculating shift for NC015943.1 300 / 300 removing1peaks within 400bp of chromosome edges Counting reads in features for NC015943.1 Signal over peaks for NC015943.1 done Calculating coverage Calculating Summits on NC015943.1 ..Calculating coverage histogram for NC015944.1 Calculating SSD for NC015944.1 Calculating unique positions per strand for NC015944.1 Calculating shift for NC015944.1 300 / 300 removing1peaks within 400bp of chromosome edges Counting reads in features for NC015944.1 Signal over peaks for NC015944.1 done Calculating coverage Calculating Summits on NC_015944.1 ..  1  1  1 NAs produced by integer overflowNAs produced by integer overflowNAs produced by integer overflowNAs produced by integer overflowNAs produced by integer overflowNAs produced by integer overflowNAs produced by integer overflowNAs produced by integer overflowNAs produced by integer overflowNAs produced by integer overflowNAs produced by integer overflowNAs produced by integer overflowNAs produced by integer overflowNAs produced by integer overflowNAs produced by integer overflowNAs produced by integer overflowNAs produced by integer overflowNAs produced by integer overflowNAs produced by integer overflowNAs produced by integer overflowNAs produced by integer overflowNAs produced by integer overflow.....
ChIPQCsample Number of Mapped reads: 2167362 Number of Mapped reads passing MapQ filter: 2136201 Percentage Of Reads as Non-Duplicates (NRF): 100(0) Percentage Of Reads in Blacklisted Regions: NA SSD: 22.8916269433196 Fragment Length Cross-Coverage: Relative Cross-Coverage: Percentage Of Reads in GenomicFeature: Percentage Of Reads in Peaks: 9.28 Number of Peaks: 192
ProportionOfCounts Peaks 0.09284473
A separate error (perhaps a bug - others have similar issues that haven't been resolved, see below for links) is preventing the generation of a complete report. All plots are generated, but the final html is not compiled. The CC_score plot is empty.
QCmetrics(tmp2) Error in names(res) <- c("Reads", "Map%", "Filt%", "Dup%", "ReadL", "FragL", : 'names' attribute  must be the same length as the vector 
ChIPQCreport(tmp2) Saving 7 x 7 in image Error in names(res) <- c("Reads", "Map%", "Filt%", "Dup%", "ReadL", "FragL", : 'names' attribute  must be the same length as the vector 
Here is a list of others reporting the same issues: https://support.bioconductor.org/p/117577/ https://support.bioconductor.org/p/122402/ https://support.bioconductor.org/p/122763/ https://support.bioconductor.org/p/121941/ https://www.biostars.org/p/377937/
It's difficult to troubleshoot because I am not sure if these are independent issues or related. Any help would be appreciated. Some users have suggested reverting to an older version of ChIPQC, but I am having issues with dependencies for older versions of DiffBind. If that's the best solution, I'll make another post.
> sessionInfo() R version 3.6.1 (2019-07-05) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows 10 x64 (build 17763) Matrix products: default Random number generation: RNG: Mersenne-Twister Normal: Inversion Sample: Rounding locale:  LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252  LC_NUMERIC=C LC_TIME=English_United States.1252 attached base packages:  parallel stats4 stats graphics grDevices utils datasets methods base other attached packages:  ChIPQC_1.20.0 DiffBind_2.12.0 SummarizedExperiment_1.14.1 DelayedArray_0.10.0 BiocParallel_1.18.1  matrixStats_0.54.0 Biobase_2.44.0 GenomicRanges_1.36.1 GenomeInfoDb_1.20.0 IRanges_2.18.2  S4Vectors_0.22.0 BiocGenerics_0.30.0 forcats_0.4.0 stringr_1.4.0 dplyr_0.8.3  purrr_0.3.2 readr_1.3.1 tidyr_0.8.3 tibble_2.1.3 ggplot2_3.2.1  tidyverse_1.2.1 loaded via a namespace (and not attached):  amap_0.8-17 colorspace_1.4-1 rjson_0.2.20  hwriter_1.3.2 XVector_0.24.0 rstudioapi_0.10  ggrepel_0.8.1 bit64_0.9-7 AnnotationDbi_1.46.1  lubridate_1.7.4 xml2_1.2.2 splines_3.6.1  TxDb.Rnorvegicus.UCSC.rn4.ensGene_3.2.2 knitr_1.24 zeallot_0.1.0  Nozzle.R1_1.1-1 jsonlite_1.6 Rsamtools_2.0.0  broom_0.5.2 annotate_1.62.0 GO.db_3.8.2  pheatmap_1.0.12 graph_1.62.0 TxDb.Hsapiens.UCSC.hg18.knownGene_3.2.2  BiocManager_1.30.4 compiler_3.6.1 httr_1.4.1  GOstats_2.50.0 backports_1.1.4 assertthat_0.2.1  Matrix_1.2-17 lazyeval_0.2.2 TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2  limma_3.40.6 cli_1.1.0 prettyunits_1.0.2  tools_3.6.1 gtable_0.3.0 glue_1.3.1  GenomeInfoDbData_1.2.1 Category_2.50.0 reshape2_1.4.3  systemPipeR_1.18.2 rappdirs_0.3.1 batchtools_0.9.11  ShortRead_1.42.0 Rcpp_1.0.2 TxDb.Dmelanogaster.UCSC.dm3.ensGene_3.2.2  TxDb.Mmusculus.UCSC.mm9.knownGene_3.2.2 cellranger_1.1.0 vctrs_0.2.0  Biostrings_2.52.0 gdata_2.18.0 nlme_3.1-140  rtracklayer_1.44.4 TxDb.Mmusculus.UCSC.mm10.knownGene_3.4.7 xfun_0.9  rvest_0.3.4 gtools_3.8.1 XML_3.98-1.20  edgeR_3.26.8 zlibbioc_1.30.0 scales_1.0.0  BSgenome_1.52.0 VariantAnnotation_1.30.1 hms_0.5.1  RBGL_1.60.0 RColorBrewer_1.1-2 yaml_2.2.0  memoise_1.1.0 biomaRt_2.40.4 latticeExtra_0.6-28  stringi_1.4.3 RSQLite_2.1.2 genefilter_1.66.0  checkmate_1.9.4 GenomicFeatures_1.36.4 caTools_126.96.36.199  chipseq_1.34.0 rlang_0.4.0 pkgconfig_2.0.2  bitops_1.0-6 TxDb.Celegans.UCSC.ce6.ensGene_3.2.2 lattice_0.20-38  labeling_0.3 GenomicAlignments_1.20.1 bit_1.1-14  tidyselect_0.2.5 GSEABase_1.46.0 AnnotationForge_1.26.0  plyr_1.8.4 magrittr_1.5 R6_2.4.0  gplots_188.8.131.52 generics_0.0.2 base64url_1.4  DBI_1.0.0 pillar_1.4.2 haven_2.1.1  withr_2.1.2 survival_2.44-1.1 RCurl_1.95-4.12  modelr_0.1.5 crayon_1.3.4 KernSmooth_2.23-15  progress_1.2.2 locfit_1.5-9.1 grid_3.6.1  readxl_1.3.1 data.table_1.12.2 blob_1.2.0  Rgraphviz_2.28.0 digest_0.6.20 xtable_1.8-4  brew_1.0-6 munsell_0.5.0