Error in running ATACseqQC
3
0
Entering edit mode
Fei Liu • 0
@fei-liu-22054
Last seen 4.9 years ago

There are some issues when running ATACseqQC.

  1. In out-pdf functions such as fragSizeDist, PTscore, only when I exit R session can I get non-empty pdf document, and if I don't exit R, the out-pdf is always empty, so that run a function, I have to go into R, and then exit to get the target file. Is this the way it was done? Or there other way to get out files?
  2. When running splitGAlignmentsByCut function, I got an error.
library(ATACseqQC)
library(GenomicRanges)
bamfile <- '/simm/huangyulab/liufei/Owl/seq_data/ATACseq/GYP1.sort.clean.bam'
tags <- c("AS", "XN", "XM", "XO", "XG", "NM", "MD", "YS", "YT")
gal1 <- readBamFile(bamFile=bamfile, tag=tags,which=GRanges("chr1", IRanges(1, 1e6)),asMates=FALSE)
This is a big BAM file. Do you want to set bigFile=TRUE to save memory? (Y/n)? y
names(gal1) <- mcols(gal1)$qname
library(BSgenome.Hsapiens.UCSC.hg19)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
txs <- transcripts(TxDb.Hsapiens.UCSC.hg19.knownGene)
library(phastCons100way.UCSC.hg19)
objs <- splitGAlignmentsByCut(gal1,txs=txs,genome=genome,outPath='/simm/huangyulab/liufei/Owl/seq_data/ATACseq/GYP1.splitbam/',conservation=phastCons100way.UCSC.hg19)
  
Error in splitGAlignmentsByCut(gal1, txs = txs, genome = genome, outPath = "/simm/huangyulab/liufei/Owl/seq_data/ATACseq/GYP1.splitbam/",  :
  length(names(obj)) == length(obj) is not TRUE

and then try splitBam

> objs <- splitBam(bamfile, tags=tags, outPath='/simm/huangyulab/liufei/Owl/seq_data/ATACseq/RESULT/GYP1.splitBam/',txs=txs,genome=genome,conservation=phastCons100way.UCSC.hg19)

  Error in splitBam(bamfile, tags = tags, outPath = "/simm/huangyulab/liufei/Owl/seq_data/ATACseq/RESULT/GYP1.splitBam/", :   is(genome, "BSgenome") is not TRUE

then I set :

genome <- BSgenome.Hsapiens.UCSC.hg19

try again :

> objs <- splitBam(bamfile, tags=tags, outPath='/simm/huangyulab/liufei/Owl/seq_data/ATACseq/RESULT/GYP1.splitBam/',txs=txs,genome=genome,conservation=phastCons100way.UCSC.hg19)

error :

[E::sam_parse1] unrecognized type N 
[W::sam_read1] Parse error at line2780 
[E::sam_parse1] unrecognized type N 
[W::sam_read1] Parse error at line 2780 
[E::sam_parse1] unrecognized type N 
[W::sam_read1] Parse error at line 2780 
[E::sam_parse1] unrecognized type N 
[W::sam_read1] Parse error at line 2780 
[E::sam_parse1] unrecognized type N
[W::sam_read1] Parse error at line 2780 
[E::sam_parse1] unrecognized type N 
[W::sam_read1] Parse error at line 2780 
[E::sam_parse1] unrecognized type N 
[W::sam_read1] Parse error at line 2780
[E::sam_parse1] unrecognized type N 
[W::sam_read1] Parse error at line 2780 
Error in shiftGAlignmentsList(chunk0, positive = positive,
    negative = negative) :   all(elementNROWS(gal) < 3) is not TRUE

What does that mean and how to deal with the error when running splitGAlignmentsByCut and splitBam?

software error • 2.4k views
ADD COMMENT
2
Entering edit mode
haibol ▴ 90
@haibol-19957
Last seen 5.2 years ago
  1. In out-pdf functions such as fragSizeDist, PTscore, only when I exit R session can I get non-empty pdf document, and if I don't exit R, the out-pdf is always empty, so that run a function, I have to go into R, and then exit to get the target file. Is this the way it was done? Or there other way to get out files?

A: A better way to generate plots and save to a PDF is like this: pdf(file = "XXXXX.pdf", width = 10, height = 12)

plot script here

dev.off() # this will close the graphical device for plotting, the plots in the buffer will be written to the PDF file.

  1. This problem is not clear to me. I need know more about your data. 1) Are you analyze paired end ATAC-seq data from human? 2) what is the reference genome sequence you used for mapping? and where was it downloaded from? To use library(BSgenome.Hsapiens.UCSC.hg19) library(TxDb.Hsapiens.UCSC.hg19.knownGene) You must align your sequences to hg19 from UCSC. Otherwise you need make some change to the code. And son'g use library(phastCons100way.UCSC.hg19), which will trigger a computation-intensive machine-learning algorithm. 3) The third error is due the abortion of calling the function splitGAlignmentsByCut(). You should make sure the previous steps run correctly before running the next step. My suggestion is to run the example code in the ATACseqQC vignette first to see if you can get it work. Then you can try your own data. Also read our BMC Genomics Paper about the ATACseqQC package, especially the Supplementary File 1, where more detailed scripts for preprocessing of ATAC-seq data is documented.

Haibo

ADD COMMENT
0
Entering edit mode

Thanks for your reply.

The command pdf(file=xxx.pdf .....) not work, although that produce a new pdf, it can not open to watch.

I tried the example code successfully, and to I downloaded the SRR891280 https://trace.ncbi.nlm.nih.gov/Traces/sra/?run=SRR891280 and tried again, but also got the same error. My reference is hg38.

Here are my steps to deal with it

fastq-dump SRR891280
bwa mem hg38 SRR891280.fastq > SRR891280.sam  (my own data are paired-end)
samtools view -bS -@30 SRR891280.sam > SRR891280.bam
samtools sort SRR891280.bam > SRR891280.sort.bam
samtools index SRR891280.sort.bam

and then I used bamQC to get SRR891280.sort.clean.bam, after that used splitGAlignmentsByCut, and both SRR891280.sort.bam and SRR891280.sort.clean.bam error
ADD REPLY
0
Entering edit mode
Ou, Jianhong ★ 1.3k
@ou-jianhong-4539
Last seen 7 days ago
United States

Hi Fei,

As Haibo suggested, to output figures, you'd better plot the figure to a device and close it after plot.

length(names(obj)) == length(obj) is not TRUE

means the length of name of gal1 is not identical to the length of gal. Please double check the names of gal1 to make sure that there is no issue with the name of the reads.

And

all(elementNROWS(gal) < 3) is not TRUE

I think this should be fixed, see https://support.bioconductor.org/p/120707/. Could you please share your sessionInfo and packageVersion of ATACseqQC.

Jianhong.

ADD COMMENT
0
Entering edit mode
sessionInfo

R version 3.6.1 (2019-07-05)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.1 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.2.20.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
 [1] BiocManager_1.30.4
 [2] phastCons100way.UCSC.hg19_3.7.2
 [3] GenomicScores_1.8.1
 [4] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2
 [5] GenomicFeatures_1.36.4
 [6] AnnotationDbi_1.46.1
 [7] Biobase_2.44.0
 [8] BSgenome.Hsapiens.UCSC.hg19_1.4.0
 [9] BSgenome_1.52.0
[10] rtracklayer_1.44.4
[11] Biostrings_2.52.0
[12] XVector_0.24.0
[13] GenomicRanges_1.36.1
[14] GenomeInfoDb_1.20.0
[15] IRanges_2.18.3
[16] ATACseqQC_1.8.5
[17] S4Vectors_0.22.1
[18] BiocGenerics_0.30.0

loaded via a namespace (and not attached):
  [1] colorspace_1.4-1              seqinr_3.6-1
  [3] grImport2_0.1-5               futile.logger_1.4.3
  [5] base64enc_0.1-3               rGADEM_2.32.0
  [7] ChIPpeakAnno_3.18.2           bit64_0.9-7
  [9] interactiveDisplayBase_1.22.0 splines_3.6.1
 [11] motifStack_1.28.0             zeallot_0.1.0
 [13] ade4_1.7-13                   polynom_1.4-0
 [15] Rsamtools_2.0.2               seqLogo_1.50.0
 [17] GO.db_3.8.2                   dbplyr_1.4.2
 [19] png_0.1-7                     graph_1.62.0
 [21] shiny_1.3.2                   compiler_3.6.1
 [23] httr_1.4.1                    backports_1.1.5
 [25] assertthat_0.2.1              Matrix_1.2-17
 [27] lazyeval_0.2.2                limma_3.40.6
 [29] later_1.0.0                   formatR_1.7
 [31] htmltools_0.4.0               prettyunits_1.0.2
 [33] tools_3.6.1                   glue_1.3.1
 [35] GenomeInfoDbData_1.2.1        dplyr_0.8.3
 [37] rappdirs_0.3.1                Rcpp_1.0.2
 [39] vctrs_0.2.0                   multtest_2.40.0
 [41] stringr_1.4.0                 mime_0.7
 [43] ensembldb_2.8.0               XML_3.98-1.20
 [45] idr_1.2                       AnnotationHub_2.16.1
 [47] edgeR_3.26.8                  zlibbioc_1.30.0
 [49] MASS_7.3-51.4                 scales_1.0.0
 [51] hms_0.5.1                     promises_1.1.0
 [53] ProtGenerics_1.16.0           SummarizedExperiment_1.14.1
 [55] RBGL_1.60.0                   AnnotationFilter_1.8.0
 [57] lambda.r_1.2.4                yaml_2.2.0
 [59] curl_4.2                      memoise_1.1.0
 [61] MotIV_1.40.0                  biomaRt_2.40.5
 [63] stringi_1.4.3                 RSQLite_2.1.2
 [65] randomForest_4.6-14           BiocParallel_1.18.1
 [67] rlang_0.4.0                   pkgconfig_2.0.3
 [69] matrixStats_0.55.0            bitops_1.0-6
 [71] lattice_0.20-38               purrr_0.3.2
 [73] GenomicAlignments_1.20.1      htmlwidgets_1.5
 [75] bit_1.1-14                    tidyselect_0.2.5
 [77] magrittr_1.5                  R6_2.4.0
 [79] DelayedArray_0.10.0           DBI_1.0.0
 [81] preseqR_4.0.0                 pillar_1.4.2
 [83] survival_2.44-1.1             RCurl_1.95-4.12
 [85] tibble_2.1.3                  crayon_1.3.4
 [87] futile.options_1.0.1          KernSmooth_2.23-15
 [89] BiocFileCache_1.8.0           jpeg_0.1-8
 [91] progress_1.2.2                locfit_1.5-9.1
 [93] grid_3.6.1                    blob_1.2.0
 [95] digest_0.6.21                 xtable_1.8-4
 [97] VennDiagram_1.6.20            httpuv_1.5.2
 [99] regioneR_1.16.5               munsell_0.5.0
ADD REPLY
0
Entering edit mode

And I try to use bowtie2 to get sam file, then samtools and try splitBam function again, there also have some error.

> objs <- splitBam(bamfile,tags=tags,outPath='/simm/huangyulab/liufei/Owl/bowtie2_build_ref/split_1Bam',txs=txs,genome=Hsapiens,conservation=phastCons100way.UCSC.hg19)
[bam_sort_core] merging from 3 files and 1 in-memory blocks...
[bam_sort_core] merging from 1 files and 1 in-memory blocks...
[bam_sort_core] merging from 1 files and 1 in-memory blocks...
[bam_sort_core] merging from 1 files and 1 in-memory blocks...
[bam_sort_core] merging from 1 files and 1 in-memory blocks...
[bam_sort_core] merging from 1 files and 1 in-memory blocks...
[bam_sort_core] merging from 1 files and 1 in-memory blocks...
[bam_sort_core] merging from 1 files and 1 in-memory blocks...
[bam_sort_core] merging from 1 files and 1 in-memory blocks...
[bam_sort_core] merging from 1 files and 1 in-memory blocks...
[E::bgzf_read] Read block operation failed with error -1 after 0 of 4 bytes
[E::bgzf_read] Read block operation failed with error -1 after 0 of 4 bytes
[E::bgzf_read] Read block operation failed with error -1 after 0 of 4 bytes
[E::bgzf_read] Read block operation failed with error -1 after 0 of 4 bytes
[E::bgzf_read] Read block operation failed with error -1 after 0 of 4 bytes
[E::bgzf_read] Read block operation failed with error -1 after 0 of 4 bytes
[E::bgzf_read] Read block operation failed with error -1 after 0 of 4 bytes
[E::bgzf_read] Read block operation failed with error -1 after 0 of 4 bytes
[E::bgzf_read] Read block operation failed with error -1 after 0 of 4 bytes
[E::bgzf_read] Read block operation failed with error -1 after 0 of 4 bytes
[E::bgzf_read] Read block operation failed with error -1 after 0 of 4 bytes
[E::bgzf_read] Read block operation failed with error -1 after 0 of 4 bytes
[E::bgzf_read] Read block operation failed with error -1 after 0 of 4 bytes
[E::bgzf_read] Read block operation failed with error -1 after 0 of 4 bytes
[E::bgzf_read] Read block operation failed with error -1 after 0 of 4 bytes
[E::bgzf_read] Read block operation failed with error -1 after 0 of 4 bytes
[E::bgzf_read] Read block operation failed with error -1 after 0 of 4 bytes
[E::bgzf_read] Read block operation failed with error -1 after 0 of 4 bytes
[E::bgzf_read] Read block operation failed with error -1 after 0 of 4 bytes
[E::bgzf_read] Read block operation failed with error -1 after 0 of 4 bytes
[W::bgzf_read_block] EOF marker is absent. The input is probably truncated
Error in splitGAlignmentsByCut(gal1, breaks = breaks, labels = labels,  :
  no reads in the obj.

and splitGAlignmentsByCut function get the same error

splitGAlignmentsByCut(gal1, txs=txs, genome=Hsapiens,conservation=phastCons100way.UCSC.hg19, outPath='/simm/huangyulab/liufei/Owl/bowtie2_build_ref/split_2Bam')
Error in splitGAlignmentsByCut(gal1, txs = txs, genome = Hsapiens, conservation = phastCons100way.UCSC.hg19,  :
  length(names(obj)) == length(obj) is not TRUE
ADD REPLY
0
Entering edit mode
Fei Liu • 0
@fei-liu-22054
Last seen 4.9 years ago

After testing, the splitGAlignmentByCut function may depend on the shiftGAlignmentsList function and use the same parameters, so I obtained adapt output successfully.

But splitBam function still not work.

Thank you all.

ADD COMMENT

Login before adding your answer.

Traffic: 882 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6