Question

FourCseq - error ountFragmentOverlaps()

0

Entering edit mode

samuel collombet ▴ 140

@samuel-collombet-6574

Last seen 6.9 years ago

France

Hi,

I have an error using FourCseq, which I don't really understand as I was usign the package before without problem.

Here is my code:

> 4C_exptData <- SimpleList(projectPath="myPath",fragmentDir="fragmentDir",referenceGenomeFile ="chr.fa",primerFile="primer.fa",bamFilePath="bam",reSequence1="GTAC",reSequence2="GATC")

> 4C_colData <- DataFrame(viewpoint="vp",condition=factor(rep(c("cell1", "cell2"),each=2),levels=c("cell1", "cell2")),replicate=c(1,2)),bamFile=c("cell1_rep1.bam","cell1_rep2.bam","cell2_rep1.bam","cell2_rep2.bam"),sequencingPrimer="first")

> 4C_fc <- FourC(4C_colData, 4C_exptData)
> 4C_fc <- addFragments(4C_fc,save=F)
> 4C_fc <- addViewpointFrags(4C_fc, primerFragFile="fragmentDir/primerFragments.rda")
> 4C_fc <- countFragmentOverlaps(4C_fc)

And here I got :

reading bam files
Error in NSBS(i, x, exact = exact, upperBoundIsStrict = !allow.append) :
subscript contains NAs

Here is my sessionInfo:

> sessionInfo()
R version 3.1.0 RC (2014-04-05 r65382)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] splines stats4 parallel stats graphics grDevices utils
[8] datasets methods base

other attached packages:
[1] TxDb.Mmusculus.UCSC.mm10.knownGene_3.0.0
[2] GenomicFeatures_1.18.3
[3] AnnotationDbi_1.28.1
[4] Biobase_2.26.0
[5] FourCSeq_1.0.0
[6] DESeq2_1.6.3
[7] RcppArmadillo_0.4.650.1.1
[8] Rcpp_0.11.4
[9] ggplot2_1.0.0
[10] GenomicRanges_1.18.4
[11] GenomeInfoDb_1.2.4
[12] IRanges_2.0.1
[13] S4Vectors_0.4.0
[14] BiocGenerics_0.12.1

loaded via a namespace (and not attached):
[1] acepack_1.3-3.3 annotate_1.44.0 base64enc_0.1-2
[4] BatchJobs_1.5 BBmisc_1.9 BiocParallel_1.0.3
[7] biomaRt_2.22.0 Biostrings_2.34.1 biovizBase_1.14.1
[10] bitops_1.0-6 brew_1.0-6 BSgenome_1.34.1
[13] checkmate_1.5.1 cluster_2.0.1 codetools_0.2-10
[16] colorspace_1.2-5 DBI_0.3.1 dichromat_2.0-0
[19] digest_0.6.8 fail_1.2 fda_2.4.4
[22] foreach_1.4.2 foreign_0.8-63 Formula_1.2-0
[25] genefilter_1.48.1 geneplotter_1.44.0 GenomicAlignments_1.2.2
[28] GGally_0.5.0 ggbio_1.14.0 graph_1.44.1
[31] grid_3.1.0 gridExtra_0.9.1 gtable_0.1.2
[34] gtools_3.4.1 Hmisc_3.15-0 iterators_1.0.7
[37] lattice_0.20-30 latticeExtra_0.6-26 locfit_1.5-9.1
[40] MASS_7.3-39 Matrix_1.1-5 munsell_0.4.2
[43] nnet_7.3-9 OrganismDbi_1.8.0 plyr_1.8.1
[46] proto_0.3-10 RBGL_1.42.0 RColorBrewer_1.1-2
[49] RCurl_1.95-4.5 reshape_0.8.5 reshape2_1.4.1
[52] rpart_4.1-9 Rsamtools_1.18.3 RSQLite_1.0.0
[55] rtracklayer_1.26.2 scales_0.2.4 sendmailR_1.2-1
[58] stringr_0.6.2 survival_2.38-1 tcltk_3.1.0
[61] tools_3.1.0 VariantAnnotation_1.12.9 XML_3.98-1.1
[64] xtable_1.7-4 XVector_0.6.0 zlibbioc_1.12.0

fourcseq • 1.8k views

ADD COMMENT • link 9.1 years ago samuel collombet ▴ 140

score 0 · Answer 1 · 2015-03-05

Hello Samuel, the BiocTeam did some changes to the countOverlaps function. Could you please run traceback() to figure out where the error occurs exactly? I will look into this tomorrow in more detail. Cheers, Felix On 03/04/2015 11:12 PM, samuel collombet [bioc] wrote: > Activity on a post you are following on support.bioconductor.org > <https: support.bioconductor.org=""> > > User samuel collombet <https: support.bioconductor.org="" u="" 6574=""/> wrote > Question: FourCseq - error ountFragmentOverlaps() > <https: support.bioconductor.org="" p="" 65390=""/>: > > Hi, > > I have an error using FourCseq, which I don't really understand as I > was usign the package before without problem. > > Here is my code: > > > 4C_exptData <- > SimpleList(projectPath="myPath",fragmentDir="fragmentDir",referenceGenomeFile > ="chr.fa",primerFile="primer.fa",bamFilePath="bam",reSequence1="GTAC",reSequence2="GATC") > > > 4C_colData <- > DataFrame(viewpoint="vp",condition=factor(rep(c("cell1", > "cell2"),each=2),levels=c("cell1", > "cell2")),replicate=c(1,2)),bamFile=c("cell1_rep1.bam","cell1_rep2.bam","cell2_rep1.bam","cell2_rep2.bam"),sequencingPrimer="first") > > > 4C_fc <- FourC(4C_colData, 4C_exptData) > > 4C_fc <- addFragments(4C_fc,save=F) > > 4C_fc <- addViewpointFrags(4C_fc, primerFragFile="fragmentDir/primerFragments.rda") > > 4C_fc <- countFragmentOverlaps(4C_fc) > > And here I got : > > reading bam files > Error in NSBS(i, x, exact = exact, upperBoundIsStrict = !allow.append) : > subscript contains NAs > > Here is my sessionInfo: > > > sessionInfo() > R version 3.1.0 RC (2014-04-05 r65382) > Platform: x86_64-pc-linux-gnu (64-bit) > > locale: > [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C > [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 > [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 > [7] LC_PAPER=en_US.UTF-8 LC_NAME=C > [9] LC_ADDRESS=C LC_TELEPHONE=C > [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C > > attached base packages: > [1] splines stats4 parallel stats graphics grDevices utils > [8] datasets methods base > > other attached packages: > [1] TxDb.Mmusculus.UCSC.mm10.knownGene_3.0.0 > [2] GenomicFeatures_1.18.3 > [3] AnnotationDbi_1.28.1 > [4] Biobase_2.26.0 > [5] FourCSeq_1.0.0 > [6] DESeq2_1.6.3 > [7] RcppArmadillo_0.4.650.1.1 > [8] Rcpp_0.11.4 > [9] ggplot2_1.0.0 > [10] GenomicRanges_1.18.4 > [11] GenomeInfoDb_1.2.4 > [12] IRanges_2.0.1 > [13] S4Vectors_0.4.0 > [14] BiocGenerics_0.12.1 > > loaded via a namespace (and not attached): > [1] acepack_1.3-3.3 annotate_1.44.0 base64enc_0.1-2 > [4] BatchJobs_1.5 BBmisc_1.9 BiocParallel_1.0.3 > [7] biomaRt_2.22.0 Biostrings_2.34.1 biovizBase_1.14.1 > [10] bitops_1.0-6 brew_1.0-6 BSgenome_1.34.1 > [13] checkmate_1.5.1 cluster_2.0.1 codetools_0.2-10 > [16] colorspace_1.2-5 DBI_0.3.1 dichromat_2.0-0 > [19] digest_0.6.8 fail_1.2 fda_2.4.4 > [22] foreach_1.4.2 foreign_0.8-63 Formula_1.2-0 > [25] genefilter_1.48.1 geneplotter_1.44.0 GenomicAlignments_1.2.2 > [28] GGally_0.5.0 ggbio_1.14.0 graph_1.44.1 > [31] grid_3.1.0 gridExtra_0.9.1 gtable_0.1.2 > [34] gtools_3.4.1 Hmisc_3.15-0 iterators_1.0.7 > [37] lattice_0.20-30 latticeExtra_0.6-26 locfit_1.5-9.1 > [40] MASS_7.3-39 Matrix_1.1-5 munsell_0.4.2 > [43] nnet_7.3-9 OrganismDbi_1.8.0 plyr_1.8.1 > [46] proto_0.3-10 RBGL_1.42.0 RColorBrewer_1.1-2 > [49] RCurl_1.95-4.5 reshape_0.8.5 reshape2_1.4.1 > [52] rpart_4.1-9 Rsamtools_1.18.3 RSQLite_1.0.0 > [55] rtracklayer_1.26.2 scales_0.2.4 sendmailR_1.2-1 > [58] stringr_0.6.2 survival_2.38-1 tcltk_3.1.0 > [61] tools_3.1.0 VariantAnnotation_1.12.9 XML_3.98-1.1 > [64] xtable_1.7-4 XVector_0.6.0 zlibbioc_1.12.0 > > ------------------------------------------------------------------------ > > You may reply via email or visit > FourCseq - error ountFragmentOverlaps() >

score 0 · Answer 2 · 2015-03-05

0

Entering edit mode

samuel collombet ▴ 140

@samuel-collombet-6574

Last seen 6.9 years ago

France

> traceback()
11: stop("subscript contains NAs")
10: NSBS(i, x, exact = exact, upperBoundIsStrict = !allow.append)
9: NSBS(i, x, exact = exact, upperBoundIsStrict = !allow.append)
8: normalizeSingleBracketSubscript(i, x, as.NSBS = TRUE)
7: extractROWS(x, i)
6: extractROWS(x, i)
5: rga[mcols(rga)$mapq > minMapq]
4: rga[mcols(rga)$mapq > minMapq]
3: FUN(X[[1L]], ...)
2: lapply(reads, function(rga, minMapq) rga[mcols(rga)$mapq > minMapq],
minMapq = minMapq)
1: countFragmentOverlaps(4C_fc)

is it ok?
Thanks!

ADD COMMENT • link 9.1 years ago samuel collombet ▴ 140

0

Entering edit mode

Hello Samuel, I could reproduce the error by setting the "mapq" to NA. This means that the reads that you are using probably do not have mapq scores. So at this position the error occurs: 5: rga[mcols(rga)$mapq > minMapq] 4: rga[mcols(rga)$mapq > minMapq] Since there are no scores this results in NAs and raises the reported error. Please check what the mapq scores are in your bam files and adjust the filter accordingly. To completely skip the filtering step you can provide a negative number for the minimum mapq score: countFragmentOverlaps(fc, minMapq=-1) This should solve the problem. Best regards, Felix

ADD REPLY • link 9.1 years ago felix.klein ▴ 150

0

Entering edit mode

This is quite strange as all my reads have a MAPQ of 255 (mapping with STAR):
$ samtools view reads.bam | cut -f 5 | sort -u
> 255

However putting minMapq=-1 make it work! Do you have any idea where it comes form?

Many thanks!

ADD REPLY • link 9.1 years ago samuel collombet ▴ 140

0

Entering edit mode

Hello Samuel, I would guess that somehow the quality scores are not properly read resulting in NA values. If you run countFragmentOverlaps in debugging mode you could check if the mapq are properly read or if they are read as NA. It would be great if you could check this on your files or provide me with a short file (maybe 100 lines of your bam files) that I can test it. Best regards, Felix

ADD REPLY • link 9.1 years ago felix.klein ▴ 150

0

Entering edit mode

sure, but I do not know how to run debugging mode, how should I do?

ADD REPLY • link 9.1 years ago samuel collombet ▴ 140

0

Entering edit mode

Hello Samuel, just type: debug(countFragmentOverlaps) countFragmentOverlaps(fc) you can walk through the lines of the function by pressing enter for each line until the files bam files are read in. Then have a look at reads[[1]]. For the example data in the package it looks like this: GAlignments object with 13 alignments and 1 metadata column: seqnames strand cigar qwidth start end width njunc | mapq <rle> <rle> <character> <integer> <integer> <integer> <integer> <integer> | <integer> [1] chr2L + 80M3S 83 5152 5231 80 0 | 70 [2] chr2L + 71M12S 83 5152 5222 71 0 | 70 [3] chr2L + 78M 78 5152 5229 78 0 | 70 [4] chr2L - 83M 83 5222 5304 83 0 | 70 [5] chr2L - 78M 78 5227 5304 78 0 | 70 ... ... ... ... ... ... ... ... ... ... ... [9] chr2L + 83M 83 6023 6105 83 0 | 70 [10] chr2L + 83M 83 6023 6105 83 0 | 70 [11] chr2L + 83M 83 6023 6105 83 0 | 70 [12] chr2L + 78M 78 6023 6100 78 0 | 70 [13] chr2L - 83M 83 6800 6882 83 0 | 70 ------- seqinfo: 1 sequence from an unspecified genome So the mapq is 70 in this case. Best regards, Felix

ADD REPLY • link 9.1 years ago felix.klein ▴ 150

0

Entering edit mode

I am sorry I do not really understand it, but it looks like I do not get even to reading the bam file :/

Browse[2]>
debug: stopifnot(class(object) == "FourC")
Browse[2]>
debug: if (length(rowData(object)) == 0) stop("Add fragments before calling 'findViewpointFragments'")
Browse[2]>
debug: cat("reading bam files\n")
Browse[2]>
reading bam files
debug: bamFiles = file.path(exptData(object)$bamFilePath, colData(object)$bamFile)
Browse[2]>
debug: colData(object)$orignialReads = sapply(bamFiles, function(bamFile) countBam(bamFile)$records)
Browse[2]>

debug: reads = lapply(bamFiles, function(bamfile) {
what <- c("mapq")
flag <- scanBamFlag(isUnmappedQuery = FALSE)
param <- ScanBamParam(what = what, flag = flag)
readGAlignments(bamfile, param = param)
})
Browse[2]>

debug: colData(object)$rawReads = sapply(reads, length)

Browse[2]>
debug: if (minMapq >= 0) reads = lapply(reads, function(rga, minMapq) rga[mcols(rga)$mapq >
minMapq], minMapq = minMapq)
Browse[2]>
debug: reads = lapply(reads, function(rga, minMapq) rga[mcols(rga)$mapq >
minMapq], minMapq = minMapq)
Browse[2]>
Error in NSBS(i, x, exact = exact, upperBoundIsStrict = !allow.append) :
subscript contains NAs

Does it helps? otherwise I can send you part of my bam by email

ADD REPLY • link 9.1 years ago samuel collombet ▴ 140

0

Entering edit mode

Hello Samuel, you read in the bam files. In the debugging mode you can manually interact at each point. Just before the steps below you should type reads[[1]] manually into the console. > > Browse[2]> > debug:colData(object)$rawReads = sapply(reads, length) > > Browse[2]> > debug: if (minMapq >= 0) reads = lapply(reads, function(rga, minMapq) > rga[mcols(rga)$mapq > > minMapq], minMapq = minMapq) > Cheers, Felix PS: Otherwise send a small example file to felix.klein@embl.de including all information needed to create the FourC object until the countFragmentOverlaps call.

ADD REPLY • link 9.1 years ago felix.klein ▴ 150

0

Entering edit mode

Got it :)

indeed all the reads are reads with NA as MAPQ :/ I sent you 100 line of sam if you want to check it.

Thanks!

ADD REPLY • link 9.1 years ago samuel collombet ▴ 140