Question: Errors using wavClusteR package - "range cannot be determined from the supplied arguments (too many NAs)"
S.k wrote:

I am working on PAR-CLIP analysis using the wavClusteR package. When I try to load a bam file, I get the following error:

 

> filename<-("test.bam")
> Bam <- readSortedBam(filename = filename)
> Bam
> countTable <- getAllSub( Bam, minCov = 10 )
Loading required package: doMC
Considering substitutions, n = 1999, processing in 2 chunks
   chunk #: 1
   chunk #: 2
Error in { :
  task 1 failed - "solving row 247: range cannot be determined from the supplied arguments (too many NAs)"
In addition: Warning message:
executing %dopar% sequentially: no parallel backend registered

 

Any ideas on what could be causing this? I suspect it has something to do with GRanges. Here is the portion of the file that is being noted in the error message:

 

> Bam[247]
GRanges object with 1 range and 2 metadata columns:
      seqnames               ranges strand |
         <Rle>            <IRanges>  <Rle> |
  [1]     chr1 [91853066, 91853115]      + |
                                                    qseq          MD
                                          <DNAStringSet> <character>
  [1] ACTGAGCTCGCCTTAGGACACCTGCGTTACCGTTTGACAGGTGTACCGCC        45A3
  -------
  seqinfo: 25 sequences from an unspecified genome; no seqlengths

 

This seems pretty normal, though. Suggestions?

 

Session info:

> sessionInfo()
R version 3.2.3 (2015-12-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS release 6.3 (Final)

locale:
[1] C

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
[1] wavClusteR_2.4.1     Rsamtools_1.22.0     Biostrings_2.38.4
[4] XVector_0.10.0       GenomicRanges_1.22.4 GenomeInfoDb_1.6.3
[7] IRanges_2.4.8        S4Vectors_0.8.11     BiocGenerics_0.16.1

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.4                wmtsa_2.0-0
 [3] compiler_3.2.3             RColorBrewer_1.1-2
 [5] futile.logger_1.4.1        plyr_1.8.3
 [7] GenomicFeatures_1.22.13    bitops_1.0-6
 [9] futile.options_1.0.0       iterators_1.0.8
[11] tools_3.2.3                zlibbioc_1.16.0
[13] rpart_4.1-10               biomaRt_2.26.1
[15] mclust_5.2                 lattice_0.20-33
[17] RSQLite_1.0.0              gtable_0.2.0
[19] foreach_1.4.3              DBI_0.3.1
[21] gridExtra_2.2.1            cluster_2.0.3
[23] rtracklayer_1.30.4         stringr_1.0.0
[25] ifultools_2.0-1            ade4_1.7-4
[27] nnet_7.3-12                grid_3.2.3
[29] Biobase_2.30.0             AnnotationDbi_1.32.3
[31] survival_2.38-3            XML_3.98-1.4
[33] BiocParallel_1.4.3         foreign_0.8-66
[35] latticeExtra_0.6-28        Formula_1.2-1
[37] seqinr_3.1-3               ggplot2_2.1.0
[39] lambda.r_1.1.7             splus2R_1.2-0
[41] magrittr_1.5               splines_3.2.3
[43] Hmisc_3.17-3               MASS_7.3-45
[45] scales_0.4.0               codetools_0.2-14
[47] GenomicAlignments_1.6.3    SummarizedExperiment_1.0.2
[49] colorspace_1.2-6           stringi_1.0-1
[51] acepack_1.3-3.3            RCurl_1.95-4.8
[53] munsell_0.4.3
modified 24 months ago by federico.comoglio • written 2.0 years ago by S.k

federico.comoglio wrote:

Hi Stephen,

To generate a substitution count table, wavClusteR splits a sorted BAM file by strand, flags all entries whose MD value indicates anything other than a perfect match, and generates a GRangesList object in which each element contains the set of genomic regions that share the same MD value. Elements of this list are then processed in chunks of size 1000 (using multiple cores if enabled).
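
For readers who want to see the grouping step concretely, here is a minimal base R sketch of the idea. It is not wavClusteR's actual code: the `reads` data frame and its values are made up for illustration, and a plain list of data frames stands in for the GRangesList.

```r
# Toy stand-in for the reads of a sorted BAM file (hypothetical data).
reads <- data.frame(
  strand = c("+", "+", "-", "+", "-", "+"),
  MD     = c("50", "45A3", "45A3", "45A3", "50", "10T39"),
  stringsAsFactors = FALSE
)

# A purely numeric MD value means the read matches the reference perfectly,
# so only reads with a non-numeric MD value are kept.
has_mismatch <- !grepl("^[0-9]+$", reads$MD)
flagged <- reads[has_mismatch, ]

# One list element per strand/MD combination
# (the analogue of the GRangesList elements described above).
by_md <- split(flagged, list(flagged$strand, flagged$MD), drop = TRUE)

# Assign list elements to chunks of a fixed size (1000 in the description above).
chunk_size <- 1000
chunk_id <- ceiling(seq_along(by_md) / chunk_size)
chunks <- split(by_md, chunk_id)

length(by_md)   # number of distinct strand/MD groups
length(chunks)  # number of chunks to process
```

With the toy data there are three strand/MD groups, all of which fall into a single chunk. With n = 1999 groups, as in the output in the question, the same arithmetic gives 2 chunks.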

You sent me a chunk of the Bam file that was sufficient to generate the error you reported. This error is caused by the alignment:

$10^A38 
GRanges object with 1 range and 2 metadata columns:
      seqnames             ranges strand |                                               qseq          MD
         <Rle>          <IRanges>  <Rle> |                                     <DNAStringSet> <character>
  [1]    chr21 [9827478, 9827527]      + | CCGCAGGCGCGCAATTACCCACTCCCGACCCGGGGAGGTAGTGACGAAAA      10^A38
 

because the MD value 10^A38 (10 matches, followed by the deletion of one A, followed by 38 matches) contains an indel. wavClusteR does not support indels. How did you align your PAR-CLIP data?
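
As an illustration, a deletion in an MD value can be spotted with a simple check in base R. This is only a sketch and not part of wavClusteR; the helper names `md_tokens` and `md_has_deletion` are hypothetical. Note that the MD field only records deletions (marked by `^`); insertions show up in the CIGAR string instead, so this check alone does not catch them.

```r
# Split a single MD string into its parts: match counts, mismatched
# reference bases, and deletions ("^" followed by the deleted bases).
md_tokens <- function(md) {
  regmatches(md, gregexpr("[0-9]+|\\^[A-Z]+|[A-Z]", md))[[1]]
}

# TRUE for every MD string that contains a deletion.
md_has_deletion <- function(md) {
  grepl("^", md, fixed = TRUE)
}

md <- c("45A3", "10^A38", "50")
md_has_deletion(md)   # FALSE TRUE FALSE
md_tokens("10^A38")   # "10" "^A" "38"
```

Applied to the MD metadata column of the object returned by readSortedBam, a check like this would point to the alignments that need to be removed or realigned before calling getAllSub.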

 

Let me know. 

Hope this helps,

Federico

written 24 months ago by federico.comoglio

This was a paired-end alignment to mm10 with bowtie2. The command used was something like this:

bowtie2 --threads "$THREADS" --local -x "$Genome_path" -q -1 "${FASTQ_R1}.trim" -2 "${FASTQ_R2}.trim" | samtools view -@ "$THREADS" -Sb1 - | samtools sort -m 10G -@ "$THREADS" - "$SAMPLEID"
written 24 months ago by S.k

Thank you. The problem was then caused by Bowtie2, which allows gapped alignments. I would suggest realigning your data with bowtie, which produces ungapped alignments. Let me know.

written 24 months ago by federico.comoglio

Federico, thanks for your help. I repeated the alignment on that test sample with bowtie, and was able to progress past the pipeline step in the OP which was previously giving an error. This is the alignment command I used (using only one of the paired reads):

zcat ${tmp_fastq1_gz} | bowtie "$tmp_genome_path" --threads 4 -v 2 -m 10 --best --strata - -S ${tmp_sampleID}.1.sam

However, I am currently having issues downstream in the pipeline, at the filterClusters step. If I am unable to troubleshoot it, I will create a new support thread about it here.

written 24 months ago by S.k

Sure, keep me posted.

written 24 months ago by federico.comoglio

Hi Federico,

Thanks for your help. I am having another issue, which I have posted here: Error with wavClusteR annotateClusters function

 

written 23 months ago by S.k
federico.comoglio wrote:

Hi Stephen,

Thank you for posting your issue on the support site. I realised the tag for the threads related to wavClusteR has been consistently misspelled as wavcluter. Is it possible to edit it?

Re. your issue, would it be possible for you to try out v2.5.2 in BioC devel and report back? As we discussed earlier today by email, the issue you reported has nothing to do with line 247 in your BAM file. Rather, it refers to row 247 of the count table.

I will work on it and report back asap.

 

Let me know.

Federico

 

 

modified 2.0 years ago • written 2.0 years ago by federico.comoglio

Thanks for looking into this. I tried to change the tag but I think that the forum here is intentionally using lowercase for the tags.

written 2.0 years ago by S.k

Federico mentioned that there was a tag 'wavcluter' (note spelling); this tag has now been removed.

written 2.0 years ago by Martin Morgan

Thank you, Martin. I am still working on the issue and I hope to get back with a proper answer in 2-3 days.

written 24 months ago by federico.comoglio
federico.comoglio wrote:

Hi Stephen,

Did you have the chance to run your code with version 2.5.2 of the package?

written 24 months ago by federico.comoglio

Sorry, I haven't yet figured out how to force a download of v2.5.2. I see it listed here:

https://www.bioconductor.org/packages/3.3/bioc/html/wavClusteR.html

But when I run the commands listed there:

source("https://bioconductor.org/biocLite.R")
biocLite("wavClusteR")

it gives me version 2.4.1 instead. Perhaps this is because I am using Bioconductor 3.2 instead of Bioconductor 3.3 (the devel version)? I haven't been able to find clear instructions on how to set that up either.

modified 24 months ago • written 24 months ago by S.k

Federico,

I installed the development version of R 3.3 for OS X and tried again with version 2.5.2 of wavClusteR. I got the same error with the same data set.

> countTable <- getAllSub( Bam, minCov = 10 )
Loading required package: doMC
Considering substitutions, n = 1999, processing in 2 chunks
   chunk #: 1
   chunk #: 2
Error in { :
  task 1 failed - "solving row 247: range cannot be determined from the supplied arguments (too many NAs)"
In addition: Warning message:
executing %dopar% sequentially: no parallel backend registered

 

> sessionInfo()
R version 3.3.0 beta (2016-04-21 r70529)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.11.3 (El Capitan)

locale:
[1] C

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
 [1] wavClusteR_2.5.2      Rsamtools_1.23.8      Biostrings_2.39.14
 [4] XVector_0.11.8        GenomicRanges_1.23.27 GenomeInfoDb_1.7.6
 [7] IRanges_2.5.46        S4Vectors_0.9.51      BiocGenerics_0.17.5
[10] BiocInstaller_1.21.4

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.4.5               wmtsa_2.0-0
 [3] compiler_3.3.0              RColorBrewer_1.1-2
 [5] plyr_1.8.3                  GenomicFeatures_1.23.30
 [7] bitops_1.0-6                iterators_1.0.8
 [9] tools_3.3.0                 zlibbioc_1.17.1
[11] rpart_4.1-10                biomaRt_2.27.2
[13] mclust_5.2                  lattice_0.20-33
[15] RSQLite_1.0.0               gtable_0.2.0
[17] Matrix_1.2-5                foreach_1.4.3
[19] DBI_0.3.1                   gridExtra_2.2.1
[21] cluster_2.0.4               rtracklayer_1.31.10
[23] stringr_1.0.0               ifultools_2.0-1
[25] ade4_1.7-4                  nnet_7.3-12
[27] grid_3.3.0                  Biobase_2.31.3
[29] AnnotationDbi_1.33.12       survival_2.39-2
[31] XML_3.98-1.4                BiocParallel_1.5.21
[33] foreign_0.8-66              latticeExtra_0.6-28
[35] Formula_1.2-1               seqinr_3.1-3
[37] ggplot2_2.1.0               splus2R_1.2-0
[39] magrittr_1.5                splines_3.3.0
[41] Hmisc_3.17-3                MASS_7.3-45
[43] scales_0.4.0                codetools_0.2-14
[45] GenomicAlignments_1.7.21    SummarizedExperiment_1.1.25
[47] colorspace_1.2-6            stringi_1.0-1
[49] acepack_1.3-3.3             RCurl_1.95-4.8
[51] munsell_0.4.3
written 24 months ago by S.k

Thanks a lot, I will get back to you asap.

written 24 months ago by federico.comoglio