Hi,
I am trying to use derfinder to analyze samples from Saccharomyces cerevisiae. I am able to supply the species and chromosome style information in the `fullCoverage()` command, but when I go to do `regionMatrix()` the style has changed.
This is my code to get to the `fullCoverage()` call:
##Load libraries library('derfinder') library('GenomicRanges') file_designations <- c("WT1","WT2","WT3", "MT1", "MT2", "MT3") chrs_all = c('I', 'II','III','IV', 'V', 'VI', 'VII', 'VIII', 'IX', 'X', 'XI', 'XII', 'XIII', 'XIV', 'XV', 'XVI', 'MT') ## Load the data from disk files <- file.path("/data/alignments", file_designations , paste0(file_designations,"Aligned.sorted.bam")) system.time(fullCov <- fullCoverage(files = files, chrs = chrs_all, totalMapped = sapply(files,getTotalMapped, chrs = chrs_all), targetSize = 8e+07, species = "Saccharomyces_cerevisiae",currentStyle = 'NCBI'))
That worked and I'll include some fragments of it so you can see the chromosome names:
2017-05-08 17:10:19 fullCoverage: processing chromosome I 2017-05-08 17:10:19 loadCoverage: finding chromosome lengths ... 2017-05-08 17:10:23 loadCoverage: applying the cutoff to the merged data 2017-05-08 17:10:23 filterData: normalizing coverage 2017-05-08 17:10:24 filterData: done normalizing coverage 2017-05-08 17:10:24 filterData: originally there were 230218 rows, now there are 230218 rows. Meaning that 0 percent was filtered. 2017-05-08 17:10:24 fullCoverage: processing chromosome II 2017-05-08 17:10:24 loadCoverage: finding chromosome lengths ... 2017-05-08 17:10:44 loadCoverage: applying the cutoff to the merged data 2017-05-08 17:10:44 filterData: normalizing coverage 2017-05-08 17:10:44 filterData: done normalizing coverage 2017-05-08 17:10:44 filterData: originally there were 813184 rows, now there are 813184 rows. Meaning that 0 percent was filtered. 2017-05-08 17:10:44 fullCoverage: processing chromosome III 2017-05-08 17:10:44 loadCoverage: finding chromosome lengths ...
Next I want to run the `regionMatrix()` command. After seeing Leonardo's reply to my other post -> (C: regionMatrix for derfinder crashes with seqnames issue on non-human data) , I gave the style to the `regionMatrix()` command too.
system.time(regionMat <- regionMatrix(fullCov, cutoff = 30, L = 75, species = "Saccharomyces_cerevisiae", currentStyle = 'NCBI'))
Unexpectedly, that encountered an error in the first chromosome it tried:
> system.time(regionMat <- regionMatrix(fullCov, cutoff = 30, L = 75, species = "Saccharomyces_cerevisiae", currentStyle = 'NCBI')) 2017-05-08 17:28:10 regionMatrix: processing chrI 2017-05-08 17:28:10 filterData: originally there were 230218 rows, now there are 174908 rows. Meaning that 24.03 percent was filtered. 2017-05-08 17:28:10 findRegions: identifying potential segments 2017-05-08 17:28:10 findRegions: segmenting information 2017-05-08 17:28:10 findRegions: identifying candidate regions Error in validObject(.Object) : invalid class “GRanges” object: 'seqnames(x)' contains missing values Timing stopped at: 0.308 0 0.311
So I tried without the species and chromosome style and that worked. Curiously, though, I saw that the chromosome style was now different.
> system.time(regionMat <- regionMatrix(fullCov, cutoff = 30, L = 75)) 2017-05-08 17:29:23 regionMatrix: processing chrI 2017-05-08 17:29:23 filterData: originally there were 230218 rows, now there are 174908 rows. Meaning that 24.03 percent was filtered. 2017-05-08 17:29:23 findRegions: identifying potential segments 2017-05-08 17:29:23 findRegions: segmenting information 2017-05-08 17:29:23 findRegions: identifying candidate regions extendedMapSeqlevels: the current naming style could not be guessed. Consider supplying 'currentStyle'. 2017-05-08 17:29:23 findRegions: identifying region clusters extendedMapSeqlevels: the current naming style could not be guessed. Consider supplying 'currentStyle'. extendedMapSeqlevels: the current naming style could not be guessed. Consider supplying 'currentStyle'. 2017-05-08 17:29:23 getRegionCoverage: processing chrI 2017-05-08 17:29:23 getRegionCoverage: done processing chrI 2017-05-08 17:29:23 regionMatrix: calculating coverageMatrix 2017-05-08 17:29:24 regionMatrix: adjusting coverageMatrix for 'L' 2017-05-08 17:29:24 regionMatrix: processing chrII 2017-05-08 17:29:25 filterData: originally there were 813184 rows, now there are 670986 rows. Meaning that 17.49 percent was filtered. 2017-05-08 17:29:25 findRegions: identifying potential segments 2017-05-08 17:29:25 findRegions: segmenting information 2017-05-08 17:29:25 findRegions: identifying candidate regions extendedMapSeqlevels: the current naming style could not be guessed. Consider supplying 'currentStyle'. 2017-05-08 17:29:25 findRegions: identifying region clusters extendedMapSeqlevels: the current naming style could not be guessed. Consider supplying 'currentStyle'. extendedMapSeqlevels: the current naming style could not be guessed. Consider supplying 'currentStyle'. 2017-05-08 17:29:25 getRegionCoverage: processing chrII ...
Okay. Now I see from my table here -> (C: regionMatrix for derfinder crashes with seqnames issue on non-human data) that the chromosomes are in the UCSC format. They changed. Thus, I assume I should be able to add the chromosome style as UCSC in the `regionMatrix()` call and I'll cease to see `extendedMapSeqlevels: the current naming style could not be guessed. Consider supplying 'currentStyle'.`.
But that is not the case. I still see `extendedMapSeqlevels: the current naming style could not be guessed. Consider supplying 'currentStyle'.`
> system.time(regionMat <- regionMatrix(fullCov, cutoff = 30, L = 75, species = "Saccharomyces_cerevisiae", currentStyle = 'UCSC')) 2017-05-08 17:37:06 regionMatrix: processing chrI 2017-05-08 17:37:07 filterData: originally there were 230218 rows, now there are 174908 rows. Meaning that 24.03 percent was filtered. 2017-05-08 17:37:07 findRegions: identifying potential segments 2017-05-08 17:37:07 findRegions: segmenting information 2017-05-08 17:37:07 findRegions: identifying candidate regions 2017-05-08 17:37:07 findRegions: identifying region clusters extendedMapSeqlevels: the current naming style could not be guessed. Consider supplying 'currentStyle'. extendedMapSeqlevels: the current naming style could not be guessed. Consider supplying 'currentStyle'. 2017-05-08 17:37:07 getRegionCoverage: processing chrI 2017-05-08 17:37:07 getRegionCoverage: done processing chrI 2017-05-08 17:37:07 regionMatrix: calculating coverageMatrix 2017-05-08 17:37:07 regionMatrix: adjusting coverageMatrix for 'L' 2017-05-08 17:37:07 regionMatrix: processing chrII 2017-05-08 17:37:08 filterData: originally there were 813184 rows, now there are 670986 rows. Meaning that 17.49 percent was filtered. 2017-05-08 17:37:08 findRegions: identifying potential segments 2017-05-08 17:37:08 findRegions: segmenting information 2017-05-08 17:37:08 findRegions: identifying candidate regions 2017-05-08 17:37:08 findRegions: identifying region clusters extendedMapSeqlevels: the current naming style could not be guessed. Consider supplying 'currentStyle'. extendedMapSeqlevels: the current naming style could not be guessed. Consider supplying 'currentStyle'. 2017-05-08 17:37:08 getRegionCoverage: processing chrII ...
I even tried `system.time(regionMat <- regionMatrix(fullCov, cutoff = 30, L = 75, species = "Saccharomyces_cerevisiae", chrsStyle = 'UCSC'))
` after seeing Leonardo's reply here -->(C: regionMatrix for derfinder crashes with seqnames issue on non-human data). I still get the same thing about the style. Same for `system.time(regionMat <- regionMatrix(fullCov, cutoff = 30, L = 75, currentStyle = 'UCSC'))` and `system.time(regionMat <- regionMatrix(fullCov, cutoff = 30, L = 75, chrsStyle = 'UCSC'))` and `system.time(regionMat <- regionMatrix(fullCov, cutoff = 30, L = 75))`. I even tried `system.time(regionMat <- regionMatrix(fullCov, cutoff = 30, L = 75, chrsStyle = 'NULL'))` after seeing Leonarado's reply here ---> (C: regionMatrix for derfinder crashes with seqnames issue on non-human data). Still see `extendedMapSeqlevels: the current naming style could not be guessed. Consider supplying 'currentStyle'.`
First, is `fullCoverage()` supposed to switch the style? Fine, if so. However, it would be helpful to know how to write the `regionMatrix()` call subsequently? Am I doing something wrong or maybe the code is overlooking what I supplied in `currentStyle` in the `regionMatrix()` call ? Everything seems to work despite that curious message.
> session_info() Session info ----- setting value version R version 3.3.1 (2016-06-21) system x86_64, linux-gnu ui X11 language (EN) collate en_US.UTF-8 tz <NA> date 2017-05-08 Packages ------ package * version date source acepack 1.4.1 2016-10-29 CRAN (R 3.3.1) AnnotationDbi 1.36.2 2017-03-10 Bioconductor backports 1.0.5 2017-01-18 CRAN (R 3.3.1) base64enc 0.1-3 2015-07-28 CRAN (R 3.3.1) Biobase 2.34.0 2017-03-10 Bioconductor BiocGenerics * 0.20.0 2017-03-10 Bioconductor BiocParallel 1.8.2 2017-05-01 Bioconductor biomaRt 2.30.0 2017-03-10 Bioconductor Biostrings 2.42.1 2017-03-10 Bioconductor bitops 1.0-6 2013-08-17 CRAN (R 3.3.1) BSgenome 1.42.0 2017-03-10 Bioconductor bumphunter 1.14.0 2017-05-01 Bioconductor checkmate 1.8.2 2016-11-02 CRAN (R 3.3.1) cluster 2.0.6 2017-03-16 CRAN (R 3.3.1) codetools 0.2-15 2016-10-05 CRAN (R 3.3.1) colorspace 1.3-2 2016-12-14 CRAN (R 3.3.1) data.table 1.10.4 2017-02-01 CRAN (R 3.3.1) DBI 0.6-1 2017-04-01 CRAN (R 3.3.1) derfinder * 1.8.5 2017-05-01 Bioconductor derfinderHelper 1.8.1 2017-05-01 Bioconductor devtools * 1.12.0 2016-12-05 CRAN (R 3.3.1) digest 0.6.12 2017-01-27 CRAN (R 3.3.1) doRNG 1.6.6 2017-04-10 CRAN (R 3.3.1) foreach 1.4.3 2015-10-13 CRAN (R 3.3.1) foreign 0.8-68 2017-04-24 CRAN (R 3.3.1) Formula 1.2-1 2015-04-07 CRAN (R 3.3.1) GenomeInfoDb * 1.10.3 2017-03-10 Bioconductor GenomicAlignments 1.10.1 2017-05-01 Bioconductor GenomicFeatures 1.26.4 2017-05-01 Bioconductor GenomicFiles 1.10.3 2017-05-01 Bioconductor GenomicRanges * 1.26.4 2017-05-01 Bioconductor ggplot2 2.2.1 2016-12-30 CRAN (R 3.3.1) gridExtra 2.2.1 2016-02-29 CRAN (R 3.3.1) gtable 0.2.0 2016-02-26 CRAN (R 3.3.1) Hmisc 4.0-2 2016-12-31 CRAN (R 3.3.1) htmlTable 1.9 2017-01-26 CRAN (R 3.3.1) htmltools 0.3.6 2017-04-28 CRAN (R 3.3.1) htmlwidgets 0.8 2016-11-09 CRAN (R 3.3.1) IRanges * 2.8.2 2017-05-01 Bioconductor iterators 1.0.8 2015-10-13 CRAN (R 3.3.1) knitr 1.15.1 2016-11-22 CRAN (R 3.3.1) lattice 0.20-35 2017-03-25 CRAN (R 3.3.1) latticeExtra 0.6-28 2016-02-09 CRAN (R 3.3.1) lazyeval 0.2.0 2016-06-12 CRAN (R 3.3.1) locfit 1.5-9.1 2013-04-20 CRAN (R 3.3.1) magrittr 1.5 2014-11-22 CRAN (R 3.3.1) Matrix 1.2-10 2017-04-28 CRAN (R 3.3.1) matrixStats 0.52.2 2017-04-14 CRAN (R 3.3.1) memoise 1.1.0 2017-04-21 CRAN (R 3.3.1) munsell 0.4.3 2016-02-13 CRAN (R 3.3.1) nnet 7.3-12 2016-02-02 CRAN (R 3.3.1) pkgmaker 0.22 2014-05-14 CRAN (R 3.3.1) plyr 1.8.4 2016-06-08 CRAN (R 3.3.1) qvalue 2.6.0 2017-05-01 Bioconductor RColorBrewer 1.1-2 2014-12-07 CRAN (R 3.3.1) Rcpp 0.12.10 2017-03-19 CRAN (R 3.3.1) RCurl 1.95-4.8 2016-03-01 CRAN (R 3.3.1) registry 0.3 2015-07-08 CRAN (R 3.3.1) reshape2 1.4.2 2016-10-22 CRAN (R 3.3.1) rngtools 1.2.4 2014-03-06 CRAN (R 3.3.1) rpart 4.1-11 2017-04-21 CRAN (R 3.3.1) Rsamtools 1.26.2 2017-05-01 Bioconductor RSQLite 1.1-2 2017-01-08 CRAN (R 3.3.1) rtracklayer 1.34.2 2017-03-10 Bioconductor S4Vectors * 0.12.2 2017-05-01 Bioconductor scales 0.4.1 2016-11-09 CRAN (R 3.3.1) stringi 1.1.5 2017-04-07 CRAN (R 3.3.1) stringr 1.2.0 2017-02-18 CRAN (R 3.3.1) SummarizedExperiment 1.4.0 2017-03-10 Bioconductor survival 2.41-3 2017-04-04 CRAN (R 3.3.1) tibble 1.3.0 2017-04-01 CRAN (R 3.3.1) VariantAnnotation 1.20.3 2017-05-01 Bioconductor withr 1.0.2 2016-06-20 CRAN (R 3.3.1) XML 3.98-1.6 2017-03-30 CRAN (R 3.3.1) xtable 1.8-2 2016-02-05 CRAN (R 3.3.1) XVector 0.14.1 2017-05-01 Bioconductor zlibbioc 1.20.0 2017-03-10 Bioconductor >
Hi Wayne,
Can you post the output of:
after you created it with:
Also, are the chromosome names in UCSC or NCBI style in your bam files?
I have a pretty good idea of what is going on, but it could be that you don't need all the complicated chromosome name changes.
Thanks!