derfinder fullCoverage() changes chromosome style and then regionMatrix() complains and ignores provided 'currentStyle'
0
0
Entering edit mode
Wayne • 0
@wayne-12952
Last seen 7.6 years ago

Hi,

I am trying to use derfinder to analyze samples from Saccharomyces cerevisiae. I am able to supply the species and chromosome style information in the `fullCoverage()` command, but when I go to do `regionMatrix()` the style has changed.

This is my code to get to the `fullCoverage()` call:

##Load libraries
library('derfinder')
library('GenomicRanges')
file_designations <- c("WT1","WT2","WT3", "MT1", "MT2", "MT3")
chrs_all = c('I', 'II','III','IV', 'V', 'VI', 'VII', 'VIII', 'IX', 'X', 'XI', 'XII', 'XIII', 'XIV', 'XV', 'XVI', 'MT')
## Load the data from disk
files <- file.path("/data/alignments", file_designations , paste0(file_designations,"Aligned.sorted.bam"))
system.time(fullCov <- fullCoverage(files = files, chrs = chrs_all,
    totalMapped = sapply(files,getTotalMapped, chrs = chrs_all), targetSize = 8e+07, species = "Saccharomyces_cerevisiae",currentStyle = 'NCBI'))   

That worked and I'll include some fragments of it so you can see the chromosome names:

2017-05-08 17:10:19 fullCoverage: processing chromosome I
2017-05-08 17:10:19 loadCoverage: finding chromosome lengths
...
2017-05-08 17:10:23 loadCoverage: applying the cutoff to the merged data
2017-05-08 17:10:23 filterData: normalizing coverage
2017-05-08 17:10:24 filterData: done normalizing coverage
2017-05-08 17:10:24 filterData: originally there were 230218 rows, now there are 230218 rows. Meaning that 0 percent was filtered.
2017-05-08 17:10:24 fullCoverage: processing chromosome II
2017-05-08 17:10:24 loadCoverage: finding chromosome lengths
...
2017-05-08 17:10:44 loadCoverage: applying the cutoff to the merged data
2017-05-08 17:10:44 filterData: normalizing coverage
2017-05-08 17:10:44 filterData: done normalizing coverage
2017-05-08 17:10:44 filterData: originally there were 813184 rows, now there are 813184 rows. Meaning that 0 percent was filtered.
2017-05-08 17:10:44 fullCoverage: processing chromosome III
2017-05-08 17:10:44 loadCoverage: finding chromosome lengths
...

Next I want to run the `regionMatrix()` command. After seeing Leonardo's reply to my other post -> (C: regionMatrix for derfinder crashes with seqnames issue on non-human data) , I gave the style to the `regionMatrix()` command too.

system.time(regionMat <- regionMatrix(fullCov, cutoff = 30, L = 75, species = "Saccharomyces_cerevisiae", currentStyle = 'NCBI'))

Unexpectedly, that encountered an error in the first chromosome it tried:

    > system.time(regionMat <- regionMatrix(fullCov, cutoff = 30, L = 75, species = "Saccharomyces_cerevisiae", currentStyle = 'NCBI'))
    2017-05-08 17:28:10 regionMatrix: processing chrI
    2017-05-08 17:28:10 filterData: originally there were 230218 rows, now there are 174908 rows. Meaning that 24.03 percent was filtered.
    2017-05-08 17:28:10 findRegions: identifying potential segments
    2017-05-08 17:28:10 findRegions: segmenting information
    2017-05-08 17:28:10 findRegions: identifying candidate regions
    Error in validObject(.Object) : 
      invalid class “GRanges” object: 'seqnames(x)' contains missing values
    Timing stopped at: 0.308 0 0.311 

So I tried without the species and chromosome style and that worked. Curiously, though, I saw that the chromosome style was now different.

> system.time(regionMat <- regionMatrix(fullCov, cutoff = 30, L = 75))
2017-05-08 17:29:23 regionMatrix: processing chrI
2017-05-08 17:29:23 filterData: originally there were 230218 rows, now there are 174908 rows. Meaning that 24.03 percent was filtered.
2017-05-08 17:29:23 findRegions: identifying potential segments
2017-05-08 17:29:23 findRegions: segmenting information
2017-05-08 17:29:23 findRegions: identifying candidate regions
extendedMapSeqlevels: the current naming style could not be guessed. Consider supplying 'currentStyle'.
2017-05-08 17:29:23 findRegions: identifying region clusters
extendedMapSeqlevels: the current naming style could not be guessed. Consider supplying 'currentStyle'.
extendedMapSeqlevels: the current naming style could not be guessed. Consider supplying 'currentStyle'.
2017-05-08 17:29:23 getRegionCoverage: processing chrI
2017-05-08 17:29:23 getRegionCoverage: done processing chrI
2017-05-08 17:29:23 regionMatrix: calculating coverageMatrix
2017-05-08 17:29:24 regionMatrix: adjusting coverageMatrix for 'L'
2017-05-08 17:29:24 regionMatrix: processing chrII
2017-05-08 17:29:25 filterData: originally there were 813184 rows, now there are 670986 rows. Meaning that 17.49 percent was filtered.
2017-05-08 17:29:25 findRegions: identifying potential segments
2017-05-08 17:29:25 findRegions: segmenting information
2017-05-08 17:29:25 findRegions: identifying candidate regions
extendedMapSeqlevels: the current naming style could not be guessed. Consider supplying 'currentStyle'.
2017-05-08 17:29:25 findRegions: identifying region clusters
extendedMapSeqlevels: the current naming style could not be guessed. Consider supplying 'currentStyle'.
extendedMapSeqlevels: the current naming style could not be guessed. Consider supplying 'currentStyle'.
2017-05-08 17:29:25 getRegionCoverage: processing chrII
...

Okay. Now I see from my table here -> (C: regionMatrix for derfinder crashes with seqnames issue on non-human data) that the chromosomes are in the UCSC format. They changed. Thus, I assume I should be able to add the chromosome style as UCSC in the `regionMatrix()` call and I'll cease to see `extendedMapSeqlevels: the current naming style could not be guessed. Consider supplying 'currentStyle'.`.

But that is not the case. I still see `extendedMapSeqlevels: the current naming style could not be guessed. Consider supplying 'currentStyle'.`

> system.time(regionMat <- regionMatrix(fullCov, cutoff = 30, L = 75, species = "Saccharomyces_cerevisiae", currentStyle = 'UCSC'))
2017-05-08 17:37:06 regionMatrix: processing chrI
2017-05-08 17:37:07 filterData: originally there were 230218 rows, now there are 174908 rows. Meaning that 24.03 percent was filtered.
2017-05-08 17:37:07 findRegions: identifying potential segments
2017-05-08 17:37:07 findRegions: segmenting information
2017-05-08 17:37:07 findRegions: identifying candidate regions
2017-05-08 17:37:07 findRegions: identifying region clusters
extendedMapSeqlevels: the current naming style could not be guessed. Consider supplying 'currentStyle'.
extendedMapSeqlevels: the current naming style could not be guessed. Consider supplying 'currentStyle'.
2017-05-08 17:37:07 getRegionCoverage: processing chrI
2017-05-08 17:37:07 getRegionCoverage: done processing chrI
2017-05-08 17:37:07 regionMatrix: calculating coverageMatrix
2017-05-08 17:37:07 regionMatrix: adjusting coverageMatrix for 'L'
2017-05-08 17:37:07 regionMatrix: processing chrII
2017-05-08 17:37:08 filterData: originally there were 813184 rows, now there are 670986 rows. Meaning that 17.49 percent was filtered.
2017-05-08 17:37:08 findRegions: identifying potential segments
2017-05-08 17:37:08 findRegions: segmenting information
2017-05-08 17:37:08 findRegions: identifying candidate regions
2017-05-08 17:37:08 findRegions: identifying region clusters
extendedMapSeqlevels: the current naming style could not be guessed. Consider supplying 'currentStyle'.
extendedMapSeqlevels: the current naming style could not be guessed. Consider supplying 'currentStyle'.
2017-05-08 17:37:08 getRegionCoverage: processing chrII
...

I even tried `system.time(regionMat <- regionMatrix(fullCov, cutoff = 30, L = 75, species = "Saccharomyces_cerevisiae", chrsStyle = 'UCSC'))
` after seeing Leonardo's reply here -->(C: regionMatrix for derfinder crashes with seqnames issue on non-human data). I still get the same thing about the style. Same for `system.time(regionMat <- regionMatrix(fullCov, cutoff = 30, L = 75, currentStyle = 'UCSC'))` and `system.time(regionMat <- regionMatrix(fullCov, cutoff = 30, L = 75, chrsStyle = 'UCSC'))` and `system.time(regionMat <- regionMatrix(fullCov, cutoff = 30, L = 75))`. I even tried `system.time(regionMat <- regionMatrix(fullCov, cutoff = 30, L = 75, chrsStyle = 'NULL'))` after seeing Leonarado's reply here ---> (C: regionMatrix for derfinder crashes with seqnames issue on non-human data). Still see `extendedMapSeqlevels: the current naming style could not be guessed. Consider supplying 'currentStyle'.`

First, is `fullCoverage()` supposed to switch the style? Fine, if so. However, it would be helpful to know how to write the `regionMatrix()` call subsequently? Am I doing something wrong or maybe the code is overlooking what I supplied in `currentStyle` in the `regionMatrix()` call ? Everything seems to work despite that curious message.

 

> session_info()
Session info -----
 setting  value                       
 version  R version 3.3.1 (2016-06-21)
 system   x86_64, linux-gnu           
 ui       X11                         
 language (EN)                        
 collate  en_US.UTF-8                 
 tz       <NA>                        
 date     2017-05-08                  

Packages ------
 package              * version  date       source        
 acepack                1.4.1    2016-10-29 CRAN (R 3.3.1)
 AnnotationDbi          1.36.2   2017-03-10 Bioconductor  
 backports              1.0.5    2017-01-18 CRAN (R 3.3.1)
 base64enc              0.1-3    2015-07-28 CRAN (R 3.3.1)
 Biobase                2.34.0   2017-03-10 Bioconductor  
 BiocGenerics         * 0.20.0   2017-03-10 Bioconductor  
 BiocParallel           1.8.2    2017-05-01 Bioconductor  
 biomaRt                2.30.0   2017-03-10 Bioconductor  
 Biostrings             2.42.1   2017-03-10 Bioconductor  
 bitops                 1.0-6    2013-08-17 CRAN (R 3.3.1)
 BSgenome               1.42.0   2017-03-10 Bioconductor  
 bumphunter             1.14.0   2017-05-01 Bioconductor  
 checkmate              1.8.2    2016-11-02 CRAN (R 3.3.1)
 cluster                2.0.6    2017-03-16 CRAN (R 3.3.1)
 codetools              0.2-15   2016-10-05 CRAN (R 3.3.1)
 colorspace             1.3-2    2016-12-14 CRAN (R 3.3.1)
 data.table             1.10.4   2017-02-01 CRAN (R 3.3.1)
 DBI                    0.6-1    2017-04-01 CRAN (R 3.3.1)
 derfinder            * 1.8.5    2017-05-01 Bioconductor  
 derfinderHelper        1.8.1    2017-05-01 Bioconductor  
 devtools             * 1.12.0   2016-12-05 CRAN (R 3.3.1)
 digest                 0.6.12   2017-01-27 CRAN (R 3.3.1)
 doRNG                  1.6.6    2017-04-10 CRAN (R 3.3.1)
 foreach                1.4.3    2015-10-13 CRAN (R 3.3.1)
 foreign                0.8-68   2017-04-24 CRAN (R 3.3.1)
 Formula                1.2-1    2015-04-07 CRAN (R 3.3.1)
 GenomeInfoDb         * 1.10.3   2017-03-10 Bioconductor  
 GenomicAlignments      1.10.1   2017-05-01 Bioconductor  
 GenomicFeatures        1.26.4   2017-05-01 Bioconductor  
 GenomicFiles           1.10.3   2017-05-01 Bioconductor  
 GenomicRanges        * 1.26.4   2017-05-01 Bioconductor  
 ggplot2                2.2.1    2016-12-30 CRAN (R 3.3.1)
 gridExtra              2.2.1    2016-02-29 CRAN (R 3.3.1)
 gtable                 0.2.0    2016-02-26 CRAN (R 3.3.1)
 Hmisc                  4.0-2    2016-12-31 CRAN (R 3.3.1)
 htmlTable              1.9      2017-01-26 CRAN (R 3.3.1)
 htmltools              0.3.6    2017-04-28 CRAN (R 3.3.1)
 htmlwidgets            0.8      2016-11-09 CRAN (R 3.3.1)
 IRanges              * 2.8.2    2017-05-01 Bioconductor  
 iterators              1.0.8    2015-10-13 CRAN (R 3.3.1)
 knitr                  1.15.1   2016-11-22 CRAN (R 3.3.1)
 lattice                0.20-35  2017-03-25 CRAN (R 3.3.1)
 latticeExtra           0.6-28   2016-02-09 CRAN (R 3.3.1)
 lazyeval               0.2.0    2016-06-12 CRAN (R 3.3.1)
 locfit                 1.5-9.1  2013-04-20 CRAN (R 3.3.1)
 magrittr               1.5      2014-11-22 CRAN (R 3.3.1)
 Matrix                 1.2-10   2017-04-28 CRAN (R 3.3.1)
 matrixStats            0.52.2   2017-04-14 CRAN (R 3.3.1)
 memoise                1.1.0    2017-04-21 CRAN (R 3.3.1)
 munsell                0.4.3    2016-02-13 CRAN (R 3.3.1)
 nnet                   7.3-12   2016-02-02 CRAN (R 3.3.1)
 pkgmaker               0.22     2014-05-14 CRAN (R 3.3.1)
 plyr                   1.8.4    2016-06-08 CRAN (R 3.3.1)
 qvalue                 2.6.0    2017-05-01 Bioconductor  
 RColorBrewer           1.1-2    2014-12-07 CRAN (R 3.3.1)
 Rcpp                   0.12.10  2017-03-19 CRAN (R 3.3.1)
 RCurl                  1.95-4.8 2016-03-01 CRAN (R 3.3.1)
 registry               0.3      2015-07-08 CRAN (R 3.3.1)
 reshape2               1.4.2    2016-10-22 CRAN (R 3.3.1)
 rngtools               1.2.4    2014-03-06 CRAN (R 3.3.1)
 rpart                  4.1-11   2017-04-21 CRAN (R 3.3.1)
 Rsamtools              1.26.2   2017-05-01 Bioconductor  
 RSQLite                1.1-2    2017-01-08 CRAN (R 3.3.1)
 rtracklayer            1.34.2   2017-03-10 Bioconductor  
 S4Vectors            * 0.12.2   2017-05-01 Bioconductor  
 scales                 0.4.1    2016-11-09 CRAN (R 3.3.1)
 stringi                1.1.5    2017-04-07 CRAN (R 3.3.1)
 stringr                1.2.0    2017-02-18 CRAN (R 3.3.1)
 SummarizedExperiment   1.4.0    2017-03-10 Bioconductor  
 survival               2.41-3   2017-04-04 CRAN (R 3.3.1)
 tibble                 1.3.0    2017-04-01 CRAN (R 3.3.1)
 VariantAnnotation      1.20.3   2017-05-01 Bioconductor  
 withr                  1.0.2    2016-06-20 CRAN (R 3.3.1)
 XML                    3.98-1.6 2017-03-30 CRAN (R 3.3.1)
 xtable                 1.8-2    2016-02-05 CRAN (R 3.3.1)
 XVector                0.14.1   2017-05-01 Bioconductor  
 zlibbioc               1.20.0   2017-03-10 Bioconductor  
> 

 

derfinder • 1.5k views
ADD COMMENT
0
Entering edit mode

Hi Wayne,

Can you post the output of:

names(fullCov)

after you created it with:

fullCov <- fullCoverage(files = files, chrs = chrs_all,
    totalMapped = sapply(files,getTotalMapped, chrs = chrs_all), targetSize = 8e+07, species = "Saccharomyces_cerevisiae",currentStyle = 'NCBI')

Also, are the chromosome names in UCSC or NCBI style in your bam files?

I have a pretty good idea of what is going on, but it could be that you don't need all the complicated chromosome name changes.

Thanks!

ADD REPLY

Login before adding your answer.

Traffic: 882 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6