My ultimate goal is to be able to run derfinder on a custom GTF. But to get a hang of how to do it, I am first trying to succeed at it using a completely stock Ensembl GTF file (so no biomaRt suggestions please).
First I create a custom TxDb:
txdb <- makeTxDbFromGFF(file='~/devtestdata/Arabidopsis_thaliana.TAIR10.36.gtf',
organism='Arabidopsis thaliana',
chrominfo=Seqinfo(seqnames=c('1','2','3','4','5','Mt','Pt'), seqlengths=c(30427671,19698289,23459830,18585056,26975502,366924,154478), isCircular=c(FALSE,FALSE,FALSE,FALSE,FALSE,TRUE,TRUE), genome='TAIR10') )
No errors reported up to here.
Then (following coverage loading from 4 BAM files using loadCoverage, setting up a dataframe with two conditions, calculating the sample depths and building the models) I call derfinder:
analyzeChr(chr='4', coverageInfo=chr4Cov, txdb=txdb, models=fwdModels, groupInfo=covariates$condition, returnOutput=TRUE, chrsStyle=NULL, cutoffPre=3, cutoffFstat=5e-02, nPermute=20, seeds=NULL, writeOutput=FALSE, lowMemDir=NULL, smooth=FALSE, weights=NULL, smoothFunction=bumphunter::locfitByCluster )
And I get this error:
2017-08-18 15:02:32 analyzeChr: Pre-processing the coverage data 2017-08-18 15:02:35 analyzeChr: Calculating statistics 2017-08-18 15:02:35 calculateStats: calculating the F-statistics 2017-08-18 15:02:46 analyzeChr: Calculating pvalues 2017-08-18 15:02:46 analyzeChr: Using the following theoretical cutoff for the F-statistics 161.447638797588 2017-08-18 15:02:46 calculatePvalues: identifying data segments Error in attr(x, "tsp") <- c(1, NROW(x), 1) : attempt to set an attribute on NULL
This is the stack trace:
9: hasTsp(x)
8: start.default(position)
7: start(position)
6: start(position)
5: is(start, "Ranges")
4: IRanges(start = start(position)[runValue(position)], end = end(position)[runValue(position)])
3: .clusterMakerRle(position = position, maxGap = .advanced_argument("maxRegionGap",
0L, ...), ranges = TRUE)
2: calculatePvalues(coveragePrep = prep, models = models, fstats = fstats,
nPermute = nPermute, seeds = seeds, chr = chr, cutoff = cutoff,
lowMemDir = lowMemDir, smooth = smooth, smoothFunction = smoothFunction,
weights = weights, ...)
1: analyzeChr(chr = "4", coverageInfo = fwdCov[["4"]], txdb = txdb,
models = fwdModels, groupInfo = experiment$covariates$condition[fwd],
returnOutput = TRUE, chrsStyle = NULL, cutoffPre = 3, cutoffFstat = 0.05,
nPermute = 20, seeds = NULL, writeOutput = FALSE, lowMemDir = NULL,
smooth = FALSE, weights = NULL, smoothFunction = bumphunter::locfitByCluster)
And this is my session:
R version 3.4.1 (2017-06-30) Platform: x86_64-apple-darwin15.6.0 (64-bit) Running under: macOS Sierra 10.12.6 Matrix products: default BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib locale: [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8 attached base packages: [1] parallel stats4 stats graphics grDevices utils datasets methods base other attached packages: [1] ensembldb_2.0.4 AnnotationFilter_1.0.0 biomaRt_2.32.1 TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2 [5] BiocInstaller_1.26.0 Rsamtools_1.28.0 Biostrings_2.44.2 XVector_0.16.0 [9] GenomicFeatures_1.28.4 AnnotationDbi_1.38.2 treat_0.0.1 DESeq2_1.16.1 [13] SummarizedExperiment_1.6.3 DelayedArray_0.2.7 matrixStats_0.52.2 Biobase_2.36.2 [17] GenomicRanges_1.28.4 GenomeInfoDb_1.12.2 IRanges_2.10.2 S4Vectors_0.14.3 [21] BiocGenerics_0.22.0 data.table_1.10.4 derfinder_1.10.5 loaded via a namespace (and not attached): [1] ProtGenerics_1.8.0 bitops_1.0-6 bit64_0.9-7 httr_1.3.0 RColorBrewer_1.1-2 [6] GenomicFiles_1.12.0 tools_3.4.1 backports_1.1.0 doRNG_1.6.6 R6_2.2.2 [11] rpart_4.1-11 Hmisc_4.0-3 DBI_0.7 lazyeval_0.2.0 colorspace_1.3-2 [16] nnet_7.3-12 gridExtra_2.2.1 curl_2.8.1 bit_1.1-12 compiler_3.4.1 [21] htmlTable_1.9 pkgmaker_0.22 rtracklayer_1.36.4 scales_0.4.1 checkmate_1.8.3 [26] genefilter_1.58.1 stringr_1.2.0 digest_0.6.12 foreign_0.8-69 base64enc_0.1-3 [31] pkgconfig_2.0.1 htmltools_0.3.6 BSgenome_1.44.0 htmlwidgets_0.9 rlang_0.1.2 [36] RSQLite_2.0 shiny_1.0.4 BiocParallel_1.10.1 acepack_1.4.1 VariantAnnotation_1.22.3 [41] RCurl_1.95-4.8 magrittr_1.5 GenomeInfoDbData_0.99.0 Formula_1.2-2 Matrix_1.2-11 [46] Rcpp_0.12.12 munsell_0.4.3 yaml_2.1.14 stringi_1.1.5 zlibbioc_1.22.0 [51] plyr_1.8.4 qvalue_2.8.0 bumphunter_1.16.0 AnnotationHub_2.8.2 grid_3.4.1 [56] blob_1.1.0 crayon_1.3.2 lattice_0.20-35 splines_3.4.1 annotate_1.54.0 [61] derfinderHelper_1.10.0 locfit_1.5-9.1 knitr_1.17 rngtools_1.2.4 geneplotter_1.54.0 [66] reshape2_1.4.2 codetools_0.2-15 XML_3.98-1.9 latticeExtra_0.6-28 httpuv_1.3.5 [71] foreach_1.4.3 testthat_1.0.2 gtable_0.2.0 ggplot2_2.2.1 mime_0.5 [76] xtable_1.8-2 survival_2.41-3 tibble_1.3.3 iterators_1.0.8 GenomicAlignments_1.12.2 [81] registry_0.3 memoise_1.1.0 cluster_2.0.6 interactiveDisplayBase_1.14.0
I am assuming I'm not setting up the TxDb properly but it is not obvious to me what is going on.
(The BAM files are aligned to the TAIR10 assembly, so there should be no mismatch from that)
Any help much appreciated!

Hi,
The TxDb is not an issue here since the TxDb is used later by analyzeChr().
It looks like the F-stat cutoff might be too high, leaving no regions. But that use case should work. Here I run the example code with a very high F-stat cutoff:
```
> ## Collapse the coverage information > collapsedFull <- collapseFullCoverage(list(genomeData$coverage), + verbose = TRUE) 2017-08-18 12:11:42 collapseFullCoverage: Sorting fullCov 2017-08-18 12:11:42 collapseFullCoverage: Collapsing chromosomes information by sample > > ## Calculate library size adjustments > sampleDepths <- sampleDepth(collapsedFull, probs = c(0.5), nonzero=TRUE, + verbose=TRUE) 2017-08-18 12:11:42 sampleDepth: Calculating sample quantiles 2017-08-18 12:11:42 sampleDepth: Calculating sample adjustments > > ## Build the models > groupInfo <- genomeInfo$pop > adjustvars <- data.frame(genomeInfo$gender) > models <- makeModels(sampleDepths, testvars=groupInfo, adjustvars=adjustvars) > > ## Analyze the chromosome > results <- analyzeChr(chr='21', coverageInfo=genomeData, models=models, + cutoffFstat=1e10, cutoffType='manual', groupInfo=groupInfo, mc.cores=1, + writeOutput=FALSE, returnOutput=TRUE, runAnnotation = FALSE) extendedMapSeqlevels: sequence names mapped from NCBI to UCSC for species homo_sapiens 2017-08-18 12:11:43 analyzeChr: Pre-processing the coverage data 2017-08-18 12:11:43 analyzeChr: Calculating statistics 2017-08-18 12:11:43 calculateStats: calculating the F-statistics 2017-08-18 12:11:43 analyzeChr: Calculating pvalues 2017-08-18 12:11:43 analyzeChr: Using the following manual cutoff for the F-statistics 1e+10 2017-08-18 12:11:43 calculatePvalues: identifying data segments 2017-08-18 12:11:43 findRegions: segmenting information 2017-08-18 12:11:43 findRegions: found no segments to work with!! 2017-08-18 12:11:43 analyzeChr: Annotating regions > results$regions $regions NULL $nullStats NULL $nullWidths NULL $nullPermutation NULL > packageVersion('derfinder') [1] ‘1.11.6’```
I wonder how the output of the following function call looks like:
```
```
Best,
Leonardo
Hi Leonardo,
preprocessCoverage() seems to be working fine:
$coverageProcessed DataFrame with 18585056 rows and 4 columns col0_Sample_01-Col-0_fwd.bam col0_Sample_02-Col-0_fwd.bam fpa8_Sample_19-fpa-8_fwd.bam fpa8_Sample_20-fpa-8_fwd.bam <Rle> <Rle> <Rle> <Rle> 1 5 5 5 5 2 5 5 5 5 3 5 5 5 5 4 5 5 5 5 5 5 5 5 5 ... ... ... ... ... 18585052 5 5 5 5 18585053 5 5 5 5 18585054 5 5 5 5 18585055 5 5 5 5 18585056 5 5 5 5 $mclapplyIndex $mclapplyIndex[[1]] logical-Rle of length 18585056 with 2 runs Lengths: 5000000 13585056 Values : TRUE FALSE $mclapplyIndex[[2]] logical-Rle of length 18585056 with 3 runs Lengths: 5000000 5000000 8585056 Values : FALSE TRUE FALSE $mclapplyIndex[[3]] logical-Rle of length 18585056 with 3 runs Lengths: 10000000 5000000 3585056 Values : FALSE TRUE FALSE $mclapplyIndex[[4]] logical-Rle of length 18585056 with 2 runs Lengths: 15000000 3585056 Values : FALSE TRUE $position NULL $meanCoverage numeric-Rle of length 18585056 with 2343537 runs Lengths: 1419 9 2356 13 293 7 86 7 26 47 1 16 8 11 9 8 15 2 7 17 9 11 22 ... 2 1 1 1 7 25 67 1 4 1860 109 606 118 197 56 25 125 95 7 85 63 1008 Values : 0 0.25 0 0.25 0 0.25 0 0.25 1 1.25 3.75 3.25 3.5 3.75 4 4.25 6.75 7.75 7.5 7.25 7.75 7.5 8 ... 4 3.75 3 2.5 2.25 4.75 4 3.75 2.5 0 5 0 1 0 0.25 0 0.5 0 0.25 0.5 0.25 0 $groupMeans $groupMeans$col0 numeric-Rle of length 18585056 with 1326213 runs Lengths: 4190 73 1 52 15 75 7 44 9 15 64 11 1354 49 98 52 2178 11 1411 51 87 134 12 51 592 9 ... 3 1 2 8 2 1 1 1 1 1 1 1 1 1 40 1 3 1 8 25 67 1 4 1860 109 2385 Values : 0 1 6 5 10 12 14 9 14 9 7 2 0 5 10 5 0 0.5 0 2 4 0 4 2 0 0.5 ... 91.5 92 86 86.5 80.5 64 58 53.5 44 43.5 28.5 20 18 18.5 19 9.5 6.5 5 4.5 9.5 8 7.5 5 0 10 0 $groupMeans$fpa8 numeric-Rle of length 18585056 with 2163864 runs Lengths: 1419 9 2356 13 293 7 86 7 26 64 8 11 9 25 7 17 9 11 22 7 7 17 8 11 8 30 ... 1 1 1 2 1 2 3 6 6 14 12 1 2 2 1 2679 118 197 56 25 125 95 7 85 63 1008 Values : 0 0.5 0 0.5 0 0.5 0 0.5 1 1.5 2 2.5 3 3.5 3 2.5 3.5 3 4 5 5.5 9.5 9 8.5 8 7.5 ... 15 13 12.5 7.5 6 4 4.5 5 4.5 3.5 2.5 2 1.5 1 0.5 0 2 0 0.5 0 1 0 0.5 1 0.5 0How did you create
and can you show me the head of it?
Thanks. It looks like it has to do with position = NULL.
BINGO!
I was passing
cutoff=NULLtoloadCoverage()in an attempt to disable filtering at that stage (sinceanalyzeChr()also has acutoffPre) while keeping the output structure in the format recognised byanalyzeChr().Since the function did not complain, I assumed it was a valid value. I now tried passingcutoff=1instead, which is probably more sensible overall, andanalyzeChr()does not crash.What is the difference between
cutoffinfilterData()andcutoffPreinanalyzeChr()? Why are two cutoffs needed?Although I'm using the latest R and Bioconductor available for MacOSX, I noticed your version of derfinder is newer. Is there any chance that you recently fixed something relevant to this?
I'm using Bioc-devel but your version of derfinder and mine are the same in terms of these functions.
I've tried different values for the F cutoff from 0.9 to 0.0005 and all of them crashed. It doesn't seem to depend on the value.
It looks like this is a dispatch issue, perhaps caused by some weird corruption of the methods cache. It might help to reinstall some of the packages involved, like BiocGenerics and S4Vectors.
Hi Michael,
I nuked all of my R and re-installed everything fresh. The error persists.