My ultimate goal is to be able to run derfinder on a custom GTF. But to get a hang of how to do it, I am first trying to succeed at it using a completely stock Ensembl GTF file (so no biomaRt suggestions please).
First I create a custom TxDb:
txdb <- makeTxDbFromGFF(file='~/devtestdata/Arabidopsis_thaliana.TAIR10.36.gtf', organism='Arabidopsis thaliana', chrominfo=Seqinfo(seqnames=c('1','2','3','4','5','Mt','Pt'), seqlengths=c(30427671,19698289,23459830,18585056,26975502,366924,154478), isCircular=c(FALSE,FALSE,FALSE,FALSE,FALSE,TRUE,TRUE), genome='TAIR10') )
No errors reported up to here.
Then (following coverage loading from 4 BAM files using loadCoverage
, setting up a dataframe with two conditions, calculating the sample depths and building the models) I call derfinder:
analyzeChr(chr='4', coverageInfo=chr4Cov, txdb=txdb, models=fwdModels, groupInfo=covariates$condition, returnOutput=TRUE, chrsStyle=NULL, cutoffPre=3, cutoffFstat=5e-02, nPermute=20, seeds=NULL, writeOutput=FALSE, lowMemDir=NULL, smooth=FALSE, weights=NULL, smoothFunction=bumphunter::locfitByCluster )
And I get this error:
2017-08-18 15:02:32 analyzeChr: Pre-processing the coverage data 2017-08-18 15:02:35 analyzeChr: Calculating statistics 2017-08-18 15:02:35 calculateStats: calculating the F-statistics 2017-08-18 15:02:46 analyzeChr: Calculating pvalues 2017-08-18 15:02:46 analyzeChr: Using the following theoretical cutoff for the F-statistics 161.447638797588 2017-08-18 15:02:46 calculatePvalues: identifying data segments Error in attr(x, "tsp") <- c(1, NROW(x), 1) : attempt to set an attribute on NULL
This is the stack trace:
9: hasTsp(x) 8: start.default(position) 7: start(position) 6: start(position) 5: is(start, "Ranges") 4: IRanges(start = start(position)[runValue(position)], end = end(position)[runValue(position)]) 3: .clusterMakerRle(position = position, maxGap = .advanced_argument("maxRegionGap", 0L, ...), ranges = TRUE) 2: calculatePvalues(coveragePrep = prep, models = models, fstats = fstats, nPermute = nPermute, seeds = seeds, chr = chr, cutoff = cutoff, lowMemDir = lowMemDir, smooth = smooth, smoothFunction = smoothFunction, weights = weights, ...) 1: analyzeChr(chr = "4", coverageInfo = fwdCov[["4"]], txdb = txdb, models = fwdModels, groupInfo = experiment$covariates$condition[fwd], returnOutput = TRUE, chrsStyle = NULL, cutoffPre = 3, cutoffFstat = 0.05, nPermute = 20, seeds = NULL, writeOutput = FALSE, lowMemDir = NULL, smooth = FALSE, weights = NULL, smoothFunction = bumphunter::locfitByCluster)
And this is my session:
R version 3.4.1 (2017-06-30) Platform: x86_64-apple-darwin15.6.0 (64-bit) Running under: macOS Sierra 10.12.6 Matrix products: default BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib locale: [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8 attached base packages: [1] parallel stats4 stats graphics grDevices utils datasets methods base other attached packages: [1] ensembldb_2.0.4 AnnotationFilter_1.0.0 biomaRt_2.32.1 TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2 [5] BiocInstaller_1.26.0 Rsamtools_1.28.0 Biostrings_2.44.2 XVector_0.16.0 [9] GenomicFeatures_1.28.4 AnnotationDbi_1.38.2 treat_0.0.1 DESeq2_1.16.1 [13] SummarizedExperiment_1.6.3 DelayedArray_0.2.7 matrixStats_0.52.2 Biobase_2.36.2 [17] GenomicRanges_1.28.4 GenomeInfoDb_1.12.2 IRanges_2.10.2 S4Vectors_0.14.3 [21] BiocGenerics_0.22.0 data.table_1.10.4 derfinder_1.10.5 loaded via a namespace (and not attached): [1] ProtGenerics_1.8.0 bitops_1.0-6 bit64_0.9-7 httr_1.3.0 RColorBrewer_1.1-2 [6] GenomicFiles_1.12.0 tools_3.4.1 backports_1.1.0 doRNG_1.6.6 R6_2.2.2 [11] rpart_4.1-11 Hmisc_4.0-3 DBI_0.7 lazyeval_0.2.0 colorspace_1.3-2 [16] nnet_7.3-12 gridExtra_2.2.1 curl_2.8.1 bit_1.1-12 compiler_3.4.1 [21] htmlTable_1.9 pkgmaker_0.22 rtracklayer_1.36.4 scales_0.4.1 checkmate_1.8.3 [26] genefilter_1.58.1 stringr_1.2.0 digest_0.6.12 foreign_0.8-69 base64enc_0.1-3 [31] pkgconfig_2.0.1 htmltools_0.3.6 BSgenome_1.44.0 htmlwidgets_0.9 rlang_0.1.2 [36] RSQLite_2.0 shiny_1.0.4 BiocParallel_1.10.1 acepack_1.4.1 VariantAnnotation_1.22.3 [41] RCurl_1.95-4.8 magrittr_1.5 GenomeInfoDbData_0.99.0 Formula_1.2-2 Matrix_1.2-11 [46] Rcpp_0.12.12 munsell_0.4.3 yaml_2.1.14 stringi_1.1.5 zlibbioc_1.22.0 [51] plyr_1.8.4 qvalue_2.8.0 bumphunter_1.16.0 AnnotationHub_2.8.2 grid_3.4.1 [56] blob_1.1.0 crayon_1.3.2 lattice_0.20-35 splines_3.4.1 annotate_1.54.0 [61] derfinderHelper_1.10.0 locfit_1.5-9.1 knitr_1.17 rngtools_1.2.4 geneplotter_1.54.0 [66] reshape2_1.4.2 codetools_0.2-15 XML_3.98-1.9 latticeExtra_0.6-28 httpuv_1.3.5 [71] foreach_1.4.3 testthat_1.0.2 gtable_0.2.0 ggplot2_2.2.1 mime_0.5 [76] xtable_1.8-2 survival_2.41-3 tibble_1.3.3 iterators_1.0.8 GenomicAlignments_1.12.2 [81] registry_0.3 memoise_1.1.0 cluster_2.0.6 interactiveDisplayBase_1.14.0
I am assuming I'm not setting up the TxDb properly but it is not obvious to me what is going on.
(The BAM files are aligned to the TAIR10 assembly, so there should be no mismatch from that)
Any help much appreciated!
Hi,
The TxDb is not an issue here since the TxDb is used later by analyzeChr().
It looks like the F-stat cutoff might be too high, leaving no regions. But that use case should work. Here I run the example code with a very high F-stat cutoff:
```
```
I wonder how the output of the following function call looks like:
```
```
Best,
Leonardo
Hi Leonardo,
preprocessCoverage() seems to be working fine:
How did you create
and can you show me the head of it?
Thanks. It looks like it has to do with position = NULL.
BINGO!
I was passing
cutoff=NULL
toloadCoverage()
in an attempt to disable filtering at that stage (sinceanalyzeChr()
also has acutoffPre
) while keeping the output structure in the format recognised byanalyzeChr().
Since the function did not complain, I assumed it was a valid value. I now tried passingcutoff=1
instead, which is probably more sensible overall, andanalyzeChr()
does not crash.What is the difference between
cutoff
infilterData()
andcutoffPre
inanalyzeChr()
? Why are two cutoffs needed?Although I'm using the latest R and Bioconductor available for MacOSX, I noticed your version of derfinder is newer. Is there any chance that you recently fixed something relevant to this?
I'm using Bioc-devel but your version of derfinder and mine are the same in terms of these functions.
I've tried different values for the F cutoff from 0.9 to 0.0005 and all of them crashed. It doesn't seem to depend on the value.
It looks like this is a dispatch issue, perhaps caused by some weird corruption of the methods cache. It might help to reinstall some of the packages involved, like BiocGenerics and S4Vectors.
Hi Michael,
I nuked all of my R and re-installed everything fresh. The error persists.