Question

derfinder gets stuck in calculatePvalues

0

Entering edit mode

drramki.chop • 0

@drramkichop-8164

Last seen 6.4 years ago

United States

I am using 'derfinder' to find differentially expressed regions using RNA-seq data. The tool works until some point and gets stuck after calculating p-values. Some of the intermediate files are created but it doesn't finish the analysis. Here is the log:

R version 3.2.0 (2015-04-16) -- "Full of Ingredients"
Copyright (C) 2015 The R Foundation for Statistical Computing
Platform: x86_64-apple-darwin13.4.0 (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

Error in tools:::httpdPort <= 0L :
comparison (4) is possible only for atomic and list types
> library(derfinder)
Find out what's changed in derfinder with
news(Version == "1.2.0", package = "derfinder")
> library(GenomicRanges)

> library(BiocParallel)

> setwd("~/Research/Ring/rnaSeq/bams/");
>
> files <- Sys.glob(file.path(".","*chr20*.bam"))
> fullCov <- fullCoverage(files=files,chrs='20')
2015-06-12 14:07:26 fullCoverage: processing chromosome 20
2015-06-12 14:07:26 loadCoverage: finding chromosome lengths
2015-06-12 14:07:26 loadCoverage: loading BAM file ./E11-1.chr20.sorted.bam
2015-06-12 14:07:31 loadCoverage: loading BAM file ./E11-2.chr20.sorted.bam
2015-06-12 14:07:33 loadCoverage: loading BAM file ./E8-1.chr20.sorted.bam
2015-06-12 14:07:36 loadCoverage: loading BAM file ./E8-2.chr20.sorted.bam
2015-06-12 14:07:39 loadCoverage: applying the cutoff to the merged data
2015-06-12 14:07:39 filterData: originally there were 63025520 rows, now there are 63025520 rows. Meaning that 0 percent was filtered.
extendedMapSeqlevels: sequence names mapped from NCBI to UCSC for species homo_sapiens
> filteredCov <- lapply(fullCov, filterData, cutoff = 2)
2015-06-12 14:07:40 filterData: originally there were 63025520 rows, now there are 3580364 rows. Meaning that 94.32 percent was filtered.
> sampleDepths <- sampleDepth(collapseFullCoverage(fullCov), 1)
2015-06-12 14:07:40 sampleDepth: Calculating sample quantiles
2015-06-12 14:07:40 sampleDepth: Calculating sample adjustments
> pheno <- data.frame(group = c("Ring","Ring","WT","WT"))
>
> models <- makeModels(sampleDepths, testvars = pheno$group)
> originalWd <- getwd()
> setwd(file.path(originalWd, 'analysisResults'))
>
> results <- analyzeChr(chr = '20', filteredCov$chr20, models, cutoffType='manual', mc.cores=6, method = 'regular', runAnnotation = 'false', BPPARAM.custom = MulticoreParam(workers = 6),
+ groupInfo = pheno$group, writeOutput = FALSE, cutoffFstat = 5e-02, nPermute = 5, seeds = 20140923 + seq_len(5), returnOutput = TRUE)
extendedMapSeqlevels: sequence names mapped from NCBI to UCSC for species homo_sapiens
2015-06-12 14:09:07 analyzeChr: Pre-processing the coverage data
2015-06-12 14:09:18 analyzeChr: Calculating statistics
2015-06-12 14:09:18 calculateStats: calculating the F-statistics
2015-06-12 14:09:21 analyzeChr: Calculating pvalues
2015-06-12 14:09:21 analyzeChr: Using the following manual cutoff for the F-statistics 0.05
2015-06-12 14:09:21 calculatePvalues: identifying data segments
2015-06-12 14:09:21 findRegions: segmenting F-stats information
2015-06-12 14:09:21 findRegions: identifying candidate regions
2015-06-12 14:09:21 findRegions: identifying region clusters
2015-06-12 14:09:48 calculatePvalues: calculating F-statistics for permutation 1 and seed 20140924
2015-06-12 14:09:51 findRegions: segmenting F-stats information
2015-06-12 14:09:51 findRegions: identifying candidate regions
2015-06-12 14:09:51 calculatePvalues: calculating F-statistics for permutation 2 and seed 20140925
2015-06-12 14:09:53 findRegions: segmenting F-stats information
2015-06-12 14:09:53 findRegions: identifying candidate regions
2015-06-12 14:09:54 calculatePvalues: calculating F-statistics for permutation 3 and seed 20140926
2015-06-12 14:09:56 findRegions: segmenting F-stats information
2015-06-12 14:09:56 findRegions: identifying candidate regions
2015-06-12 14:09:56 calculatePvalues: calculating F-statistics for permutation 4 and seed 20140927
2015-06-12 14:09:58 findRegions: segmenting F-stats information
2015-06-12 14:09:59 findRegions: identifying candidate regions
2015-06-12 14:09:59 calculatePvalues: calculating F-statistics for permutation 5 and seed 20140928
2015-06-12 14:10:01 findRegions: segmenting F-stats information
2015-06-12 14:10:01 findRegions: identifying candidate regions
2015-06-12 14:10:01 calculatePvalues: calculating the p-values

It just gets stuck here. Any suggestions to fix?

derfinder software error rnaseq • 2.1k views

ADD COMMENT • link 8.9 years ago • updated 8.8 years ago drramki.chop • 0

0

Entering edit mode

Hi,

Can you please provide the session information? Thanks!

options(width = 120)
devtools::session_info()

Best,

Leonardo

ADD REPLY • link 8.9 years ago Leonardo Collado Torres ★ 1.0k

0

Entering edit mode

drramki.chop • 0

@drramkichop-8164

Last seen 6.4 years ago

United States

Sorry for not providing the session info. However, I upgraded the package to the latest version and it is working fine.

Thanks for the support.

ADD COMMENT • link 8.8 years ago drramki.chop • 0

0

Entering edit mode

No problem and I'm glad that it's working for you now.

ADD REPLY • link 8.8 years ago Leonardo Collado Torres ★ 1.0k

0

Entering edit mode

I am looking to find contiguous regions that are differentially expressed b/w cases and controls. derfinder finds 33k+ in one chromosome (chr 20 in his case) regions that are really short and of course, clusters (most of them are same as what we find in our DESeq/edgeR experiments as expected). When I use plotClusters, the plot's title has p-values for clusters and is there a way to extract co-ordinates for clusters and the respective p-values?

ADD REPLY • link 8.8 years ago drramki.chop • 0

0

Entering edit mode

Hi,

This is really a new question instead of a comment. In the future, I encourage you to post new questions as such so other users will be able to find it and hopefully take advantage of the answer(s).

derfinderPlot::plotCluster() uses the p-value from the selected region inside the cluster for the plot title as shown in https://github.com/leekgroup/derfinderPlot/blob/master/R/plotCluster.R#L135

What you are asking for is not something we have developed code for. That is, separate the DERs into clusters and calculate cluster specific p-values. Before going down this route, consider using a lower cutoff, so your small regions will get merged into longer ones. You should also carefully choose the distance until which you consider neighboring DERs to be part of the same cluster given your biological problem at hand.

If you still want to go down the route of merging DERs, I wrote you some code. In the first section, I separate DERs into clusters and store them in a GRangesList object. To get a cluster level p-value, you can summarize the region level p-values. I haven't given much thought into this, so please think it over before using the code as is.

On the second section, I use the information from the null regions to get null areas for clusters with the same number of regions as those observed. For doing so, I ignore the distance between null regions since `derfinder` currently does not save that piece of information. Maybe it should.

Let me know what you think.

Here's the code and output

Best,

Leonardo

ADD REPLY • link 8.8 years ago Leonardo Collado Torres ★ 1.0k

score 2 · Accepted Answer · 2015-06-16

Hi,

This issue has been addressed in derfinder Bioconductor-release version 1.2.1 (and Bioconductor-devel version 1.3.1) as explained in further detail at https://github.com/lcolladotor/derfinder/issues/29.

The user has not gotten back to me with the session information they had, but it's fair to assume that they were not using this version given that it has only been available from biocLite() since today.

I originally asked for the session information for completeness and also because the above github issue got resolved 5 days ago, 1 day before the user posted this question. The delay in derfinder 1.2.1's availability via biocLite() happened because the derfinder Bioconductor-release git-svn bridge was not properly setup, my bad.

Best,

Leonardo