derfinder gets stuck in calculatePvalues
2
0
Entering edit mode
@drramkichop-8164
Last seen 7.2 years ago
United States

I am using 'derfinder' to find differentially expressed regions using RNA-seq data. The tool works until some point and gets stuck after calculating p-values. Some of the intermediate files are created but it doesn't finish the analysis. Here is the log:

 

R version 3.2.0 (2015-04-16) -- "Full of Ingredients"
Copyright (C) 2015 The R Foundation for Statistical Computing
Platform: x86_64-apple-darwin13.4.0 (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

Error in tools:::httpdPort <= 0L : 
  comparison (4) is possible only for atomic and list types
> library(derfinder)
Find out what's changed in derfinder with
news(Version == "1.2.0", package = "derfinder")
> library(GenomicRanges)

> library(BiocParallel)

> setwd("~/Research/Ring/rnaSeq/bams/");

> files <- Sys.glob(file.path(".","*chr20*.bam"))
> fullCov <- fullCoverage(files=files,chrs='20')
2015-06-12 14:07:26 fullCoverage: processing chromosome 20
2015-06-12 14:07:26 loadCoverage: finding chromosome lengths
2015-06-12 14:07:26 loadCoverage: loading BAM file ./E11-1.chr20.sorted.bam
2015-06-12 14:07:31 loadCoverage: loading BAM file ./E11-2.chr20.sorted.bam
2015-06-12 14:07:33 loadCoverage: loading BAM file ./E8-1.chr20.sorted.bam
2015-06-12 14:07:36 loadCoverage: loading BAM file ./E8-2.chr20.sorted.bam
2015-06-12 14:07:39 loadCoverage: applying the cutoff to the merged data
2015-06-12 14:07:39 filterData: originally there were 63025520 rows, now there are 63025520 rows. Meaning that 0 percent was filtered.
extendedMapSeqlevels: sequence names mapped from NCBI to UCSC for species homo_sapiens
> filteredCov <- lapply(fullCov, filterData, cutoff = 2)
2015-06-12 14:07:40 filterData: originally there were 63025520 rows, now there are 3580364 rows. Meaning that 94.32 percent was filtered.
> sampleDepths <- sampleDepth(collapseFullCoverage(fullCov), 1)
2015-06-12 14:07:40 sampleDepth: Calculating sample quantiles
2015-06-12 14:07:40 sampleDepth: Calculating sample adjustments
> pheno <- data.frame(group = c("Ring","Ring","WT","WT"))

> models <- makeModels(sampleDepths, testvars = pheno$group)
> originalWd <- getwd()
> setwd(file.path(originalWd, 'analysisResults'))

> results <- analyzeChr(chr = '20', filteredCov$chr20, models, cutoffType='manual', mc.cores=6, method = 'regular', runAnnotation = 'false', BPPARAM.custom = MulticoreParam(workers = 6),
+                       groupInfo = pheno$group, writeOutput = FALSE, cutoffFstat = 5e-02, nPermute = 5, seeds = 20140923 + seq_len(5), returnOutput = TRUE)
extendedMapSeqlevels: sequence names mapped from NCBI to UCSC for species homo_sapiens
2015-06-12 14:09:07 analyzeChr: Pre-processing the coverage data
2015-06-12 14:09:18 analyzeChr: Calculating statistics
2015-06-12 14:09:18 calculateStats: calculating the F-statistics
2015-06-12 14:09:21 analyzeChr: Calculating pvalues
2015-06-12 14:09:21 analyzeChr: Using the following manual cutoff for the F-statistics 0.05
2015-06-12 14:09:21 calculatePvalues: identifying data segments
2015-06-12 14:09:21 findRegions: segmenting F-stats information
2015-06-12 14:09:21 findRegions: identifying candidate regions
2015-06-12 14:09:21 findRegions: identifying region clusters
2015-06-12 14:09:48 calculatePvalues: calculating F-statistics for permutation 1 and seed 20140924
2015-06-12 14:09:51 findRegions: segmenting F-stats information
2015-06-12 14:09:51 findRegions: identifying candidate regions
2015-06-12 14:09:51 calculatePvalues: calculating F-statistics for permutation 2 and seed 20140925
2015-06-12 14:09:53 findRegions: segmenting F-stats information
2015-06-12 14:09:53 findRegions: identifying candidate regions
2015-06-12 14:09:54 calculatePvalues: calculating F-statistics for permutation 3 and seed 20140926
2015-06-12 14:09:56 findRegions: segmenting F-stats information
2015-06-12 14:09:56 findRegions: identifying candidate regions
2015-06-12 14:09:56 calculatePvalues: calculating F-statistics for permutation 4 and seed 20140927
2015-06-12 14:09:58 findRegions: segmenting F-stats information
2015-06-12 14:09:59 findRegions: identifying candidate regions
2015-06-12 14:09:59 calculatePvalues: calculating F-statistics for permutation 5 and seed 20140928
2015-06-12 14:10:01 findRegions: segmenting F-stats information
2015-06-12 14:10:01 findRegions: identifying candidate regions
2015-06-12 14:10:01 calculatePvalues: calculating the p-values

It just gets stuck here. Any suggestions to fix?

derfinder software error rnaseq • 2.5k views
ADD COMMENT
0
Entering edit mode

Hi,

Can you please provide the session information? Thanks!

options(width = 120)
devtools::session_info()

Best,

Leonardo

ADD REPLY
2
Entering edit mode
@lcolladotor
Last seen 21 days ago
United States

Hi,

This issue has been addressed in derfinder Bioconductor-release version 1.2.1 (and Bioconductor-devel version 1.3.1) as explained in further detail at https://github.com/lcolladotor/derfinder/issues/29.

The user has not gotten back to me with the session information they had, but it's fair to assume that they were not using this version given that it has only been available from biocLite() since today.

I originally asked for the session information for completeness and also because the above github issue got resolved 5 days ago, 1 day before the user posted this question. The delay in derfinder 1.2.1's availability via biocLite() happened because the derfinder Bioconductor-release git-svn bridge was not properly setup, my bad.

Best,

Leonardo

ADD COMMENT
0
Entering edit mode
@drramkichop-8164
Last seen 7.2 years ago
United States

Sorry for not providing the session info. However, I upgraded the package to the latest version and it is working fine.

 

Thanks for the support.

ADD COMMENT
0
Entering edit mode

No problem and I'm glad that it's working for you now.

ADD REPLY
0
Entering edit mode

I am looking to find contiguous regions that are differentially expressed b/w cases and controls. derfinder finds 33k+ in one chromosome (chr 20 in his case) regions that are really short and of course, clusters (most of them are same as what we find in our DESeq/edgeR experiments as expected). When I use plotClusters, the plot's title has p-values for clusters and is there a way to extract co-ordinates for clusters and the respective p-values?

ADD REPLY
0
Entering edit mode

Hi,

This is really a new question instead of a comment. In the future, I encourage you to post new questions as such so other users will be able to find it and hopefully take advantage of the answer(s).

derfinderPlot::plotCluster() uses the p-value from the selected region inside the cluster for the plot title as shown in https://github.com/leekgroup/derfinderPlot/blob/master/R/plotCluster.R#L135 

 

What you are asking for is not something we have developed code for. That is, separate the DERs into clusters and calculate cluster specific p-values. Before going down this route, consider using a lower cutoff, so your small regions will get merged into longer ones. You should also carefully choose the distance until which you consider neighboring DERs to be part of the same cluster given your biological problem at hand.

If you still want to go down the route of merging DERs, I wrote you some code. In the first section, I separate DERs into clusters and store them in a GRangesList object. To get a cluster level p-value, you can summarize the region level p-values. I haven't given much thought into this, so please think it over before using the code as is.

On the second section, I use the information from the null regions to get null areas for clusters with the same number of regions as those observed. For doing so, I ignore the distance between null regions since `derfinder` currently does not save that piece of information. Maybe it should. 

Let me know what you think.

Here's the code and output 

> ## See https://support.bioconductor.org/p/68709 for context
>
> ## Required packages
> library('derfinder')
Warning message:
replacing previous import by ‘scales::alpha’ when loading ‘Hmisc’
> library('GenomicRanges')
Loading required package: BiocGenerics
Loading required package: parallel
Attaching package: ‘BiocGenerics’
The following objects are masked from ‘package:parallel’:
clusterApply, clusterApplyLB, clusterCall, clusterEvalQ, clusterExport, clusterMap,
parApply, parCapply, parLapply, parLapplyLB, parRapply, parSapply, parSapplyLB
The following object is masked from ‘package:stats’:
xtabs
The following objects are masked from ‘package:base’:
anyDuplicated, append, as.data.frame, as.vector, cbind, colnames, do.call,
duplicated, eval, evalq, Filter, Find, get, grep, grepl, intersect, is.unsorted,
lapply, Map, mapply, match, mget, order, paste, pmax, pmax.int, pmin, pmin.int,
Position, rank, rbind, Reduce, rep.int, rownames, sapply, setdiff, sort, table,
tapply, union, unique, unlist, unsplit
Loading required package: S4Vectors
Loading required package: stats4
Loading required package: IRanges
Loading required package: GenomeInfoDb
> library('devtools')
>
> #### First section
>
> ## Get regions
> example('analyzeChr', 'derfinder')
anlyzC> ## Collapse the coverage information
anlyzC> collapsedFull <- collapseFullCoverage(list(genomeData$coverage),
anlyzC+ verbose = TRUE)
2015-07-10 12:32:54 collapseFullCoverage: Sorting fullCov
2015-07-10 12:32:54 collapseFullCoverage: Collapsing chromosomes information by sample
anlyzC> ## Calculate library size adjustments
anlyzC> sampleDepths <- sampleDepth(collapsedFull, probs = c(0.5), nonzero=TRUE,
anlyzC+ verbose=TRUE)
2015-07-10 12:32:54 sampleDepth: Calculating sample quantiles
2015-07-10 12:32:54 sampleDepth: Calculating sample adjustments
anlyzC> ## Build the models
anlyzC> groupInfo <- genomeInfo$pop
anlyzC> adjustvars <- data.frame(genomeInfo$gender)
anlyzC> models <- makeModels(sampleDepths, testvars=groupInfo, adjustvars=adjustvars)
anlyzC> ## Analyze the chromosome
anlyzC> results <- analyzeChr(chr='21', coverageInfo=genomeData, models=models,
anlyzC+ cutoffFstat=1, cutoffType='manual', groupInfo=groupInfo, mc.cores=1,
anlyzC+ writeOutput=FALSE, returnOutput=TRUE, method='regular',
anlyzC+ runAnnotation = FALSE)
extendedMapSeqlevels: sequence names mapped from NCBI to UCSC for species homo_sapiens
2015-07-10 12:32:54 analyzeChr: Pre-processing the coverage data
2015-07-10 12:32:54 analyzeChr: Calculating statistics
2015-07-10 12:32:54 calculateStats: calculating the F-statistics
2015-07-10 12:32:54 analyzeChr: Calculating pvalues
2015-07-10 12:32:54 analyzeChr: Using the following manual cutoff for the F-statistics 1
2015-07-10 12:32:54 calculatePvalues: identifying data segments
2015-07-10 12:32:54 findRegions: segmenting F-stats information
2015-07-10 12:32:54 findRegions: identifying candidate regions
2015-07-10 12:32:54 findRegions: identifying region clusters
2015-07-10 12:32:54 calculatePvalues: calculating F-statistics for permutation 1 and seed 20150711
2015-07-10 12:32:54 findRegions: segmenting F-stats information
2015-07-10 12:32:54 findRegions: identifying candidate regions
2015-07-10 12:32:55 calculatePvalues: calculating the p-values
2015-07-10 12:32:55 analyzeChr: Annotating regions
anlyzC> names(results)
[1] "timeinfo" "optionsStats" "coveragePrep" "fstats" "regions" "annotation"
> regs <- results$regions$regions
>
> ## Note that all of them are from the same chromosome.
> ## If you have data for more than one chromsome, then you can split the
> ## GRanges object into a list, with one element per chromosome.
> table(seqnames(regs))
chr21
33
>
> ## Split regions by cluster
> clus <- endoapply(split(regs, regs$cluster), identity)
>
> ## Calculate the total area for each cluster
> mcols(clus)$area <- sapply(clus, function(x) { sum(x$area)})
>
>
> ## Order clusters by area
> clus <- clus[order(mcols(clus)$area, decreasing = TRUE)]
>
> ## Check the first two clusters by area
> clus[1:2]
GRangesList object of length 2:
$6
GRanges object with 4 ranges and 14 metadata columns:
seqnames ranges strand | value area indexStart indexEnd cluster
<Rle> <IRanges> <Rle> | <numeric> <numeric> <integer> <integer> <Rle>
up chr21 [47411924, 47411986] * | 7.096704 447.09232 618 680 6
up chr21 [47412088, 47412131] * | 3.344261 147.14747 691 734 6
up chr21 [47412277, 47412312] * | 3.575484 128.71743 735 770 6
up chr21 [47412078, 47412083] * | 2.293818 13.76291 681 686 6
clusterL meanCoverage meanCEU meanYRI log2FoldChangeYRIvsCEU pvalues significant
<Rle> <numeric> <numeric> <numeric> <numeric> <numeric> <factor>
up 389 0.5847414 0.8374906 0.05396825 -3.955890 0.01098901 TRUE
up 389 0.4464809 0.6590909 0.00000000 -Inf 0.09890110 FALSE
up 389 0.5053763 0.7460317 0.00000000 -Inf 0.14285714 FALSE
up 389 0.4139785 0.6031746 0.01666667 -5.177538 0.64835165 FALSE
qvalues significantQval
<numeric> <factor>
up 0.01193762 TRUE
up 0.05371929 TRUE
up 0.05968810 TRUE
up 0.12555084 FALSE
$3
GRanges object with 2 ranges and 14 metadata columns:
seqnames ranges strand | value area indexStart indexEnd cluster
up chr21 [47409522, 47409560] * | 15.053644 587.09211 128 166 3
up chr21 [47409685, 47409688] * | 1.215855 4.86342 192 195 3
clusterL meanCoverage meanCEU meanYRI log2FoldChangeYRIvsCEU pvalues significant
up 167 0.4152192 0.6129426 0.0 -Inf 0.01098901 TRUE
up 167 0.4758065 0.6547619 0.1 -2.71097 0.80219780 FALSE
qvalues significantQval
up 0.01193762 TRUE
up 0.14055586 FALSE
-------
seqinfo: 1 sequence from an unspecified genome; no seqlengths
>
> ## One option to get p-values for clusters is to take the minimum, or mean p-value
> mcols(clus)$minPval <- sapply(clus, function(x) min(x$pvalues))
> mcols(clus)$meanPval <- sapply(clus, function(x) mean(x$pvalues))
> mcols(clus)$maxPval <- sapply(clus, function(x) max(x$pvalues))
>
> ## Note that the function you choose to summarize the region level p-values into
> ## cluster level p-values will matter. Using min() is less conservative than
> ## using max().
>
>
> #### Second section
>
> ## Get the null areas
> null <- results$regions$nullStats * results$regions$nullWidths
>
> ## Get some null clusters with number of elements equal to those observed.
> ## Note that this code doesn't consider the distance between the null regions.
> ## So take these numbers with caution.
> n <- 100
> set.seed(20150710) ## Choose your own seed
> index <- sample(seq_len(length(null)), n * sum(elementLengths(clus)), replace = TRUE)
> groups <- factor(rep(rep(names(clus), elementLengths(clus)), n), levels = names(clus))
> ## Check that each corresponds to elementLenths(clus) * n
> table(groups)
groups
6 3 10 2 5 9 7 8 4 1 11
400 200 500 300 600 200 100 100 600 200 100
>
> ## Split null areas into resampled groups
> area_by_clus <- split(null[index], groups)
>
> ## Split by number of regions in each cluster, calculate null areas
> clus_null <- unlist(mapply(function(areas, nRegs, n) {
+ sapply(split(areas, rep(seq_len(n), each = nRegs)), sum)
+ }, area_by_clus, elementLengths(clus), MoreArgs = list(n = n), SIMPLIFY = FALSE), use.names = FALSE)
>
> ## Calculate (approximated) p-values for clusters
> mcols(clus)$pvalue <- derfinder:::.calcPval(mcols(clus)$area, clus_null)
>
> ## View them all
> mcols(clus)
DataFrame with 11 rows and 5 columns
area minPval meanPval maxPval pvalue
<numeric> <numeric> <numeric> <numeric> <numeric>
1 736.7201 0.01098901 0.2252747 0.6483516 0.0009082652
2 591.9555 0.01098901 0.4065934 0.8021978 0.0081743869
3 535.0815 0.01098901 0.4549451 0.8021978 0.0145322434
4 479.5088 0.01098901 0.1868132 0.4065934 0.0281562216
5 384.6838 0.03296703 0.3992674 0.5824176 0.0708446866
... ... ... ... ... ...
7 357.286601 0.01098901 0.01098901 0.01098901 0.09173479
8 208.429718 0.03296703 0.03296703 0.03296703 0.30336058
9 108.674290 0.47252747 0.60805861 0.93406593 0.53405995
10 94.934229 0.31868132 0.39560440 0.47252747 0.60581290
11 3.997655 0.84615385 0.84615385 0.84615385 0.96457766
>
> ## Use p.adjust() or qvalue::qvalue() to control the FDR
> p.adjust(mcols(clus)$pvalue, method = 'BH')
[1] 0.009990917 0.044959128 0.053284893 0.077429609 0.141537996 0.141537996 0.144154665
[8] 0.417120799 0.652739933 0.666394187 0.964577657
>
>
> ## Reproducibility info
> options(width = 120)
> session_info()
Session info -----------------------------------------------------------------------------------------------------------
setting value
version R version 3.2.0 Patched (2015-05-18 r68382)
system x86_64, darwin10.8.0
ui AQUA
language (EN)
collate en_US.UTF-8
tz America/New_York
Packages ---------------------------------------------------------------------------------------------------------------
package * version date source
acepack 1.3-3.3 2014-11-24 CRAN (R 3.2.0)
AnnotationDbi 1.31.16 2015-06-10 Bioconductor
Biobase 2.29.1 2015-05-03 Bioconductor
BiocGenerics * 0.15.2 2015-06-05 Bioconductor
BiocParallel 1.3.25 2015-06-12 Bioconductor
biomaRt 2.25.1 2015-04-21 Bioconductor
Biostrings 2.37.2 2015-05-07 Bioconductor
bitops 1.0-6 2013-08-17 CRAN (R 3.2.0)
bumphunter 1.9.1 2015-05-05 Bioconductor
cluster 2.0.1 2015-01-31 CRAN (R 3.2.0)
codetools 0.2-11 2015-03-10 CRAN (R 3.2.0)
colorspace 1.2-6 2015-03-11 CRAN (R 3.2.0)
curl 0.9 2015-06-19 CRAN (R 3.2.1)
DBI 0.3.1 2014-09-24 CRAN (R 3.2.0)
derfinder * 1.3.3 2015-06-15 Bioconductor
derfinderHelper 1.3.0 2015-04-17 Bioconductor
devtools * 1.8.0 2015-05-09 CRAN (R 3.2.0)
digest 0.6.8 2014-12-31 CRAN (R 3.2.0)
doRNG 1.6 2014-03-07 CRAN (R 3.2.0)
foreach 1.4.2 2014-04-11 CRAN (R 3.2.0)
foreign 0.8-63 2015-02-20 CRAN (R 3.2.0)
Formula 1.2-1 2015-04-07 CRAN (R 3.2.0)
futile.logger 1.4.1 2015-04-20 CRAN (R 3.2.0)
futile.options 1.0.0 2010-04-06 CRAN (R 3.2.0)
GenomeInfoDb * 1.5.7 2015-06-02 Bioconductor
GenomicAlignments 1.5.9 2015-05-13 Bioconductor
GenomicFeatures 1.21.13 2015-06-09 Bioconductor
GenomicFiles 1.5.3 2015-05-13 Bioconductor
GenomicRanges * 1.21.15 2015-06-09 Bioconductor
ggplot2 1.0.1.9000 2015-06-12 Github (hadley/ggplot2@3b6a126)
git2r 0.10.1 2015-05-07 CRAN (R 3.2.0)
gridExtra 0.9.1 2012-08-09 CRAN (R 3.2.0)
gtable 0.1.2 2012-12-05 CRAN (R 3.2.0)
Hmisc 3.16-0 2015-04-30 CRAN (R 3.2.0)
IRanges * 2.3.12 2015-06-16 Bioconductor
iterators 1.0.7 2014-04-11 CRAN (R 3.2.0)
lambda.r 1.1.7 2015-03-20 CRAN (R 3.2.0)
lattice 0.20-31 2015-03-30 CRAN (R 3.2.0)
latticeExtra 0.6-26 2013-08-15 CRAN (R 3.2.0)
locfit 1.5-9.1 2013-04-20 CRAN (R 3.2.0)
magrittr 1.5 2014-11-22 CRAN (R 3.2.0)
MASS 7.3-40 2015-03-21 CRAN (R 3.2.0)
Matrix 1.2-1 2015-06-01 CRAN (R 3.2.0)
matrixStats 0.14.0 2015-02-14 CRAN (R 3.2.0)
memoise 0.2.1 2014-04-22 CRAN (R 3.2.0)
munsell 0.4.2 2013-07-11 CRAN (R 3.2.0)
nnet 7.3-9 2015-02-11 CRAN (R 3.2.0)
pkgmaker 0.22 2014-05-14 CRAN (R 3.2.0)
plyr 1.8.3 2015-06-12 CRAN (R 3.2.0)
proto 0.3-10 2012-12-22 CRAN (R 3.2.0)
qvalue 2.1.0 2015-04-17 Bioconductor
RColorBrewer 1.1-2 2014-12-07 CRAN (R 3.2.0)
Rcpp 0.11.6 2015-05-01 CRAN (R 3.2.0)
RCurl 1.95-4.6 2015-04-24 CRAN (R 3.2.0)
registry 0.2 2012-01-24 CRAN (R 3.2.0)
reshape2 1.4.1 2014-12-06 CRAN (R 3.2.0)
rngtools 1.2.4 2014-03-06 CRAN (R 3.2.0)
rpart 4.1-9 2015-02-24 CRAN (R 3.2.0)
Rsamtools 1.21.8 2015-06-03 Bioconductor
RSQLite 1.0.0 2014-10-25 CRAN (R 3.2.0)
rtracklayer 1.29.10 2015-06-13 Bioconductor
rversions 1.0.1 2015-06-06 CRAN (R 3.2.0)
S4Vectors * 0.7.6 2015-06-16 Bioconductor
scales 0.2.5 2015-06-12 CRAN (R 3.2.0)
stringi 0.5-5 2015-06-29 CRAN (R 3.2.1)
stringr 1.0.0 2015-04-30 CRAN (R 3.2.0)
SummarizedExperiment 0.1.5 2015-05-17 Bioconductor
survival 2.38-2 2015-06-12 CRAN (R 3.2.0)
XML 3.98-1.2 2015-05-31 CRAN (R 3.2.0)
xml2 0.1.1 2015-06-02 CRAN (R 3.2.0)
xtable 1.7-4 2014-09-12 CRAN (R 3.2.0)
XVector 0.9.1 2015-04-30 Bioconductor
zlibbioc 1.15.0 2015-04-17 Bioconductor
> proc.time()
user system elapsed
13.672 0.369 14.119
> Sys.time()
[1] "2015-07-10 12:32:59 EDT"
>
## See https://support.bioconductor.org/p/68709 for context
## Required packages
library('derfinder')
library('GenomicRanges')
library('devtools')
#### First section
## Get regions
example('analyzeChr', 'derfinder')
regs <- results$regions$regions
## Note that all of them are from the same chromosome.
## If you have data for more than one chromsome, then you can split the
## GRanges object into a list, with one element per chromosome.
table(seqnames(regs))
## Split regions by cluster
clus <- endoapply(split(regs, regs$cluster), identity)
## Calculate the total area for each cluster
mcols(clus)$area <- sapply(clus, function(x) { sum(x$area)})
## Order clusters by area
clus <- clus[order(mcols(clus)$area, decreasing = TRUE)]
## Check the first two clusters by area
clus[1:2]
## One option to get p-values for clusters is to take the minimum, or mean p-value
mcols(clus)$minPval <- sapply(clus, function(x) min(x$pvalues))
mcols(clus)$meanPval <- sapply(clus, function(x) mean(x$pvalues))
mcols(clus)$maxPval <- sapply(clus, function(x) max(x$pvalues))
## Note that the function you choose to summarize the region level p-values into
## cluster level p-values will matter. Using min() is less conservative than
## using max().
#### Second section
## Get the null areas
null <- results$regions$nullStats * results$regions$nullWidths
## Get some null clusters with number of elements equal to those observed.
## Note that this code doesn't consider the distance between the null regions.
## So take these numbers with caution.
n <- 100
set.seed(20150710) ## Choose your own seed
index <- sample(seq_len(length(null)), n * sum(elementLengths(clus)), replace = TRUE)
groups <- factor(rep(rep(names(clus), elementLengths(clus)), n), levels = names(clus))
## Check that each corresponds to elementLenths(clus) * n
table(groups)
## Split null areas into resampled groups
area_by_clus <- split(null[index], groups)
## Split by number of regions in each cluster, calculate null areas
clus_null <- unlist(mapply(function(areas, nRegs, n) {
sapply(split(areas, rep(seq_len(n), each = nRegs)), sum)
}, area_by_clus, elementLengths(clus), MoreArgs = list(n = n), SIMPLIFY = FALSE), use.names = FALSE)
## Calculate (approximated) p-values for clusters
mcols(clus)$pvalue <- derfinder:::.calcPval(mcols(clus)$area, clus_null)
## View them all
mcols(clus)
## Use p.adjust() or qvalue::qvalue() to control the FDR
p.adjust(mcols(clus)$pvalue, method = 'BH')
## Reproducibility info
options(width = 120)
session_info()
proc.time()
Sys.time()
view raw clusterPvalue.R hosted with ❤ by GitHub

 

Best,

Leonardo

ADD REPLY

Login before adding your answer.

Traffic: 983 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6