tldr: using derfinder for individual chromosomes works, but merging them using do.call(c)
or do.call(rbind)
leads to errors. How to perform DRE with all chromosomes?
I have been following the tutorials for derfinder
running the analysis for an individual chromosome:
fullCov <- fullCoverage(files = files, chrs = 'chr21', verbose = FALSE)
## Get the region matrix of Expressed Regions (ERs)
regionMat <- regionMatrix(fullCov, cutoff = 30, L = 76, verbose = FALSE)
With this information at hand one can extract the count matrix which can be fed to DESeq2
:
## Round matrix
counts <- round(regionMat$chr21$coverageMatrix)
## Round matrix and specify design
dse <- DESeqDataSetFromMatrix(counts, pheno, ~ group + gender)
## Perform DE analysis
dse <- DESeq(dse, test = 'LRT', reduced = ~ gender, fitType = 'local')
And add DRE to the identified regions:
## Extract results
mcols(regionMat$chr21$regions) <- c(mcols(regionMat$chr21$regions), results(dse))
This is all very straightforward and clear, but it means doing the analysis for each individual chromosome, which might be a good idea for experiments with loads of data (or no access to enough RAM), however I would like have the differential expression results for all regions in a single analysis.
Initially I merged the individual chromosome derfinder
matrices with this script but this also returns a list with individual results, which makes sense since it is basically a do.call
c
applied to individual results:
regionMat <- do.call(c, regionMat)
class(regionMat)
[1] "list"
lapply(regionMat, class)
$chr1
[1] "list"
$chr10
[1] "list"
So I am back to sort of where I started. I also tried other ways to merge the regions for the chromosomes with other methods, but it would either lose the information of the region coordinates or the regions would have duplicated names.
I have been testing a recount
tutorial geared towards derfinder
and in this case it seems to work, that is the output of merging the chromosome regions results in unique region names. expressed_regions
calls derfinder::findRegions
and returns a Granges
obj:
## GRanges object with 328481 ranges and 6 metadata columns:
## seqnames ranges strand | value
## <Rle> <IRanges> <Rle> | <numeric>
## chr1.1 chr1 [14523, 14523] * | 5.00949192047119
## chr1.2 chr1 [14525, 14595] * | 5.70283872980467
## chr1.3 chr1 [14617, 14617] * | 5.00848007202148
## chr1.4 chr1 [14619, 14825] * | 12.3351036891845
## chr1.5 chr1 [14970, 14977] * | 5.50534588098526
## ... ... ... ... . ...
## chrY.228 chrY [56834474, 56834481] * | 5.71176493167877
## chrY.229 chrY [56834566, 56834584] * | 5.19306634601794
## chrY.230 chrY [56838897, 56838903] * | 5.12728071212769
## chrY.231 chrY [56846114, 56846130] * | 6.55811685674331
## chrY.232 chrY [56851125, 56851160] * | 8.35790546735128
## area indexStart indexEnd cluster clusterL
## <numeric> <integer> <integer> <Rle> <Rle>
## chr1.1 5.00949192047119 14523 14523 1 7012
## chr1.2 404.901549816132 14525 14595 1 7012
## chr1.3 5.00848007202148 14617 14617 1 7012
## chr1.4 2553.36646366119 14619 14825 1 7012
## chr1.5 44.0427670478821 14970 14977 1 7012
## ... ... ... ... ... ...
## chrY.228 45.6941194534302 56834474 56834481 104 2789
## chrY.229 98.6682605743408 56834566 56834584 104 2789
## chrY.230 35.8909649848938 56838897 56838903 105 7
## chrY.231 111.487986564636 56846114 56846130 106 17
## chrY.232 300.884596824646 56851125 56851160 107 36
## -------
## seqinfo: 24 sequences from an unspecified genome
and then coverageMatrix
, which in turn calls derfinder:::.railMatChrRegion
, returns a RangedSummarizedExperiment
, and not a list as derfinder
. From this point on one can extract the count matrix, which includes all chromosomes, and proceed to DESeq2
(or limma
). This however feels like a hack, and I would probably have to change the code because the data is not human.
My question is:
- What am I missing that would allow to merge regions/counts from all chromosomes? It seems to me that I am missing something obvious here.
It worked, thanks! I wonder if this could be something to add to the vignette? In any case, it is now here for those struggling with the same issue.
I added this as an example in regionMatrix() to derfinder version 1.13.1. Details: https://github.com/lcolladotor/derfinder/commit/cb7f8bdb9d99aae0a86758aa3e5bde00712a14d6