derfinder: how to merge regions and counts from individual chromosomes
1
0
Entering edit mode
@antonio-miguel-de-jesus-domingues-5182
Last seen 9 months ago
Germany

tldr: using derfinder for individual chromosomes works, but merging them using do.call(c) or do.call(rbind) leads to errors. How to perform DRE with all chromosomes?

I have been following the tutorials for derfinder running the analysis for an individual chromosome:

fullCov <- fullCoverage(files = files, chrs = 'chr21', verbose = FALSE)

## Get the region matrix of Expressed Regions (ERs)
regionMat <- regionMatrix(fullCov, cutoff = 30, L = 76, verbose = FALSE)

With this information at hand one can extract the count matrix which can be fed to DESeq2:

## Round matrix
counts <- round(regionMat$chr21$coverageMatrix)

## Round matrix and specify design
dse <- DESeqDataSetFromMatrix(counts, pheno, ~ group + gender)

## Perform DE analysis
dse <- DESeq(dse, test = 'LRT', reduced = ~ gender, fitType = 'local')

And add DRE to the identified regions:

## Extract results
mcols(regionMat$chr21$regions) <- c(mcols(regionMat$chr21$regions), results(dse))

This is all very straightforward and clear, but it means doing the analysis for each individual chromosome, which might be a good idea for experiments with loads of data (or no access to enough RAM), however I would like have the differential expression results for all regions in a single analysis.

Initially I merged the individual chromosome derfinder matrices with this script but this also returns a list with individual results, which makes sense since it is basically a do.call c applied to individual results:

regionMat <- do.call(c, regionMat)
class(regionMat)
[1] "list"

lapply(regionMat, class)
$chr1
[1] "list"

$chr10
[1] "list"

So I am back to sort of where I started. I also tried other ways to merge the regions for the chromosomes with other methods, but it would either lose the information of the region coordinates or the regions would have duplicated names.

I have been testing a recount tutorial geared towards derfinder and in this case it seems to work, that is the output of merging the chromosome regions results in unique region names. expressed_regions calls derfinder::findRegions and returns a Granges obj:

## GRanges object with 328481 ranges and 6 metadata columns:
##            seqnames               ranges strand |            value
##               <Rle>            <IRanges>  <Rle> |        <numeric>
##     chr1.1     chr1       [14523, 14523]      * | 5.00949192047119
##     chr1.2     chr1       [14525, 14595]      * | 5.70283872980467
##     chr1.3     chr1       [14617, 14617]      * | 5.00848007202148
##     chr1.4     chr1       [14619, 14825]      * | 12.3351036891845
##     chr1.5     chr1       [14970, 14977]      * | 5.50534588098526
##        ...      ...                  ...    ... .              ...
##   chrY.228     chrY [56834474, 56834481]      * | 5.71176493167877
##   chrY.229     chrY [56834566, 56834584]      * | 5.19306634601794
##   chrY.230     chrY [56838897, 56838903]      * | 5.12728071212769
##   chrY.231     chrY [56846114, 56846130]      * | 6.55811685674331
##   chrY.232     chrY [56851125, 56851160]      * | 8.35790546735128
##                        area indexStart  indexEnd cluster clusterL
##                   <numeric>  <integer> <integer>   <Rle>    <Rle>
##     chr1.1 5.00949192047119      14523     14523       1     7012
##     chr1.2 404.901549816132      14525     14595       1     7012
##     chr1.3 5.00848007202148      14617     14617       1     7012
##     chr1.4 2553.36646366119      14619     14825       1     7012
##     chr1.5 44.0427670478821      14970     14977       1     7012
##        ...              ...        ...       ...     ...      ...
##   chrY.228 45.6941194534302   56834474  56834481     104     2789
##   chrY.229 98.6682605743408   56834566  56834584     104     2789
##   chrY.230 35.8909649848938   56838897  56838903     105        7
##   chrY.231 111.487986564636   56846114  56846130     106       17
##   chrY.232 300.884596824646   56851125  56851160     107       36
##   -------
##   seqinfo: 24 sequences from an unspecified genome

and then coverageMatrix, which in turn calls derfinder:::.railMatChrRegion, returns a RangedSummarizedExperiment, and not a list as derfinder. From this point on one can extract the count matrix, which includes all chromosomes, and proceed to DESeq2 (or limma). This however feels like a hack, and I would probably have to change the code because the data is not human.

My question is:

  • What am I missing that would allow to merge regions/counts from all chromosomes? It seems to me that I am missing something obvious here.

Session Info

derfinder recount • 2.0k views
ADD COMMENT
2
Entering edit mode
@lcolladotor
Last seen 16 days ago
United States

Hi,

You can combine the different pieces of regionMat (the one that is a list with one element per chromosome) manually as shown in the code below:

library('derfinder')

## Modified code from ?regionMatrix

## Create some toy data
library('IRanges')
x <- Rle(round(runif(1e4, max=10)))
y <- Rle(round(runif(1e4, max=10)))
z <- Rle(round(runif(1e4, max=10)))

## Expand fullCov to include another toy chr (same data in this case)
fullCov <- list('chr21' = DataFrame(x, y, z), 'chr22' = DataFrame(x, y, z))

## Calculate a proxy of library size
libSize <- sapply(fullCov$chr21, sum)

## Run region matrix normalizing the coverage
regionMat <- regionMatrix(fullCov = fullCov, maxRegionGap = 10L, 
    maxClusterGap = 300L, L = 36, totalMapped = libSize, targetSize = 4e4,
    returnBP = FALSE)
    
## Check structure
names(regionMat)
sapply(regionMat, names)

## Combine into a single one
library('GenomicRanges')
## First extract the data
regs <- unlist(GRangesList(lapply(regionMat, '[[', 'regions')))
covMat <- do.call(rbind, lapply(regionMat, '[[', 'coverageMatrix'))
## Force the names to match
names(regs) <- rownames(covMat) <- seq_len(length(regs))
## Combine into a list (not really needed)
mergedRegionMat <- list('regions' = regs, 'coverageMatrix' = covMat)

This is how it should look in the end:

> names(mergedRegionMat)
[1] "regions"        "coverageMatrix"
> sapply(mergedRegionMat, head)
$regions
GRanges object with 6 ranges and 6 metadata columns:
    seqnames     ranges strand |            value             area indexStart  indexEnd cluster clusterL
       <Rle>  <IRanges>  <Rle> |        <numeric>        <numeric>  <integer> <integer>   <Rle>    <Rle>
  1    chr21 [  1,  10]      * | 5.77803355465454 57.7803355465454          1        10       1     4982
  2    chr21 [ 43,  47]      * | 5.43169736843222 27.1584868421611         11        15       1     4982
  3    chr21 [ 67,  76]      * | 5.64540589142032 56.4540589142032         16        25       1     4982
  4    chr21 [113, 116]      * | 6.25778793906636 25.0311517562654         26        29       1     4982
  5    chr21 [142, 164]      * | 5.77733096804666 132.878612265073         30        52       1     4982
  6    chr21 [225, 248]      * | 5.70243562063813 136.858454895315         53        76       1     4982
  -------
  seqinfo: 2 sequences from an unspecified genome

$coverageMatrix
          x         y         z
1 1.6894633 1.3772820 1.7482827
2 0.6668934 0.7774979 0.8188159
3 1.5116251 1.6216384 1.5712414
4 0.6891232 0.6886410 0.7081651
5 3.4900755 3.8652752 3.7178670
6 3.4233862 3.8652752 4.1162099

> packageVersion('derfinder')
[1] ‘1.12.0’

 

Best,

Leonardo

 

 

ADD COMMENT
0
Entering edit mode

It worked, thanks! I wonder if this could be something to add to the vignette? In any case, it is now here for those struggling with the same issue.

1
Entering edit mode

I added this as an example in regionMatrix() to derfinder version 1.13.1. Details: https://github.com/lcolladotor/derfinder/commit/cb7f8bdb9d99aae0a86758aa3e5bde00712a14d6

ADD REPLY

Login before adding your answer.

Traffic: 960 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6