Question: Detection of large regions of expression
1
4 weeks ago by
Germany

I have been trying to leverage definder's ability to detect expression without the need of an annotation to identify broad regions of expressions that we observe in certain conditions:

(not shown but there is no annotated gene in this region)

The issue is that it seems that the expressed regions reported by derfinder are smaller than what would be expected by visual inspection. As you can see above there is quite a large expressed region of which only a fraction is detected. This doesn't seem to be an issue of low coverage because, to the left, there is a an area with similar high coverage which is not reported as expressed. This an example but I found several of these in my data.

The question is if this a setting which needs to be changed/tweaked, or is this something to be expected?

The analysis was run with the following settings:

fullCov <- fullCoverage(
files = files,
chrs = chrom,
cutoff = 0,
verbose = TRUE,
totalMapped = total_mapped,
filter = "one",
mc.cores = nproc
)

regionMat <- regionMatrix(
fullCov,
cutoff = min_cov,
maxClusterGap = 3000L,
returnBP = TRUE,
verbose = TRUE,
filter = "one",
targetSize = targetSize,
mc.cores = nproc
)


Coverage was taken directly from bam files, read length = 75 and cutoff = 5. Session info can be found here.

modified 4 weeks ago by Leonardo Collado Torres690 • written 4 weeks ago by António Miguel de Jesus Domingues470
Answer: Detection of large regions of expression
2
4 weeks ago by
United States

Hi,

It seems that you are normalizing your data twice, no? The file name for the third track includes normalized. If that's the case, you don't need to use derfinder to scale the data again to a given library size. That is, don't specify totalMapped (leave it with the default value of NULL) in your fullCoverage() call.

• What is the length of files? Is it two? one?
• The code you provided, is that the correct version? You should be getting an error when you run regionMatrix(filter = 'one').
• What is targetSize in your regionMatrix() call? In general, if you already scaled your data in your fullCoverage() call, then you should not scale it again in the regionMatrix() call.

I couldn't see derfinder listed on your R session information, but I assume that you are using R 3.5 with Bioconductor 3.8 given the version numbers of packages like GenomeInfoDb and GenomicAlignments. Though the situation you reported is not affected by this, in general, I recommend that you use the latest Bioconductor release. Right now, that would be Bioconductor 3.9 with R 3.6.

Best, Leonardo

It seems that you are normalizing your data twice, no?

That seems to have been the major issue. Once I remove filterring from the fullCoverage it all started to make sense.

The code you provided, is that the correct version? You should be getting an error when you run regionMatrix(filter = 'one').

No, my apologies, and yes - it did give an error. My initial thought was that the filter option (one / mean) was the culprit, started playing around it and ended up posting the wrong code.

What is targetSize in your regionMatrix() call? In general, if you already scaled your data in your fullCoverage() call, then you should not scale it again in the regionMatrix() call.

targetSize is total mapped reads since I was reading from the bam files. I have switched to normalized bigwigs.

What is the length of files? Is it two? one?

23, but I am not sure why it matters. That said, since posting this, I started spliting the Expressed Regions filtering by condition because I am not only interested in which regions are differentially expressed, but also how much of the genome is expressed in each condition and large those regions are. So right now length of files is 3 :)

Cheers Leonardo.

The length of files mattered regarding the filter option. Looks like it's all resolved now though =) Though I'll say it again: you don't need to use totalMapped and targetSize more than once (so either in fullCoverage() or in regionMatrix() but not both).

Got it! For future reference, and in case someone else stumbles upon this, this is the final version of the commands I used:

fullCov <- fullCoverage(
files = files,
chrs = chrom,
verbose = TRUE,
totalMapped = total_mapped,
targetSize = targetSize,
filter = "mean",
mc.cores = nproc
)

regionMat <- regionMatrix(
fullCov,
cutoff = min_cov,
maxClusterGap = 3000L,
returnBP = FALSE,
verbose = TRUE,
mc.cores = nproc
)


Awesome, thanks for sharing the code! =)

Best, Leo