Question

Detection of large regions of expression

1

Entering edit mode

@antonio-miguel-de-jesus-domingues-5182

Last seen 3 months ago

Germany

I have been trying to leverage definder's ability to detect expression without the need of an annotation to identify broad regions of expressions that we observe in certain conditions:

(not shown but there is no annotated gene in this region)

The issue is that it seems that the expressed regions reported by derfinder are smaller than what would be expected by visual inspection. As you can see above there is quite a large expressed region of which only a fraction is detected. This doesn't seem to be an issue of low coverage because, to the left, there is a an area with similar high coverage which is not reported as expressed. This an example but I found several of these in my data.

The question is if this a setting which needs to be changed/tweaked, or is this something to be expected?

The analysis was run with the following settings:

fullCov <- fullCoverage(
      files = files,
      chrs = chrom,
      cutoff = 0,
      L = read_length,
      verbose = TRUE,
      totalMapped = total_mapped,
      filter = "one",
      mc.cores = nproc
)

regionMat <- regionMatrix(
   fullCov,
   cutoff = min_cov,
   L = read_length,
   maxClusterGap = 3000L,
   returnBP = TRUE,
   verbose = TRUE,
   filter = "one",
   targetSize = targetSize,
   mc.cores = nproc
   )

Coverage was taken directly from bam files, read length = 75 and cutoff = 5. Session info can be found here.

derfinder intergenic expression • 1.2k views

ADD COMMENT • link updated 4.6 years ago by Leonardo Collado Torres ★ 1.0k • written 4.6 years ago by António Miguel de Jesus Domingues ▴ 510

score 2 · Accepted Answer · 2019-09-17

2

Entering edit mode

Leonardo Collado Torres ★ 1.0k

@lcolladotor

Last seen 19 days ago

United States

Hi,

It seems that you are normalizing your data twice, no? The file name for the third track includes normalized. If that's the case, you don't need to use derfinder to scale the data again to a given library size. That is, don't specify totalMapped (leave it with the default value of NULL) in your fullCoverage() call.

What is the length of files? Is it two? one?
The code you provided, is that the correct version? You should be getting an error when you run regionMatrix(filter = 'one').
What is targetSize in your regionMatrix() call? In general, if you already scaled your data in your fullCoverage() call, then you should not scale it again in the regionMatrix() call.

I couldn't see derfinder listed on your R session information, but I assume that you are using R 3.5 with Bioconductor 3.8 given the version numbers of packages like GenomeInfoDb and GenomicAlignments. Though the situation you reported is not affected by this, in general, I recommend that you use the latest Bioconductor release. Right now, that would be Bioconductor 3.9 with R 3.6.

Best, Leonardo

ADD COMMENT • link 4.6 years ago Leonardo Collado Torres ★ 1.0k

0

Entering edit mode

It seems that you are normalizing your data twice, no?

That seems to have been the major issue. Once I remove filterring from the fullCoverage it all started to make sense.

The code you provided, is that the correct version? You should be getting an error when you run regionMatrix(filter = 'one').

No, my apologies, and yes - it did give an error. My initial thought was that the filter option (one / mean) was the culprit, started playing around it and ended up posting the wrong code.

What is targetSize in your regionMatrix() call? In general, if you already scaled your data in your fullCoverage() call, then you should not scale it again in the regionMatrix() call.

targetSize is total mapped reads since I was reading from the bam files. I have switched to normalized bigwigs.

What is the length of files? Is it two? one?

23, but I am not sure why it matters. That said, since posting this, I started spliting the Expressed Regions filtering by condition because I am not only interested in which regions are differentially expressed, but also how much of the genome is expressed in each condition and large those regions are. So right now length of files is 3 :)

Cheers Leonardo.

ADD REPLY • link 4.6 years ago António Miguel de Jesus Domingues ▴ 510

0

Entering edit mode

The length of files mattered regarding the filter option. Looks like it's all resolved now though =) Though I'll say it again: you don't need to use totalMapped and targetSize more than once (so either in fullCoverage() or in regionMatrix() but not both).

ADD REPLY • link 4.6 years ago Leonardo Collado Torres ★ 1.0k

0

Entering edit mode

Got it! For future reference, and in case someone else stumbles upon this, this is the final version of the commands I used:

fullCov <- fullCoverage(
   files = files,
   chrs = chrom,
   L = read_length,
   verbose = TRUE,
   totalMapped = total_mapped,
   targetSize = targetSize,
   filter = "mean",
   mc.cores = nproc
)

regionMat <- regionMatrix(
   fullCov,
   cutoff = min_cov,
   L = read_length,
   maxClusterGap = 3000L,
   returnBP = FALSE,
   verbose = TRUE,
   mc.cores = nproc
)

ADD REPLY • link 4.6 years ago António Miguel de Jesus Domingues ▴ 510

0

Entering edit mode

Awesome, thanks for sharing the code! =)

Best, Leo

ADD REPLY • link 4.6 years ago Leonardo Collado Torres ★ 1.0k