Question: Detection of large regions of expression
1
gravatar for António Miguel de Jesus Domingues
4 weeks ago by
Germany

I have been trying to leverage definder's ability to detect expression without the need of an annotation to identify broad regions of expressions that we observe in certain conditions:

(not shown but there is no annotated gene in this region)

The issue is that it seems that the expressed regions reported by derfinder are smaller than what would be expected by visual inspection. As you can see above there is quite a large expressed region of which only a fraction is detected. This doesn't seem to be an issue of low coverage because, to the left, there is a an area with similar high coverage which is not reported as expressed. This an example but I found several of these in my data.

The question is if this a setting which needs to be changed/tweaked, or is this something to be expected?

The analysis was run with the following settings:

fullCov <- fullCoverage(
      files = files,
      chrs = chrom,
      cutoff = 0,
      L = read_length,
      verbose = TRUE,
      totalMapped = total_mapped,
      filter = "one",
      mc.cores = nproc
)

regionMat <- regionMatrix(
   fullCov,
   cutoff = min_cov,
   L = read_length,
   maxClusterGap = 3000L,
   returnBP = TRUE,
   verbose = TRUE,
   filter = "one",
   targetSize = targetSize,
   mc.cores = nproc
   )

Coverage was taken directly from bam files, read length = 75 and cutoff = 5. Session info can be found here.

ADD COMMENTlink modified 4 weeks ago by Leonardo Collado Torres690 • written 4 weeks ago by António Miguel de Jesus Domingues470
Answer: Detection of large regions of expression
2
gravatar for Leonardo Collado Torres
4 weeks ago by
United States
Leonardo Collado Torres690 wrote:

Hi,

It seems that you are normalizing your data twice, no? The file name for the third track includes normalized. If that's the case, you don't need to use derfinder to scale the data again to a given library size. That is, don't specify totalMapped (leave it with the default value of NULL) in your fullCoverage() call.

  • What is the length of files? Is it two? one?
  • The code you provided, is that the correct version? You should be getting an error when you run regionMatrix(filter = 'one').
  • What is targetSize in your regionMatrix() call? In general, if you already scaled your data in your fullCoverage() call, then you should not scale it again in the regionMatrix() call.

I couldn't see derfinder listed on your R session information, but I assume that you are using R 3.5 with Bioconductor 3.8 given the version numbers of packages like GenomeInfoDb and GenomicAlignments. Though the situation you reported is not affected by this, in general, I recommend that you use the latest Bioconductor release. Right now, that would be Bioconductor 3.9 with R 3.6.

Best, Leonardo

ADD COMMENTlink written 4 weeks ago by Leonardo Collado Torres690

It seems that you are normalizing your data twice, no?

That seems to have been the major issue. Once I remove filterring from the fullCoverage it all started to make sense.

The code you provided, is that the correct version? You should be getting an error when you run regionMatrix(filter = 'one').

No, my apologies, and yes - it did give an error. My initial thought was that the filter option (one / mean) was the culprit, started playing around it and ended up posting the wrong code.

What is targetSize in your regionMatrix() call? In general, if you already scaled your data in your fullCoverage() call, then you should not scale it again in the regionMatrix() call.

targetSize is total mapped reads since I was reading from the bam files. I have switched to normalized bigwigs.

What is the length of files? Is it two? one?

23, but I am not sure why it matters. That said, since posting this, I started spliting the Expressed Regions filtering by condition because I am not only interested in which regions are differentially expressed, but also how much of the genome is expressed in each condition and large those regions are. So right now length of files is 3 :)

Cheers Leonardo.

ADD REPLYlink written 28 days ago by António Miguel de Jesus Domingues470

The length of files mattered regarding the filter option. Looks like it's all resolved now though =) Though I'll say it again: you don't need to use totalMapped and targetSize more than once (so either in fullCoverage() or in regionMatrix() but not both).

ADD REPLYlink written 27 days ago by Leonardo Collado Torres690

Got it! For future reference, and in case someone else stumbles upon this, this is the final version of the commands I used:

fullCov <- fullCoverage(
   files = files,
   chrs = chrom,
   L = read_length,
   verbose = TRUE,
   totalMapped = total_mapped,
   targetSize = targetSize,
   filter = "mean",
   mc.cores = nproc
)

regionMat <- regionMatrix(
   fullCov,
   cutoff = min_cov,
   L = read_length,
   maxClusterGap = 3000L,
   returnBP = FALSE,
   verbose = TRUE,
   mc.cores = nproc
)
ADD REPLYlink written 26 days ago by António Miguel de Jesus Domingues470

Awesome, thanks for sharing the code! =)

Best, Leo

ADD REPLYlink written 23 days ago by Leonardo Collado Torres690
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 186 users visited in the last hour