Question

CAGEr - Error with aggregateTagClusters() - invalid class IRanges object: width(x) cannot contain negative integers

1

Entering edit mode

mirkocelii ▴ 40

@mirkocelii-23498

Last seen 4.4 years ago

Hello, I'm analyzing CAGE data with CAGEr for the first time. I'm using R 3.6.3 and bioconductor-cager 1.28.0 installed with Miniconda 3. I've been following the vignette tutorial with my own data until the step 3.9) Creating consensus promoters across samples. I've merged my replicates and all data look good. If i run normalizeTagCount(ce, method = "simpleTpm") all is good. Bu if I run normalizeTagCount(ce, method = "powerLaw", fitInRange = c(5, 1000), alpha = 1.05, T = 1e+07) when running aggregateTagClusters() i have this error

aggregateTagClusters(ce, tpmThreshold = 5, qLow = 0.1, qUp = 0.9, maxDist = 100)
Error in validObject(x) : invalid class “IRanges” object:
    'width(x)' cannot contain negative integers

what does it mean exactly? maybe that the column range (e.g chr1 1000-1002) must be End - Start > 0? this is my object ce

> ce
    A CAGEexp object of 1 listed
     experiment with a user-defined name and respective class.
     Containing an ExperimentList class object of length 1:
     [1] tagCountMatrix: RangedSummarizedExperiment with 56789463 rows and 5 columns
    Features:
     experiments() - obtain the ExperimentList instance
     colData() - the primary/phenotype DFrame
     sampleMap() - the sample availability DFrame
     `$`, `[`, `[[` - extract colData columns, subset, or experiment
     *Format() - convert into a long or wide DFrame
     assays() - convert ExperimentList to a SimpleList of matrices

> ce[["tagCountMatrix"]]
class: RangedSummarizedExperiment
dim: 56789809 5
metadata(0):
assays(2): counts normalizedTpmMatrix
rownames: NULL
rowData names(3): genes annotation filteredCTSSidx
colnames(5): SAMP_A SAMP_B SAMP_C SAMP_D SAMP_E
colData names(0):

If I look at one sample: with tagClustersGR() and then count the negative values, they look all >0

> x = tagClustersGR( ce, "SAMP_A"     )
> head(x)
TagClusters object with 6 ranges and 6 metadata columns:
    seqnames        ranges strand |     score   nr_ctss dominant_ctss tpm.dominant_ctss q_0.1 q_0.9
       <Rle>     <IRanges>  <Rle> |     <Rle> <integer>     <integer>             <Rle> <Rle> <Rle>
  1     chr1 634007-634008      + |  2.280710         2        634008         1.2023170     1     2
  2     chr1 817968-817970      + |  3.010468         2        817970         1.8081513     1     3
  3     chr1 904811-904815      + |  2.156786         2        904811         1.0783931     1     5
  4     chr1 923918-923921      + |  2.634931         2        923921         1.8081513     1     4
  5     chr1 960582-960647      + | 13.178705         8        960628         2.1632628     1    66
  6     chr1 966480-966488      + |  1.267194         2        966480         0.6986541     1     9
  -------
  seqinfo: 38 sequences from an unspecified genome; no seqlengths
> sum( width(ranges(x)) <=0)
[1] 0
> sum( start(ranges(x)) <=0)

[1] 0

if I go to the previous step and i run the command tagClusters(), again i don't fine any window with negative values

> x = tagClusters( ce, "SAMP_A"     )
> head(x)
  cluster  chr  start    end strand       tpm nr_ctss dominant_ctss tpm.dominant_ctss q_0.1 q_0.9
1       1 chr1 634006 634008      +  2.280710       2        634008         1.2023170     1     2
2       2 chr1 817967 817970      +  3.010468       2        817970         1.8081513     1     3
3       3 chr1 904810 904815      +  2.156786       2        904811         1.0783931     1     5
4       4 chr1 923917 923921      +  2.634931       2        923921         1.8081513     1     4
5       5 chr1 960581 960647      + 13.178705       8        960628         2.1632628     1    66
6       6 chr1 966479 966488      +  1.267194       2        966480         0.6986541     1     9
> sum(x$"end"-x$"start" <=0)
[1] 0
> sum( x$"tpm" <=0)
[1] 0

I have repeated this for all the 5 samples, and I didn't find any negative values. How can I find the line with the negative values? How can I remove this lines from the object?

CAGEr IRanges aggregateTagClusters • 1.4k views

ADD COMMENT • link 4.6 years ago mirkocelii ▴ 40

0

Entering edit mode

Dear mirkocelii, thank you for your detailed report. I think that the negative values appear during the aggregation of the tag clusters; I wonder if it happens near the start of some chromosomes (chrM for instance). I also wonder if it would be prevented by providing chromosome length information to the CAGEexp object. Does a BSgenome package exist for the genome you aligned to? If yes, can you use it like in the vignette at the time of creating the CAGEexp object ? If not or if it does not solve the problem, can you isolate the problem in a slimmer test case ? For instance by testing CAGEexp objects containing only a single chromosomes, and then portions of this chromosome?

ADD REPLY • link 4.6 years ago Charles Plessy ▴ 180

0

Entering edit mode

Dear Charles, thanks for the quick reply. I was wonder too that it could happened near the start of some chromosomes, hence as a control I removed all the CTSS of position <200 of all chromosomes, but it didn't change. I'm working on human with the BSgenome.Hsapiens.UCSC.hg38 package and I did use it as in the vignette

ce <- CAGEexp( genomeName     = "BSgenome.Hsapiens.UCSC.hg38"
             , inputFiles     = samples
             , inputFilesType = "ctss"
             , sampleLabels   = labels )

ADD REPLY • link 4.6 years ago mirkocelii ▴ 40

0

Entering edit mode

Good, can you make a test for each chromosome separately, with something like the following:

ce.bak <- ce
ce@metadata[["tagClusters"]] <- ce@metadata[["tagClusters"]][seqnames(ce@metadata[["tagClusters"]]) == "chr1"]
aggregateTagClusters(ce, ......

edited once because my advice to subset CTSStagCountSE(ce) was not relevant.

ADD REPLY • link 4.6 years ago Charles Plessy ▴ 180

score 1 · Accepted Answer · 2020-05-09

1

Entering edit mode

Charles Plessy ▴ 180

@charles-plessy-7857

Last seen 15 months ago

Japan

I just found this in .make.consensus.clusters():

  suppressWarningsstartclusters.gr) <- startclusters.gr) - plus.minus) # Suppress warnings
  suppressWarningsendclusters.gr)   <- endclusters.gr)   + plus.minus) # because we trim later
  clusters.gr <- reducetrimclusters.gr))

It does not work as intended in the absence of information on chromosome length. (Did it in the past?)

trim(GRanges("chr1", IRanges(-1,2)))
GRanges object with 1 range and 0 metadata columns:
      seqnames    ranges strand
         <Rle> <IRanges>  <Rle>
  [1]     chr1      -1-2      *
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths

Can you try to add seqinfo to the tag clusters and run the aggregation again ?

seqinfo(ce@metadata[["tagClusters"]]) <- seqinfo(CAGEr:::getRefGenome(genomeName(ce)))[seqlevels(ce@metadata[["tagClusters"]])]

ADD COMMENT • link 4.6 years ago Charles Plessy ▴ 180

1

Entering edit mode

Dear Charles, thanks for your advices, I removed from CTCC files all those lines referring to alternative, unknown, random and mt chromosomes and now it works well!