CAGEr - Error with aggregateTagClusters() - invalid class IRanges object: width(x) cannot contain negative integers
1
1
Entering edit mode
mirkocelii ▴ 40
@mirkocelii-23498
Last seen 2.1 years ago

Hello, I'm analyzing CAGE data with CAGEr for the first time. I'm using R 3.6.3 and bioconductor-cager 1.28.0 installed with Miniconda 3. I've been following the vignette tutorial with my own data until the step 3.9) Creating consensus promoters across samples. I've merged my replicates and all data look good. If i run normalizeTagCount(ce, method = "simpleTpm") all is good. Bu if I run normalizeTagCount(ce, method = "powerLaw", fitInRange = c(5, 1000), alpha = 1.05, T = 1e+07) when running aggregateTagClusters() i have this error

aggregateTagClusters(ce, tpmThreshold = 5, qLow = 0.1, qUp = 0.9, maxDist = 100)
Error in validObject(x) : invalid class “IRanges” object:
'width(x)' cannot contain negative integers


what does it mean exactly? maybe that the column range (e.g chr1 1000-1002) must be End - Start > 0? this is my object ce

> ce
A CAGEexp object of 1 listed
experiment with a user-defined name and respective class.
Containing an ExperimentList class object of length 1:
[1] tagCountMatrix: RangedSummarizedExperiment with 56789463 rows and 5 columns
Features:
experiments() - obtain the ExperimentList instance
colData() - the primary/phenotype DFrame
sampleMap() - the sample availability DFrame
$, [, [[ - extract colData columns, subset, or experiment *Format() - convert into a long or wide DFrame assays() - convert ExperimentList to a SimpleList of matrices > ce[["tagCountMatrix"]] class: RangedSummarizedExperiment dim: 56789809 5 metadata(0): assays(2): counts normalizedTpmMatrix rownames: NULL rowData names(3): genes annotation filteredCTSSidx colnames(5): SAMP_A SAMP_B SAMP_C SAMP_D SAMP_E colData names(0):  If I look at one sample: with tagClustersGR() and then count the negative values, they look all >0 > x = tagClustersGR( ce, "SAMP_A" ) > head(x) TagClusters object with 6 ranges and 6 metadata columns: seqnames ranges strand | score nr_ctss dominant_ctss tpm.dominant_ctss q_0.1 q_0.9 <Rle> <IRanges> <Rle> | <Rle> <integer> <integer> <Rle> <Rle> <Rle> 1 chr1 634007-634008 + | 2.280710 2 634008 1.2023170 1 2 2 chr1 817968-817970 + | 3.010468 2 817970 1.8081513 1 3 3 chr1 904811-904815 + | 2.156786 2 904811 1.0783931 1 5 4 chr1 923918-923921 + | 2.634931 2 923921 1.8081513 1 4 5 chr1 960582-960647 + | 13.178705 8 960628 2.1632628 1 66 6 chr1 966480-966488 + | 1.267194 2 966480 0.6986541 1 9 ------- seqinfo: 38 sequences from an unspecified genome; no seqlengths > sum( width(ranges(x)) <=0) [1] 0 > sum( start(ranges(x)) <=0)  [1] 0 if I go to the previous step and i run the command tagClusters(), again i don't fine any window with negative values > x = tagClusters( ce, "SAMP_A" ) > head(x) cluster chr start end strand tpm nr_ctss dominant_ctss tpm.dominant_ctss q_0.1 q_0.9 1 1 chr1 634006 634008 + 2.280710 2 634008 1.2023170 1 2 2 2 chr1 817967 817970 + 3.010468 2 817970 1.8081513 1 3 3 3 chr1 904810 904815 + 2.156786 2 904811 1.0783931 1 5 4 4 chr1 923917 923921 + 2.634931 2 923921 1.8081513 1 4 5 5 chr1 960581 960647 + 13.178705 8 960628 2.1632628 1 66 6 6 chr1 966479 966488 + 1.267194 2 966480 0.6986541 1 9 > sum(x$"end"-x$"start" <=0) [1] 0 > sum( x$"tpm" <=0)
[1] 0


I have repeated this for all the 5 samples, and I didn't find any negative values. How can I find the line with the negative values? How can I remove this lines from the object?

CAGEr IRanges aggregateTagClusters • 478 views
0
Entering edit mode

Dear mirkocelii, thank you for your detailed report. I think that the negative values appear during the aggregation of the tag clusters; I wonder if it happens near the start of some chromosomes (chrM for instance). I also wonder if it would be prevented by providing chromosome length information to the CAGEexp object. Does a BSgenome package exist for the genome you aligned to? If yes, can you use it like in the vignette at the time of creating the CAGEexp object ? If not or if it does not solve the problem, can you isolate the problem in a slimmer test case ? For instance by testing CAGEexp objects containing only a single chromosomes, and then portions of this chromosome?

0
Entering edit mode

Dear Charles, thanks for the quick reply. I was wonder too that it could happened near the start of some chromosomes, hence as a control I removed all the CTSS of position <200 of all chromosomes, but it didn't change. I'm working on human with the BSgenome.Hsapiens.UCSC.hg38 package and I did use it as in the vignette

ce <- CAGEexp( genomeName     = "BSgenome.Hsapiens.UCSC.hg38"
, inputFiles     = samples
, inputFilesType = "ctss"
, sampleLabels   = labels )

0
Entering edit mode

Good, can you make a test for each chromosome separately, with something like the following:

ce.bak <- ce
aggregateTagClusters(ce, ......


edited once because my advice to subset CTSStagCountSE(ce) was not relevant.

1
Entering edit mode
@charles-plessy-7857
Last seen 3 months ago
Japan

I just found this in .make.consensus.clusters():

  suppressWarningsstartclusters.gr) <- startclusters.gr) - plus.minus) # Suppress warnings
suppressWarningsendclusters.gr)   <- endclusters.gr)   + plus.minus) # because we trim later
clusters.gr <- reducetrimclusters.gr))


It does not work as intended in the absence of information on chromosome length. (Did it in the past?)

trim(GRanges("chr1", IRanges(-1,2)))
GRanges object with 1 range and 0 metadata columns:
seqnames    ranges strand
<Rle> <IRanges>  <Rle>
[1]     chr1      -1-2      *
-------
seqinfo: 1 sequence from an unspecified genome; no seqlengths


Can you try to add seqinfo to the tag clusters and run the aggregation again ?

seqinfo(ce@metadata[["tagClusters"]]) <- seqinfo(CAGEr:::getRefGenome(genomeName(ce)))[seqlevels(ce@metadata[["tagClusters"]])]

1
Entering edit mode

Dear Charles, thanks for your advices, I removed from CTCC files all those lines referring to alternative, unknown, random and mt chromosomes and now it works well!

1
Entering edit mode

Thanks for the feedback. Can you marked the issue solved?