Hello, I'm analyzing CAGE data with CAGEr for the first time. I'm using R 3.6.3 and bioconductor-cager 1.28.0 installed with Miniconda 3. I've been following the vignette tutorial with my own data until the step 3.9) Creating consensus promoters across samples. I've merged my replicates and all data look good. If i run normalizeTagCount(ce, method = "simpleTpm") all is good. Bu if I run normalizeTagCount(ce, method = "powerLaw", fitInRange = c(5, 1000), alpha = 1.05, T = 1e+07) when running aggregateTagClusters() i have this error
aggregateTagClusters(ce, tpmThreshold = 5, qLow = 0.1, qUp = 0.9, maxDist = 100)
Error in validObject(x) : invalid class “IRanges” object:
'width(x)' cannot contain negative integers
what does it mean exactly? maybe that the column range (e.g chr1 1000-1002) must be End - Start > 0? this is my object ce
> ce
A CAGEexp object of 1 listed
experiment with a user-defined name and respective class.
Containing an ExperimentList class object of length 1:
[1] tagCountMatrix: RangedSummarizedExperiment with 56789463 rows and 5 columns
Features:
experiments() - obtain the ExperimentList instance
colData() - the primary/phenotype DFrame
sampleMap() - the sample availability DFrame
`$`, `[`, `[[` - extract colData columns, subset, or experiment
*Format() - convert into a long or wide DFrame
assays() - convert ExperimentList to a SimpleList of matrices
> ce[["tagCountMatrix"]]
class: RangedSummarizedExperiment
dim: 56789809 5
metadata(0):
assays(2): counts normalizedTpmMatrix
rownames: NULL
rowData names(3): genes annotation filteredCTSSidx
colnames(5): SAMP_A SAMP_B SAMP_C SAMP_D SAMP_E
colData names(0):
If I look at one sample: with tagClustersGR() and then count the negative values, they look all >0
> x = tagClustersGR( ce, "SAMP_A" )
> head(x)
TagClusters object with 6 ranges and 6 metadata columns:
seqnames ranges strand | score nr_ctss dominant_ctss tpm.dominant_ctss q_0.1 q_0.9
<Rle> <IRanges> <Rle> | <Rle> <integer> <integer> <Rle> <Rle> <Rle>
1 chr1 634007-634008 + | 2.280710 2 634008 1.2023170 1 2
2 chr1 817968-817970 + | 3.010468 2 817970 1.8081513 1 3
3 chr1 904811-904815 + | 2.156786 2 904811 1.0783931 1 5
4 chr1 923918-923921 + | 2.634931 2 923921 1.8081513 1 4
5 chr1 960582-960647 + | 13.178705 8 960628 2.1632628 1 66
6 chr1 966480-966488 + | 1.267194 2 966480 0.6986541 1 9
-------
seqinfo: 38 sequences from an unspecified genome; no seqlengths
> sum( width(ranges(x)) <=0)
[1] 0
> sum( start(ranges(x)) <=0)
[1] 0
if I go to the previous step and i run the command tagClusters(), again i don't fine any window with negative values
> x = tagClusters( ce, "SAMP_A" )
> head(x)
cluster chr start end strand tpm nr_ctss dominant_ctss tpm.dominant_ctss q_0.1 q_0.9
1 1 chr1 634006 634008 + 2.280710 2 634008 1.2023170 1 2
2 2 chr1 817967 817970 + 3.010468 2 817970 1.8081513 1 3
3 3 chr1 904810 904815 + 2.156786 2 904811 1.0783931 1 5
4 4 chr1 923917 923921 + 2.634931 2 923921 1.8081513 1 4
5 5 chr1 960581 960647 + 13.178705 8 960628 2.1632628 1 66
6 6 chr1 966479 966488 + 1.267194 2 966480 0.6986541 1 9
> sum(x$"end"-x$"start" <=0)
[1] 0
> sum( x$"tpm" <=0)
[1] 0
I have repeated this for all the 5 samples, and I didn't find any negative values. How can I find the line with the negative values? How can I remove this lines from the object?
Dear mirkocelii, thank you for your detailed report. I think that the negative values appear during the aggregation of the tag clusters; I wonder if it happens near the start of some chromosomes (
chrM
for instance). I also wonder if it would be prevented by providing chromosome length information to theCAGEexp
object. Does a BSgenome package exist for the genome you aligned to? If yes, can you use it like in the vignette at the time of creating theCAGEexp
object ? If not or if it does not solve the problem, can you isolate the problem in a slimmer test case ? For instance by testingCAGEexp
objects containing only a single chromosomes, and then portions of this chromosome?Dear Charles, thanks for the quick reply. I was wonder too that it could happened near the start of some chromosomes, hence as a control I removed all the CTSS of position <200 of all chromosomes, but it didn't change. I'm working on human with the BSgenome.Hsapiens.UCSC.hg38 package and I did use it as in the vignette
Good, can you make a test for each chromosome separately, with something like the following:
edited once because my advice to subset
CTSStagCountSE(ce)
was not relevant.