Behaviour of getCTSS on a CAGEset - extra reads?
2
1
Entering edit mode
sarac ▴ 20
@sarac-21308
Last seen 4.8 years ago

When counting reads from a bamfile, I manually extracted the reads aligned to a particular location and found that CAGEr had clustered some reads at the location that did not exist in the bamfile.

The commands I am using are:

> My_CAGEset <- new("CAGEset", genomeName = "BSgenome.Hsapiens.UCSC.hg38", 
                     inputFiles = <input_path>, inputFilesType = "bam", 
                     sampleLabels = c(<sample_names>)) 
> ctss <-  getCTSS(My_CAGEset)

If I write ctss to out to a table/file and count the number of tags assigned to a position in this file vs manually counting the number of sequences that are aligned to that position in the bam/sam the numbers do not correspond to each other.

Is there some aspect of CAGEr I am misunderstanding that explains this behaviour?

cager cage ctss • 1.2k views
ADD COMMENT
0
Entering edit mode

Hello,

the getCTSS function applies several filters on the data, and when using BAM files as input it will also attempt to correct the "G bias" of CAGE, as explained in the function's documentation. If this does not answer your question, can you give me a reproductible test case illustrating your problem ?

Have a nice day,

-- Charles

ADD REPLY
0
Entering edit mode

Hi Charles, Many thanks for getting back to me.

When I output the ctss object above as a table I get, for example, the following:

chr5    150412750   -   2
chr5    150412751   +   9932
chr5    150412751   -   48
chr5    150412752   +   514
chr5    150412752   -   235
chr5    150412753   +   261
chr5    150412753   -   1920
chr5    150412754   +   607
chr5    150412754   -   66
chr5    150412755   -   47
chr5    150412756   -   1
chr5    150412757   +   71
chr5    150412757   -   16
chr5    150412758   +   3026

For a small region in chromosome 5 between 150412750-150412758

If I use genomecov as part of Bedtools, which displays different the data differently (piled up reads, not the start point of reads) I get the following:

Minus
chr5    150412750   -   16881
chr5    150412751   -   16792
chr5    150412752   -   16660
chr5    150412753   -   6574
chr5    150412754   -   4190
chr5    150412755   -   3867
chr5    150412756   -   3213
chr5    150412757   -   3150
chr5    150412758   -   119
Plus
chr5    150406961   +   1
chr5    150449786   +   1
chr5    150449787   +   1
chr5    150449788   +   1
chr5    150449789   +   1
chr5    150449790   +   1
chr5    150449791   +   1
chr5    150449792   +   1
chr5    150449793   +   1

The output is, of course, very different, but what is striking is the lack of reads in the + direction in the genomecov version compared to that from the ctss CAGEr output. I should note that the bamfiles used for the genomecov function were sorted and filtered for quality prior to use, but with a reasonably lax filtration step.

Finally, if I manually search the bam/samfile for positive sequences in this region I do not get any hits. This is the case if I expand the range to cover any 'pile up' sequences - although they should not be contributing.

If necessary, I can email you the bamfiles themselves?

ADD REPLY
0
Entering edit mode

Yes, please send me a minimal BAM file to my maintainer address.

ADD REPLY
0
Entering edit mode
@charles-plessy-7857
Last seen 6 months ago
Japan

Hi Sara, you are right, it seems that the correctSystematicG option is badly broken. I do not have time to fix this in the short term, so if your results do not critically depend on it, may I suggest to turn it to FALSE ?

ADD COMMENT
0
Entering edit mode

Thank you for your reply and for advice on the work around.

ADD REPLY
0
Entering edit mode
@charles-plessy-7857
Last seen 6 months ago
Japan

Dear Sarac,

We fixed a bug in G correction for CAGEset experiments at the end of last year:

https://github.com/charles-plessy/CAGEr/pull/26

Would you mind trying again ?

Have a nice day,

-- Charles

ADD COMMENT

Login before adding your answer.

Traffic: 565 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6