When counting reads from a BAM file, I manually extracted the reads aligned to a particular location and found that CAGEr had assigned some reads to that location that do not exist in the BAM file.
The commands I am using are:
> My_CAGEset <- new("CAGEset", genomeName = "BSgenome.Hsapiens.UCSC.hg38", inputFiles = <input_path>, inputFilesType = "bam", sampleLabels = c(<sample_names>))
> ctss <- getCTSS(My_CAGEset)
If I write ctss out to a table/file and compare the number of tags assigned to a position in this file with a manual count of the sequences aligned to that position in the BAM/SAM file, the numbers do not match.
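To make the comparison concrete, here is a minimal base-R sketch of one way such a manual count could be done. It assumes that each tag is counted at the position of its 5' end (the CTSS) rather than at the leftmost aligned base, so a minus-strand read is counted at the right-hand end of its alignment. The helper `cigar_ref_width` and the toy alignments are hypothetical and made up for illustration; they are not taken from my data or from CAGEr itself. Any quality filtering or handling of an extra 5' G that CAGEr may apply is not modelled here.

```r
# Hypothetical helper: number of reference bases covered by a CIGAR string.
# Only M, D, N, = and X consume reference positions (per the SAM format).
cigar_ref_width <- function(cigar) {
  ops <- regmatches(cigar, gregexpr("[0-9]+[MIDNSHP=X]", cigar))[[1]]
  len <- as.integer(sub("[A-Z=]$", "", ops))
  op  <- sub("^[0-9]+", "", ops)
  sum(len[op %in% c("M", "D", "N", "=", "X")])
}

# Toy alignments (made-up values): FLAG, 1-based leftmost POS and CIGAR,
# as they would appear in columns 2, 4 and 6 of a SAM record.
aln <- data.frame(
  flag  = c(0L, 0L, 16L, 16L),
  pos   = c(100L, 100L, 76L, 100L),
  cigar = c("25M", "25M", "25M", "10M2D15M"),
  stringsAsFactors = FALSE
)

# FLAG bit 0x10 (16) marks a read aligned to the minus strand.
is_minus <- bitwAnd(aln$flag, 16L) != 0
width    <- vapply(aln$cigar, cigar_ref_width, integer(1), USE.NAMES = FALSE)

# 5' end: POS for plus-strand reads, alignment end for minus-strand reads.
five_prime <- ifelse(is_minus, aln$pos + width - 1L, aln$pos)

# Tag counts per strand and 5' position, versus a naive count by POS.
counts_by_5prime <- table(strand = ifelse(is_minus, "-", "+"),
                          position = five_prime)
counts_by_pos    <- table(position = aln$pos)

print(counts_by_5prime)
print(counts_by_pos)
```

In this toy example three reads have POS 100, but only two plus-strand tags have their 5' end at position 100, and the single minus-strand tag whose 5' end is at 100 comes from a read with POS 76. A count based on POS alone would therefore not be expected to match a 5'-end based count.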
Is there some aspect of CAGEr I am misunderstanding that explains this behaviour?