Hello list,
My name is Makis. I am a bioinformatics postdoctoral fellow and
recently started working on a CAGE project. I am new to the field and
trying to figure some things out, so I would much appreciate any
comments on my problem. I am sorry if my question sounds a bit confusing.
I have a Treatment vs Control (independent samples) experimental
design with 3 replicates in each condition. Usually what people do is:
1. Use samtools to keep the reads that exhibit a high mapping quality
score (here "-q 20") and convert the .bam files (containing the
quality data) into bed format:
samtools view -q 20 -F 768 -u input.bam | ~/bin/bamToBed -i stdin |
gzip -c > output.bed.gz
2. Summarize all count data (generate transcription start site
clusters, CTSS).
3. Use intersectBed to extract only the entries that overlap
promoters/genes from a known (e.g. RefSeq) or custom annotation. The
counts of the set of ctss_ids (step 2) belonging to a specific gene
region are summed (for each sample). This is the (gene) expression to
be analyzed.
4. Do differential expression analysis using the total counts for
genes from step 3. This can be done with edgeR, DESeq, etc.
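To make step 3 concrete, here is a minimal sketch of how I understand the per-gene summation for one sample. The tuple layouts, the `gene_counts` name, and the toy coordinates are my own assumptions for illustration, not the output format of intersectBed or any other tool:

```python
from collections import defaultdict

def gene_counts(ctss, genes):
    """Sum CTSS tag counts per gene for a single sample.

    ctss  : list of (chrom, pos, strand, count)   -- hypothetical layout
    genes : list of (gene_id, chrom, start, end, strand),
            0-based, half-open intervals          -- hypothetical layout
    """
    totals = defaultdict(int)
    for chrom, pos, strand, count in ctss:
        for gene_id, g_chrom, start, end, g_strand in genes:
            # A CTSS is assigned to a gene region if it lies inside it
            # on the same chromosome and strand.
            if chrom == g_chrom and strand == g_strand and start <= pos < end:
                totals[gene_id] += count
    return dict(totals)

# Toy data (made up): three CTSS fall inside annotated regions, one does not.
ctss = [
    ("chr1", 105, "+", 12),
    ("chr1", 110, "+", 3),
    ("chr1", 500, "-", 7),
    ("chr2", 50, "+", 4),
]
genes = [
    ("geneA", "chr1", 100, 200, "+"),
    ("geneB", "chr1", 450, 600, "-"),
]
counts = gene_counts(ctss, genes)
print(counts)
```

Repeating this for each of the six samples would give the genes-by-samples count matrix that goes into step 4. The nested loop is only meant to show the logic; it would be far too slow for real data.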
I would like to ask:
(i) Is it (biologically or mathematically) meaningful to perform
differential expression analysis on the ctss_ids (data of step 2) with
an appropriate method and then find genes that contain many
differentially expressed ctss_ids (with an enrichment test)?
(ii) Does anyone have an idea how different my results would be from
those obtained with the above pipeline (steps 1-4)? Perhaps the
FDR-adjusted p-values would be very different due to the higher
dimensionality.
(iii) If my strategy (DE on ctss_ids) is meaningful, should I use a
version of the Cufflinks DE analysis (since I no longer have total
counts)? I am a bit confused because for CAGE data the gene length
(isoform length) does not matter. The data I see are just
transcription start sites.
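To illustrate my worry in (ii), here is a small sketch of the Benjamini-Hochberg adjustment applied to the same raw p-value among 100 tests (a stand-in for a gene-level analysis) and among 10,000 tests (a stand-in for a ctss_id-level analysis). The p-values and test counts are made up, and `bh_adjust` is my own helper, not a function from edgeR or DESeq:

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# One small p-value among otherwise unremarkable tests (made-up numbers).
gene_level = [0.001] + [0.5] * 99      # 100 tests
ctss_level = [0.001] + [0.5] * 9999    # 10,000 tests

gene_adj = bh_adjust(gene_level)[0]
ctss_adj = bh_adjust(ctss_level)[0]
print(gene_adj, ctss_adj)
```

In this toy case the same raw p-value of 0.001 is adjusted to about 0.1 with 100 tests but to 0.5 with 10,000 tests, which is the kind of difference I am concerned about.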
Thank you in advance for your help.
Best Regards,
Makis