I would like to test the mapping using the subread::subjunc
aligner, but I'm not sure what feature and attributes to use when adding a gtf file. Usually one summarize the data on the gene level using the gene_id
and exon
parameters of the gtf file.
But I would like to work on the transcripts. Would it than make more sense to summarize the reads on the transcript_id
and exon
instead in order to better estimate the differences in transcript levels?
Or just running the normal parameters to get the counts on a gene level and later using the diffSpliceDGE
command on the normalized object (like shown in the user's guide).
thanks
thanks again. so How do I get the exon counts?
This is the command I use now?
Would this give me the counts per exon or per gene?
I'm working with a gtf file from C. elegans. In the third column I have the
exon
feature and in the 9th column thegene_id
attribute (ortranscript_id
).Annotation to the subjunc aligner helps the aligner to better map the reads. The aligner generates BAM files, which contains mapped reads (and also the reads than cannot be mapped). However the aligner itself doesn't give you the read counts to genes or exons.
You need to run featureCounts for having the read counts. FeatureCounts can assign reads on the meta-feature level (i.e. genes) or the feature level (i.e. exons). Please find the users guides and tutorials on http://subread.sourceforge.net/. There is also a case study for the entire workflow of read mapping, counting and DE analysis on http://subread.sourceforge.net/RNAseqCaseStudy.html. Although the case study uses Rsubread, which runs in the R environment, the R functions in the Rsubread package and the command-line programs in the Subread package are well mapped, including the parameters. Hence the case study can also provide useful information for running the read aligner and featureCounts in the command-line Subread package.
I assume, that it would make more sense to quantify the reads based on the gene-level and not the exon-level, following by using the
diffSpliceDGE
test for differential splicing analysis.How does the quantification on the eon level works on overlapped genes? Are these reads being discarded or assigned randomly. I guess both of these options are not really perfect.
Read alignment and read counting are separate steps.
subjunc
only aligns and is the same regardless of whether you are planning on a gene or exon analysis. You have to usefeatureCounts
to get gene or exon counts.You are perhaps getting mislead by the fact that
subjunc
optionally uses GTF annotation, but it does so only to assist the alignment, not to count reads.Thank you both Gordon Smyth and @Yang Liao.
I know there is a difference between the mapping step (
subjunc
) and the quantification (featureCounts
).I would like to understand how to quantify the data after the mapping. From the answers I got here, I understand that it is best to quantify on the exon-level something like that:
Would these next steps make sense:
-g
and-t
with exons)edgeR
and create a DEGList object 3.run the data through thediffSpliceDGE
command. (Can this command also work with exon-level based counting, or should I use gene-level counts here?)thanks again