Paul Geeleher
Hi,
I've been going through this RNA-seq use case
(http://bioconductor.org/help/course-materials/2010/CSAMA10/Lab-8-RNAseqUseCase.pdf)
with some data I have, and I'm wondering about section 2.4, where they
calculate gene expression by counting the number of reads that align
within the boundaries of each gene and then normalize these counts by
the length of the gene. Some of the code is as follows:
dmGeneBounds <- CSAMA10::geneBounds(dmTxDb)
dmGeneBounds <- dmGeneBounds[seqnames(dmGeneBounds) %in%
                             levels(seqnames(alnRanges))]
head(dmGeneBounds, 3)
dmGeneCounts <- countOverlaps(dmGeneBounds, alnRanges)
dmRPKM <- CSAMA10::rpkm(dmGeneCounts, dmGeneBounds)
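For reference, RPKM is conventionally defined as reads per kilobase of feature per million mapped reads. The sketch below writes that textbook formula out in base R on made-up numbers. The function name `rpkm_sketch` and the toy data are hypothetical, and this is not necessarily what `CSAMA10::rpkm` does internally.

```r
# Textbook RPKM: reads per kilobase of feature per million mapped reads.
# Assumption: this is the standard definition, not a copy of CSAMA10::rpkm.
rpkm_sketch <- function(counts, lengths_bp, total_reads) {
  stopifnot(length(counts) == length(lengths_bp), all(lengths_bp > 0))
  counts / (lengths_bp / 1e3) / (total_reads / 1e6)
}

# Toy data, invented purely for illustration
counts  <- c(geneA = 500,  geneB = 500,  geneC = 1000)  # reads per gene
lengths <- c(geneA = 1000, geneB = 2000, geneC = 4000)  # feature length in bp

toy_rpkm <- rpkm_sketch(counts, lengths, total_reads = 2e6)
print(toy_rpkm)
```

Whatever is passed as `lengths_bp` (whole gene span or summed exon length) directly changes the values, and therefore possibly the ranks, which is why the choice of length matters.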
My question is: is this actually correct? Could you publish using this
method, or is it just meant as a simple example?
I'm interested in the ranks of the genes in the samples for a
subsequent analysis, but I would have assumed that you'd have to count
the number of reads that map to the EXONS of each gene and normalize
by the total length of the EXONS, rather than the length of the gene
itself?
If this is the case, I wonder if there is a tutorial that shows how to
do that...
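To make the exon-based idea concrete, here is a minimal sketch using GenomicRanges on toy data. It assumes exons are grouped per gene in a GRangesList; with a real TxDb, something like `exonsBy(dmTxDb, by = "gene")` from GenomicFeatures should give that kind of object, though I have not run it against the lab data. The gene names, coordinates, and reads are all invented.

```r
library(GenomicRanges)

# Toy exons for two hypothetical genes; two geneA exons overlap each other
exons <- GRanges(
  seqnames = "chr2L",
  ranges   = IRanges(start = c(100, 150, 400, 1000),
                     end   = c(199, 249, 499, 1099)),
  strand   = "*"
)
exonsByGene <- split(exons, c("geneA", "geneA", "geneA", "geneB"))

# Toy aligned reads standing in for alnRanges
reads <- GRanges(
  seqnames = "chr2L",
  ranges   = IRanges(start = c(120, 180, 300, 1050, 480), width = 36),
  strand   = "*"
)

# Merge overlapping exons within each gene so no base is counted twice,
# then sum the merged widths to get the exonic length per gene
exonLength <- sum(width(reduce(exonsByGene)))

# Count reads overlapping any exon of each gene; a read hitting several
# exons of the same gene is counted once, and intronic reads are ignored
exonCounts <- countOverlaps(exonsByGene, reads)

# RPKM on exonic length; using length(reads) as the library size is an
# assumption made for this toy example
exonRPKM <- exonCounts / (exonLength / 1e3) / (length(reads) / 1e6)
print(data.frame(exonLength, exonCounts, exonRPKM))
```

Compared with counting over whole gene bounds, this drops the read that falls in geneA's intron and divides by 250 bp of exon rather than the 400 bp gene span.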
--
Paul Geeleher
School of Mathematics, Statistics and Applied Mathematics
National University of Ireland
Galway
Ireland
--
www.bioinformaticstutorials.com