Starting from a raw counts file generated by featureCounts, I used edgeR for the DE analysis, and it went well. I am now using CPM-normalized values to explore the expression of specific genes in multiple pathways. I am aware that CPM corrects for library size but not for gene length. Is it OK to use these values for individual-gene analysis and to generate plots for publication, or do I need a differently normalized file?

With that in mind, I was trying to get RPKM-normalized values. But even after reading similar posts, I am not sure how to supply gene lengths to the rpkm() function. This discussion says that recent versions of edgeR can find the gene length directly from the DGEList object. I am using edgeR_3.28.1; can anyone show me how to get the gene lengths so that I can export RPKM?

Related info: I downloaded the rice genome from MSU, and the reads were aligned to the reference with Hisat2. Currently, I have only the raw counts files (i.e., no .bam files are available).
Here is the code I used to generate the CPM-normalized values:
    library(edgeR)

    raw_counts <- read.delim("rawcounts.txt", row.names = "Locus", check.names = TRUE)
    targets <- read.table("targets.txt", header = TRUE, sep = "\t")
    group <- factor(paste(targets$Genotype, targets$Time, targets$Treatment, sep = "."))
    cbind(targets, Group = group)
    y <- DGEList(counts = raw_counts, group = group)

    # Filter out low-count genes
    keep <- rowSums(cpm(y) >= 2) >= 2
    y <- y[keep, , keep.lib.sizes = FALSE]
    y <- calcNormFactors(y)
    CPM <- cpm(y)

    # How can I incorporate gene length in rpkm()?
    RPKM <- rpkm(y)
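To make the question concrete, here is a minimal sketch of what I think the gene-length step might look like. It assumes the original featureCounts table, which normally carries a Length column next to the counts, is still available. The toy table `fc`, its sample names `S1` and `S2`, and all values in it are made up for illustration and are not my real data.

```r
library(edgeR)

# Toy stand-in for a featureCounts table (all values invented for illustration).
set.seed(1)
n_genes <- 50
fc <- data.frame(
  Geneid = paste0("gene", seq_len(n_genes)),
  Length = sample(500:5000, n_genes),
  S1 = rnbinom(n_genes, mu = 200, size = 5),
  S2 = rnbinom(n_genes, mu = 220, size = 5)
)

counts <- as.matrix(fc[, c("S1", "S2")])
rownames(counts) <- fc$Geneid

# Keep the lengths in the gene annotation so they stay aligned with the
# count rows whenever the DGEList is subset (e.g. after filtering).
y <- DGEList(counts = counts, genes = fc[, "Length", drop = FALSE])
y <- calcNormFactors(y)

# Option 1: pass the lengths explicitly.
RPKM <- rpkm(y, gene.length = y$genes$Length)

# Option 2: let rpkm() look for a "Length" column in y$genes.
RPKM2 <- rpkm(y)
```

If this is right, both calls should give the same matrix, equal to the CPM values divided by gene length in kilobases. If only a stripped counts file without the Length column is left, I assume the lengths would have to be recomputed from the MSU annotation instead.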