Dear All,
I am using EdgeR package to extract the differential expression . I do not have replicates in my samples.
Now I want to know the normalized read count of my datasets.
I have tried with rpkm method but edgeR follow the TMM method for normalization so there is a variation at normalized read.
Could anyone suggest how can I get the normalized read count for all my reads.
Here is my code-
countdata <- read.table("CD.txt",header = T,sep = "\t",row.names = 1)
attach(countdata)
group <- factor(c(1,2))
y <- DGEList(counts = countdata, lib.size = c(126724,347238), group = group)
y$samples
keep <- rowSums(rpkm(y)>1) >= 2
y <- y[keep, , keep.lib.sizes=FALSE]
y <- calcNormFactors(y,method="TMM")
design <- model.matrix(~group)
y$samples
bcv <- 0.2
y <- DGEList(counts = countdata, lib.size = c(126724,347238), group = group,genes=data.frame(Length=GeneLength))
y <- calcNormFactors(y)
RPKM <- rpkm(y)
et <- exactTest(y, dispersion=bcv^2)
y1 <- y
y1$samples$group <- 1
y0 <- estimateCommonDisp(y1[c(CHI_ReadCnt,DEH_ReadCnt),])
y$common.dispersion <- y0$common.dispersion
fit <- glmFit(y, design)
lrt <- glmLRT(fit)
resSig <- res[ which(res$PValue < 0.05 ), ]
I shall be very thankful.
Thank you for your response.
yes I want RPKM values of my differential datasets. Can I get that from edgeR?
Could you please provide me a codes for complete analysis ??
I shall be very thankful.
The code you gave in your question already computes RPKM. You even called it "RPKM".
I agree but when I calculate manually to cross check my final results from edger using log values it gives different values seems it is not normalized by RPKM method it taking TMM method in background. Even if it follow TMM method can I get per sample normalized read count.
If I were you, I'd be explicit about the value of the
gene.length
parameter in my call torpkm
, but other than that, you can rest assured that it is doing the right thing.If you really want to know what these functions are doing in more detail so that you can compare them with what you are doing manually, you should look at the source code for
rpkm.DGEList
andrpkm.default.
You'll see that this delegates to the
cpm
function, which has been rewritten in C/C++ for efficiency ... I went back to find the last "pure R" version of cpm, which seems to be edgeR version 3.14.0You can download the source version of that package and look at the code in the
R/cpm.R
function if you really want to know what's going on.