Fitted values from EdgeR
2
1
Entering edit mode
R ▴ 40
@r-5604
Last seen 5 months ago
Germany

Hi all,

I wonder of the values in lrt$fitted.values are cpm normalized values? des <- model.matrix(~KRAS) fit <- glmFit(y2, des) lrt <- edgeR::glmLRT(fit) edger r • 2.2k views ADD COMMENT 0 Entering edit mode Thanks! So what are the fitted values then? ADD REPLY 0 Entering edit mode Dear Gordon. What is the best way to get out cpm values when I use lib.size from my libraries and norm.factors from my spike-ins. Please see pipeline below: s=spike-ins counts=miRNAs s <- read.delim("exprs.txt", sep="\t", check.names=FALSE) df.mdp <- read.delim("miRNAs_expressed_all_samples_ALL.csv", sep="\t", check.names=FALSE, stringsAsFactors=FALSE) counts <- as.matrix(df.mdp[,grep("\\d\\d\\d", colnames(df.mdp))]) sids <- as.character(read.table("config_braf.csv", stringsAsFactors=FALSE)[,3]) colnames(counts) <- sids colnames(counts.N) <- sids s <- s[,sids] group <- sids group[grep("\\dR", sids)] <- "Normal" group[grep("\\dG", sids)] <- "Tumor" group <- factor(group) patient <- factor(sapply(str_split(sids, 'R|G'), function(x) x[[1]])) d <- edgeR::DGEList(counts = round(counts), group=group) d.s <- edgeR::DGEList(counts = s) d$samples$lib.size <- colSums(d$counts) # recalculate size factors

d$counts <- rbind(d$counts, d.s$counts) ## use lib sizes from mirna and norm factors from spike-ins method <- "TMM" # c("TMM","RLE","upperquartile","none") method <- "RLE" y <- calcNormFactors(d, method=method) y.s <- calcNormFactors(d.s, method=method) plot(y$samples$lib.size, y.s$samples$lib.size) y$samples$lib.size <- y.s$samples$lib.size y2 <- estimateDisp(y, trend.method="loess") des <- model.matrix(~patient + group) fit <- glmFit(y2, des) lrt <- edgeR::glmLRT(fit) tab <- topTags(lrt, n=Inf)@.Data[[1]] ADD REPLY 0 Entering edit mode You've also posted this CPM normalization issue as two separate questions, at print matrix from EdgeR that are normalized by cpm and by spike-in and Detect global differences in miRNA expression between tumor and normal using spike-ins. Please try to avoid redundancy, as we'll end up answering the same thing multiple times. ADD REPLY 2 Entering edit mode @james-w-macdonald-5106 Last seen 3 hours ago United States Let's look. > example(glmLRT) <snip> glmLRT> d <- DGEList(y) <snip> glmLRT> # Likelihood ratio tests for trend glmLRT> results <- glmLRT(fit, coef=2) <snip> > head(results$fitted.values)
x0        x1        x2
gene.1 9.618113 29.449462 69.696630
gene.2 4.787029  5.444920  4.787005
gene.3 8.071594  7.879561  5.945545
gene.4 5.807543  6.605920  5.807928
gene.5 6.194186  5.686449  4.035013
gene.6 3.528931  5.335160  6.234458

> head(d$counts) x0 x1 x2 gene.1 6 44 55 gene.2 5 5 5 gene.3 7 10 5 gene.4 8 2 8 gene.5 5 8 3 gene.6 5 2 8 Does that answer your question? ADD COMMENT 1 Entering edit mode To clarify James' point; the fitted values are on the same scale as the raw counts themselves. This demonstrates that they are not CPM values. If you want to convert them to CPMs, you could do something like this: cpm(results$fitted.values, lib.size=exp(getOffset(d)))

The getOffset call allows us to handle non-unity normalization factors.

0
Entering edit mode

Dear Aaron,

Considering your reply, which of the following options do you think would be the correct way to obtain normalized counts?

fit<-glmFit(y,design)

lrt<-glmLRT(fit, coef=14:2)

norm_counts <- cpm(lrt$fitted.values, lib.size = exp(getOffset(y))) norm_counts2 <- cpm(lrt$fitted.values, normalized.lib.sizes = TRUE)

norm_counts3<- cpm(y, lib.size = exp(getOffset(y)))

norm_counts4 <- cpm(y, normalized.lib.sizes = TRUE)

Thank you very much! Is it ok to use then this normalized counts to proceed with clustering or is it better to use the log2 of the counts (using log = TRUE in the cpm function)?

Best,

Andrew

1
Entering edit mode

Using cpm is the recommended way to obtain normalized expression values for other (i.e., non-DB) analyses such as clustering and dimensionality reduction. Running it as you have done with norm_counts4 is the simplest approach, so you might as well go with that. In fact, normalized.lib.sizes=TRUE is the default anyway, so you can even save yourself some typing by leaving that argument out.

And yes, it is usually better to perform a log-transformation, see my comments here:

A: Robust transformation of raw RNA-seq counts for exploratory data analysis and hi

Note that we generally don't use the term "normalized counts" in edgeR, as this implies that you can treat the values as counts (and use them in GLM fitting, etc.). This may not be appropriate as the scaling will distort the mean-variance relationship when the library sizes are very different, e.g., a "normalized count" of 1 that was derived from an original count of 100 is much less variable than the same value derived from an original count of 1.

0
Entering edit mode

Dear Aaron,

Thank you very much for your quick reply! I will keep on going with this in mind.

Kind regards.

Andrew

0
Entering edit mode
@gordon-smyth
Last seen 5 hours ago
WEHI, Melbourne, Australia

The help page for glmLRT explains the meaning of all the output components.

For more mathematical detail, here's the paper that describes the methodology:

The quantities that are called fitted.values in the output are denoted by the Greek letter mu in the article. Generally speaking, these quantities are for internal edgeR use and you do not need to manipulate them yourself in a practical analysis.