Question

Fitted values from EdgeR

1

Entering edit mode

R ▴ 40

@r-5604

Last seen 3.1 years ago

Germany

Hi all,

I wonder of the values in lrt$fitted.values are cpm normalized values?

des <- model.matrix(~KRAS)
fit <- glmFit(y2, des)
lrt <- edgeR::glmLRT(fit)

edger r • 3.6k views

ADD COMMENT • link 8.8 years ago R ▴ 40

0

Entering edit mode

Thanks! So what are the fitted values then?

ADD REPLY • link 8.8 years ago R ▴ 40

0

Entering edit mode

Dear Gordon. What is the best way to get out cpm values when I use lib.size from my libraries and norm.factors from my spike-ins. Please see pipeline below:

s=spike-ins

counts=miRNAs

s <- read.delim("exprs.txt", sep="\t", check.names=FALSE)

df.mdp <- read.delim("miRNAs_expressed_all_samples_ALL.csv", sep="\t", check.names=FALSE, stringsAsFactors=FALSE)
counts <- as.matrix(df.mdp[,grep("\\d\\d\\d", colnames(df.mdp))])

sids <- as.character(read.table("config_braf.csv", stringsAsFactors=FALSE)[,3])
colnames(counts) <- sids
colnames(counts.N) <- sids
s <- s[,sids]

group <- sids
group[grep("\\dR", sids)] <- "Normal"
group[grep("\\dG", sids)] <- "Tumor"

group <- factor(group)
patient <- factor(sapply(str_split(sids, 'R|G'), function(x) x[[1]]))

d <- edgeR::DGEList(counts = round(counts), group=group)
d.s <- edgeR::DGEList(counts = s)

d$samples$lib.size <- colSums(d$counts) # recalculate size factors

## add spike-ins
d$counts <- rbind(d$counts, d.s$counts)

## use lib sizes from mirna and norm factors from spike-ins

method <- "TMM" # c("TMM","RLE","upperquartile","none")
method <- "RLE"

y <- calcNormFactors(d, method=method)
y.s <- calcNormFactors(d.s, method=method)
plot(y$samples$lib.size, y.s$samples$lib.size)
y$samples$lib.size <- y.s$samples$lib.size
y2 <- estimateDisp(y, trend.method="loess")

des <- model.matrix(~patient + group)
fit <- glmFit(y2, des)
lrt <- edgeR::glmLRT(fit)
tab <- topTags(lrt, n=Inf)@.Data[[1]]

ADD REPLY • link 8.8 years ago R ▴ 40

0

Entering edit mode

You've also posted this CPM normalization issue as two separate questions, at print matrix from EdgeR that are normalized by cpm and by spike-in and Detect global differences in miRNA expression between tumor and normal using spike-ins. Please try to avoid redundancy, as we'll end up answering the same thing multiple times.

ADD REPLY • link 8.8 years ago Aaron Lun ★ 28k

score 2 · Answer 1 · 2015-07-10

2

Entering edit mode

James W. MacDonald 65k

@james-w-macdonald-5106

Last seen 2 hours ago

United States

Let's look.

> example(glmLRT)
<snip>
glmLRT> d <- DGEList(y)
<snip>
glmLRT> # Likelihood ratio tests for trend
glmLRT> results <- glmLRT(fit, coef=2)

<snip>

> head(results$fitted.values)
             x0        x1        x2
gene.1 9.618113 29.449462 69.696630
gene.2 4.787029  5.444920  4.787005
gene.3 8.071594  7.879561  5.945545
gene.4 5.807543  6.605920  5.807928
gene.5 6.194186  5.686449  4.035013
gene.6 3.528931  5.335160  6.234458


> head(d$counts)
       x0 x1 x2
gene.1  6 44 55
gene.2  5  5  5
gene.3  7 10  5
gene.4  8  2  8
gene.5  5  8  3
gene.6  5  2  8

Does that answer your question?

ADD COMMENT • link 8.8 years ago James W. MacDonald 65k

1

Entering edit mode

To clarify James' point; the fitted values are on the same scale as the raw counts themselves. This demonstrates that they are not CPM values. If you want to convert them to CPMs, you could do something like this:

cpm(results$fitted.values, lib.size=exp(getOffset(d)))

The getOffset call allows us to handle non-unity normalization factors.

ADD REPLY • link 8.8 years ago Aaron Lun ★ 28k

0

Entering edit mode

Dear Aaron,

Considering your reply, which of the following options do you think would be the correct way to obtain normalized counts?

fit<-glmFit(y,design)

lrt<-glmLRT(fit, coef=14:2)

norm_counts <- cpm(lrt$fitted.values, lib.size = exp(getOffset(y)))

norm_counts2 <- cpm(lrt$fitted.values, normalized.lib.sizes = TRUE)

norm_counts3<- cpm(y, lib.size = exp(getOffset(y)))

norm_counts4 <- cpm(y, normalized.lib.sizes = TRUE)

Thank you very much! Is it ok to use then this normalized counts to proceed with clustering or is it better to use the log2 of the counts (using log = TRUE in the cpm function)?

Best,

Andrew

ADD REPLY • link 5.9 years ago georomano • 0

1

Entering edit mode

Using cpm is the recommended way to obtain normalized expression values for other (i.e., non-DB) analyses such as clustering and dimensionality reduction. Running it as you have done with norm_counts4 is the simplest approach, so you might as well go with that. In fact, normalized.lib.sizes=TRUE is the default anyway, so you can even save yourself some typing by leaving that argument out.

And yes, it is usually better to perform a log-transformation, see my comments here:

A: Robust transformation of raw RNA-seq counts for exploratory data analysis and hi

Note that we generally don't use the term "normalized counts" in edgeR, as this implies that you can treat the values as counts (and use them in GLM fitting, etc.). This may not be appropriate as the scaling will distort the mean-variance relationship when the library sizes are very different, e.g., a "normalized count" of 1 that was derived from an original count of 100 is much less variable than the same value derived from an original count of 1.

ADD REPLY • link 5.9 years ago Aaron Lun ★ 28k

0

Entering edit mode

Dear Aaron,

Thank you very much for your quick reply! I will keep on going with this in mind.

Kind regards.

Andrew

ADD REPLY • link 5.9 years ago georomano • 0

score 0 · Answer 2 · 2015-07-11

The help page for glmLRT explains the meaning of all the output components.

For more mathematical detail, here's the paper that describes the methodology:

http://nar.oxfordjournals.org/content/40/10/4288

The quantities that are called fitted.values in the output are denoted by the Greek letter mu in the article. Generally speaking, these quantities are for internal edgeR use and you do not need to manipulate them yourself in a practical analysis.