Question

edgeR TMM values

2

ribioinfo ▴ 100

@ribioinfo-9434

Last seen 4.6 years ago

Hi, I am using edgeR and I want to print the reads normalized with the TMM method, but I have not found the command. Is there a command in edgeR that could help me?

Thank you

Riccardo

edger • 18k views

Asked 9.2 years ago by ribioinfo ▴ 100

0

Thank you, I meant normalizing the expression values. If I multiply the cpm values by a million, would I get the TMM values?

Reply by ribioinfo ▴ 100, 9.2 years ago

1

I don't think it's clear what you are asking for. Let's assume that y is the DGEList holding your count data, and that you have already called calcNormFactors on it.

Are you after the TMM normalization factors? These are stored in your y$samples$norm.factors column.

Do you just want a gene expression matrix from your data, normalized by a "simple" per-million factor? Call cpm(y, normalized.lib.sizes=FALSE).

But you probably don't want that.

If you're after gene expression normalized by sequencing depth (adjusted by TMM factors), just call cpm(y) as Aaron has already suggested.

If you aren't after any of these three things, can you please explain in more detail what you want?

Reply by Steve Lianoglou ★ 13k, 9.2 years ago
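A minimal sketch of the three options in the reply above, assuming edgeR is installed. The count matrix is made-up Poisson data used only for illustration, not anything from the original poster:

```r
library(edgeR)

# Hypothetical toy counts: 100 genes x 4 samples
set.seed(1)
counts <- matrix(rpois(400, lambda = 50), nrow = 100,
                 dimnames = list(paste0("gene", 1:100), paste0("S", 1:4)))

y <- DGEList(counts = counts)
y <- calcNormFactors(y, method = "TMM")  # stores the TMM factors in y$samples

# Option 1: the TMM normalization factors themselves
tmm.factors <- y$samples$norm.factors

# Option 2: counts per million using the raw library sizes only (no TMM adjustment)
cpm.raw <- cpm(y, normalized.lib.sizes = FALSE)

# Option 3: counts per million using the TMM-adjusted (effective) library sizes;
# normalized.lib.sizes = TRUE is the default
cpm.tmm <- cpm(y)
```

The last two matrices differ only in the divisor: cpm.raw divides each column by its library size, while cpm.tmm divides by lib.size * norm.factors.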

0

I would like to compare the normalized counts from DESeq2 with those from edgeR. To do this, do I have to use calcNormFactors and then cpm?

Reply by ribioinfo ▴ 100, 9.2 years ago

2

This would be tricky, as the values returned by cpm are on a per-million scale, while (if I remember correctly) the values from DESeq2 are on the scale of the original counts. This makes it difficult to compare the normalized values directly between methods, and to me such a comparison doesn't seem to have any purpose. If you just want to compare normalization strategies, you can simply compare the size factors from DESeq2 with the effective library sizes (lib.size*norm.factors) from edgeR. If you want to compare the effect of the normalization strategies, then you should have a downstream analysis in mind (e.g., PCA, clustering). For most of these downstream analyses, you're comparing samples within the same data set, so the scale of the normalized expression from each method shouldn't matter.

Reply by Aaron Lun ★ 28k, 9.2 years ago
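A sketch of computing the effective library sizes mentioned above, again on made-up counts and assuming edgeR is installed. The final rescaling to a geometric mean of 1 is this sketch's own choice, meant only to put the values on a scale where they can be laid side by side with DESeq2 size factors; those would come from sizeFactors() after estimateSizeFactors() on a DESeq2 object, which is not run here:

```r
library(edgeR)

# Hypothetical toy counts: 100 genes x 4 samples
set.seed(2)
counts <- matrix(rpois(400, lambda = 50), nrow = 100)
y <- calcNormFactors(DGEList(counts = counts))

# Effective library size = library size x TMM normalization factor
eff.lib <- y$samples$lib.size * y$samples$norm.factors

# Assumption of this sketch: rescale to a geometric mean of 1 so that only the
# relative differences between samples remain
eff.scaled <- eff.lib / exp(mean(log(eff.lib)))
eff.scaled
```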

0

OK, thank you. If I want to do a clustering analysis, is cpm(y, normalized.lib.sizes=TRUE) correct?

Reply by ribioinfo ▴ 100, 9.2 years ago

1

Gordon and co. typically suggest that you also pass a value between 3 and 5 for prior.count in your call to cpm.

Reply by Steve Lianoglou ★ 13k, 9.2 years ago

0

Depending on what clustering you're doing, you'll probably want to set log=TRUE as well. This stabilizes the variance between genes of different abundance. Otherwise, high-abundance genes with correspondingly large variances would dominate the calculations, e.g., for Euclidean distances. This probably wouldn't be helpful, as you'd end up clustering on the measurement error/random variability of constitutive housekeeping genes rather than on interesting biological differences in genes that are expressed at lower abundances.

Reply by Aaron Lun ★ 28k, 9.2 years ago
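A sketch combining the log=TRUE and prior.count suggestions above into a sample clustering, assuming edgeR is installed. The toy counts, prior.count = 3, Euclidean distances with hclust(), and cutting the tree into two groups are all illustrative assumptions rather than a recommended pipeline:

```r
library(edgeR)

# Hypothetical toy counts: 100 genes x 6 samples
set.seed(3)
counts <- matrix(rpois(600, lambda = 20), nrow = 100,
                 dimnames = list(NULL, paste0("S", 1:6)))
y <- calcNormFactors(DGEList(counts = counts))

# log2 counts per million on TMM-adjusted library sizes; the prior count damps
# the log of small counts and avoids taking log(0)
logcpm <- cpm(y, log = TRUE, prior.count = 3)

# Cluster samples (columns) on Euclidean distances between their log-CPM profiles
d <- dist(t(logcpm))
hc <- hclust(d)
clusters <- cutree(hc, k = 2)
```

Working on the log scale means that a difference of one unit corresponds to a two-fold change regardless of a gene's abundance, which is why high-abundance genes no longer dominate the distances.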

0

Hi Aaron, may I ask a related question here? Suppose I just compute

(Raw count table)*(norm.factor), where norm.factor=edgeR::calcNormFactors((Raw count table),method='TMM')

and then conduct the downstream analysis. Does this procedure make sense? After the downstream analysis, can I say that I applied the TMM method for normalisation?

If the above procedure does not make sense, how about the following procedure:

(Raw count table)/((norm.factor)*colSums((Raw count table))), where colSums((Raw count table)) stands for the library size (each column represents a sample or a cell).

Thank you very much!

Reply by wt215 • 0, 8.1 years ago
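A sketch checking the second procedure above against cpm(), on made-up counts and assuming edgeR is installed. edgeR treats lib.size * norm.factors as the effective library size and divides counts by it, so the second formula, scaled by one million, should reproduce cpm(); the first formula multiplies counts by the factors instead and does not correspond to this:

```r
library(edgeR)

# Hypothetical toy counts: 100 genes x 4 samples
set.seed(4)
counts <- matrix(rpois(400, lambda = 50), nrow = 100)

# Called on a plain matrix, calcNormFactors returns just the vector of factors
nf <- calcNormFactors(counts, method = "TMM")
lib.size <- colSums(counts)

# Second procedure: divide each column by its effective library size
proc2 <- t(t(counts) / (nf * lib.size))

# Build the equivalent DGEList with the same factors and compare with cpm();
# DGEList adds its own row/column names, so attributes are ignored
y <- DGEList(counts = counts, norm.factors = nf)
isTRUE(all.equal(proc2 * 1e6, cpm(y), check.attributes = FALSE))  # TRUE if they agree
```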

0

Entering edit mode

Best to start a new post for a new question.

Reply by Aaron Lun ★ 28k, 8.1 years ago

score 5 · Answer 1 · 2016-01-20

5

Aaron Lun ★ 28k

@alun

Last seen 45 minutes ago

The city by the bay

You can't normalize reads, because that doesn't really make any sense. You can, however, adjust read counts to obtain normalized expression values. I suggest you have a look at ?calcNormFactors and ?cpm. I won't regurgitate the documentation here; suffice to say, you run calcNormFactors on your DGEList to compute the TMM normalization factors, and then run cpm on that DGEList to get a matrix of normalized expression values.

Answered 9.2 years ago by Aaron Lun ★ 28k

2

Alternatively, use rpkm() to normalize by gene length as well. cpm() gives expression values normalized by library size and TMM; rpkm() gives expression values normalized by library size, TMM and gene length.

Reply by Gordon Smyth 52k, 9.2 years ago
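A sketch of rpkm() alongside cpm(), on made-up counts and assuming edgeR is installed. The gene lengths are hypothetical; in practice they would come from your annotation. The check at the end shows RPKM as the TMM-adjusted CPM divided by gene length in kilobases:

```r
library(edgeR)

# Hypothetical toy counts (100 genes x 4 samples) and hypothetical gene lengths in bases
set.seed(5)
counts <- matrix(rpois(400, lambda = 50), nrow = 100)
gene.length <- sample(500:5000, 100, replace = TRUE)

y <- calcNormFactors(DGEList(counts = counts))

# RPKM on TMM-adjusted library sizes
rpkm.tmm <- rpkm(y, gene.length = gene.length)

# Same values from cpm(): divide each gene (row) by its length in kilobases;
# the length-100 vector recycles down the columns, i.e. one value per row
rpkm.check <- cpm(y) / (gene.length / 1000)
isTRUE(all.equal(rpkm.tmm, rpkm.check, check.attributes = FALSE))
```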