Question: edgeR TMM values
ribioinfo wrote, 3.5 years ago:

Hi, I am using edgeR and I want to print the reads normalized with the TMM method, but I have not found the command. Is there a command in edgeR that could help me?

Thank you

Riccardo

Answer: edgeR TMM values
Aaron Lun (Cambridge, United Kingdom) wrote, 3.5 years ago:

You can't normalize reads, because that doesn't really make any sense. You can, however, adjust read counts to obtain normalized expression values. I suggest you have a look at ?calcNormFactors and ?cpm. I won't regurgitate the documentation here; suffice to say you run calcNormFactors on your DGEList to compute the normalization factors, and then run cpm on the resulting DGEList to get a matrix of normalized expression values.
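As a minimal sketch of those two steps: the count matrix below is simulated purely for illustration, and the object names (counts, y, norm_cpm) are assumptions rather than anything from the original post.

```r
library(edgeR)

# Simulated counts for illustration only: 1000 genes x 4 samples.
set.seed(1)
counts <- matrix(rnbinom(4000, mu = 50, size = 5), nrow = 1000,
                 dimnames = list(paste0("gene", seq_len(1000)),
                                 paste0("s", 1:4)))

y <- DGEList(counts = counts)  # library sizes default to the column sums
y <- calcNormFactors(y)        # TMM is the default method
norm_cpm <- cpm(y)             # counts per million using TMM-adjusted library sizes

head(norm_cpm)
```

With real data, counts would be your own gene-by-sample matrix of raw read counts.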


Or alternatively rpkm() to normalize by gene length as well. cpm() gives expression values normalized by library size and TMM. rpkm() gives expression values normalized by library size, TMM and gene length.
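A small sketch of the rpkm() route, assuming simulated counts and a made-up vector of gene lengths in base pairs (gene_length is a hypothetical name; real lengths would come from your annotation):

```r
library(edgeR)

# Simulated counts and hypothetical gene lengths, for illustration only.
set.seed(2)
counts <- matrix(rnbinom(2000, mu = 100, size = 5), nrow = 500,
                 dimnames = list(paste0("gene", seq_len(500)),
                                 paste0("s", 1:4)))
gene_length <- sample(500:5000, 500, replace = TRUE)  # in base pairs

y <- calcNormFactors(DGEList(counts = counts))

# Normalized by library size, TMM and gene length.
norm_rpkm <- rpkm(y, gene.length = gene_length)

# Same values as dividing the TMM-adjusted CPMs by gene length in kilobases.
all.equal(norm_rpkm, cpm(y) / (gene_length / 1000), check.attributes = FALSE)
```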

written 3.5 years ago by Gordon Smyth
Answer: edgeR TMM values
ribioinfo wrote, 3.5 years ago:

Thank you, I meant to normalize the expression values. If I multiply the cpm values by a million, would I get the TMM values?


I don't think it's clear what you are asking for. Let's assume that y is your DGEList with your count data, which you already called calcNormFactors on.

Are you after the TMM normalization factors? These are stored in your y$samples$norm.factors column.

Do you just want a gene expression matrix from your data, normalized by a "simple" per-million factor? Call cpm(y, normalized.lib.sizes=FALSE)

But you probably don't want that.

If you're after gene expression normalized by sequencing depth (adjusted by TMM factors), just call cpm(y) as Aaron has already suggested.

If you aren't after any of these three things, can you please explain in more detail what you want?
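The three options above can be sketched side by side. The counts are simulated for illustration, and the names cpm_raw, cpm_tmm, eff_lib and manual are assumptions introduced here. The last lines check by hand that cpm(y) amounts to dividing each sample's counts by its effective library size (lib.size times norm.factors) and multiplying by one million.

```r
library(edgeR)

# Simulated counts for illustration only: 1000 genes x 4 samples.
set.seed(3)
counts <- matrix(rnbinom(4000, mu = 50, size = 5), nrow = 1000,
                 dimnames = list(paste0("gene", seq_len(1000)),
                                 paste0("s", 1:4)))
y <- calcNormFactors(DGEList(counts = counts))

# 1. The TMM normalization factors themselves.
y$samples$norm.factors

# 2. "Simple" counts per million, ignoring the TMM factors.
cpm_raw <- cpm(y, normalized.lib.sizes = FALSE)

# 3. Counts per million using TMM-adjusted (effective) library sizes.
cpm_tmm <- cpm(y)

# Manual version of option 3: divide each column by its effective library size.
eff_lib <- y$samples$lib.size * y$samples$norm.factors
manual <- t(t(counts) / eff_lib) * 1e6
all.equal(cpm_tmm, manual, check.attributes = FALSE)
```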

written 3.5 years ago by Steve Lianoglou

I would like to compare the normalized counts from DESeq2 with those from edgeR. To do this, do I have to use calcNormFactors and then cpm?

written 3.5 years ago by ribioinfo

This would be tricky, as the values returned by cpm are on a per-million scale, while - if I remember correctly - the values from DESeq2 are something on the scale of the original counts. This makes it difficult to compare the normalized values directly between methods. To me, such a comparison doesn't seem to have any purpose. If you just want to compare normalization strategies, you can simply compare the size factors from DESeq2 with the effective library sizes (lib.size*norm.factors) from edgeR. If you want to compare the effect of the normalization strategies, then you should have a downstream analysis in mind (e.g., PCA, clustering). For most of these downstream analyses, you're comparing between samples in the same data set, so the scale of the normalized expression from each method shouldn't matter.
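A rough sketch of the edgeR half of that comparison, on simulated counts. Dividing the effective library sizes by their geometric mean is an assumption made here to put them on a relative scale; it is not something prescribed in this thread. The DESeq2 size factors would come from DESeq2's own sizeFactors() accessor, which is deliberately not run in this sketch.

```r
library(edgeR)

# Simulated counts for illustration only: 1000 genes x 4 samples.
set.seed(4)
counts <- matrix(rnbinom(4000, mu = 50, size = 5), nrow = 1000,
                 dimnames = list(paste0("gene", seq_len(1000)),
                                 paste0("s", 1:4)))
y <- calcNormFactors(DGEList(counts = counts))

# Effective library size per sample: raw library size times TMM factor.
eff_lib <- y$samples$lib.size * y$samples$norm.factors
names(eff_lib) <- colnames(y)

# Assumed rescaling: express each sample relative to the geometric mean,
# so the values are unitless and centred around 1.
rel_eff_lib <- eff_lib / exp(mean(log(eff_lib)))
rel_eff_lib
```

What matters in such a comparison is the relative ordering and ratios between samples, not the absolute scale.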

modified 3.5 years ago • written 3.5 years ago by Aaron Lun

Ok, thank you. If I want to do a clustering analysis, is cpm(y, normalized.lib.sizes=TRUE) correct?

written 3.5 years ago by ribioinfo

Gordon and co. typically suggest that you pass in a value between 3 and 5 for prior.count in your call to cpm, as well.

written 3.5 years ago by Steve Lianoglou

Depending on what clustering you're doing, you'll probably want to set log=TRUE as well. This stabilizes the variance between genes of different abundance. Otherwise, high-abundance genes with correspondingly large variances would dominate the calculations, e.g., for Euclidean distances. This probably wouldn't be helpful, as you'd end up clustering on the measurement error/random variability of constitutive housekeeping genes rather than on interesting biological differences in genes that are expressed at lower abundances.
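Putting the last few suggestions together, a hedged sketch of log-CPM values feeding a simple hierarchical clustering of samples. The counts are simulated, prior.count = 3 is just one value from the range mentioned above, and Euclidean distance with hclust's default linkage is an assumed choice for illustration.

```r
library(edgeR)

# Simulated counts for illustration only: 1000 genes x 6 samples.
set.seed(5)
counts <- matrix(rnbinom(6000, mu = 50, size = 5), nrow = 1000,
                 dimnames = list(paste0("gene", seq_len(1000)),
                                 paste0("s", 1:6)))
y <- calcNormFactors(DGEList(counts = counts))

# Log2 CPM with TMM-adjusted library sizes; prior.count damps the
# variability of log values for low counts.
logcpm <- cpm(y, log = TRUE, prior.count = 3)

# Cluster samples: dist() works on rows, so transpose to samples x genes.
d <- dist(t(logcpm))  # Euclidean distances between samples
hc <- hclust(d)       # complete linkage by default
hc$order
```

Calling plot(hc) afterwards would draw the dendrogram.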

modified 3.5 years ago • written 3.5 years ago by Aaron Lun

Hi Aaron, may I ask a related question here? If I just do:

(Raw count table)*(norm.factor), where 
norm.factor=edgeR::calcNormFactors((Raw count table),method='TMM').

And then conduct downstream analysis. Does this procedure make sense? After downstream analysis, can I say that I applied TMM method for normalisation?

If the above procedure does not make sense, how about the following procedure:

(Raw count table)/((norm.factor)*colSums((Raw count table))), where
colSums((Raw count table)) stands for library size (each column represents a sample or a cell)

Thank you very much!

written 2.4 years ago by wt2150

Best to start a new post for a new question.

written 2.4 years ago by Aaron Lun