edgeR TMM values
ribioinfo ▴ 100
@ribioinfo-9434

Hi, I am using edgeR and I want to print the read counts normalized with the TMM method, but I have not found the command for it. Is there a command in edgeR that does this?

Thank you

Riccardo


Thank you, I meant normalizing the expression values. If I multiply the cpm values by a million, would I get the TMM values?


I don't think it's clear what you are asking for. Let's assume that y is your DGEList with your count data, which you already called calcNormFactors on.

Are you after the TMM normalization factors? These are stored in your y$samples$norm.factors column.

Do you just want a gene expression matrix from your data, normalized by a "simple" per-million factor? Call cpm(y, normalized.lib.sizes=FALSE)

But you probably don't want that.

If you're after gene expression normalized by sequencing depth (adjusted by TMM factors), just call cpm(y) as Aaron has already suggested.

If you aren't after any of these three things, can you please explain in more detail what you want?
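The three options above can be put side by side in a short sketch. The count matrix is simulated and purely hypothetical (including the inflated genes in sample s1, added only so that the TMM factors move away from 1); the edgeR calls are the ones named in this reply.

```r
library(edgeR)

# Simulated counts (hypothetical data): 1000 genes x 4 samples.
set.seed(1)
counts <- matrix(rnbinom(4000, mu = 50, size = 5), ncol = 4,
                 dimnames = list(paste0("gene", 1:1000), paste0("s", 1:4)))
counts[1:20, 1] <- counts[1:20, 1] * 50   # introduce composition bias in s1

y <- DGEList(counts = counts)
y <- calcNormFactors(y)                   # TMM is the default method

# 1. The TMM normalization factors themselves.
nf <- y$samples$norm.factors

# 2. Plain counts per million, ignoring the TMM factors.
cpm.plain <- cpm(y, normalized.lib.sizes = FALSE)

# 3. Counts per million computed from TMM-adjusted library sizes.
cpm.tmm <- cpm(y)

print(nf)
head(cbind(plain = cpm.plain[, "s1"], tmm = cpm.tmm[, "s1"]))
```

With the plain version each column sums to one million; with the TMM version it generally does not, because each library size has been rescaled by its normalization factor.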


I would like to compare the normalized counts from DESeq2 with those from edgeR. To do this, do I have to use calcNormFactors and then cpm?


This would be tricky, as the values returned by cpm are on a per-million scale, while - if I remember correctly - the values from DESeq2 are on the scale of the original counts. This makes it difficult to compare the normalized values directly between methods. To me, such a comparison doesn't seem to have any purpose. If you just want to compare normalization strategies, you can simply compare the size factors from DESeq2 with the effective library sizes (lib.size * norm.factors, both columns of y$samples) from edgeR. If you want to compare the effect of the normalization strategies, then you should have a downstream analysis in mind (e.g., PCA, clustering). For most of these downstream analyses, you're comparing between samples in the same data set, so the scale of the normalized expression from each method shouldn't matter.
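As a rough sketch of that comparison, the effective library sizes can be computed from y$samples and rescaled to a relative scale. The simulated counts and the rescaling to a geometric mean of 1 are my own assumptions for illustration. The DESeq2 part only runs if that package is installed and uses its estimateSizeFactorsForMatrix function.

```r
library(edgeR)

# Simulated counts (hypothetical data): 1000 genes x 4 samples.
set.seed(2)
counts <- matrix(rnbinom(4000, mu = 50, size = 5), ncol = 4,
                 dimnames = list(NULL, paste0("s", 1:4)))
y <- calcNormFactors(DGEList(counts = counts))

# Effective library size = library size x TMM normalization factor.
eff.lib <- y$samples$lib.size * y$samples$norm.factors

# Rescale to geometric mean 1 (an assumption made here so the values
# sit on a relative scale similar to that of size factors).
eff.rel <- eff.lib / exp(mean(log(eff.lib)))
print(eff.rel)

# Optional: put DESeq2 size factors next to them, if DESeq2 is available.
if (requireNamespace("DESeq2", quietly = TRUE)) {
  sf <- DESeq2::estimateSizeFactorsForMatrix(counts)
  print(cbind(edgeR = eff.rel, DESeq2 = sf / exp(mean(log(sf)))))
}
```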


Ok, thank you. If I want to do a clustering analysis, is cpm(y, normalized.lib.sizes=TRUE) correct?


Gordon and co. typically suggest that you also pass a value between 3 and 5 for prior.count in your call to cpm.


Depending on what clustering you're doing, you'll probably want to set log=TRUE as well. This stabilizes the variance between genes of different abundance. Otherwise, high-abundance genes with correspondingly large variances would dominate the calculations, e.g., for Euclidean distances. This probably wouldn't be helpful, as you'd end up clustering on the measurement error/random variability of constitutive housekeeping genes rather than on interesting biological differences in genes that are expressed at lower abundances.
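Putting the last two replies together, a minimal sketch of log-CPM values fed into a hierarchical clustering might look like this. The simulated counts, prior.count=3 (picked from the 3 to 5 range mentioned above) and the default settings of dist and hclust are illustrative assumptions, not recommendations from the thread.

```r
library(edgeR)

# Simulated counts (hypothetical data): 1000 genes x 6 samples.
set.seed(3)
counts <- matrix(rnbinom(6000, mu = 50, size = 5), ncol = 6,
                 dimnames = list(NULL, paste0("s", 1:6)))
y <- calcNormFactors(DGEList(counts = counts))

# log2-CPM on TMM-adjusted library sizes; the prior count damps the
# variability of log values for genes with low counts.
logcpm <- cpm(y, log = TRUE, prior.count = 3)

# Euclidean distances between samples (columns), then clustering.
d  <- dist(t(logcpm))
hc <- hclust(d)
# plot(hc)  # uncomment to draw the dendrogram
```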


Hi Aaron, may I ask a related question here? If I just do:

(Raw count table)*(norm.factor), where 
norm.factor=edgeR::calcNormFactors((Raw count table),method='TMM').

And then conduct downstream analysis. Does this procedure make sense? After downstream analysis, can I say that I applied TMM method for normalisation?

If the above procedure does not make sense, how about the following procedure:

(Raw count table)/((norm.factor)*colSums((Raw count table))), where
colSums((Raw count table)) stands for library size (each column represents a sample or a cell)

Thank you very much!
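For reference, the second expression can be checked numerically. In this sketch (simulated counts, hypothetical variable names) the division is done column by column with sweep(), since dividing a matrix by a plain vector in R recycles down the rows of each column rather than across samples. Done this way, the result matches cpm(y) up to the factor of one million.

```r
library(edgeR)

# Simulated counts (hypothetical data): 1000 genes x 4 samples.
set.seed(4)
counts <- matrix(rnbinom(4000, mu = 50, size = 5), ncol = 4)

# calcNormFactors on a plain matrix returns one factor per column.
nf <- calcNormFactors(counts, method = "TMM")
lib.size <- colSums(counts)

# Divide each column by its own effective library size.
manual <- sweep(counts, 2, nf * lib.size, "/")

# The same quantity via a DGEList and cpm().
y <- DGEList(counts = counts, norm.factors = nf)
same <- all.equal(manual * 1e6, cpm(y), check.attributes = FALSE)
print(same)
```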


Best to start a new post for a new question.

Aaron Lun ★ 28k
@alun

You can't normalize reads, because that doesn't really make any sense. You can, however, adjust read counts to obtain normalized expression values. I suggest you have a look at ?calcNormFactors and ?cpm. I won't regurgitate the documentation here; suffice it to say that you run calcNormFactors on your DGEList to compute the normalization factors, and then run cpm on the returned DGEList to get a matrix of normalized expression values.
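A minimal sketch of those two steps, using a simulated count matrix (hypothetical data; substitute your own counts) and a hypothetical output file name:

```r
library(edgeR)

# Simulated counts (hypothetical data): 1000 genes x 4 samples.
set.seed(5)
counts <- matrix(rnbinom(4000, mu = 50, size = 5), ncol = 4,
                 dimnames = list(paste0("gene", 1:1000), paste0("s", 1:4)))

y <- DGEList(counts = counts)   # wrap the counts in a DGEList
y <- calcNormFactors(y)         # fills in y$samples$norm.factors (TMM by default)
norm.expr <- cpm(y)             # matrix of normalized expression values

head(norm.expr)
# write.csv(norm.expr, "tmm_cpm.csv")  # hypothetical file name
```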


Or alternatively rpkm() to normalize by gene length as well. cpm() gives expression values normalized by library size and TMM. rpkm() gives expression values normalized by library size, TMM and gene length.
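A short sketch of the rpkm() variant. The gene lengths here are random numbers made up for illustration; in practice they would come from your annotation.

```r
library(edgeR)

# Simulated counts and gene lengths (both hypothetical).
set.seed(6)
counts <- matrix(rnbinom(4000, mu = 50, size = 5), ncol = 4)
gene.length <- sample(500:5000, 1000, replace = TRUE)   # lengths in bp

y <- calcNormFactors(DGEList(counts = counts))

# Reads per kilobase per million, using TMM-adjusted library sizes.
rpkm.tmm <- rpkm(y, gene.length = gene.length)

# Equivalent to dividing each gene's CPM by its length in kilobases.
by.hand <- cpm(y) / (gene.length / 1000)
print(all.equal(rpkm.tmm, by.hand, check.attributes = FALSE))
```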
