Question: Voom transformation from counts and normalization to negative values
Beginner50 wrote (7 months ago):

Dear all,

I'm using HTSeq raw counts (RNA-seq data) from TCGA. To normalise the data, I first applied the voom() transformation, which converted the counts to log-CPM values.

I used the voom function from the limma package to normalise the data; t_index holds the column indices of the samples in one group. I found the function below through a Google search.

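library(limma)

# t_index (defined outside the function) holds the column indices of one group of samples
vm <- function(x){
  cond <- factor(ifelse(seq_len(ncol(x)) %in% t_index, 1, 0))  # 1 = samples in t_index, 0 = the rest
  d <- model.matrix(~1 + cond)                                  # intercept + group design matrix
  x <- t(apply(x, 1, as.numeric))                               # coerce the counts to a numeric matrix
  ex <- voom(x, d, plot = FALSE)                                # logCPM values plus precision weights
  return(ex$E)                                                  # keeps only the logCPM matrix; the weights are discarded
}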

I have a couple of questions about the function above and would appreciate an explanation from any of you.

Why do I see negative values after normalisation? What type of normalisation does voom apply? Is the normalisation performed across samples?

Any help is appreciated. Thank you.

Answer: Voom transformation from counts and normalization to negative values
Ryan C. Thompson (The Scripps Research Institute, La Jolla, CA) wrote (7 months ago):

The voom function computes logCPM values using whatever normalization information you choose to provide. This is documented in the help page for voom. Generally, you should be running it on a DGEList object after running calcNormFactors on the DGEList. See the help page for calcNormFactors if you want to know what kind of normalization it is doing.
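For illustration, here is a minimal base-R sketch (not limma's own code) of how normalization information enters voom's logCPM calculation. When voom is given a DGEList, it uses the effective library size, lib.size × norm.factors, in the CPM denominator. The normalization factors below are made-up stand-ins for what calcNormFactors would estimate, and the count matrix and the helper name log_cpm_voom_style are hypothetical.

```r
# Hypothetical 3-gene x 2-sample count matrix (made-up numbers)
counts <- matrix(c(10, 200, 5000,
                   30, 150, 9000),
                 nrow = 3,
                 dimnames = list(paste0("gene", 1:3), c("S1", "S2")))

# Hypothetical helper: logCPM with voom's offsets (+0.5 per count, +1 per library)
log_cpm_voom_style <- function(counts, lib_size) {
  t(log2(t(counts + 0.5) / (lib_size + 1) * 1e6))
}

lib_size <- colSums(counts)              # raw library sizes
norm_factors <- c(S1 = 1.1, S2 = 0.9)    # made-up; calcNormFactors() would estimate these
eff_lib_size <- lib_size * norm_factors  # effective library sizes

E_raw  <- log_cpm_voom_style(counts, lib_size)      # sequencing depth only
E_norm <- log_cpm_voom_style(counts, eff_lib_size)  # depth plus normalization factors
print(round(E_norm - E_raw, 3))
```

A factor above 1 enlarges that sample's effective library and shifts all of its logCPM values down; a factor below 1 shifts them up.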

In addition, if you only want logCPM values, you should not be using voom at all. The purpose of voom is to compute the weights required to counteract the mean-variance trend in the data. If all you want is logCPM, then use the cpm function from edgeR with log=TRUE, again after running calcNormFactors. However, depending on what you intend to do with the transformed data, DESeq2's rlog transformation might be more useful to you.

Sorry, that is not quite what I was asking. I found an answer to part of my question here: [Voom on TCGA data shifts count distributions towards negative values]. But I'm still not sure which normalisation method voom applies. Is it quantile normalisation? And is it applied across samples?

This is also documented in the help page. See the appropriately-named normalize.method argument. The default is "none", which performs no additional normalization after the logCPM transformation.
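With the default normalize.method = "none", voom returns the logCPM matrix as is. To make the contrast concrete, below is a simplified base-R sketch of what quantile normalization (normalize.method = "quantile") does: it works across samples, giving every sample the same distribution of values. The helper quantile_normalize and the toy matrix are hypothetical; limma's own normalizeQuantiles handles ties and missing values more carefully than this version, which simply breaks ties by order.

```r
# Simplified quantile normalization (hypothetical helper, ties broken by order)
quantile_normalize <- function(E) {
  sorted <- apply(E, 2, sort)  # sort each sample (column) independently
  ref <- rowMeans(sorted)      # reference distribution: mean of each rank across samples
  out <- apply(E, 2, function(col) ref[rank(col, ties.method = "first")])
  dimnames(out) <- dimnames(E)
  out
}

# Toy 4-gene x 3-sample matrix of made-up logCPM-like values
E <- matrix(c(5, 2, 3, 4,
              4, 1, 4, 2,
              3, 4, 6, 8), nrow = 4)
Q <- quantile_normalize(E)
print(Q)  # every column now contains the same set of values
```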

OK, I see. I'm a little confused by this post [https://www.biostars.org/p/153013/#337075]. The voom function in my question was taken from that link, where it is applied to raw counts. The post says the function above performs the voom transformation and normalises the data, but I can't tell which normalisation method they used.

Sorry, I'm trying to understand this, but I'm not very familiar with it.


The use of voom in the context of that question is extraneous. The function is calling voom and then throwing away the weights that it calculated, keeping only the logCPM values. If all you want is the logCPM values, then use the cpm function as I've described in my answer.

Looking at the other code in that answer, I see many other mistakes. For example, they do not use calcNormFactors or any other method to normalize for composition bias, and I would expect significant composition bias to be present in a tumor-vs-normal comparison. I don't think that code is a very good example to base your work on.
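A toy example of the composition bias mentioned above, using made-up counts and base R only: genes 1 to 4 are expressed identically in both samples, but one strongly induced gene takes up a large share of sample B's reads, so plain CPM makes the unchanged genes look five-fold lower in B. Normalization factors from calcNormFactors (TMM by default) are meant to correct this kind of shift; the sketch only demonstrates the problem and does not implement TMM.

```r
# Two hypothetical samples (made-up counts): genes 1-4 unchanged,
# gene 5 strongly induced in sample B
counts <- cbind(A = c(100, 200, 300, 400, 1000),
                B = c(100, 200, 300, 400, 9000))

# Plain CPM: each column divided by its total count, scaled to per million
cpm_vals <- t(t(counts) / colSums(counts)) * 1e6

# B/A ratio per gene: 0.2 for the unchanged genes, 1.8 for gene 5
ratio <- cpm_vals[, "B"] / cpm_vals[, "A"]
print(ratio)
```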

Edit: Someone in the comments on that post mentions that the input data may already be quantile-normalized, but it's not clear. In any case, the code doesn't mention anything about that, so I doubt the author knows whether the input data has already been normalized.

Thanks a lot for the information. So, applying the voom function from my question to raw HTSeq counts gives logCPM values, which can be negative or positive. Are logCPM values normalized expression data?


logCPM values are what they sound like: the base-2 logarithm of the counts for that gene divided by the sample's total counts in millions. This normalizes for differences in sequencing depth between samples and nothing else. If you don't provide any further information, that's exactly what you get. If you use calcNormFactors, then the logCPM values will be additionally normalized for composition bias using the method that you chose when running that function. Either way, if you're not going to use the weights in your downstream analysis, then there's no reason to use voom.
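To see where the negative values come from, here is a minimal base-R sketch of the logCPM formula voom uses when no normalization factors are supplied: log2((count + 0.5) / (library size + 1) × 10^6). The single sample and its counts are made up. The logarithm is negative whenever a gene has fewer than one count per million, which for a library of 20 million reads means fewer than about 20 reads.

```r
# voom-style logCPM: log2((count + 0.5) / (library size + 1) * 1e6)
log_cpm <- function(counts) {
  lib_size <- colSums(counts)
  t(log2(t(counts + 0.5) / (lib_size + 1) * 1e6))
}

# One hypothetical sample with 20 million reads in total (made-up counts)
counts <- matrix(c(0, 5, 19, 20, 20e6 - 44), ncol = 1,
                 dimnames = list(c("g0", "g5", "g19", "g20", "rest"), "S1"))
E <- log_cpm(counts)
print(round(E, 2))
# g0, g5 and g19 fall below 1 CPM and are negative; g20 and rest are positive
```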