Question

Voom transformation from counts and normalization to negative values

0

Beginner ▴ 60

@beginner-15939

Last seen 2.6 years ago

Switzerland

Dear all,

I'm using HTSeq raw-count RNA-seq data from TCGA. To normalize the data, I first applied the voom() transformation to convert the counts to log-CPM values.

I used the voom function from the limma package to normalise the data. t_index holds the column indices of the samples assigned to condition 1. I found the function below through a Google search.

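library(limma)  # provides voom()

vm <- function(x){
  # 1 for columns listed in t_index, 0 for all other samples
  cond <- factor(ifelse(seq_len(ncol(x)) %in% t_index, 1, 0))
  d <- model.matrix(~1+cond)
  x <- t(apply(x,1,as.numeric))  # coerce every row to numeric
  ex <- voom(x,d,plot=FALSE)
  return(ex$E)  # keeps only the logCPM matrix; the voom weights are discarded
}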

I have a couple of questions about the function above and would appreciate an explanation.

Why do I see negative values after normalisation? What type of normalisation does voom apply? Is the normalisation applied across samples?

Any help is appreciated. Thank you.

rnaseq voom limma tcga statistics • 8.8k views

ADD COMMENT • link updated 7.1 years ago by Ryan C. Thompson ★ 7.9k • written 7.1 years ago by Beginner ▴ 60

score 5 · Answer 1 · 2018-09-24

5

Ryan C. Thompson ★ 7.9k

@ryan-c-thompson-5618

Last seen 12 months ago

Icahn School of Medicine at Mount Sinai…

The voom function computes logCPM values using whatever normalization information you choose to provide. This is documented in the help page for voom. Generally, you should be running it on a DGEList object after running calcNormFactors on the DGEList. See the help page for calcNormFactors if you want to know what kind of normalization it is doing.

In addition, if you only want logCPM values, you should not be using voom at all. The purpose of voom is to compute the weights required to counteract the mean-variance trend in the data. If all you want is logCPM, then use the cpm function from edgeR with log=TRUE, again after running calcNormFactors. However, depending on what you intend to do with the transformed data, DESeq2's rlog transformation might be more useful to you.
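As a minimal sketch of the logCPM route described above, assuming edgeR is installed: the counts here are simulated placeholders for a real HTSeq count matrix, and the object names (`counts`, `dge`, `logcpm`) are illustrative only.

```r
library(edgeR)

# Simulated raw counts: 200 genes x 6 samples (placeholder for real HTSeq counts)
set.seed(1)
counts <- matrix(rnbinom(200 * 6, mu = 50, size = 2), nrow = 200,
                 dimnames = list(paste0("gene", 1:200), paste0("sample", 1:6)))

dge <- DGEList(counts = counts)   # wraps the counts and records library sizes
dge <- calcNormFactors(dge)       # default method is TMM
logcpm <- cpm(dge, log = TRUE)    # log2-CPM using the normalization factors

dim(logcpm)                       # genes x samples, same shape as the input
```

No voom call is involved here, since no weights are needed when only the transformed values are wanted.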

ADD COMMENT • link 7.1 years ago Ryan C. Thompson ★ 7.9k

0

Sorry, that is not what I was asking. I did find the answer to my first question here: [Voom on TCGA data shifts count distributions towards negative values]. But I still don't know which normalisation method voom applies. Is it quantile normalisation? And is it applied across samples?

ADD REPLY • link 7.1 years ago Beginner ▴ 60

0

This is also documented in the help page. See the appropriately-named normalize.method argument. The default is "none", which performs no additional normalization after the logCPM transformation.
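For illustration, a rough sketch of a voom call with the normalize.method argument written out explicitly, assuming limma and edgeR are installed. The counts are simulated and the tumor/normal grouping is hypothetical.

```r
library(edgeR)
library(limma)

set.seed(1)
counts <- matrix(rnbinom(200 * 6, mu = 50, size = 2), nrow = 200)
group  <- factor(rep(c("normal", "tumor"), each = 3))  # hypothetical grouping
design <- model.matrix(~ group)

dge <- calcNormFactors(DGEList(counts = counts))

# normalize.method = "none" is the default; spelled out here for clarity
v <- voom(dge, design, normalize.method = "none")

# v$E holds the logCPM values, v$weights the per-observation weights
fit <- eBayes(lmFit(v, design))   # lmFit uses the weights stored in v
```

Passing the whole `v` object to lmFit, rather than only `v$E`, is what lets the weights reach the downstream analysis.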

ADD REPLY • link 7.1 years ago Ryan C. Thompson ★ 7.9k

0

OK, I see. I'm a little confused by this post: [https://www.biostars.org/p/153013/#337075]. The voom function in my question was taken from that link, where it is applied to raw counts. They say the function performs the voom transformation and normalises the data, but I can't tell which normalisation method they used.

Sorry, I'm trying to understand this, but I'm still not sure about it.

ADD REPLY • link 7.1 years ago Beginner ▴ 60

1

The use of voom in the context of that question is extraneous. The function is calling voom and then throwing away the weights that it calculated, keeping only the logCPM values. If all you want is the logCPM values, then use the cpm function as I've described in my answer.

Looking at the other code in that answer, I see many other mistakes. For example, they do not use calcNormFactors or any other method to normalize for composition bias, and I would expect significant composition bias to be present in a tumor-vs-normal comparison. I don't think that code is a very good example to base your work on.

Edit: Someone in the comments for that post mentions that the input data may already be quantile-normalized, but it's not clear. In any case, the code doesn't mention anything about that, so I doubt the author is aware if the input data has already been normalized.

ADD REPLY • link 7.1 years ago Ryan C. Thompson ★ 7.9k

0

Thanks a lot for the information. So, basically, applying the voom function from my question to raw HTSeq counts gives logCPM values, which can be negative or positive. Are logCPM values normalized expression data?

ADD REPLY • link 7.1 years ago Beginner ▴ 60

2

logCPM values are what they sound like: the logarithm (in base 2) of the counts for that gene divided by the total millions of counts. This normalizes for differences in sequencing depth between samples and nothing else. If you don't provide any further information, that's exactly what you get. If you use calcNormFactors, then the logCPM values will be additionally normalized for composition bias using the method that you chose when running that function. Either way, if you're not going to use the weights in your downstream analysis, then there's no reason to use voom.
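
The definition above can be sketched in base R. The counts and library sizes below are made-up numbers, and the small offsets (0.5 on the counts, 1 on the library size) are the ones voom adds to avoid taking the log of zero. Any gene with less than one count per million ends up with a negative logCPM, which is where negative values come from.

```r
# Toy counts: 3 genes x 2 samples (made-up numbers)
counts <- matrix(c(0, 10, 5000,
                   2, 40, 90000), nrow = 3,
                 dimnames = list(c("geneA", "geneB", "geneC"), c("s1", "s2")))

# Assumed total counts per sample (genes not shown make up the rest)
lib.size <- c(s1 = 2e7, s2 = 3e7)

# log2 counts-per-million, with offsets so that zero counts stay finite
logcpm <- t(log2(t(counts + 0.5) / (lib.size + 1) * 1e6))

logcpm < 0   # TRUE wherever the gene has less than 1 count per million
```

In this toy example geneA is far below one count per million in both samples, so its logCPM is negative, while geneC is well above it and stays positive.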