Voom Normalization and negative numbers


Michael Breen ▴ 370

@michael-breen-5999

Last seen 9.6 years ago

Hi all,

We are applying voom normalization to RNA-Seq counts with the following code:

    library(edgeR)
    rawdata <- read.delim("Counts.txt", check.names=FALSE, stringsAsFactors=FALSE)
    targets <- read.delim("Targets.txt", check.names=FALSE, stringsAsFactors=FALSE)

    # filter: keep genes with CPM > 10 in at least 15 samples
    y <- DGEList(counts=rawdata[,2:31], genes=rawdata[,1:1])
    keep <- rowSums(cpm(y) > 10) >= 15
    y <- y[keep,]
    dim(y)

    # normalization factors
    y <- calcNormFactors(y)

    # voom
    VST <- voom(y, design=NULL, plot=TRUE)
    voom_matrix <- cbind(VST$genes, VST$E)
    write.table(voom_matrix, "VOOM_Matrix.txt", sep="\t")

However, even after this filtering step, I find negative expression values in my voom-normalized matrix. Why is this?

Michael

--
M.S. Breen
PhD, Bioinformatics and Genomics
Clinical and Experimental Sciences
Univ. of Southampton

Normalization • 8.5k views

ADD COMMENT • link updated 10.1 years ago by Davis, Wade ▴ 350 • written 10.1 years ago by Michael Breen ▴ 370


Pekka Kohonen ▴ 190

@pekka-kohonen-5862

Last seen 6.3 years ago

Sweden

Dear Michael,

Could this be because some counts (after normalization) are less than 1? And log of a number > 0 and < 1 is a negative number?

Best, Pekka
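The arithmetic behind this point can be checked in a few lines of base R. This is only an illustrative sketch: the CPM values below are made up and do not come from the poster's data.

```r
# Hypothetical CPM values for one gene across four samples (illustration only)
cpm_values <- c(0.1, 0.5, 1, 10)

# log2 is negative for values between 0 and 1, zero at exactly 1,
# and positive above 1
log2_cpm <- log2(cpm_values)
print(round(log2_cpm, 2))   # roughly -3.32, -1.00, 0.00, 3.32
```

So any sample in which a gene has fewer than one read per million will show a negative value on the log2 scale.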

ADD COMMENT • link 10.1 years ago Pekka Kohonen ▴ 190


Davis, Wade ▴ 350

@davis-wade-2803

Last seen 9.6 years ago

Hi Michael,

As described in the help file (?voom), the $E component of the output object contains a "numeric matrix of normalized expression values on the log2 scale".

So negative values indicate low levels of (normalized) expression.

Your filtering step removes genes with 14 or fewer samples (~ half the samples) with cpm > 10, but a gene that passes can still have a low level of expression in any particular sample.

Imagine, for a given gene, that half your samples have cpm > 10 and the other half have cpm = 0.1. You would expect to see the latter half with negative normalized expression levels.

Wade
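A small base R sketch of that scenario follows. The counts and library sizes are hypothetical, all normalization factors are assumed to be 1, and the log2-CPM formula with offsets of 0.5 and 1 is written out by hand as my understanding of what voom computes for $E, not copied from the package; check ?voom for the authoritative definition.

```r
# Hypothetical gene in 30 samples, each with a library size of 10 million reads:
# 15 samples have 200 reads, the other 15 have a single read.
lib_size <- rep(1e7, 30)
counts   <- c(rep(200, 15), rep(1, 15))

# Plain counts per million: 20 in the first half, 0.1 in the second half
cpm_gene <- counts / lib_size * 1e6

# The filter from the question: CPM > 10 in at least 15 samples
passes_filter <- sum(cpm_gene > 10) >= 15

# log2-CPM with small offsets (assumed form of the voom transform,
# with normalization factors taken to be 1)
log2_cpm <- log2((counts + 0.5) / (lib_size + 1) * 1e6)

print(passes_filter)    # the gene is kept by the filter
print(range(log2_cpm))  # the low-count samples give negative values
```

The gene survives the filter because of its 15 well-expressed samples, yet the other 15 samples still produce negative log2 values.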

ADD COMMENT • link 10.1 years ago Davis, Wade ▴ 350


Thank you for the explanation. It was in fact quite obvious.

However, are there any conceptual problems with interpreting negative expression values? We know the cpm is somewhere between 0 and 1 in those cases, but do the log-transformed values cause any downstream problems, for example in differential expression or correlation analysis?

Michael

ADD REPLY • link 10.1 years ago Michael Breen ▴ 370


There should be no conceptual problems with interpreting negative expression values, aside from the strange look collaborators may give you the first time they encounter them. But this is really not much different from qPCR experiments in which the target gene is more abundant than the reference gene (aka housekeeping gene): the ΔCt is a negative number.

Ultimately, these are measures of relative expression, unless you have done spike-ins with careful calibration. Even then, many would argue that it is still not an absolute measure. With that in mind, the scale itself is somewhat arbitrary, even with the concept of log2 cpm.

These issues would not impact the results of differential expression or correlation; they are all still valid with negative expression measures.

Wade
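One way to convince yourself of this is to note that group differences and correlations do not change when a constant is added to every log value, so it cannot matter whether the values happen to fall below zero. The sketch below uses simulated numbers and hypothetical variable names; it illustrates the invariance only and is not a differential expression analysis.

```r
set.seed(1)

# Simulated log2-CPM values for two genes in 10 samples (5 per group);
# gene_a is centred at -1, so most of its values are negative
gene_a <- rnorm(10, mean = -1)
gene_b <- gene_a + rnorm(10, sd = 0.5)
group  <- rep(c("control", "treated"), each = 5)

# Shift every value upward so that nothing is negative any more
shift <- 10

# Log fold change (difference of group means), before and after the shift
log_fc <- mean(gene_a[group == "treated"]) - mean(gene_a[group == "control"])
log_fc_shifted <- mean(gene_a[group == "treated"] + shift) -
                  mean(gene_a[group == "control"] + shift)

# Pearson correlation between the two genes, before and after the shift
r         <- cor(gene_a, gene_b)
r_shifted <- cor(gene_a + shift, gene_b + shift)

print(all.equal(log_fc, log_fc_shifted))
print(all.equal(r, r_shifted))
```

Both comparisons agree up to floating-point tolerance, which is the sense in which the zero point of the log scale is arbitrary for these analyses.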

ADD REPLY • link 10.1 years ago Davis, Wade ▴ 350


Thanks Wade,

Is there not a difference between interpreting a calculated negative Δ between two relative measures of expression and a calculated negative log cpm for just one measure?

I also believe that differential expression or correlation analysis would not be harmed by this. But I thought to get other opinions as well.

Yours, Michael

ADD REPLY • link 10.1 years ago Michael Breen ▴ 370


It's not a perfect analogy, but the end is the same: you have a negative number that represents an expression level, and that is my point. In the qPCR example, it is because of the difference between the reference and the target; in log cpm, it is because the expression is low. The reason for the log transform of the cpm is that the resulting measure has infinite support, and this leads to nicer mathematical properties; otherwise you have problems near the lower bound of 0. For much the same reasons, people model log odds rather than odds.

In practical terms, if I'm looking at the results of a study and the average expression of a gene (after a voom-based analysis) is negative, I don't get as excited about that gene. With limited resources to do deeper sequencing and follow-up experiments, most people will pursue the higher expressed genes before those with cpm < 1. Of course, you should always visualize your data when you have many groups, because the expression signal could be strong in a single group while the mean across all samples/groups is low.

These are just my thoughts; others may have a different opinion!

Wade
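The last caveat, a gene that looks weak on average but is strongly expressed in one group, is easy to reproduce with made-up numbers. The values and group labels below are hypothetical and chosen only to show the pattern.

```r
# Hypothetical log2-CPM values for one gene: three groups of four samples
log2_cpm <- c(-3.1, -2.8, -3.4, -2.9,   # group A: essentially unexpressed
              -3.0, -3.3, -2.7, -3.2,   # group B: essentially unexpressed
               4.9,  5.2,  5.0,  4.7)   # group C: clearly expressed
group <- factor(rep(c("A", "B", "C"), each = 4))

# The mean over all samples is negative and looks uninteresting ...
overall_mean <- mean(log2_cpm)

# ... but the per-group means show strong expression in group C
group_means <- tapply(log2_cpm, group, mean)

print(overall_mean)
print(group_means)
```

A per-group plot such as `boxplot(log2_cpm ~ group)` would show the same thing at a glance, which is why looking only at the overall average expression can hide genes like this one.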

ADD REPLY • link 10.1 years ago Davis, Wade ▴ 350
