Question

edgeR and calcNormFactors manually

0

Entering edit mode

Scott Daniel ▴ 30

@scott-daniel-6403

Last seen 9.6 years ago

Dear Colleagues, I have an RNA-seq library with 27 samples. 9 of these samples are what we call input as the other 18 are derived from them through a biochemical process. As such, I have already scaled the genes in those 18 samples to roughly add-up to the corresponding inputs. To manage this at the calcNormFactors step I have the following code: y$samples y_inputs=y$counts[,grep(".[i]..",colnames(y$counts))] norm_factor_inputs <- calcNormFactors(y_inputs) y$samples$norm.factors[grep(".[i]..",row.names(y$samples))] = norm_factor_inputs y$samples$norm.factors[grep(".[p]..",row.names(y$samples))] = norm_factor_inputs y$samples$norm.factors[grep(".[r]..",row.names(y$samples))] = norm_factor_inputs y$samples So what I'm basically trying to do is calcNormFactors for just the inputs and then copy them to be the NormFactors for the p and r samples. Is copying them like this telling edgeR to re-compute the library sizes? It doesn't seem to be happening because the lib.size before and after these commands are the same. Best, Scott Daniel Zarnescu Lab MCB Ph.D. Program University of Arizona [[alternative HTML version deleted]]

edgeR edgeR • 2.1k views

ADD COMMENT • link updated 10.0 years ago by Gordon Smyth 50k • written 10.0 years ago by Scott Daniel ▴ 30

score 0 · Answer 1 · 2014-04-18

0

Entering edit mode

Gordon Smyth 50k

@gordon-smyth

Last seen 3 minutes ago

WEHI, Melbourne, Australia

Dear Scott, Normalization scaling factors are specific to each library, so you cannot meaningfully compute them for one set of libraries and copy the results to a larger set of libraries. If you try to explain what you are actually trying to achieve, we might be able to give you some advice. As it is, I don't yet understand why you aren't simply using calcNormFactors() in the way it is intended, which would be to apply it to all libraries at once, or why you need to do any prior ad hoc scaling. Not sure what you mean by an 'input' sample, because 'input' is usually associated with ChIP-seq rather than RNA-seq. Best wishes Gordon > Date: Tue, 15 Apr 2014 16:40:53 -0700 > From: Scott Daniel <scottdaniel at="" email.arizona.edu=""> > To: bioconductor at r-project.org > Subject: [BioC] edgeR and calcNormFactors manually > > Dear Colleagues, > > I have an RNA-seq library with 27 samples. 9 of these samples are what we > call input as the other 18 are derived from them through a biochemical > process. As such, I have already scaled the genes in those 18 samples to > roughly add-up to the corresponding inputs. > > To manage this at the calcNormFactors step I have the following code: > y$samples > y_inputs=y$counts[,grep(".[i]..",colnames(y$counts))] > norm_factor_inputs <- calcNormFactors(y_inputs) > y$samples$norm.factors[grep(".[i]..",row.names(y$samples))] = > norm_factor_inputs > y$samples$norm.factors[grep(".[p]..",row.names(y$samples))] = > norm_factor_inputs > y$samples$norm.factors[grep(".[r]..",row.names(y$samples))] = > norm_factor_inputs > y$samples > > So what I'm basically trying to do is calcNormFactors for just the inputs > and then copy them to be the NormFactors for the p and r samples. Is > copying them like this telling edgeR to re-compute the library sizes? It > doesn't seem to be happening because the lib.size before and after these > commands are the same. > > Best, > Scott Daniel > Zarnescu Lab > MCB Ph.D. Program > University of Arizona ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:4}}

ADD COMMENT • link 10.0 years ago Gordon Smyth 50k

0

Entering edit mode

Hi, First off, I determined that assigning the normalization factors manually as-it-were does affect the results (# of DE genes). I did this by comparing it to a set where I just set p and r to a scaling factor of 1. The library sizes don't change in the y$samples but the results change. To go back to the general discussion of why we did ad-hoc scaling, I'll try to explain. The 'input' I am speaking of is the RNA sample preceding a polysome fractionation. We then take the resultant products and split them into 'RNP' (which includes the monosome, free RNA and ribo-nucelo- proteins) and 'Poly' (which includes actively-translating, polysomal RNA). We've then done RNA-seq on all three samples (input, rnp, poly). The idea is that we scale the poly and rnp counts to the input using a simple linear model. We had another mathematician explain it like this: "So then what you have to do is basically construct a mx2 matrix A, where m = total number of genes. Then you fill the first column of A with the counts for all genes in your polysome sub-sample, and the second column of A with the counts for all genes in your RNP sub-sample. The you make a mx1 matrix Y, which is the counts for all genes in the input sub-sample. Then you find the least-squares solution (in R I think it's lsqfit(A,Y) or something) to the equation Y = Ax and solve for x, which will be a 2x1 vector with your scaling factors n_p and n_r. Then, you replace your polysome counts with P*n_p and your RNP counts with R*n_r" So, since we have done this manual scaling, our mathematician recommended we calculate normalization factors for input only and then apply these factors to the corresponding polysome and rnp samples. Thanks. Best, Scott Daniel Zarnescu Lab MCB Ph.D. Program University of Arizona On Thu, Apr 17, 2014 at 6:26 PM, Gordon K Smyth <smyth@wehi.edu.au> wrote: > Dear Scott, > > Normalization scaling factors are specific to each library, so you cannot > meaningfully compute them for one set of libraries and copy the results to > a larger set of libraries. > > If you try to explain what you are actually trying to achieve, we might be > able to give you some advice. > > As it is, I don't yet understand why you aren't simply using > calcNormFactors() in the way it is intended, which would be to apply it to > all libraries at once, or why you need to do any prior ad hoc scaling. > > Not sure what you mean by an 'input' sample, because 'input' is usually > associated with ChIP-seq rather than RNA-seq. > > Best wishes > Gordon > > Date: Tue, 15 Apr 2014 16:40:53 -0700 >> From: Scott Daniel <scottdaniel@email.arizona.edu> >> To: bioconductor@r-project.org >> Subject: [BioC] edgeR and calcNormFactors manually >> >> Dear Colleagues, >> >> I have an RNA-seq library with 27 samples. 9 of these samples are what we >> call input as the other 18 are derived from them through a biochemical >> process. As such, I have already scaled the genes in those 18 samples to >> roughly add-up to the corresponding inputs. >> >> To manage this at the calcNormFactors step I have the following code: >> y$samples >> y_inputs=y$counts[,grep(".[i]..",colnames(y$counts))] >> norm_factor_inputs <- calcNormFactors(y_inputs) >> y$samples$norm.factors[grep(".[i]..",row.names(y$samples))] = >> norm_factor_inputs >> y$samples$norm.factors[grep(".[p]..",row.names(y$samples))] = >> norm_factor_inputs >> y$samples$norm.factors[grep(".[r]..",row.names(y$samples))] = >> norm_factor_inputs >> y$samples >> >> So what I'm basically trying to do is calcNormFactors for just the inputs >> and then copy them to be the NormFactors for the p and r samples. Is >> copying them like this telling edgeR to re-compute the library sizes? It >> doesn't seem to be happening because the lib.size before and after these >> commands are the same. >> >> Best, >> Scott Daniel >> Zarnescu Lab >> MCB Ph.D. Program >> University of Arizona >> > > ______________________________________________________________________ > The information in this email is confidential and inte...{{dropped:10}}

ADD REPLY • link 10.0 years ago Scott Daniel ▴ 30

0

Entering edit mode

Dear Scott, I'm afraid I still have little idea of what you're trying to achieve. I'm not even quite sure whether you are looking for differential exprssion and, if so, between which samples. Your mathematician is describing least squares regression though the origin. I don't see that that would be sensible thing to do in the edgeR context. Have you considered ?normalizeChIPtoInput or ?calcNormOffsetsforChIP Best Gordon On Fri, 25 Apr 2014, Scott Daniel wrote: > Hi, > > First off, I determined that assigning the normalization factors manually > as-it-were does affect the results (# of DE genes). I did this by comparing > it to a set where I just set p and r to a scaling factor of 1. The library > sizes don't change in the y$samples but the results change. > > To go back to the general discussion of why we did ad-hoc scaling, I'll try > to explain. The 'input' I am speaking of is the RNA sample preceding a > polysome fractionation. We then take the resultant products and split them > into 'RNP' (which includes the monosome, free RNA and ribo-nucelo- proteins) > and 'Poly' (which includes actively-translating, polysomal RNA). We've then > done RNA-seq on all three samples (input, rnp, poly). > > The idea is that we scale the poly and rnp counts to the input using a > simple linear model. We had another mathematician explain it like this: > "So then what you have to do is basically construct a mx2 matrix A, where m > = total number of genes. Then you fill the first column of A with the > counts for all genes in your polysome sub-sample, and the second column of > A with the counts for all genes in your RNP sub-sample. The you make a mx1 > matrix Y, which is the counts for all genes in the input sub-sample. Then > you find the least-squares solution (in R I think it's lsqfit(A,Y) or > something) to the equation Y = Ax and solve for x, which will be a 2x1 > vector with your scaling factors n_p and n_r. Then, you replace your > polysome counts with P*n_p and your RNP counts with R*n_r" > > So, since we have done this manual scaling, our mathematician recommended > we calculate normalization factors for input only and then apply these > factors to the corresponding polysome and rnp samples. > > Thanks. > > Best, > Scott Daniel > Zarnescu Lab > MCB Ph.D. Program > University of Arizona > > > On Thu, Apr 17, 2014 at 6:26 PM, Gordon K Smyth <smyth at="" wehi.edu.au=""> wrote: > >> Dear Scott, >> >> Normalization scaling factors are specific to each library, so you cannot >> meaningfully compute them for one set of libraries and copy the results to >> a larger set of libraries. >> >> If you try to explain what you are actually trying to achieve, we might be >> able to give you some advice. >> >> As it is, I don't yet understand why you aren't simply using >> calcNormFactors() in the way it is intended, which would be to apply it to >> all libraries at once, or why you need to do any prior ad hoc scaling. >> >> Not sure what you mean by an 'input' sample, because 'input' is usually >> associated with ChIP-seq rather than RNA-seq. >> >> Best wishes >> Gordon >> >> Date: Tue, 15 Apr 2014 16:40:53 -0700 >>> From: Scott Daniel <scottdaniel at="" email.arizona.edu=""> >>> To: bioconductor at r-project.org >>> Subject: [BioC] edgeR and calcNormFactors manually >>> >>> Dear Colleagues, >>> >>> I have an RNA-seq library with 27 samples. 9 of these samples are what we >>> call input as the other 18 are derived from them through a biochemical >>> process. As such, I have already scaled the genes in those 18 samples to >>> roughly add-up to the corresponding inputs. >>> >>> To manage this at the calcNormFactors step I have the following code: >>> y$samples >>> y_inputs=y$counts[,grep(".[i]..",colnames(y$counts))] >>> norm_factor_inputs <- calcNormFactors(y_inputs) >>> y$samples$norm.factors[grep(".[i]..",row.names(y$samples))] = >>> norm_factor_inputs >>> y$samples$norm.factors[grep(".[p]..",row.names(y$samples))] = >>> norm_factor_inputs >>> y$samples$norm.factors[grep(".[r]..",row.names(y$samples))] = >>> norm_factor_inputs >>> y$samples >>> >>> So what I'm basically trying to do is calcNormFactors for just the inputs >>> and then copy them to be the NormFactors for the p and r samples. Is >>> copying them like this telling edgeR to re-compute the library sizes? It >>> doesn't seem to be happening because the lib.size before and after these >>> commands are the same. >>> >>> Best, >>> Scott Daniel >>> Zarnescu Lab >>> MCB Ph.D. Program >>> University of Arizona ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:4}}

ADD REPLY • link 10.0 years ago Gordon Smyth 50k