edgeR: calcNormFactors question

gowtham ▴ 210

@gowtham-5301

Last seen 9.7 years ago

Hi everyone,

I am analyzing an RNA-seq experiment with two groups, each with two replicates. One of the four libraries has only about half as many reads mapping to the genome as the others: library fe+.1 has only about 4 million reads, while the others have 9 million or more. Even so, the norm.factors are not much different. With my naive understanding, I expected fe+.1 to be very different from the others. Is what I see okay?

    > oldsetDGE <- calcNormFactors(oldsetDGE)
    > oldsetDGE$samples
          group lib.size norm.factors
    fe-.1     2  9664343    0.9865411
    fe-.2     2 11248827    1.0812947
    fe+.1     1  4194124    0.9662389
    fe+.2     1  9963626    0.9701888

Thanks very much,
Gowthaman

RNASeq RNASeq • 1.6k views

ADD COMMENT • link 11.9 years ago gowtham ▴ 210

I meant to include the following in my first email: I understand that the norm.factors calculated by edgeR take into account more than library size, and that the issue of limited "real estate" is factored in as well. So I assume what I see may be okay, but I would still like a second opinion from the experts.

Thanks again,
Gowthaman

ADD COMMENT • link 11.9 years ago gowtham ▴ 210

Sorry about the repeated mailing: I have attached a smear plot of the data in case it helps anyone attempting to answer my question.

ADD COMMENT • link 11.9 years ago gowtham ▴ 210

Hi Gowthaman,

Your output looks fine. What is more important is that library size is taken into account as an offset later on, when you fit the GLM. See help(glmFit).

Cheers,
Belinda
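
To illustrate what "taken into account as an offset" means, here is a minimal sketch in base R. It is not edgeR's code: it uses a plain Poisson GLM from the stats package instead of edgeR's negative binomial model, and the gene is hypothetical, constructed to have the same relative abundance (1 read per 100,000) in every library. Only the library sizes come from the thread.

```r
# Library sizes as reported in the thread (fe-.1, fe-.2, fe+.1, fe+.2)
lib.size <- c(9664343, 11248827, 4194124, 9963626)
group <- factor(c("fe-", "fe-", "fe+", "fe+"), levels = c("fe-", "fe+"))

# Hypothetical gene: identical relative abundance in all four libraries,
# so the raw counts differ only because sequencing depth differs
counts <- round(lib.size * 1e-5)

# With log(library size) as an offset, the model compares rates per read
fit_offset <- glm(counts ~ group, family = poisson, offset = log(lib.size))

# Without an offset, the model compares raw counts
fit_raw <- glm(counts ~ group, family = poisson)

# Estimated log fold change between groups under each model
coef_offset <- unname(coef(fit_offset)[2])
coef_raw <- unname(coef(fit_raw)[2])
print(c(with_offset = coef_offset, without_offset = coef_raw))
```

With the offset, the estimated group effect should sit near zero, as it should for a gene that does not change. Without it, the shallow fe+.1 library drags the fe+ group down and produces a spurious negative log fold change.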

ADD REPLY • link 11.9 years ago Belinda Phipson ▴ 130

Thanks very much, Belinda. That is comforting.

My DGEList object already has library sizes in it. Do I still need to supply a numeric vector of library sizes when fitting the GLM, or is it pulled automatically from the DGEList object?

From reading the help, I understand it is automatic. Please correct me if I am wrong:

    "If y is a DGEList object then the default for lib.size is the product of the library sizes and the normalization factors (in the samples slot of the object)."

Thanks,
Gowthaman

ADD REPLY • link 11.9 years ago gowtham ▴ 210

Hi Belinda,

I think I am a bit confused now. The help document suggests I should use only one of "offset" and "lib.size". It seems both of them take the library size into account, and it sounds like "offset" takes precedence when both are supplied.

So my question is: do I have to explicitly ask for one or the other? And do I have to explicitly give it a value?

    fit <- glmFit(d, design)

OR

    fit <- glmFit(d, design, offset=NULL)

OR

    fit <- glmFit(d, design, lib.size=c(9664343, 11248827, 4194124, 9963626))

Should I supply values for "lib.size"? Note that my DGEList already has library size information in it.

Once again, thanks for your answer and the pointer to glmFit.

Gowthaman

ADD REPLY • link 11.9 years ago gowtham ▴ 210

Hi Gowthaman,

You shouldn't manually specify the offset in glmFit() unless you have a specific need to. Short answer: you should use

    fit <- glmFit(d, design)

> Lib Fe+.1 has only 4 million reads while other are 9 million +. But still the norm.factors are not much different. With my naive understanding i expect Fe+.1 to be very different from others. I would like to know if what I see is okay?

This is OK, since the offset used in the downstream modeling is actually derived from the product of the lib.size and norm.factors columns (the effective library size), so the lower sequencing depth of fe+.1 is still accounted for.

Best,
Mark
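
The product Mark describes can be worked out by hand from the table in the original question. The sketch below uses base R only and rebuilds the samples table as an ordinary data frame. The column names eff.lib.size and offset are illustrative and not columns that edgeR adds, and taking the log of the product is stated here as an assumption about how the offset enters the log-linear model.

```r
# Values copied from the oldsetDGE$samples output in the question
samples <- data.frame(
  row.names = c("fe-.1", "fe-.2", "fe+.1", "fe+.2"),
  group = c(2, 2, 1, 1),
  lib.size = c(9664343, 11248827, 4194124, 9963626),
  norm.factors = c(0.9865411, 1.0812947, 0.9662389, 0.9701888)
)

# Effective library size: raw depth times the normalization factor
# (eff.lib.size is an illustrative column name)
samples$eff.lib.size <- samples$lib.size * samples$norm.factors

# Assumed form of the offset in a log-linear model: log of the product
samples$offset <- log(samples$eff.lib.size)

print(samples)

# The normalization factors multiply to a value very close to 1, so they
# adjust for composition and leave the depth difference to lib.size
print(prod(samples$norm.factors))
```

The effective library size of fe+.1 remains well under half that of the other libraries, which is why norm.factors near 1 are not a concern: the depth difference is carried by lib.size.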

ADD REPLY • link 11.9 years ago Mark Robinson ▴ 880

That's fantastic. It feels good to understand what's happening under the hood. Thanks so much, Mark.

Gowthaman

ADD REPLY • link 11.9 years ago gowtham ▴ 210
