Filtering before differential analysis


Sally ▴ 250

@sally-2430

Last seen 11.1 years ago

Is flagging the same as filtering? In my limma script it takes only those ESTs with a flag of 0 (which are good spots).

myfun <- function(x) as.numeric ( x$Flag >0)

Is this not the same as filtering? If I actually remove the absent spots from my Imagene files, then the files each have different lengths and the order of genes is not the same in each file.

Sally

limma • 936 views

updated 16.7 years ago by Jenny Drnevich ★ 2.0k • written 16.8 years ago by Sally ▴ 250


Jenny Drnevich ★ 2.0k

@jenny-drnevich-2812

Last seen 9 months ago

United States

Hi Sally,

Your script is transferring the flags to weights, and in your script, only ESTs with a flag of 0 get a weight of 1, and all other spots get a weight of 0, which means they are not used at all in the analysis. So yes, you are in effect "filtering" out these individual spots by setting the weights to 0, which is exactly what Gordon said you should not do.

I second this opinion for the following reason: a "bad" spot is a spot where you had no information whatsoever on what the expression level might have been, so the number that you get (because you always get a number) has no relationship at all to what the real value was and so you should throw it out by giving it a weight of 0. However, if you don't measure anything above background for a particular spot (which GenePix will flag -50), it's not a "bad" spot, because you do have useful information that the expression level is below detection, and the number that you get will be relatively valid compared to other spots that had detectable expression.

Would you throw out values of 0 if you got them in any other scientific measurement? Likely not, so why throw them out here?

Cheers,
Jenny

At 11:30 AM 1/15/2009, Sally wrote:
>Is flagging the same as filtering? In my Limma script it takes only
>those ESTs with a flag of 0 (which are good spots).
>
>myfun <- function(x) as.numeric ( x$Flag >0)
>
>Is this not the same as filtering? If I actually remove the
>absent spots from my Imagene files, then the files each have
>different lengths and the order of genes is not the same in each file.
>
>Sally

Jenny Drnevich, Ph.D.
Functional Genomics Bioinformatics Specialist
W.M. Keck Center for Comparative and Functional Genomics
Roy J. Carver Biotechnology Center
University of Illinois, Urbana-Champaign
330 ERML
1201 W. Gregory Dr.
Urbana, IL 61801
USA
ph: 217-244-7355
fax: 217-265-5066
e-mail: drnevich at illinois.edu
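To make the flag-to-weight mapping concrete, here is a minimal sketch on a made-up data frame standing in for one Imagene file. The `spots` object and the helper name `myfun_good_only` are hypothetical and only for illustration. It also shows that the function as posted, with `x$Flag > 0`, gives weight 1 to flags above zero, whereas the behaviour described in the thread (weight 1 only for flag 0) would correspond to `x$Flag == 0`.

```r
# Toy stand-in for the table read from one Imagene file; only the
# Flag column matters here (flags 0, 1, 2 as described in the thread).
spots <- data.frame(Flag = c(0, 1, 2, 0))

# The function exactly as posted: weight 1 where Flag > 0.
myfun <- function(x) as.numeric(x$Flag > 0)

# Hypothetical variant matching the described intent:
# weight 1 only for good spots (Flag == 0), weight 0 otherwise.
myfun_good_only <- function(x) as.numeric(x$Flag == 0)

w_posted    <- myfun(spots)            # 0 1 1 0
w_good_only <- myfun_good_only(spots)  # 1 0 0 1

print(w_posted)
print(w_good_only)
```

Checking a weight function on a few known flag values like this is a cheap way to confirm that it keeps the spots you intend before running the full analysis.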

16.7 years ago by Jenny Drnevich ★ 2.0k


Jenny Drnevich ★ 2.0k

@jenny-drnevich-2812

Last seen 9 months ago

United States

Hi Sally,

I'll answer both your e-mails here and post it back to the list, so the answers can become part of the archives.

At 07:34 PM 1/17/2009, Sally wrote:
>Hi Jenny,
>
>Do you mean I should not have a myfun script? Should I be giving
>weights to spots at all? I'm a bit confused as to what I should do.

That depends on what the flags are in your data files...

>Thanks for the reply. I used Imagene to scan my slides. Imagene is
>fairly primitive. It only has 3 flags: 0, 1, 2. 0 is a 'good'
>spot, 1 is a spot which was marked as no good by the user during
>gridding, and 2 is a bad spot. Are you saying I should not 'filter
>out' these spots (before computing DE genes) using the script
>
>myfun <- function(x) as.numeric ( x$Flag >0)?
>
>When you say Gordon says not to flag out [?filter out] spots (in my
>case with Imagene) using
>
>myfun <- function(x) as.numeric ( x$Flag >0).
>
>How should I re-write this script to include all spots?

You should filter out spots that the user marked as no good during the gridding (dust spots, scratches, etc.), but not the spots the program automatically marked as bad (usually not above background). So if you want to give a weight of 1 to all the spots that have flags not equal to 1, then your function is:

myfun <- function(x) as.numeric(x$Flag != 1)

Good luck,
Jenny

>Take care,
>
>Sally
>----- Original Message -----
>From: "Jenny Drnevich" <drnevich at illinois.edu>
>To: "Sally" <sagoldes at shaw.ca>; <bioconductor at stat.math.ethz.ch>
>Cc: "Sally" <sagoldes at shaw.ca>
>Sent: Friday, January 16, 2009 7:55 AM
>Subject: Re: [BioC] Filtering before differential analysis

Jenny Drnevich, Ph.D.
Functional Genomics Bioinformatics Specialist
W.M. Keck Center for Comparative and Functional Genomics
Roy J. Carver Biotechnology Center
University of Illinois
330 ERML
1201 W. Gregory Dr.
Urbana, IL 61801
Ph: 217-244-7355
FAX: 217-265-5066
Email: drnevich at uiuc.edu
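As a rough illustration of the suggested function, the sketch below applies it to made-up flags and shows, with base R only, that a value with weight 0 drops out of a weighted estimate while a weight of 1 keeps it. The `spot` data frame and its `M` values are invented for the example (flags and log-ratios for one gene gathered by hand from four arrays), and the commented `read.maimages()` call assumes `files` holds your own Imagene file names.

```r
# Weight function from the reply: drop only the spots the user marked
# as no good during gridding (Flag == 1); keep flags 0 and 2.
myfun <- function(x) as.numeric(x$Flag != 1)

# Hypothetical toy data: flag and log-ratio (M) for one gene on four arrays.
spot <- data.frame(
  Flag = c(0, 2, 1, 0),
  M    = c(1.0, 0.2, 9.0, 0.9)
)

w <- myfun(spot)  # 1 1 0 1
print(w)

# The user-flagged value (M = 9.0) has weight 0 and does not contribute;
# the low-intensity spot (Flag == 2) still does.
weighted_M <- weighted.mean(spot$M, w)
print(weighted_M)

# In limma the function would typically be supplied when reading the data,
# for example (not run; `files` is a placeholder):
# RG <- read.maimages(files, source = "imagene", wt.fun = myfun)
```

Keeping the flag-2 spots in the fit is the point of the reply: a low reading still carries information, whereas a spot hit by dust or a scratch does not.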

16.7 years ago by Jenny Drnevich ★ 2.0k
