Exclude probes that show sd above 0.1 between replicate values


Oosting, J. PATH ▴ 550


Hi João,

Here's how I did it on a set of arrays with 3 replicates. ID is a column that has identical values for replicate spots.

    spotaverage <- function(x) {
      # allow at most 1 NA, and require low variability
      if (sum(is.na(x)) < 2 && sd(x, na.rm = TRUE) < 0.1) {
        median(x, na.rm = TRUE)
      } else {
        NA
      }
    }

    aggcollumn <- function(x) {
      # collapse replicate spots (same ID) into one value per gene
      agg <- aggregate(x, list(genes = MAn$genes[, "ID"]), FUN = spotaverage)
      ac <- as.numeric(agg[, 2])
      names(ac) <- agg[, 1]
      ac
    }

    # apply the per-gene averaging to every array (column) of M values
    avg.m <- apply(MAn$M, 2, aggcollumn)

> Dear list,
>
> I have a MAList and I want to exclude probes that show a
> standard deviation above 0.1 between the replicate values.
> The number of within-array replicates for each probe (ndups)
> is equal to 2 and the number of spots to step from a probe to
> its duplicate (spacing) is equal to 1.
> I am not able to do this. Can somebody give me a hint to
> resolve this?
>
> Best regards,
>
> João Fadista
> Ph.D. student


written 17.2 years ago by Oosting, J. PATH
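For the layout described in the original question (ndups = 2, spacing = 1), duplicates sit in consecutive rows, so the same kind of filter can be sketched without an ID column. This is only a minimal sketch and not part of the reply above: the function name `filter_duplicate_pairs` and the toy matrix are made up for illustration, and `M` stands in for the `M` component of the MAList. It excludes pairs whose sd is above the cut-off, as the question asks, whereas the reply above keeps only values strictly below 0.1.

```r
# Hypothetical helper: average duplicate pairs stored in consecutive rows
# (ndups = 2, spacing = 1) and blank out pairs whose sd exceeds max_sd.
filter_duplicate_pairs <- function(M, max_sd = 0.1) {
  stopifnot(is.matrix(M), nrow(M) %% 2 == 0)
  first  <- M[seq(1, nrow(M), by = 2), , drop = FALSE]
  second <- M[seq(2, nrow(M), by = 2), , drop = FALSE]
  # For two values a and b, sd(c(a, b)) equals abs(a - b) / sqrt(2).
  pair_sd <- abs(first - second) / sqrt(2)
  avg <- (first + second) / 2
  # A pair containing an NA has an NA sd and therefore an NA average.
  avg[is.na(pair_sd) | pair_sd > max_sd] <- NA
  avg
}

# Toy data: 4 spots (2 probes in duplicate) on 2 arrays.
M <- matrix(c(1.00, 1.05,  2.0,  3.0,
              0.50, 0.52, -1.0, -1.5), nrow = 4)
avg.m <- filter_duplicate_pairs(M)
```

On a real object the call would presumably look like `filter_duplicate_pairs(MAn$M)`, assuming the rows are still in their original print order.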


J.delasHeras@ed.ac.uk ★ 1.9k


If you look at the variation in M values alone (it's a MAList) and throw away the spots with high variation, that sounds like a reasonable thing to do. However, when a spot has no signal in only one of the channels, the variation is probably quite high too, and you'd remove it. Yet these are probably quite an interesting class of spots to keep (genes that become silenced, or activated, after treatment, not merely down/upregulated).

Most of my experiments are ones where I am mainly interested in these cases of activation/silencing, and not so much in up/downregulation alone. I wonder how people account for these situations...

Jose

--
Dr. Jose I. de las Heras
The Wellcome Trust Centre for Cell Biology
Institute for Cell & Molecular Biology
University of Edinburgh, UK

written 17.2 years ago by J.delasHeras@ed.ac.uk


Hi Jose,

IMHO you should use the variability of replicate spots whenever possible. Limma can handle this nicely: for the analysis of differential expression I always leave the replicate spots in and let limma handle them.

For presentation purposes (e.g. heatmaps) it is usually handy to have averaged values per gene, and I think that removing genes that cannot be measured reliably is a way of improving the visualizations.

Any data manipulation is context dependent, and the effects of removing data points in particular should be considered case by case. If you're interested in on/off phenomena you should not remove 'empty' spots.

Regards,

Jan

written 17.2 years ago by Oosting, J. PATH
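Jan's suggestion to leave the replicate spots in and let limma handle them could look roughly like the sketch below. It is not taken from the thread: the wrapper name `fit_with_duplicates` is made up, and it assumes the limma package (with its `duplicateCorrelation` and `lmFit` functions) is installed and that the spots are in print order with ndups = 2 and spacing = 1, as in the original question.

```r
# Hypothetical wrapper around two limma functions; limma (Bioconductor)
# must be installed when the function is called.
fit_with_duplicates <- function(MA, design = NULL, ndups = 2, spacing = 1) {
  # Estimate the correlation between within-array duplicate spots.
  corfit <- limma::duplicateCorrelation(MA, design,
                                        ndups = ndups, spacing = spacing)
  # Fit the linear model with the duplicates kept in, using that correlation.
  limma::lmFit(MA, design, ndups = ndups, spacing = spacing,
               correlation = corfit$consensus.correlation)
}
```

A call such as `fit <- fit_with_duplicates(MAn)` would then typically be followed by `limma::eBayes(fit)`. The fitted object has one row per probe rather than one per spot, so no spots are filtered out beforehand.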


Hi Jan,

I'm not sure I understand. I am with you about the variability of replicate spots, as long as they can be measured reliably, as you say. The question, I guess, is where you take these measurements: at the intensity level or at the ratio level? If you're looking at variability based on ratios (M values), replicate spots with no signal in one channel tend to have wildly varying M values (all quite high in absolute value). Wouldn't filtering based solely on variation at the M value level discard those spots? For these kinds of spots the M value is irrelevant (I mean, how much is something divided by *almost* nothing?). We don't really have a use for the actual number, except for the fact that it should be large.

As you say, I guess that any analysis depends on what you're after, but most "general" approaches I see mentioned don't seem to care about this particular case, where signal is missing in only one channel. In fact, some people just remove any spot where the signal is not detectable in both channels, which for my purposes would be a disaster [1]. I have my own approach to deal with this, and I am reasonably happy with it, but I am very curious to see how other people approach this issue.

[1] A while ago we had a demo of the software Acuity at our centre. The guy contacted us beforehand, asking if we had some real data we'd like to use in the demo. He chose some of my data, which I thought was great, as I had already analysed it using my usual tools. His demo picked up genes I knew to be upregulated... but the "top genes" that I've continued to use in my experiments were all missing, as they had been left behind in one of the filtering steps: either the low-intensity filter (applied on *either* channel) or the standard deviation filter on log2 ratios (not sure which, probably both). It took me a while to convince him that I really, really didn't want those spots removed, which surprised me. Are most people really throwing away these kinds of spots?

Jose

written 17.2 years ago by J.delasHeras@ed.ac.uk
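The effect Jose describes can be illustrated with a toy calculation. The intensities below are made up, and the "on/off" rescue rule with its two thresholds is only one possible way to exempt such spots from an sd filter; it is an assumption for illustration, not the approach Jose alludes to, which the thread does not describe.

```r
# Made-up intensities for one probe spotted in duplicate:
# strong red signal, green signal close to background.
R <- c(5000, 5200)
G <- c(3, 12)

M <- log2(R / G)        # log-ratios swing widely because G is near zero
sd_M <- sd(M)           # far above the 0.1 cut-off
sd_logR <- sd(log2(R))  # the expressed channel itself is reproducible

# Hypothetical rescue rule: call the probe "on/off" when one channel is
# clearly expressed and the other is near background in every replicate.
high <- 1000  # assumed "clearly expressed" threshold
low  <- 50    # assumed "near background" threshold
on_off <- (all(R > high) && all(G < low)) || (all(G > high) && all(R < low))

# Keep the probe if its M values are reproducible or it is an on/off probe.
keep <- sd_M <= 0.1 || on_off
```

Here the sd filter on M alone would drop the probe, even though the red channel is consistent between the two spots; the extra flag keeps it.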


Personally, I use single-channel analysis because I feel that spots that are "off" under some conditions and "on" under others are the most interesting. Using log2 is still a problem, however.

--Naomi

Naomi S. Altman
Associate Professor
Dept. of Statistics
Penn State University
University Park, PA 16802-2111

written 17.2 years ago by Naomi Altman
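One reading of why log2 remains a problem in single-channel analysis is that "off" spots have background-corrected intensities at or below zero. The sketch below uses made-up intensities, and the clip-and-offset step is an assumed workaround for illustration, not something proposed in the thread.

```r
# Made-up background-corrected intensities from a single channel.
bg_corrected <- c(-12, 0, 8, 950)

# Plain log2 breaks on the "off" spots: NaN for negatives, -Inf for zero.
raw_log <- suppressWarnings(log2(bg_corrected))

# Assumed workaround: clip at zero and add a constant before taking logs.
offset <- 50  # hypothetical value; a sensible size depends on the platform
stabilised <- log2(pmax(bg_corrected, 0) + offset)
```

With the offset, every spot gets a finite value, at the cost of compressing differences among the lowest intensities.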
