Boxplots, different results?


J.delasHeras@ed.ac.uk ★ 1.9k

@jdelasherasedacuk-1189

Last seen 10.5 years ago

United Kingdom

Hi,

I am using limma to analyse my cDNA expression arrays (2 channel).

I am looking at boxplots generated from the M values of my arrays (MA = product of 'normalizeWithinArrays'), but I am not sure I understand the syntax and what the 'boxplot' function is doing.

This is because I get slightly different plots if I try (a) or (b) below, which I thought would be equivalent. Am I missing something?

(a)
    boxplot(MA$M[,1],MA$M[,2],MA$M[,3])

(b)
    boxplot(MAw$M ~ col(MAw$M[,1:3]))

The differences are noticeable in the spots outside the "whiskers". The main box and whiskers themselves *appear* to be the same. I guess some defaults must differ between defining the data as a formula and explicitly naming the vectors, but I can't find an obvious note saying which ones.

Thanks for your help,

Jose

--
Dr. Jose I. de las Heras
The Wellcome Trust Centre for Cell Biology
Institute for Cell & Molecular Biology
Swann Building, Mayfield Road
University of Edinburgh
Edinburgh EH9 3JR
UK
Email: J.delasHeras at ed.ac.uk
Phone: +44 (0)131 6513374
Fax: +44 (0)131 6507360

limma limma • 1.4k views

ADD COMMENT • link 19.6 years ago J.delasHeras@ed.ac.uk ★ 1.9k


J.delasHeras@ed.ac.uk ★ 1.9k

@jdelasherasedacuk-1189

Last seen 10.5 years ago

United Kingdom

Further to my question above, I notice that the number of observations is not correct when I do (b):

    > bpv<-boxplot(MAw$M[,1],MAw$M[,2],MAw$M[,3],plot=FALSE) # case (a)
    > bpf<-boxplot(MAw$M ~ col(MAw$M[,1:3]),plot=FALSE) # case (b)
    > bpf$n
    [1] 16218 16218 16218
    > bpv$n
    [1] 9181 9435 9331

The number of spots on these arrays is 10752, so the counts from (b) cannot be right.

Jose

ADD COMMENT • link 19.6 years ago J.delasHeras@ed.ac.uk ★ 1.9k


ligia@ebi.ac.uk ▴ 50

@ligiaebiacuk-1794

Last seen 11.4 years ago

Hi, Jose

I also noticed this feature some time ago. It is related to the way the two forms of the call handle missing data.

For example, if you save the output of boxplot in either case:

    a = boxplot(MA$M[,1],MA$M[,2],MA$M[,3])
    b = boxplot(MAw$M ~ col(MAw$M[,1:3]))

you can see that the number of observations is different:

    a$n
    b$n

This is because option (b) removes a spot from every column whenever it is NA in any of the columns of MAw$M, so you'll have fewer data points in each vector.

However, if you use the command "split", this will work, giving the same results as option (a):

    boxplot(split(MAw$M, col(MAw$M[,1:3])))

Best wishes,
Ligia
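A minimal sketch of the two kinds of NA handling described above. The matrix `M` and its values are invented stand-ins for `MAw$M`, not data from this thread, and the sketch assumes R's default `na.action` option (`na.omit`), under which the formula interface drops a whole row when any entry in it is NA:

```r
# Toy stand-in for MAw$M: 5 spots x 3 arrays, NAs in different rows.
# (Invented values for illustration only.)
M <- matrix(c( 0.1,  NA, 0.3, -0.2, 0.5,
               0.4, 0.2,  NA, -0.1, 0.0,
              -0.3,  NA, 0.1,   NA, 0.2),
            nrow = 5, ncol = 3)

# (a) one vector per array: NAs are dropped separately within each column
n_vec <- boxplot(M[, 1], M[, 2], M[, 3], plot = FALSE)$n

# (b) formula interface: rows containing an NA in any column are dropped
# before the data are split into groups
n_formula <- boxplot(M ~ col(M), plot = FALSE)$n

# split() on the raw matrix keeps the per-column behaviour of (a)
n_split <- boxplot(split(M, col(M)), plot = FALSE)$n

# Number of rows with no NA at all, for comparison with n_formula
n_complete <- sum(complete.cases(M))
```

Here the data and the grouping matrix cover the same columns, so any difference between `n_vec` and `n_formula` comes from the NA handling alone.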

ADD COMMENT • link 19.6 years ago ligia@ebi.ac.uk ▴ 50


Aha! I got it at last!

The problem was that my MAw object contains 6 slides, of which I was trying to select 3 to make the boxplots, but I wasn't indicating that properly (except for type (b), which I was unable to make work unless I used all slides).

If I take all 6 slides everything makes sense:

    > A<-boxplot(MAw$M[,1],MAw$M[,2],MAw$M[,3],MAw$M[,4],MAw$M[,5],MAw$M[,6])
    > A$n
    [1] 9181 9435 9331 8743 9065 9696

    > C<-boxplot(split(MAw$M[,1:6], col(MAw$M[,1:6])))
    > C$n
    [1] 9181 9435 9331 8743 9065 9696

    > CC<-boxplot(split(MAw$M, col(MAw$M[,1:6])))
    > CC$n
    [1] 9181 9435 9331 8743 9065 9696

    > B<-boxplot(MAw$M ~ col(MAw$M[,1:6]))
    > B$n
    [1] 8109 8109 8109 8109 8109 8109

    > B<-boxplot(MAw$M ~ col(MAw$M))
    > B$n
    [1] 8109 8109 8109 8109 8109 8109

Thanks Ligia!

regards,
Jose
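One way to avoid this kind of mismatch is to subset the matrix once and reuse that same object for both the data and the grouping. The sketch below uses an invented 5 x 6 matrix `M` in place of `MAw$M`; the names `M` and `sel` are hypothetical:

```r
# Toy stand-in for MAw$M: 5 spots x 6 arrays with a few NAs (invented values).
M <- matrix(as.numeric(1:30), nrow = 5, ncol = 6)
M[2, 1] <- NA
M[3, 2] <- NA
M[4, 2] <- NA
M[1, 5] <- NA

# Select the three arrays of interest once...
sel <- M[, 1:3]

# ...and use the same object on both sides, so data and groups always agree.
n_vec   <- boxplot(sel[, 1], sel[, 2], sel[, 3], plot = FALSE)$n
n_split <- boxplot(split(sel, col(sel)), plot = FALSE)$n

# A data frame is treated as a list of columns, one box per column.
n_df    <- boxplot(as.data.frame(sel), plot = FALSE)$n
```

All three calls should report the same per-array counts, since each one drops NAs column by column from the same three columns.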

ADD REPLY • link 19.6 years ago J.delasHeras@ed.ac.uk ★ 1.9k


J.delasHeras@ed.ac.uk ★ 1.9k

@jdelasherasedacuk-1189

Last seen 10.5 years ago

United Kingdom

Hi Ligia,

thanks for your reply. Your answer makes sense, in that the two methods perhaps remove different numbers of spots. However, the number of spots could never be larger than the total! :-)

My arrays have 10752 spots in them (14x16 times 48 blocks in a 12x4 fashion).

When I use approach (a):

(a)
    A<-boxplot(MAw$M[,1],MAw$M[,2],MAw$M[,3])

and look at the number of observations, I get different numbers between 9000 and 9500 for each slide. That's okay. It's removing missing values on a "per column" basis.

    A$n
    [1] 9181 9435 9331

Now, when I try approach (b):

(b)
    B<-boxplot(MAw$M ~ col(MAw$M[,1:3]))

the number of observations is identical for the three slides, consistent with what you say about removing the same spots across slides, but the values are larger than the total!

    B$n
    [1] 16218 16218 16218

And now, when I try using the 'split' function (very useful, thanks for pointing that one out to me, by the way; I'm still rather inexperienced in R), I get yet another result:

(c)
    C<-boxplot(split(MAw$M, col(MAw$M[,1:3])))
    C$n
    [1] 17924 18500 19027

Now the values are different on each slide, but all larger than the maximum of 10752.

Before anybody asks:

    > dim(MAw)
    [1] 10752 6

So I'm very confused...

Jose
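The oversized counts are what one would expect if the grouping labels are recycled: `col(MAw$M[,1:3])` supplies labels for three columns, while `MAw$M` has six, and `split()` recycles a grouping factor that is shorter than the data. With the per-slide counts reported elsewhere in this thread (9181 9435 9331 8743 9065 9696 and 8109), the arithmetic fits: 9181 + 8743 = 17924, 9435 + 9065 = 18500, 9331 + 9696 = 19027, and 2 x 8109 = 16218. A small sketch of the recycling, using an invented 4 x 6 matrix `M` in place of `MAw$M`:

```r
# Toy stand-in for MAw$M: 4 spots x 6 arrays, no NAs (invented values).
M <- matrix(seq_len(24), nrow = 4, ncol = 6)

# Grouping labels built from only the first 3 columns:
# 12 labels for 24 data values.
g <- col(M[, 1:3])

# split() recycles the shorter grouping vector, so column 4 reuses the
# labels of column 1, column 5 those of column 2, column 6 those of column 3.
groups <- split(M, g)

# Each group now pools the values of two arrays.
group_sizes <- lengths(groups)

# Group "1" is exactly columns 1 and 4 taken together.
pooled_1_and_4 <- identical(groups[["1"]], c(M[, 1], M[, 4]))
```

Because the data length is an exact multiple of the label length, no warning is raised, which is why the mismatch is easy to miss.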

ADD COMMENT • link 19.6 years ago J.delasHeras@ed.ac.uk ★ 1.9k
