Hi Ligia,
thanks for your reply... your answer makes sense, in that the two methods
may remove different numbers of spots. However, the number of spots
could never be larger than the total! :-)
My arrays have 10752 spots each (48 blocks of 14x16 spots, arranged
in a 12x4 layout).
When I use approach (a)
(a) A<-boxplot(MAw$M[,1],MAw$M[,2],MAw$M[,3])
and look at the number of observations, I get different numbers between
9000 and 9500 for each slide. That's okay: it's removing missing values
on a "per column" basis.
A$n
[1] 9181 9435 9331
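
To illustrate the "per column" counting, here is a minimal sketch on toy data. The 5x3 matrix M is a hypothetical stand-in for MAw$M, not the real arrays, and plot=FALSE is used only to skip the drawing. It suggests that the n component is simply the number of non-missing values in each vector passed to boxplot:

```r
# Hypothetical toy matrix standing in for MAw$M: 5 spots x 3 slides
M <- cbind(c(1, NA, 3, 4, 5),
           c(NA, NA, 3, 4, 5),
           c(1, 2, 3, 4, NA))

# Approach (a): one vector per slide; plot = FALSE only returns the statistics
A <- boxplot(M[, 1], M[, 2], M[, 3], plot = FALSE)

# Number of observations used for each box
A$n

# Non-missing values per column, computed directly for comparison
colSums(!is.na(M))
```

Both results should agree, which is why the counts in (a) differ from slide to slide but never exceed the number of rows.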
Now, when I try approach (b)
(b) B<-boxplot(MAw$M ~ col(MAw$M[,1:3]))
The number of observations is identical for the three slides,
consistent with what you say about removing the same spots across
slides... but the values are larger than the total!
B$n
[1] 16218 16218 16218
And when I try using the 'split' function (very useful, thanks for
pointing that one out to me, by the way; I'm still rather inexperienced
in R), I get yet another result:
(c) C<-boxplot(split(MAw$M, col(MAw$M[,1:3])))
C$n
[1] 17924 18500 19027
Now the values are different on each slide, but all larger than the
maximum 10752...
Before anybody asks:
> dim(MAw)
[1] 10752 6
So I'm very confused...
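
One possible explanation, sketched on toy data: MAw$M has 6 columns, but col(MAw$M[,1:3]) only supplies group labels for 3 of them. split() recycles a grouping vector that is shorter than the data, so each group would end up holding two columns, which would allow up to 2 x 10752 values per box. The matrix M below is hypothetical (10 spots x 6 slides with a few NAs), not the real data. The identical counts in (b) would also be consistent with the formula interface dropping every row that has an NA in any of the six columns before the same recycling happens (2 x 8109 = 16218), but that is only an inference from the numbers and is not checked here.

```r
# Hypothetical toy data: 10 spots x 6 slides, with some missing values
set.seed(1)
M <- matrix(rnorm(10 * 6), nrow = 10, ncol = 6)
M[sample(length(M), 8)] <- NA

# Group labels built from the first 3 columns only: 30 labels for 60 values
g <- col(M[, 1:3])

# split() recycles g, so group k receives column k and column k + 3
C <- boxplot(split(M, g), plot = FALSE)
n_per_col <- colSums(!is.na(M))
C$n
n_per_col[1:3] + n_per_col[4:6]   # same counts, built from two columns each

# Restricting the data to the same three columns as the grouping
D <- boxplot(split(M[, 1:3], g), plot = FALSE)
D$n
n_per_col[1:3]                    # now one column per box
```

If this is what is happening, subsetting the data to the same columns as the grouping, as in split(MAw$M[,1:3], col(MAw$M[,1:3])), should bring the counts back in line with approach (a).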
Jose
Quoting ligia at ebi.ac.uk:
> Hi, Jose
>
> I've also noticed this feature some time ago.
> It is related with the way they handle missing data.
>
> For example, if you save the output of boxplot in either case, we can see
> that:
> a = boxplot(MA$M[,1],MA$M[,2],MA$M[,3])
> b = boxplot(MAw$M ~ col(MAw$M[,1:3]))
>
> the number of observations is different:
>
> a$n
> b$n
>
> This is because option (b) removes the NA entries that are common to all
> the columns in MAw$M, so you'll have fewer data points in each vector.
>
> However, if you use the command "split", this will work, giving the same
> results as option (a):
>
> boxplot(split(MAw$M, col(MAw$M[,1:3])))
>
>
> Best wishes,
> Ligia
>
>
>>
>> Hi,
>>
>> I am using limma to analyse my cDNA expression arrays (2 channel).
>>
>> I am looking at boxplots generated from the M values of my arrays (MA =
>> product of 'normalizeWithinArrays'), but I am not sure I understand the
>> syntax and what the 'boxplot' function is doing.
>>
>> This is because I get slightly different plots if I try (a) or (b)
>> below, which
>> I thought would be equivalent. Am I missing something?
>>
>> (a)
>> boxplot(MA$M[,1],MA$M[,2],MA$M[,3])
>>
>> (b)
>> boxplot(MAw$M ~ col(MAw$M[,1:3]))
>>
>> The differences are noticeable on the spots outside the "whiskers". The
>> main box and whiskers themselves *appear* to be the same. I guess some
>> defaults must be different when defining the data as a formula or
>> explicitly naming the vectors... but I'm not finding an obvious note as
>> to which ones they may be?
>>
>> thanks for your help,
>>
>> Jose
>>
>>
>> --
>> Dr. Jose I. de las Heras Email: J.delasHeras at ed.ac.uk
>> The Wellcome Trust Centre for Cell Biology Phone: +44 (0)131 6513374
>> Institute for Cell & Molecular Biology Fax: +44 (0)131 6507360
>> Swann Building, Mayfield Road
>> University of Edinburgh
>> Edinburgh EH9 3JR
>> UK
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
>
>
--
Dr. Jose I. de las Heras Email: J.delasHeras at ed.ac.uk
The Wellcome Trust Centre for Cell Biology Phone: +44 (0)131 6513374
Institute for Cell & Molecular Biology Fax: +44 (0)131 6507360
Swann Building, Mayfield Road
University of Edinburgh
Edinburgh EH9 3JR
UK