As always, I am grateful to the developers for donating their wonderful software. However, the issue of why the documentation is hard to use keeps rearing its head, so ...

One of the problems I am finding with the Bioconductor documentation is that it is not sufficiently explicit, so I often need to go into the code to determine what the routine is doing. As 2 examples,
lmFit (limma) can take as input an marrayNorm object and by default extracts "maM". But if you type ?lmFit, this is not given in the documentation. I have not looked at the vignette to see if it is listed there. However, I see the vignettes as tutorials - I should be able to find out what a routine does from its internal documentation. The documentation should be explicit about what is extracted from each type of input object and what is output (if this differs by input object). I might note that this is particularly cogent for limma, since limma works directly with contrasts for 2-color arrays, but requires an extra contrast step for 1-channel arrays.
Similarly, I cannot tell from the documentation for maNorm or maNormMain whether the background values are used in the normalization. I.e. the documentation should state which component of the input object will be used and how.

Thanks.
Naomi S. Altman                            814-865-3791 (voice)
Associate Professor
Bioinformatics Consulting Center
Dept. of Statistics                        814-863-7114 (fax)
Penn State University                      814-865-1348 (Statistics)
University Park, PA 16802-2111
While I agree that we can always strive for better documentation, I would disagree that the vignettes are superfluous for "everyday" use of bioconductor packages. I have made much use of the limma "user guide", perhaps the single most complete piece of documentation outside of the R manuals (and these are not specific to a given package). While I agree that some of the "internal" R documentation could be improved upon, I'm not sure that we can simply ignore information contained in the vignettes and assume that if it isn't in the "internal" documentation, it isn't documented. I would like to see maintainers continue to improve upon the already fantastic source of knowledge contained in the package vignettes. I'm not a developer for bioconductor, just a user, so my comments are just opinion, but I do think the vignettes are an integral source of information, even regarding algorithms (but not parameter defaults---I agree with Naomi on this point). I DO think the official stance could be more specific, so I, probably like Naomi, would like to hear what role vignettes/user manuals should play in package documentation, as in practice there is wide variability.
Sean
On Aug 29, 2004, at 9:58 AM, Naomi Altman wrote:
The vignettes are great - perhaps I should not call them "tutorials". But like other documentation of this type (the book "SAS for Mixed Models" comes to mind), it is hard to generalize from the examples. We need both the vignettes and the internal documentation. We need good but explicit defaults for the general user, and the option to change these defaults for the expert user.
Here is an example where the documentation is OK, but the option to change the defaults is too limited.
Both limma and marray allow the user to read only a limited set of columns from gpr and spot files. Why not have this as the default, and let the user decide if they want to read in other columns? Some of my clients like to filter spots based on quantities like the difference between the median and mean spot intensity, the sd of intensity, etc. They currently need to flag spots before importing into Bioconductor because they cannot readily read these other columns into an marrayRaw object.
--Naomi
At 09:19 AM 8/30/2004 -0400, Sean Davis wrote:
At 11:33 PM 30/08/2004, Naomi Altman wrote:
The wt.fun argument to the read.maimages() function in limma already provides the capability to filter or weight spots based on any number of columns in the original file. So there is no need to read in the extra columns or to flag spots before importing. The computation of the flags is done at the time of import.
The help document for read.maimages() says:

     Spot quality weights may be extracted from the image analysis
     files using a ready-made or a user-supplied weight function
     'wt.fun'. 'wt.fun' may be any user-supplied function which accepts
     a data.frame argument and returns a vector of non-negative
     weights. The columns of the data.frame are as in the image
     analysis output files. See 'QualityWeights' for provided weight
     functions.
I admit that this is brief, but it does seem explicit.
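For the use case Naomi describes (filtering on the difference between median and mean spot intensity), a weight function along the following lines might work. This is only a sketch: the column names "F635 Median" and "F635 Mean" are assumed to match GenePix .gpr headers, and the cutoff of 50 is an arbitrary illustrative value; both would need adjusting to the actual files.

```r
# Sketch of a user-supplied wt.fun for read.maimages() in limma.
# ASSUMPTIONS: the columns "F635 Median" and "F635 Mean" exist in the
# .gpr files, and 50 is just an illustrative cutoff.
flagDiscrepant <- function(x) {
  # Weight 0 for spots whose median and mean foreground intensities
  # differ by more than 50 units, weight 1 otherwise.
  as.numeric(abs(x[["F635 Median"]] - x[["F635 Mean"]]) <= 50)
}

# Usage (not run):
# RG <- read.maimages(files, source = "genepix", wt.fun = flagDiscrepant)
```

The resulting weights are stored in the weights component of the RGList at import time, so the extra columns themselves never need to be kept.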
I know that reading in extra columns can be convenient for other purposes. The reason why I decided not to implement this in limma was explained in a post to this list on 22 July:
https://www.stat.math.ethz.ch/pipermail/bioconductor/2004-July/005434.html
Gordon
I hit return too soon. I see what Gordon means. I will have a look at how to write a weight function. --Naomi
At 09:28 AM 8/31/2004 +1000, Gordon Smyth wrote:
At 11:58 PM 29/08/2004, Naomi Altman wrote:
>As always, I am grateful to the developers for donating their wonderful software. However, the issue of why the documentation is hard to use keeps rearing its head, so ...
I'm not sure what you mean by "the issue of why", apart from the obvious fact that the software is produced by very busy people as a side product of their research lab activities. We can't work full-time on the packages and they are never likely to be as fully-featured or as fully-documented as you would like. In the case of limma, my aim is for the code and the documentation to be of a comparable standard to that of the packages in the standard distribution of R (base, stats, graphics, utils, methods). Specific comments and suggestions re where that fails to be the case are welcome.
>One of the problems I am finding with the Bioconductor documentation is that it is not sufficiently explicit, so I often need to go into the code to determine what the routine is doing. As 2 examples,
>
>lmFit (limma) can take as input an marrayNorm object and by default extracts "maM". But if you type ?lmFit, this is not given in the documentation. I have not looked at the vignette to see if it is listed there. However, I see the vignettes as tutorials - I should be able to find out what a routine does from its internal documentation. The documentation should be explicit about what is extracted from each type of input object
Thanks for this feedback. It is true that the documentation doesn't say explicitly which slot or component is extracted from each type of object. This is partly because it seemed almost self-explanatory. The function lmFit() simply extracts the expression data from the appropriate slot or component of the input data object. It doesn't do any unexpected processing or computation which would require special documentation; rather, the value of the appropriate slot is taken as is. Each class of object has only one slot or component which could be sensibly extracted in this way.

Anyway, I have written an extra two paragraphs of explanation in the Details section of the lmFit() help to make explicit what is extracted from each object. This will be in limma 1.7.5 when that is released.
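The behaviour Gordon describes amounts to a dispatch on the class of the input. The sketch below is not limma's actual code, only a simplified picture: the slot names maM (marrayNorm) and exprs (exprSet) are real, but the real lmFit() works through the packages' accessor functions rather than raw slot access.

```r
# Conceptual sketch (NOT limma's actual code) of how lmFit() obtains the
# expression matrix: each supported class has exactly one sensible slot.
extractExprs <- function(object) {
  if (is(object, "marrayNorm")) {
    object@maM            # normalized log-ratios from marray
  } else if (is(object, "exprSet")) {
    object@exprs          # expression matrix from Biobase
  } else {
    as.matrix(object)     # a plain matrix is used as is
  }
}
```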
>what is output (if this differs by input object). I might note that this is particularly cogent for limma, since limma works directly with contrasts for 2-color arrays, but requires an extra contrast step for 1-channel arrays.
I don't think that this criticism is fair. The output from lmFit() does not vary depending on the input object. It is central to the philosophy of limma that all the models fitted produce an object of the same MArrayLM class, with output components that have the same meaning. It is true that one will want to fit different models depending on the meaning of the input data, but it is the user's responsibility to choose a sensible model and to interpret the output appropriately. The situation is very closely analogous to that of lm() in the stats package.
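The analogy with lm() can be checked directly in base R: the class of the fitted object is the same whatever the numbers in the response happen to mean, and interpretation is left to the user. The response values below are just simulated placeholders.

```r
# lm() cannot know whether the response values are log-ratios or
# log-expression values; the fitted object is of class "lm" either way.
set.seed(1)                      # placeholder simulated data
x <- 1:10
fit_ratios <- lm(rnorm(10) ~ x)  # pretend these are log-ratios
fit_exprs  <- lm(rexp(10) ~ x)   # pretend these are log-expression values
stopifnot(class(fit_ratios) == "lm", class(fit_exprs) == "lm")
```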
It is not true that the fitted model requires an extra contrasts step for 1-channel arrays; rather, one may use lmFit() with or without contrasts.fit() for both 2-color and 1-color arrays. See for example Section 8.3 of the User's Guide, which analyses an affy data set without using contrasts.fit(). For another analysis described in Section 8.4, contrasts.fit() is used only to obtain F-statistics for a pair of coefficients of interest. Otherwise the analysis would stand without the use of contrasts.fit().
It is actually impossible for lmFit() to determine whether the expression values being input are log-ratios or log-expression values when the input is a matrix or an exprSet. The affy package for example outputs exprSet objects which contain log-ratios while coercion to exprSet from an marrayNorm object produces an exprSet object which contains log-ratios. For this reason it would be impossible for lmFit() to output a different class of object depending on the type of input data.
Gordon
Oops, I meant to say:

"The affy package for example outputs exprSet objects which contain *log-expression* values while coercion to exprSet from an marrayNorm object produces an exprSet object which contains log-ratios. For this reason it would be impossible for lmFit() to output a different class of object depending on the type of input data."
Actually an exprSet can in principle contain almost any sort of expression data. The exprs slot can contain absolute (not logged) expression values for example.

Gordon