As always, I am grateful to the developers for donating their wonderful software. However, the issue of why the documentation is hard to use keeps rearing its head, so ...

One of the problems I am finding with the Bioconductor documentation is that it is not sufficiently explicit, so I often need to go into the code to determine what the routine is doing. As 2 examples,
lmFit (limma) can take as input an marrayNorm object and by default extracts "maM". But if you type ?lmFit, this is not given in the documentation. I have not looked at the vignette to see if it is listed there. However, I see the vignettes as tutorials - I should be able to find out what a routine does from its internal documentation. The documentation should be explicit about what is extracted from each type of input object and what is output (if this differs by input object). I might note that this is particularly cogent for limma, since limma works directly with contrasts for 2-color arrays, but requires an extra contrast step for 1-channel arrays.
Similarly, I cannot tell from the documentation for maNorm or maNormMain whether the background values are used in the normalization. I.e. the documentation should state which component of the input object will be used and how.

Thanks.
Naomi S. Altman                            814-865-3791 (voice)
Associate Professor
Bioinformatics Consulting Center
Dept. of Statistics                        814-863-7114 (fax)
Penn State University                      814-865-1348 (Statistics)
University Park, PA 16802-2111
While I agree that we can always strive for better documentation, I would disagree that the vignettes are superfluous for "everyday" use of bioconductor packages. I have made much use of the limma "user guide", perhaps the single most complete piece of documentation outside of the R manuals (and these are not specific to a given package). While I agree that some of the "internal" R documentation could be improved upon, I'm not sure that we can simply ignore information contained in the vignettes and assume that if it isn't in the "internal" documentation, it isn't documented. I would like to see maintainers continue to improve upon the already fantastic source of knowledge contained in the package vignettes. I'm not a developer for bioconductor, just a user, so my comments are just opinion, but I do think the vignettes are an integral source of information, even regarding algorithms (but not parameter defaults---I agree with Naomi on this point). I DO think the official stance could be more specific, so I, probably like Naomi, would like to hear what role vignettes/user manuals should play in package documentation, as in practice there is wide variability.
Sean
On Aug 29, 2004, at 9:58 AM, Naomi Altman wrote:
The vignettes are great - perhaps I should not call them "tutorials". But like other documentation of this type (the book "SAS for Mixed Models" comes to mind), it is hard to generalize from the examples. We need both the vignettes and the internal documentation. We need good but explicit defaults for the general user, and the option to change these defaults for the expert user.
Here is an example where the documentation is OK, but the option to change the defaults is too limited.
Both limma and marray allow the user to read only a limited set of columns from gpr and spot files. Why not have this as the default, and let the user decide if they want to read in other columns? Some of my clients like to filter spots based on quantities like the difference between the median and mean spot intensity, the sd of intensity, etc. They currently need to flag spots before importing into Bioconductor because they cannot readily read these other columns into an marrayRaw object.
--Naomi
At 09:19 AM 8/30/2004 -0400, Sean Davis wrote:
At 11:33 PM 30/08/2004, Naomi Altman wrote:
The wt.fun argument to the read.maimages() function in limma already provides the capability to filter or weight spots based on any number of columns in the original file. So there is no need to read in the extra columns or to flag spots before importing. The computation of the flags is done at the time of import.
The help document for read.maimages() says:

     Spot quality weights may be extracted from the image analysis
     files using a ready-made or a user-supplied weight function
     'wt.fun'. 'wt.fun' may be any user-supplied function which accepts
     a data.frame argument and returns a vector of non-negative
     weights. The columns of the data.frame are as in the image
     analysis output files. See 'QualityWeights' for provided weight
     functions.
I admit that this is brief, but it does seem explicit.
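For the use case Naomi describes (filtering on the difference between median and mean spot intensity), a weight function along the following lines might work. This is only a sketch: the column names "F635 Median" and "F635 Mean" are assumed to match GenePix .gpr headers, and the cutoff of 50 is an arbitrary illustrative value; both would need adjusting to the actual files.

```r
# Sketch of a user-supplied wt.fun for read.maimages() in limma.
# ASSUMPTIONS: the columns "F635 Median" and "F635 Mean" exist in the
# .gpr files, and 50 is just an illustrative cutoff.
flagDiscrepant <- function(x) {
  # Weight 0 for spots whose median and mean foreground intensities
  # differ by more than 50 units, weight 1 otherwise.
  as.numeric(abs(x[["F635 Median"]] - x[["F635 Mean"]]) <= 50)
}

# Usage (not run):
# RG <- read.maimages(files, source = "genepix", wt.fun = flagDiscrepant)
```

The resulting weights are stored in the weights component of the RGList at import time, so the extra columns themselves never need to be kept.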
I know that reading in extra columns can be convenient for other purposes. The reason why I decided not to implement this in limma was explained in a post to this list on 22 July:
https://www.stat.math.ethz.ch/pipermail/bioconductor/2004-July/005434.html
Gordon
I hit return too soon. I see what Gordon means. I will have a look at how to write a weight function. --Naomi
At 09:28 AM 8/31/2004 +1000, Gordon Smyth wrote:
At 11:58 PM 29/08/2004, Naomi Altman wrote:
>As always, I am grateful to the developers for donating their wonderful software. However, the issue of why the documentation is hard to use keeps rearing its head, so ...
I'm not sure what you mean by "the issue of why", apart from the obvious fact that the software is produced by very busy people as a side product of their research lab activities. We can't work full-time on the packages and they are never likely to be as fully-featured or as fully-documented as you would like. In the case of limma, my aim is for the code and the documentation to be of a comparable standard to that of the packages in the standard distribution of R (base, stats, graphics, utils, methods). Specific comments and suggestions re where that fails to be the case are welcome.
>One of the problems I am finding with the Bioconductor documentation is that it is not sufficiently explicit, so I often need to go into the code to determine what the routine is doing. As 2 examples,
>
>lmFit (limma) can take as input an marrayNorm object and by default extracts "maM". But if you type ?lmFit, this is not given in the documentation. I have not looked at the vignette to see if it is listed there. However, I see the vignettes as tutorials - I should be able to find out what a routine does from its internal documentation. The documentation should be explicit about what is extracted from each type of input object
Thanks for this feedback. It is true that the documentation doesn't say explicitly which slot or component is extracted from each type of object. This is partly because it seemed almost self-explanatory. The function lmFit() simply extracts the expression data from the appropriate slot or component of the input data object. It doesn't do any unexpected processing or computation which would require special documentation; rather, the value of the appropriate slot is taken as is. Each class of object has only one slot or component which could be sensibly extracted in this way.

Anyway, I have written an extra two paragraphs of explanation in the Details section of the lmFit() help to make explicit what is extracted from each object. This will be in limma 1.7.5 when that is released.
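The behaviour Gordon describes amounts to a dispatch on the class of the input. The sketch below is not limma's actual code, only a simplified picture: the slot names maM (marrayNorm) and exprs (exprSet) are real, but the real lmFit() works through the packages' accessor functions rather than raw slot access.

```r
# Conceptual sketch (NOT limma's actual code) of how lmFit() obtains the
# expression matrix: each supported class has exactly one sensible slot.
extractExprs <- function(object) {
  if (is(object, "marrayNorm")) {
    object@maM            # normalized log-ratios from marray
  } else if (is(object, "exprSet")) {
    object@exprs          # expression matrix from Biobase
  } else {
    as.matrix(object)     # a plain matrix is used as is
  }
}
```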
>what is output (if this differs by input object). I might note that this is particularly cogent for limma, since limma works directly with contrasts for 2-color arrays, but requires an extra contrast step for 1-channel arrays.
I don't think that this criticism is fair. The output from lmFit() does not vary depending on the input object. It is central to the philosophy of limma that all the models fitted produce an object of the same MArrayLM class, with output components that have the same meaning. It is true that one will want to fit different models depending on the meaning of the input data, but it is the user's responsibility to choose a sensible model and to interpret the output appropriately. The situation is very closely analogous to that of lm() in the stats package.
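The analogy with lm() can be checked directly in base R: the class of the fitted object is the same whatever the numbers in the response happen to mean, and interpretation is left to the user. The response values below are just simulated placeholders.

```r
# lm() cannot know whether the response values are log-ratios or
# log-expression values; the fitted object is of class "lm" either way.
set.seed(1)                      # placeholder simulated data
x <- 1:10
fit_ratios <- lm(rnorm(10) ~ x)  # pretend these are log-ratios
fit_exprs  <- lm(rexp(10) ~ x)   # pretend these are log-expression values
stopifnot(class(fit_ratios) == "lm", class(fit_exprs) == "lm")
```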
It is not true that the fitted model requires an extra contrasts step for 1-channel arrays; rather, one may use lmFit() with or without contrasts.fit() for both 2-color and 1-color arrays. See for example Section 8.3 of the User's Guide, which analyses an affy data set without using contrasts.fit(). For another analysis described in Section 8.4, contrasts.fit() is used only to obtain F-statistics for a pair of coefficients of interest. Otherwise the analysis would stand without the use of contrasts.fit().
It is actually impossible for lmFit() to determine whether the expression values being input are log-ratios or log-expression values when the input is a matrix or an exprSet. The affy package for example outputs exprSet objects which contain log-ratios while coercion to exprSet from an marrayNorm object produces an exprSet object which contains log-ratios. For this reason it would be impossible for lmFit() to output a different class of object depending on the type of input data.
Gordon
Oops, I meant to say:

"The affy package for example outputs exprSet objects which contain *log-expression* values while coercion to exprSet from an marrayNorm object produces an exprSet object which contains log-ratios. For this reason it would be impossible for lmFit() to output a different class of object depending on the type of input data."
Actually an exprSet can in principle contain almost any sort of expression data. The exprs slot can contain absolute (not logged) expression values for example.

Gordon