Question

limma topTable question

0

Entering edit mode

Cyrus Harmon ▴ 140

@cyrus-harmon-1173

Last seen 9.6 years ago

Dear bioconductors, Forgive me if this is in the limma user's guide, but I'd like to be able to find the top, say, 100 probe sets in topTable sorted by M, not abs(M). I understand that I can do this with sort and resort and then filter out the negative ones, but this seems a bit clunky and in a sense isn't quite what I've asked, as the top 100 probe sets sorted by M after filtering by abs(M) aren't guaranteed to be the same thing, unless there are <= 100 probe sets. Yes, I can take the whole resorted table and take the top 100 rows, but that seems to be sort of defeating the purpose of topTable. Would it be possible to add another option for M instead of abs(M) for sort.by? Also, the description of what is going on here in the help page is rather cryptic. It is only in the discussion of resort.by that the abs thing comes up. Furthermore, a better description of what "M" "A", "T", "P" and "B" are would be helpful. They are each discussed in the context of the Values (for M, t, p and b) and in the Arguments (for A), but it would be nice if in the details section, this was recapitulated, perhaps with a description of what an A value is, and if the case options were spelled out. It seems the only t and p are allowed to be lowercase, but it's not clear why. (Combining my question and my gripe, a sort by "m" that didn't do abs(M) would seem useful to me, but perhaps I'm missing something.) Thanks, Cyrus

probe limma probe limma • 1.4k views

ADD COMMENT • link updated 19.1 years ago by Jenny Bryan ▴ 110 • written 19.1 years ago by Cyrus Harmon ▴ 140

score 0 · Answer 1 · 2005-04-05

0

Entering edit mode

Sean Davis 21k

@sean-davis-490

Last seen 3 months ago

United States

----- Original Message ----- From: "Cyrus Harmon" <ch-bioc@bobobeach.com> To: <bioconductor@stat.math.ethz.ch> Sent: Monday, April 04, 2005 5:22 PM Subject: [BioC] limma topTable question > > Dear bioconductors, > > Forgive me if this is in the limma user's guide, but I'd like to be able > to find the top, say, 100 probe sets in topTable sorted by M, not abs(M). > I understand that I can do this with sort and resort and then filter out > the negative ones, but this seems a bit clunky and in a sense isn't quite > what I've asked, as the top 100 probe sets sorted by M after filtering by > abs(M) aren't guaranteed to be the same thing, unless there are <= 100 > probe sets. Yes, I can take the whole resorted table and take the top 100 > rows, but that seems to be sort of defeating the purpose of topTable. > > Would it be possible to add another option for M instead of abs(M) for > sort.by? The beauty of R and BioConductor is that YOU can modify option M: If you type topTable (without any parentheses or arguments), you will see that topTable is a VERY thin wrapper around a call to the function toptable. If you do the same for toptable (type it without any parentheses or arguments), you will see about half way down the definition of the function a line that looks like: ord <- switchsort.by, M=(order(abs(M), decreasing=TRUE), .... You can make a copy of toptable called my.toptable by: my.toptable <- toptable Then, change the order for 'M' in my.toptable to be: ord <- switchsort.by, M=(order(M, decreasing=TRUE),.... If you make a copy of topTable called my.topTable by: my.topTable <- topTable and change it so that it calls my.toptable instead of toptable, you now have your own function called my.topTable that does what you want. You can of course make any other changes to the functions that you want--add your own options, etc. The simple task of looking at others' code is quite powerful when dealing with issues like the one you bring up. I would encourage all who use bioconductor and R to try it whenever possible; even if it doesn't all make sense, it is a very good way to learn. > (Combining my question and my gripe, a sort by "m" that didn't do abs(M) > would seem useful to me, but perhaps I'm missing something.) If you are not typically a programmer in bioconductor, this seems like a good chance to try your hand at it. If you get something that you like better than what Gordon has offered in Limma, send him the modified code. He, like most of bioconductor/R developers, is remarkably receptive and responsive to criticism/improvements. Hope this helps. Sean

ADD COMMENT • link 19.1 years ago Sean Davis 21k

0

Entering edit mode

On Apr 4, 2005, at 8:14 PM, Sean Davis wrote: > The beauty of R and BioConductor is that YOU can modify option M: > > If you type topTable (without any parentheses or arguments), you will > see that topTable is a VERY thin wrapper around a call to the function > toptable. If you do the same for toptable (type it without any > parentheses or arguments), you will see about half way down the > definition of the function a line that looks like: > > ord <- switchsort.by, M=(order(abs(M), decreasing=TRUE), .... Indeed. Thanks for pointing this out. This is certainly quicker than my usual approach of finding the digging up the source file and finding the function definition in situ. Not quite as handy as M-. in SLIME (an emacs-based IDE/debugger for common lisp) but very nice. > You can make a copy of toptable called my.toptable by: > > my.toptable <- toptable > > Then, change the order for 'M' in my.toptable to be: > > ord <- switchsort.by, M=(order(M, decreasing=TRUE),.... What do you mean by "change the order ... in my.toptable? I get the obvious part, but the question is more of a mechanical one, having done my.toptable <- toptable? How do I edit the ord <- line? Clearly, I can put the function definition in an emacs/ESS buffer and eval the function def, but is there a better way to do this? The REPL is very nice, but the model of eval'ing function defs or regions one at a time in emacs buffers seems somewhat cloddish. Back to the parallel, with slime, is there a nice way to make this change take effect? It seems that the problem is magnified if I'm trying to develop an R extension as I have to do R CMD INSTALL in order to get the change to take effect in the place I eventually want to use it. I realize I've gone totally off of the topic from the original question, but if the preferred model of tweaking packages like this is as you've described, I feel like I must be missing something about the mechanics of writing and eval'ing R code. > If you make a copy of topTable called my.topTable by: > > my.topTable <- topTable > > and change it so that it calls my.toptable instead of toptable, you > now have your own function called my.topTable that does what you want. > You can of course make any other changes to the functions that you > want--add your own options, etc. The simple task of looking at > others' code is quite powerful when dealing with issues like the one > you bring up. I would encourage all who use bioconductor and R to try > it whenever possible; even if it doesn't all make sense, it is a very > good way to learn. Sure, but I'd hope that package maintainers were open to well-written and documented patches that added the functionality to the library itself, rather than having tons of local copies of possibly out of date lying code. I suppose, going back to my previous question, I could store my.toptable.diff and apply the diff on the fly and iff the patch succeeds eval a modified my.toptable, but that seems a bit hokey. > >> (Combining my question and my gripe, a sort by "m" that didn't do >> abs(M) would seem useful to me, but perhaps I'm missing something.) > > If you are not typically a programmer in bioconductor, this seems like > a good chance to try your hand at it. If you get something that you > like better than what Gordon has offered in Limma, send him the > modified code. He, like most of bioconductor/R developers, is > remarkably receptive and responsive to criticism/improvements. This is great to hear. I feel like I'm having trouble figuring out how to develop mid-size projects in R. Clearly, typing R commands straight into the REPL is a nice way to play around, and clearly the R extension mechanism is a great way to package up R extensions for distribution, but for developing my own mid-size R packages, I'm still unclear on reasonable idioms for putting together my own mid-size projects. So far the best I've come up with is local packages that still need to be R CMD INSTALL'ed, but to a local directory, and then library(lib.loc=<some-nice-local-path>) in my scripts, but this topic has probably been previously covered ad nauseum. Time to go digging through the docs and r-help archives. Thanks for your thoughtful reply, Cyrus

ADD REPLY • link 19.1 years ago Cyrus Harmon ▴ 140

0

Entering edit mode

See comments below On Mon, Apr 04, 2005 at 09:37:05PM -0700, Cyrus Harmon wrote: > > On Apr 4, 2005, at 8:14 PM, Sean Davis wrote: > > >The beauty of R and BioConductor is that YOU can modify option M: > > > >If you type topTable (without any parentheses or arguments), you will > >see that topTable is a VERY thin wrapper around a call to the function > >toptable. If you do the same for toptable (type it without any > >parentheses or arguments), you will see about half way down the > >definition of the function a line that looks like: > > > >ord <- switchsort.by, M=(order(abs(M), decreasing=TRUE), .... > > Indeed. Thanks for pointing this out. This is certainly quicker than my > usual approach of finding the digging up the source file and finding > the function definition in situ. Not quite as handy as M-. in SLIME (an > emacs-based IDE/debugger for common lisp) but very nice. > > >You can make a copy of toptable called my.toptable by: > > > >my.toptable <- toptable > > > >Then, change the order for 'M' in my.toptable to be: > > > >ord <- switchsort.by, M=(order(M, decreasing=TRUE),.... > > What do you mean by "change the order ... in my.toptable? I get the > obvious part, but the question is more of a mechanical one, having done > my.toptable <- toptable? How do I edit the ord <- line? Clearly, I can > put the function definition in an emacs/ESS buffer and eval the > function def, but is there a better way to do this? The REPL is very > nice, but the model of eval'ing function defs or regions one at a time What about C-x C-f: eval function? > in emacs buffers seems somewhat cloddish. Back to the parallel, with > slime, is there a nice way to make this change take effect? It seems > that the problem is magnified if I'm trying to develop an R extension > as I have to do R CMD INSTALL in order to get the change to take effect > in the place I eventually want to use it. I realize I've gone totally > off of the topic from the original question, but if the preferred model > of tweaking packages like this is as you've described, I feel like I > must be missing something about the mechanics of writing and eval'ing R > code. I am a bit unsure what your problem is. There are several approaches: 1) make a my.toptable and then edit topTable to call this function instead of toptable 2) make a function called toptable in your global environment. This will override the function in topTable, unless limma is using namespaces. If your problem is more on the line of editing the toptable function, you can do fix(toptable) or toptable <- edit(toptable) (with a suitable choice of options(editor = ...)) or you can do dump(toptable) (creates a file called toptable.R, which you cab edit and then source). In your scripts you can just source this function. > >If you make a copy of topTable called my.topTable by: > > > >my.topTable <- topTable > > > >and change it so that it calls my.toptable instead of toptable, you > >now have your own function called my.topTable that does what you want. > > You can of course make any other changes to the functions that you > >want--add your own options, etc. The simple task of looking at > >others' code is quite powerful when dealing with issues like the one > >you bring up. I would encourage all who use bioconductor and R to try > >it whenever possible; even if it doesn't all make sense, it is a very > >good way to learn. > > Sure, but I'd hope that package maintainers were open to well- written > and documented patches that added the functionality to the library > itself, rather than having tons of local copies of possibly out of date > lying code. I suppose, going back to my previous question, I could > store my.toptable.diff and apply the diff on the fly and iff the patch > succeeds eval a modified my.toptable, but that seems a bit hokey. They generally are, but in my own opinion, a developer has to achive a balance between enough options/arguments in a function and too many. It is not certain that a particular twist on a function is something which will be widely enough used to warrant a host of new arguments (I am not saying this particular case is one of these). > > > > >>(Combining my question and my gripe, a sort by "m" that didn't do > >>abs(M) would seem useful to me, but perhaps I'm missing something.) > > > >If you are not typically a programmer in bioconductor, this seems like > >a good chance to try your hand at it. If you get something that you > >like better than what Gordon has offered in Limma, send him the > >modified code. He, like most of bioconductor/R developers, is > >remarkably receptive and responsive to criticism/improvements. > > This is great to hear. I feel like I'm having trouble figuring out how > to develop mid-size projects in R. Clearly, typing R commands straight > into the REPL is a nice way to play around, and clearly the R extension > mechanism is a great way to package up R extensions for distribution, > but for developing my own mid-size R packages, I'm still unclear on > reasonable idioms for putting together my own mid-size projects. So far > the best I've come up with is local packages that still need to be R > CMD INSTALL'ed, but to a local directory, and then > library(lib.loc=<some-nice-local-path>) in my scripts, but this topic > has probably been previously covered ad nauseum. Time to go digging > through the docs and r-help archives. I think the "local package" paradigm works fine for me. Else keep an R file you can source. You do not need to re-install packages for small changes to take effect while you develop a package, you can just evaluate the appropriate code-snippet. In addition I tend to use a Makefile if I find I re-install often. Kasper

ADD REPLY • link 19.1 years ago Kasper Daniel Hansen ▴ 630

0

Entering edit mode

On Apr 5, 2005, at 12:37 AM, Cyrus Harmon wrote: > > On Apr 4, 2005, at 8:14 PM, Sean Davis wrote: > What do you mean by "change the order ... in my.toptable? I get the > obvious part, but the question is more of a mechanical one, having > done my.toptable <- toptable? How do I edit the ord <- line? Clearly, > I can put the function definition in an emacs/ESS buffer and eval the > function def, but is there a better way to do this? The REPL is very > nice, but the model of eval'ing function defs or regions one at a time > in emacs buffers seems somewhat cloddish. Back to the parallel, with > slime, is there a nice way to make this change take effect? It seems > that the problem is magnified if I'm trying to develop an R extension > as I have to do R CMD INSTALL in order to get the change to take > effect in the place I eventually want to use it. I realize I've gone > totally off of the topic from the original question, but if the > preferred model of tweaking packages like this is as you've described, > I feel like I must be missing something about the mechanics of writing > and eval'ing R code. > If you never, EVER, want the limma topTable and toptable functions back, by all means, just edit the code in the limma package directly (without making the my.topTable and my.toptable copies), but this has obvious risks. I do this only when I have a function from a package that I need to change, but that is called by many other functions within that package. Short of replacing the functions within limma, you have at least three options. R lets you save your workspace, which includes functions that you have defined. Therefore, after you edit your function and eval it in R once (or if you follow the copy route that I suggested), you will have my.topTable and my.toptable in your workspace. Save that workspace and each time you reload it, your functions will be there, ready to use. No need to build a package, install it, load it, etc. Option number 2 is to "save" the functions using save. Something like: save(file='my.limma.replacements.RData',list=c('my.toptable','my.topTa bl e')) will save my.toptable and my.topTable in a file of the name given. Then, you can: load('my.limma.replacements.RData') and your functions are back in your workspace for you to use. Finally, you can go the route of saving your files into your own package. I understand that re-installing a package is a few extra steps, but is it REALLY that hard? I've given at least three systems that R allows for you to make your functions available to you without modifying the limma source code--choose the one you like. >> If you make a copy of topTable called my.topTable by: >> >> my.topTable <- topTable >> >> and change it so that it calls my.toptable instead of toptable, you >> now have your own function called my.topTable that does what you >> want. You can of course make any other changes to the functions that >> you want--add your own options, etc. The simple task of looking at >> others' code is quite powerful when dealing with issues like the one >> you bring up. I would encourage all who use bioconductor and R to >> try it whenever possible; even if it doesn't all make sense, it is a >> very good way to learn. > > Sure, but I'd hope that package maintainers were open to well- written > and documented patches that added the functionality to the library > itself, rather than having tons of local copies of possibly out of > date lying code. I suppose, going back to my previous question, I > could store my.toptable.diff and apply the diff on the fly and iff the > patch succeeds eval a modified my.toptable, but that seems a bit > hokey. > I think the package maintainers ARE open to well-written and documented patches that add functionality to the library. However, I also think that they need to (and generally are very good at) balancing the needs of the few with the needs of the many (not that this applies in your case) and maintaining a flexible but not too "kitchen-sink" approach to general functions. >> >>> (Combining my question and my gripe, a sort by "m" that didn't do >>> abs(M) would seem useful to me, but perhaps I'm missing something.) >> >> If you are not typically a programmer in bioconductor, this seems >> like a good chance to try your hand at it. If you get something that >> you like better than what Gordon has offered in Limma, send him the >> modified code. He, like most of bioconductor/R developers, is >> remarkably receptive and responsive to criticism/improvements. > > This is great to hear. I feel like I'm having trouble figuring out how > to develop mid-size projects in R. Clearly, typing R commands straight > into the REPL is a nice way to play around, and clearly the R > extension mechanism is a great way to package up R extensions for > distribution, but for developing my own mid-size R packages, I'm still > unclear on reasonable idioms for putting together my own mid-size > projects. So far the best I've come up with is local packages that > still need to be R CMD INSTALL'ed, but to a local directory, and then > library(lib.loc=<some-nice-local-path>) in my scripts, but this topic > has probably been previously covered ad nauseum. Time to go digging > through the docs and r-help archives. > You can set R to load whatever libraries you would like at startup, obviating the need to remember to do it yourself. As for developing your own projects, I use ESS for code development (C-c C-d to edit an object in the workspace, C-c C-l to load a buffer back into the workspace). I typically go the "save" route from above for simple functions that stand on their own or "replace" other functions for the life of a project, but if I use it in more than one project, I typically put it into a fairly unstructured library that contains frequently-used functions and little documentation. The next level is a full-blown, well-documented, tested package for public consumption. While "programmers" would argue that every function needs to be fully documented, etc., I don't do that for my little utility packages (and sometimes pay for my laziness), but obviously for public consumption, documentation is ABSOLUTELY necessary. Again, hope this helps. Sean

ADD REPLY • link 19.1 years ago Sean Davis 21k

score 0 · Answer 2 · 2005-04-05

0

Entering edit mode

Gordon Smyth 50k

@gordon-smyth

Last seen 4 hours ago

WEHI, Melbourne, Australia

>Date: Mon, 4 Apr 2005 14:22:09 -0700 >From: Cyrus Harmon <ch-bioc@bobobeach.com> >Subject: [BioC] limma topTable question >To: bioconductor@stat.math.ethz.ch > >Dear bioconductors, > >Forgive me if this is in the limma user's guide, but I'd like to be >able to find the top, say, 100 probe sets in topTable sorted by M, not >abs(M). I understand that I can do this with sort and resort and then >filter out the negative ones, but this seems a bit clunky and in a >sense isn't quite what I've asked, as the top 100 probe sets sorted by >M after filtering by abs(M) aren't guaranteed to be the same thing, >unless there are <= 100 probe sets. Yes, I can take the whole resorted >table and take the top 100 rows, but that seems to be sort of defeating >the purpose of topTable. > >Would it be possible to add another option for M instead of abs(M) for >sort.by? I am not convinced that this would be a useful option. Microarray differential expression analyses are virtually always two-sided, for good reason, because researchers need to know about genes moving strongly down as well as up. Hence topTable() is designed to facilitate a two-sided analysis. You are implying that you want to do an anlaysis in which you don't want to even see the genes moving down. Why do you think that this is a generally useful analysis? I haven't seen any microarray problems which I would want to analyse that way. If you have a need which is specific to your own situation, and not likely to be of wide interest, then Sean and Kasper have explained how you can make your own functions. >Also, the description of what is going on here in the help page is >rather cryptic. It is only in the discussion of resort.by that the abs >thing comes up. Furthermore, a better description of what "M" "A", "T", >"P" and "B" are would be helpful. You don't say, but I assume that you are refering to the list of possible values for the 'sort.by' argument to 'topTable'. Confusion about ranking on M vs abs(M) does not normally arise, because microarray analysis are virtually always two-sided and topTable() provides the behaviour which users generally expect. > They are each discussed in the >context of the Values (for M, t, p and b) and in the Arguments (for A), >but it would be nice if in the details section, this was recapitulated, >perhaps with a description of what an A value is, See the User's Guide section "Statistics for differential expression". > and if the case >options were spelled out. It seems the only t and p are allowed to be >lowercase, but it's not clear why. >(Combining my question and my gripe, a sort by "m" that didn't do >abs(M) would seem useful to me, but perhaps I'm missing something.) R is a case-sensitive language, and it doesn't seem such a hardship to use a capital "M" in sort.by="M", as per the documentation. The reason that "M", "A" and "B" are treated strictly as upper-case is because these symbols are always uppercase in the published microarray literature. Gordon >Thanks, > >Cyrus

ADD COMMENT • link 19.1 years ago Gordon Smyth 50k

0

Entering edit mode

Dear Gordon, Thanks for your comments. I have attached some additional remarks below. Please note that these comments all fall in the minor nit category, my immediate problems have been solved, and I'm very pleased with limma. That being said, I have a few comments that I think speak to making limma more useable by novice users such as myself. On Apr 5, 2005, at 4:37 PM, Gordon Smyth wrote: > I am not convinced that this would be a useful option. Microarray > differential expression analyses are virtually always two-sided, for > good reason, because researchers need to know about genes moving > strongly down as well as up. Hence topTable() is designed to > facilitate a two-sided analysis. You are implying that you want to do > an anlaysis in which you don't want to even see the genes moving down. > Why do you think that this is a generally useful analysis? I haven't > seen any microarray problems which I would want to analyse that way. I can certainly see the case for being interested in genes moving up and down. The reason I wanted the particular feature described was that I had hacked together a simple way of looking at the top and bottom N genes for a given expt. relative to the others in a set via either, basically, M or A. Limma, of course, offers a much better way of doing this and more with contrasts and fitting. However, I still wanted to see the top and bottom N for comparison with what I had done before. I agree that both up- and down-regulated genes are interesting, but by sorting by abs(M), we are limiting ourselves to the top N in abs(M). In this case I wanted top N/2 by M and bottom N/2 by M. I certainly don't claim to be an expert on any of this; I do claim to be "surprised" by the fact that sorting on M was sorting on abs(M). (Granted it was easy enough to figure out what was going on, but the principle of least surprise is always good.) > If you have a need which is specific to your own situation, and not > likely to be of wide interest, then Sean and Kasper have explained how > you can make your own functions. Works for me. I now have options 'U' and 'D' which do the obvious thing with M descending and ascending. >> Also, the description of what is going on here in the help page is >> rather cryptic. It is only in the discussion of resort.by that the abs >> thing comes up. Furthermore, a better description of what "M" "A", >> "T", >> "P" and "B" are would be helpful. > > You don't say, but I assume that you are refering to the list of > possible values for the 'sort.by' argument to 'topTable'. Confusion > about ranking on M vs abs(M) does not normally arise, because > microarray analysis are virtually always two-sided and topTable() > provides the behaviour which users generally expect. Perhaps that should read "experienced users" as I expected something different, but clearly I'm new to all of this. >> They are each discussed in the >> context of the Values (for M, t, p and b) and in the Arguments (for >> A), >> but it would be nice if in the details section, this was >> recapitulated, >> perhaps with a description of what an A value is, > > See the User's Guide section "Statistics for differential expression". Fair enough, but a simple recapitulation of what these options mean in the topTable help page would help folks like me. As it is written, the options are enumerated for sort.by and resort.by and M, T, P, and B are described as columns of the return value, but it isn't explicitly stated that the input values to sort.by and resort.by are, basically, but not strictly, column names of the output. I suppose it's obvious, but a clearer description of the argument would help and would provide a place to inform the user that selecting sort.by="M" really means abs(M). >> and if the case >> options were spelled out. It seems the only t and p are allowed to be >> lowercase, but it's not clear why. >> (Combining my question and my gripe, a sort by "m" that didn't do >> abs(M) would seem useful to me, but perhaps I'm missing something.) > > R is a case-sensitive language, and it doesn't seem such a hardship to > use a capital "M" in sort.by="M", as per the documentation. The reason > that "M", "A" and "B" are treated strictly as upper-case is because > these symbols are always uppercase in the published microarray > literature. It's no hardship, I'm just arguing for completeness of the docs. If you accept 'p' and 't' as synonyms, for 'P' and 'T', say so. Thanks for your efforts in creating limma! It's a rather handy tool and I look forward to using it more. Sincerely, Cyrus

ADD REPLY • link 19.1 years ago Cyrus Harmon ▴ 140

0

Entering edit mode

At 11:32 AM 6/04/2005, Cyrus Harmon wrote: >Dear Gordon, > >Thanks for your comments. I have attached some additional remarks below. >Please note that these comments all fall in the minor nit category, my >immediate problems have been solved, and I'm very pleased with limma. That >being said, I have a few comments that I think speak to making limma more >useable by novice users such as myself. > >On Apr 5, 2005, at 4:37 PM, Gordon Smyth wrote: > >>I am not convinced that this would be a useful option. Microarray >>differential expression analyses are virtually always two-sided, for good >>reason, because researchers need to know about genes moving strongly down >>as well as up. Hence topTable() is designed to facilitate a two- sided >>analysis. You are implying that you want to do an anlaysis in which you >>don't want to even see the genes moving down. Why do you think that this >>is a generally useful analysis? I haven't seen any microarray problems >>which I would want to analyse that way. > >I can certainly see the case for being interested in genes moving up and >down. The reason I wanted the particular feature described was that I had >hacked together a simple way of looking at the top and bottom N genes for >a given expt. relative to the others in a set via either, basically, M or >A. Limma, of course, offers a much better way of doing this and more with >contrasts and fitting. However, I still wanted to see the top and bottom N >for comparison with what I had done before. I agree that both up- and >down-regulated genes are interesting, but by sorting by abs(M), we are >limiting ourselves to the top N in abs(M). In this case I wanted top N/2 >by M and bottom N/2 by M. I certainly don't claim to be an expert on any >of this; I do claim to be "surprised" by the fact that sorting on M was >sorting on abs(M). (Granted it was easy enough to figure out what was >going on, but the principle of least surprise is always good.) I tend to aim the limma package at wet-lab biological collaborators, and tend to put into the package only things that I feel confident about recommending to them. (This isn't true of everything on bioconductor -- much of bioconductor is aimed more at computational biologists or software developers.) The trouble with looking for a fixed number of genes up and a fixed number down is that the cutoff threshold, in terms of evidence for differential expression, will be different for the two lists. This is why I am reluctant to recommend it to biological collaborators and to include it as an option in topTable(). This isn't to say that getting N/2 genes up and down isn't useful for some purposes, e.g., comparisons of analysis methodologies. It is just that this sort of work (meta analysis) tends to be done by experienced programmers, probably including yourself, rather than by wet-lab biologists, and these people tend to want modular tools rather than ready-made solutions. What I would like to add is a facility in limma to get a topTable like summary for any specified subset of genes rather than by ranking. This might be added to the topTable() function, but there are other ways as well. One can already select a subset of an MArrayLM object. E.g., to reduce to only the top 100 genes in terms of fold change up for a linear model with two coefs: o <- order(fit$coef[,2], decreasing=TRUE) fit.top100 <- fit[o[1:100],] although this facility still has some bugs in the current limma version. You can also select only the coefficient of interest: fit.top100 <- fit.top100[,2] In the next limma version you will be able to coerce a fitted model object to a data.frame, e.g., tab <- as.data.frame(fit.top100) which gives you something very much like the topTable() summary. which gives you something very >>If you have a need which is specific to your own situation, and not >>likely to be of wide interest, then Sean and Kasper have explained how >>you can make your own functions. > >Works for me. I now have options 'U' and 'D' which do the obvious thing >with M descending and ascending. > >>>Also, the description of what is going on here in the help page is >>>rather cryptic. It is only in the discussion of resort.by that the abs >>>thing comes up. Furthermore, a better description of what "M" "A", "T", >>>"P" and "B" are would be helpful. >> >>You don't say, but I assume that you are refering to the list of possible >>values for the 'sort.by' argument to 'topTable'. Confusion about ranking >>on M vs abs(M) does not normally arise, because microarray analysis are >>virtually always two-sided and topTable() provides the behaviour which >>users generally expect. > >Perhaps that should read "experienced users" as I expected something >different, but clearly I'm new to all of this. I am guessing that you come from a programming background, and programmers would naturally expect the behaviour to be procedural. My experience with non-mathematical biologists is that they expect the behaviour to be two-sided. Anyway, I will make it more explicit in the documentation. >>> They are each discussed in the >>>context of the Values (for M, t, p and b) and in the Arguments (for A), >>>but it would be nice if in the details section, this was recapitulated, >>>perhaps with a description of what an A value is, >> >>See the User's Guide section "Statistics for differential expression". > >Fair enough, but a simple recapitulation of what these options mean in the >topTable help page would help folks like me. As it is written, the options >are enumerated for sort.by and resort.by and M, T, P, and B are described >as columns of the return value, but it isn't explicitly stated that the >input values to sort.by and resort.by are, basically, but not strictly, >column names of the output. I suppose it's obvious, but a clearer >description of the argument would help and would provide a place to inform >the user that selecting sort.by="M" really means abs(M). > >>> and if the case >>>options were spelled out. It seems the only t and p are allowed to be >>>lowercase, but it's not clear why. >>>(Combining my question and my gripe, a sort by "m" that didn't do >>>abs(M) would seem useful to me, but perhaps I'm missing something.) >> >>R is a case-sensitive language, and it doesn't seem such a hardship to >>use a capital "M" in sort.by="M", as per the documentation. The reason >>that "M", "A" and "B" are treated strictly as upper-case is because these >>symbols are always uppercase in the published microarray literature. > >It's no hardship, I'm just arguing for completeness of the docs. If you >accept 'p' and 't' as synonyms, for 'P' and 'T', say so. I will make it more explicit. Gordon >Thanks for your efforts in creating limma! It's a rather handy tool and I >look forward to using it more. > >Sincerely, > >Cyrus >

ADD REPLY • link 19.1 years ago Gordon Smyth 50k

score 0 · Answer 3 · 2005-04-06

On Apr 5, 2005, at 4:37 PM, Gordon Smyth wrote: > I am not convinced that this would be a useful option. Microarray > differential expression analyses are virtually always two-sided, for > good reason, because researchers need to know about genes moving > strongly down as well as up. Hence topTable() is designed to > facilitate a two-sided analysis. You are implying that you want to do > an anlaysis in which you don't want to even see the genes moving down. > Why do you think that this is a generally useful analysis? I haven't > seen any microarray problems which I would want to analyse that way. Although my own thinking is the same as Gordon's w.r.t. two-sided analyses, I have actually noticed this phenomenon several times in collaboration. That is, a much greater interest in up-regulated genes than down-regulated genes. I gather there are at least two reasonably good explanations for this: 1) If the next step is some sort of intervention aimed at changing transcriptional activity, it is easier and/or more likely to be an attempt to decrease or completely suppress transcription than to increase it. Hence, the preference for finding overexpressed genes. 2) First, take it as given that the microarray technology only has hope (under the usual protocols for preparing the mRNA) of measuring expression changes in a gene's *share* of a typical mRNA pool, not absolute changes on a copies per cell basis. Then, to construct a highly artificial example that makes the point, imagine that the treatment induces substantially higher absolute expression of some genes, but leaves the expression of most everything else practically the same (this *is* on an absolute, copies per cell basis). Maybe some type of stress response is being provoked and studied. Then, the true relative share fold-changes we hope to estimate with array data will evidence a mix of up and down regulation. But we only care about identifying the genes that are up-regulated. I make no claims about these one-sided analyses being generally advisable, but since it does come up, these are my attempt to understand why. Jenny Bryan