Question

ReportingTools error

James W. MacDonald

Hi,

I am getting an error when trying to create HTML pages with ReportingTools, using an MArrayLM object as input. The error I get is

```
Error in expression.dat[probe, ] : subscript out of bounds
```

which appears to come from .make.gene.plots(), specifically here:

```r
for (probe in rownames(df)) {
    if ("Symbol" %in% colnames(df)) {
        ylab <- paste(df[probe, "Symbol"], ylab.type)
    }
    else {
        ylab <- paste(probe, ylab.type)
    }
    bigplot <- stripplot(expression.dat[probe, ] ~ factor,
```

The problem is that the row names of a topTable object will be the row numbers of the MArrayLM object from which the data came (this was recently harmonized by Gordon Smyth, so the row.names will always be the row number, regardless of using topTable() or topTableF()).

In other words, it appears that probe is assumed to be the row name, when in fact it will be the row number. So something like

```r
for (probe in as.numeric(rownames(df))) {
```

should do the trick.

Best,

Jim

--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
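A minimal base-R sketch of the failure and of a safer variant of the proposed fix, assuming toy data: the objects expression.dat and df, and the probe and gene labels, are hypothetical stand-ins, and the table's row names are assumed to be row numbers of the original data. One caveat worth checking: if probe is converted with as.numeric() for the whole loop, df[probe, "Symbol"] then selects by position in the sorted table rather than by row name, so the annotation lookup should keep the character name while only the matrix lookup uses the integer.

```r
# Hypothetical toy data: three probes measured on two arrays.
expression.dat <- matrix(
  c(5.1, 5.3,
    8.2, 8.4,
    2.0, 2.2),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("probeA", "probeB", "probeC"), c("s1", "s2"))
)

# A topTable-style result: sorted by significance, with row names that are
# the row numbers of the original data ("2" is probeB, "3" is probeC).
df <- data.frame(
  Symbol = c("GENEB", "GENEC"),
  logFC = c(3.1, -1.2),
  row.names = c("2", "3"),
  stringsAsFactors = FALSE
)

# Indexing the matrix with the character "2" looks for a row named "2",
# which does not exist, reproducing the reported error.
err <- tryCatch(expression.dat["2", ], error = conditionMessage)

# Using the numeric value for df too picks the wrong annotation row:
# position 2 of the sorted table is GENEC, not the row named "2".
wrong <- df[as.numeric("2"), "Symbol"]

# Safer: look up annotation by the character row name, and use the
# integer only to index the expression matrix.
ylabs <- character(0)
plotted <- list()
for (probe in rownames(df)) {
  ylabs[probe] <- paste(df[probe, "Symbol"], "expression")
  plotted[[probe]] <- expression.dat[as.integer(probe), ]
}
```

The same split (name for the table, position for the matrix) applies to any other per-probe lookup inside that loop.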


Written 11.0 years ago by James W. MacDonald; updated 11.0 years ago by Jason Hackney

Answer 1 · 2013-04-11

Jason Hackney

Hi Jim,

Could you send me your sessionInfo? I'm having trouble replicating this bug. I'm still getting probe names for topTableF and row numbers for topTable, as of limma_3.16.1. I'll pop in a bug fix to the ReportingTools trunk tomorrow, once I get the limma version sorted.

Thanks,

Jason

On Thu, Apr 11, 2013 at 11:05 AM, James W. MacDonald <jmacdon@uw.edu> wrote:
> [snip]

--
Jason A. Hackney, Ph.D.
Bioinformatics and Computational Biology
Genentech
hackney.jason@gene.com
650-467-5084
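While topTable() and topTableF() disagree, code consuming their output has to cope with both conventions Jason describes. This sketch uses a hypothetical helper, rowIndexFor() (not part of limma or ReportingTools), that first matches the table's row names against the expression matrix's row names and only then falls back to treating them as row numbers. It is ambiguous when probe IDs are themselves integer-like strings, so treat it as an illustration under that assumption rather than a robust fix.

```r
# Hypothetical helper: map the row names of a topTable()/topTableF() result
# back to row positions of the expression matrix passed to lmFit().
rowIndexFor <- function(tab, expr) {
  rn <- rownames(tab)
  # Case 1: row names are probe IDs (what topTableF() returned here).
  pos <- match(rn, rownames(expr))
  if (!anyNA(pos)) return(pos)
  # Case 2: row names are row numbers (what topTable() returned here).
  num <- suppressWarnings(as.integer(rn))
  if (!anyNA(num) && all(num >= 1L & num <= nrow(expr))) return(num)
  stop("cannot map table row names to rows of the expression matrix")
}

# Toy data (hypothetical): the same two hits, labelled both ways.
expr <- matrix(1:6, nrow = 3,
               dimnames = list(c("p1", "p2", "p3"), c("a1", "a2")))
byName <- data.frame(logFC = c(2, -1), row.names = c("p3", "p1"))
byNumber <- data.frame(logFC = c(2, -1), row.names = c("3", "1"))
rowIndexFor(byName, expr)    # 3 1
rowIndexFor(byNumber, expr)  # 3 1
```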

Comment by Jason Hackney, 11.0 years ago

Hi Jason,

I see the same thing. I had an email exchange with Gordon back in February and he agreed that the row.names of the output from topTable and topTableF should be the same thing, and it looked like he was leaning towards using the row numbers. Given the speed with which he updates things in limma, I assumed this happened approximately 13 nanoseconds later, but evidently it either fell through the cracks or he had a change of mind (Gordon is cc'ed).

But I wonder if the ID column is a better way to go anyway.

Gordon, what is the safest way to use data from either topTable or topTableF to extract the corresponding raw data from the input object? Is the ID column guaranteed to always correspond to the row.names or featureNames of the data passed into lmFit?

Best,

Jim

```
> sessionInfo()
R version 3.0.0 (2013-04-03)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=C                 LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
 [1] ReportingTools_2.1.2      knitr_1.2
 [3] lattice_0.20-15           affycoretools_1.32.0
 [5] KEGG.db_2.9.0             GO.db_2.9.0
 [7] AnnotationDbi_1.22.1      affy_1.38.0
 [9] pd.ragene.1.0.st.v1_3.8.0 RSQLite_0.11.2
[11] DBI_0.2-5                 limma_3.16.1
[13] oligo_1.24.0              Biobase_2.20.0
[15] oligoClasses_1.22.0       BiocGenerics_0.6.0

[snip]
```

On 4/11/2013 9:24 PM, Jason Hackney wrote:
> [snip]

Reply by James W. MacDonald, 11.0 years ago

Hi Jim,

My concern about relying on the ID field is that it isn't always there. For instance, when I add featureData to an eSet, I almost always name the identifier column ProbeID for a microarray, or GeneID if I'm using some other identifier as my featureNames. When lmFit is called, the resulting genes data.frame therefore doesn't have an ID column.

What I might do is try to detect the ID column in either case, and use it if it's present.

I expect that when/if topTableF and topTable become concordant in their row.names, I'll know about it, because one of my unit tests will fail: they are currently expected to be discordant.

Cheers,

Jason

On Fri, Apr 12, 2013 at 6:33 AM, James W. MacDonald <jmacdon@uw.edu> wrote:
> [snip]
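Jason's plan to detect an ID column and fall back otherwise might look like the following base-R sketch. The helper name featureIDs() and the list of candidate column names are assumptions for illustration ("ID" is the column lmFit() creates from row names; "ProbeID" and "GeneID" follow the featureData naming Jason describes). Per Gordon's explanation elsewhere in this thread, an ID column built from matrix row names may contain duplicates, in which case these identifiers will not uniquely identify rows.

```r
# Hypothetical helper: pick the feature identifiers from a topTable-style
# table, preferring an ID-like column and falling back to the row names.
featureIDs <- function(tab, idCols = c("ID", "ProbeID", "GeneID")) {
  hit <- intersect(idCols, colnames(tab))  # keeps the order of idCols
  if (length(hit) > 0L) {
    as.character(tab[[hit[1L]]])           # first matching column wins
  } else {
    rownames(tab)                          # no ID-like column present
  }
}
```

The order of idCols sets the priority when a table carries more than one candidate column.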

Reply by Jason Hackney, 11.0 years ago

Hi Jim and Jason,

As Jim says, I didn't intend topTable() and topTableF() to be different in terms of row.names. The topTableF() function was written later, and I was more careful to carry through the official row names of the data object than when I wrote topTable(). I haven't fixed this discordance because it isn't clear to me which way is actually better.

If lmFit() worked only on ExpressionSet objects, then it would be straightforward. All ExpressionSet objects have unique row names, which could be percolated through to the fitted MArrayLM object and to the topTable output. However, lmFit() is designed to work on plain matrices and other types of objects. A matrix might not have row names set and, if it has, the row names do not have to be unique. Hence it is impossible to require in general that the MArrayLM and the topTable objects have the same row names. If the matrix has duplicate row names, which is perfectly allowed, then the MArrayLM object will have them too, but topTable cannot have duplicate row names because it is a data.frame.

When lmFit() gets a matrix without row names, it leaves the row names NULL. When the fit object gets to topTable(), topTable() adds row names 1:N (where N is the number of rows in the fit object).

When lmFit() gets a matrix with row names, it creates a data.frame 'genes' with column ID. This then becomes a column in the topTable() output. The row names are set to 1:N as well. Note that ID can have duplicate values.

If lmFit() gets an ExpressionSet object with featureData, it copies all the featureData into the 'genes' data.frame, which becomes part of the topTable output.

If lmFit() gets an ExpressionSet object without featureData, then it creates a data.frame with a column ID containing the row.names, just as it does for matrices. This time ID cannot have duplicate values.

Note that lmFit() creates a column ID in the 'genes' data.frame only when row.names are provided but no other valid probe annotation is available.
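The constraint described here, that a matrix may have duplicate or missing row names while a data.frame may not have duplicates, can be seen directly in base R. This toy example uses no limma functions, and all object names are hypothetical.

```r
# A plain matrix, the kind of input lmFit() accepts, may carry duplicate
# row names (for example, replicate spots of the same probe) ...
m <- matrix(c(1, 2, 3, 4), nrow = 2,
            dimnames = list(c("spot1", "spot1"), c("a1", "a2")))
dupNames <- rownames(m)

# ... or no row names at all.
m2 <- matrix(c(1, 2, 3, 4), nrow = 2)
noNames <- is.null(rownames(m2))

# A data.frame (what topTable() returns) rejects duplicate row names, so
# the matrix's names cannot simply be carried over as row names.
tab <- data.frame(logFC = c(0.5, -0.3))
res <- tryCatch({
  suppressWarnings(rownames(tab) <- dupNames)
  "assigned"
}, error = conditionMessage)

print(dupNames)  # "spot1" "spot1"
print(noNames)   # TRUE
print(res)       # error message about duplicate 'row.names'
```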
If a data.frame of annotation is provided, then lmFit() simply uses the columns that are provided. This is why a column ID is sometimes but not always available.

Here is my tentative solution:

1. lmFit() will start enforcing unique row names in MArrayLM objects. If no row names are provided, then row names will be set to 1:N. If the row names of the expression matrix are not unique, then they will be made unique by makeUnique().
2. lmFit() will no longer create an annotation column called 'ID'. Instead the relevant information will be held in the row.names.
3. topTable() will preserve the row.names found in the MArrayLM fit object, same as topTableF() does currently.

How does that sound?

Possible cons: when row names are non-unique, the makeUnique() function will create row names previously unknown to the user. Also, I wonder how many people will miss the numerical row numbers currently output by topTable?

Regards

Gordon

---------------------------------------------
Professor Gordon K Smyth,
Bioinformatics Division,
Walter and Eliza Hall Institute of Medical Research,
1G Royal Parade, Parkville, Vic 3052, Australia.
Tel: (03) 9345 2326, Fax (03) 9347 0852,
http://www.statsci.org/smyth

On Fri, 12 Apr 2013, Jason Hackney wrote:
> [snip]
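A rough sketch of the proposed policy: enforce unique row names on the fit (1:N when none are given, de-duplicated otherwise) and carry them through to the table. Both helpers are hypothetical and are not limma code; base R's make.unique() is used as a stand-in for limma's makeUnique(), whose naming of duplicates may differ, and the toy table simply ranks rows by their mean rather than by any test statistic.

```r
# Hypothetical helper: give a matrix unique, non-NULL row names.
normalizeRowNames <- function(x) {
  rn <- rownames(x)
  if (is.null(rn)) {
    rn <- as.character(seq_len(nrow(x)))  # no names supplied: use 1:N
  } else if (anyDuplicated(rn) > 0L) {
    rn <- make.unique(rn)                 # duplicates: make them unique
  }
  rownames(x) <- rn
  x
}

# Hypothetical toy "top table": ranks rows by their mean and keeps the
# (now unique) row names instead of replacing them with row numbers.
toyTopTable <- function(x, n = nrow(x)) {
  x <- normalizeRowNames(x)
  avg <- rowMeans(x)
  ord <- head(order(avg, decreasing = TRUE), n)
  data.frame(AveExpr = avg[ord], row.names = rownames(x)[ord])
}

m <- matrix(c(1, 5, 3, 1, 5, 3), nrow = 3,
            dimnames = list(c("a", "a", "b"), c("s1", "s2")))
rownames(toyTopTable(m))  # "a.1" "b" "a"
```

Under this policy the table's row names are always valid keys back into the (renamed) data, at the cost Gordon notes: duplicated inputs acquire names the user never supplied.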

Reply by Gordon Smyth, 11.0 years ago

Dear Jason,

In my own use of limma, I never subset topTable output by rowname, so it isn't clear to me why ReportingTools needs to do this. To me, topTable already does the desired subsetting, so further subsetting should never be required.

More generally, I am unclear why ReportingTools needs to operate on an MArrayLM object. Why not operate on the topTable() output directly?

Regards

Gordon

On Fri, 12 Apr 2013, Jason Hackney wrote:
> [snip]

Reply by Gordon Smyth, 11.0 years ago

Hi Gordon,

Thanks for the very thorough comments.

> In my own use of limma, I never subset topTable output by rowname, so it
> isn't clear to me why ReportingTools needs to do this. To me, topTable
> already does the desired subsetting, so further subsetting should never be
> required.

We actually don't do any subsetting of the topTable or topTableF results. Instead, we take those results and further decorate them with, e.g., some simple plots of expression levels in the different treatment groups.

> More generally, I am unclear why ReportingTools needs to operate on an
> MArrayLM object. Why not operate on the topTable() output directly?

In prior versions, it was nice to be able to dispatch on the object that represented the statistical test being done. So, for microarray differential expression, we would dispatch on the MArrayLM object. This allowed the methods to have some basic functionality for the different types of analysis that we were using. Currently, it should be possible to dispatch on the topTable results directly, though that would require adding a couple of methods on our side. This is likely more in line with the direction we've been headed, since so many of the results we'd like to dispatch on don't actually have explicitly classed outputs.

Now, as to your suggestions from your previous e-mail:

> Here is my tentative solution:
> 1. lmFit() will start enforcing unique row names in MArrayLM objects. If
> no row names are provided, then row names will be set to 1:N. If the row
> names of the expression matrix are not unique, then they will be made
> unique by makeUnique().
> 2. lmFit() will no longer create an annotation column called 'ID'. Instead
> the relevant information will be held in the row.names.
> 3. topTable() will preserve the row.names found in the MArrayLM fit
> object, same as topTableF does currently.
> How does that sound?

This sounds totally reasonable to me.
I'm currently just tracking what you're doing with topTable and topTableF, and have no particularly strong preference on whether you rewrite them to be consistent in row.names or not. It would simplify our code a small amount, but I'm happy either to maintain things as they are or to move forward with this solution. Our use cases tend to be heavily eSet-based, so our main concern is how to provide annotation for genes when it's not available in the fData or 'genes' data.frame.

> Possible cons: when rownames are non-unique, the makeUnique() function
> will create rownames previously unknown to the user. Also I wonder how
> many people will miss the numerical row numbers currently output by
> topTable?

I can see how this might be problematic when dealing with array designs with replicated features, but since we're so heavily eSet-based, it's more of a theoretical concern for us: you can't have duplicate row.names for assay data in an eSet.

Thanks again for your comments, and let me know when/if you decide to make these changes. We'll likely follow suit fairly quickly, since our package unit tests should fail once this is changed.

Cheers,

Jason
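The dispatch idea Jason describes, choosing report behaviour from the class of the result object, can be sketched with S3 methods on a classed table. This is only an illustration: describeResult() and the deTable class are hypothetical, and ReportingTools' own methods are not shown here. Because topTable() returns a plain data.frame, a method can target data.frame directly, and a caller-supplied subclass tag lets a more specific method run first and then defer via NextMethod().

```r
# Hypothetical S3 generic: behaviour is chosen from the class of x.
describeResult <- function(x, ...) UseMethod("describeResult")

# Fallback for any object without a more specific method.
describeResult.default <- function(x, ...) {
  sprintf("%s object", class(x)[1])
}

# topTable() output is a plain data.frame, so a method can target it.
describeResult.data.frame <- function(x, ...) {
  sprintf("table of %d features with columns: %s",
          nrow(x), paste(colnames(x), collapse = ", "))
}

# A subclass tag added by the caller runs first, then defers.
describeResult.deTable <- function(x, ...) {
  paste("differential expression", NextMethod())
}

tt <- data.frame(logFC = c(1, -2), P.Value = c(0.01, 0.2))
describeResult(tt)
class(tt) <- c("deTable", class(tt))
describeResult(tt)
```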

Reply by Jason Hackney, 11.0 years ago