Hi,
I am getting an error when trying to create HTML pages with
ReportingTools, using an MArrayLM object as input. The error I get is
Error in expression.dat[probe, ] : subscript out of bounds
which appears to come from .make.gene.plots(), specifically here:
for (probe in rownames(df)) {
if ("Symbol" %in% colnames(df)) {
ylab <- paste(df[probe, "Symbol"], ylab.type)
}
else {
ylab <- paste(probe, ylab.type)
}
bigplot <- stripplot(expression.dat[probe, ] ~ factor,
The problem being that the rownames for a topTable object will be the
row numbers of the MArrayLM object from whence the data came (this was
recently harmonized by Gordon Smyth, so the row.names will always be
the
row number, regardless of using topTable() or topTableF()).
In other words, it appears that probe is assumed to be the row name,
when in fact it will be the row number. So something like
for(probe in as.numeric(rownames(df))){
should do the trick.
Best,
Jim
--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
Hi Jim,
Could you send me your sessionInfo? I'm having trouble replicating
this
bug. I'm still getting probe names for topTableF and row numbers for
topTable, as of limma_3.16.1. I'll pop in a bug fix to the
ReportingTools
trunk tomorrow, once I get the limma version sorted.
Thanks,
Jason
On Thu, Apr 11, 2013 at 11:05 AM, James W. MacDonald <jmacdon@uw.edu>
wrote:
> Hi,
>
> I am getting an error when trying to create HTML pages with
> ReportingTools, using an MArrayLM object as input. The error I get
is
>
> Error in expression.dat[probe, ] : subscript out of bounds
>
> which appears to come from .make.gene.plots(), specifically here:
>
> for (probe in rownames(df)) {
> if ("Symbol" %in% colnames(df)) {
> ylab <- paste(df[probe, "Symbol"], ylab.type)
> }
> else {
> ylab <- paste(probe, ylab.type)
> }
> bigplot <- stripplot(expression.dat[**probe, ] ~ factor,
>
> The problem being that the rownames for a topTable object will be
the row
> numbers of the MArrayLM object from whence the data came (this was
recently
> harmonized by Gordon Smyth, so the row.names will always be the row
number,
> regardless of using topTable() or topTableF()).
>
> In other words, it appears that probe is assumed to be the row name,
when
> in fact it will be the row number. So something like
>
> for(probe in as.numeric(rownames(df))){
>
> should do the trick.
>
> Best,
>
> Jim
>
>
>
> --
> James W. MacDonald, M.S.
> Biostatistician
> University of Washington
> Environmental and Occupational Health Sciences
> 4225 Roosevelt Way NE, # 100
> Seattle WA 98105-6099
>
>
--
Jason A. Hackney, Ph.D.
Bioinformatics and Computational Biology
Genentech
hackney.jason@gene.com
650-467-5084
[[alternative HTML version deleted]]
Hi Jason,
I see the same thing - I had an email exchange with Gordon back in
February and he agreed that the row.names of the output from topTable
and topTableF should be the same thing, and it looked like he was
leaning towards using the row numbers. Given the speed with which he
updates things in limma, I assumed this happened approximately 13
nanoseconds later, but evidently it either fell through the cracks or
he
had a change of mind (Gordon is cc'ed).
But I wonder if the ID column is a better way to go anyway.
Gordon - what is the safest way to use data from either topTable or
topTableF to extract the corresponding raw data from the input object?
Is the ID column guaranteed to always correspond to the row.names or
featureNames of the data passed into lmFit?
Best,
Jim
> sessionInfo()
R version 3.0.0 (2013-04-03)
Platform: x86_64-unknown-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=C LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats graphics grDevices utils datasets
methods
[8] base
other attached packages:
[1] ReportingTools_2.1.2 knitr_1.2
[3] lattice_0.20-15 affycoretools_1.32.0
[5] KEGG.db_2.9.0 GO.db_2.9.0
[7] AnnotationDbi_1.22.1 affy_1.38.0
[9] pd.ragene.1.0.st.v1_3.8.0 RSQLite_0.11.2
[11] DBI_0.2-5 limma_3.16.1
[13] oligo_1.24.0 Biobase_2.20.0
[15] oligoClasses_1.22.0 BiocGenerics_0.6.0
[snip]
On 4/11/2013 9:24 PM, Jason Hackney wrote:
> Hi Jim,
>
> Could you send me your sessionInfo? I'm having trouble replicating
> this bug. I'm still getting probe names for topTableF and row
numbers
> for topTable, as of limma_3.16.1. I'll pop in a bug fix to the
> ReportingTools trunk tomorrow, once I get the limma version sorted.
>
> Thanks,
>
> Jason
>
> On Thu, Apr 11, 2013 at 11:05 AM, James W. MacDonald <jmacdon at="" uw.edu=""> <mailto:jmacdon at="" uw.edu="">> wrote:
>
> Hi,
>
> I am getting an error when trying to create HTML pages with
> ReportingTools, using an MArrayLM object as input. The error I
get is
>
> Error in expression.dat[probe, ] : subscript out of bounds
>
> which appears to come from .make.gene.plots(), specifically
here:
>
> for (probe in rownames(df)) {
> if ("Symbol" %in% colnames(df)) {
> ylab <- paste(df[probe, "Symbol"], ylab.type)
> }
> else {
> ylab <- paste(probe, ylab.type)
> }
> bigplot <- stripplot(expression.dat[probe, ] ~ factor,
>
> The problem being that the rownames for a topTable object will
be
> the row numbers of the MArrayLM object from whence the data came
> (this was recently harmonized by Gordon Smyth, so the row.names
> will always be the row number, regardless of using topTable() or
> topTableF()).
>
> In other words, it appears that probe is assumed to be the row
> name, when in fact it will be the row number. So something like
>
> for(probe in as.numeric(rownames(df))){
>
> should do the trick.
>
> Best,
>
> Jim
>
>
>
> --
> James W. MacDonald, M.S.
> Biostatistician
> University of Washington
> Environmental and Occupational Health Sciences
> 4225 Roosevelt Way NE, # 100
> Seattle WA 98105-6099
>
>
>
>
> --
> Jason A. Hackney, Ph.D.
> Bioinformatics and Computational Biology
> Genentech
> hackney.jason at gene.com <mailto:hackney.jason at="" gene.com="">
> 650-467-5084
>
>
--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
Hi Jim,
My concern about relying on the ID field is that it isn't always
there. For
instance, when I add featureData to an eSet, I almost always specify
that
the ID is a ProbeID for a microarray or a GeneID if I'm using some
other
identifier as my featureNames. When lmFit is called, the genes
data.frame
now doesn't have an ID column.
What I might do is try to detect the ID column in either case, and use
it
if it's present.
I expect that when/if topTableF and topTable are concordant in their
row.names I'll know about because one of my unit tests will fail
because
they are expected to be discordant.
Cheers,
Jason
On Fri, Apr 12, 2013 at 6:33 AM, James W. MacDonald <jmacdon@uw.edu>
wrote:
> Hi Jason,
>
> I see the same thing - I had an email exchange with Gordon back in
> February and he agreed that the row.names of the output from
topTable and
> topTableF should be the same thing, and it looked like he was
leaning
> towards using the row numbers. Given the speed with which he updates
things
> in limma, I assumed this happened approximately 13 nanoseconds
later, but
> evidently it either fell through the cracks or he had a change of
mind
> (Gordon is cc'ed).
>
> But I wonder if the ID column is a better way to go anyway.
>
> Gordon - what is the safest way to use data from either topTable or
> topTableF to extract the corresponding raw data from the input
object? Is
> the ID column guaranteed to always correspond to the row.names or
> featureNames of the data passed into lmFit?
>
> Best,
>
> Jim
>
> > sessionInfo()
> R version 3.0.0 (2013-04-03)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
> [7] LC_PAPER=C LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] parallel stats graphics grDevices utils datasets
methods
> [8] base
>
> other attached packages:
> [1] ReportingTools_2.1.2 knitr_1.2
> [3] lattice_0.20-15 affycoretools_1.32.0
> [5] KEGG.db_2.9.0 GO.db_2.9.0
> [7] AnnotationDbi_1.22.1 affy_1.38.0
> [9] pd.ragene.1.0.st.v1_3.8.0 RSQLite_0.11.2
> [11] DBI_0.2-5 limma_3.16.1
> [13] oligo_1.24.0 Biobase_2.20.0
> [15] oligoClasses_1.22.0 BiocGenerics_0.6.0
>
> [snip]
>
>
>
>
> On 4/11/2013 9:24 PM, Jason Hackney wrote:
>
>> Hi Jim,
>>
>> Could you send me your sessionInfo? I'm having trouble replicating
this
>> bug. I'm still getting probe names for topTableF and row numbers
for
>> topTable, as of limma_3.16.1. I'll pop in a bug fix to the
ReportingTools
>> trunk tomorrow, once I get the limma version sorted.
>>
>> Thanks,
>>
>> Jason
>>
>> On Thu, Apr 11, 2013 at 11:05 AM, James W. MacDonald
<jmacdon@uw.edu<mailto:>> jmacdon@uw.edu>> wrote:
>>
>> Hi,
>>
>> I am getting an error when trying to create HTML pages with
>> ReportingTools, using an MArrayLM object as input. The error I
get is
>>
>> Error in expression.dat[probe, ] : subscript out of bounds
>>
>> which appears to come from .make.gene.plots(), specifically
here:
>>
>> for (probe in rownames(df)) {
>> if ("Symbol" %in% colnames(df)) {
>> ylab <- paste(df[probe, "Symbol"], ylab.type)
>> }
>> else {
>> ylab <- paste(probe, ylab.type)
>> }
>> bigplot <- stripplot(expression.dat[**probe, ] ~
factor,
>>
>> The problem being that the rownames for a topTable object will
be
>> the row numbers of the MArrayLM object from whence the data
came
>> (this was recently harmonized by Gordon Smyth, so the row.names
>> will always be the row number, regardless of using topTable()
or
>> topTableF()).
>>
>> In other words, it appears that probe is assumed to be the row
>> name, when in fact it will be the row number. So something like
>>
>> for(probe in as.numeric(rownames(df))){
>>
>> should do the trick.
>>
>> Best,
>>
>> Jim
>>
>>
>>
>> -- James W. MacDonald, M.S.
>> Biostatistician
>> University of Washington
>> Environmental and Occupational Health Sciences
>> 4225 Roosevelt Way NE, # 100
>> Seattle WA 98105-6099
>>
>>
>>
>>
>> --
>> Jason A. Hackney, Ph.D.
>> Bioinformatics and Computational Biology
>> Genentech
>> hackney.jason@gene.com <mailto:hackney.jason@gene.com**>
>> 650-467-5084
>>
>>
>>
> --
> James W. MacDonald, M.S.
> Biostatistician
> University of Washington
> Environmental and Occupational Health Sciences
> 4225 Roosevelt Way NE, # 100
> Seattle WA 98105-6099
>
>
--
Jason A. Hackney, Ph.D.
Bioinformatics and Computational Biology
Genentech
hackney.jason@gene.com
650-467-5084
[[alternative HTML version deleted]]
Hi Jim and Jason,
As Jim says, I didn't intend topTable() and topTableF() to be
different in
terms of row.names. The topTableF() function was written later, and I
was
more careful to carry through the official row names of the data
object
than when I wrote topTable(). I haven't fixed this discordance
because it
isn't clear to me which way is actually better.
If lmFit() worked only on ExpressionSet objects, then it would be
straightforward. All ExpressionSet objects have unique row names,
which
could be percolated through to the fitted MArrayLM object and to the
topTable output.
However lmFit() is designed to work on plain matrices and other types
of
objects. A matrix might not have row names set and, if it has, the
row
names do not have to be unique. Hence it is impossible to require in
general that the MArrayLM and the topTable objects have the same row
names. If the matrix has duplicate row names, which is perfectly
allowed,
then the MarrayLM will do so also, but topTable cannot have duplicate
row
names because it is a data.frame.
When, lmFit() gets a matrix without rownames, it leaves the row names
NULL. When the fit object gets to topTable(), topTable() adds
rownames
1:N (where N is the number of rows in the fit object).
When lmFit() gets a matrix with rownames, then it creates a data.frame
'genes' with column ID. This then becomes a column in the topTable()
output. The rownames are set to 1:N as well. Note that ID can have
duplicate values.
If lmFit() gets an ExpressionSet object with featureData, it copies
all
the featureData into the 'genes' data.frame, which becomes part of the
topTable output.
If lmFit() gets an ExpressionSet object without featureData, then it
creates a data.frame with a column ID containing the row.names, just
as it
does for matrices. This time ID cannot have duplicate values.
Note that lmFit() creates a column ID in the 'genes' data.frame only
when
row.names are provided but no other valid probe annotation is
available.
If a data.frame of annotation is provided, then lmFit() simply uses
the
columns that are provided. This is why a column ID is sometimes but
not
always available.
Here is my tentative solution:
1. lmFit() will start enforcing unique row names in MArrayLM objects.
If
no row names are provided, then row names will be set to 1:N. If the
row
names of the expression matrix are not unique, then they will be made
unique by makeUnique().
2. lmFit() will no longer create an annotation column called 'ID'.
Instead the relevant information will be held in the row.names.
2. topTable() will preserve the row.names found in the MArrayLM fit
object, same as topTableF does currently.
How does that sound?
Possible cons: when rownames are non-unique, the makeUnique()
function
will create rownames previously unknown to the user. Also I wonder
how
many people with miss the numerical row numbers currently output by
topTable?
Regards
Gordon
---------------------------------------------
Professor Gordon K Smyth,
Bioinformatics Division,
Walter and Eliza Hall Institute of Medical Research,
1G Royal Parade, Parkville, Vic 3052, Australia.
Tel: (03) 9345 2326, Fax (03) 9347 0852,
http://www.statsci.org/smyth
On Fri, 12 Apr 2013, Jason Hackney wrote:
> Hi Jim,
>
> My concern about relying on the ID field is that it isn't always
there. For
> instance, when I add featureData to an eSet, I almost always specify
that
> the ID is a ProbeID for a microarray or a GeneID if I'm using some
other
> identifier as my featureNames. When lmFit is called, the genes
data.frame
> now doesn't have an ID column.
>
> What I might do is try to detect the ID column in either case, and
use it
> if it's present.
>
> I expect that when/if topTableF and topTable are concordant in their
> row.names I'll know about because one of my unit tests will fail
because
> they are expected to be discordant.
>
> Cheers,
>
> Jason
>
> On Fri, Apr 12, 2013 at 6:33 AM, James W. MacDonald <jmacdon at="" uw.edu=""> wrote:
>
>> Hi Jason,
>>
>> I see the same thing - I had an email exchange with Gordon back in
>> February and he agreed that the row.names of the output from
topTable and
>> topTableF should be the same thing, and it looked like he was
leaning
>> towards using the row numbers. Given the speed with which he
updates things
>> in limma, I assumed this happened approximately 13 nanoseconds
later, but
>> evidently it either fell through the cracks or he had a change of
mind
>> (Gordon is cc'ed).
>>
>> But I wonder if the ID column is a better way to go anyway.
>>
>> Gordon - what is the safest way to use data from either topTable or
>> topTableF to extract the corresponding raw data from the input
object? Is
>> the ID column guaranteed to always correspond to the row.names or
>> featureNames of the data passed into lmFit?
>>
>> Best,
>>
>> Jim
>>
>>> sessionInfo()
>> R version 3.0.0 (2013-04-03)
>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>
>> locale:
>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
>> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
>> [7] LC_PAPER=C LC_NAME=C
>> [9] LC_ADDRESS=C LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] parallel stats graphics grDevices utils datasets
methods
>> [8] base
>>
>> other attached packages:
>> [1] ReportingTools_2.1.2 knitr_1.2
>> [3] lattice_0.20-15 affycoretools_1.32.0
>> [5] KEGG.db_2.9.0 GO.db_2.9.0
>> [7] AnnotationDbi_1.22.1 affy_1.38.0
>> [9] pd.ragene.1.0.st.v1_3.8.0 RSQLite_0.11.2
>> [11] DBI_0.2-5 limma_3.16.1
>> [13] oligo_1.24.0 Biobase_2.20.0
>> [15] oligoClasses_1.22.0 BiocGenerics_0.6.0
>>
>> [snip]
>>
>>
>>
>>
>> On 4/11/2013 9:24 PM, Jason Hackney wrote:
>>
>>> Hi Jim,
>>>
>>> Could you send me your sessionInfo? I'm having trouble replicating
this
>>> bug. I'm still getting probe names for topTableF and row numbers
for
>>> topTable, as of limma_3.16.1. I'll pop in a bug fix to the
ReportingTools
>>> trunk tomorrow, once I get the limma version sorted.
>>>
>>> Thanks,
>>>
>>> Jason
>>>
>>> On Thu, Apr 11, 2013 at 11:05 AM, James W. MacDonald <jmacdon at="" uw.edu<mailto:="">>> jmacdon at uw.edu>> wrote:
>>>
>>> Hi,
>>>
>>> I am getting an error when trying to create HTML pages with
>>> ReportingTools, using an MArrayLM object as input. The error I
get is
>>>
>>> Error in expression.dat[probe, ] : subscript out of bounds
>>>
>>> which appears to come from .make.gene.plots(), specifically
here:
>>>
>>> for (probe in rownames(df)) {
>>> if ("Symbol" %in% colnames(df)) {
>>> ylab <- paste(df[probe, "Symbol"], ylab.type)
>>> }
>>> else {
>>> ylab <- paste(probe, ylab.type)
>>> }
>>> bigplot <- stripplot(expression.dat[**probe, ] ~
factor,
>>>
>>> The problem being that the rownames for a topTable object will
be
>>> the row numbers of the MArrayLM object from whence the data
came
>>> (this was recently harmonized by Gordon Smyth, so the
row.names
>>> will always be the row number, regardless of using topTable()
or
>>> topTableF()).
>>>
>>> In other words, it appears that probe is assumed to be the row
>>> name, when in fact it will be the row number. So something
like
>>>
>>> for(probe in as.numeric(rownames(df))){
>>>
>>> should do the trick.
>>>
>>> Best,
>>>
>>> Jim
>>>
>>>
>>>
>>> -- James W. MacDonald, M.S.
>>> Biostatistician
>>> University of Washington
>>> Environmental and Occupational Health Sciences
>>> 4225 Roosevelt Way NE, # 100
>>> Seattle WA 98105-6099
>>>
>>>
>>>
>>>
>>> --
>>> Jason A. Hackney, Ph.D.
>>> Bioinformatics and Computational Biology
>>> Genentech
>>> hackney.jason at gene.com <mailto:hackney.jason at="" gene.com**="">
>>> 650-467-5084
______________________________________________________________________
The information in this email is confidential and
intend...{{dropped:4}}
Dear Jason,
In my own use of limma, I never subset topTable output by rowname, so
it
isn't clear to me why ReportTools needs to do this. To me, topTable
already does the desired subsetting, so further subsetting should
never be
required.
More generally, I am unclear why ReportTools needs to operate on a
MArrayLM object. Why not operate on the topTable() output directly?
Regards
Gordon
---------------------------------------------
Professor Gordon K Smyth,
Bioinformatics Division,
Walter and Eliza Hall Institute of Medical Research,
1G Royal Parade, Parkville, Vic 3052, Australia.
http://www.statsci.org/smyth
On Fri, 12 Apr 2013, Jason Hackney wrote:
> Hi Jim,
>
> My concern about relying on the ID field is that it isn't always
there. For
> instance, when I add featureData to an eSet, I almost always specify
that
> the ID is a ProbeID for a microarray or a GeneID if I'm using some
other
> identifier as my featureNames. When lmFit is called, the genes
data.frame
> now doesn't have an ID column.
>
> What I might do is try to detect the ID column in either case, and
use it
> if it's present.
>
> I expect that when/if topTableF and topTable are concordant in their
> row.names I'll know about because one of my unit tests will fail
because
> they are expected to be discordant.
>
> Cheers,
>
> Jason
>
> On Fri, Apr 12, 2013 at 6:33 AM, James W. MacDonald <jmacdon at="" uw.edu=""> wrote:
>
>> Hi Jason,
>>
>> I see the same thing - I had an email exchange with Gordon back in
>> February and he agreed that the row.names of the output from
topTable and
>> topTableF should be the same thing, and it looked like he was
leaning
>> towards using the row numbers. Given the speed with which he
updates things
>> in limma, I assumed this happened approximately 13 nanoseconds
later, but
>> evidently it either fell through the cracks or he had a change of
mind
>> (Gordon is cc'ed).
>>
>> But I wonder if the ID column is a better way to go anyway.
>>
>> Gordon - what is the safest way to use data from either topTable or
>> topTableF to extract the corresponding raw data from the input
object? Is
>> the ID column guaranteed to always correspond to the row.names or
>> featureNames of the data passed into lmFit?
>>
>> Best,
>>
>> Jim
>>
>>> sessionInfo()
>> R version 3.0.0 (2013-04-03)
>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>
>> locale:
>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
>> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
>> [7] LC_PAPER=C LC_NAME=C
>> [9] LC_ADDRESS=C LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] parallel stats graphics grDevices utils datasets
methods
>> [8] base
>>
>> other attached packages:
>> [1] ReportingTools_2.1.2 knitr_1.2
>> [3] lattice_0.20-15 affycoretools_1.32.0
>> [5] KEGG.db_2.9.0 GO.db_2.9.0
>> [7] AnnotationDbi_1.22.1 affy_1.38.0
>> [9] pd.ragene.1.0.st.v1_3.8.0 RSQLite_0.11.2
>> [11] DBI_0.2-5 limma_3.16.1
>> [13] oligo_1.24.0 Biobase_2.20.0
>> [15] oligoClasses_1.22.0 BiocGenerics_0.6.0
>>
>> [snip]
>>
>>
>>
>>
>> On 4/11/2013 9:24 PM, Jason Hackney wrote:
>>
>>> Hi Jim,
>>>
>>> Could you send me your sessionInfo? I'm having trouble replicating
this
>>> bug. I'm still getting probe names for topTableF and row numbers
for
>>> topTable, as of limma_3.16.1. I'll pop in a bug fix to the
ReportingTools
>>> trunk tomorrow, once I get the limma version sorted.
>>>
>>> Thanks,
>>>
>>> Jason
>>>
>>> On Thu, Apr 11, 2013 at 11:05 AM, James W. MacDonald <jmacdon at="" uw.edu<mailto:="">>> jmacdon at uw.edu>> wrote:
>>>
>>> Hi,
>>>
>>> I am getting an error when trying to create HTML pages with
>>> ReportingTools, using an MArrayLM object as input. The error I
get is
>>>
>>> Error in expression.dat[probe, ] : subscript out of bounds
>>>
>>> which appears to come from .make.gene.plots(), specifically
here:
>>>
>>> for (probe in rownames(df)) {
>>> if ("Symbol" %in% colnames(df)) {
>>> ylab <- paste(df[probe, "Symbol"], ylab.type)
>>> }
>>> else {
>>> ylab <- paste(probe, ylab.type)
>>> }
>>> bigplot <- stripplot(expression.dat[**probe, ] ~
factor,
>>>
>>> The problem being that the rownames for a topTable object will
be
>>> the row numbers of the MArrayLM object from whence the data
came
>>> (this was recently harmonized by Gordon Smyth, so the
row.names
>>> will always be the row number, regardless of using topTable()
or
>>> topTableF()).
>>>
>>> In other words, it appears that probe is assumed to be the row
>>> name, when in fact it will be the row number. So something
like
>>>
>>> for(probe in as.numeric(rownames(df))){
>>>
>>> should do the trick.
>>>
>>> Best,
>>>
>>> Jim
>>>
>>>
>>>
>>> -- James W. MacDonald, M.S.
>>> Biostatistician
>>> University of Washington
>>> Environmental and Occupational Health Sciences
>>> 4225 Roosevelt Way NE, # 100
>>> Seattle WA 98105-6099
>>>
>>>
>>>
>>>
>>> --
>>> Jason A. Hackney, Ph.D.
>>> Bioinformatics and Computational Biology
>>> Genentech
>>> hackney.jason at gene.com <mailto:hackney.jason at="" gene.com**="">
>>> 650-467-5084
>>>
>>>
>>>
>> --
>> James W. MacDonald, M.S.
>> Biostatistician
>> University of Washington
>> Environmental and Occupational Health Sciences
>> 4225 Roosevelt Way NE, # 100
>> Seattle WA 98105-6099
>>
>>
>
>
> --
> Jason A. Hackney, Ph.D.
> Bioinformatics and Computational Biology
> Genentech
> hackney.jason at gene.com
> 650-467-5084
>
______________________________________________________________________
The information in this email is confidential and
intend...{{dropped:4}}
Hi Gordon,
Thanks for the very thorough comments.
In my own use of limma, I never subset topTable output by rowname, so
it
> isn't clear to me why ReportTools needs to do this. To me, topTable
> already does the desired subsetting, so further subsetting should
never be
> required.
>
We actually don't do any subsetting of the topTable or topTableF
results,
and are instead taking those results and further decorating them, with
e.g.
some simple plots of expression levels in the different treatment
groups.
> More generally, I am unclear why ReportTools needs to operate on a
> MArrayLM object. Why not operate on the topTable() output directly?
>
>
In prior versions, it was nice to be able to do dispatch on the object
that
represented the statistical test that was being done. So, for
microarray
differential expression we would dispatch on the MArrayLM object. This
allowed the methods to have some basic functionality for the different
types of analysis that we were using. Currently, it should be possible
to
dispatch on the topTable results directly, though with the addition of
a
couple of methods on our side. This is likely more in line with the
direction we've been headed, since so many of the results we'd like to
dispatch on don't actually have explicitly classed outputs.
Now as to your suggestions from your previous e-mail:
Here is my tentative solution:
> 1. lmFit() will start enforcing unique row names in MArrayLM
objects. If
> no row names are provided, then row names will be set to 1:N. If
the row
> names of the expression matrix are not unique, then they will be
made
> unique by makeUnique().
> 2. lmFit() will no longer create an annotation column called 'ID'.
Instead
> the relevant information will be held in the row.names.
> 2. topTable() will preserve the row.names found in the MArrayLM fit
> object, same as topTableF does currently.
> How does that sound?
This sounds totally reasonable to me. I'm currently just tracking what
you're doing with topTable and topTableF, and have no particularly
strong
preference on whether you rewrite them to be consistent in row.names
or
not. It simplifies our code a small amount, but I'm happy to either
maintain as is, or to move forward with this solution. Our use cases
tend
to be heavily eSet-based, so our main concerns are how to provide
annotation for genes when it's not available in the fData or 'genes'
data.frame.
> Possible cons: when rownames are non-unique, the makeUnique()
function
> will create rownames previously unknown to the user. Also I wonder
how
> many people with miss the numerical row numbers currently output by
> topTable?
I can see how this might be problematic when dealing with array
designs
with replicated features, but since we're so heavily eSet-based, it's
more
of a theoretical concern for us, since you can't have duplicate
row.names
for assay data in an eSet.
Thanks again for your comments, and let me know when/if you decide to
make
these changes. We'll likely follow suit fairly quickly, since our
package
unit tests should fail once this is changed.
Cheers,
Jason
--
Jason A. Hackney, Ph.D.
Bioinformatics and Computational Biology
Genentech
hackney.jason@gene.com
650-467-5084
[[alternative HTML version deleted]]