Hi netter!
In most microarray slides a single gene will be represented by multiple
items. Sometimes this is unforeseeable, because the items have different
GenBank accession numbers and you will not find the duplicates until you
get a UniGene list for all your gene items.
Now I have a data frame. The rows are gene records (accession number,
UniGene ID and expression values in different conditions): the 1st
column is GenBank accession numbers, the 2nd column is UniGene IDs, and
the 3rd column onward holds the different conditions. All the accession
numbers are unique, but through the UniGene IDs I can see that some
items, though they have different accession numbers, in fact share the
same UniGene ID. I would like to find the gene records with replicate
UniGene IDs and merge them into one record by averaging the expression
values within each condition.
Could anyone give me a clue about how to write the code? Or are there
any contributed functions that can do this?
Thanks a lot!
I'm not entirely sure this will work in its current form. I've adapted
it from a routine I use to do this with expression sets, so maybe some
typecasting or transformation to the proper class types is needed. Your
data is assumed to be in the dataf variable.
# Named vector of unigene IDs (names are the row names of dataf),
# keeping only the rows that have a correct (non-empty) mapping
geneIds <- setNames(as.character(dataf[, 2]), rownames(dataf))
geneIds <- geneIds[!is.na(geneIds) & geneIds != ""]
# Subset the expression values as a numeric matrix, keeping row names
ex <- as.matrix(dataf[, -(1:2)], rownames.force = TRUE)
# Average the given rows of ex column by column
mean.row <- function(rows) colMeans(ex[rows, , drop = FALSE], na.rm = TRUE)
# Make a list that contains the combined row names for each unigene ID
newrows <- split(names(geneIds), geneIds)
# t() is needed because sapply() returns one column per unigene ID
exn <- t(sapply(newrows, mean.row))
# The unigene IDs are the row names of the result
rownames(exn) <- names(newrows)
Jan Oosting
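For anyone who wants to try the split-and-average idea before running it on real data, here is a minimal self-contained sketch. The toy data frame, its column names (acc, unigene, cond1, cond2) and all values are made up for illustration; it only assumes the layout described in the question (accession in column 1, unigene ID in column 2, conditions after that).

```r
# Toy data frame in the layout described in the question (values are made up):
# column 1 = accession number, column 2 = unigene ID, remaining = conditions.
dataf <- data.frame(
  acc     = c("AA001", "AA002", "AA003", "AA004"),
  unigene = c("Hs.1", "Hs.2", "Hs.1", ""),
  cond1   = c(1, 2, 3, 4),
  cond2   = c(5, 6, 7, 8),
  stringsAsFactors = FALSE
)

# Keep only the rows that have a non-empty unigene ID
keep <- !is.na(dataf$unigene) & dataf$unigene != ""
ex   <- as.matrix(dataf[keep, -(1:2)])

# Row indices of ex, grouped by unigene ID
groups <- split(seq_len(nrow(ex)), dataf$unigene[keep])

# One row of column means per unigene ID; rbind keeps the IDs as row names
exn <- do.call(rbind, lapply(groups, function(i) {
  colMeans(ex[i, , drop = FALSE], na.rm = TRUE)
}))
print(exn)
```

Grouping integer row indices rather than row names sidesteps any dependence on how the row names of the data frame happen to be set.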
> -----Original Message-----
> From: bioconductor-bounces@stat.math.ethz.ch
> [mailto:bioconductor-bounces@stat.math.ethz.ch] On Behalf Of zhihua li
> Sent: Wednesday, 16 March 2005 08:33
> To: bioconductor@stat.math.ethz.ch
> Subject: [BioC] finding and averaging replicate gene records
> _______________________________________________
> Bioconductor mailing list
> Bioconductor@stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
>
Try aggregate() or tapply(). See the example below, where "A" appears
twice.
m <- data.frame( ID=c("A", "B", "A", "C"), array1=1:4, array2=5:8 )
m
  ID array1 array2
1  A      1      5
2  B      2      6
3  A      3      7
4  C      4      8
aggregate(m[ ,-1], list(GENE=m$ID), mean, na.rm=TRUE)
  GENE array1 array2
1    A      2      6
2    B      2      6
3    C      4      8
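Since tapply() is mentioned but not shown, here is a rough sketch of the same averaging done with it. It rebuilds the same toy data frame m; tapply() works on one vector at a time, so the sketch loops over the array columns with sapply(). The name avg is just for illustration.

```r
m <- data.frame(ID = c("A", "B", "A", "C"), array1 = 1:4, array2 = 5:8)

# tapply() handles one column at a time, so apply it to each array column
avg <- sapply(m[, -1], function(x) tapply(x, m$ID, mean, na.rm = TRUE))
print(avg)
```

The result is a numeric matrix with one row per ID, which can be convenient when the next step expects a matrix rather than a data frame.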
On Wed, 2005-03-16 at 09:33 +0100, Oosting, J. (PATH) wrote:
> I'm not entirely sure this will work in its current form. I've
> adapted it from a routine I use to do this with expression sets, so
> maybe some typecasting or transformation to the proper class types is
> needed. Your data is in the dataf variable
I generally do NOT do this. While it seems that there should be one
value per gene, we know that this generally isn't true in practice. By
averaging you gain little (a few fewer genes going into multiple-testing
correction), but you stand to lose a huge amount. In the worst-case
scenario, you take a "differentially expressed" probe, average it with a
poor-performing probe, and end up not finding the gene of interest. If
you do not merge those probes, you find that one probe representing the
gene IS differentially expressed and the other is not. You then, of
course, have to determine why the two probes for the same gene behave
differently, but there are many possible explanations, including probe
sequence contamination, transcript variants, array-specific effects
(such as non-uniform background), and faulty bioinformatics (UniGene may
place two sequences for different genes into the same cluster, for
example).
In short, you probably agree that you want to find ALL genes of interest
and then use biological validation where necessary to determine the
relevance of the genes you find. Averaging expression values per gene,
however, nearly guarantees that you will sometimes miss genes of
interest and so is, in my opinion, not warranted.
Sean
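To make the worst-case scenario concrete, here is a small made-up illustration (not real data): two probes for the same gene, one with a clear shift between groups and one that is just noisy. The names probe_good, probe_noisy and the values are hypothetical, and a plain Welch t-test stands in for whatever test one would really use.

```r
group <- factor(rep(c("control", "treated"), each = 3))

# Made-up log-scale values for two probes annotated to the same gene
probe_good  <- c(5.0, 5.1, 4.9, 7.0, 7.1, 6.9)  # clear shift between groups
probe_noisy <- c(9.0, 2.0, 6.0, 3.0, 8.0, 1.0)  # poorly performing probe

# Merging the two probes by averaging, sample by sample
averaged <- (probe_good + probe_noisy) / 2

p_good     <- t.test(probe_good ~ group)$p.value
p_averaged <- t.test(averaged ~ group)$p.value

print(c(good = p_good, averaged = p_averaged))
```

In this toy case the good probe on its own gives a very small p-value, while the averaged record does not come close to significance, so the gene would be missed after merging.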
If, after my last email, you still want to do this, look at ?aggregate.
# Set up example
> df <- data.frame(unigene=rep(c(letters[1:20]),5),matrix(rnorm(500),ncol=5))
> dim(df)
[1] 100   6
> df[1:5,]
  unigene          X1         X2         X3         X4         X5
1       a  0.30812107 -0.5310621 -0.9040957  0.7344379 -0.3356904
2       b -0.02764356  0.6196045 -1.2049073  1.3074086  1.7878118
3       c  0.79936647 -0.3430772  1.3319157 -0.1716195  1.5824703
4       d -1.52298039  0.7400511  1.6654934 -0.4796782 -1.6517931
5       e  0.20252950  0.6735963 -0.8631246 -1.2338265  0.8597014
# Aggregate the array values by "unigene" using mean.
> df.unigene <- aggregate(df[,2:6],by=list(df$unigene),mean)
> df.unigene
   Group.1          X1         X2           X3          X4          X5
 1       a  0.27894974  0.3096306 -0.157369445 -0.02390716 -0.79865210
 2       b -0.04005511  0.2069963  0.058276319  0.37695956  0.58892920
 3       c  0.53853115 -0.7227620  0.542803169  0.72844079  0.33116364
 4       d  0.04374438 -0.3302130  1.492462908 -0.19048229 -0.90463987
 5       e -0.22403553  0.5079245  0.627224848 -1.30206042 -0.16849414
 6       f -0.41708465 -0.9070749  0.133871146 -0.21337473 -0.20061087
 7       g -0.38204229  0.6069678  0.050874510 -0.29334777 -0.11172384
 8       h  0.58768574 -0.4863774  0.120376561 -0.31349966 -0.23951493
 9       i -0.80005434 -0.3891139 -0.001995542 -0.17148142  0.06971404
10       j -0.35626038  0.8415595 -0.207348416  0.03932772 -0.09372701
11       k -0.30889392 -1.0870044 -0.447545956 -0.48184160 -0.10491062
12       l -0.47169100 -0.1602827  1.084106985 -0.26736429  0.08239815
13       m -0.12285248 -0.4367895  0.354743839  0.10013901  0.42580119
14       n -0.17691859 -0.8934232  0.399016113  0.73876068  0.61432185
15       o -0.08250122  0.6402547  0.029047584 -0.30060666  0.36726071
16       p -0.20336659  0.2853576 -0.272979841 -0.57747797  0.24284977
17       q  0.00947679 -0.3849657 -0.198965209 -0.38048787 -0.87557376
18       r  0.30445158  0.4110414  0.181761757 -0.21715431  0.23009438
19       s -0.30325431 -0.1010338 -0.298426526 -1.23178516 -0.37827590
20       t -0.30316005 -0.4389324 -1.050242565  0.12818715 -0.31785596
> dim(df.unigene)
[1] 20  6
Agreeing with Sean here. The last time I had to reduce each gene to a
single metric, using Affy data, I found that taking the probe set with
the maximum average value across all chips in the dataset worked well
(e.g. in two-group situations the resulting choices tended to be probe
sets with smaller, if not the smallest, P values).
Tom
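A rough sketch of the maximum-average rule, for anyone who wants to try it. The expression matrix, the probe set names (ps1 to ps4) and the probe-set-to-gene mapping are all made up; with real Affy data the mapping would come from the chip annotation.

```r
# Made-up expression matrix: rows are probe sets, columns are chips
ex <- rbind(
  ps1 = c(8.1, 8.3, 7.9),
  ps2 = c(5.0, 5.2, 4.8),
  ps3 = c(6.5, 6.4, 6.6),
  ps4 = c(9.0, 9.2, 8.8)
)
# Hypothetical mapping from probe set to gene
gene <- c(ps1 = "geneA", ps2 = "geneA", ps3 = "geneB", ps4 = "geneB")

# Average of each probe set across all chips
avg <- rowMeans(ex)

# For each gene, the name of the probe set with the largest average
best <- sapply(split(avg, gene[names(avg)]),
               function(a) names(a)[which.max(a)])

# Keep one row per gene, labelled by gene
ex_reduced <- ex[best, , drop = FALSE]
rownames(ex_reduced) <- names(best)
print(best)
print(ex_reduced)
```

Unlike averaging, this keeps the measured values of one real probe set per gene, so nothing is blended across probes.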
----- Original Message -----
From: "Sean Davis" <sdavis2@mail.nih.gov>
To: "zhihua li" <lzhtom@hotmail.com>
Cc: <bioconductor@stat.math.ethz.ch>
Sent: Wednesday, March 16, 2005 6:51 AM
Subject: Re: [BioC] finding and averaging replicate gene records
On Mar 16, 2005, at 8:31 AM, Tomas Radivoyevitch wrote:
> Agreeing with Sean here, in my last experience where I had to reduce
> each gene to a single metric, using Affy data I found that taking the
> probe set with the maximum average value across all chips in the
> dataset worked well [e.g. in two group situations the resulting
> choices tended to be probe sets with smaller (if not the smallest) P
> values].
This may work well with Affy, where lower values are perhaps less
"stable" than higher values, but I'm not sure it would work in every
situation. For example, on other platforms the spot with the maximum
average may indicate scanner saturation. Moving to ratios, choosing the
genes with the highest (or lowest) ratio may indicate a lack of
expression (or, for the lowest ratio, saturation) in the reference
sample; in neither case would these genes be "believable", and another
probe for the same gene might point that out.
Seeing Tomas's point, if one does go ahead and summarize probes into
genes, care must be taken to choose an appropriate summary measure, and
it should be noted that such summaries might bias which genes are found
(and, more importantly, validated or not).
Sean
Agreed, my statements are strictly for Affy data.
As a separate remark, one thing I liked about using the maximum average,
rather than, say, a P value, to pick the probe set to focus on is that
the rule can be applied across different designs without concerns about
statistical assumptions and choices of tests. For example, I also used
maximum averages to pick out "useful" probe sets for one-group
time-course data.
Tom
----- Original Message -----
From: "Sean Davis" <sdavis2@mail.nih.gov>
To: "Tomas Radivoyevitch" <radivot@hal.epbi.cwru.edu>
Cc: <bioconductor@stat.math.ethz.ch>
Sent: Wednesday, March 16, 2005 8:48 AM
Subject: Re: [BioC] finding and averaging replicate gene records
Thanks to all for your replies.
It is true that by averaging expression values for (putatively) the same
gene we will lose some information. But sometimes the reduction in data
size matters more, especially when one is trying to run a
computationally expensive algorithm on the data. So I think that
averaging is sometimes worthwhile.
Thanks again!
Not only will you lose information, you might also obtain the wrong
information! If a man has one foot in a bucket of freezing ice and the
other in a bucket of boiling water, then he _should_ be comfortable at
50 degrees Celsius on average.
I had a look at the HG-U133 Plus 2.0 CDF, which has 54675 probesets, of
which 47297 have a unigene ID mapping. This is the distribution of
unigene ID occurrence:
    1     2     3     4     5     6     7     8
12590  5501  2815  1508   741   384   170   106
    9    10    11    12    13    14    15    19
   45    27    18     8     7     3     4     1
(That means 12590 unigene IDs are represented by a single probeset on
the array, 5501 by two probesets, ..., and 1 by 19 probesets.)
In short, you can reduce from 47297 to 23929 unique genes. Add the 7378
probesets without a unigene ID and your final reduced dataset has 31307
rows.
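For reference, a distribution like the one above can be obtained with a nested table() call. This is only a sketch on a tiny made-up mapping vector (ids), where "" stands for a probeset without a unigene ID; it is not the actual CDF annotation.

```r
# Hypothetical probeset-to-unigene mapping ("" means no mapping)
ids <- c("Hs.1", "Hs.2", "Hs.1", "Hs.3", "", "Hs.1", "Hs.2", "")

mapped <- ids[ids != ""]

# How many probesets map to each unigene ID ...
per_id <- table(mapped)
# ... and how many unigene IDs occur once, twice, three times, ...
occurrence <- table(per_id)
print(occurrence)

# Rows left after merging: unique IDs plus the unmapped probesets
n_reduced <- length(per_id) + sum(ids == "")
print(n_reduced)
```

In the toy vector each of the occurrence counts 1, 2 and 3 is seen for exactly one ID, and merging would leave 5 rows out of the original 8.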
I do not think that the computational savings from working with 31307
rows instead of 54675 rows justify the risk of averaging important
genes with noisy ones. Besides, unigene IDs change every couple of
months, and you may have to redo your analysis over and over again,
thereby diminishing any computational savings you may have had.
I am in favour of approaches that work on summary statistics (e.g. the
minimum p-value for a unigene ID).
Regards, Adai
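As an illustration of working on summary statistics instead of averaged expression values, here is a minimal sketch that keeps, for each unigene ID, the probeset with the smallest p-value. The data frame res, its column names and the p-values are all made up; the p-values would come from whatever per-probeset test was run first.

```r
# Made-up per-probeset p-values with their unigene IDs
res <- data.frame(
  probeset = c("ps1", "ps2", "ps3", "ps4", "ps5"),
  unigene  = c("Hs.1", "Hs.1", "Hs.2", "Hs.3", "Hs.3"),
  p.value  = c(0.20, 0.001, 0.04, 0.50, 0.30),
  stringsAsFactors = FALSE
)

# Order by p-value, then keep the first (smallest) row per unigene ID
res  <- res[order(res$p.value), ]
best <- res[!duplicated(res$unigene), ]
best <- best[order(best$unigene), ]
print(best)
```

The analysis itself still runs on all probesets; only the reporting is collapsed to one line per unigene ID, so a change in the unigene mapping only requires redoing this last step.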
On Thu, 2005-03-17 at 03:19 +0000, zhihua li wrote:
> Thanks to all your reply.
>
> It is true that by averaging expression values for (putatively) the
> same gene we will lose some information. But sometimes it's the
> reduction of the data size that is more favorable. Especially when one
> is trying to perform a computation-consuming algorithm to one's data.
> So I think maybe sometimes it's worthy to do averaging.
>
> Thanks again!