Analysis HumanHT-12 Illumina
Dear all,

I have some problems with the analysis of the HumanHT-12 chip from Illumina and hope somebody can help me.

I have been analysing the data with the GenomeStudio software until now. Because some of the bead types are underrepresented on the array, Illumina implemented a so-called "imputing function". Tech support told me that it would not make a big difference whether the imputing function is used or not. However, when I compared the results of both methods (no imputing vs. imputing), I found that the "imputing function" leads to more than twice as many differentially expressed genes (167 vs. 396).

Is there an analogous function or package in R and Bioconductor?

Any kind of advice is welcome. Thank you very much in advance!

Kind regards,
Stefanie

Dipl.Chem. Stefanie Figura
Leibniz-Institut für Arterioskleroseforschung
Department Genetische Epidemiologie vaskulärer Erkrankungen
Domagkstrasse 3
48149 Münster
Dear all,

I am using topGO (elim) to find over-represented GO terms (three example output terms are shown below). I am confused by the output columns "Significant" and "Expected". If "Significant" < "Expected" (as for "cellular process" and "cellular metabolic process"), does that mean the GO terms are under-represented, whereas "Significant" > "Expected" (as for "cell cycle") means the terms are over-represented? If so, does topGO always report under-represented GO terms as well? Or have I completely misunderstood it?

Biological Process:

GO.ID       Term                        Annotated  Significant  Expected  elim
GO:0009987  cellular process                 8686           42     50.55  2.1e-06
GO:0007049  cell cycle                        278            5      1.62  0.00031
GO:0044237  cellular metabolic process       4985           19     29.01  0.01706

Could someone help me understand this?

Thank you very much,
Jean

On Jul 15, 2009, at 1:22 PM, Stefanie Figura wrote:
> [...]
Hi Jean,

First of all, by default the functions in topGO test for over-represented GO terms. One can, however, define a test statistic that tests for under-representation.

Your case is a bit extreme: judging from your table you have very few significant genes, so the results look a bit strange. However, your confusion comes from misinterpreting the columns of the table. The "Annotated", "Significant" and "Expected" columns show statistics computed for each GO term from the complete annotations, meaning that the "true path rule" is used to annotate the genes to higher-level terms. The "Expected" column shows an estimate of the number of significant genes that a node of size "Annotated" would have if the significant genes were randomly selected from the gene universe.

Now, if you used the "classic" algorithm to test for over-representation, all GO terms with significant p-values would have "Significant" > "Expected". This is not the case with methods like "elim" or "weight", which remove or weight genes annotated to GO terms when computing the significance. When genes are "removed", the counts for the specific GO term change, and so does the ratio between "Significant" and "Expected". Hence the confusion.

The way the results are displayed in the table may seem a bit counterintuitive, but I use the table mainly to compare results between different methods, and the three columns mentioned above help me get an overview of the GO annotations. It would also not be trivial to show the statistics as updated after genes are removed or weighted, and I think they would be even more confusing. The resulting p-values serve that purpose.

I hope things are clearer now.

Kind regards,
Adrian

On Wed, Jul 15, 2009 at 2:46 PM, jiayu wen <jiayu.jean.wen at gmail.com> wrote:
> [...]
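The "Expected" column and the classic over-representation test described above can be illustrated with a small, self-contained calculation. The sketch below is plain Python and does not use topGO. The universe size (12000) and the total number of significant genes (70) are hypothetical values chosen for illustration, since the thread does not state them; only the "Annotated" and "Significant" counts are taken from Jean's table.

```python
from math import comb

# Hypothetical totals (not given in the thread): size of the gene universe
# and number of significant genes in it.
UNIVERSE = 12000
N_SIGNIFICANT = 70


def expected_count(annotated, n_significant=N_SIGNIFICANT, universe=UNIVERSE):
    """Significant genes expected in a term if they were picked at random."""
    return annotated * n_significant / universe


def upper_tail_p(observed, annotated, n_significant=N_SIGNIFICANT, universe=UNIVERSE):
    """Hypergeometric P(X >= observed): a classic over-representation test."""
    top = min(annotated, n_significant)
    favourable = sum(
        comb(annotated, i) * comb(universe - annotated, n_significant - i)
        for i in range(observed, top + 1)
    )
    return favourable / comb(universe, n_significant)


# (term, Annotated, Significant) as reported in the table above.
terms = [
    ("cellular process", 8686, 42),
    ("cell cycle", 278, 5),
    ("cellular metabolic process", 4985, 19),
]

for name, annotated, significant in terms:
    exp = expected_count(annotated)
    p = upper_tail_p(significant, annotated)
    print(f"{name}: significant={significant}, expected={exp:.2f}, classic p={p:.3g}")
```

Under these assumed totals, "cell cycle" (5 observed against roughly 1.6 expected) gets a small upper-tail p-value, while "cellular process" (42 against roughly 51) does not. This matches the point that a classic test only flags terms with "Significant" > "Expected"; the elim p-values in the table are computed on counts that change after genes are removed, which this sketch does not attempt to model.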
Dear Stefanie,

I cannot comment directly on that "imputing function", but the way you decide that one method produces 167 differentially expressed genes and the other 396 could itself be problematic: this may be a thresholding artifact (in one case, ~140 genes just make it over your threshold and in the other they don't), or your dataset may in fact be dominated by noise in both cases. Of course, it could also be that this "imputing function" introduces substantial distortions (I cannot judge this). You will need to look at your data more carefully to figure this out.

Best wishes
Wolfgang

--
Wolfgang Huber
EMBL
http://www.embl.de/research/units/genome_biology/huber
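One way to check for the thresholding artifact mentioned above is to look at the genes that are called by only one of the two analyses and ask how far from the cutoff they sit. The following is a minimal sketch in plain Python, not a Bioconductor function. The gene names, the p-values, the 0.05 cutoff and the `near_factor` parameter are all made up for illustration and would be replaced by the per-gene values exported from each analysis.

```python
def compare_calls(p_a, p_b, cutoff=0.05, near_factor=2.0):
    """Compare significance calls from two analyses of the same genes.

    p_a, p_b: dicts mapping gene ID -> (adjusted) p-value.
    A discordant gene counts as a "near miss" when its p-value in the
    analysis that did not call it is still below cutoff * near_factor.
    """
    genes = p_a.keys() & p_b.keys()
    sig_a = {g for g in genes if p_a[g] < cutoff}
    sig_b = {g for g in genes if p_b[g] < cutoff}
    discordant = sig_a ^ sig_b  # called by exactly one analysis
    near = {g for g in discordant if max(p_a[g], p_b[g]) < cutoff * near_factor}
    return {
        "both": len(sig_a & sig_b),
        "only_a": len(sig_a - sig_b),
        "only_b": len(sig_b - sig_a),
        "near_miss": len(near),
    }


# Toy values for illustration only (hypothetical genes and p-values).
no_imputing = {"g1": 0.001, "g2": 0.04, "g3": 0.06, "g4": 0.07, "g5": 0.60}
imputing = {"g1": 0.002, "g2": 0.03, "g3": 0.04, "g4": 0.045, "g5": 0.01}

print(compare_calls(no_imputing, imputing))
```

If most discordant genes turn out to be near misses, the difference between the two gene counts probably says more about the cutoff than about the imputation; if many genes move a long way, as the hypothetical `g5` does here, the imputation itself deserves a closer look.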