Affymetrix Intronic Normalization Control Probes Differentially Expressed?
@alexandra-munoz-5174
Last seen 10.2 years ago
Hi Jim et al.,

I am attempting to analyze data from humans exposed to arsenic in vivo, from an Affymetrix Human Gene 1.0 ST Array. I have generated differentially expressed gene lists using LIMMA and an ANOVA.

I am encountering a high number of control probes as top genes in both lists and am not sure if I should be ignoring this information, removing it, or utilizing it. I found an earlier post which seemed to be related and which directed the user to identify the type of probe in order to determine whether its differential expression may have been an error resulting from batch effects (http://article.gmane.org/gmane.science.biology.informatics.conductor/28952/match=control+probes), though based on the category of my probes I'm not sure how to proceed. According to the NetAffx online tool, my control probes fall into the category of "intronic normalization control". It doesn't make sense to me that they would be in the top genes list, and I would appreciate any help on how to interpret their presence and, if necessary, how to remove them from the analysis prior to the list generation.

Here is an example of some of the probe numbers I am getting:

7892503 7892505 7892551 7892558 7892571 7892581 7892589 7892633
7892675 7892676 7892689 7892729 7892738 7892753 7892757 7892788

Thank you,
Alexandra Munoz
NYU PhD Student - Molecular and genetic toxicology
probe limma Category • 2.0k views
@james-w-macdonald-5106
Last seen 1 day ago
United States
Hi Alexandra,

On 3/18/2012 11:36 PM, Alexandra Muñoz wrote:
> I am attempting to analyze data from humans exposed to arsenic in vivo from
> an affymetrix Human Gene 1.0 ST Array. I have generated differentially
> expressed genes lists using LIMMA and an ANOVA.

In vivo arsenic exposure? Wow.

> I am encountering a high number of control probes as top genes in both
> lists and am not sure if I should be ignoring this information, removing
> it, or utilizing it.

You might want to talk with the folks who processed the samples and arrays to see if there is anything that might explain this. I would normally not worry if there were just a few control probes in the differential gene list, but if there are lots of them it may indicate some technical artifact that isn't being controlled for by the normalization procedure.

Although I wouldn't advocate simply removing them without figuring out why they are showing up, it is easy to do.
The pd.hugene.1.0.st.v1 package has the information you need:

> library(pd.hugene.1.0.st.v1)
> con <- db(pd.hugene.1.0.st.v1)
> dbGetQuery(con, "select * from type_dict;")
1 1 main
2 2 control->affx
3 3 control->chip
4 4 control->bgp->antigenomic
5 5 control->bgp->genomic
6 6 normgene->exon
7 7 normgene->intron
8 8 rescue->FLmRNA->unmapped

so you just want the 'main' probes. You can get the probeset IDs from the featureSet table.

> mains <- dbGetQuery(con, "select fsetid from featureSet where type='1';")[,1]

Then you can subset out any non-main probesets using this vector and, e.g., the %in% function.

Best,

Jim

--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
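For readers following along, the %in% subsetting step Jim mentions might look roughly like the sketch below. It uses a small toy matrix in place of real expression data so that it runs in base R; the object names expr and expr_main, the sample names, the expression values, and the IDs placed in mains are all assumptions made for illustration. In a real analysis mains would be the vector returned by the featureSet query above.

```r
# Toy stand-in for an expression matrix: rows are probeset IDs, columns are samples.
# The values and sample names are invented for illustration only.
expr <- matrix(
  c(7.1, 7.3,
    5.2, 5.9,
    8.8, 8.7,
    6.0, 6.4),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("7892503", "7892505", "7896737", "7896739"),
                  c("control_1", "exposed_1"))
)

# Stand-in for the `mains` vector from the featureSet query.
# These IDs are hypothetical placeholders, not a real list of main probesets.
mains <- c(7896737, 7896739, 7896741)

# Keep only the rows whose probeset ID appears in `mains`.
keep <- rownames(expr) %in% as.character(mains)
expr_main <- expr[keep, , drop = FALSE]

rownames(expr_main)
```

With a real ExpressionSet (called eset here, also an assumption), the same idea would presumably be eset[featureNames(eset) %in% mains, ], applied before fitting the linear model so that control probesets never enter the gene list. As Jim notes, it is still worth finding out why the controls appear before dropping them.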
On Mon, Mar 19, 2012 at 9:59 AM, James W. MacDonald wrote:
> On 3/18/2012 11:36 PM, Alexandra Muñoz wrote:
>> I am attempting to analyze data from humans exposed to arsenic in vivo
>> from an affymetrix Human Gene 1.0 ST Array.
>
> In vivo arsenic exposure? Wow.

Don't worry ... they were just grad students ...

-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
| Memorial Sloan-Kettering Cancer Center
| Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
Paul Geeleher ★ 1.3k
@paul-geeleher-2679
Last seen 10.2 years ago
It's possible that there is a real difference; see this paper for something similar:

http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0011657

It's hard to say, though, without knowing more about the biological question and just how significant the p-values are.

Paul

--
Paul Geeleher (PhD Student)
School of Mathematics, Statistics and Applied Mathematics
National University of Ireland
Galway
Ireland
--
www.bioinformaticstutorials.com