Affymetrix Intronic Normalization Control Probes Differentially Expressed?

Alexandra Muñoz ▴ 10

@alexandra-munoz-5174

Last seen 9.6 years ago

Hi Jim et al.,

I am attempting to analyze data from humans exposed to arsenic in vivo, using the Affymetrix Human Gene 1.0 ST Array. I have generated lists of differentially expressed genes using LIMMA and an ANOVA.

I am encountering a high number of control probes among the top genes in both lists, and I am not sure whether I should ignore this information, remove it, or use it. I found an earlier post that seemed related; it directed the user to identify the type of probe in order to determine whether its differential expression might be an error resulting from batch effects (http://article.gmane.org/gmane.science.biology.informatics.conductor/28952/match=control+probes), but based on the category of my probes I'm not sure how to proceed. According to the NetAffx online tool, my control probes fall into the category "intronic normalization control". It doesn't make sense to me that they would be in the top genes list, and I would appreciate any help with how to interpret their presence and, if necessary, how to remove them from the analysis before generating the lists.

Here are some of the probe IDs I am getting:

7892503 7892505 7892551 7892558 7892571 7892581 7892589 7892633
7892675 7892676 7892689 7892729 7892738 7892753 7892757 7892788

Thank you,
Alexandra Munoz
NYU PhD Student - Molecular and genetic toxicology

probe limma Category • 1.9k views

updated 12.1 years ago by James W. MacDonald 65k • written 12.1 years ago by Alexandra Muñoz ▴ 10

James W. MacDonald 65k

@james-w-macdonald-5106

Last seen 10 hours ago

United States

Hi Alexandra,

On 3/18/2012 11:36 PM, Alexandra Muñoz wrote:
> Hi Jim et al.,
>
> I am attempting to analyze data from humans exposed to arsenic in vivo from
> an affymetrix Human Gene 1.0 ST Array. I have generated differentially
> expressed genes lists using LIMMA and an ANOVA.

In vivo arsenic exposure? Wow.

> I am encountering a high number of control probes as top genes in both
> lists and am not sure if I should be ignoring this information, removing
> it, or utilizing it. I found an earlier post which seemed to be related and
> which directed the user to identify the type of probe in order to determine
> if its differential expression may have been an error resulting from batch
> effects (
> http://article.gmane.org/gmane.science.biology.informatics.conductor/28952/match=control+probes)
> though based on the category of my probes I'm not sure how to proceed.
> NetAffx online tool my control probes fall into the category of "intronic
> normalization control". It doesn't make sense to me that they would be
> in the top genes list, and I would appreciate any help as to how to
> interpret their presence and if necessary about how to remove them from the
> analysis prior to the list generation.
>
> Here is an example of some the probe numbers I am getting
> 7892503 7892505 7892551 7892558 7892571 7892581 7892589 7892633
> 7892675 7892676 7892689 7892729 7892738 7892753 7892757 7892788

You might want to talk with the folks who processed the samples and arrays to see if there is anything that might explain this. I would normally not worry if there were just a few control probes in the differential gene list, but if there are lots of them it may indicate some technical artifact that isn't being controlled for by the normalization procedure.

Although I wouldn't advocate simply removing them without figuring out why they are showing up, it is easy to do. The pd.hugene.1.0.st.v1 package has the information you need:

    > library(pd.hugene.1.0.st.v1)
    > con <- db(pd.hugene.1.0.st.v1)
    > dbGetQuery(con, "select * from type_dict;")
    1    1                      main
    2    2             control->affx
    3    3             control->chip
    4    4 control->bgp->antigenomic
    5    5     control->bgp->genomic
    6    6            normgene->exon
    7    7          normgene->intron
    8    8  rescue->FLmRNA->unmapped

so you just want the 'main' probes. You can get the probeset IDs from the featureSet table.

    > mains <- dbGetQuery(con, "select fsetid from featureSet where type='1';")[,1]

Then you can subset out any non-main probesets using this vector and, e.g., the %in% function.

Best,

Jim

--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
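The last two steps in Jim's answer (working out which probe types are involved, then keeping only the 'main' probesets with %in%) can be sketched on toy data. This is only an illustration in base R, not Jim's code: `feature_set` and `expr` are hypothetical stand-ins for the real featureSet table and a normalized expression matrix, and the three 'main' IDs are invented. The type codes are copied from the type_dict listing in the answer, and the two control IDs come from the original question.

```r
# Type codes as listed in the type_dict table in the answer.
type_dict <- data.frame(
  type = 1:8,
  type_id = c("main", "control->affx", "control->chip",
              "control->bgp->antigenomic", "control->bgp->genomic",
              "normgene->exon", "normgene->intron",
              "rescue->FLmRNA->unmapped"),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for the featureSet table: two intronic
# normalization controls (type 7) and three invented 'main' IDs (type 1).
feature_set <- data.frame(
  fsetid = c(7892503L, 7892505L, 7896736L, 7896738L, 7896740L),
  type   = c(7L, 7L, 1L, 1L, 1L)
)

# Hypothetical expression matrix: rows are probeset IDs, columns are samples.
expr <- matrix(
  c(5.1, 5.3,
    9.8, 4.2,
    7.2, 7.0,
    6.4, 6.6,
    8.9, 4.5),
  ncol = 2, byrow = TRUE,
  dimnames = list(as.character(feature_set$fsetid), c("control", "exposed"))
)

# Step 1: tally how many probesets fall into each type category.
annotated <- merge(feature_set, type_dict, by = "type")
type_counts <- table(annotated$type_id)
print(type_counts)

# Step 2: keep only the 'main' probesets. %in% coerces the integer IDs
# to character, so they can be compared with the row names directly.
mains <- feature_set$fsetid[feature_set$type == 1L]
expr_main <- expr[rownames(expr) %in% mains, , drop = FALSE]
print(expr_main)
```

With real data, `mains` would come from the featureSet query above. If the data are held in an ExpressionSet called `eset` (an assumed name), something like `eset[featureNames(eset) %in% mains, ]` should give the analogous subset.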

12.1 years ago James W. MacDonald 65k

On Mon, Mar 19, 2012 at 9:59 AM, James W. MacDonald <jmacdon at uw.edu> wrote:
> Hi Alexandra,
>
> On 3/18/2012 11:36 PM, Alexandra Muñoz wrote:
>>
>> Hi Jim et al.,
>>
>> I am attempting to analyze data from humans exposed to arsenic in vivo
>> from
>> an affymetrix Human Gene 1.0 ST Array. I have generated differentially
>> expressed genes lists using LIMMA and an ANOVA.
>
> In vivo arsenic exposure? Wow.

Don't worry ... they were just grad students ...

-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact

12.1 years ago Steve Lianoglou ★ 13k

Paul Geeleher ★ 1.3k

@paul-geeleher-2679

Last seen 9.6 years ago

It's possible that there is a real difference; see this paper for something similar:

http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0011657

It's hard to say, though, without knowing more about the biological question and just how significant the p-values are.

Paul

--
Paul Geeleher (PhD Student)
School of Mathematics, Statistics and Applied Mathematics
National University of Ireland
Galway
Ireland
--
www.bioinformaticstutorials.com

12.1 years ago Paul Geeleher ★ 1.3k
