Analyzing expression Affymetrix Hugene1.0.st array
@juan-fernandez-tajes-5273
Last seen 10.2 years ago
Dear List,

I'm working with expression data obtained from the Affymetrix HuGene 1.0 ST array. I'm interested in knowing how many genes on chromosome 16 are expressed. Surprisingly, all 808 genes on the array that map to chromosome 16 have expression values (from 2.01 to 12.4). Can I conclude that all these genes are expressed in this tissue?

Many thanks in advance.

Here is my code:

    library(oligo)                           # list.celfiles(), read.celfiles(), rma()
    library(hugene10sttranscriptcluster.db)  # CHR and SYMBOL annotation maps

    # Read every CEL file in the working directory
    geneCELs.N <- list.celfiles(getwd(), full.names = TRUE)
    affyGeneFS.N <- read.celfiles(geneCELs.N)
    myAB.N <- affyGeneFS.N
    sampleNames(myAB.N) <- sub("\\.CEL$", "", sampleNames(myAB.N))

    # Attach the sample metadata
    metadata_array.N <- read.delim(file = "metadata.txt", header = TRUE, sep = "\t")
    rownames(metadata_array.N) <- metadata_array.N$Sample_ID
    phenoData(myAB.N) <- new("AnnotatedDataFrame", data = metadata_array.N)

    # RMA summarization at the transcript-cluster (gene) level
    myAB.N_rma <- rma(myAB.N, target = "core")
    annotation(myAB.N_rma) <- "hugene10sttranscriptcluster.db"

    # Find the transcript clusters whose chromosome annotation starts with "16"
    ppc <- function(x) paste("^", x, sep = "")
    myFindMap <- function(mapEnv, which) {
      myg <- ppc(which)
      a1 <- eapply(mapEnv, function(x) grep(myg, x, value = TRUE))
      unlist(a1)
    }
    chr16.N <- myFindMap(hugene10sttranscriptclusterCHR, 16)
    chr16.N <- as.data.frame(chr16.N)
    chr16.N$probes <- rownames(chr16.N)
    probes.chr16.N <- chr16.N$probes

    # Subset the expression set to chromosome 16 and add gene symbols
    sel.N <- match(probes.chr16.N, featureNames(myAB.N_rma), nomatch = 0)
    es2_chr16.N <- myAB.N_rma[sel.N, ]
    data.exprs.N <- as.data.frame(exprs(es2_chr16.N))
    g.N <- featureNames(es2_chr16.N)
    linked.N <- links(hugene10sttranscriptclusterSYMBOL)
    data.exprs.N.symbol <- merge(data.exprs.N, linked.N, by.x = "row.names", by.y = "probe_id")
    row.names(data.exprs.N.symbol) <- data.exprs.N.symbol[[1]]
    data.exprs.N.symbol <- data.exprs.N.symbol[, -1]

    # Mean expression across the 12 arrays
    data.exprs.N.symbol$Mean.Exprs <- rowMeans(data.exprs.N.symbol[, 1:12])

Juan

---------------------------------------------------------------
Juan Fernandez Tajes, Ph.D.
Grupo XENOMAR
Departamento de Biología Celular y Molecular
Facultad de Ciencias-Universidade da Coruña
Tlf. +34 981 167000 ext 2030
e-mail: jfernandezt@udc.es
---------------------------------------------------------------
@james-w-macdonald-5106
Last seen 1 day ago
United States
Hi Juan,

On 9/28/2012 6:10 AM, Juan Fernández Tajes wrote:
> Dear List,
>
> I'm working with expression data obtained from Affymetrix HuGene 1.0 st array. I'm interested in knowing how many genes are expressed in chromosome 16. Surprisingly, all the genes included (808) in the array and mapped to chromosome have expression values (from 2.01 to 12.4), can I conclude that all these genes are expressed in this tissue?

Not really. Microarrays are not suitable for determining if a gene is being expressed or not. The only use IMO of microarray data is to determine if a gene is *differentially* expressed. This is what Benilton is getting at in his response to your question.

The expression values we generate from a set of microarrays are very far removed from the actual amount of mRNA that existed in the samples we are measuring, and have undergone quite a bit of manipulation. In addition, there is quite a bit of technical noise introduced in each step of the process. So the best we can hope for is that the expression value for a given gene is proportional to the amount of mRNA that existed in the original sample, but not that we are quantifying the amount of mRNA.

In addition, the expression values are based on data from a 16-bit TIFF image. So the values have a maximum range from 2^0 to 2^16 - 1, or 1-65535 on the natural scale (roughly 0 to 16 on the log2 scale that RMA reports). Given that fact, do you really want to contend that a gene with an expression of 2^2.01 is being expressed? That expression level is likely not distinguishable from noise. So one more difficulty in deciding if a gene is expressed is deciding at which point you can distinguish signal from underlying noise.

Best,

Jim

--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
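To put numbers on the scale argument in Jim's reply, here is a minimal base R sketch. It assumes the reported values (2.01 and 12.4) are log2-scale RMA output, and it treats them as if they were directly comparable to raw scanner intensities, which is only a rough approximation because RMA also background-corrects, normalizes, and summarizes. The variable names are illustrative, not part of any package.

```r
# A 16-bit image stores intensities as integers up to 2^16 - 1.
max_intensity <- 2^16 - 1              # 65535
log2_ceiling  <- log2(max_intensity)   # just under 16

# Smallest and largest RMA values reported in the question (log2 scale).
rma_range <- c(low = 2.01, high = 12.4)

# Back-transform to the natural (intensity-like) scale.
natural <- 2^rma_range

# Fraction of the available 16-bit range that each value occupies.
fraction_of_range <- natural / max_intensity

print(round(natural, 1))
print(signif(fraction_of_range, 2))
```

The low end back-transforms to only about four intensity units out of a possible 65535, which is the reason a value of 2.01 is hard to tell apart from background.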
Dear James,

Many thanks for your quick and easily understandable answer. Could you recommend a method for determining the level at which expression values can be distinguished from noise?

Juan
Hi Juan,

On 9/28/2012 11:05 AM, Juan Fernández Tajes wrote:
> Dear James,
>
> Many thanks for your quick and easy understable question. I would like to ask you if you could recommend me a method to determine which point could be considered as level for distinguishing expression values from noise?

I don't know if there is a method that has been developed that purports to do this. In the past I have seen people recommending things like using the negative controls as a lower bound (since they are supposedly not expressed).

I think this becomes a bit more difficult with the Gene ST arrays, as the negative controls have a nasty habit of looking not only expressed, but differentially expressed. A lot of these controls are supposed to target introns, which makes me wonder how much of the total RNA extracted from a cell is mRNA for which the introns have yet to be excised.

Anyway, to me it looks like a chicken-egg problem. You want to see what sort of expression values you get for genes that are almost surely not expressed, but given the data it is hard to decide which of the things that aren't supposed to be expressed are actually not expressed (or alternatively, which of the controls have absolutely no cross-hybridization with transcripts that are expressed).

So if you are still trying to come up with a method of saying if a gene is expressed or not, I don't think you can do that with microarray data unless you are willing to make a bunch of (likely unfounded) assumptions.

Best,

Jim
Hi,

Totally agree w/ everything Jim has said (this is usually a smart thing to do), but just wanted to comment on:

On Fri, Sep 28, 2012 at 11:21 AM, James W. MacDonald <jmacdon at uw.edu> wrote:
[snip]
> I think this becomes a bit more difficult with the Gene ST arrays, as the
> negative controls have a nasty habit of looking not only expressed, but
> differentially expressed. A lot of these controls are supposed to target
> introns, which makes me wonder how much of the total RNA extracted from a
> cell is mRNA for which the introns have yet to be excised.

Instead of such negative control probes, perhaps you (Juan) might know something about the types of cells you have data from. In particular, perhaps you can justify identifying a multitude of genes that you know not to be expressed in these cell types and use some statistics over their probe expression to rig up a lower bound of your detection limit.

If you don't know this info, and have no expert to ask, maybe you can find RNA-seq data in cells "close" (using some definition of "close" that makes you comfortable) and use that to find such non-transcribed genes.

I guess there are also going to be probe (sequence content) effects that affect the expression readout of these "silent" probes and what not, but ... if you're going for some heuristic thing that you're not using as the lynchpin of your study, then perhaps this is passable.

-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
| Memorial Sloan-Kettering Cancer Center
| Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
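Steve's heuristic could be sketched roughly as follows. This is base R on simulated data only: the expression matrix, the list of "silent" genes, and the choice of the 95th percentile are all assumptions made for illustration, not something taken from the thread or from a published method. With real data, `expr` would be the log2 matrix from `exprs()` and `silent_ids` would come from prior knowledge or from RNA-seq in a comparable tissue.

```r
set.seed(1)

# Simulated log2 expression matrix: 200 genes x 12 arrays (made-up data).
n_genes  <- 200
n_arrays <- 12
expr <- matrix(rnorm(n_genes * n_arrays, mean = 7, sd = 2),
               nrow = n_genes,
               dimnames = list(paste0("gene", seq_len(n_genes)),
                               paste0("array", seq_len(n_arrays))))

# Hypothetical set of genes believed to be silent in this tissue.
# Their values are overwritten with low-level noise for illustration.
silent_ids <- paste0("gene", 1:30)
expr[silent_ids, ] <- rnorm(length(silent_ids) * n_arrays, mean = 3, sd = 0.5)

# Per-gene mean across arrays, then a high quantile of the silent set
# as a heuristic detection floor (the 0.95 cut-off is arbitrary).
gene_means  <- rowMeans(expr)
floor_value <- unname(quantile(gene_means[silent_ids], probs = 0.95))

# Flag genes whose mean exceeds the floor; this is a heuristic, not a test.
above_floor <- gene_means > floor_value
print(floor_value)
print(table(above_floor))
```

As the thread stresses, a floor derived this way inherits every assumption behind the "silent" list and ignores probe-specific effects, so it is better suited to a side analysis than to a main result.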
On 9/28/2012 11:41 AM, Steve Lianoglou wrote:
> I guess there's also going to probe (sequence content) effects that
> affect the expression readout of these "silent" probes and what not,
> but ... if you're going for some heuristic thing that you're not using
> as the lynchpin of your study, then perhaps this is passable.

And this in a nutshell is why I am uncomfortable with trying to discern expressed from not expressed genes using microarray data. There are any number of variables that affect the fluorescence of a given spot on a microarray, only one of which is the binding of the target cDNA, and even the binding of that cDNA is predicated on GC content and various other stoichiometric variables that I don't think we know or can appreciate.

So I agree with Steve here, that if this is some tangential aspect of the study maybe it is OK. But if it is the main thrust of the analysis, then caveat emptor.

Best,

Jim
Dear Jim and Steve,

Many thanks for your answers and for your tips. I've learned a lot and I'll try to pass all this information on to my supervisor.

Many thanks again,

Juan
@benilton-carvalho-1375
Last seen 4.7 years ago
Brazil/Campinas/UNICAMP
Juan,

To call a gene (differentially) 'expressed', you need to compare its expression to some baseline.

The most basic workflow for such a task starts by defining a control group and the group that you want to analyse (which I'll call here the "group of interest"). After preprocessing all the samples, you prepare a design matrix and fit linear models to assess the hypothesis of differential expression (i.e. you compare the expression of the group of interest to the expression of the control group). This gives you (variants of) t-tests, which combined with a threshold give you a set of candidates for differential expression.

That said, you need to define what the control group is for your experiment and proceed with the statistical procedures for hypothesis testing.

benilton
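The workflow Benilton outlines can be sketched in base R on simulated data. Everything below is illustrative: the group labels, the 6-versus-6 layout, the spiked-in effect, and the 0.05 cut-off are assumptions. In practice Bioconductor users usually do this step with the limma package, which moderates the per-gene variances; plain `lm()` is used here only to keep the example self-contained.

```r
set.seed(42)

# Simulated log2 expression for 100 genes on 12 arrays (made-up data):
# 6 control arrays followed by 6 arrays from the group of interest.
group <- factor(rep(c("control", "interest"), each = 6),
                levels = c("control", "interest"))
expr <- matrix(rnorm(100 * 12, mean = 8, sd = 0.3), nrow = 100,
               dimnames = list(paste0("gene", 1:100), NULL))

# Spike a true difference of 2 log2 units into the first five genes.
expr[1:5, group == "interest"] <- expr[1:5, group == "interest"] + 2

# Design matrix: intercept (control mean) plus a group-of-interest effect.
design <- model.matrix(~ group)

# Fit one linear model per gene; row 2 of the coefficient table is the
# group effect, and its p-value is an ordinary two-sample t-test.
p_values <- apply(expr, 1, function(y) {
  fit <- lm(y ~ 0 + design)
  summary(fit)$coefficients[2, "Pr(>|t|)"]
})

# Adjust for multiple testing and apply a threshold to get candidates.
adj_p <- p.adjust(p_values, method = "BH")
candidates <- names(adj_p)[adj_p < 0.05]
print(candidates)
```

Note that this answers "does expression differ between groups?", not "is the gene expressed?", which is the distinction the rest of the thread turns on.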
Benilton,

Many thanks for your response. The problem is that I'm trying to get levels of gene expression from the arrays without comparing different samples, and I suspect this is not a good approach. I have already posted a question to the Bioconductor list about whether that kind of information can be obtained from arrays.

Juan