Getting probe IDs for a particular probeset ID


marek piatek BI ▴ 90

@marek-piatek-bi-3927


Hi all,

I'm trying to get the probes for a particular probeset ID from my MoGene arrays. From the experiment description file (dabg.summary.txt) I can see that there are around 241,500 probeset IDs for my 12 arrays. When loading the .CEL files into Bioconductor I see 1,102,500 values for my 12 arrays, so I'd expect around 4 to 5 probes per probeset on average (1,102,500 / 241,500 ≈ 4.6).

However, when I load the experiment description file into an AnnotatedDataFrame object:

Affy.Expt <- read.AnnotatedDataFrame("dabg.summary.txt", header=TRUE, row.names=1, sep="\t")

and try to use it as my phenoData when loading the .CEL files into an AffyBatch object:

Affy.Data <- ReadAffy(filenames=colnames(pData(Affy.Expt)), phenoData=Affy.Expt, verbose=TRUE)

I get this warning:

Warning message:
In read.affybatch(filenames = l$filenames, phenoData = l$phenoData, :
Incompatible phenoData object. Created a new one.

I understand this as a mismatch between the number of rows in my experiment description file (241,500 probeset IDs) and the number of rows in the .CEL files (1,102,500 probes). When this happens, the probeset IDs are discarded and the rows are numbered from 1 to 1,102,500, so I lose track of which probeset each row belongs to.

The point is that I need to know which probes belong to which probeset ID, together with their values. I looked at the CDF file, but it looks strange and I can't get anything useful from it. I thought looking into the RMA algorithm might help, but it calls an external function that I don't understand.

Is there an easy way to get this information?

Thank you in advance,
Mark
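For context on the warning: in affy, phenoData describes the samples, so read.affybatch expects it to have one row per CEL file; when its row count differs from the number of files, it builds a default one and warns "Incompatible phenoData object". The dabg.summary.txt table has one row per probeset, so it is feature-level data rather than sample annotation. A minimal sketch of a sample-level table, assuming 12 CEL files with hypothetical names and a made-up "group" covariate:

```r
# Hypothetical file names standing in for the 12 MoGene CEL files.
cel_files <- sprintf("sample%02d.CEL", 1:12)

# Sample-level annotation: one row per CEL file, row names = file names.
# 'group' is a made-up covariate for illustration only.
pheno_df <- data.frame(group = rep(c("control", "treated"), each = 6),
                       row.names = cel_files, stringsAsFactors = FALSE)

# phenoData needs as many rows as there are CEL files; a table with one
# row per probeset (about 241,500 rows) fails this check.
stopifnot(nrow(pheno_df) == length(cel_files))

# With Biobase and affy installed, the table could then be used like this:
# pd <- new("AnnotatedDataFrame", data = pheno_df)
# Affy.Data <- ReadAffy(filenames = rownames(pData(pd)), phenoData = pd)
```

The probe-to-probeset mapping itself is not carried by phenoData; it comes from the CDF information attached to the AffyBatch.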

cdf • 1.7k views

Posted 14.2 years ago by marek piatek BI ▴ 90



James W. MacDonald 65k

@james-w-macdonald-5106


United States

Hi Marek,

marek piatek (BI) wrote:
> The point is that I need to know which probes belong to which probeset id and have their values stored.
> [...]
> Is there some easy way to get that information?

Yes, use the functions in the affy package that were designed to do this sort of thing. Let's say you want the probe values from a few probesets:

probesets <- c("10338001","10338003","10338004")
probelist <- pm(Affy.Data, probesets, TRUE)  # third argument (LISTRUE): return one matrix per probeset

This will give you a list of length 3, containing the probe values for these probesets.
Best,
Jim
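The list returned by pm(Affy.Data, probesets, TRUE) has one element per probeset. A minimal sketch of flattening such a list into a single table that records the probeset ID for every probe, assuming each element is a probes-by-arrays matrix. The toy list, its intensities, the CEL file names and the stack_probes helper are all hypothetical; with real data you would pass the probelist from the call above instead.

```r
# Toy stand-in for probelist <- pm(Affy.Data, probesets, TRUE): a named list
# with one probes-by-arrays matrix per probeset. Values are made up.
probelist <- list(
  "10338001" = matrix(c(120, 135, 98, 110, 101, 140), nrow = 3,
                      dimnames = list(NULL, c("array1.CEL", "array2.CEL"))),
  "10338003" = matrix(c(55, 60, 58, 62), nrow = 2,
                      dimnames = list(NULL, c("array1.CEL", "array2.CEL")))
)

# Hypothetical helper: stack the per-probeset matrices into one data frame,
# adding the probeset ID and each probe's position within its probeset.
stack_probes <- function(plist) {
  per_set <- lapply(names(plist), function(id) {
    m <- plist[[id]]
    data.frame(probeset = id, probe = seq_len(nrow(m)), m,
               check.names = FALSE, stringsAsFactors = FALSE)
  })
  do.call(rbind, per_set)
}

probe_table <- stack_probes(probelist)
probe_table
```

Keeping one row per probe with an explicit probeset column makes it easy to filter, merge or write out the values without losing track of which probeset each probe belongs to.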

Posted 14.2 years ago by James W. MacDonald 65k


Hi Jim and all,

Thanks for your help! Yesterday I also found an alternative way to do this. Load the "oligo" package and then do something like:

myIndexes <- oligo:::getFidProbeset(myData)  # myData holds the raw .CEL file data read with oligo
myProbes <- exprs(myData[myIndexes[, 1], ])

That should do the trick. Thanks again for your help.

Mark

-----Original Message-----
From: James W. MacDonald [mailto:jmacdon@med.umich.edu]
Sent: 10 February 2010 16:23
To: marek piatek (BI)
Cc: bioconductor at stat.math.ethz.ch
Subject: Re: [BioC] Getting probes id for particular probeset id
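The oligo approach relies on a mapping from feature IDs to probeset IDs, where, as the exprs(myData[myIndexes[, 1], ]) call implies, the first column holds row positions (fid) in the raw intensity matrix. A minimal sketch of that indexing with toy data; the column names fid and fsetid, the probes_for helper and all values are hypothetical. split() groups the fids by probeset, giving one matrix per probeset, similar in shape to what pm(..., TRUE) returns in affy.

```r
# Toy raw intensity matrix: 10 features (rows, indexed by fid) x 2 arrays.
# Values are made up; real data would come from exprs() on the oligo object.
intensities <- matrix(seq(10, 200, by = 10), nrow = 10, ncol = 2,
                      dimnames = list(1:10, c("array1.CEL", "array2.CEL")))

# Hypothetical fid -> probeset mapping, standing in for the table returned
# by oligo:::getFidProbeset(); the column names are assumptions.
fid_map <- data.frame(fid    = c(2L, 3L, 7L, 8L, 9L),
                      fsetid = c("10338001", "10338001",
                                 "10338003", "10338003", "10338003"),
                      stringsAsFactors = FALSE)

# All probes (rows) for a single probeset, looked up by fid
probes_for <- function(intens, map, id) {
  intens[map$fid[map$fsetid == id], , drop = FALSE]
}

# One matrix per probeset: split the fids by probeset ID, then subset rows
by_probeset <- lapply(split(fid_map$fid, fid_map$fsetid),
                      function(f) intensities[f, , drop = FALSE])

probes_for(intensities, fid_map, "10338003")
```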

Reply posted 14.2 years ago by marek piatek BI ▴ 90
