Read single channel GenePix in limma [was: Analyze miRNA experiment in Bioconductor]
1
0
Entering edit mode
@gordon-smyth
Last seen 1 hour ago
WEHI, Melbourne, Australia
Dear Paul, The limma User's Guide doesn't discuss how to read single channel data, but how to do this has been described half a dozen times on this mailing list. Since limma is designed for two colours, you can fool it by giving two column names and ignoring the one you don't need. If you only have the Cy3 channel foreground for example you might use Cy3 <- "F532 Mean" RG <- read.maimages(source="genepix",columns=list(R=Cy3,G=Cy3)) then RG$R <- NULL to remove the extraneous values. Then RG$G could be given as input to vsnMatrix() and the output analysed with lmFit(). Please don't edit your GenePix files manually, there's no need. It's prone to introducing errors and is non-reproducible. The error message "number of items to replace is not a multiple of replacement length" is not caused by having only one channel. limma gives a far more informative message in that case. The most likely explanation is that your GenePix files are not of equal lengths. If that is indeed the problem, then the limma package doesn't offer any easy solution. Your only approach would be to read the files in individually, then align the expression values yourself. You cannot use read.maimages() with source="imagene" because you do not have ImaGene files. Best wishes Gordon > Date: Fri, 9 May 2008 15:54:39 +0100 > From: "Paul Geeleher" <paulgeeleher at="" gmail.com=""> > Subject: Re: [BioC] Analyze miRNA experiment in Bioconductor > To: "Wolfgang Huber" <huber at="" ebi.ac.uk=""> > Cc: bioconductor at stat.math.ethz.ch > > Doesn't seem to be anything in the users guide specific to this kind > of analysis unfortunately. > > -Paul > > On Thu, May 8, 2008 at 10:31 AM, Wolfgang Huber <huber at="" ebi.ac.uk=""> wrote: >> Dear Paul, >> >>> Hmm interesting. I might try introducing the extra columns into the >>> files and specifying all the values as 0. I can't see why that >>> shouldn't work? >> >> It might, but Narendra's suggestion of reading the limma users guide is a >> worthwhile option to consider. >> >> Best wishes >> Wolfgang >> >> ------------------------------------------------------------------ >> Wolfgang Huber EBI/EMBL Cambridge UK http://www.ebi.ac.uk/huber >> >>> >>> -Paul >>> >>> On Wed, May 7, 2008 at 1:39 PM, Narendra Kaushik >>> <kaushiknk at="" cardiff.ac.uk=""> wrote: >>>> >>>> You can specify your red channel like this: >>>> >>>> RG <- read.maimages(files,source="genepix", columns=list(R="F635 >>>> Median",G="F532 >>>> Median",Rb="B635",Gb="B532")) >>>> >>>> I will suggest you read limma guide. >>>> >>>> But I think your have data from Imagene package which gives one file for >>>> each channel, you can: >>>> >>>> files <- targets[,c("FileNameCy3","FileNameCy5")] >>>> RG <- read.maimages(files, source="imagene") >>>> >>>> Hope, this helps >>>> >>>> Narendra >>>> >>>> >>> "Paul Geeleher" <paulgeeleher at="" gmail.com=""> 07/05/2008 13:24:01 >>> >>>> >>>> >>>> Hi Deepayan, >>>> >>>> Thanks for your reply. I suppose my main concern is how I should read >>>> in the data initially in order to be able to use the normal tools to >>>> analyze the data. Reading the data normally like this: >>>> >>>> RG <- read.maimages( files, source="genepix") >>>> >>>> Gives the following error: >>>> >>>> Error in RG[[a]][, i] <- obj[, columns[[a]]] : >>>> number of items to replace is not a multiple of replacement length >>>> >>>> >>>> I'm assuming this is down to the fact that the files only contain >>>> intensity data for one color rather than two? >>>> >>>> How should I go about reading the data? >>>> >>>> Thanks alot, >>>> >>>> -Paul. >>>> >>>> On Tue, May 6, 2008 at 10:15 PM, Deepayan Sarkar >>>> <deepayan.sarkar at="" gmail.com=""> wrote: >>>> > On 5/6/08, Paul Geeleher <paulgeeleher at="" gmail.com=""> wrote: >>>> > > Dear Members, >>>> > > >>>> > > I've inherited a bunch of GenePix files from an miRNA experiment. >>>> They >>>> > > are single color arrays, ( as opposed to 2 color as is the norm >>>> for >>>> > > GenePix I think). There is a subset of 7 arrays and I wish to >>>> compare >>>> > > a group of 4 of these to the other group of 3 and analyze >>>> differential >>>> > > expression between the two groups. I was hoping somebody could >>>> point >>>> > > me in the right direction of how I'd go about doing this with >>>> > > Bioconductor? Is it possible using the Limma package? Is there any >>>> > > code out there to assist me? >>>> > > >>>> > > I've experience in analyzing Affymetrix data using Limma and PUMA, >>>> but >>>> > > not GenePix, and the Limma Users Guide seems to focus on analyzing >>>> two >>>> > > dye experiments. >>>> > >>>> > Any analysis ultimately boils down to some sort of normalization, and >>>> > the actual differential expression analysis. The second part in limma >>>> > (lmFit, etc.) can work with any expression matrix, irrespective of >>>> > whether it's 2-color or 1-color (or affy). >>>> > >>>> > We have been working with a miRNA array dataset recently, and we used >>>> > limma to read in the GPR files and do the differential expression >>>> > analysis (on one channel). For normalization, many of the standard >>>> > microarray algorithms probably don't make much sense, but VSN seems >>>> to >>>> > work fine. >>>> > >>>> > We don't really have code (beyond what's already in limma and vsn) >>>> > that is generally useful; most of the work is in figuring out which >>>> > rows are of interest (i.e., those representing human miRNAs), >>>> > combining the replicates (you seem to have four of each), etc. I'm >>>> > happy to give you more details if you are interested. >>>> > >>>> > -Deepayan
0
Entering edit mode
Paul Geeleher ★ 1.3k
@paul-geeleher-2679
Last seen 7.1 years ago
Hi Gordon, Thanks for you email. I've followed your steps and am getting some output now. One problem though. When should the summarization step occur? What I mean is that, between miRNA and control signals, my GPR file contains about 3000 entries and when I am done with analysis topTable will return all of these individually. But many of the miRNAs have multiple entries in the ".gpr" file. So how, and when, should I go about combining these into one value? Thanks in advance, -Paul On Sun, May 11, 2008 at 4:59 AM, Gordon K Smyth <smyth at="" wehi.edu.au=""> wrote: > Dear Paul, > > The limma User's Guide doesn't discuss how to read single channel data, but > how to do this has been described half a dozen times on this mailing list. > Since limma is designed for two colours, you can fool it by giving two > column names and ignoring the one you don't need. If you only have the Cy3 > channel foreground for example you might use > > Cy3 <- "F532 Mean" > RG <- read.maimages(source="genepix",columns=list(R=Cy3,G=Cy3)) > > then > > RG$R <- NULL > > to remove the extraneous values. > > Then RG$G could be given as input to vsnMatrix() and the output analysed > with lmFit(). > > Please don't edit your GenePix files manually, there's no need. It's prone > to introducing errors and is non-reproducible. > > The error message "number of items to replace is not a multiple of > replacement length" is not caused by having only one channel. limma gives a > far more informative message in that case. The most likely explanation is > that your GenePix files are not of equal lengths. If that is indeed the > problem, then the limma package doesn't offer any easy solution. Your only > approach would be to read the files in individually, then align the > expression values yourself. > > You cannot use read.maimages() with source="imagene" because you do not > have ImaGene files. > > Best wishes > Gordon > > > > > Date: Fri, 9 May 2008 15:54:39 +0100 > > From: "Paul Geeleher" <paulgeeleher at="" gmail.com=""> > > Subject: Re: [BioC] Analyze miRNA experiment in Bioconductor > > To: "Wolfgang Huber" <huber at="" ebi.ac.uk=""> > > Cc: bioconductor at stat.math.ethz.ch > > > > Doesn't seem to be anything in the users guide specific to this kind > > of analysis unfortunately. > > > > -Paul > > > > On Thu, May 8, 2008 at 10:31 AM, Wolfgang Huber <huber at="" ebi.ac.uk=""> wrote: > > > > > Dear Paul, > > > > > > > > > > Hmm interesting. I might try introducing the extra columns into the > > > > files and specifying all the values as 0. I can't see why that > > > > shouldn't work? > > > > > > > > > > It might, but Narendra's suggestion of reading the limma users guide is > a > > > worthwhile option to consider. > > > > > > Best wishes > > > Wolfgang > > > > > > ------------------------------------------------------------------ > > > Wolfgang Huber EBI/EMBL Cambridge UK http://www.ebi.ac.uk/huber > > > > > > > > > > > > > > -Paul > > > > > > > > On Wed, May 7, 2008 at 1:39 PM, Narendra Kaushik > > > > <kaushiknk at="" cardiff.ac.uk=""> wrote: > > > > > > > > > > > > > > You can specify your red channel like this: > > > > > > > > > > RG <- read.maimages(files,source="genepix", columns=list(R="F635 > > > > > Median",G="F532 > > > > > Median",Rb="B635",Gb="B532")) > > > > > > > > > > I will suggest you read limma guide. > > > > > > > > > > But I think your have data from Imagene package which gives one > file for > > > > > each channel, you can: > > > > > > > > > > files <- targets[,c("FileNameCy3","FileNameCy5")] > > > > > RG <- read.maimages(files, source="imagene") > > > > > > > > > > Hope, this helps > > > > > > > > > > Narendra > > > > > > > > > > >>> "Paul Geeleher" <paulgeeleher at="" gmail.com=""> 07/05/2008 13:24:01 >>> > > > > > > > > > > > > > > > Hi Deepayan, > > > > > > > > > > Thanks for your reply. I suppose my main concern is how I should > read > > > > > in the data initially in order to be able to use the normal tools > to > > > > > analyze the data. Reading the data normally like this: > > > > > > > > > > RG <- read.maimages( files, source="genepix") > > > > > > > > > > Gives the following error: > > > > > > > > > > Error in RG[[a]][, i] <- obj[, columns[[a]]] : > > > > > number of items to replace is not a multiple of replacement length > > > > > > > > > > > > > > > I'm assuming this is down to the fact that the files only contain > > > > > intensity data for one color rather than two? > > > > > > > > > > How should I go about reading the data? > > > > > > > > > > Thanks alot, > > > > > > > > > > -Paul. > > > > > > > > > > On Tue, May 6, 2008 at 10:15 PM, Deepayan Sarkar > > > > > <deepayan.sarkar at="" gmail.com=""> wrote: > > > > > > On 5/6/08, Paul Geeleher <paulgeeleher at="" gmail.com=""> wrote: > > > > > > > Dear Members, > > > > > > > > > > > > > > I've inherited a bunch of GenePix files from an miRNA > experiment. > > > > > They > > > > > > > are single color arrays, ( as opposed to 2 color as is the norm > > > > > for > > > > > > > GenePix I think). There is a subset of 7 arrays and I wish to > > > > > compare > > > > > > > a group of 4 of these to the other group of 3 and analyze > > > > > differential > > > > > > > expression between the two groups. I was hoping somebody could > > > > > point > > > > > > > me in the right direction of how I'd go about doing this with > > > > > > > Bioconductor? Is it possible using the Limma package? Is there > any > > > > > > > code out there to assist me? > > > > > > > > > > > > > > I've experience in analyzing Affymetrix data using Limma and > PUMA, > > > > > but > > > > > > > not GenePix, and the Limma Users Guide seems to focus on > analyzing > > > > > two > > > > > > > dye experiments. > > > > > > > > > > > > Any analysis ultimately boils down to some sort of normalization, > and > > > > > > the actual differential expression analysis. The second part in > limma > > > > > > (lmFit, etc.) can work with any expression matrix, irrespective > of > > > > > > whether it's 2-color or 1-color (or affy). > > > > > > > > > > > > We have been working with a miRNA array dataset recently, and we > used > > > > > > limma to read in the GPR files and do the differential expression > > > > > > analysis (on one channel). For normalization, many of the > standard > > > > > > microarray algorithms probably don't make much sense, but VSN > seems > > > > > to > > > > > > work fine. > > > > > > > > > > > > We don't really have code (beyond what's already in limma and > vsn) > > > > > > that is generally useful; most of the work is in figuring out > which > > > > > > rows are of interest (i.e., those representing human miRNAs), > > > > > > combining the replicates (you seem to have four of each), etc. > I'm > > > > > > happy to give you more details if you are interested. > > > > > > > > > > > > -Deepayan > > > > > > > > > > > > > > >
0
Entering edit mode
On Wed, May 14, 2008 at 7:54 AM, Paul Geeleher <paulgeeleher at="" gmail.com=""> wrote: > Hi Gordon, > > Thanks for you email. I've followed your steps and am getting some output now. > > One problem though. When should the summarization step occur? What I > mean is that, between miRNA and control signals, my GPR file contains > about 3000 entries and when I am done with analysis topTable will > return all of these individually. But many of the miRNAs have multiple > entries in the ".gpr" file. So how, and when, should I go about > combining these into one value? Paul, What is the manufacturer of these arrays? The summarization method may depend on that somewhat. Sean > On Sun, May 11, 2008 at 4:59 AM, Gordon K Smyth <smyth at="" wehi.edu.au=""> wrote: >> Dear Paul, >> >> The limma User's Guide doesn't discuss how to read single channel data, but >> how to do this has been described half a dozen times on this mailing list. >> Since limma is designed for two colours, you can fool it by giving two >> column names and ignoring the one you don't need. If you only have the Cy3 >> channel foreground for example you might use >> >> Cy3 <- "F532 Mean" >> RG <- read.maimages(source="genepix",columns=list(R=Cy3,G=Cy3)) >> >> then >> >> RG$R <- NULL >> >> to remove the extraneous values. >> >> Then RG$G could be given as input to vsnMatrix() and the output analysed >> with lmFit(). >> >> Please don't edit your GenePix files manually, there's no need. It's prone >> to introducing errors and is non-reproducible. >> >> The error message "number of items to replace is not a multiple of >> replacement length" is not caused by having only one channel. limma gives a >> far more informative message in that case. The most likely explanation is >> that your GenePix files are not of equal lengths. If that is indeed the >> problem, then the limma package doesn't offer any easy solution. Your only >> approach would be to read the files in individually, then align the >> expression values yourself. >> >> You cannot use read.maimages() with source="imagene" because you do not >> have ImaGene files. >> >> Best wishes >> Gordon >> >> >> >> > Date: Fri, 9 May 2008 15:54:39 +0100 >> > From: "Paul Geeleher" <paulgeeleher at="" gmail.com=""> >> > Subject: Re: [BioC] Analyze miRNA experiment in Bioconductor >> > To: "Wolfgang Huber" <huber at="" ebi.ac.uk=""> >> > Cc: bioconductor at stat.math.ethz.ch >> > >> > Doesn't seem to be anything in the users guide specific to this kind >> > of analysis unfortunately. >> > >> > -Paul >> > >> > On Thu, May 8, 2008 at 10:31 AM, Wolfgang Huber <huber at="" ebi.ac.uk=""> wrote: >> > >> > > Dear Paul, >> > > >> > > >> > > > Hmm interesting. I might try introducing the extra columns into the >> > > > files and specifying all the values as 0. I can't see why that >> > > > shouldn't work? >> > > > >> > > >> > > It might, but Narendra's suggestion of reading the limma users guide is >> a >> > > worthwhile option to consider. >> > > >> > > Best wishes >> > > Wolfgang >> > > >> > > ------------------------------------------------------------------ >> > > Wolfgang Huber EBI/EMBL Cambridge UK http://www.ebi.ac.uk/huber >> > > >> > > >> > > > >> > > > -Paul >> > > > >> > > > On Wed, May 7, 2008 at 1:39 PM, Narendra Kaushik >> > > > <kaushiknk at="" cardiff.ac.uk=""> wrote: >> > > > >> > > > > >> > > > > You can specify your red channel like this: >> > > > > >> > > > > RG <- read.maimages(files,source="genepix", columns=list(R="F635 >> > > > > Median",G="F532 >> > > > > Median",Rb="B635",Gb="B532")) >> > > > > >> > > > > I will suggest you read limma guide. >> > > > > >> > > > > But I think your have data from Imagene package which gives one >> file for >> > > > > each channel, you can: >> > > > > >> > > > > files <- targets[,c("FileNameCy3","FileNameCy5")] >> > > > > RG <- read.maimages(files, source="imagene") >> > > > > >> > > > > Hope, this helps >> > > > > >> > > > > Narendra >> > > > > >> > > > > >>> "Paul Geeleher" <paulgeeleher at="" gmail.com=""> 07/05/2008 13:24:01 >>> >> > > > > >> > > > > >> > > > > Hi Deepayan, >> > > > > >> > > > > Thanks for your reply. I suppose my main concern is how I should >> read >> > > > > in the data initially in order to be able to use the normal tools >> to >> > > > > analyze the data. Reading the data normally like this: >> > > > > >> > > > > RG <- read.maimages( files, source="genepix") >> > > > > >> > > > > Gives the following error: >> > > > > >> > > > > Error in RG[[a]][, i] <- obj[, columns[[a]]] : >> > > > > number of items to replace is not a multiple of replacement length >> > > > > >> > > > > >> > > > > I'm assuming this is down to the fact that the files only contain >> > > > > intensity data for one color rather than two? >> > > > > >> > > > > How should I go about reading the data? >> > > > > >> > > > > Thanks alot, >> > > > > >> > > > > -Paul. >> > > > > >> > > > > On Tue, May 6, 2008 at 10:15 PM, Deepayan Sarkar >> > > > > <deepayan.sarkar at="" gmail.com=""> wrote: >> > > > > > On 5/6/08, Paul Geeleher <paulgeeleher at="" gmail.com=""> wrote: >> > > > > > > Dear Members, >> > > > > > > >> > > > > > > I've inherited a bunch of GenePix files from an miRNA >> experiment. >> > > > > They >> > > > > > > are single color arrays, ( as opposed to 2 color as is the norm >> > > > > for >> > > > > > > GenePix I think). There is a subset of 7 arrays and I wish to >> > > > > compare >> > > > > > > a group of 4 of these to the other group of 3 and analyze >> > > > > differential >> > > > > > > expression between the two groups. I was hoping somebody could >> > > > > point >> > > > > > > me in the right direction of how I'd go about doing this with >> > > > > > > Bioconductor? Is it possible using the Limma package? Is there >> any >> > > > > > > code out there to assist me? >> > > > > > > >> > > > > > > I've experience in analyzing Affymetrix data using Limma and >> PUMA, >> > > > > but >> > > > > > > not GenePix, and the Limma Users Guide seems to focus on >> analyzing >> > > > > two >> > > > > > > dye experiments. >> > > > > > >> > > > > > Any analysis ultimately boils down to some sort of normalization, >> and >> > > > > > the actual differential expression analysis. The second part in >> limma >> > > > > > (lmFit, etc.) can work with any expression matrix, irrespective >> of >> > > > > > whether it's 2-color or 1-color (or affy). >> > > > > > >> > > > > > We have been working with a miRNA array dataset recently, and we >> used >> > > > > > limma to read in the GPR files and do the differential expression >> > > > > > analysis (on one channel). For normalization, many of the >> standard >> > > > > > microarray algorithms probably don't make much sense, but VSN >> seems >> > > > > to >> > > > > > work fine. >> > > > > > >> > > > > > We don't really have code (beyond what's already in limma and >> vsn) >> > > > > > that is generally useful; most of the work is in figuring out >> which >> > > > > > rows are of interest (i.e., those representing human miRNAs), >> > > > > > combining the replicates (you seem to have four of each), etc. >> I'm >> > > > > > happy to give you more details if you are interested. >> > > > > > >> > > > > > -Deepayan >> > > > > >> > > > >> > > >> > >> > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor >
0
Entering edit mode
Hi Sean, They are Exiqon miRNA Chips. Thanks, -Paul On Wed, May 14, 2008 at 1:29 PM, Sean Davis <sdavis2 at="" mail.nih.gov=""> wrote: > On Wed, May 14, 2008 at 7:54 AM, Paul Geeleher <paulgeeleher at="" gmail.com=""> wrote: >> Hi Gordon, >> >> Thanks for you email. I've followed your steps and am getting some output now. >> >> One problem though. When should the summarization step occur? What I >> mean is that, between miRNA and control signals, my GPR file contains >> about 3000 entries and when I am done with analysis topTable will >> return all of these individually. But many of the miRNAs have multiple >> entries in the ".gpr" file. So how, and when, should I go about >> combining these into one value? > > Paul, > > What is the manufacturer of these arrays? The summarization method > may depend on that somewhat. > > Sean >> On Sun, May 11, 2008 at 4:59 AM, Gordon K Smyth <smyth at="" wehi.edu.au=""> wrote: >>> Dear Paul, >>> >>> The limma User's Guide doesn't discuss how to read single channel data, but >>> how to do this has been described half a dozen times on this mailing list. >>> Since limma is designed for two colours, you can fool it by giving two >>> column names and ignoring the one you don't need. If you only have the Cy3 >>> channel foreground for example you might use >>> >>> Cy3 <- "F532 Mean" >>> RG <- read.maimages(source="genepix",columns=list(R=Cy3,G=Cy3)) >>> >>> then >>> >>> RG$R <- NULL >>> >>> to remove the extraneous values. >>> >>> Then RG$G could be given as input to vsnMatrix() and the output analysed >>> with lmFit(). >>> >>> Please don't edit your GenePix files manually, there's no need. It's prone >>> to introducing errors and is non-reproducible. >>> >>> The error message "number of items to replace is not a multiple of >>> replacement length" is not caused by having only one channel. limma gives a >>> far more informative message in that case. The most likely explanation is >>> that your GenePix files are not of equal lengths. If that is indeed the >>> problem, then the limma package doesn't offer any easy solution. Your only >>> approach would be to read the files in individually, then align the >>> expression values yourself. >>> >>> You cannot use read.maimages() with source="imagene" because you do not >>> have ImaGene files. >>> >>> Best wishes >>> Gordon >>> >>> >>> >>> > Date: Fri, 9 May 2008 15:54:39 +0100 >>> > From: "Paul Geeleher" <paulgeeleher at="" gmail.com=""> >>> > Subject: Re: [BioC] Analyze miRNA experiment in Bioconductor >>> > To: "Wolfgang Huber" <huber at="" ebi.ac.uk=""> >>> > Cc: bioconductor at stat.math.ethz.ch >>> > >>> > Doesn't seem to be anything in the users guide specific to this kind >>> > of analysis unfortunately. >>> > >>> > -Paul >>> > >>> > On Thu, May 8, 2008 at 10:31 AM, Wolfgang Huber <huber at="" ebi.ac.uk=""> wrote: >>> > >>> > > Dear Paul, >>> > > >>> > > >>> > > > Hmm interesting. I might try introducing the extra columns into the >>> > > > files and specifying all the values as 0. I can't see why that >>> > > > shouldn't work? >>> > > > >>> > > >>> > > It might, but Narendra's suggestion of reading the limma users guide is >>> a >>> > > worthwhile option to consider. >>> > > >>> > > Best wishes >>> > > Wolfgang >>> > > >>> > > ------------------------------------------------------------------ >>> > > Wolfgang Huber EBI/EMBL Cambridge UK http://www.ebi.ac.uk/huber >>> > > >>> > > >>> > > > >>> > > > -Paul >>> > > > >>> > > > On Wed, May 7, 2008 at 1:39 PM, Narendra Kaushik >>> > > > <kaushiknk at="" cardiff.ac.uk=""> wrote: >>> > > > >>> > > > > >>> > > > > You can specify your red channel like this: >>> > > > > >>> > > > > RG <- read.maimages(files,source="genepix", columns=list(R="F635 >>> > > > > Median",G="F532 >>> > > > > Median",Rb="B635",Gb="B532")) >>> > > > > >>> > > > > I will suggest you read limma guide. >>> > > > > >>> > > > > But I think your have data from Imagene package which gives one >>> file for >>> > > > > each channel, you can: >>> > > > > >>> > > > > files <- targets[,c("FileNameCy3","FileNameCy5")] >>> > > > > RG <- read.maimages(files, source="imagene") >>> > > > > >>> > > > > Hope, this helps >>> > > > > >>> > > > > Narendra >>> > > > > >>> > > > > >>> "Paul Geeleher" <paulgeeleher at="" gmail.com=""> 07/05/2008 13:24:01 >>> >>> > > > > >>> > > > > >>> > > > > Hi Deepayan, >>> > > > > >>> > > > > Thanks for your reply. I suppose my main concern is how I should >>> read >>> > > > > in the data initially in order to be able to use the normal tools >>> to >>> > > > > analyze the data. Reading the data normally like this: >>> > > > > >>> > > > > RG <- read.maimages( files, source="genepix") >>> > > > > >>> > > > > Gives the following error: >>> > > > > >>> > > > > Error in RG[[a]][, i] <- obj[, columns[[a]]] : >>> > > > > number of items to replace is not a multiple of replacement length >>> > > > > >>> > > > > >>> > > > > I'm assuming this is down to the fact that the files only contain >>> > > > > intensity data for one color rather than two? >>> > > > > >>> > > > > How should I go about reading the data? >>> > > > > >>> > > > > Thanks alot, >>> > > > > >>> > > > > -Paul. >>> > > > > >>> > > > > On Tue, May 6, 2008 at 10:15 PM, Deepayan Sarkar >>> > > > > <deepayan.sarkar at="" gmail.com=""> wrote: >>> > > > > > On 5/6/08, Paul Geeleher <paulgeeleher at="" gmail.com=""> wrote: >>> > > > > > > Dear Members, >>> > > > > > > >>> > > > > > > I've inherited a bunch of GenePix files from an miRNA >>> experiment. >>> > > > > They >>> > > > > > > are single color arrays, ( as opposed to 2 color as is the norm >>> > > > > for >>> > > > > > > GenePix I think). There is a subset of 7 arrays and I wish to >>> > > > > compare >>> > > > > > > a group of 4 of these to the other group of 3 and analyze >>> > > > > differential >>> > > > > > > expression between the two groups. I was hoping somebody could >>> > > > > point >>> > > > > > > me in the right direction of how I'd go about doing this with >>> > > > > > > Bioconductor? Is it possible using the Limma package? Is there >>> any >>> > > > > > > code out there to assist me? >>> > > > > > > >>> > > > > > > I've experience in analyzing Affymetrix data using Limma and >>> PUMA, >>> > > > > but >>> > > > > > > not GenePix, and the Limma Users Guide seems to focus on >>> analyzing >>> > > > > two >>> > > > > > > dye experiments. >>> > > > > > >>> > > > > > Any analysis ultimately boils down to some sort of normalization, >>> and >>> > > > > > the actual differential expression analysis. The second part in >>> limma >>> > > > > > (lmFit, etc.) can work with any expression matrix, irrespective >>> of >>> > > > > > whether it's 2-color or 1-color (or affy). >>> > > > > > >>> > > > > > We have been working with a miRNA array dataset recently, and we >>> used >>> > > > > > limma to read in the GPR files and do the differential expression >>> > > > > > analysis (on one channel). For normalization, many of the >>> standard >>> > > > > > microarray algorithms probably don't make much sense, but VSN >>> seems >>> > > > > to >>> > > > > > work fine. >>> > > > > > >>> > > > > > We don't really have code (beyond what's already in limma and >>> vsn) >>> > > > > > that is generally useful; most of the work is in figuring out >>> which >>> > > > > > rows are of interest (i.e., those representing human miRNAs), >>> > > > > > combining the replicates (you seem to have four of each), etc. >>> I'm >>> > > > > > happy to give you more details if you are interested. >>> > > > > > >>> > > > > > -Deepayan >>> > > > > >>> > > > >>> > > >>> > >>> >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at stat.math.ethz.ch >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor >> >
0
Entering edit mode
By the way, I've uploaded one of the .gpr files if it would help to have a look: http://frink.nuigalway.ie/~pat/2007-02-19_130_0532.gpr -Paul On Wed, May 14, 2008 at 1:29 PM, Sean Davis <sdavis2 at="" mail.nih.gov=""> wrote: > On Wed, May 14, 2008 at 7:54 AM, Paul Geeleher <paulgeeleher at="" gmail.com=""> wrote: >> Hi Gordon, >> >> Thanks for you email. I've followed your steps and am getting some output now. >> >> One problem though. When should the summarization step occur? What I >> mean is that, between miRNA and control signals, my GPR file contains >> about 3000 entries and when I am done with analysis topTable will >> return all of these individually. But many of the miRNAs have multiple >> entries in the ".gpr" file. So how, and when, should I go about >> combining these into one value? > > Paul, > > What is the manufacturer of these arrays? The summarization method > may depend on that somewhat. > > Sean >> On Sun, May 11, 2008 at 4:59 AM, Gordon K Smyth <smyth at="" wehi.edu.au=""> wrote: >>> Dear Paul, >>> >>> The limma User's Guide doesn't discuss how to read single channel data, but >>> how to do this has been described half a dozen times on this mailing list. >>> Since limma is designed for two colours, you can fool it by giving two >>> column names and ignoring the one you don't need. If you only have the Cy3 >>> channel foreground for example you might use >>> >>> Cy3 <- "F532 Mean" >>> RG <- read.maimages(source="genepix",columns=list(R=Cy3,G=Cy3)) >>> >>> then >>> >>> RG$R <- NULL >>> >>> to remove the extraneous values. >>> >>> Then RG$G could be given as input to vsnMatrix() and the output analysed >>> with lmFit(). >>> >>> Please don't edit your GenePix files manually, there's no need. It's prone >>> to introducing errors and is non-reproducible. >>> >>> The error message "number of items to replace is not a multiple of >>> replacement length" is not caused by having only one channel. limma gives a >>> far more informative message in that case. The most likely explanation is >>> that your GenePix files are not of equal lengths. If that is indeed the >>> problem, then the limma package doesn't offer any easy solution. Your only >>> approach would be to read the files in individually, then align the >>> expression values yourself. >>> >>> You cannot use read.maimages() with source="imagene" because you do not >>> have ImaGene files. >>> >>> Best wishes >>> Gordon >>> >>> >>> >>> > Date: Fri, 9 May 2008 15:54:39 +0100 >>> > From: "Paul Geeleher" <paulgeeleher at="" gmail.com=""> >>> > Subject: Re: [BioC] Analyze miRNA experiment in Bioconductor >>> > To: "Wolfgang Huber" <huber at="" ebi.ac.uk=""> >>> > Cc: bioconductor at stat.math.ethz.ch >>> > >>> > Doesn't seem to be anything in the users guide specific to this kind >>> > of analysis unfortunately. >>> > >>> > -Paul >>> > >>> > On Thu, May 8, 2008 at 10:31 AM, Wolfgang Huber <huber at="" ebi.ac.uk=""> wrote: >>> > >>> > > Dear Paul, >>> > > >>> > > >>> > > > Hmm interesting. I might try introducing the extra columns into the >>> > > > files and specifying all the values as 0. I can't see why that >>> > > > shouldn't work? >>> > > > >>> > > >>> > > It might, but Narendra's suggestion of reading the limma users guide is >>> a >>> > > worthwhile option to consider. >>> > > >>> > > Best wishes >>> > > Wolfgang >>> > > >>> > > ------------------------------------------------------------------ >>> > > Wolfgang Huber EBI/EMBL Cambridge UK http://www.ebi.ac.uk/huber >>> > > >>> > > >>> > > > >>> > > > -Paul >>> > > > >>> > > > On Wed, May 7, 2008 at 1:39 PM, Narendra Kaushik >>> > > > <kaushiknk at="" cardiff.ac.uk=""> wrote: >>> > > > >>> > > > > >>> > > > > You can specify your red channel like this: >>> > > > > >>> > > > > RG <- read.maimages(files,source="genepix", columns=list(R="F635 >>> > > > > Median",G="F532 >>> > > > > Median",Rb="B635",Gb="B532")) >>> > > > > >>> > > > > I will suggest you read limma guide. >>> > > > > >>> > > > > But I think your have data from Imagene package which gives one >>> file for >>> > > > > each channel, you can: >>> > > > > >>> > > > > files <- targets[,c("FileNameCy3","FileNameCy5")] >>> > > > > RG <- read.maimages(files, source="imagene") >>> > > > > >>> > > > > Hope, this helps >>> > > > > >>> > > > > Narendra >>> > > > > >>> > > > > >>> "Paul Geeleher" <paulgeeleher at="" gmail.com=""> 07/05/2008 13:24:01 >>> >>> > > > > >>> > > > > >>> > > > > Hi Deepayan, >>> > > > > >>> > > > > Thanks for your reply. I suppose my main concern is how I should >>> read >>> > > > > in the data initially in order to be able to use the normal tools >>> to >>> > > > > analyze the data. Reading the data normally like this: >>> > > > > >>> > > > > RG <- read.maimages( files, source="genepix") >>> > > > > >>> > > > > Gives the following error: >>> > > > > >>> > > > > Error in RG[[a]][, i] <- obj[, columns[[a]]] : >>> > > > > number of items to replace is not a multiple of replacement length >>> > > > > >>> > > > > >>> > > > > I'm assuming this is down to the fact that the files only contain >>> > > > > intensity data for one color rather than two? >>> > > > > >>> > > > > How should I go about reading the data? >>> > > > > >>> > > > > Thanks alot, >>> > > > > >>> > > > > -Paul. >>> > > > > >>> > > > > On Tue, May 6, 2008 at 10:15 PM, Deepayan Sarkar >>> > > > > <deepayan.sarkar at="" gmail.com=""> wrote: >>> > > > > > On 5/6/08, Paul Geeleher <paulgeeleher at="" gmail.com=""> wrote: >>> > > > > > > Dear Members, >>> > > > > > > >>> > > > > > > I've inherited a bunch of GenePix files from an miRNA >>> experiment. >>> > > > > They >>> > > > > > > are single color arrays, ( as opposed to 2 color as is the norm >>> > > > > for >>> > > > > > > GenePix I think). There is a subset of 7 arrays and I wish to >>> > > > > compare >>> > > > > > > a group of 4 of these to the other group of 3 and analyze >>> > > > > differential >>> > > > > > > expression between the two groups. I was hoping somebody could >>> > > > > point >>> > > > > > > me in the right direction of how I'd go about doing this with >>> > > > > > > Bioconductor? Is it possible using the Limma package? Is there >>> any >>> > > > > > > code out there to assist me? >>> > > > > > > >>> > > > > > > I've experience in analyzing Affymetrix data using Limma and >>> PUMA, >>> > > > > but >>> > > > > > > not GenePix, and the Limma Users Guide seems to focus on >>> analyzing >>> > > > > two >>> > > > > > > dye experiments. >>> > > > > > >>> > > > > > Any analysis ultimately boils down to some sort of normalization, >>> and >>> > > > > > the actual differential expression analysis. The second part in >>> limma >>> > > > > > (lmFit, etc.) can work with any expression matrix, irrespective >>> of >>> > > > > > whether it's 2-color or 1-color (or affy). >>> > > > > > >>> > > > > > We have been working with a miRNA array dataset recently, and we >>> used >>> > > > > > limma to read in the GPR files and do the differential expression >>> > > > > > analysis (on one channel). For normalization, many of the >>> standard >>> > > > > > microarray algorithms probably don't make much sense, but VSN >>> seems >>> > > > > to >>> > > > > > work fine. >>> > > > > > >>> > > > > > We don't really have code (beyond what's already in limma and >>> vsn) >>> > > > > > that is generally useful; most of the work is in figuring out >>> which >>> > > > > > rows are of interest (i.e., those representing human miRNAs), >>> > > > > > combining the replicates (you seem to have four of each), etc. >>> I'm >>> > > > > > happy to give you more details if you are interested. >>> > > > > > >>> > > > > > -Deepayan >>> > > > > >>> > > > >>> > > >>> > >>> >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at stat.math.ethz.ch >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor >> >
0
Entering edit mode
On Wed, May 14, 2008 at 9:34 AM, Paul Geeleher <paulgeeleher at="" gmail.com=""> wrote: > By the way, I've uploaded one of the .gpr files if it would help to have a look: > > http://frink.nuigalway.ie/~pat/2007-02-19_130_0532.gpr Hi, Paul. I'm not sure what the "ID" column represents, but it appears that each miR could be represented by several IDs. If ID represents a unique sequence, then I think you could summarize across IDs, but I'm not sure that I would suggest summarizing each miR, as each sequence will likely have different hyb characteristics. All that said, I have not used Exiqon arrays, so I am just guessing about how they are designed. Sean > On Wed, May 14, 2008 at 1:29 PM, Sean Davis <sdavis2 at="" mail.nih.gov=""> wrote: >> On Wed, May 14, 2008 at 7:54 AM, Paul Geeleher <paulgeeleher at="" gmail.com=""> wrote: >>> Hi Gordon, >>> >>> Thanks for you email. I've followed your steps and am getting some output now. >>> >>> One problem though. When should the summarization step occur? What I >>> mean is that, between miRNA and control signals, my GPR file contains >>> about 3000 entries and when I am done with analysis topTable will >>> return all of these individually. But many of the miRNAs have multiple >>> entries in the ".gpr" file. So how, and when, should I go about >>> combining these into one value? >> >> Paul, >> >> What is the manufacturer of these arrays? The summarization method >> may depend on that somewhat. >> >> Sean >>> On Sun, May 11, 2008 at 4:59 AM, Gordon K Smyth <smyth at="" wehi.edu.au=""> wrote: >>>> Dear Paul, >>>> >>>> The limma User's Guide doesn't discuss how to read single channel data, but >>>> how to do this has been described half a dozen times on this mailing list. >>>> Since limma is designed for two colours, you can fool it by giving two >>>> column names and ignoring the one you don't need. If you only have the Cy3 >>>> channel foreground for example you might use >>>> >>>> Cy3 <- "F532 Mean" >>>> RG <- read.maimages(source="genepix",columns=list(R=Cy3,G=Cy3)) >>>> >>>> then >>>> >>>> RG$R <- NULL >>>> >>>> to remove the extraneous values. >>>> >>>> Then RG$G could be given as input to vsnMatrix() and the output analysed >>>> with lmFit(). >>>> >>>> Please don't edit your GenePix files manually, there's no need. It's prone >>>> to introducing errors and is non-reproducible. >>>> >>>> The error message "number of items to replace is not a multiple of >>>> replacement length" is not caused by having only one channel. limma gives a >>>> far more informative message in that case. The most likely explanation is >>>> that your GenePix files are not of equal lengths. If that is indeed the >>>> problem, then the limma package doesn't offer any easy solution. Your only >>>> approach would be to read the files in individually, then align the >>>> expression values yourself. >>>> >>>> You cannot use read.maimages() with source="imagene" because you do not >>>> have ImaGene files. >>>> >>>> Best wishes >>>> Gordon >>>> >>>> >>>> >>>> > Date: Fri, 9 May 2008 15:54:39 +0100 >>>> > From: "Paul Geeleher" <paulgeeleher at="" gmail.com=""> >>>> > Subject: Re: [BioC] Analyze miRNA experiment in Bioconductor >>>> > To: "Wolfgang Huber" <huber at="" ebi.ac.uk=""> >>>> > Cc: bioconductor at stat.math.ethz.ch >>>> > >>>> > Doesn't seem to be anything in the users guide specific to this kind >>>> > of analysis unfortunately. >>>> > >>>> > -Paul >>>> > >>>> > On Thu, May 8, 2008 at 10:31 AM, Wolfgang Huber <huber at="" ebi.ac.uk=""> wrote: >>>> > >>>> > > Dear Paul, >>>> > > >>>> > > >>>> > > > Hmm interesting. I might try introducing the extra columns into the >>>> > > > files and specifying all the values as 0. I can't see why that >>>> > > > shouldn't work? >>>> > > > >>>> > > >>>> > > It might, but Narendra's suggestion of reading the limma users guide is >>>> a >>>> > > worthwhile option to consider. >>>> > > >>>> > > Best wishes >>>> > > Wolfgang >>>> > > >>>> > > ------------------------------------------------------------------ >>>> > > Wolfgang Huber EBI/EMBL Cambridge UK http://www.ebi.ac.uk/huber >>>> > > >>>> > > >>>> > > > >>>> > > > -Paul >>>> > > > >>>> > > > On Wed, May 7, 2008 at 1:39 PM, Narendra Kaushik >>>> > > > <kaushiknk at="" cardiff.ac.uk=""> wrote: >>>> > > > >>>> > > > > >>>> > > > > You can specify your red channel like this: >>>> > > > > >>>> > > > > RG <- read.maimages(files,source="genepix", columns=list(R="F635 >>>> > > > > Median",G="F532 >>>> > > > > Median",Rb="B635",Gb="B532")) >>>> > > > > >>>> > > > > I will suggest you read limma guide. >>>> > > > > >>>> > > > > But I think your have data from Imagene package which gives one >>>> file for >>>> > > > > each channel, you can: >>>> > > > > >>>> > > > > files <- targets[,c("FileNameCy3","FileNameCy5")] >>>> > > > > RG <- read.maimages(files, source="imagene") >>>> > > > > >>>> > > > > Hope, this helps >>>> > > > > >>>> > > > > Narendra >>>> > > > > >>>> > > > > >>> "Paul Geeleher" <paulgeeleher at="" gmail.com=""> 07/05/2008 13:24:01 >>> >>>> > > > > >>>> > > > > >>>> > > > > Hi Deepayan, >>>> > > > > >>>> > > > > Thanks for your reply. I suppose my main concern is how I should >>>> read >>>> > > > > in the data initially in order to be able to use the normal tools >>>> to >>>> > > > > analyze the data. Reading the data normally like this: >>>> > > > > >>>> > > > > RG <- read.maimages( files, source="genepix") >>>> > > > > >>>> > > > > Gives the following error: >>>> > > > > >>>> > > > > Error in RG[[a]][, i] <- obj[, columns[[a]]] : >>>> > > > > number of items to replace is not a multiple of replacement length >>>> > > > > >>>> > > > > >>>> > > > > I'm assuming this is down to the fact that the files only contain >>>> > > > > intensity data for one color rather than two? >>>> > > > > >>>> > > > > How should I go about reading the data? >>>> > > > > >>>> > > > > Thanks alot, >>>> > > > > >>>> > > > > -Paul. >>>> > > > > >>>> > > > > On Tue, May 6, 2008 at 10:15 PM, Deepayan Sarkar >>>> > > > > <deepayan.sarkar at="" gmail.com=""> wrote: >>>> > > > > > On 5/6/08, Paul Geeleher <paulgeeleher at="" gmail.com=""> wrote: >>>> > > > > > > Dear Members, >>>> > > > > > > >>>> > > > > > > I've inherited a bunch of GenePix files from an miRNA >>>> experiment. >>>> > > > > They >>>> > > > > > > are single color arrays, ( as opposed to 2 color as is the norm >>>> > > > > for >>>> > > > > > > GenePix I think). There is a subset of 7 arrays and I wish to >>>> > > > > compare >>>> > > > > > > a group of 4 of these to the other group of 3 and analyze >>>> > > > > differential >>>> > > > > > > expression between the two groups. I was hoping somebody could >>>> > > > > point >>>> > > > > > > me in the right direction of how I'd go about doing this with >>>> > > > > > > Bioconductor? Is it possible using the Limma package? Is there >>>> any >>>> > > > > > > code out there to assist me? >>>> > > > > > > >>>> > > > > > > I've experience in analyzing Affymetrix data using Limma and >>>> PUMA, >>>> > > > > but >>>> > > > > > > not GenePix, and the Limma Users Guide seems to focus on >>>> analyzing >>>> > > > > two >>>> > > > > > > dye experiments. >>>> > > > > > >>>> > > > > > Any analysis ultimately boils down to some sort of normalization, >>>> and >>>> > > > > > the actual differential expression analysis. The second part in >>>> limma >>>> > > > > > (lmFit, etc.) can work with any expression matrix, irrespective >>>> of >>>> > > > > > whether it's 2-color or 1-color (or affy). >>>> > > > > > >>>> > > > > > We have been working with a miRNA array dataset recently, and we >>>> used >>>> > > > > > limma to read in the GPR files and do the differential expression >>>> > > > > > analysis (on one channel). For normalization, many of the >>>> standard >>>> > > > > > microarray algorithms probably don't make much sense, but VSN >>>> seems >>>> > > > > to >>>> > > > > > work fine. >>>> > > > > > >>>> > > > > > We don't really have code (beyond what's already in limma and >>>> vsn) >>>> > > > > > that is generally useful; most of the work is in figuring out >>>> which >>>> > > > > > rows are of interest (i.e., those representing human miRNAs), >>>> > > > > > combining the replicates (you seem to have four of each), etc. >>>> I'm >>>> > > > > > happy to give you more details if you are interested. >>>> > > > > > >>>> > > > > > -Deepayan >>>> > > > > >>>> > > > >>>> > > >>>> > >>>> >>> >>> _______________________________________________ >>> Bioconductor mailing list >>> Bioconductor at stat.math.ethz.ch >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor >>> >> >
0
Entering edit mode
Dear Paul, I have no experience with miRNA arrays, so cannot give you any specific advice. With ordinary expression arrays, my practice is to keep the probes separate, especially if they actually have differing sequences. There's no good summarisation method which feeds into the topTable format. I convert to genes or transcript only at the interpretation stage. The only exception are within array replicates which satisfy the (restrictive) assumptions of the duplicateCorrelation() function. Best wishes Gordon On Wed, 14 May 2008, Paul Geeleher wrote: > Hi Gordon, > > Thanks for you email. I've followed your steps and am getting some output now. > > One problem though. When should the summarization step occur? What I > mean is that, between miRNA and control signals, my GPR file contains > about 3000 entries and when I am done with analysis topTable will > return all of these individually. But many of the miRNAs have multiple > entries in the ".gpr" file. So how, and when, should I go about > combining these into one value? > > Thanks in advance, > -Paul
0
Entering edit mode
Ok I'm close to having this all sorted and have the duplicateCorrelation() function working. For posterity I'd be happy to post a the code and detailed explanation from everything I've done to the mailing list. It should be useful for anyone doing similar MiRNA analysis in future. My last question is regarding the design matrix, I'm not sure if I'm creating it properly. I have 7 arrays, a, b, c, d, e, f and g. I want to measure differential expression of arrays a, b, c & d against e, f & g. In some of the tutorials in the limmaUsersGuide, this design matrix is simply created as follows: design <- c(-1, -1, -1, -1, 1, 1, 1) I've also seen it created using methods like this: pData <- data.frame(population = c('HER2+', 'HER2+', 'HER2+', 'HER2+', 'HER2-', 'HER2-', 'HER2-')) rownames(pData) <- RG$targets$FileName design <- model.matrix(~factor(pData$population)) These two different methods give me very different p-values come the end of analysis and I'm wondering what exactly I should be doing? Thanks for any advice, -Paul. On Thu, May 15, 2008 at 1:40 AM, Gordon K Smyth <smyth at="" wehi.edu.au=""> wrote: > Dear Paul, > > I have no experience with miRNA arrays, so cannot give you any specific > advice. > > With ordinary expression arrays, my practice is to keep the probes separate, > especially if they actually have differing sequences. There's no good > summarisation method which feeds into the topTable format. I convert to > genes or transcript only at the interpretation stage. The only exception > are within array replicates which satisfy the (restrictive) assumptions of > the duplicateCorrelation() function. > > Best wishes > Gordon > > On Wed, 14 May 2008, Paul Geeleher wrote: > >> Hi Gordon, >> >> Thanks for you email. I've followed your steps and am getting some output >> now. >> >> One problem though. When should the summarization step occur? What I >> mean is that, between miRNA and control signals, my GPR file contains >> about 3000 entries and when I am done with analysis topTable will >> return all of these individually. But many of the miRNAs have multiple >> entries in the ".gpr" file. So how, and when, should I go about >> combining these into one value? >> >> Thanks in advance, >> -Paul > ADD REPLY 0 Entering edit mode Not the most experienced in this area, but I think your design is pretty much like section 8.5 in the Limma's user guide. "There are two major ways in which this comparison can be made. We can 1. create a design matrix which includes a coefficient for the mutant vs wild type difference, or 2. create a design matrix which includes separate coefficients for wild type and mutant mice and then extract the difference as a contrast." So in your situation you would have either > design plus plusvsminus a 1 0 b 1 0 c 1 0 d 1 0 e 1 1 f 1 1 g 1 1 > fit <- lmFit(eset, design) > fit <- eBayes(fit) > topTable(fit, coef="plusvsminus", adjust="BH") or plus minus a 1 0 b 1 0 c 1 0 d 1 0 e 0 1 f 0 1 g 0 1 > fit <- lmFit(eset, design) > cont.matrix <- makeContrasts(plusvsminus=plus-minus, levels=design) > fit2 <- contrasts.fit(fit, cont.matrix) > fit2 <- eBayes(fit2) > topTable(fit2, adjust="BH") Hope that helps. Paul Geeleher wrote: > Ok I'm close to having this all sorted and have the > duplicateCorrelation() function working. For posterity I'd be happy to > post a the code and detailed explanation from everything I've done to > the mailing list. It should be useful for anyone doing similar MiRNA > analysis in future. My last question is regarding the design matrix, > I'm not sure if I'm creating it properly. > > I have 7 arrays, a, b, c, d, e, f and g. I want to measure > differential expression of arrays a, b, c & d against e, f & g. > > In some of the tutorials in the limmaUsersGuide, this design matrix is > simply created as follows: > > design <- c(-1, -1, -1, -1, 1, 1, 1) > > I've also seen it created using methods like this: > > pData <- data.frame(population = c('HER2+', 'HER2+', 'HER2+', 'HER2+', > 'HER2-', 'HER2-', 'HER2-')) > rownames(pData) <- RG$targets$FileName > design <- model.matrix(~factor(pData$population)) > > > These two different methods give me very different p-values come the > end of analysis and I'm wondering what exactly I should be doing? > > Thanks for any advice, > > -Paul. > > On Thu, May 15, 2008 at 1:40 AM, Gordon K Smyth <smyth at="" wehi.edu.au=""> wrote: >> Dear Paul, >> >> I have no experience with miRNA arrays, so cannot give you any specific >> advice. >> >> With ordinary expression arrays, my practice is to keep the probes separate, >> especially if they actually have differing sequences. There's no good >> summarisation method which feeds into the topTable format. I convert to >> genes or transcript only at the interpretation stage. The only exception >> are within array replicates which satisfy the (restrictive) assumptions of >> the duplicateCorrelation() function. >> >> Best wishes >> Gordon >> >> On Wed, 14 May 2008, Paul Geeleher wrote: >> >>> Hi Gordon, >>> >>> Thanks for you email. I've followed your steps and am getting some output >>> now. >>> >>> One problem though. When should the summarization step occur? What I >>> mean is that, between miRNA and control signals, my GPR file contains >>> about 3000 entries and when I am done with analysis topTable will >>> return all of these individually. But many of the miRNAs have multiple >>> entries in the ".gpr" file. So how, and when, should I go about >>> combining these into one value? >>> >>> Thanks in advance, >>> -Paul > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor -- ************************************************************** Daniel Brewer Institute of Cancer Research Molecular Carcinogenesis MUCRC 15 Cotswold Road Sutton, Surrey SM2 5NG United Kingdom Tel: +44 (0) 20 8722 4109 Fax: +44 (0) 20 8722 4141 Email: daniel.brewer at icr.ac.uk ************************************************************** The Institute of Cancer Research: Royal Cancer Hospital, a charitable Company Limited by Guarantee, Registered in England under Company No. 534147 with its Registered Office at 123 Old Brompton Road, London SW7 3RP. This e-mail message is confidential and for use by the a...{{dropped:2}}
0
Entering edit mode
Ah yes thanks Daniel, I think thats problem solved. Anyway as I said, here's what I've what I've come up with in analyzing genepix miRNA data in bioconductor. I doubt this code is perfect so comments and improvements are encouraged... # A script for calculating differential expression of miRNA from single channel genepix data # Paul Geeleher library(limma) library(RColorBrewer) # create a color pallete for boxplots cols <- brewer.pal(12, "Set3") # This defines the column name of the mean Cy3 foreground intensites Cy3 <- "F532 Mean" # This defines the column name of the mean Cy3 background intensites Cy3b <- "B532 Mean" # Read the targets file (see limmaUsersGuide for info on how to create this) targets <- readTargets("targets.csv") # read values from gpr file # because there is no red channel, read Rb & R to be the same values as G and Gb # this is a type of hack that allows the function to work RG <- read.maimages( targets$FileName, source="genepix",columns=list(R=Cy3,G=Cy3, Rb=Cy3b, Gb=Cy3b)) # remove the extraneous red channel values RG$R <- NULL RG$Rb <- NULL # create a pData for the data just read # this indicates which population each array belongs to pData <- data.frame(population = c('a', 'a', 'a', 'a', 'b', 'b', 'b')) rownames(pData) <- RG$targets$FileName # create design matrix design <- model.matrix(~factor(pData$population)) # In my .gpr files all miRNAs contain the string "miR" in their name # so the grep function can be used to extract all of these, removing # all control signals and printing buffers etc. # You need to check your .gpr files to find which signals you should extract. miRs <- grep("miR", RG$genes$Name) RG.final <- RG[miRs, ] # boxplot of foreground and background intensities, useful for quality control boxplot(data.frame(log2(RG.final$Gb)),main="Background", col=cols) boxplot(data.frame(log2(RG.final$G)),main="Foreground", col=cols) # load vsn library, this contains the normalization functions library('vsn') # Do VSN normalization and output as vns object mat <- vsnMatrix(RG.final$G) # view a boxplot of the normalized data boxplot(as.data.frame(mat at hx), main="Normalized intensities", col=cols) # insert rownames into matrix with normalized data # this will mean that the gene names will appear in our final output rownames(mat at hx) <- RG.final$genes$Name # my .gpr files contain 4 "within-array replicates" of each miRNA. # We need to make full use of this information by calculating the duplicate correlation # in order to use the duplicateCorrelation() function below, # we need to make sure that the replicates of each gene appear in sequence in this matrix, # so we sort the normalized values so the replicate groups of 4 miRs appear in sequence mat at hx <- mat at hx[order(rownames(mat at hx)), ] # calculate duplicate correlation between the 4 replicates on each array corfit <- duplicateCorrelation(mat, design, ndups=4) # show a boxplot of the correlations boxplot(tanh(corfit$atanh.correlations)) # fit the linear model, including info on duplicates and correlation fit <- lmFit(mat, design, ndups=4, correlation=corfit$consensus) # calculate values using ebayes ebayes <- eBayes(fit) # output a list of top differnetially expressed genes... topTable(ebayes, coef = 2, adjust = "BH", n = 100) On Mon, May 19, 2008 at 1:35 PM, Daniel Brewer <daniel.brewer at="" icr.ac.uk=""> wrote: > Not the most experienced in this area, but I think your design is pretty > much like section 8.5 in the Limma's user guide. > > "There are two major ways in which this comparison can be made. > We can > 1. create a design matrix which includes a coefficient for the mutant > vs wild type difference, > or > 2. create a design matrix which includes separate coefficients for wild > type and mutant mice and then extract the difference as a contrast." > > So in your situation you would have either >> design > plus plusvsminus > a 1 0 > b 1 0 > c 1 0 > d 1 0 > e 1 1 > f 1 1 > g 1 1 >> fit <- lmFit(eset, design) >> fit <- eBayes(fit) >> topTable(fit, coef="plusvsminus", adjust="BH") > > or > plus minus > a 1 0 > b 1 0 > c 1 0 > d 1 0 > e 0 1 > f 0 1 > g 0 1 >> fit <- lmFit(eset, design) >> cont.matrix <- makeContrasts(plusvsminus=plus-minus, levels=design) >> fit2 <- contrasts.fit(fit, cont.matrix) >> fit2 <- eBayes(fit2) >> topTable(fit2, adjust="BH") > > Hope that helps. > > Paul Geeleher wrote: >> Ok I'm close to having this all sorted and have the >> duplicateCorrelation() function working. For posterity I'd be happy to >> post a the code and detailed explanation from everything I've done to >> the mailing list. It should be useful for anyone doing similar MiRNA >> analysis in future. My last question is regarding the design matrix, >> I'm not sure if I'm creating it properly. >> >> I have 7 arrays, a, b, c, d, e, f and g. I want to measure >> differential expression of arrays a, b, c & d against e, f & g. >> >> In some of the tutorials in the limmaUsersGuide, this design matrix is >> simply created as follows: >> >> design <- c(-1, -1, -1, -1, 1, 1, 1) >> >> I've also seen it created using methods like this: >> >> pData <- data.frame(population = c('HER2+', 'HER2+', 'HER2+', 'HER2+', >> 'HER2-', 'HER2-', 'HER2-')) >> rownames(pData) <- RG$targets$FileName >> design <- model.matrix(~factor(pData$population)) >> >> >> These two different methods give me very different p-values come the >> end of analysis and I'm wondering what exactly I should be doing? >> >> Thanks for any advice, >> >> -Paul. >> >> On Thu, May 15, 2008 at 1:40 AM, Gordon K Smyth <smyth at="" wehi.edu.au=""> wrote: >>> Dear Paul, >>> >>> I have no experience with miRNA arrays, so cannot give you any specific >>> advice. >>> >>> With ordinary expression arrays, my practice is to keep the probes separate, >>> especially if they actually have differing sequences. There's no good >>> summarisation method which feeds into the topTable format. I convert to >>> genes or transcript only at the interpretation stage. The only exception >>> are within array replicates which satisfy the (restrictive) assumptions of >>> the duplicateCorrelation() function. >>> >>> Best wishes >>> Gordon >>> >>> On Wed, 14 May 2008, Paul Geeleher wrote: >>> >>>> Hi Gordon, >>>> >>>> Thanks for you email. I've followed your steps and am getting some output >>>> now. >>>> >>>> One problem though. When should the summarization step occur? What I >>>> mean is that, between miRNA and control signals, my GPR file contains >>>> about 3000 entries and when I am done with analysis topTable will >>>> return all of these individually. But many of the miRNAs have multiple >>>> entries in the ".gpr" file. So how, and when, should I go about >>>> combining these into one value? >>>> >>>> Thanks in advance, >>>> -Paul >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at stat.math.ethz.ch >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor > > -- > ************************************************************** > > Daniel Brewer > > Institute of Cancer Research > Molecular Carcinogenesis > MUCRC > 15 Cotswold Road > Sutton, Surrey SM2 5NG > United Kingdom > > Tel: +44 (0) 20 8722 4109 > Fax: +44 (0) 20 8722 4141 > > Email: daniel.brewer at icr.ac.uk > > ************************************************************** > > The Institute of Cancer Research: Royal Cancer Hospital, a charitable Company Limited by Guarantee, Registered in England under Company No. 534147 with its Registered Office at 123 Old Brompton Road, London SW7 3RP. > > This e-mail message is confidential and for use by the...{{dropped:3}}
0
Entering edit mode
On Mon, 19 May 2008, Paul Geeleher wrote: > Ok I'm close to having this all sorted and have the > duplicateCorrelation() function working. For posterity I'd be happy to > post a the code and detailed explanation from everything I've done to > the mailing list. It should be useful for anyone doing similar MiRNA > analysis in future. My last question is regarding the design matrix, > I'm not sure if I'm creating it properly. > > I have 7 arrays, a, b, c, d, e, f and g. I want to measure > differential expression of arrays a, b, c & d against e, f & g. > > In some of the tutorials in the limmaUsersGuide, this design matrix is > simply created as follows: > > design <- c(-1, -1, -1, -1, 1, 1, 1) This design is strictly for two colour arrays with dye-swaps where the two RNA sources to be compared are on the two channels of the same arrays. This doesn't seem to be your situation. (Are you taking this suggestion from the User's Guide out of context?) > I've also seen it created using methods like this: > > pData <- data.frame(population = c('HER2+', 'HER2+', 'HER2+', 'HER2+', > 'HER2-', 'HER2-', 'HER2-')) > rownames(pData) <- RG$targets$FileName > design <- model.matrix(~factor(pData\$population)) This is more likely to be appropriate for your experiment. Best wishes Gordon > These two different methods give me very different p-values come the > end of analysis and I'm wondering what exactly I should be doing? > > Thanks for any advice, > > -Paul.