Disorderly duplicate spots


Yannick Wurm ▴ 220

@yannick-wurm-2314


Dear List,

I am a graduate student working with the fire ant Solenopsis invicta. We ran some two-color cDNA microarrays that I have begun analyzing with limma, but something feels wrong about how I am doing things. We printed whole clones from a ~25,000-clone cDNA library onto our microarray and, in parallel, sequenced the clones. They assemble into ~12,000 transcripts. Many are singlets, but some transcripts are represented by multiple clones (one transcript is represented by 32 clones!).

So during analysis, treating each clone as independent feels wrong. It means:

- correcting for ~25,000 multiple tests rather than ~12,000, thus reducing my power;
- not taking into account the technical replication we get from multiple spots on the array.

The limma manual has a section on Within-Array Replicate Spots, but it only covers the case where every spot has a single duplicate on the array.

I'm sure other people have had to deal with this in the past. Do you have any pointers?

Thanks & regards,
Yannick

--------------------------------------------
yannick . wurm @ unil . ch
Ant Genomics, Ecology & Evolution @ Lausanne
http://www.unil.ch/dee/page28685_fr.html
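To put a rough number on the multiple-testing concern, here is a minimal sketch that compares per-test significance cutoffs for the ~25,000 clones and the ~12,000 transcripts mentioned in the question. It assumes a Bonferroni correction and a family-wise error rate of 0.05 purely for illustration; neither is stated in the post, and an actual analysis might well use a different adjustment. The function name is hypothetical.

```python
# Illustrative arithmetic only. Bonferroni and alpha = 0.05 are assumptions
# made for this sketch; they are not taken from the original post.

def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test p-value cutoff that keeps the family-wise error rate at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


# One test per clone versus one test per assembled transcript.
per_clone = bonferroni_threshold(0.05, 25_000)
per_transcript = bonferroni_threshold(0.05, 12_000)

print(f"per-clone cutoff:      {per_clone:.2e}")       # 2.00e-06
print(f"per-transcript cutoff: {per_transcript:.2e}")  # 4.17e-06
```

Under these assumptions, testing per transcript roughly doubles the p-value cutoff each test has to meet.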

limma

ADD COMMENT • link updated 18.0 years ago by Naomi Altman ★ 6.0k • written 18.0 years ago by Yannick Wurm ▴ 220


James W. MacDonald 68k

@james-w-macdonald-5106


United States

Hi Yannick,

This is a fairly common question on this list, so a search of the list archives might be useful. To summarize what has been discussed:

1.) limma isn't designed to handle variable numbers of technical replicates.

2.) The alternatives are all suboptimal. You can average the technical replicate spots, but this reduces the variability of the averaged measurements and may make them more likely to appear significant. You can ignore the replication (which is what most people do with Affy analyses, BTW), but as you noted this will increase your multiplicity.

3.) Filtering your data using some agnostic criterion such as IQR or variance will help with the multiplicity, but there isn't a clean way to address variable numbers of technical replicates.

Best,
Jim

--
James W. MacDonald, M.S.
Biostatistician
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
1500 E. Medical Center Drive
7410 CCGC
Ann Arbor MI 48109
734-647-5623

_______________________________________________
Bioconductor mailing list
Bioconductor at stat.math.ethz.ch
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
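Two of the options listed in this reply, averaging replicate spots and filtering on IQR, can be sketched in a few lines. This is a minimal illustration in plain Python rather than limma code. The data layout (a dict of per-array values keyed by clone ID), the clone-to-transcript mapping, the function names, and the IQR cutoff are all hypothetical.

```python
from collections import defaultdict
from statistics import mean, quantiles


def average_by_transcript(log_ratios, clone_to_transcript):
    """Collapse clone-level rows to one row per transcript by averaging per array.

    log_ratios: dict of clone ID -> list of values, one value per array.
    clone_to_transcript: dict of clone ID -> transcript ID (hypothetical mapping).
    """
    groups = defaultdict(list)
    for clone_id, values in log_ratios.items():
        groups[clone_to_transcript[clone_id]].append(values)
    # zip(*rows) walks the arrays, so each mean is taken across clones.
    return {tx: [mean(col) for col in zip(*rows)] for tx, rows in groups.items()}


def iqr(values):
    """Interquartile range of one row (needs at least two values)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1


def filter_by_iqr(rows, min_iqr):
    """Keep only rows whose IQR across arrays is at least min_iqr."""
    return {key: vals for key, vals in rows.items() if iqr(vals) >= min_iqr}


# Toy data: two clones from one transcript, one singlet clone.
log_ratios = {
    "cloneA": [1.0, 2.0, 3.0, 4.0],
    "cloneB": [3.0, 4.0, 5.0, 6.0],
    "cloneC": [0.1, 0.1, 0.2, 0.1],
}
clone_to_transcript = {"cloneA": "tx1", "cloneB": "tx1", "cloneC": "tx2"}

averaged = average_by_transcript(log_ratios, clone_to_transcript)
kept = filter_by_iqr(averaged, min_iqr=0.5)  # 0.5 is an arbitrary cutoff
print(averaged)
print(sorted(kept))
```

Note that the averaged rows for multi-clone transcripts are built from more measurements than the rows for singlets, which is exactly the unequal-variability caveat raised in point 2.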

ADD COMMENT • link 18.0 years ago James W. MacDonald 68k


Naomi Altman ★ 6.0k

@naomi-altman-380


United States

Dear Yannick,

On the whole, most people have equal numbers of duplicates for each gene and can use the methods discussed in limma. However, we had a situation similar to yours.

First, we did a graphical analysis to determine whether the expression profiles of the clones in a clone set were fairly parallel across the arrays. A parallel profile indicates that the assessment of differential expression will be the same whichever clone is used. (Almost all of ours were parallel, and we suspect that some of the others were assembly errors.)

Then we picked the clone that was at a reasonably high quantile of the expression distribution; i.e., we did not pick the most highly expressed clone, in case its high expression was due to some type of error. We picked the median, or the clone at the 75th percentile, etc.

--Naomi

Naomi S. Altman
Associate Professor, Dept. of Statistics
Penn State University
University Park, PA 16802-2111
814-865-3791 (voice)
814-863-7114 (fax)
814-865-1348 (Statistics)
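The two steps described in this reply can be sketched roughly as follows. The reply describes a graphical check; the numeric check below is a stand-in for it, not the procedure actually used. It also assumes that "expression distribution" means each clone's mean intensity across arrays, and the tolerance, function names, and data are hypothetical.

```python
from statistics import mean


def is_parallel(profiles, tolerance):
    """Numeric stand-in for the graphical check (an assumption of this sketch).

    Each profile is centered on its own mean; the clone set counts as parallel
    if the centered profiles differ by at most `tolerance` on every array.
    """
    centered = [[v - mean(p) for v in p] for p in profiles]
    return all(max(col) - min(col) <= tolerance for col in zip(*centered))


def pick_representative(clone_profiles, quantile=0.75):
    """Return the clone whose mean intensity sits at `quantile` within its set.

    Uses a nearest-rank rule on clones sorted by mean intensity, so the most
    highly expressed clone is only chosen when quantile is close to 1.
    """
    ranked = sorted(clone_profiles, key=lambda cid: mean(clone_profiles[cid]))
    index = int(quantile * (len(ranked) - 1) + 0.5)
    return ranked[index]


# Toy clone set: five clones from one transcript, three arrays each.
clone_profiles = {
    "c1": [8.0, 9.0, 8.5],
    "c2": [10.0, 11.0, 10.5],
    "c3": [12.0, 13.0, 12.5],
    "c4": [9.0, 10.0, 9.5],
    "c5": [15.0, 16.0, 15.5],
}

parallel = is_parallel(list(clone_profiles.values()), tolerance=0.2)
at_median = pick_representative(clone_profiles, quantile=0.5)
at_p75 = pick_representative(clone_profiles, quantile=0.75)
print(parallel, at_median, at_p75)
```

A clone set that fails the parallel check would be a candidate for closer inspection, for example as a possible assembly error, rather than being collapsed to a single representative.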

ADD COMMENT • link 18.0 years ago Naomi Altman ★ 6.0k


Hi Naomi & Jim,

Thanks for your replies! I'll look into doing something along the same lines as you did, Naomi.

Have a wonderful weekend,
Yannick

ADD REPLY • link 18.0 years ago Yannick Wurm ▴ 220


Friends,

We used an Affymetrix microarray with about 45,000 genes. We have 4 groups with 3 arrays. How many genes should I expect after pairwise filtering (simpleaffy)? I know it depends on the parameters and stringency, but I would like to know an average, or the minimum, in order to perform a statistical analysis. Which algorithm do you think is better: RMA or MAS 5?

Thanks and regards,
Patrícia

Patrícia Luiza Nunes da Costa
Laboratório de Oncologia Experimental, Grupo de Adesão Celular
Faculdade de Medicina da Universidade de São Paulo-FM USP
Av. Dr. Arnaldo, 455, sala 4112, Cerqueira Cesar, CEP 01246-903
Tel: (11) 3061-7486 and (11) 8202-7073

ADD REPLY • link 18.0 years ago Patrícia Luiza Nunes da Costa ▴ 90


Dear Patricia,

Please do not hijack a thread to ask a question unrelated to the subject line.

How many genes you get depends on the quality of your arrays and the biology of your experiment. There will be some genes that are silent across all of the conditions. Why don't you plot the mean expression and variance of all probesets to see what those distributions look like?

RMA is generally regarded as better than MAS5.

Cheers,
Alex

--------------------------------------------
Alex C. Lam
Roslin Institute (Edinburgh)
Midlothian EH25 9PS
United Kingdom
Tel: +44 131 5274471
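The suggestion to look at the mean and variance of every probeset can be sketched as below. This is a plain-Python illustration, not simpleaffy or RMA code. It only computes the per-probeset summaries that one would then plot; the thresholds used to flag "silent" probesets, the function names, and the toy values are hypothetical.

```python
from statistics import mean, variance


def mean_variance_table(expression):
    """Map each probeset ID to (mean, sample variance) across arrays."""
    return {pid: (mean(vals), variance(vals)) for pid, vals in expression.items()}


def count_silent(table, max_mean, max_variance):
    """Count probesets that are both low in mean expression and nearly flat."""
    return sum(
        1 for m, var in table.values() if m <= max_mean and var <= max_variance
    )


# Toy log-scale expression values: probeset ID -> one value per array.
expression = {
    "ps1": [3.0, 3.1, 2.9, 3.0],
    "ps2": [7.0, 9.0, 8.0, 10.0],
    "ps3": [2.5, 2.5, 2.6, 2.4],
}

table = mean_variance_table(expression)
n_silent = count_silent(table, max_mean=4.0, max_variance=0.05)  # arbitrary cutoffs

for pid, (m, var) in sorted(table.items()):
    print(f"{pid}\tmean={m:.2f}\tvariance={var:.4f}")
print("probesets flagged as silent:", n_silent)
```

The (mean, variance) pairs are what would go on the axes of the suggested plot; sensible cutoffs, if any are used, should be read off the real distributions rather than fixed in advance.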

ADD REPLY • link 18.0 years ago alex lam RI ▴ 310


OK Alex, I didn't pay attention... sorry.

Thanks for your help!

Patricia

ADD REPLY • link 18.0 years ago Patrícia Luiza Nunes da Costa ▴ 90
