Affy vs. cDNA : Low and not expressed genes
@adaikalavan-ramasamy-167
Last seen 9.6 years ago
Dear all,

Thank you for the very interesting discussion on the topic of "replicates and low expression levels" in the last few days. I am facing a related problem regarding normalization and would appreciate any advice.

A small time course experiment was done on blood macrophages and hybridized to the Affymetrix HG-U133A chip. There were 3 replicates and 3 time points (0, 2 and 48 hours).

The main problem is that at the 0 hour time point there are 95% Absent calls. The percentage of Absent calls decreases to 70% at 2 hours and 20% at 2 days. Initially I assumed that there was some physical problem with the array, but I was later corrected by the biologists: this is expected, as many genes are not expressed in blood macrophages. Thus most of the 95% Absent calls are due to genes that are not expressed. Apparently this is common in developmental biology.

My first question is: how does one normalize this kind of data? The assumption used for two-colour cDNA data, that "most of the genes are not differentially expressed", does not hold here. Median normalization would not be meaningful in this scenario.

We then explored the possibility of using housekeeping genes for normalization. But it seems that the 100 housekeeping genes for HG-U133A are a standard set and not specific to our experiment: only 28 of these 100 genes are expressed throughout all time points.

The biologists have decided to redo the experiment, and I think they are more likely to hear our advice BEFORE doing the experiments. My second question is this: will 2-colour cDNA with UHR as reference overcome this problem?

I would expect most of the unexpressed genes (those previously called Absent on the Affymetrix chip) to have very negative log ratio values. But I don't think the assumption that "most genes are not differentially expressed" will hold in that case either. How does one deal with this?

My last question is: has anyone done a comparison of Affymetrix and cDNA arrays in terms of results, efficiency and advantages? I am interested in quantifying the benefits of spending 5 times as much money on something that typically has 40% absent calls.

Thank you very much in advance.

Regards, Adai.
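To make the concern about median normalization concrete, here is a minimal sketch in plain Python. The intensities, the `target` value and the helper `median_scale` are all made up for illustration and are not taken from the experiment; the point is only that scaling each chip to a common median changes the apparent level of a gene whose raw intensity is identical on both chips when the fraction of expressed genes differs strongly between chips.

```python
from statistics import median

def median_scale(chip, target):
    """Scale a chip so that its median intensity equals `target`."""
    factor = target / median(chip)
    return [x * factor for x in chip]

# Hypothetical raw intensities for 10 probesets per chip.
# At 0 h nearly everything sits at background (~50);
# at 48 h most probesets are clearly expressed (~800).
chip_0h = [50, 50, 50, 50, 50, 50, 50, 50, 50, 800]
chip_48h = [50, 50, 800, 800, 800, 800, 800, 800, 800, 800]

target = 400  # arbitrary common median, an assumption for this sketch
scaled_0h = median_scale(chip_0h, target)
scaled_48h = median_scale(chip_48h, target)

# The last probeset has the same raw intensity (800) on both chips,
# yet after median scaling the two values differ by a large factor.
print(scaled_0h[-1], scaled_48h[-1])
```

In this toy case the median of the 0 h chip is set by background probesets while the median of the 48 h chip is set by expressed ones, so the two scaling factors are not comparable.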
Normalization • 1.1k views
Park, Richard ▴ 220
@park-richard-227
Last seen 9.6 years ago
Dear Adai,

I cannot tell you what the correct way of doing the analysis is, but maybe I can give you a little insight by telling you how I analyze my own chips. I am a computational biologist and I have been doing most of my lab's microarray analysis for the past year. We use Affymetrix chips in our lab and we do not remove any genes before normalization. Our experiments range from wt vs ko and various time course treatments to comparisons of cell types from various cell sorts.

If you want to go with the idea of removing absent genes from your chips, you shouldn't remove all of the absent genes at each time point. You should remove only those genes that are absent at all 3 time points, and then perhaps normalize. However, I have not been a big fan of the absent, present and marginal calls ever since I moved away from Affymetrix 5.0 processing to RMA processing.

Whenever I analyze a microarray experiment I always normalize everything together. This allows me to see the big picture, and from that point I may start filtering. Also, removing genes so early in the analysis restricts your ability to judge the quality of your replicates, because you have fewer points to reference.

There are also various methods of coming up with a value at each time point. The standard one is to average the replicates. You can also run an outlier elimination algorithm on the time points. Thirdly (for time course experiments), I have been testing a way of using a loess method that uses information from each of the time points to calculate a spline and come up with a value.

After you have a single value for each time point for each gene, the next logical step is to calculate fold change values between the time points. With 3 replicates you can also calculate p-values between the time points. However, you should keep in mind that p-values are not a better indication of what is going on than fold change until you have more than 8 replicates for each time point (from Terry Speed's website).

For some of my recent time course analyses, I have found fold change vs fold change plots very informative. Plotting the various combinations allows you to see which genes are differentially expressed between the time points. Other informative plots are M vs A plots (log fold change vs average expression value) and volcano plots (fold change vs p-values).

At this point I make various lists of genes based on the plots, and then highlight these lists in related graphs. (I use an in-house method of plotting gene lists, e.g. B-cell related genes or NK cell related genes, onto microarray plots.) This allows me to combine the biology of pathways with this type of microarray analysis. People then tend to spend a lot of time researching the gene lists on PubMed using UniGene and LocusLink IDs, trying to create a picture of what is going on. We also create random data sets based on the microarray data to use as a reference point for confirming our results.

I hope this helps,

Richard Park
Computational Data Analyzer
Joslin Diabetes Center
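The filtering rule and the fold change step described in this reply can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the gene names, detection calls and intensities are invented, the helpers `absent_everywhere` and `log2_fold_change` are hypothetical names, and the values are assumed to be already normalized. It is not the in-house code mentioned in the reply.

```python
from math import log2
from statistics import mean

# Hypothetical detection calls: one letter per replicate
# ("A" = Absent, "P" = Present) for each time point.
calls = {
    "geneA": {"0h": "AAA", "2h": "AAP", "48h": "PPP"},
    "geneB": {"0h": "AAA", "2h": "AAA", "48h": "AAA"},
}

# Hypothetical, already normalized intensities (3 replicates per time point).
expr = {
    "geneA": {"0h": [40, 50, 60], "2h": [90, 100, 110], "48h": [380, 400, 420]},
    "geneB": {"0h": [45, 50, 55], "2h": [48, 50, 52], "48h": [50, 50, 50]},
}

def absent_everywhere(gene_calls):
    """True only if every replicate at every time point is called Absent."""
    return all(c == "A" for per_time in gene_calls.values() for c in per_time)

def log2_fold_change(baseline, treated):
    """log2 ratio of replicate means (treated over baseline)."""
    return log2(mean(treated) / mean(baseline))

# Drop a gene only when it is Absent at all three time points.
kept = [g for g in expr if not absent_everywhere(calls[g])]

fold_changes = {
    g: {
        "2h_vs_0h": log2_fold_change(expr[g]["0h"], expr[g]["2h"]),
        "48h_vs_0h": log2_fold_change(expr[g]["0h"], expr[g]["48h"]),
    }
    for g in kept
}
print(kept, fold_changes)
```

Plotting the `2h_vs_0h` values against the `48h_vs_0h` values for all kept genes would give one of the fold change vs fold change plots mentioned above.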
Laurent Gautier ★ 2.3k
@laurent-gautier-29
Last seen 9.6 years ago
On Tue, Jun 03, 2003 at 06:35:54PM +0800, Adaikalavan Ramasamy wrote:
> Dear all,
>
> Thank you for the very interesting discussion on the topic of
> "replicates and low expression levels" in the last few day. I am facing
> a related problem regarding normalization and would appreciate any
> advice.
>
> A small time course experiment was done on blood macrophages and
> hybridized to affymetrix chip HGU-133A. There were 3 replicates and 3
> time points (0, 2, 48 hour).
>
> The main problem is that at time point 0 hour, there are 95 % Absent
> calls. The percentage of Absent call decreases to 70% in 2 hour and 20%
> in 2 days. Initially I assumed that there was some physical problem with
> the array. But later I was corrected by the biologists that it was
> expected as many genes are not expressed in blood macrophages. Thus most
> of the 95 % Absent were due to not expressed genes ... Apparently this
> is common in developmental biology.
>
> My first question is how does one normalize this kind of data ? The
> assumption in two-colour cDNA data of "most of the genes are not
> differently expressed" does not hold here. Median normalization would
> not be meaningful in this scenario.

One strategy I have been using goes like this:

- normalize the replicates from each time point independently (with the affy package, use 'split.AffyBatch' and 'normalize'). Use whichever normalization method you prefer; I would be tempted to use a quantile-based one.
- merge the 3 normalized chips (with the affy package, use 'merge.AffyBatch') and look at the distribution of the intensities (with the affy package, something like 'hist(myaffybatch)' should do the job). I would advise using the parameter 'col' in the function call to color the densities according to the time point they belong to.

If you are lucky, the "leftmost mode" of each density will be at about the same location and you can go on. If you are a bit less lucky, you will have to use normalize with the method "constant" to bring the "leftmost" modes to the same location (use the optional parameter 'FUN' to supply a function that returns that mode for each chip). This should do it.

(Note: of course this will probably give you *a lot* of false positives. A constrained non-linear transformation would perform better. I explored that a bit... I hope to put things together and come up with some code for BioC... sometime.)

> We then explored the possibility of using housekeeping genes for
> normalization. But it seems that the 100 housekeeping genes for HGU-133A
> are standard and not specified for our experiment. This is because only
> 28 of these 100 genes are expressed through out all time points.

I remember having a really hard time with housekeeping genes and cDNA. I cannot tell for Affymetrix arrays, but I would be very careful about using them for normalization purposes.

> The biologists have decided to re-do the experiment again and I think
> they are more likely to hear our advice BEFORE doing the experiments. My
> second question is this: Will 2-colour cDNA with UHR as reference
> overcome this problem ?
>
> Now I would expect to see most of the un-expressed genes (and previously
> Absent in affy) to have very negative log ratio values. But I don't
> think the assumption of "most genes are not differentially expressed"
> will hold again. And how does one deal with this ...

Give the suggestion a go and tell us if it makes sense in your case.

> My last question is has anyone done a comparison of Affymetrix to cDNA
> results/efficiency/advantages. I am interested in quantifying the
> benefits of spending 5 times as much money on something that has
> typically 40% absent calls. Thank you very much in advance.
>
> Regards, Adai.

--
Laurent Gautier, PhD Student
CBS, Building 208, DTU, DK-2800 Lyngby, Denmark
(currently at the National Yang-Ming University in Taipei, Taiwan)
tel: +45 45 25 24 89
http://www.cbs.dtu.dk/laurent
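The two-step strategy in this reply (normalize within each time point, then bring the leftmost modes to a common location) can be sketched in plain Python as follows. This is not the affy implementation and makes several assumptions: the data are invented log2 intensities with only two replicates per time point, the helpers `quantile_normalize`, `leftmost_mode` and `align_leftmost_modes` are hypothetical names, the mode is estimated with a crude fixed-width histogram, and the constant scaling is applied as an additive shift because the values are on the log2 scale.

```python
from collections import Counter
from statistics import mean

def quantile_normalize(chips):
    """Give every chip the same distribution: the mean of the sorted values.
    Ties within a chip are broken by position."""
    n = len(chips[0])
    reference = [mean(col) for col in zip(*(sorted(c) for c in chips))]
    result = []
    for chip in chips:
        order = sorted(range(n), key=chip.__getitem__)
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]
        result.append(out)
    return result

def leftmost_mode(values, bin_width=0.5):
    """Crude estimate: centre of the first histogram bin that is a local peak."""
    counts = Counter(int(v // bin_width) for v in values)
    for b in range(min(counts), max(counts) + 1):
        if counts[b] > 0 and counts[b] >= counts[b - 1] and counts[b] > counts[b + 1]:
            return (b + 0.5) * bin_width
    return (max(counts, key=counts.get) + 0.5) * bin_width  # fallback

def align_leftmost_modes(groups, reference, bin_width=0.5):
    """Shift each time point so its leftmost mode matches the reference group."""
    target = leftmost_mode(groups[reference][0], bin_width)
    aligned = {}
    for name, chips in groups.items():
        shift = target - leftmost_mode(chips[0], bin_width)
        aligned[name] = [[v + shift for v in chip] for chip in chips]
    return aligned

# Hypothetical log2 intensities, two replicate chips per time point.
raw = {
    "0h": [[6.1, 6.2, 6.3, 6.2, 6.4, 9.1], [6.3, 6.2, 6.1, 6.4, 6.2, 9.3]],
    "48h": [[7.1, 7.2, 10.1, 10.2, 10.3, 7.2], [7.3, 7.2, 10.3, 10.1, 10.4, 7.2]],
}

# Step 1: normalize replicates within each time point independently.
normalized = {name: quantile_normalize(chips) for name, chips in raw.items()}
# Step 2: bring the leftmost (background) modes to the same location.
aligned = align_leftmost_modes(normalized, reference="0h")
print(leftmost_mode(aligned["0h"][0]), leftmost_mode(aligned["48h"][0]))
```

The design choice here is that the leftmost mode is treated as the background peak shared by all chips, so it serves as the anchor instead of the median, which moves as more genes switch on.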
@adaikalavan-ramasamy-167
Last seen 9.6 years ago
Thank you Stephen Henderson, Richard Park and Laurent Gautier for your extremely helpful suggestions. The biologists like the idea of the FC-FC plot, but I decided to present them with a plot of t-values instead, in order to account for the variation (after some convincing that t-values do account for FC with respect to variation, sigh). Given that there were only 3 replicates, I think I have salvaged as much as could be. Thank you again!
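The point that a t-value relates the fold change to the variation can be illustrated with a small plain-Python sketch. The log2 values below are invented, and the Welch form of the t-statistic is used here as an assumption; the thread does not say which variant was actually plotted.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(group_a, group_b):
    """Welch t-statistic: difference of means divided by its standard error."""
    se = sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    return (mean(group_b) - mean(group_a)) / se

# Hypothetical log2 expression values, 3 replicates per time point.
# Both genes have the same log2 fold change (1.0, i.e. 2-fold),
# but very different variability between replicates.
tight_0h, tight_48h = [6.0, 6.1, 5.9], [7.0, 7.1, 6.9]
noisy_0h, noisy_48h = [5.0, 6.0, 7.0], [6.0, 7.0, 8.0]

t_tight = welch_t(tight_0h, tight_48h)
t_noisy = welch_t(noisy_0h, noisy_48h)

# Same fold change, but the consistent gene gets a much larger t-value.
print(round(t_tight, 2), round(t_noisy, 2))
```

With only 3 replicates per group the variance estimates are themselves noisy, which is consistent with the caution about p-values raised earlier in the thread.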