topTable

Lev Soinov

An embedded and charset-unspecified text was scrubbed... Name: not available Url: https://stat.ethz.ch/pipermail/bioconductor/attachments/20070828/5b3d6bc1/attachment.pl

updated 16.7 years ago by Jenny Drnevich • written 16.7 years ago by Lev Soinov

Jenny Drnevich

Hi Lev,

I think you are a little fixated on removing probes that are "bad" in one of your two contrasts. I don't think it's that serious an issue, and I don't know anyone else who worries about it either, especially since, as you mention, there are not that many "bad" probes. It's highly unlikely that they would be significant anyway, so I don't see why you are so set on removing them. At most, I would only worry about checking the significant genes in each contrast. Even if a few slipped through, you are expecting some false positives in your lists anyway, so I don't think they would radically affect the conclusions drawn from them. Your analysis steps 1-4 are fine, and I would stop there.

That's my 2 cents,
Jenny

> I am doing some analysis in LIMMA and would be very grateful for your comments.
> I have three treatments, 1, 2 and 3, and compare 2vs.1 and 3vs.1.
> Then I analyse the resulting lists further, identifying genes that
> are different/similar between the contrasts. As suggested earlier
> on this list, I:
> 1. normalise using ALL the data;
> 2. filter out probes that are not expressed in any of the
>    treatments 1, 2 and 3 (i.e. unexpressed across ALL of them);
> 3. run LIMMA on the filtered data;
> 4. produce two gene lists, for the contrasts 2vs.1 and 3vs.1,
>    using topTable.
>
> To take full advantage of LIMMA, in steps 3 and 4 above
> I process the data for all treatments together:
>
> library(limma)
> design <- model.matrix(~0 + factor(c(1,1,1,2,2,2,3,3,3)))
> colnames(design) <- c("group1", "group2", "group3")
> contrast.matrix <- makeContrasts(group2-group1, group3-group1,
>                                  levels=design)
> fit <- lmFit(data_normalised_filtered, design)
> fit2 <- contrasts.fit(fit, contrast.matrix)
> fit2 <- eBayes(fit2)
> topTable(fit2, coef=1, adjust.method="BH")  # contrast 2vs.1
> topTable(fit2, coef=2, adjust.method="BH")  # contrast 3vs.1
>
> This means that some probes may have meaningless results for one
> of the two contrasts.
> For example, if probe A is "not expressed" in
> 1 and 2 but is "expressed" in 3, it will be kept in the analysis
> (step 2), but its fold change and p-values will obviously be
> meaningless for the 2vs.1 comparison (because we are comparing
> noise vs. noise there). Recognising this, as the 5th step of my
> procedure (after running topTable), I remove probes such as A from
> the topTable results for the comparison 2vs.1, but keep them in the
> results for the comparison 3vs.1.
> So, for example, the topTable for the contrast 2vs.1:
>
> ID  logFC       t       P.Value  adj.P.Val      B
> X   -3.58  -14.19  1.068322e-06     0.0164  3.839
> Y   -4.71  -13.02  2.000032e-06     0.0164  3.589
> A   -2.52  -11.94  3.721566e-06     0.0203  3.315
> Z   -2.19  -11.17  5.993895e-06     0.0222  3.086
>
> will become:
>
> ID  logFC       t       P.Value  adj.P.Val      B
> X   -3.58  -14.19  1.068322e-06     0.0164  3.839
> Y   -4.71  -13.02  2.000032e-06     0.0164  3.589
> Z   -2.19  -11.17  5.993895e-06     0.0222  3.086
>
> The other way to make the comparisons 2vs.1 and 3vs.1 would be to
> process them separately, filtering each pair separately as well.
> But that would decrease the power.
> I realise that keeping such partially "bad" probes (probes that
> are "bad" in one comparison but "good" in the other) and
> removing them after running topTable can adversely affect the
> "good" probes, either through eBayes or through the
> multiple testing correction. My impression is that it would not
> affect the results much, because the "bad" probes are not
> numerous. Besides, the probe rankings should remain the same.
> Would you say that what I described above is a sensible way to go?
>
> Looking forward to your replies,
> Lev.
Jenny Drnevich, Ph.D.
Functional Genomics Bioinformatics Specialist
W.M. Keck Center for Comparative and Functional Genomics
Roy J. Carver Biotechnology Center
University of Illinois, Urbana-Champaign
330 ERML
1201 W. Gregory Dr.
Urbana, IL 61801
USA
ph: 217-244-7355
fax: 217-265-5066
e-mail: drnevich at uiuc.edu
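Lev's step 2 (the global filter) and step 5 (dropping a probe from one contrast's results when it is unexpressed in both groups of that contrast) can be sketched in base R as below. This is only a sketch under stated assumptions:

- "Expressed" is taken to mean a group-mean log2 intensity above an arbitrary cut-off.
- The toy matrix `expr` and the helper names `group_means` and `informative` are hypothetical.
- The small results table stands in for what `topTable(fit2, coef = 1, number = Inf)` would return.

```r
# Sketch of Lev's step 2 and step 5. ASSUMPTIONS: log2 intensities, and
# "expressed" = group mean above a HYPOTHETICAL cut-off; the helper names
# and the toy data are illustrative only.

# Mean log2 intensity of every probe within every group (probes x groups).
group_means <- function(expr, groups) {
  sapply(split(seq_len(ncol(expr)), groups),
         function(cols) rowMeans(expr[, cols, drop = FALSE]))
}

# A probe is informative for "b vs a" only if it is expressed in a or b.
informative <- function(expressed, a, b) expressed[, a] | expressed[, b]

groups <- factor(c(1, 1, 1, 2, 2, 2, 3, 3, 3))  # same layout as Lev's design
expr <- rbind(
  X = c(9.1, 8.9, 9.0, 5.2, 4.8, 5.0,  9.0, 9.2,  8.8),
  A = c(3.3, 2.7, 3.0, 4.1, 3.9, 4.0, 10.0, 9.8, 10.2),  # expressed only in 3
  Y = c(8.0, 8.2, 7.8, 8.1, 7.9, 8.0,  8.0, 8.0,  8.0),
  N = c(2.0, 2.2, 1.8, 3.0, 3.1, 2.9,  2.0, 1.9,  2.1)   # never expressed
)
threshold <- 6  # HYPOTHETICAL "expressed" cut-off on the log2 scale

means <- group_means(expr, groups)
expressed <- means > threshold  # TRUE = "expressed" in that group

# Step 2: drop probes that are not expressed in any treatment (here N).
keep <- rowSums(expressed) > 0
expr_filtered <- expr[keep, , drop = FALSE]
expressed_f <- expressed[keep, , drop = FALSE]

# Stand-in for the 2vs.1 topTable: only logFC, computed from group means.
tab_2vs1 <- data.frame(ID = rownames(expr_filtered),
                       logFC = means[keep, "2"] - means[keep, "1"])

# Step 5: remove probes unexpressed in both groups 1 and 2 (here A)
# from the 2vs.1 results only; they stay in the 3vs.1 results.
ok_2vs1 <- informative(expressed_f, "1", "2")
tab_2vs1_clean <- tab_2vs1[tab_2vs1$ID %in% names(ok_2vs1)[ok_2vs1], ,
                           drop = FALSE]
print(tab_2vs1_clean)
```

With real limma output, the probe IDs may be in an `ID` column (as in Lev's table) or in the row names, depending on the limma version and the annotation supplied. Deleting rows from an existing table leaves every other probe's statistics, including `adj.P.Val`, exactly as they were computed with A present.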

written 16.7 years ago by Jenny Drnevich
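Lev's point about the multiple testing correction can be checked with base R's `p.adjust`. He argues that dropping partially "bad" probes after topTable leaves the rankings intact, even though the correction was computed with those probes included. `p.adjust` implements the Benjamini-Hochberg method that topTable's "BH" option refers to. The p-values below are hypothetical, not taken from the thread. The sketch covers only the BH step, not the eBayes shrinkage Lev also mentions.

```r
# HYPOTHETICAL raw p-values for one contrast (not the values in the thread).
p <- c(X = 1e-6, Y = 2e-6, A = 4e-6, Z = 6e-6, P1 = 0.01, P2 = 0.20, P3 = 0.70)
drop <- "A"  # the partially "bad" probe
kept <- setdiff(names(p), drop)

# Option 1 (Lev's step 5): BH over all probes, then remove A afterwards.
adj_after <- p.adjust(p, method = "BH")[kept]

# Option 2: remove A first, then apply BH to the remaining probes.
adj_before <- p.adjust(p[kept], method = "BH")

# The ranking by adjusted p-value is identical either way ...
rank_after <- names(adj_after)[order(adj_after)]
rank_before <- names(adj_before)[order(adj_before)]
print(identical(rank_after, rank_before))

# ... but the adjusted values differ, and not in one direction:
# in this toy case X and Y go down, while Z, P1 and P2 go up.
print(cbind(after = adj_after, before = adj_before))
```

Because BH adjustment is monotone in the raw p-values, removing a probe cannot reorder the others. It can only shift their adjusted values, which matters mainly for probes sitting near a significance cut-off.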

An embedded and charset-unspecified text was scrubbed... Name: not available Url: https://stat.ethz.ch/pipermail/bioconductor/attachments/20070828/eda7719f/attachment.pl

written 16.7 years ago by Lev Soinov

Hi Lev,

I agree with Jenny. I think you're worrying too much. For a gene that is not expressed in treatments 1 and 2 but is strongly expressed in 3, you'll expect upregulation when comparing 3 vs 1, as you say. But when comparing 2 vs 1 you'll get WKW ("who-knows-what", as you put it), because both will have some low intensity rather than exactly zero, and the log ratios can vary wildly. I think you'll find, as Jenny suggested, that regardless of the measured log ratio, that particular gene will *not* be classified as "differentially expressed" for 2 vs 1 (the P value is likely to be quite high for that gene in that contrast). So you will be able to compare contrasts and select the genes that you want.

You're never going to pick absolutely *everything*, because there is always some error, and the more you pick, the more mistakes you make. P values won't tell you whether a gene is DE (differentially expressed) or not, but they give you an idea of how likely you are to make a mistake by calling it DE or not DE. Ranking a list of genes by P value first, and THEN looking at log ratios, will allow you to eliminate pretty much every case [1] of "no expression but high log ratio" that you're concerned about.

In my work, I'm mostly trying to find genes that have no expression in a particular situation but can be activated after a certain treatment, so I have to deal with the same issues you are talking about. The only spots I remove are the ones that do not pass an intensity threshold in BOTH channels and on ALL slides, because they just add noise and do not contribute anything useful. Everything else stays, and I think my analyses work reasonably well so far.

Jose

[1] You may still get some odd things, partly depending on whether you use background correction or not, and how good your method is (if you do). I always do a "clean up" check, looking at actual intensities, signal-to-noise ratios (given by the scanning software for every spot), etc.
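Jose makes two linked points. First, a noise-vs-noise comparison can produce a sizeable log ratio alongside an unremarkable P value. Second, ranking by P value before looking at log ratios screens such cases out. Both can be illustrated with a single hypothetical probe. The intensities are invented for illustration. Ordinary pooled-variance t-tests (`t.test(..., var.equal = TRUE)`) stand in for limma's moderated t-statistics, which borrow information across probes and would give different numbers.

```r
# Probe A from Lev's example: not expressed in treatments 1 and 2 (noisy,
# low log2 intensities) but clearly expressed in treatment 3.
# The numbers are HYPOTHETICAL, chosen only to illustrate the argument.
a1 <- c(0.5, 3.0, 1.0)    # treatment 1: noise
a2 <- c(2.5, 5.5, 1.5)    # treatment 2: noise
a3 <- c(10.2, 9.8, 10.0)  # treatment 3: expressed

# Ordinary pooled-variance t-test as a stand-in for limma's moderated t.
compare <- function(x, ref, label) {
  tt <- t.test(x, ref, var.equal = TRUE)
  data.frame(contrast = label,
             logFC = mean(x) - mean(ref),
             P.Value = tt$p.value)
}

res <- rbind(compare(a2, a1, "2vs1"), compare(a3, a1, "3vs1"))

# Rank by P value first, then look at the log ratios of what survives.
res <- res[order(res$P.Value), ]
print(res)
print(res[res$P.Value < 0.05, ])
```

Here the 2vs.1 log ratio is about 1.7 (more than a three-fold "change"), yet its P value is around 0.3, so the probe never reaches the top of the 2vs.1 list. The 3vs.1 contrast for the same probe is clearly significant.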
Quoting Lev Soinov <lev_embl1 at yahoo.co.uk>:

> Hi Jenny,
>
> I would not worry about this at all if I did not have the
> objective of comparing the contrasts afterwards. For example, I need
> to identify those genes that are regulated in one contrast but are
> not differentially expressed in the other one. In the example that I
> gave in my previous e-mail, probe A is "not expressed" in
> treatments 1 and 2, but is "expressed" in 3. This means that I will
> get upregulation in 3vs.1 and who-knows-what in 2vs.1. To avoid such
> situations, I remove A from the topTable list for 2vs.1 and
> conclude that A is upregulated in 3vs.1 but cannot be classified
> in 2vs.1. If I do not remove it from the 2vs.1 list, I may end up
> with a lot of spurious results/false conclusions.
> Does that sound reasonable?
>
> With kind regards,
> Lev.

--
Dr. Jose I. de las Heras
Email: J.delasHeras at ed.ac.uk
The Wellcome Trust Centre for Cell Biology
Institute for Cell & Molecular Biology
Swann Building, Mayfield Road
University of Edinburgh
Edinburgh EH9 3JR
UK
Phone: +44 (0)131 6513374
Fax: +44 (0)131 6507360
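Lev wants to find genes regulated in one contrast but not differentially expressed in the other. He also wants probes such as A marked as "cannot be classified" in 2vs.1, rather than silently lost. One way to sketch this is a per-contrast call matrix. limma's `decideTests()` produces a related -1/0/1 matrix from a fitted model. The base-R sketch below instead works on hypothetical adjusted P values, log fold changes and expression flags; the helper `classify` and the 0.05 cut-off are assumptions.

```r
# HYPOTHETICAL per-probe results for the two contrasts (probes x contrasts).
adj_p <- rbind(A = c(0.020, 0.001),  # Lev's probe A: noise in 1 and 2
               B = c(0.001, 0.002),
               C = c(0.600, 0.700),
               D = c(0.004, 0.500))
logfc <- rbind(A = c(-2.5,  4.0),
               B = c(-3.6, -3.1),
               C = c( 0.1,  0.2),
               D = c(-2.2,  0.3))
# TRUE if the probe is expressed in at least one group of that contrast.
informative <- rbind(A = c(FALSE, TRUE),
                     B = c(TRUE,  TRUE),
                     C = c(TRUE,  TRUE),
                     D = c(TRUE,  TRUE))
cn <- c("2vs1", "3vs1")
colnames(adj_p) <- cn
colnames(logfc) <- cn
colnames(informative) <- cn

# "up"/"down" if significant, "notDE" otherwise, and "unclassified" where
# both groups of the contrast are unexpressed (Lev: "cannot be classified").
classify <- function(adj_p, logfc, informative, alpha = 0.05) {
  calls <- ifelse(adj_p < alpha, ifelse(logfc > 0, "up", "down"), "notDE")
  calls[!informative] <- "unclassified"
  calls
}
calls <- classify(adj_p, logfc, informative)
print(calls)

# Regulated in one contrast and positively "notDE" in the other.
is_de <- function(x) x %in% c("up", "down")
only_2vs1 <- rownames(calls)[is_de(calls[, "2vs1"]) & calls[, "3vs1"] == "notDE"]
only_3vs1 <- rownames(calls)[is_de(calls[, "3vs1"]) & calls[, "2vs1"] == "notDE"]
print(only_2vs1)
print(only_3vs1)
```

In this toy input, A is labelled "unclassified" in 2vs.1 and "up" in 3vs.1. It is therefore left out of both "specific" lists rather than being reported as 3vs.1-specific on the strength of a noise comparison, which matches the conclusion Lev describes.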

written 16.7 years ago by J.delasHeras@ed.ac.uk
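Jose's filtering rule is to remove a spot only if it fails an intensity threshold in both channels on every slide. It might be sketched as follows. The red/green matrices `R` and `G` mimic the layout of the R and G components of a two-colour limma RGList. The values and the cut-off of 100 are hypothetical; a real threshold might be chosen from background levels or the scanner's signal-to-noise figures.

```r
# HYPOTHETICAL foreground intensities (spots x slides) for the red and green
# channels of a two-colour experiment.
R <- rbind(s1 = c(900, 1200, 800), s2 = c(40, 35, 50), s3 = c(60, 2000, 45))
G <- rbind(s1 = c(850, 1100, 950), s2 = c(30, 45, 40), s3 = c(55,   70, 38))
threshold <- 100  # HYPOTHETICAL intensity cut-off

# A spot is dropped only if it is below the threshold in BOTH channels
# on ALL slides; a single bright measurement anywhere keeps it.
passes <- R >= threshold | G >= threshold  # spots x slides, logical
keep <- rowSums(passes) > 0
R_kept <- R[keep, , drop = FALSE]
G_kept <- G[keep, , drop = FALSE]
print(rownames(R_kept))
```

Spot s3 survives because of one bright red measurement on slide 2. That is the "off in one situation, activated in another" pattern Jose is looking for, and a stricter per-slide filter would discard it.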