Question

help needed on avereps function

Francois Pepin

Hi Erika,

I'm bringing the discussion back to the list so other people can chime in and so it's archived for future reference.

What are you using for the ID argument in avereps? Since the code doesn't seem to work for you (i.e. you still have duplicates), I'm guessing it's not using the proper identifiers. Without any code, it's impossible for us to understand what is happening.

As for the lists of differentially expressed genes, you'd have to tell us how many genes you get with each method and how different the lists are. Methods like limma borrow information from the other genes when calculating significance, so this could change the p-values. In addition, the multiple hypothesis testing correction will also be affected if you have a different number of probes.

So other than guessing, there's not much that we can do. Sending your code (including sessionInfo()) and giving us more details of your results will allow people to get a better idea of what is happening and how to fix it, if necessary.

Francois

Erika Melissari wrote:
>
> Dear Dr Pepin,
>
> Sorry to disturb you, but I have sent several emails to the Bioconductor list about some problems I am having with the avereps function, and I have received no answer.
> Perhaps my question is unimportant for the Bioconductor list, but I noticed some unaccountable results when using this function that concern me, and I cannot find an explanation.
> If you have a little time and would like to help me, I would appreciate your opinion on these problems.
> As the limma help suggests, I use the avereps function after normalization and before lmFit; that is, I run lmFit on normalized and averaged data.
> I noticed two strange results:
> 1) I obtain a different list of differentially expressed genes depending on whether or not I use the avereps function. If I have understood this function correctly, its effect is to average the M, A and weights values for spots with the same probe ID (in my case an Agilent code).
> Why should my statistical significance change, and which list of differentially expressed genes is right... or safer?
> 2) When I checked the averaged list of genes, I found spots with the same probe ID that had not been averaged. You can see an example of this below. What could prevent them from being averaged?
>
> Maybe the problems I see are not a consequence of using the avereps function, particularly point 1), and should be explained in other terms?
>
> I apologize again for the trouble I am causing you, and I thank you in advance for any help you can give me.
>
> Best regards
>
> Erika
>
> Erika Melissari
> Ph.D. student
> Department of Experimental Pathology, MBIE,
> University of Pisa
> Santa Chiara Hospital, via Roma 67
> 56126 Pisa
> e-mail: erika.melissari at bioclinica.unipi.it
>
> ----- Original Message -----
> From: Erika Melissari (erika.melissari at bioclinica.unipi.it)
> To: bioconductor at stat.math.ethz.ch; Francois Pepin (fpepin at cs.mcgill.ca)
> Sent: Friday, June 05, 2009 18:23 PM
> Subject: avereps function
>
> Dear list,
>
> I used the avereps function after normalization and before lmFit to average replicate spots on the microarrays.
> I noticed that while many spots have been averaged (the total number of spots was reduced from 43000 to 41000), other spots have not.
> See this example:
>
> Block Column Row ID           Name      Sequence                                                     ProbeUID GeneName logFC    adj.P.Val B
> 1     85     183 A_23_P135769 NM_001101 TTTAAAAACTGGAACGGTGAAGGTGACAGCAGTCGGTTGGAGCGAGCATCCCCCAAAGTT 2871     ACTB     0.266302 0.048228  0.181434
> 1     20     393 A_23_P135769 NM_001101 TTTAAAAACTGGAACGGTGAAGGTGACAGCAGTCGGTTGGAGCGAGCATCCCCCAAAGTT 2871     ACTB     -0.20687 0.068295  -0.6233
> 1     56     294 A_23_P135769 NM_001101 TTTAAAAACTGGAACGGTGAAGGTGACAGCAGTCGGTTGGAGCGAGCATCCCCCAAAGTT 2871     ACTB     0.110065 0.382405  -4.54642
> 1     22     299 A_23_P135769 NM_001101 TTTAAAAACTGGAACGGTGAAGGTGACAGCAGTCGGTTGGAGCGAGCATCCCCCAAAGTT 2871     ACTB     0.085017 0.405978  -4.66767
> 1     53     457 A_23_P135769 NM_001101 TTTAAAAACTGGAACGGTGAAGGTGACAGCAGTCGGTTGGAGCGAGCATCCCCCAAAGTT 2871     ACTB     0.080708 0.483304  -5.0517
> 1     17     39  A_23_P135769 NM_001101 TTTAAAAACTGGAACGGTGAAGGTGACAGCAGTCGGTTGGAGCGAGCATCCCCCAAAGTT 2871     ACTB     0.063279 0.710629  -5.73913
> 1     45     199 A_23_P135769 NM_001101 TTTAAAAACTGGAACGGTGAAGGTGACAGCAGTCGGTTGGAGCGAGCATCCCCCAAAGTT 2871     ACTB     0.051584 0.778778  -5.87993
> 1     64     279 A_23_P135769 NM_001101 TTTAAAAACTGGAACGGTGAAGGTGACAGCAGTCGGTTGGAGCGAGCATCCCCCAAAGTT 2871     ACTB     -0.04158 0.800246  -5.91735
> 1     16     358 A_23_P135769 NM_001101 TTTAAAAACTGGAACGGTGAAGGTGACAGCAGTCGGTTGGAGCGAGCATCCCCCAAAGTT 2871     ACTB     0.024847 0.880504  -6.03386
> 1     21     435 A_23_P135769 NM_001101 TTTAAAAACTGGAACGGTGAAGGTGACAGCAGTCGGTTGGAGCGAGCATCCCCCAAAGTT 2871     ACTB     0.000153 0.999438  -6.11393
> 1     4      111 A_23_P31323  NM_001101 ACTCTTCCAGCCTTCCTTCCTGGGCATGGAGTCCTGTGGCATCCACGAAACTACCTTCAA 8562     ACTB     0.283846 0.043577  0.472915
> 1     17     275 A_24_P226554 NM_001101 GCACCCAGCACAATGAAGATCAAGATCATTGCTCCTCCTGAGCGCAAGTACTCCGTGTGG 21338    ACTB     0.030637 0.848504  -5.9958
> 1     74     251 A_32_P137939 NM_001101 AGGCAGCCAGGGCTTACCTGTACACTGACTTGAGACCAGTTGAATAAAAGTGCGCACCTT 19564    ACTB     -0.20387 0.177982  -2.87144
>
> Why was the group of the first 10 probes not averaged by avereps?
> Any suggestions will be appreciated.
>
> Thank you so much
>
> Erika
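To make Francois's question about the ID argument concrete, here is a minimal base-R sketch of what averaging replicate spots by probe ID amounts to, plus a check for IDs that still occur more than once afterwards. It is an illustration, not limma's implementation: the helpers average_by_id() and leftover_duplicates() are hypothetical, a plain matrix stands in for the M-values of an MAList, and the values are invented (only the probe IDs are borrowed from the table above).

```r
# Hypothetical helper: average the rows of a numeric matrix that share an ID.
# It mirrors the idea behind limma::avereps(), not its exact code.
average_by_id <- function(x, ID) {
  x <- as.matrix(x)
  stopifnot(nrow(x) == length(ID))
  ID <- as.character(ID)  # factors or numbers are compared as plain strings
  # Sum the non-missing values and count them per ID, keeping first-seen order.
  sums <- rowsum(x, ID, reorder = FALSE, na.rm = TRUE)
  counts <- rowsum(1 - is.na(x), ID, reorder = FALSE)
  sums / counts
}

# Hypothetical helper: IDs that still occur more than once.
leftover_duplicates <- function(ID) {
  tab <- table(as.character(ID))
  names(tab)[tab > 1]
}

# Toy data (invented values): three spots share one ID, one spot has another.
M <- matrix(c(0.2, -0.2, 0.1, 0.3,
              0.4,  0.0, 0.2, 0.1),
            ncol = 2, dimnames = list(NULL, c("array1", "array2")))
ids <- c("A_23_P135769", "A_23_P135769", "A_23_P135769", "A_23_P31323")

leftover_duplicates(ids)             # "A_23_P135769"
avg <- average_by_id(M, ids)
print(avg)
leftover_duplicates(rownames(avg))   # character(0): nothing left to average
```

In current limma versions the real call is avereps(x, ID = ...), so passing the identifiers explicitly, for example avereps(MAq_gene, ID = MAq_gene$genes$ID), and then running a duplicate check on the result's genes$ID column is one way to rule out the default picking up the wrong identifiers. Whether limma 2.16.4 used the same default is not confirmed in this thread.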
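Francois's point that multiple testing is affected by the number of probes can be illustrated with base R's p.adjust(). topTable() reports Benjamini-Hochberg adjusted p-values (adj.P.Val) by default, and the BH adjustment of a given raw p-value depends on how many tests are in the set, so going from about 43000 to 41000 rows can shift adj.P.Val values even before any other effect. The p-values below are invented for illustration and are not from Erika's data.

```r
# Invented raw p-values for five probes.
p_small <- c(0.001, 0.01, 0.02, 0.04, 0.5)
# The same five probes tested alongside five extra, clearly null probes.
p_large <- c(p_small, rep(0.9, 5))

adj_small <- p.adjust(p_small, method = "BH")
adj_large <- p.adjust(p_large, method = "BH")

# The probe with raw p = 0.04 is adjusted to 0.05 among 5 tests
# but to 0.1 among 10 tests.
cbind(raw = p_small, BH_5_tests = adj_small, BH_10_tests = adj_large[1:5])
```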

Answer · 2009-06-18

Dear Dr Pepin,

thank you for your help. My sessionInfo() is:

    > sessionInfo()
    R version 2.8.0 (2008-10-20)
    i386-pc-mingw32

    locale:
    LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252

    attached base packages:
    [1] stats     graphics  grDevices utils     datasets  methods   base

    other attached packages:
    [1] limma_2.16.4

and this is how I used avereps. First I select only the "Gene" spots on my array:

    isGene<-gal[,7]=="FALSE"
    MAq_gene<-MAq[isGene,]

then, on the normalized data (in this case quantile-normalized data), I use avereps():

    MAq_av<-avereps(MAq_gene)

and finally I use lmFit:

    fit<-lmFit(MAq_av,design,weights=MAq_av$weights)

Checking MAq_av, I noticed many probes with the same ID that were not averaged, although the first dimension of MAq_gene is 43376 and that of MAq_av is 41000. I checked MAq_av$genes$ID and they are Agilent probe IDs.

In the list of differentially expressed genes, I obtained 334 unique probe IDs (Agilent codes) using avereps, and 541 unique probe IDs without it. Only 6 probe IDs are in common between these two sets. If I use GeneName to check for genes in common, there are only 16.

I had considered that using these two different MA objects in lmFit could generate two different lists of differentially expressed genes, because limma borrows information from the other genes, and for this reason I chose the non-averaged list of differentially expressed genes, since limma then has much more information to assign significance. However, I never expected such a different result.

I hope this information helps us understand where the problem is.
Thank you for your kind help.

Erika

----- Original Message -----
From: "Francois Pepin" <fpepin@cs.mcgill.ca>
To: "Erika Melissari" <erika.melissari at bioclinica.unipi.it>; "BioC" <bioconductor at stat.math.ethz.ch>
Sent: Wednesday, June 17, 2009 00:51 AM
Subject: Re: help needed on avereps function
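Erika's reasoning that limma "borrows information from the other genes" refers to the empirical Bayes step in eBayes(): each probe's residual variance is shrunk towards a prior variance estimated from all probes in the fit. Below is a minimal sketch of that shrinkage and the resulting moderated t-statistic, using the posterior variance (d0*s0^2 + d*s^2)/(d0 + d). The prior values s0_2 and d0 are picked by hand here, whereas eBayes() estimates them from the data; that estimation is why changing which rows enter lmFit (averaged or not) can shift every probe's statistic. The helper moderated_t() and all numbers are hypothetical.

```r
# Moderated t-statistic for one probe, given a prior variance s0_2 with d0
# prior degrees of freedom (empirical Bayes shrinkage as in eBayes).
moderated_t <- function(logfc, sigma2, resid_df, stdev_unscaled, s0_2, d0) {
  # Posterior variance: df-weighted average of prior and probe-wise variance.
  s2_post <- (d0 * s0_2 + resid_df * sigma2) / (d0 + resid_df)
  logfc / (stdev_unscaled * sqrt(s2_post))
}

# One hypothetical probe: same fold change and same residual variance...
logfc <- 0.27; sigma2 <- 0.01; resid_df <- 4; stdev_unscaled <- 0.5

# ...evaluated under two hypothetical priors, as if estimated from two
# different sets of probes (e.g. averaged vs. non-averaged data).
t_prior_a <- moderated_t(logfc, sigma2, resid_df, stdev_unscaled, s0_2 = 0.01, d0 = 4)
t_prior_b <- moderated_t(logfc, sigma2, resid_df, stdev_unscaled, s0_2 = 0.04, d0 = 4)
c(prior_a = t_prior_a, prior_b = t_prior_b)
```

The probe's own data are identical in both calls; only the prior differs, and the moderated t drops from 5.4 to about 3.4.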
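For the comparison Erika describes (334 versus 541 probe IDs with 6 in common, 16 genes in common by GeneName), a small helper makes the overlap explicit at both probe and gene level. The helper compare_lists() is hypothetical and the toy vectors are invented stand-ins for the ID and GeneName columns of the two topTable() results; in practice they might come from something like topTable(fit, number = Inf, p.value = 0.05)$ID, but check which topTable arguments your limma version supports.

```r
# Hypothetical helper: overlap between two lists of identifiers.
compare_lists <- function(a, b) {
  a <- unique(as.character(a))
  b <- unique(as.character(b))
  c(only_first  = length(setdiff(a, b)),
    only_second = length(setdiff(b, a)),
    in_both     = length(intersect(a, b)))
}

# Toy stand-ins for probe IDs called significant with and without avereps.
ids_averaged     <- c("P1", "P2", "P3")
ids_not_averaged <- c("P1", "P4")
compare_lists(ids_averaged, ids_not_averaged)

# Gene-level comparison: several probes can map to the same gene.
genes_averaged     <- c("G1", "G1", "G2")
genes_not_averaged <- c("G1", "G3")
compare_lists(genes_averaged, genes_not_averaged)
```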