Question

ask for your help about some questions of using edgeR

0

Entering edit mode

Zhenling Peng ▴ 30

@zhenling-peng-4864

Last seen 10.5 years ago

Dear Sir/Madam, Would you mind me disturbing you for a few minutes? I know edgeR can do comparison among multiple libraries. I did have 4 libraries of miRNA, where two of them are normal (library1 and library2), the other two are abnormal (library3 and library4). And I'm trying to get the miRNA which expressed differently between the two groups: normal and abnormal. I utilized common dispersion, and the results are displayed at the end. When using edgeR, I have some questions as follows, 1) according to the results got from *summarydecideTestsDGEde.com, p.value = 0.05))*, it seems that there is only 1 miRNA: RNA_de is expressed differently if I take 0.05 as the cut-off of the *adjust.p.val* (please see *line 20* and *line 12*). While *line 16* indicates that there is a constriction on the counts in libraries. In fact, the counts in library 1 and 2 are not consistently greater than those in library 3 and 4. So I wondered how the package edgeR deals with this kind of inconsistent data. Is it still reliable to determine that the RNA_de is differently expressed between two groups? When the edgeR makes a comparison among multiple libraries, does the edgeR merge the libraries within the same group together before the comparison or not? if it does, how does it merge these libraries together? I tried to find the answer in the manual and the two papers to be implemented into edgeR, but I failed finally as my limited knowledge on statistics.So could you please give me a favor to explain those questions for me?2) according to the norm.factor for each library, and the value of the common.dispersion (*line 1-4*,* line10* and *11*), I think it's OK to use common dispersion for our data. What do you think about it? Actually, I'm not sure *when we can use * *common.dispersion*. And I tried tagwise dispersion with prion.n = 10 (20/(4-2)) also, but the results are very similar to the results got from common.dispersion. In addition, I made the pair-wise comparison between two libraries (one is normal, the other is abnormal), but the the value of the *common.dispersion*approximates to 0 (1.0E-16). So I doubt the reliability of the results when using common dispersion to make pair-wise comparison between libraries. In general, if we have multiple libraries and we only want to find the differently expressed miRNA/genes among groups, is their any other comparison except pair-wise comparison between groups? Would you mind giving me some suggestions? 3) From *line 19*, we can see that the miRNA---RNA_ad shows consistent and significant difference between groups, this is consistent with the P-value for this miRNA in *line 15*. While the ajust.p.val is equal to 1 (*line 15*, this ajust.p.val is based on the method "BH"). So I wondered whether we can use the P-value to analyze differential expression or not. If not, do we have to use ajust.p.val to make analysis? Is it always more reliable to use ajust.p.val instead of P-value to make analysis? Could you please give me some advice? I'm very sorry to disturb you with so many naive questions. But I really confused about the unsatisfied results after using edgeR, and I'm not sure if I made any mistakes when using edgeR. Could you please give me a favor? Would you mind giving me any advice about my questions? Thank you very much! Best wishes, Zhenling Peng University of Alberta > dim(d) [1] 658 4 > d An object of class "DGEList" $samples files group lib.size norm.factors library1 library1 normal 3184257 1.0025762 .............................................line 1 library2 library2 normal 2825126 0.9873665 .............................................line 2 library3 library3 abnormal 3955120 1.0476795 .............................................line 3 library4 library4 abnormal 4692333 0.9642192 .............................................line 4 $counts library1 library2 library3 library4 RNA_aa 302440 274619 479102 403961 ..............................................line 5 RNA_ab 298597 283090 206178 331258 ..............................................line 6 RNA_ac 293963 218700 508874 573151 ..............................................line 7 RNA_bd 199355 168652 180300 262947 ..............................................line 8 RNA_ae 149868 131546 177037 201992 ..............................................line 9 653 more rows ... > d <- estimateCommonDisp(d) > d$common.dispersion [1] 0.03584351 ...............................................line 10 > sqrt(d$common.dispersion) [1] 0.1893238 ...............................................line 11 > de.com <- exactTest(d) Comparison of groups: normal - abnormal > topTagsde.com, n=5) Comparison of groups: normal-abnormal logConc logFC P.Value adjust.p.val RNA_de -15.851627 1.7031386 5.969173e-07 0.0003927716 .............line 12 RNA_ea -16.861371 1.4352602 2.465201e-04 0.0554942582 .............line 13 RNA_cf -34.679446 30.6732173 2.833211e-03 0.3225116431 .............line 14 RNA_ad -3.690956 0.6652757 1.554169e-02 1.0000000000 ..............line 15 > detags.com <- rownamestopTagsde.com, n=5)$table) > d$counts[detags.com, ] library1 library2 library3 library4 RNA_de 153 * 35 * *62 * 18 ...............................................line 16 RNA_ea 49 34 28 16 ...............................................line 17 RNA_cf 5 4 0 0 ...............................................line 18 RNA_ad 293963 218700 508874 573151 ...............................................line 19 > summarydecideTestsDGEde.com, p.value = 0.05)) ...............................................line 20 [,1] -1 0 0 657 1 1 [[alternative HTML version deleted]]

miRNA edgeR miRNA edgeR • 1.2k views

ADD COMMENT • link 13.5 years ago Zhenling Peng ▴ 30

score 0 · Answer 1 · 2011-09-18

Dear Sir/Madam, Would you mind me disturbing you for a few minutes? I know edgeR can do comparison among multiple libraries. I did have 4 libraries of miRNA, where two of them are normal (library1 and library2), the other two are abnormal (library3 and library4). And I'm trying to get the miRNA which expressed differently between the two groups: normal and abnormal. I utilized common dispersion, and the results are displayed at the end. When using edgeR, I have some questions as follows, 1) according to the results got from *summarydecideTestsDGEde.com, p.value = 0.05))*, it seems that there is only 1 miRNA: RNA_de is expressed differently if I take 0.05 as the cut-off of the *adjust.p.val* (please see *line 20* and *line 12*). While *line 16* indicates that there is a constriction on the counts in libraries. In fact, the counts in library 1 and 2 are not consistently greater than those in library 3 and 4. So I wondered how the package edgeR deals with this kind of inconsistent data. Is it still reliable to determine that the RNA_de is differently expressed between two groups? When the edgeR makes a comparison among multiple libraries, does the edgeR merge the libraries within the same group together before the comparison or not? if it does, how does it merge these libraries together? I tried to find the answer in the manual and the two papers to be implemented into edgeR, but I failed finally as my limited knowledge on statistics.So could you please give me a favor to explain those questions for me? 2) according to the norm.factor for each library, and the value of the common.dispersion (*line 1-4*,* line10* and *11*), I think it's OK to use common dispersion for our data. What do you think about it? Actually, I'm not sure *when we can use * *common.dispersion*. And I tried tagwise dispersion with prion.n = 10 (20/(4-2)) also, but the results are very similar to the results got from common.dispersion. In addition, I made the pair-wise comparison between two libraries (one is normal, the other is abnormal), but the the value of the *common.dispersion*approximates to 0 (1.0E-16). So I doubt the reliability of the results when using common dispersion to make pair-wise comparison between libraries. In general, if we have multiple libraries and we only want to find the differently expressed miRNA/genes among groups, is their any other comparison except pair-wise comparison between groups? Would you mind giving me some suggestions? 3) From *line 19*, we can see that the miRNA---RNA_ad shows consistent and significant difference between groups, this is consistent with the P-value for this miRNA in *line 15*. While the ajust.p.val is equal to 1 (*line 15*, this ajust.p.val is based on the method "BH"). So I wondered whether we can use the P-value to analyze differential expression or not. If not, do we have to use ajust.p.val to make analysis? Is it always more reliable to use ajust.p.val instead of P-value to make analysis? Could you please give me some advice? I'm very sorry to disturb you with so many naive questions. But I really confused about the unsatisfied results after using edgeR, and I'm not sure if I made any mistakes when using edgeR. Could you please give me a favor? Would you mind giving me any advice about my questions? Thank you very much! Best wishes, Zhenling Peng University of Alberta > dim(d) [1] 658 4 > d An object of class "DGEList" $samples files group lib.size norm.factors library1 library1 normal 3184257 1.0025762 ...................................................................... ........line 1 library2 library2 normal 2825126 0.9873665 ...................................................................... ........line 2 library3 library3 abnormal 3955120 1.0476795 ...................................................................... ........line 3 library4 library4 abnormal 4692333 0.9642192 ...................................................................... ........line 4 $counts library1 library2 library3 library4 RNA_aa 302440 274619 479102 403961 ...................................................................... ........line 5 RNA_ab 298597 283090 206178 331258 ...................................................................... ........line 6 RNA_ac 293963 218700 508874 573151 ...................................................................... ........line 7 RNA_bd 199355 168652 180300 262947 ...................................................................... ........line 8 RNA_ae 149868 131546 177037 201992 ...................................................................... ........line 9 653 more rows ... > d <- estimateCommonDisp(d) > d$common.dispersion [1] 0.03584351 ...................................................................... ........line 10 > sqrt(d$common.dispersion) [1] 0.1893238 ...................................................................... ........line 11 > de.com <- exactTest(d) Comparison of groups: normal - abnormal > topTagsde.com, n=5) Comparison of groups: normal-abnormal logConc logFC P.Value adjust.p.val RNA_de -15.851627 1.7031386 5.969173e-07 0.0003927716 ............................................line 12 RNA_ea -16.861371 1.4352602 2.465201e-04 0.0554942582 ............................................line 13 RNA_cf -34.679446 30.6732173 2.833211e-03 0.3225116431 ............................................line 14 RNA_ad -3.690956 0.6652757 1.554169e-02 1.0000000000 ............................................line 15 > detags.com <- rownamestopTagsde.com, n=5)$table) > d$counts[detags.com, ] library1 library2 library3 library4 RNA_de 153 * 35 * *62 * 18 ...................................................................... ........line 16 RNA_ea 49 34 28 16 ...................................................................... ........line 17 RNA_cf 5 4 0 0 ...................................................................... ........line 18 RNA_ad 293963 218700 508874 573151 ...................................................................... ........line 19 > summarydecideTestsDGEde.com, p.value = 0.05)) ...................................................................... ........line 20 [,1] -1 0 0 657 1 1 [[alternative HTML version deleted]]

score 0 · Answer 2 · 2011-09-20

Dear Sir/Madam, I'm very sorry to disturb you again. Your answers helped me a lot! Thank you very much! But I'm still a little confused about some small questions, could you please give me a bit more explanation for those questions? 1) according to the results got from *summarydecideTestsDGEde.com, p.value >> = 0.05))*, it seems that there is only 1 miRNA: RNA_de is expressed >> differently if I take 0.05 as the cut-off of the *adjust.p.val* (please see >> *line 20* and *line 12*). >> > > This of course may simply be correct. There may be only a few > differentially expressed miRs in your data. The common dispersion estimate > appears to be reasonably small. > Is there any relationship between the number of differentially expressed miRs and the value of the common dispersion. In general, is tagwise dispersion estimation better than common dispersion estimation? Is there any criteria to help us choose dispersion estimation strategy? > There is no reason why you need to restrict to a FDR (adjusted p-value) of > less than 0.05. You could easily examine the top three miRs. > Here, I just tried to confirm which p-value is used to make analysis in edgeR. :) > > There is something wrong with your topTags() output. This function does > not print column headings in the format that you have presented. You must > be using other R code that you haven't shown. > I'm not sure if I used some other R codes to help display the column headings. Because I'm not familiar with R, I only followed the Case study 8 of edgeR manual, where topTags() prints column headings. While *line 16* indicates that there is a constriction on the counts in >> libraries. In fact, the counts in library 1 and 2 are not consistently >> greater than those in library 3 and 4. So I wondered how the package edgeR >> deals with this kind of inconsistent data. Is it still reliable to determine >> that the RNA_de is differently expressed between two groups? When the edgeR >> makes a comparison among multiple libraries, does the edgeR merge the >> libraries within the same group together before the comparison or not? if it >> does, how does it merge these libraries together? >> >> I tried to find the answer in the manual and the two papers to be >> implemented into edgeR, but I failed finally as my limited knowledge on >> statistics. >> > > No, it doesn't simply merge libraries together. The whole point of having > biological replicate libraries is to measure the variation between them. > This gives an estimate of the underlying biological variation, against which > significance is assessed. > > Think of it like a two-sample t-test, where you assess the separation of > two groups relative to variation within the two groups. You do not require > observations to all be the same, and you don't merge them together. With > common dispersion, edgeR is analogous to two-sample t-tests where the same > standard deviation estimate is used for every miR. With tagwise dispersion, > edgeR is analogous to two-sample t-tests with different standard deviations > estimated for each miR. > > So could you please give me a favor to explain those questions for me?2) >> according to the norm.factor for each library, and the value of the >> common.dispersion (*line 1-4*,* line10* and *11*), I think it's OK to use >> common dispersion for our data. What do you think about it? Actually, I'm >> not sure *when we can use * *common.dispersion*. >> > > I generally recommend tagwise, if you have any reasonable number of > replicates. The choice has nothing to do with the norm.factors. > I compared the results from the common dispersion and tagwise dispersion again. The miRNA "RNA_de" with contradicted counts (line 16) is not differently expressed any more, according to the results based on tagwise dispersion. In fact, the FDR of RNA_de is 0.2005 now. > > And I tried tagwise dispersion with prion.n = 10 (20/(4-2)) also, but the >> results are very similar to the results got from common.dispersion. In >> addition, I made the pair-wise comparison between two libraries (one is >> normal, the other is abnormal), but the the value of the >> *common.dispersion*approximates to 0 (1.0E-16). So I doubt the reliability >> of the results when using common dispersion to make pair-wise comparison >> between libraries. >> > > Well, if you try to do a comparison without replicates, it is impossible to > estimated biological variability, so edgeR puts it to zero. You are correct > that this is not reliable, but the problem is with the lack of replicates, > not with the use of common dispersion. > > In general, if we have multiple libraries and we only want to find the >> differently expressed miRNA/genes among groups, is their any other >> comparison except pair-wise comparison between groups? Would you mind giving >> me some suggestions? >> > > I can't see what is wrong with the pair-wise comparison. You only have two > groups, so there is nothing else sensible that could be done. > > 3) From *line 19*, we can see that the miRNA---RNA_ad shows consistent and >> significant difference between groups, this is consistent with the P-value >> for this miRNA in *line 15*. While the ajust.p.val is equal to 1 (*line >> 15*, this ajust.p.val is based on the method "BH"). So I wondered whether we >> can use the P-value to analyze differential expression or not. If not, do we >> have to use ajust.p.val to make analysis? Is it always more reliable to use >> ajust.p.val instead of P-value to make analysis? Could you please give me >> some advice? >> > > Well, if you simply use unadusted p-values, you would find about 5% of the > miRs to be signficantly different even from random data. You will be > guaranteed to find statistical significant differences even when there are > none. This would not be considered defensible in a scientific journal. > Do you mean it's more reliable to use adjust.p.val to make analysis? But I'm really confused about the statistic result from edgeR. For example, miRNA logConc logFC P.Value adj.P.Val library1 library2 library3 library4 miR-127 -3.27733 -0.52769 0.042147 1 293963 218700 508874 573151 miR-124 -5.8367 -0.45822 0.07751 1 43728 44432 97614 80855 (no matter what dispersion estimation is used, the adj.P.val are both equivalent to 1 for those two miRNA.) In these two miRNA, we can determine that they are differently expressed between normal and abnormal, based on the facts that the counts in library3 and library4 are nearly twice higher than those in library1 and library2. But both of the adj.p.val equal to 1. Actually, this case is existed in about 30% miRNAs of my data. I'm not sure whether this case is related to the sizes of the libraries (Please see line 1-4). Would you mind giving me some suggestions for this case? Thank you very much! Best wishes, Zhenling University of Alberta results: > dim(d) [1] 658 4 > d An object of class "DGEList" $samples files group lib.size norm.factors library1 library1 normal 3184257 1.0025762 ........................................................line 1 library2 library2 normal 2825126 0.9873665 ........................................................line 2 library3 library3 abnormal 3955120 1.0476795 ........................................................line 3 library4 library4 abnormal 4692333 0.9642192 ........................................................line 4 $counts library1 library2 library3 library4 RNA_aa 302440 274619 479102 403961 ........................................................line 5 RNA_ab 298597 283090 206178 331258 ........................................................line 6 RNA_ac 293963 218700 508874 573151 ........................................................line 7 RNA_bd 199355 168652 180300 262947 ........................................................line 8 RNA_ae 149868 131546 177037 201992 ........................................................line 9 653 more rows ... > d <- estimateCommonDisp(d) > d$common.dispersion [1] 0.03584351 .........................................................line 10 > sqrt(d$common.dispersion) [1] 0.1893238 .........................................................line 11 > de.com <- exactTest(d) Comparison of groups: normal - abnormal > topTagsde.com, n=5) Comparison of groups: normal-abnormal logConc logFC P.Value adjust.p.val RNA_de -15.851627 1.7031386 5.969173e-07 0.0003927716 .......................line 12 RNA_ea -16.861371 1.4352602 2.465201e-04 0.0554942582 .......................line 13 RNA_cf -34.679446 30.6732173 2.833211e-03 0.3225116431 .......................line 14 RNA_ad -3.690956 0.6652757 1.554169e-02 1.0000000000 .......................line 15 > detags.com <- rownamestopTagsde.com, n=5)$table) > d$counts[detags.com, ] library1 library2 library3 library4 RNA_de 153 * 35 * *62 * 18 ........................................................line 16 RNA_ea 49 34 28 16 ........................................................line 17 RNA_cf 5 4 0 0 ........................................................line 18 RNA_ad 293963 218700 508874 573151 ........................................................line 19 [[alternative HTML version deleted]]

score 0 · Answer 3 · 2011-09-20

Dear Zhenling, I'll try to answer your questions as best I can. But the bottom line is that there may simply be only a few differentially expressed miRs between your two groups. > Date: Sun, 18 Sep 2011 14:32:12 -0600 > From: Zhenling Peng <zhenling at="" ualberta.ca=""> > To: bioconductor at r-project.org > Subject: [BioC] ask for your help about some questions of using edgeR > > Dear Sir/Madam, > > Would you mind me disturbing you for a few minutes? > > I know edgeR can do comparison among multiple libraries. I did have 4 > libraries of miRNA, where two of them are normal (library1 and > library2), the other two are abnormal (library3 and library4). And I'm > trying to get the miRNA which expressed differently between the two > groups: normal and abnormal. I utilized common dispersion, and the > results are displayed at the end. When using edgeR, I have some > questions as follows, > > 1) according to the results got from *summarydecideTestsDGEde.com, > p.value = 0.05))*, it seems that there is only 1 miRNA: RNA_de is > expressed differently if I take 0.05 as the cut-off of the > *adjust.p.val* (please see *line 20* and *line 12*). This of course may simply be correct. There may be only a few differentially expressed miRs in your data. The common dispersion estimate appears to be reasonably small. There is no reason why you need to restrict to a FDR (adjusted p-value) of less than 0.05. You could easily examine the top three miRs. There is something wrong with your topTags() output. This function does not print column headings in the format that you have presented. You must be using other R code that you haven't shown. > While *line 16* indicates that there is a constriction on the counts in > libraries. In fact, the counts in library 1 and 2 are not consistently > greater than those in library 3 and 4. So I wondered how the package > edgeR deals with this kind of inconsistent data. Is it still reliable to > determine that the RNA_de is differently expressed between two groups? > When the edgeR makes a comparison among multiple libraries, does the > edgeR merge the libraries within the same group together before the > comparison or not? if it does, how does it merge these libraries > together? > > I tried to find the answer in the manual and the two papers to be > implemented into edgeR, but I failed finally as my limited knowledge on > statistics. No, it doesn't simply merge libraries together. The whole point of having biological replicate libraries is to measure the variation between them. This gives an estimate of the underlying biological variation, against which significance is assessed. Think of it like a two-sample t-test, where you assess the separation of two groups relative to variation within the two groups. You do not require observations to all be the same, and you don't merge them together. With common dispersion, edgeR is analogous to two-sample t-tests where the same standard deviation estimate is used for every miR. With tagwise dispersion, edgeR is analogous to two-sample t-tests with different standard deviations estimated for each miR. > So could you please give me a favor to explain those questions for me?2) > according to the norm.factor for each library, and the value of the > common.dispersion (*line 1-4*,* line10* and *11*), I think it's OK to > use common dispersion for our data. What do you think about it? > Actually, I'm not sure *when we can use * *common.dispersion*. I generally recommend tagwise, if you have any reasonable number of replicates. The choice has nothing to do with the norm.factors. > And I tried tagwise dispersion with prion.n = 10 (20/(4-2)) also, but > the results are very similar to the results got from common.dispersion. > In addition, I made the pair-wise comparison between two libraries (one > is normal, the other is abnormal), but the the value of the > *common.dispersion*approximates to 0 (1.0E-16). So I doubt the > reliability of the results when using common dispersion to make > pair-wise comparison between libraries. Well, if you try to do a comparison without replicates, it is impossible to estimated biological variability, so edgeR puts it to zero. You are correct that this is not reliable, but the problem is with the lack of replicates, not with the use of common dispersion. > In general, if we have multiple libraries and we only want to find the > differently expressed miRNA/genes among groups, is their any other > comparison except pair-wise comparison between groups? Would you mind > giving me some suggestions? I can't see what is wrong with the pair-wise comparison. You only have two groups, so there is nothing else sensible that could be done. > 3) From *line 19*, we can see that the miRNA---RNA_ad shows consistent > and significant difference between groups, this is consistent with the > P-value for this miRNA in *line 15*. While the ajust.p.val is equal to > 1 (*line 15*, this ajust.p.val is based on the method "BH"). So I > wondered whether we can use the P-value to analyze differential > expression or not. If not, do we have to use ajust.p.val to make > analysis? Is it always more reliable to use ajust.p.val instead of > P-value to make analysis? Could you please give me some advice? Well, if you simply use unadusted p-values, you would find about 5% of the miRs to be signficantly different even from random data. You will be guaranteed to find statistical significant differences even when there are none. This would not be considered defensible in a scientific journal. Best wishes Gordon > I'm very sorry to disturb you with so many naive questions. But I really > confused about the unsatisfied results after using edgeR, and I'm not sure > if I made any mistakes when using edgeR. Could you please give me a favor? > Would you mind giving me any advice about my questions? Thank you very much! > > Best wishes, > Zhenling Peng > University of Alberta > >> dim(d) > [1] 658 4 >> d > An object of class "DGEList" > $samples > files group lib.size norm.factors > library1 library1 normal 3184257 1.0025762 > .............................................line 1 > library2 library2 normal 2825126 0.9873665 > .............................................line 2 > library3 library3 abnormal 3955120 1.0476795 > .............................................line 3 > library4 library4 abnormal 4692333 0.9642192 > .............................................line 4 > $counts > library1 library2 library3 library4 > RNA_aa 302440 274619 479102 403961 > ..............................................line 5 > RNA_ab 298597 283090 206178 331258 > ..............................................line 6 > RNA_ac 293963 218700 508874 573151 > ..............................................line 7 > RNA_bd 199355 168652 180300 262947 > ..............................................line 8 > RNA_ae 149868 131546 177037 201992 > ..............................................line 9 > 653 more rows ... > >> d <- estimateCommonDisp(d) >> d$common.dispersion > [1] 0.03584351 > ...............................................line 10 >> sqrt(d$common.dispersion) > [1] 0.1893238 > ...............................................line 11 > >> de.com <- exactTest(d) > Comparison of groups: normal - abnormal >> topTagsde.com, n=5) > Comparison of groups: normal-abnormal > logConc logFC P.Value > adjust.p.val > > RNA_de -15.851627 1.7031386 5.969173e-07 0.0003927716 > .............line 12 > RNA_ea -16.861371 1.4352602 2.465201e-04 0.0554942582 > .............line 13 > RNA_cf -34.679446 30.6732173 2.833211e-03 0.3225116431 > .............line 14 > RNA_ad -3.690956 0.6652757 1.554169e-02 1.0000000000 > ..............line 15 This is not correct output from topTags(). >> detags.com <- rownamestopTagsde.com, n=5)$table) >> d$counts[detags.com, ] > library1 library2 library3 library4 > RNA_de 153 * 35 * *62 * 18 > ...............................................line 16 > RNA_ea 49 34 28 16 > ...............................................line 17 > RNA_cf 5 4 0 0 > ...............................................line 18 > RNA_ad 293963 218700 508874 573151 > ...............................................line 19 > >> summarydecideTestsDGEde.com, p.value = 0.05)) > ...............................................line 20 > [,1] > -1 0 > 0 657 > 1 1 ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:4}}