edgeR, logFC calculation in factor combination
2
0
Entering edit mode
Mike Miller ▴ 70
@mike-miller-6388
Last seen 9.6 years ago
Dear All, I am using edgeR for an RNASeq experiment (~30 samples), where I need to explore the influence of 2 factors with 2 levels each (there are actually more factors, but for simplicity I put only 2 here): 1. disease_state (levels: control_0 and control_1) 2. localization (levels: A and B) I combined the factors and got 4 combinations: control_0.localization_A, control_1.localization_A, control_0.localization_B, control_1.localization_B. I am specifically interested in a list of diff. expressed genes between these 2 combinations: control_1.localization_A – control_1.localization_B. When separating up- and down-regulated genes, I did it simply according to logFC column (after filtering for genes with padj < 0.05). If the logFC<0, could I infer that the gene is DOWN-regulated in control_1.localization_A? I assume that this is true. However, when I take the raw counts for one of the diff. expressed genes (for which logFC<0), in control_1.localization_A and in control_1.localization_B conditions, it turned out to be the following: mean(control_1.localization_A) / mean(control_1.localization_B) = 4.2 (Formula explained in words: mean of raw counts for the diff. expressed gene X in samples which are control_1 and localization_A, divided by mean of raw counts for the diff. expressed gene X in samples which are control_1 and localization_B) This is contrary to the conclusion I got from logFC, since according to raw counts that gene is UP-regulated in control_1.localization_A. I know that logFC is calculated differently, but shouldn't the ratio (whether the gene is up- or down-regulated) stay nevertheless the same (if library sizes are very similar in all samples, which is the case here)? Maybe a more precise question would be: how is logFC calculated in this case, when we have a combination of different factors? Thank you very much in advance for any piece of clarification! Mike [[alternative HTML version deleted]]
RNASeq edgeR RNASeq edgeR • 2.5k views
ADD COMMENT
0
Entering edit mode
@gordon-smyth
Last seen 6 hours ago
WEHI, Melbourne, Australia
Dear Mike, edgeR does compute logFC correctly, and a positive logFC does obviously mean that the counts are higher, other things being equal, in the second group than the first group. If you are making the comparison B-A (B minus A), then a positive logFC means expression is higher in B than in A. The topTable() function always output the contrast you are making at the top of the table, so that the table is unambiguous. I can't tell from your email what comparison you actually made. What is the non-ascii character is your email between "control_1.localization_A" and "control_1.localization_B"? Was it supposed to be a minus sign or something else? I don't know whether you are making a simple comparison between two groups, or whether you have fitted a more complex linear model. This can affect the interpration of logFCs. Best wishes Gordon > Date: Fri, 16 May 2014 12:06:14 +0200 > From: Mike Miller <mike.bioc32 at="" gmail.com=""> > To: <bioconductor at="" stat.math.ethz.ch=""> > Subject: [BioC] edgeR, logFC calculation in factor combination > Message-ID: > <cankskzosyxfqkn3y=itkq4qj7dpp+da+umgpjku0de98ruu5bg at="" mail.gmail.com=""> > Content-Type: text/plain > > Dear All, > > > I am using edgeR for an RNASeq experiment (~30 samples), where I need to > explore the influence of 2 factors with 2 levels each (there are actually > more factors, but for simplicity I put only 2 here): > > 1. disease_state (levels: control_0 and control_1) > > 2. localization (levels: A and B) > > > I combined the factors and got 4 combinations: > > control_0.localization_A, control_1.localization_A, > control_0.localization_B, control_1.localization_B. > > I am specifically interested in a list of diff. expressed genes between > these 2 combinations: control_1.localization_A ??? control_1.localization_B. > > When separating up- and down-regulated genes, I did it simply according to > logFC column (after filtering for genes with padj < 0.05). If the logFC<0, > could I infer that the gene is DOWN-regulated in control_1.localization_A? > I assume that this is true. > > However, when I take the raw counts for one of the diff. expressed genes > (for which logFC<0), in control_1.localization_A and in > control_1.localization_B conditions, it turned out to be the following: > > mean(control_1.localization_A) / mean(control_1.localization_B) = 4.2 > > (Formula explained in words: mean of raw counts for the diff. expressed > gene X in samples which are control_1 and localization_A, divided by mean > of raw counts for the diff. expressed gene X in samples which are control_1 > and localization_B) > > This is contrary to the conclusion I got from logFC, since according to > raw counts that gene is UP-regulated in control_1.localization_A. > > I know that logFC is calculated differently, but shouldn't the ratio > (whether the gene is up- or down-regulated) stay nevertheless the same (if > library sizes are very similar in all samples, which is the case here)? > > Maybe a more precise question would be: how is logFC calculated in this > case, when we have a combination of different factors? > > Thank you very much in advance for any piece of clarification! > > Mike ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:5}}
ADD COMMENT
0
Entering edit mode
Dear Gordon, Thank you for your help. The contrast was: (control_1.localization_A) minus (control_1.localization_B). It was not a simple comparison between groups, the complete design looks like (in previous email I tried to simplify the question, for that reason, I excluded some factors in the question): design_combi=model.matrix(~0+ ctrl_loc + Gender+Sample_blocking, data=combi) > colnames(design_combi) [1] "control_0.localization_A" "control_0.localization_B" "control_1.localization_A" [4] "control_1.localization_B" "Gender1" "Sample_blocking15" [7] "Sample_blocking16" "Sample_blocking17" "Sample_blocking18" [10] "Sample_blocking19" "Sample_blocking2" "Sample_blocking20" [13] "Sample_blocking21" "Sample_blocking22" "Sample_blocking23" [16] "Sample_blocking24" "Sample_blocking27" "Sample_blocking28" [19] "Sample_blocking3" "Sample_blocking33" "Sample_blocking34" [22] "Sample_blocking35" "Sample_blocking36" "Sample_blocking37" [25] "Sample_blocking38" "Sample_blocking39" "Sample_blocking6" [28] "Sample_blocking8" Factor "ctrl_loc" contains combinations of 2 factors (thus has 4 levels): 1. disease_state (levels: control_0 and control_1) 2. localization (levels: A and B) Levels: control_0.localization_A, control_1.localization_A, control_0.localization_B, control_1.localization_B. Factor "Gender" has 2 levels: 0 and 1. Factor "Sample_blocking" is here to account for the batch effect (some samples are coming from the same patient). Since the "Design matrix not of full rank" I excluded 2 columns from the "design_combi" (columns number 25 and 26) to be able to compute dispersions. After fitting the model, I defined contrasts: my_contrasts=makeContrasts(disease.A_minus_disease.B=control_1.localiz ation_A - control_1.localization_B, levels=design_combi). lrt=glmLRT( fit, contrast=my_contrasts[, "disease.A_minus_disease.B"]) I was wondering how is logFC computed in this case (in this contrast), and I wanted just to confirm that when the logFC >0, that means that the gene is up-regulated in control_1.localization_A compared to control_1.localization_B. To check that, I tried to manually calculate just for one gene the logFC value, but obviously I do not know how it is computed in this case. Thank you very much in advance! Best regards, Mike On Sun, May 18, 2014 at 5:55 AM, Gordon K Smyth <smyth@wehi.edu.au> wrote: > Dear Mike, > > edgeR does compute logFC correctly, and a positive logFC does obviously > mean that the counts are higher, other things being equal, in the second > group than the first group. > > If you are making the comparison B-A (B minus A), then a positive logFC > means expression is higher in B than in A. The topTable() function always > output the contrast you are making at the top of the table, so that the > table is unambiguous. > > I can't tell from your email what comparison you actually made. What is > the non-ascii character is your email between "control_1.localization_A" > and "control_1.localization_B"? Was it supposed to be a minus sign or > something else? > > I don't know whether you are making a simple comparison between two > groups, or whether you have fitted a more complex linear model. This can > affect the interpration of logFCs. > > Best wishes > Gordon > > > Date: Fri, 16 May 2014 12:06:14 +0200 >> From: Mike Miller <mike.bioc32@gmail.com> >> To: <bioconductor@stat.math.ethz.ch> >> Subject: [BioC] edgeR, logFC calculation in factor combination >> Message-ID: >> <cankskzosyxfqkn3y=itkq4qj7dpp+da+umgpjku0de98ruu5bg@mail.>> gmail.com> >> Content-Type: text/plain >> >> Dear All, >> >> >> I am using edgeR for an RNASeq experiment (~30 samples), where I need to >> explore the influence of 2 factors with 2 levels each (there are actually >> more factors, but for simplicity I put only 2 here): >> >> 1. disease_state (levels: control_0 and control_1) >> >> 2. localization (levels: A and B) >> >> >> I combined the factors and got 4 combinations: >> >> control_0.localization_A, control_1.localization_A, >> control_0.localization_B, control_1.localization_B. >> >> I am specifically interested in a list of diff. expressed genes between >> these 2 combinations: control_1.localization_A – >> control_1.localization_B. >> >> When separating up- and down-regulated genes, I did it simply according to >> logFC column (after filtering for genes with padj < 0.05). If the logFC<0, >> could I infer that the gene is DOWN-regulated in control_1.localization_A? >> I assume that this is true. >> >> However, when I take the raw counts for one of the diff. expressed genes >> (for which logFC<0), in control_1.localization_A and in >> control_1.localization_B conditions, it turned out to be the following: >> >> mean(control_1.localization_A) / mean(control_1.localization_B) = 4.2 >> >> (Formula explained in words: mean of raw counts for the diff. expressed >> gene X in samples which are control_1 and localization_A, divided by mean >> of raw counts for the diff. expressed gene X in samples which are >> control_1 >> and localization_B) >> >> This is contrary to the conclusion I got from logFC, since according to >> raw counts that gene is UP-regulated in control_1.localization_A. >> >> I know that logFC is calculated differently, but shouldn't the ratio >> (whether the gene is up- or down-regulated) stay nevertheless the same (if >> library sizes are very similar in all samples, which is the case here)? >> >> Maybe a more precise question would be: how is logFC calculated in this >> case, when we have a combination of different factors? >> >> Thank you very much in advance for any piece of clarification! >> >> Mike >> > > ______________________________________________________________________ > The information in this email is confidential and inte...{{dropped:9}}
ADD REPLY
0
Entering edit mode
@gordon-smyth
Last seen 6 hours ago
WEHI, Melbourne, Australia
Dear Mike, On Mon, 19 May 2014, Mike Miller wrote: > Dear Gordon, > > > Thank you for your help. > > The contrast was: (control_1.localization_A) minus > (control_1.localization_B). It was not a simple comparison between > groups, the complete design looks like (in previous email I tried to > simplify the question, for that reason, I excluded some factors in the > question): Unfortunately, the phenomenon that you say you have observed could not have occured in the simplified situation you reported, which made answering the question difficult. > design_combi=model.matrix(~0+ ctrl_loc + Gender+Sample_blocking, data=combi) > >> colnames(design_combi) > [1] "control_0.localization_A" "control_0.localization_B" > "control_1.localization_A" > [4] "control_1.localization_B" "Gender1" > "Sample_blocking15" > [7] "Sample_blocking16" "Sample_blocking17" "Sample_blocking18" > [10] "Sample_blocking19" "Sample_blocking2" "Sample_blocking20" > [13] "Sample_blocking21" "Sample_blocking22" "Sample_blocking23" > [16] "Sample_blocking24" "Sample_blocking27" "Sample_blocking28" > [19] "Sample_blocking3" "Sample_blocking33" "Sample_blocking34" > [22] "Sample_blocking35" "Sample_blocking36" "Sample_blocking37" > [25] "Sample_blocking38" "Sample_blocking39" "Sample_blocking6" > [28] "Sample_blocking8" > > Factor "ctrl_loc" contains combinations of 2 factors (thus has 4 levels): > 1. disease_state (levels: control_0 and control_1) > 2. localization (levels: A and B) > Levels: control_0.localization_A, control_1.localization_A, > control_0.localization_B, control_1.localization_B. > > Factor "Gender" has 2 levels: 0 and 1. > Factor "Sample_blocking" is here to account for the batch effect (some > samples are coming from the same patient). > Since the "Design matrix not of full rank" I excluded 2 columns from the > "design_combi" (columns number 25 and 26) to be able to compute dispersions. This means that you are treating blocks 38 and 39 as if they were the same as block 1. You seem to be trying to adjust for too many things. > After fitting the model, I defined contrasts: > my_contrasts=makeContrasts(disease.A_minus_disease.B=control_1.local ization_A > - control_1.localization_B, levels=design_combi). > > lrt=glmLRT( fit, contrast=my_contrasts[, "disease.A_minus_disease.B"]) > > I was wondering how is logFC computed in this case (in this contrast), and > I wanted just to confirm that when the logFC >0, that means that the gene > is up-regulated in control_1.localization_A compared to > control_1.localization_B. What it means is that the gene would be higher in localization A than in B, if both localizations were applied to cases of the same gender and the same block. Given that the localizations A and B might actually be unbalanced by gender and block, the observed marginal differences between A and B could be quite different. > To check that, I tried to manually calculate just for one gene the logFC > value, but obviously I do not know how it is computed in this case. You can't compute it manually. The whole point of a package like edgeR is to do computations you can't do easily by yourself. Best wishes Gordon > Thank you very much in advance! > > Best regards, > Mike > > > On Sun, May 18, 2014 at 5:55 AM, Gordon K Smyth <smyth at="" wehi.edu.au=""> wrote: > >> Dear Mike, >> >> edgeR does compute logFC correctly, and a positive logFC does obviously >> mean that the counts are higher, other things being equal, in the second >> group than the first group. >> >> If you are making the comparison B-A (B minus A), then a positive logFC >> means expression is higher in B than in A. The topTable() function always >> output the contrast you are making at the top of the table, so that the >> table is unambiguous. >> >> I can't tell from your email what comparison you actually made. What is >> the non-ascii character is your email between "control_1.localization_A" >> and "control_1.localization_B"? Was it supposed to be a minus sign or >> something else? >> >> I don't know whether you are making a simple comparison between two >> groups, or whether you have fitted a more complex linear model. This can >> affect the interpration of logFCs. >> >> Best wishes >> Gordon >> >> >> Date: Fri, 16 May 2014 12:06:14 +0200 >>> From: Mike Miller <mike.bioc32 at="" gmail.com=""> >>> To: <bioconductor at="" stat.math.ethz.ch=""> >>> Subject: [BioC] edgeR, logFC calculation in factor combination >>> Message-ID: >>> <cankskzosyxfqkn3y=itkq4qj7dpp+da+umgpjku0de98ruu5bg at="" mail.="">>> gmail.com> >>> Content-Type: text/plain >>> >>> Dear All, >>> >>> >>> I am using edgeR for an RNASeq experiment (~30 samples), where I need to >>> explore the influence of 2 factors with 2 levels each (there are actually >>> more factors, but for simplicity I put only 2 here): >>> >>> 1. disease_state (levels: control_0 and control_1) >>> >>> 2. localization (levels: A and B) >>> >>> >>> I combined the factors and got 4 combinations: >>> >>> control_0.localization_A, control_1.localization_A, >>> control_0.localization_B, control_1.localization_B. >>> >>> I am specifically interested in a list of diff. expressed genes between >>> these 2 combinations: control_1.localization_A ??? >>> control_1.localization_B. >>> >>> When separating up- and down-regulated genes, I did it simply according to >>> logFC column (after filtering for genes with padj < 0.05). If the logFC<0, >>> could I infer that the gene is DOWN-regulated in control_1.localization_A? >>> I assume that this is true. >>> >>> However, when I take the raw counts for one of the diff. expressed genes >>> (for which logFC<0), in control_1.localization_A and in >>> control_1.localization_B conditions, it turned out to be the following: >>> >>> mean(control_1.localization_A) / mean(control_1.localization_B) = 4.2 >>> >>> (Formula explained in words: mean of raw counts for the diff. expressed >>> gene X in samples which are control_1 and localization_A, divided by mean >>> of raw counts for the diff. expressed gene X in samples which are >>> control_1 >>> and localization_B) >>> >>> This is contrary to the conclusion I got from logFC, since according to >>> raw counts that gene is UP-regulated in control_1.localization_A. >>> >>> I know that logFC is calculated differently, but shouldn't the ratio >>> (whether the gene is up- or down-regulated) stay nevertheless the same (if >>> library sizes are very similar in all samples, which is the case here)? >>> >>> Maybe a more precise question would be: how is logFC calculated in this >>> case, when we have a combination of different factors? >>> >>> Thank you very much in advance for any piece of clarification! >>> >>> Mike >>> >> >> ______________________________________________________________________ >> The information in this email is confidential and intended solely for the >> addressee. >> You must not disclose, forward, print or use it without the permission of >> the sender. >> ______________________________________________________________________ > ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:5}}
ADD COMMENT
0
Entering edit mode
Dear Gordon, Thank you for your answer. I still have one more question regarding output of the lrt function. On Mon, May 19, 2014 at 11:47 AM, Gordon K Smyth <smyth@wehi.edu.au> wrote: > Dear Mike, > > > On Mon, 19 May 2014, Mike Miller wrote: > > Dear Gordon, >> >> >> Thank you for your help. >> >> The contrast was: (control_1.localization_A) minus >> (control_1.localization_B). It was not a simple comparison between groups, >> the complete design looks like (in previous email I tried to simplify the >> question, for that reason, I excluded some factors in the question): >> > > Unfortunately, the phenomenon that you say you have observed could not > have occured in the simplified situation you reported, which made answering > the question difficult. > > > design_combi=model.matrix(~0+ ctrl_loc + Gender+Sample_blocking, >> data=combi) >> >> colnames(design_combi) >>> >> [1] "control_0.localization_A" "control_0.localization_B" >> "control_1.localization_A" >> [4] "control_1.localization_B" "Gender1" >> "Sample_blocking15" >> [7] "Sample_blocking16" "Sample_blocking17" "Sample_blocking18" >> [10] "Sample_blocking19" "Sample_blocking2" "Sample_blocking20" >> [13] "Sample_blocking21" "Sample_blocking22" "Sample_blocking23" >> [16] "Sample_blocking24" "Sample_blocking27" "Sample_blocking28" >> [19] "Sample_blocking3" "Sample_blocking33" "Sample_blocking34" >> [22] "Sample_blocking35" "Sample_blocking36" "Sample_blocking37" >> [25] "Sample_blocking38" "Sample_blocking39" "Sample_blocking6" >> [28] "Sample_blocking8" >> >> Factor "ctrl_loc" contains combinations of 2 factors (thus has 4 levels): >> 1. disease_state (levels: control_0 and control_1) >> 2. localization (levels: A and B) >> Levels: control_0.localization_A, control_1.localization_A, >> control_0.localization_B, control_1.localization_B. >> >> Factor "Gender" has 2 levels: 0 and 1. >> Factor "Sample_blocking" is here to account for the batch effect (some >> samples are coming from the same patient). >> Since the "Design matrix not of full rank" I excluded 2 columns from the >> "design_combi" (columns number 25 and 26) to be able to compute >> dispersions. >> > > This means that you are treating blocks 38 and 39 as if they were the same > as block 1. You seem to be trying to adjust for too many things. > > > After fitting the model, I defined contrasts: >> my_contrasts=makeContrasts(disease.A_minus_disease.B= >> control_1.localization_A >> - control_1.localization_B, levels=design_combi). >> >> lrt=glmLRT( fit, contrast=my_contrasts[, "disease.A_minus_disease.B"]) >> >> I was wondering how is logFC computed in this case (in this contrast), and >> I wanted just to confirm that when the logFC >0, that means that the gene >> is up-regulated in control_1.localization_A compared to >> control_1.localization_B. >> > > What it means is that the gene would be higher in localization A than in > B, if both localizations were applied to cases of the same gender and the > same block. > I am having difficulties in understanding how the testing works. I find it necessary to understand it in order to see which samples (combinations of which factors) will be used for qPCR validation. I'll try to be precise in describing what still troubles me: Assuming that there are 2 blocks: block_1 and block_2, and the "final list of DE genes" is actually the list of genes (the output of lrt()) with a certain p-val assigned to that gene after testing, and assuming that the design is as previously described: design_combi=model.matrix(~0+ ctrl_loc + Gender+Sample_blocking, data=combi), my question would be how this final list of DE genes is produced: are those all the genes that are differentially expressed (DE) between A_disease and B_disease if the comparison takes into account only males in the block_1 plus all the genes that are DE between A_disease and B_disease if the comparison takes into account only males in the block_2, plus all the genes that are DE between A_disease and B_disease if the comparison takes into account females in the block_1 plus all the genes that are DE between A_disease and B_disease if the comparison takes into account females in the block_2? Thanks a lot in advance! Best, Mike > > Given that the localizations A and B might actually be unbalanced by > gender and block, the observed marginal differences between A and B could > be quite different. > > > To check that, I tried to manually calculate just for one gene the logFC >> value, but obviously I do not know how it is computed in this case. >> > > You can't compute it manually. The whole point of a package like edgeR is > to do computations you can't do easily by yourself. > > Best wishes > Gordon > > > Thank you very much in advance! >> >> Best regards, >> Mike >> >> >> On Sun, May 18, 2014 at 5:55 AM, Gordon K Smyth <smyth@wehi.edu.au> >> wrote: >> >> Dear Mike, >>> >>> edgeR does compute logFC correctly, and a positive logFC does obviously >>> mean that the counts are higher, other things being equal, in the second >>> group than the first group. >>> >>> If you are making the comparison B-A (B minus A), then a positive logFC >>> means expression is higher in B than in A. The topTable() function >>> always >>> output the contrast you are making at the top of the table, so that the >>> table is unambiguous. >>> >>> I can't tell from your email what comparison you actually made. What is >>> the non-ascii character is your email between "control_1.localization_A" >>> and "control_1.localization_B"? Was it supposed to be a minus sign or >>> something else? >>> >>> I don't know whether you are making a simple comparison between two >>> groups, or whether you have fitted a more complex linear model. This can >>> affect the interpration of logFCs. >>> >>> Best wishes >>> Gordon >>> >>> >>> Date: Fri, 16 May 2014 12:06:14 +0200 >>> >>>> From: Mike Miller <mike.bioc32@gmail.com> >>>> To: <bioconductor@stat.math.ethz.ch> >>>> Subject: [BioC] edgeR, logFC calculation in factor combination >>>> Message-ID: >>>> <cankskzosyxfqkn3y=itkq4qj7dpp+da+umgpjku0de98ruu5bg@mail.>>>> gmail.com> >>>> Content-Type: text/plain >>>> >>>> Dear All, >>>> >>>> >>>> I am using edgeR for an RNASeq experiment (~30 samples), where I need to >>>> explore the influence of 2 factors with 2 levels each (there are >>>> actually >>>> more factors, but for simplicity I put only 2 here): >>>> >>>> 1. disease_state (levels: control_0 and control_1) >>>> >>>> 2. localization (levels: A and B) >>>> >>>> >>>> I combined the factors and got 4 combinations: >>>> >>>> control_0.localization_A, control_1.localization_A, >>>> control_0.localization_B, control_1.localization_B. >>>> >>>> I am specifically interested in a list of diff. expressed genes between >>>> these 2 combinations: control_1.localization_A – >>>> control_1.localization_B. >>>> >>>> When separating up- and down-regulated genes, I did it simply according >>>> to >>>> logFC column (after filtering for genes with padj < 0.05). If the >>>> logFC<0, >>>> could I infer that the gene is DOWN-regulated in >>>> control_1.localization_A? >>>> I assume that this is true. >>>> >>>> However, when I take the raw counts for one of the diff. expressed genes >>>> (for which logFC<0), in control_1.localization_A and in >>>> control_1.localization_B conditions, it turned out to be the following: >>>> >>>> mean(control_1.localization_A) / mean(control_1.localization_B) = 4.2 >>>> >>>> (Formula explained in words: mean of raw counts for the diff. expressed >>>> gene X in samples which are control_1 and localization_A, divided by >>>> mean >>>> of raw counts for the diff. expressed gene X in samples which are >>>> control_1 >>>> and localization_B) >>>> >>>> This is contrary to the conclusion I got from logFC, since according to >>>> raw counts that gene is UP-regulated in control_1.localization_A. >>>> >>>> I know that logFC is calculated differently, but shouldn't the ratio >>>> (whether the gene is up- or down-regulated) stay nevertheless the same >>>> (if >>>> library sizes are very similar in all samples, which is the case here)? >>>> >>>> Maybe a more precise question would be: how is logFC calculated in this >>>> case, when we have a combination of different factors? >>>> >>>> Thank you very much in advance for any piece of clarification! >>>> >>>> Mike >>>> >>>> >>> ______________________________________________________________________ >>> The information in this email is confidential and intended solely for the >>> addressee. >>> You must not disclose, forward, print or use it without the permission of >>> the sender. >>> ______________________________________________________________________ >>> >> >> > ______________________________________________________________________ > The information in this email is confidential and inte...{{dropped:10}}
ADD REPLY
0
Entering edit mode
Dear Mike, On Fri, 30 May 2014, Mike Miller wrote: > Dear Gordon, > > Thank you for your answer. I still have one more question regarding output > of the lrt function. ... > I am having difficulties in understanding how the testing works. I find > it necessary to understand it in order to see which samples > (combinations of which factors) will be used for qPCR validation. Yes seem to assuming that you can reproduce the results of the edgeR analysis using just a subset of the samples. I am a bit puzzled why you would think that. You will need all the samples to reproduce the same results. > I'll try to be precise in describing what still troubles me: Assuming > that there are 2 blocks: block_1 and block_2, and the "final list of DE > genes" is actually the list of genes (the output of lrt()) with a > certain p-val assigned to that gene after testing, and assuming that the > design is as previously described: > > design_combi=model.matrix(~0+ ctrl_loc + Gender+Sample_blocking, > data=combi), > > my question would be how this final list of DE genes is produced: are > those all the genes that are differentially expressed (DE) between > A_disease and B_disease if the comparison takes into account only males > in the block_1 plus all the genes that are DE between A_disease and > B_disease if the comparison takes into account only males in the > block_2, plus all the genes that are DE between A_disease and B_disease > if the comparison takes into account females in the block_1 plus all the > genes that are DE between A_disease and B_disease if the comparison > takes into account females in the block_2? No, it's nothing like that at all. The model that you have fitted assumes that the same genes are DE in all blocks and all genders. Therefore the software accumulates information from all the blocks and all the samples. edgeR fits a log-linear glm to the data -- the mathematics behind this is explained in the published papers. Best wishes Gordon > Thanks a lot in advance! > > Best, > Mike ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:4}}
ADD REPLY

Login before adding your answer.

Traffic: 440 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6