Question: confused with edgeR: multiple groups vs one control
0
gravatar for Guest User
6.0 years ago by
Guest User12k
Guest User12k wrote:
Hello everyone! I'm just starting my MSc and I'm new to bioinformatics! I'm currently stucked in a DE analysis of Arabidopsis thaliana sRNA seq comparing a wild type (control group) vs several different mutants (geneA, geneB, geneC, doubleGeneAC, tripleGenesXYZ) with two biological replicates for each group (12 columns in the counts table); for what I've been looking, there aren't many examples of what to do in such cases and I would like to have some feedback regarding if I'm doing something wrong in the analysis! Basically what I do in edgeR is give the design the groups as factors (releveling the Wt as the control): [1] Ath_genA Ath_genA Ath_genB Ath_genB Ath_genC Ath_genC Ath_genAC Ath_genAC Ath_genXYZ Ath_genXYZ Ath_wt Ath_wt Levels: Ath_wt Ath_genA Ath_genB Ath_genC Ath_genAC Ath_genXYZ > design <- model.matrix(~0+groups) then, make the contrasts > contrasts <- makeContrasts( AvsWT = Ath_genA-Ath_wt, BvsWt = Ath_genB-Ath_wt, CvsWT = Ath_genC-Ath_wt, ACvsWT = Ath_genAC-Ath_wt, XYZvsWT = Ath_genXYZ-Ath_wt, ACvsA = Ath_genAC-Ath_genA, ACvsC = Ath_genAC-Ath_genC, AandCvsAC = (Ath_genA+Ath_genC)/2-Ath_genAC, levels=design) The rest is basically what I've read in the edgeR vignette: > d <- DGEList(counts = TableCountsFilter, genes= InfoFilter ,group = groups) > d <- calcNormFactors(d, method = "TMM") > d <- estimateGLMCommonDisp(d,design, verbose=T) > d <- estimateGLMTrendedDisp(d,design) > d <- estimateGLMTagwiseDisp(d,design) > fit <- glmFit(d, design) Then, I would perform a likelihood ratio test for each contrast > glmLRT(fit, contrast=contrasts[,for_each_Contrast]) to finally get the top DE miRNAs using topTags for a given contrast. For what I know, mutations in genA and genB produce similar phenotypes and some Northern blots confirm that miR156 is upregulated in those mutations compared to WT, while miR172 is downregulated compared to WT, yet in my results the logFC of miR156 is close to 0.5 for genA and -0.1 for genB. Am I doing something wrong with the analysis? I came across the function geneas() in limma which accounts for multiple groups testing against a single control but no homologue function for edgeR, could this be causing the conflict I'm seeing in the DEanalysis vs Northern blots? I sincerely appreciate your help! J. Rodriguez-Medina -- output of sessionInfo(): > sessionInfo() R version 3.0.2 (2013-09-25) Platform: x86_64-pc-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 [4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C [10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] splines grDevices datasets utils parallel stats graphics methods base other attached packages: [1] edgeR_3.2.4 limma_3.16.8 Biobase_2.20.1 BiocGenerics_0.6.0 loaded via a namespace (and not attached): [1] tools_3.0.2 -- Sent via the guest posting facility at bioconductor.org.
ADD COMMENTlink modified 6.0 years ago by Mark Robinson870 • written 6.0 years ago by Guest User12k
Answer: confused with edgeR: multiple groups vs one control
0
gravatar for Mark Robinson
6.0 years ago by
Mark Robinson870
Mark Robinson870 wrote:
Dear Joel, > For what I know, mutations in genA and genB produce similar phenotypes and some Northern blots confirm that miR156 is upregulated in those mutations compared to WT, while miR172 is downregulated compared to WT, yet in my results the logFC of miR156 is close to 0.5 for genA and -0.1 for genB. Given that you have "positive" controls here, I first suggest making a plot of these individual features, something like: cps <- cpm(d) barplot(cps["identifier for mir172",]) barplot(cps["identifier for mir156",]) Before worrying about the statistics (I think your code is ok, but can't be 100% sure), does your count data recapitulate your expectation from northern blots ? Best regards, Mark On 22.11.2013, at 08:07, Joel Rodriguez-Medina [guest] <guest at="" bioconductor.org=""> wrote: > > Hello everyone! > > I'm just starting my MSc and I'm new to bioinformatics! I'm currently stucked in a DE analysis of Arabidopsis thaliana sRNA seq comparing a wild type (control group) vs several different mutants (geneA, geneB, geneC, doubleGeneAC, tripleGenesXYZ) with two biological replicates for each group (12 columns in the counts table); for what I've been looking, there aren't many examples of what to do in such cases and I would like to have some feedback regarding if I'm doing something wrong in the analysis! > > Basically what I do in edgeR is give the design the groups as factors (releveling the Wt as the control): > > [1] Ath_genA Ath_genA Ath_genB Ath_genB Ath_genC Ath_genC Ath_genAC Ath_genAC Ath_genXYZ Ath_genXYZ Ath_wt Ath_wt > > Levels: Ath_wt Ath_genA Ath_genB Ath_genC Ath_genAC Ath_genXYZ > >> design <- model.matrix(~0+groups) > > then, make the contrasts > > >> contrasts <- makeContrasts( > AvsWT = Ath_genA-Ath_wt, > BvsWt = Ath_genB-Ath_wt, > CvsWT = Ath_genC-Ath_wt, > ACvsWT = Ath_genAC-Ath_wt, > XYZvsWT = Ath_genXYZ-Ath_wt, > ACvsA = Ath_genAC-Ath_genA, > ACvsC = Ath_genAC-Ath_genC, > AandCvsAC = (Ath_genA+Ath_genC)/2-Ath_genAC, > levels=design) > > The rest is basically what I've read in the edgeR vignette: >> d <- DGEList(counts = TableCountsFilter, genes= InfoFilter ,group = groups) >> d <- calcNormFactors(d, method = "TMM") >> d <- estimateGLMCommonDisp(d,design, verbose=T) >> d <- estimateGLMTrendedDisp(d,design) >> d <- estimateGLMTagwiseDisp(d,design) >> fit <- glmFit(d, design) > > Then, I would perform a likelihood ratio test for each contrast > >> glmLRT(fit, contrast=contrasts[,for_each_Contrast]) > to finally get the top DE miRNAs using topTags for a given contrast. > > > For what I know, mutations in genA and genB produce similar phenotypes and some Northern blots confirm that miR156 is upregulated in those mutations compared to WT, while miR172 is downregulated compared to WT, yet in my results the logFC of miR156 is close to 0.5 for genA and -0.1 for genB. > > Am I doing something wrong with the analysis? > > I came across the function geneas() in limma which accounts for multiple groups testing against a single control but no homologue function for edgeR, could this be causing the conflict I'm seeing in the DEanalysis vs Northern blots? > > > I sincerely appreciate your help! > > > J. Rodriguez-Medina > > > > > > > -- output of sessionInfo(): > >> sessionInfo() > R version 3.0.2 (2013-09-25) > Platform: x86_64-pc-linux-gnu (64-bit) > > locale: > [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 > [4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 > [7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C > [10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C > > attached base packages: > [1] splines grDevices datasets utils parallel stats graphics methods base > > other attached packages: > [1] edgeR_3.2.4 limma_3.16.8 Biobase_2.20.1 BiocGenerics_0.6.0 > > loaded via a namespace (and not attached): > [1] tools_3.0.2 > > -- > Sent via the guest posting facility at bioconductor.org. > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor ---------- Prof. Dr. Mark Robinson Bioinformatics, Institute of Molecular Life Sciences University of Zurich http://tiny.cc/mrobin
ADD COMMENTlink written 6.0 years ago by Mark Robinson870
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 377 users visited in the last hour