Limma to find differentially expressed genes

0

Entering edit mode

Guest User ★ 13k

@guest-user-4897

Last seen 9.6 years ago

I have my matrix designed in the following way which I name as mat1 Probes sample1 sample1 sample2 sample2 sample3 sample3 sample4 sample4 rep1 rep2 rep1 rep2 rep1 rep2 rep1 rep2 ----------------------------------------------------- ------------------- gene1 5.098 5.076 5.072 4.677 7.450 7.456 8.564 8.555 gene2 8.906 8.903 6.700 6.653 6.749 6.754 7.546 7.540 gene3 7.409 7.398 5.392 5.432 6.715 6.724 5.345 5.330 gene4 4.876 4.869 5.864 5.981 4.280 4.290 4.267 4.255 gene4 3.567 3.560 3.554 3.425 8.500 8.564 6.345 6.330 gene5 2.569 2.560 8.600 8.645 5.225 5.234 7.345 7.333 I use the limma package to find the DEG's Group <- factor(c("p1", "p1", "p2", "p2","p3", "p4","p4") design <- model.matrix(~0 + Group) colnames(design) <- gsub("Group","", colnames(design)) fit <- lmFit(mat1[,1:4],design) contrast.matrix<-makeContrasts(p1-p2,levels=design) fit2<-contrasts.fit(fit,contrast.matrix) fit2<-eBayes(fit2) sel.diif<-p.adjust(fit2$F.p.value,method="fdr")<0.05 deg<-mat1[,1:4][sel.diif,] So will "deg" just give me those genes which are significant in sample one versus two. I am interested in those genes which are significant only in first sample but not in the second sample and am not sure if this is the right approach. -- output of sessionInfo(): R version 2.15.2 (2012-10-26) Platform: i686-redhat-linux-gnu (32-bit) locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=C LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] limma_3.14.4 csSAM_1.2.1 GOstats_2.24.0 RSQLite_0.10.0 DBI_0.2-5 [6] graph_1.36.2 Category_2.22.0 AnnotationDbi_1.20.5 affy_1.36.1 Biobase_2.16.0 [11] BiocGenerics_0.4.0 R.utils_1.23.2 R.oo_1.13.0 R.methodsS3_1.4.2 loaded via a namespace (and not attached): [1] affyio_1.22.0 annotate_1.36.0 AnnotationForge_1.0.3 BiocInstaller_1.8.3 genefilter_1.40.0 [6] GO.db_2.8.0 GSEABase_1.18.0 IRanges_1.16.6 parallel_2.15.2 preprocessCore_1.18.0 [11] RBGL_1.34.0 splines_2.15.2 stats4_2.15.2 survival_2.36-14 tools_2.15.2 [16] XML_3.9-4 xtable_1.6-0 zlibbioc_1.4.0 -- Sent via the guest posting facility at bioconductor.org.

GO limma GO limma • 1.7k views

ADD COMMENT • link updated 11.0 years ago by James W. MacDonald 65k • written 11.0 years ago by Guest User ★ 13k

0

Entering edit mode

James W. MacDonald 65k

@james-w-macdonald-5106

Last seen 48 minutes ago

United States

Hi Sandy, On 4/19/2013 8:03 AM, Sandy [guest] wrote: > > I have my matrix designed in the following way which I name as mat1 > > Probes sample1 sample1 sample2 sample2 sample3 sample3 sample4 sample4 > rep1 rep2 rep1 rep2 rep1 rep2 rep1 rep2 > -------------------------------------------------- ---------------------- > gene1 5.098 5.076 5.072 4.677 7.450 7.456 8.564 8.555 > gene2 8.906 8.903 6.700 6.653 6.749 6.754 7.546 7.540 > gene3 7.409 7.398 5.392 5.432 6.715 6.724 5.345 5.330 > gene4 4.876 4.869 5.864 5.981 4.280 4.290 4.267 4.255 > gene4 3.567 3.560 3.554 3.425 8.500 8.564 6.345 6.330 > gene5 2.569 2.560 8.600 8.645 5.225 5.234 7.345 7.333 > > I use the limma package to find the DEG's > > Group<- factor(c("p1", "p1", "p2", "p2","p3", "p4","p4") > design<- model.matrix(~0 + Group) > colnames(design)<- gsub("Group","", colnames(design)) > fit<- lmFit(mat1[,1:4],design) > contrast.matrix<-makeContrasts(p1-p2,levels=design) > fit2<-contrasts.fit(fit,contrast.matrix) > fit2<-eBayes(fit2) > sel.diif<-p.adjust(fit2$F.p.value,method="fdr")<0.05 > deg<-mat1[,1:4][sel.diif,] > > So will "deg" just give me those genes which are significant in sample one versus two. I am interested in those genes which are significant only in first sample but not in the second sample and am not sure if this is the right approach. Two things. First, you shouldn't be digging around in the 'fit2' object if you don't know what you are doing. If you read the user's guide, you will see that the correct sequence of events for you would be (after the eBayes step): topGenes <- topTable(fit2, coef = 1, p.value = 0.05, number = Inf) Then if you want the underlying data deg <- mat[row.names(topGenes),1:4] Second, I think you might misunderstand a fundamental concept here. There is no such thing as genes being significant in one sample but not in the second. A microarray can't really tell you if a gene is expressed or not. All it can tell you is if a gene appears to be expressed at a significantly different level in one sample versus another. That is what you are testing here, whether or not genes in p1 are expressed at a different level than in p2. So this raises a question; when you say 'significant only in first sample but not in the second sample' are you instead asking for genes that are UP in p1 as compared to p2? If so, you can further filter results in the topGenes object by the logFC column, selecting only those with a positive value. Best, Jim > > > -- output of sessionInfo(): > > R version 2.15.2 (2012-10-26) > Platform: i686-redhat-linux-gnu (32-bit) > > locale: > [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 > [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=C LC_NAME=C > [9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > > other attached packages: > [1] limma_3.14.4 csSAM_1.2.1 GOstats_2.24.0 RSQLite_0.10.0 DBI_0.2-5 > [6] graph_1.36.2 Category_2.22.0 AnnotationDbi_1.20.5 affy_1.36.1 Biobase_2.16.0 > [11] BiocGenerics_0.4.0 R.utils_1.23.2 R.oo_1.13.0 R.methodsS3_1.4.2 > > loaded via a namespace (and not attached): > [1] affyio_1.22.0 annotate_1.36.0 AnnotationForge_1.0.3 BiocInstaller_1.8.3 genefilter_1.40.0 > [6] GO.db_2.8.0 GSEABase_1.18.0 IRanges_1.16.6 parallel_2.15.2 preprocessCore_1.18.0 > [11] RBGL_1.34.0 splines_2.15.2 stats4_2.15.2 survival_2.36-14 tools_2.15.2 > [16] XML_3.9-4 xtable_1.6-0 zlibbioc_1.4.0 > > -- > Sent via the guest posting facility at bioconductor.org. > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor -- James W. MacDonald, M.S. Biostatistician University of Washington Environmental and Occupational Health Sciences 4225 Roosevelt Way NE, # 100 Seattle WA 98105-6099

ADD COMMENT • link 11.0 years ago James W. MacDonald 65k

Login before adding your answer.