#Identify differentially expressed genes

0

Entering edit mode

Guest User ★ 13k

@guest-user-4897

Last seen 9.6 years ago

Hi all, I have an illumina dataset, VST transformed and normalized. 4 samples each in triplicates, this is the sampleType= sampleType <- c('A','A','A','B','B','B','C','D','D','C','C','D') Now I would like to do perform each pairwise comparison, A vs B, A vs C, A vs D...etc.) but I am confused how to set the colnames(design) here i what i did if (require(limma)) { sampleType <- c('TG_TAC','TG_TAC','TG_TAC','WT_TAC','WT_TAC','WT_TAC', 'WT_SHAM','TG_SHAM','TG_SHAM','WT_SHAM','WT_SHAM','TG_SHAM') ## compare 'A' and 'B' design <- model.matrix(~ factor(sampleType)) colnames(design) <-c('A','B' ) fit <- lmFit(selDataMatrix, design) fit <- eBayes(fit) ## Add gene symbols to gene properties if (require(lumiMouseAll.db) & require(annotate)) { geneSymbol <- getSYMBOL(probeList, 'lumiMouseAll.db') geneName <- sapply(lookUp(probeList, 'lumiMouseAll.db', 'GENENAME'), function(x) x[1]) fit$genes <- data.frame(ID= probeList, geneSymbol=geneSymbol, geneName=geneName, stringsAsFactors=FALSE) } # print the top 50 genes print(topTable(fit, adjust='fdr', number=5)) ## get significant gene list with FDR adjusted p.values less than 0.01 p.adj <- p.adjust(fit$p.value[,2]) sigGene.adj <- probeList[ p.adj < 0.01] ## without FDR adjustment sigGene <- probeList[ fit$p.value[,2] < 0.01] } how do I properly set up each pairwise comparison? thanks paolo -- output of sessionInfo(): R version 2.15.0 (2012-03-30) Platform: i386-pc-mingw32/i386 (32-bit) locale: [1] LC_COLLATE=Italian_Italy.1252 LC_CTYPE=Italian_Italy.1252 LC_MONETARY=Italian_Italy.1252 LC_NUMERIC=C [5] LC_TIME=Italian_Italy.1252 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] annotate_1.34.1 lumiMouseAll.db_1.18.0 org.Mm.eg.db_2.7.1 limma_3.12.1 [5] lumiMouseIDMapping_1.10.0 RSQLite_0.11.1 DBI_0.2-5 AnnotationDbi_1.18.1 [9] lumi_2.8.0 nleqslv_1.9.3 methylumi_2.2.0 ggplot2_0.9.1 [13] reshape2_1.2.1 scales_0.2.1 Biobase_2.16.0 BiocGenerics_0.2.0 loaded via a namespace (and not attached): [1] affy_1.34.0 affyio_1.24.0 bigmemory_4.2.11 BiocInstaller_1.4.7 Biostrings_2.24.1 [6] bitops_1.0-4.1 BSgenome_1.24.0 colorspace_1.1-1 dichromat_1.2-4 digest_0.5.2 [11] DNAcopy_1.30.0 GenomicRanges_1.8.7 genoset_1.6.0 grid_2.15.0 hdrcde_2.16 [16] IRanges_1.14.4 KernSmooth_2.23-8 labeling_0.1 lattice_0.20-6 MASS_7.3-19 [21] Matrix_1.0-7 memoise_0.1 mgcv_1.7-18 munsell_0.3 nlme_3.1-104 [26] plyr_1.7.1 preprocessCore_1.18.0 proto_0.3-9.2 RColorBrewer_1.0-5 RCurl_1.91-1.1 [31] Rsamtools_1.8.5 rtracklayer_1.16.2 stats4_2.15.0 stringr_0.6 tools_2.15.0 [36] XML_3.9-4.1 xtable_1.7-0 zlibbioc_1.2.0 -- Sent via the guest posting facility at bioconductor.org.

• 1.4k views

ADD COMMENT • link updated 11.8 years ago by yao chen ▴ 210 • written 11.8 years ago by Guest User ★ 13k

0

Entering edit mode

yao chen ▴ 210

@yao-chen-5205

Last seen 9.6 years ago

Hi Paolo, You just need to put all 4 groups in the design matrix. It's a clear example in the Limma userguide.: > f <- factor(targets$Target, levels=c("RNA1","RNA2","RNA3")) > design <- model.matrix(~0+f) > colnames(design) <- c("RNA1","RNA2","RNA3") To make all pair-wise comparisons between the three groups one could proceed > fit <- lmFit(eset, design) > contrast.matrix <- makeContrasts(RNA2-RNA1, RNA3-RNA2, RNA3-RNA1, + levels=design) > fit2 <- contrasts.fit(fit, contrast.matrix) > fit2 <- eBayes(fit2) A list of top genes for RNA2 versus RNA1 can be obtained from > topTable(fit2, coef=1, adjust="BH") Best, Jack 2012/7/12 Paolo [guest] <guest@bioconductor.org> > > Hi all, > I have an illumina dataset, VST transformed and normalized. 4 samples each > in triplicates, > this is the sampleType= > > sampleType <- c('A','A','A','B','B','B','C','D','D','C','C','D') > Now I would like to do perform each pairwise comparison, A vs B, A vs C, A > vs D...etc.) > > but I am confused how to set the colnames(design) > > here i what i did > > if (require(limma)) { > sampleType <- > c('TG_TAC','TG_TAC','TG_TAC','WT_TAC','WT_TAC','WT_TAC','WT_SHAM','T G_SHAM','TG_SHAM','WT_SHAM','WT_SHAM','TG_SHAM') > ## compare 'A' and 'B' > design <- model.matrix(~ factor(sampleType)) > colnames(design) <-c('A','B' ) > fit <- lmFit(selDataMatrix, design) > fit <- eBayes(fit) > ## Add gene symbols to gene properties > if (require(lumiMouseAll.db) & require(annotate)) { > geneSymbol <- getSYMBOL(probeList, 'lumiMouseAll.db') > geneName <- sapply(lookUp(probeList, 'lumiMouseAll.db', > 'GENENAME'), function(x) x[1]) > fit$genes <- data.frame(ID= probeList, > geneSymbol=geneSymbol, geneName=geneName, stringsAsFactors=FALSE) > } > # print the top 50 genes > print(topTable(fit, adjust='fdr', number=5)) > > ## get significant gene list with FDR adjusted p.values less than 0.01 > p.adj <- p.adjust(fit$p.value[,2]) > sigGene.adj <- probeList[ p.adj < 0.01] > ## without FDR adjustment > sigGene <- probeList[ fit$p.value[,2] < 0.01] > } > > > > how do I properly set up each pairwise comparison? > > thanks > paolo > > > > > > > -- output of sessionInfo(): > > R version 2.15.0 (2012-03-30) > Platform: i386-pc-mingw32/i386 (32-bit) > > locale: > [1] LC_COLLATE=Italian_Italy.1252 LC_CTYPE=Italian_Italy.1252 > LC_MONETARY=Italian_Italy.1252 LC_NUMERIC=C > [5] LC_TIME=Italian_Italy.1252 > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > > other attached packages: > [1] annotate_1.34.1 lumiMouseAll.db_1.18.0 > org.Mm.eg.db_2.7.1 limma_3.12.1 > [5] lumiMouseIDMapping_1.10.0 RSQLite_0.11.1 DBI_0.2-5 > AnnotationDbi_1.18.1 > [9] lumi_2.8.0 nleqslv_1.9.3 methylumi_2.2.0 > ggplot2_0.9.1 > [13] reshape2_1.2.1 scales_0.2.1 Biobase_2.16.0 > BiocGenerics_0.2.0 > > loaded via a namespace (and not attached): > [1] affy_1.34.0 affyio_1.24.0 bigmemory_4.2.11 > BiocInstaller_1.4.7 Biostrings_2.24.1 > [6] bitops_1.0-4.1 BSgenome_1.24.0 colorspace_1.1-1 > dichromat_1.2-4 digest_0.5.2 > [11] DNAcopy_1.30.0 GenomicRanges_1.8.7 genoset_1.6.0 > grid_2.15.0 hdrcde_2.16 > [16] IRanges_1.14.4 KernSmooth_2.23-8 labeling_0.1 > lattice_0.20-6 MASS_7.3-19 > [21] Matrix_1.0-7 memoise_0.1 mgcv_1.7-18 > munsell_0.3 nlme_3.1-104 > [26] plyr_1.7.1 preprocessCore_1.18.0 proto_0.3-9.2 > RColorBrewer_1.0-5 RCurl_1.91-1.1 > [31] Rsamtools_1.8.5 rtracklayer_1.16.2 stats4_2.15.0 > stringr_0.6 tools_2.15.0 > [36] XML_3.9-4.1 xtable_1.7-0 zlibbioc_1.2.0 > > -- > Sent via the guest posting facility at bioconductor.org. > > _______________________________________________ > Bioconductor mailing list > Bioconductor@r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > [[alternative HTML version deleted]]

ADD COMMENT • link 11.8 years ago yao chen ▴ 210

0

Entering edit mode

Thanks but there is still something that i miss, here is the code that I used, I set up sampletype, 4 samples, in triplicates, respect to the position if the dataMatrix with my normalized expression values: dataMatrix <- exprs(lumi.N.Q) presentCount <- detectionCall(x.lumi) selDataMatrix <- dataMatrix[presentCount > 0,] probeList <- rownames(selDataMatrix) sampleType <- c('A','A','A','B','B','B','C','D','D','C','C','D') then I created a design matrix. design <- model.matrix(~ factor(sampleType)) colnames(design) <- c('A','B','C','D') fit1 <- lmFit(selDataMatrix, design) constrast.matrix <- makeContrasts (A-B,C-D,levels=design) fit1_2 <- contrasts.fit(fit1,constrast.matrix) fit1_2 <- eBayes(fit1_2) topTable(fit1_2,coef=2, adjust="BH") and for instance I get top genes in coef=2 -> C-D I tried to change sampletype excluding sample C like this: sampleType <- c('A','A','A','B','B','B','x','D','D','x','x,'D') run the same command above and I obtain some other topgenes, I would rather expect to encounter some error because sample C is not anymore defined, where am i wrong? 2012/7/12 Yao Chen <chenyao.bioinfor at="" gmail.com="">: > Hi Paolo, > > You just need to put all 4 groups in the design matrix. It's a clear example > in the Limma userguide.: > >> f <- factor(targets$Target, levels=c("RNA1","RNA2","RNA3")) >> design <- model.matrix(~0+f) >> colnames(design) <- c("RNA1","RNA2","RNA3") > To make all pair-wise comparisons between the three groups one could proceed >> fit <- lmFit(eset, design) >> contrast.matrix <- makeContrasts(RNA2-RNA1, RNA3-RNA2, RNA3-RNA1, > + levels=design) >> fit2 <- contrasts.fit(fit, contrast.matrix) >> fit2 <- eBayes(fit2) > A list of top genes for RNA2 versus RNA1 can be obtained from >> topTable(fit2, coef=1, adjust="BH") > > Best, > > Jack > > 2012/7/12 Paolo [guest] <guest at="" bioconductor.org=""> >> >> >> Hi all, >> I have an illumina dataset, VST transformed and normalized. 4 samples each >> in triplicates, >> this is the sampleType= >> >> sampleType <- c('A','A','A','B','B','B','C','D','D','C','C','D') >> Now I would like to do perform each pairwise comparison, A vs B, A vs C, A >> vs D...etc.) >> >> but I am confused how to set the colnames(design) >> >> here i what i did >> >> if (require(limma)) { >> sampleType <- >> c('TG_TAC','TG_TAC','TG_TAC','WT_TAC','WT_TAC','WT_TAC','WT_SHAM',' TG_SHAM','TG_SHAM','WT_SHAM','WT_SHAM','TG_SHAM') >> ## compare 'A' and 'B' >> design <- model.matrix(~ factor(sampleType)) >> colnames(design) <-c('A','B' ) >> fit <- lmFit(selDataMatrix, design) >> fit <- eBayes(fit) >> ## Add gene symbols to gene properties >> if (require(lumiMouseAll.db) & require(annotate)) { >> geneSymbol <- getSYMBOL(probeList, 'lumiMouseAll.db') >> geneName <- sapply(lookUp(probeList, 'lumiMouseAll.db', >> 'GENENAME'), function(x) x[1]) >> fit$genes <- data.frame(ID= probeList, >> geneSymbol=geneSymbol, geneName=geneName, stringsAsFactors=FALSE) >> } >> # print the top 50 genes >> print(topTable(fit, adjust='fdr', number=5)) >> >> ## get significant gene list with FDR adjusted p.values less than 0.01 >> p.adj <- p.adjust(fit$p.value[,2]) >> sigGene.adj <- probeList[ p.adj < 0.01] >> ## without FDR adjustment >> sigGene <- probeList[ fit$p.value[,2] < 0.01] >> } >> >> >> >> how do I properly set up each pairwise comparison? >> >> thanks >> paolo >> >> >> >> >> >> >> -- output of sessionInfo(): >> >> R version 2.15.0 (2012-03-30) >> Platform: i386-pc-mingw32/i386 (32-bit) >> >> locale: >> [1] LC_COLLATE=Italian_Italy.1252 LC_CTYPE=Italian_Italy.1252 >> LC_MONETARY=Italian_Italy.1252 LC_NUMERIC=C >> [5] LC_TIME=Italian_Italy.1252 >> >> attached base packages: >> [1] stats graphics grDevices utils datasets methods base >> >> other attached packages: >> [1] annotate_1.34.1 lumiMouseAll.db_1.18.0 >> org.Mm.eg.db_2.7.1 limma_3.12.1 >> [5] lumiMouseIDMapping_1.10.0 RSQLite_0.11.1 DBI_0.2-5 >> AnnotationDbi_1.18.1 >> [9] lumi_2.8.0 nleqslv_1.9.3 methylumi_2.2.0 >> ggplot2_0.9.1 >> [13] reshape2_1.2.1 scales_0.2.1 Biobase_2.16.0 >> BiocGenerics_0.2.0 >> >> loaded via a namespace (and not attached): >> [1] affy_1.34.0 affyio_1.24.0 bigmemory_4.2.11 >> BiocInstaller_1.4.7 Biostrings_2.24.1 >> [6] bitops_1.0-4.1 BSgenome_1.24.0 colorspace_1.1-1 >> dichromat_1.2-4 digest_0.5.2 >> [11] DNAcopy_1.30.0 GenomicRanges_1.8.7 genoset_1.6.0 >> grid_2.15.0 hdrcde_2.16 >> [16] IRanges_1.14.4 KernSmooth_2.23-8 labeling_0.1 >> lattice_0.20-6 MASS_7.3-19 >> [21] Matrix_1.0-7 memoise_0.1 mgcv_1.7-18 >> munsell_0.3 nlme_3.1-104 >> [26] plyr_1.7.1 preprocessCore_1.18.0 proto_0.3-9.2 >> RColorBrewer_1.0-5 RCurl_1.91-1.1 >> [31] Rsamtools_1.8.5 rtracklayer_1.16.2 stats4_2.15.0 >> stringr_0.6 tools_2.15.0 >> [36] XML_3.9-4.1 xtable_1.7-0 zlibbioc_1.2.0 >> >> -- >> Sent via the guest posting facility at bioconductor.org. >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor > >

ADD REPLY • link 11.8 years ago Paolo Kunderfranco ▴ 350

0

Entering edit mode

Hi Paolo, Did you still use "colnames(design) <- c('A','B','C','D')"? 2012/7/12 Paolo Kunderfranco <paolo.kunderfranco@gmail.com> > Thanks but there is still something that i miss, here is the code that > I used, I set up sampletype, 4 samples, in triplicates, respect to the > position if the dataMatrix with my normalized expression values: > > dataMatrix <- exprs(lumi.N.Q) > presentCount <- detectionCall(x.lumi) > selDataMatrix <- dataMatrix[presentCount > 0,] > probeList <- rownames(selDataMatrix) > sampleType <- c('A','A','A','B','B','B','C','D','D','C','C','D') > > then I created a design matrix. > > design <- model.matrix(~ factor(sampleType)) > colnames(design) <- c('A','B','C','D') > fit1 <- lmFit(selDataMatrix, design) > constrast.matrix <- makeContrasts (A-B,C-D,levels=design) > fit1_2 <- contrasts.fit(fit1,constrast.matrix) > fit1_2 <- eBayes(fit1_2) > topTable(fit1_2,coef=2, adjust="BH") > > and for instance I get top genes in coef=2 -> C-D > I tried to change sampletype excluding sample C like this: > sampleType <- c('A','A','A','B','B','B','x','D','D','x','x,'D') > run the same command above and I obtain some other topgenes, I would > rather expect to encounter some error because sample C is not anymore > defined, where am i wrong? > > > > > > 2012/7/12 Yao Chen <chenyao.bioinfor@gmail.com>: > > Hi Paolo, > > > > You just need to put all 4 groups in the design matrix. It's a clear > example > > in the Limma userguide.: > > > >> f <- factor(targets$Target, levels=c("RNA1","RNA2","RNA3")) > >> design <- model.matrix(~0+f) > >> colnames(design) <- c("RNA1","RNA2","RNA3") > > To make all pair-wise comparisons between the three groups one could > proceed > >> fit <- lmFit(eset, design) > >> contrast.matrix <- makeContrasts(RNA2-RNA1, RNA3-RNA2, RNA3-RNA1, > > + levels=design) > >> fit2 <- contrasts.fit(fit, contrast.matrix) > >> fit2 <- eBayes(fit2) > > A list of top genes for RNA2 versus RNA1 can be obtained from > >> topTable(fit2, coef=1, adjust="BH") > > > > Best, > > > > Jack > > > > 2012/7/12 Paolo [guest] <guest@bioconductor.org> > >> > >> > >> Hi all, > >> I have an illumina dataset, VST transformed and normalized. 4 samples > each > >> in triplicates, > >> this is the sampleType= > >> > >> sampleType <- c('A','A','A','B','B','B','C','D','D','C','C','D') > >> Now I would like to do perform each pairwise comparison, A vs B, A vs > C, A > >> vs D...etc.) > >> > >> but I am confused how to set the colnames(design) > >> > >> here i what i did > >> > >> if (require(limma)) { > >> sampleType <- > >> > c('TG_TAC','TG_TAC','TG_TAC','WT_TAC','WT_TAC','WT_TAC','WT_SHAM','T G_SHAM','TG_SHAM','WT_SHAM','WT_SHAM','TG_SHAM') > >> ## compare 'A' and 'B' > >> design <- model.matrix(~ factor(sampleType)) > >> colnames(design) <-c('A','B' ) > >> fit <- lmFit(selDataMatrix, design) > >> fit <- eBayes(fit) > >> ## Add gene symbols to gene properties > >> if (require(lumiMouseAll.db) & require(annotate)) { > >> geneSymbol <- getSYMBOL(probeList, 'lumiMouseAll.db') > >> geneName <- sapply(lookUp(probeList, 'lumiMouseAll.db', > >> 'GENENAME'), function(x) x[1]) > >> fit$genes <- data.frame(ID= probeList, > >> geneSymbol=geneSymbol, geneName=geneName, stringsAsFactors=FALSE) > >> } > >> # print the top 50 genes > >> print(topTable(fit, adjust='fdr', number=5)) > >> > >> ## get significant gene list with FDR adjusted p.values less than 0.01 > >> p.adj <- p.adjust(fit$p.value[,2]) > >> sigGene.adj <- probeList[ p.adj < 0.01] > >> ## without FDR adjustment > >> sigGene <- probeList[ fit$p.value[,2] < 0.01] > >> } > >> > >> > >> > >> how do I properly set up each pairwise comparison? > >> > >> thanks > >> paolo > >> > >> > >> > >> > >> > >> > >> -- output of sessionInfo(): > >> > >> R version 2.15.0 (2012-03-30) > >> Platform: i386-pc-mingw32/i386 (32-bit) > >> > >> locale: > >> [1] LC_COLLATE=Italian_Italy.1252 LC_CTYPE=Italian_Italy.1252 > >> LC_MONETARY=Italian_Italy.1252 LC_NUMERIC=C > >> [5] LC_TIME=Italian_Italy.1252 > >> > >> attached base packages: > >> [1] stats graphics grDevices utils datasets methods base > >> > >> other attached packages: > >> [1] annotate_1.34.1 lumiMouseAll.db_1.18.0 > >> org.Mm.eg.db_2.7.1 limma_3.12.1 > >> [5] lumiMouseIDMapping_1.10.0 RSQLite_0.11.1 DBI_0.2-5 > >> AnnotationDbi_1.18.1 > >> [9] lumi_2.8.0 nleqslv_1.9.3 methylumi_2.2.0 > >> ggplot2_0.9.1 > >> [13] reshape2_1.2.1 scales_0.2.1 Biobase_2.16.0 > >> BiocGenerics_0.2.0 > >> > >> loaded via a namespace (and not attached): > >> [1] affy_1.34.0 affyio_1.24.0 bigmemory_4.2.11 > >> BiocInstaller_1.4.7 Biostrings_2.24.1 > >> [6] bitops_1.0-4.1 BSgenome_1.24.0 colorspace_1.1-1 > >> dichromat_1.2-4 digest_0.5.2 > >> [11] DNAcopy_1.30.0 GenomicRanges_1.8.7 genoset_1.6.0 > >> grid_2.15.0 hdrcde_2.16 > >> [16] IRanges_1.14.4 KernSmooth_2.23-8 labeling_0.1 > >> lattice_0.20-6 MASS_7.3-19 > >> [21] Matrix_1.0-7 memoise_0.1 mgcv_1.7-18 > >> munsell_0.3 nlme_3.1-104 > >> [26] plyr_1.7.1 preprocessCore_1.18.0 proto_0.3-9.2 > >> RColorBrewer_1.0-5 RCurl_1.91-1.1 > >> [31] Rsamtools_1.8.5 rtracklayer_1.16.2 stats4_2.15.0 > >> stringr_0.6 tools_2.15.0 > >> [36] XML_3.9-4.1 xtable_1.7-0 zlibbioc_1.2.0 > >> > >> -- > >> Sent via the guest posting facility at bioconductor.org. > >> > >> _______________________________________________ > >> Bioconductor mailing list > >> Bioconductor@r-project.org > >> https://stat.ethz.ch/mailman/listinfo/bioconductor > >> Search the archives: > >> http://news.gmane.org/gmane.science.biology.informatics.conductor > > > > > [[alternative HTML version deleted]]

ADD REPLY • link 11.8 years ago yao chen ▴ 210

Login before adding your answer.