#Identify differentially expressed genes
1
0
Entering edit mode
Guest User ★ 13k
@guest-user-4897
Last seen 10.3 years ago
Hi all, I have an illumina dataset, VST transformed and normalized. 4 samples each in triplicates, this is the sampleType= sampleType <- c('A','A','A','B','B','B','C','D','D','C','C','D') Now I would like to do perform each pairwise comparison, A vs B, A vs C, A vs D...etc.) but I am confused how to set the colnames(design) here i what i did if (require(limma)) { sampleType <- c('TG_TAC','TG_TAC','TG_TAC','WT_TAC','WT_TAC','WT_TAC', 'WT_SHAM','TG_SHAM','TG_SHAM','WT_SHAM','WT_SHAM','TG_SHAM') ## compare 'A' and 'B' design <- model.matrix(~ factor(sampleType)) colnames(design) <-c('A','B' ) fit <- lmFit(selDataMatrix, design) fit <- eBayes(fit) ## Add gene symbols to gene properties if (require(lumiMouseAll.db) & require(annotate)) { geneSymbol <- getSYMBOL(probeList, 'lumiMouseAll.db') geneName <- sapply(lookUp(probeList, 'lumiMouseAll.db', 'GENENAME'), function(x) x[1]) fit$genes <- data.frame(ID= probeList, geneSymbol=geneSymbol, geneName=geneName, stringsAsFactors=FALSE) } # print the top 50 genes print(topTable(fit, adjust='fdr', number=5)) ## get significant gene list with FDR adjusted p.values less than 0.01 p.adj <- p.adjust(fit$p.value[,2]) sigGene.adj <- probeList[ p.adj < 0.01] ## without FDR adjustment sigGene <- probeList[ fit$p.value[,2] < 0.01] } how do I properly set up each pairwise comparison? thanks paolo -- output of sessionInfo(): R version 2.15.0 (2012-03-30) Platform: i386-pc-mingw32/i386 (32-bit) locale: [1] LC_COLLATE=Italian_Italy.1252 LC_CTYPE=Italian_Italy.1252 LC_MONETARY=Italian_Italy.1252 LC_NUMERIC=C [5] LC_TIME=Italian_Italy.1252 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] annotate_1.34.1 lumiMouseAll.db_1.18.0 org.Mm.eg.db_2.7.1 limma_3.12.1 [5] lumiMouseIDMapping_1.10.0 RSQLite_0.11.1 DBI_0.2-5 AnnotationDbi_1.18.1 [9] lumi_2.8.0 nleqslv_1.9.3 methylumi_2.2.0 ggplot2_0.9.1 [13] reshape2_1.2.1 scales_0.2.1 Biobase_2.16.0 BiocGenerics_0.2.0 loaded via a namespace (and not attached): [1] affy_1.34.0 affyio_1.24.0 bigmemory_4.2.11 BiocInstaller_1.4.7 Biostrings_2.24.1 [6] bitops_1.0-4.1 BSgenome_1.24.0 colorspace_1.1-1 dichromat_1.2-4 digest_0.5.2 [11] DNAcopy_1.30.0 GenomicRanges_1.8.7 genoset_1.6.0 grid_2.15.0 hdrcde_2.16 [16] IRanges_1.14.4 KernSmooth_2.23-8 labeling_0.1 lattice_0.20-6 MASS_7.3-19 [21] Matrix_1.0-7 memoise_0.1 mgcv_1.7-18 munsell_0.3 nlme_3.1-104 [26] plyr_1.7.1 preprocessCore_1.18.0 proto_0.3-9.2 RColorBrewer_1.0-5 RCurl_1.91-1.1 [31] Rsamtools_1.8.5 rtracklayer_1.16.2 stats4_2.15.0 stringr_0.6 tools_2.15.0 [36] XML_3.9-4.1 xtable_1.7-0 zlibbioc_1.2.0 -- Sent via the guest posting facility at bioconductor.org.
• 1.5k views
ADD COMMENT
0
Entering edit mode
yao chen ▴ 210
@yao-chen-5205
Last seen 10.3 years ago
Hi Paolo, You just need to put all 4 groups in the design matrix. It's a clear example in the Limma userguide.: > f <- factor(targets$Target, levels=c("RNA1","RNA2","RNA3")) > design <- model.matrix(~0+f) > colnames(design) <- c("RNA1","RNA2","RNA3") To make all pair-wise comparisons between the three groups one could proceed > fit <- lmFit(eset, design) > contrast.matrix <- makeContrasts(RNA2-RNA1, RNA3-RNA2, RNA3-RNA1, + levels=design) > fit2 <- contrasts.fit(fit, contrast.matrix) > fit2 <- eBayes(fit2) A list of top genes for RNA2 versus RNA1 can be obtained from > topTable(fit2, coef=1, adjust="BH") Best, Jack 2012/7/12 Paolo [guest] <guest@bioconductor.org> > > Hi all, > I have an illumina dataset, VST transformed and normalized. 4 samples each > in triplicates, > this is the sampleType= > > sampleType <- c('A','A','A','B','B','B','C','D','D','C','C','D') > Now I would like to do perform each pairwise comparison, A vs B, A vs C, A > vs D...etc.) > > but I am confused how to set the colnames(design) > > here i what i did > > if (require(limma)) { > sampleType <- > c('TG_TAC','TG_TAC','TG_TAC','WT_TAC','WT_TAC','WT_TAC','WT_SHAM','T G_SHAM','TG_SHAM','WT_SHAM','WT_SHAM','TG_SHAM') > ## compare 'A' and 'B' > design <- model.matrix(~ factor(sampleType)) > colnames(design) <-c('A','B' ) > fit <- lmFit(selDataMatrix, design) > fit <- eBayes(fit) > ## Add gene symbols to gene properties > if (require(lumiMouseAll.db) & require(annotate)) { > geneSymbol <- getSYMBOL(probeList, 'lumiMouseAll.db') > geneName <- sapply(lookUp(probeList, 'lumiMouseAll.db', > 'GENENAME'), function(x) x[1]) > fit$genes <- data.frame(ID= probeList, > geneSymbol=geneSymbol, geneName=geneName, stringsAsFactors=FALSE) > } > # print the top 50 genes > print(topTable(fit, adjust='fdr', number=5)) > > ## get significant gene list with FDR adjusted p.values less than 0.01 > p.adj <- p.adjust(fit$p.value[,2]) > sigGene.adj <- probeList[ p.adj < 0.01] > ## without FDR adjustment > sigGene <- probeList[ fit$p.value[,2] < 0.01] > } > > > > how do I properly set up each pairwise comparison? > > thanks > paolo > > > > > > > -- output of sessionInfo(): > > R version 2.15.0 (2012-03-30) > Platform: i386-pc-mingw32/i386 (32-bit) > > locale: > [1] LC_COLLATE=Italian_Italy.1252 LC_CTYPE=Italian_Italy.1252 > LC_MONETARY=Italian_Italy.1252 LC_NUMERIC=C > [5] LC_TIME=Italian_Italy.1252 > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > > other attached packages: > [1] annotate_1.34.1 lumiMouseAll.db_1.18.0 > org.Mm.eg.db_2.7.1 limma_3.12.1 > [5] lumiMouseIDMapping_1.10.0 RSQLite_0.11.1 DBI_0.2-5 > AnnotationDbi_1.18.1 > [9] lumi_2.8.0 nleqslv_1.9.3 methylumi_2.2.0 > ggplot2_0.9.1 > [13] reshape2_1.2.1 scales_0.2.1 Biobase_2.16.0 > BiocGenerics_0.2.0 > > loaded via a namespace (and not attached): > [1] affy_1.34.0 affyio_1.24.0 bigmemory_4.2.11 > BiocInstaller_1.4.7 Biostrings_2.24.1 > [6] bitops_1.0-4.1 BSgenome_1.24.0 colorspace_1.1-1 > dichromat_1.2-4 digest_0.5.2 > [11] DNAcopy_1.30.0 GenomicRanges_1.8.7 genoset_1.6.0 > grid_2.15.0 hdrcde_2.16 > [16] IRanges_1.14.4 KernSmooth_2.23-8 labeling_0.1 > lattice_0.20-6 MASS_7.3-19 > [21] Matrix_1.0-7 memoise_0.1 mgcv_1.7-18 > munsell_0.3 nlme_3.1-104 > [26] plyr_1.7.1 preprocessCore_1.18.0 proto_0.3-9.2 > RColorBrewer_1.0-5 RCurl_1.91-1.1 > [31] Rsamtools_1.8.5 rtracklayer_1.16.2 stats4_2.15.0 > stringr_0.6 tools_2.15.0 > [36] XML_3.9-4.1 xtable_1.7-0 zlibbioc_1.2.0 > > -- > Sent via the guest posting facility at bioconductor.org. > > _______________________________________________ > Bioconductor mailing list > Bioconductor@r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > [[alternative HTML version deleted]]
ADD COMMENT
0
Entering edit mode
Thanks but there is still something that i miss, here is the code that I used, I set up sampletype, 4 samples, in triplicates, respect to the position if the dataMatrix with my normalized expression values: dataMatrix <- exprs(lumi.N.Q) presentCount <- detectionCall(x.lumi) selDataMatrix <- dataMatrix[presentCount > 0,] probeList <- rownames(selDataMatrix) sampleType <- c('A','A','A','B','B','B','C','D','D','C','C','D') then I created a design matrix. design <- model.matrix(~ factor(sampleType)) colnames(design) <- c('A','B','C','D') fit1 <- lmFit(selDataMatrix, design) constrast.matrix <- makeContrasts (A-B,C-D,levels=design) fit1_2 <- contrasts.fit(fit1,constrast.matrix) fit1_2 <- eBayes(fit1_2) topTable(fit1_2,coef=2, adjust="BH") and for instance I get top genes in coef=2 -> C-D I tried to change sampletype excluding sample C like this: sampleType <- c('A','A','A','B','B','B','x','D','D','x','x,'D') run the same command above and I obtain some other topgenes, I would rather expect to encounter some error because sample C is not anymore defined, where am i wrong? 2012/7/12 Yao Chen <chenyao.bioinfor at="" gmail.com="">: > Hi Paolo, > > You just need to put all 4 groups in the design matrix. It's a clear example > in the Limma userguide.: > >> f <- factor(targets$Target, levels=c("RNA1","RNA2","RNA3")) >> design <- model.matrix(~0+f) >> colnames(design) <- c("RNA1","RNA2","RNA3") > To make all pair-wise comparisons between the three groups one could proceed >> fit <- lmFit(eset, design) >> contrast.matrix <- makeContrasts(RNA2-RNA1, RNA3-RNA2, RNA3-RNA1, > + levels=design) >> fit2 <- contrasts.fit(fit, contrast.matrix) >> fit2 <- eBayes(fit2) > A list of top genes for RNA2 versus RNA1 can be obtained from >> topTable(fit2, coef=1, adjust="BH") > > Best, > > Jack > > 2012/7/12 Paolo [guest] <guest at="" bioconductor.org=""> >> >> >> Hi all, >> I have an illumina dataset, VST transformed and normalized. 4 samples each >> in triplicates, >> this is the sampleType= >> >> sampleType <- c('A','A','A','B','B','B','C','D','D','C','C','D') >> Now I would like to do perform each pairwise comparison, A vs B, A vs C, A >> vs D...etc.) >> >> but I am confused how to set the colnames(design) >> >> here i what i did >> >> if (require(limma)) { >> sampleType <- >> c('TG_TAC','TG_TAC','TG_TAC','WT_TAC','WT_TAC','WT_TAC','WT_SHAM',' TG_SHAM','TG_SHAM','WT_SHAM','WT_SHAM','TG_SHAM') >> ## compare 'A' and 'B' >> design <- model.matrix(~ factor(sampleType)) >> colnames(design) <-c('A','B' ) >> fit <- lmFit(selDataMatrix, design) >> fit <- eBayes(fit) >> ## Add gene symbols to gene properties >> if (require(lumiMouseAll.db) & require(annotate)) { >> geneSymbol <- getSYMBOL(probeList, 'lumiMouseAll.db') >> geneName <- sapply(lookUp(probeList, 'lumiMouseAll.db', >> 'GENENAME'), function(x) x[1]) >> fit$genes <- data.frame(ID= probeList, >> geneSymbol=geneSymbol, geneName=geneName, stringsAsFactors=FALSE) >> } >> # print the top 50 genes >> print(topTable(fit, adjust='fdr', number=5)) >> >> ## get significant gene list with FDR adjusted p.values less than 0.01 >> p.adj <- p.adjust(fit$p.value[,2]) >> sigGene.adj <- probeList[ p.adj < 0.01] >> ## without FDR adjustment >> sigGene <- probeList[ fit$p.value[,2] < 0.01] >> } >> >> >> >> how do I properly set up each pairwise comparison? >> >> thanks >> paolo >> >> >> >> >> >> >> -- output of sessionInfo(): >> >> R version 2.15.0 (2012-03-30) >> Platform: i386-pc-mingw32/i386 (32-bit) >> >> locale: >> [1] LC_COLLATE=Italian_Italy.1252 LC_CTYPE=Italian_Italy.1252 >> LC_MONETARY=Italian_Italy.1252 LC_NUMERIC=C >> [5] LC_TIME=Italian_Italy.1252 >> >> attached base packages: >> [1] stats graphics grDevices utils datasets methods base >> >> other attached packages: >> [1] annotate_1.34.1 lumiMouseAll.db_1.18.0 >> org.Mm.eg.db_2.7.1 limma_3.12.1 >> [5] lumiMouseIDMapping_1.10.0 RSQLite_0.11.1 DBI_0.2-5 >> AnnotationDbi_1.18.1 >> [9] lumi_2.8.0 nleqslv_1.9.3 methylumi_2.2.0 >> ggplot2_0.9.1 >> [13] reshape2_1.2.1 scales_0.2.1 Biobase_2.16.0 >> BiocGenerics_0.2.0 >> >> loaded via a namespace (and not attached): >> [1] affy_1.34.0 affyio_1.24.0 bigmemory_4.2.11 >> BiocInstaller_1.4.7 Biostrings_2.24.1 >> [6] bitops_1.0-4.1 BSgenome_1.24.0 colorspace_1.1-1 >> dichromat_1.2-4 digest_0.5.2 >> [11] DNAcopy_1.30.0 GenomicRanges_1.8.7 genoset_1.6.0 >> grid_2.15.0 hdrcde_2.16 >> [16] IRanges_1.14.4 KernSmooth_2.23-8 labeling_0.1 >> lattice_0.20-6 MASS_7.3-19 >> [21] Matrix_1.0-7 memoise_0.1 mgcv_1.7-18 >> munsell_0.3 nlme_3.1-104 >> [26] plyr_1.7.1 preprocessCore_1.18.0 proto_0.3-9.2 >> RColorBrewer_1.0-5 RCurl_1.91-1.1 >> [31] Rsamtools_1.8.5 rtracklayer_1.16.2 stats4_2.15.0 >> stringr_0.6 tools_2.15.0 >> [36] XML_3.9-4.1 xtable_1.7-0 zlibbioc_1.2.0 >> >> -- >> Sent via the guest posting facility at bioconductor.org. >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor > >
ADD REPLY
0
Entering edit mode
Hi Paolo, Did you still use "colnames(design) <- c('A','B','C','D')"? 2012/7/12 Paolo Kunderfranco <paolo.kunderfranco@gmail.com> > Thanks but there is still something that i miss, here is the code that > I used, I set up sampletype, 4 samples, in triplicates, respect to the > position if the dataMatrix with my normalized expression values: > > dataMatrix <- exprs(lumi.N.Q) > presentCount <- detectionCall(x.lumi) > selDataMatrix <- dataMatrix[presentCount > 0,] > probeList <- rownames(selDataMatrix) > sampleType <- c('A','A','A','B','B','B','C','D','D','C','C','D') > > then I created a design matrix. > > design <- model.matrix(~ factor(sampleType)) > colnames(design) <- c('A','B','C','D') > fit1 <- lmFit(selDataMatrix, design) > constrast.matrix <- makeContrasts (A-B,C-D,levels=design) > fit1_2 <- contrasts.fit(fit1,constrast.matrix) > fit1_2 <- eBayes(fit1_2) > topTable(fit1_2,coef=2, adjust="BH") > > and for instance I get top genes in coef=2 -> C-D > I tried to change sampletype excluding sample C like this: > sampleType <- c('A','A','A','B','B','B','x','D','D','x','x,'D') > run the same command above and I obtain some other topgenes, I would > rather expect to encounter some error because sample C is not anymore > defined, where am i wrong? > > > > > > 2012/7/12 Yao Chen <chenyao.bioinfor@gmail.com>: > > Hi Paolo, > > > > You just need to put all 4 groups in the design matrix. It's a clear > example > > in the Limma userguide.: > > > >> f <- factor(targets$Target, levels=c("RNA1","RNA2","RNA3")) > >> design <- model.matrix(~0+f) > >> colnames(design) <- c("RNA1","RNA2","RNA3") > > To make all pair-wise comparisons between the three groups one could > proceed > >> fit <- lmFit(eset, design) > >> contrast.matrix <- makeContrasts(RNA2-RNA1, RNA3-RNA2, RNA3-RNA1, > > + levels=design) > >> fit2 <- contrasts.fit(fit, contrast.matrix) > >> fit2 <- eBayes(fit2) > > A list of top genes for RNA2 versus RNA1 can be obtained from > >> topTable(fit2, coef=1, adjust="BH") > > > > Best, > > > > Jack > > > > 2012/7/12 Paolo [guest] <guest@bioconductor.org> > >> > >> > >> Hi all, > >> I have an illumina dataset, VST transformed and normalized. 4 samples > each > >> in triplicates, > >> this is the sampleType= > >> > >> sampleType <- c('A','A','A','B','B','B','C','D','D','C','C','D') > >> Now I would like to do perform each pairwise comparison, A vs B, A vs > C, A > >> vs D...etc.) > >> > >> but I am confused how to set the colnames(design) > >> > >> here i what i did > >> > >> if (require(limma)) { > >> sampleType <- > >> > c('TG_TAC','TG_TAC','TG_TAC','WT_TAC','WT_TAC','WT_TAC','WT_SHAM','T G_SHAM','TG_SHAM','WT_SHAM','WT_SHAM','TG_SHAM') > >> ## compare 'A' and 'B' > >> design <- model.matrix(~ factor(sampleType)) > >> colnames(design) <-c('A','B' ) > >> fit <- lmFit(selDataMatrix, design) > >> fit <- eBayes(fit) > >> ## Add gene symbols to gene properties > >> if (require(lumiMouseAll.db) & require(annotate)) { > >> geneSymbol <- getSYMBOL(probeList, 'lumiMouseAll.db') > >> geneName <- sapply(lookUp(probeList, 'lumiMouseAll.db', > >> 'GENENAME'), function(x) x[1]) > >> fit$genes <- data.frame(ID= probeList, > >> geneSymbol=geneSymbol, geneName=geneName, stringsAsFactors=FALSE) > >> } > >> # print the top 50 genes > >> print(topTable(fit, adjust='fdr', number=5)) > >> > >> ## get significant gene list with FDR adjusted p.values less than 0.01 > >> p.adj <- p.adjust(fit$p.value[,2]) > >> sigGene.adj <- probeList[ p.adj < 0.01] > >> ## without FDR adjustment > >> sigGene <- probeList[ fit$p.value[,2] < 0.01] > >> } > >> > >> > >> > >> how do I properly set up each pairwise comparison? > >> > >> thanks > >> paolo > >> > >> > >> > >> > >> > >> > >> -- output of sessionInfo(): > >> > >> R version 2.15.0 (2012-03-30) > >> Platform: i386-pc-mingw32/i386 (32-bit) > >> > >> locale: > >> [1] LC_COLLATE=Italian_Italy.1252 LC_CTYPE=Italian_Italy.1252 > >> LC_MONETARY=Italian_Italy.1252 LC_NUMERIC=C > >> [5] LC_TIME=Italian_Italy.1252 > >> > >> attached base packages: > >> [1] stats graphics grDevices utils datasets methods base > >> > >> other attached packages: > >> [1] annotate_1.34.1 lumiMouseAll.db_1.18.0 > >> org.Mm.eg.db_2.7.1 limma_3.12.1 > >> [5] lumiMouseIDMapping_1.10.0 RSQLite_0.11.1 DBI_0.2-5 > >> AnnotationDbi_1.18.1 > >> [9] lumi_2.8.0 nleqslv_1.9.3 methylumi_2.2.0 > >> ggplot2_0.9.1 > >> [13] reshape2_1.2.1 scales_0.2.1 Biobase_2.16.0 > >> BiocGenerics_0.2.0 > >> > >> loaded via a namespace (and not attached): > >> [1] affy_1.34.0 affyio_1.24.0 bigmemory_4.2.11 > >> BiocInstaller_1.4.7 Biostrings_2.24.1 > >> [6] bitops_1.0-4.1 BSgenome_1.24.0 colorspace_1.1-1 > >> dichromat_1.2-4 digest_0.5.2 > >> [11] DNAcopy_1.30.0 GenomicRanges_1.8.7 genoset_1.6.0 > >> grid_2.15.0 hdrcde_2.16 > >> [16] IRanges_1.14.4 KernSmooth_2.23-8 labeling_0.1 > >> lattice_0.20-6 MASS_7.3-19 > >> [21] Matrix_1.0-7 memoise_0.1 mgcv_1.7-18 > >> munsell_0.3 nlme_3.1-104 > >> [26] plyr_1.7.1 preprocessCore_1.18.0 proto_0.3-9.2 > >> RColorBrewer_1.0-5 RCurl_1.91-1.1 > >> [31] Rsamtools_1.8.5 rtracklayer_1.16.2 stats4_2.15.0 > >> stringr_0.6 tools_2.15.0 > >> [36] XML_3.9-4.1 xtable_1.7-0 zlibbioc_1.2.0 > >> > >> -- > >> Sent via the guest posting facility at bioconductor.org. > >> > >> _______________________________________________ > >> Bioconductor mailing list > >> Bioconductor@r-project.org > >> https://stat.ethz.ch/mailman/listinfo/bioconductor > >> Search the archives: > >> http://news.gmane.org/gmane.science.biology.informatics.conductor > > > > > [[alternative HTML version deleted]]
ADD REPLY

Login before adding your answer.

Traffic: 413 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6