SmearPlot edgeR - adding gene names to genes of interest
@christine-glaer-6291
Last seen 10.3 years ago
Dear all,

I have used edgeR for analyzing differential expression. I also created a smear plot visualizing DGE results (code see below). I'd now like to highlight some genes of interest (they are significantly diff. expressed) in this plot by labeling them; do you know if there's a way to do so?

Kind regards, and many thanks in advance,

Christine Gl??er

###### Code ########

counts <- as.matrix(read.table(infile, header = TRUE, sep = "\t", row.names = 1, as.is = TRUE))
cpms <- cpm(counts)
keep <- rowSums(cpms>1)>=4 ### at least 4 replicates
counts <- counts[keep,]

group <- factor(c(1,1,1,1,1,2,2,2,2,3,3,3,3))

d <- DGEList(counts=counts, group=group)
d <- calcNormFactors(d)

design <- model.matrix(~ 0+group)
colnames(design) <- cols

contrasts <- makeContrasts(con1 = CONTRAST1, con2 = CONTRAST2, levels=design)

y <- estimateGLMCommonDisp(d,design)
y <- estimateGLMTrendedDisp(y,design)
y <- estimateGLMTagwiseDisp(y,design)
fit <- glmFit(y,design)

for (i in colnames(contrasts))
{
  print (i)

  outfilename <- paste(paste(outdir, i, sep="/"))
  outfilename <- paste(paste(outfilename, ".txt", sep=""))
  outfilesmearplot <- paste(paste(outdir, i, sep="/"))
  outfilesmearplot <- paste(paste(outfilesmearplot, "_smearplot.pdf", sep=""))

  lrt <- glmLRT(fit, contrast=contrasts[,i])
  tt <- topTags(lrt, n=nrow(d), adjust.method="BH")
  rn <- rownames(tt$table)
  deg <- rn[tt$table$FDR < 0.05]

  pdf(outfilesmearplot)
  ###### SMEAR PLOT STARTS HERE #######
  plotSmear(d, de.tags=deg)
  abline(h = c(-1, 1), col = "dodgerblue")
  dev.off()
  write.table(tt$table, file=outfilename, sep="\t")
}
@aliaksei-holik-4992
Last seen 8.9 years ago
Spain/Barcelona/Centre for Genomic Regu…
Hi Christine,

There's probably a more elegant way, but that's how I do it.

You should be able to use R's 'text' function by taking the logCPM values of the genes you're interested in as the 'x' argument, the logFC values as the 'y' argument, and the gene names or IDs as the 'labels' argument.

Say top.x is your resulting "TopTags" object and it contains a data frame called 'table', which contains among others the following columns: Symbols - logFC - logCPM

You should be able to subset it to the genes you're interested in by:

ids <- c("...", "...")
gene.labels <- x$table[x$table$Symbols %in% ids,]

Where 'ids' is a vector of gene symbols for the genes of interest, but you can use Entrez Gene IDs or any other criterion you can think of, for instance all genes with logFC > 1 etc.

After you've plotted your smear plot from a DGELRT or DGEExact object, you can mark the genes of interest with the following:

text(x=gene.labels$logCPM,
     y=gene.labels$logFC,
     labels=gene.labels$Symbols, cex=0.7, pos=1)

Note that this will not work if you're plotting the smear plot from a DGEList, as your TopTags object will contain squeezed logCPM values different from those in the DGEList.

Hope it helps,

Aliaksei.
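The labeling approach described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the data frame `tab` is simulated and stands in for the 'table' component of a TopTags object, the gene symbols and values are made up, and a plain `plot` call stands in for `plotSmear` so that the sketch runs without edgeR installed.

```r
# Simulated stand-in for the 'table' of a TopTags object
# (hypothetical symbols and values, not real results).
set.seed(1)
n <- 200
tab <- data.frame(
  Symbols = paste0("gene", seq_len(n)),
  logFC   = rnorm(n, mean = 0, sd = 1.5),
  logCPM  = runif(n, min = 0, max = 12),
  stringsAsFactors = FALSE
)

# Genes of interest, selected by symbol.
ids <- c("gene7", "gene42", "gene150")
gene.labels <- tab[tab$Symbols %in% ids, ]

pdf(NULL)  # null device, so nothing is written to disk
# Stand-in for the smear plot: logCPM on the x axis, logFC on the y axis.
plot(tab$logCPM, tab$logFC, pch = 16, cex = 0.5, col = "grey40",
     xlab = "logCPM", ylab = "logFC")
abline(h = c(-1, 1), col = "dodgerblue")

# Label the selected genes just below their points (pos = 1).
text(x = gene.labels$logCPM, y = gene.labels$logFC,
     labels = gene.labels$Symbols, cex = 0.7, pos = 1)
invisible(dev.off())
```

With real data, the `plot` call would be replaced by the smear plot of the DGELRT or DGEExact object, and `tab` by the table of the matching TopTags object, so that the label coordinates come from the same logCPM and logFC values that are plotted.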
Dear Aliaksei,

thank you very much for your help, it worked out. As an example, I have adapted the example given in the documentation in plotSmear:

##################

y <- matrix(rnbinom(10000,mu=5,size=2),ncol=4)
d <- DGEList(counts=y, group=rep(1:2,each=2), lib.size=colSums(y))
rownames(d$counts) <- paste("tag",1:nrow(d$counts),sep=".")
d <- estimateCommonDisp(d)

# find differential expression
de <- exactTest(d)

# highlighting the top 500 most DE tags
de.tags <- rownames(topTags(de, n=500)$table)

plotSmear(d, de.tags=de.tags)

de$genes <- rownames(d$counts)

ids <- c("tag.2323", "tag.9")
gene.labels <- de$table[de$genes %in% ids,]
text(x=gene.labels$logCPM, y=gene.labels$logFC,
     labels=rownames(gene.labels), cex=0.7, pos=1)

###################

I'd also like to add a line pointing at these specific genes (or highlight them a bit better, e.g. with a specific color) - otherwise the plot is not very informative, since there are many dots in close neighborhood to my genes of interest. Do you also know how that could be done?

Thank you very much for your help, I greatly appreciate it!

Kind regards,

Christine Gl??er
Hi Christine,

I'm glad it worked. You should be able to print your gene labels in any colour you can think of by adding the 'col' argument to your 'text' call. You could also plot points of any colour on top of your smear plot, analogous to the way you use the 'text' function:

points(x=gene.labels$logCPM,
       y=gene.labels$logFC,
       cex=1, col="red")

Hope it helps,

Aliaksei.
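As a rough sketch of the colour and overplotting suggestions, together with one possible way to get the pointer lines asked about earlier in the thread: the example below uses base graphics `points`, `segments` and `text`. The simulated data frame `tab`, the gene symbols, and the offsets `dx` and `dy` are assumptions made for illustration, and a plain `plot` call again stands in for `plotSmear`.

```r
# Simulated stand-in for the table of logFC / logCPM values
# (hypothetical symbols and values, not real results).
set.seed(2)
n <- 200
tab <- data.frame(
  Symbols = paste0("gene", seq_len(n)),
  logFC   = rnorm(n, mean = 0, sd = 1.5),
  logCPM  = runif(n, min = 0, max = 12),
  stringsAsFactors = FALSE
)
ids <- c("gene7", "gene42")
gene.labels <- tab[tab$Symbols %in% ids, ]

# Hypothetical offsets (in axis units) that move each label away
# from the crowded region around its point.
dx <- 0.8
dy <- 1.0
label.x <- gene.labels$logCPM + dx
label.y <- gene.labels$logFC + dy

pdf(NULL)  # null device, so nothing is written to disk
plot(tab$logCPM, tab$logFC, pch = 16, cex = 0.5, col = "grey60",
     xlab = "logCPM", ylab = "logFC")

# Overplot the genes of interest in a distinct colour.
points(x = gene.labels$logCPM, y = gene.labels$logFC,
       pch = 16, cex = 1, col = "red")

# Draw a line from each highlighted point to its label position.
segments(x0 = gene.labels$logCPM, y0 = gene.labels$logFC,
         x1 = label.x, y1 = label.y, col = "red")

# pos = 3 places the text above the end of each line.
text(x = label.x, y = label.y, labels = gene.labels$Symbols,
     col = "red", cex = 0.7, pos = 3)
invisible(dev.off())
```

Fixed offsets are the simplest choice; for genes near the plot border the offsets may need a different sign or size so that the labels stay inside the plotting region.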
Hi Aliaksei,

thank you for your fast and kind reply. Thank you for helping me out - I didn't think of the "points" function or the 'col' argument... That's indeed what I've been looking for.

Kind regards,

Christine
Just spotted a typo in my first reply. The subsetting line should have said:

gene.labels <- top.x$table[top.x$table$Symbols %in% ids,]