limma and Volcano Plots
3
0
Entering edit mode
@saket-choudhary-6350
Last seen 11.2 years ago
I am working on a Proteomics microarray data using only the Red Channel, though there are both R and G channels. The objective is find DE genes in Grade2 samples of cancer as compared to Controls. I created a gist here :
library(limma)
Cy5 <- "F635 Median"
Cy5b <- "B635 Mean"
targets <- readTargets("Grade2_targets.csv")
f <- factor(targets$Condition, levels = unique(targets$Condition))
design <- model.matrix(~0 + f)
cont.matrix <- makeContrasts(Grade2vsControl=Grade2-Control, levels=design)
colnames(design) <- levels(f)
RG <- read.maimages( targets$FileName,
source="genepix.custom", green.only=TRUE,
columns=list(G=Cy5,Gb=Cy5b))
RG.bc=backgroundCorrect(RG, method="normexp", offset=15)
controls <- grep("CONTROL", RG.bc$genes[,"ID"])
other.genes <- grep("empty|Empty|CONTROL", invert=TRUE, RG.bc$genes[,"ID"])
RG.nba <- normalizeBetweenArrays(RG.bc, method="quantile")
RG.final <- RG.nba[other.genes, ]
rownames(RG.final) <- RG.final$genes$ID
RG.final <- RG.final[order(rownames(RG.final)), ]
corfit <- duplicateCorrelation(RG.final, design, ndups=2, spacing=1)
fit <- lmFit(RG.final, design, ndups=2, correlation=corfit$consensus)
fit2 <- contrasts.fit(fit, cont.matrix)
fit2 <- eBayes(fit2)
#topTable(fit2, adjust="BH")
Targets file:
We can make this file beautiful and searchable if this error is corrected: No commas found in this CSV file in line 0.
FileName Condition
Control_H-02_2000153953.gpr Control
Control_H-25_2000153952.gpr Control
Control_H-35_2000153943.gpr Control
Control_H-58_2000153935.gpr Control
Control_HC-25_2000153932.gpr Control
Grade II_CF 10729_2000154012.gpr Grade2
Grade II_CF 20468_2000153947.gpr Grade2
Grade II_CF 29538_2000154020.gpr Grade2
Grade II_CF 8551_2000153940.gpr Grade2
Grade II_CH 13717_2000144447.gpr Grade2
Grade II_CH 27095_2000154014.gpr Grade2
Grade II_CH 32225_2000154033.gpr Grade2
Grade II_CJ 16084_2000153931.gpr Grade2
Grade II_CJ 17734_2000154015.gpr Grade2
Grade II_CJ 33350_2000144452.gpr Grade2
Grade II_CJ 7132_2000154028.gpr Grade2
Grade II_CK 40_2000144445.gpr Grade2
Volcano Plot: http://share.pho.to/4e1QT I am a bit skeptical about the nature of my volcano plot, showing quite high log odds and skewed. Have I, in the process of playing around with the code, committed a mistake somewhere? Saket
Microarray Proteomics Cancer PROcess Microarray Proteomics Cancer PROcess • 2.6k views
ADD COMMENT
0
Entering edit mode
@saket-choudhary-6350
Last seen 11.2 years ago
Also, is there a more established way of dealing with Proteomics data using limma? On 28 January 2014 21:09, Saket Choudhary <saketkc at="" gmail.com=""> wrote: > I am working on a Proteomics microarray data using only the Red > Channel, though there are both R and G channels. The objective is find > DE genes in Grade2 samples of cancer as compared to Controls. > > I created a gist here :
library(limma)
Cy5 <- "F635 Median"
Cy5b <- "B635 Mean"
targets <- readTargets("Grade2_targets.csv")
f <- factor(targets$Condition, levels = unique(targets$Condition))
design <- model.matrix(~0 + f)
cont.matrix <- makeContrasts(Grade2vsControl=Grade2-Control, levels=design)
colnames(design) <- levels(f)
RG <- read.maimages( targets$FileName,
source="genepix.custom", green.only=TRUE,
columns=list(G=Cy5,Gb=Cy5b))
RG.bc=backgroundCorrect(RG, method="normexp", offset=15)
controls <- grep("CONTROL", RG.bc$genes[,"ID"])
other.genes <- grep("empty|Empty|CONTROL", invert=TRUE, RG.bc$genes[,"ID"])
RG.nba <- normalizeBetweenArrays(RG.bc, method="quantile")
RG.final <- RG.nba[other.genes, ]
rownames(RG.final) <- RG.final$genes$ID
RG.final <- RG.final[order(rownames(RG.final)), ]
corfit <- duplicateCorrelation(RG.final, design, ndups=2, spacing=1)
fit <- lmFit(RG.final, design, ndups=2, correlation=corfit$consensus)
fit2 <- contrasts.fit(fit, cont.matrix)
fit2 <- eBayes(fit2)
#topTable(fit2, adjust="BH")
> Targets file:
We can make this file beautiful and searchable if this error is corrected: No commas found in this CSV file in line 0.
FileName Condition
Control_H-02_2000153953.gpr Control
Control_H-25_2000153952.gpr Control
Control_H-35_2000153943.gpr Control
Control_H-58_2000153935.gpr Control
Control_HC-25_2000153932.gpr Control
Grade II_CF 10729_2000154012.gpr Grade2
Grade II_CF 20468_2000153947.gpr Grade2
Grade II_CF 29538_2000154020.gpr Grade2
Grade II_CF 8551_2000153940.gpr Grade2
Grade II_CH 13717_2000144447.gpr Grade2
Grade II_CH 27095_2000154014.gpr Grade2
Grade II_CH 32225_2000154033.gpr Grade2
Grade II_CJ 16084_2000153931.gpr Grade2
Grade II_CJ 17734_2000154015.gpr Grade2
Grade II_CJ 33350_2000144452.gpr Grade2
Grade II_CJ 7132_2000154028.gpr Grade2
Grade II_CK 40_2000144445.gpr Grade2
> Volcano Plot: http://share.pho.to/4e1QT > > > I am a bit skeptical about the nature of my volcano plot, showing > quite high log odds and skewed. Have I, in the process of playing > around with the code, committed a mistake somewhere? > > > > Saket
ADD COMMENT
0
Entering edit mode
@saket-choudhary-6350
Last seen 11.2 years ago
Any comments on this? Saket On 28 January 2014 21:09, Saket Choudhary <saketkc at="" gmail.com=""> wrote: > I am working on a Proteomics microarray data using only the Red > Channel, though there are both R and G channels. The objective is find > DE genes in Grade2 samples of cancer as compared to Controls. > > I created a gist here :
library(limma)
Cy5 <- "F635 Median"
Cy5b <- "B635 Mean"
targets <- readTargets("Grade2_targets.csv")
f <- factor(targets$Condition, levels = unique(targets$Condition))
design <- model.matrix(~0 + f)
cont.matrix <- makeContrasts(Grade2vsControl=Grade2-Control, levels=design)
colnames(design) <- levels(f)
RG <- read.maimages( targets$FileName,
source="genepix.custom", green.only=TRUE,
columns=list(G=Cy5,Gb=Cy5b))
RG.bc=backgroundCorrect(RG, method="normexp", offset=15)
controls <- grep("CONTROL", RG.bc$genes[,"ID"])
other.genes <- grep("empty|Empty|CONTROL", invert=TRUE, RG.bc$genes[,"ID"])
RG.nba <- normalizeBetweenArrays(RG.bc, method="quantile")
RG.final <- RG.nba[other.genes, ]
rownames(RG.final) <- RG.final$genes$ID
RG.final <- RG.final[order(rownames(RG.final)), ]
corfit <- duplicateCorrelation(RG.final, design, ndups=2, spacing=1)
fit <- lmFit(RG.final, design, ndups=2, correlation=corfit$consensus)
fit2 <- contrasts.fit(fit, cont.matrix)
fit2 <- eBayes(fit2)
#topTable(fit2, adjust="BH")
> Targets file:
We can make this file beautiful and searchable if this error is corrected: No commas found in this CSV file in line 0.
FileName Condition
Control_H-02_2000153953.gpr Control
Control_H-25_2000153952.gpr Control
Control_H-35_2000153943.gpr Control
Control_H-58_2000153935.gpr Control
Control_HC-25_2000153932.gpr Control
Grade II_CF 10729_2000154012.gpr Grade2
Grade II_CF 20468_2000153947.gpr Grade2
Grade II_CF 29538_2000154020.gpr Grade2
Grade II_CF 8551_2000153940.gpr Grade2
Grade II_CH 13717_2000144447.gpr Grade2
Grade II_CH 27095_2000154014.gpr Grade2
Grade II_CH 32225_2000154033.gpr Grade2
Grade II_CJ 16084_2000153931.gpr Grade2
Grade II_CJ 17734_2000154015.gpr Grade2
Grade II_CJ 33350_2000144452.gpr Grade2
Grade II_CJ 7132_2000154028.gpr Grade2
Grade II_CK 40_2000144445.gpr Grade2
> Volcano Plot: http://share.pho.to/4e1QT > > > I am a bit skeptical about the nature of my volcano plot, showing > quite high log odds and skewed. Have I, in the process of playing > around with the code, committed a mistake somewhere? > > > > Saket
ADD COMMENT
0
Entering edit mode
@gordon-smyth
Last seen 6 hours ago
WEHI, Melbourne, Australia
Dear Saket, You are right to be skeptical about the results, because the code is not testing the correct contrast. If you use the group-parametrization as in: design <- model.matrix(~0 + f) then you need to form a contrast between the two groups (Grade 2 vs Control) in order to test for differential expression. As it is, you are simply testing whether the mean for Grade 2 is equal to zero: topTable(ebayes, coef = 2, adjust = "BH", n = 100) and it is no surprise than everything is significant. Section 9.2 of the limma User's Guide explains two different ways you can form the design matrix. Either way is fine, but your code has combined a bit of one approach with a bit of the other. Best wishes Gordon of the > Date: Tue, 28 Jan 2014 21:09:32 +0530 > From: Saket Choudhary <saketkc at="" gmail.com=""> > To: bioconductor at r-project.org > Subject: [BioC] limma and Volcano Plots > Content-Type: text/plain; charset=ISO-8859-1 > > I am working on a Proteomics microarray data using only the Red > Channel, though there are both R and G channels. The objective is find > DE genes in Grade2 samples of cancer as compared to Controls. > > I created a gist here :
library(limma)
Cy5 <- "F635 Median"
Cy5b <- "B635 Mean"
targets <- readTargets("Grade2_targets.csv")
f <- factor(targets$Condition, levels = unique(targets$Condition))
design <- model.matrix(~0 + f)
cont.matrix <- makeContrasts(Grade2vsControl=Grade2-Control, levels=design)
colnames(design) <- levels(f)
RG <- read.maimages( targets$FileName,
source="genepix.custom", green.only=TRUE,
columns=list(G=Cy5,Gb=Cy5b))
RG.bc=backgroundCorrect(RG, method="normexp", offset=15)
controls <- grep("CONTROL", RG.bc$genes[,"ID"])
other.genes <- grep("empty|Empty|CONTROL", invert=TRUE, RG.bc$genes[,"ID"])
RG.nba <- normalizeBetweenArrays(RG.bc, method="quantile")
RG.final <- RG.nba[other.genes, ]
rownames(RG.final) <- RG.final$genes$ID
RG.final <- RG.final[order(rownames(RG.final)), ]
corfit <- duplicateCorrelation(RG.final, design, ndups=2, spacing=1)
fit <- lmFit(RG.final, design, ndups=2, correlation=corfit$consensus)
fit2 <- contrasts.fit(fit, cont.matrix)
fit2 <- eBayes(fit2)
#topTable(fit2, adjust="BH")
> Targets file:
We can make this file beautiful and searchable if this error is corrected: No commas found in this CSV file in line 0.
FileName Condition
Control_H-02_2000153953.gpr Control
Control_H-25_2000153952.gpr Control
Control_H-35_2000153943.gpr Control
Control_H-58_2000153935.gpr Control
Control_HC-25_2000153932.gpr Control
Grade II_CF 10729_2000154012.gpr Grade2
Grade II_CF 20468_2000153947.gpr Grade2
Grade II_CF 29538_2000154020.gpr Grade2
Grade II_CF 8551_2000153940.gpr Grade2
Grade II_CH 13717_2000144447.gpr Grade2
Grade II_CH 27095_2000154014.gpr Grade2
Grade II_CH 32225_2000154033.gpr Grade2
Grade II_CJ 16084_2000153931.gpr Grade2
Grade II_CJ 17734_2000154015.gpr Grade2
Grade II_CJ 33350_2000144452.gpr Grade2
Grade II_CJ 7132_2000154028.gpr Grade2
Grade II_CK 40_2000144445.gpr Grade2
> Volcano Plot: http://share.pho.to/4e1QT > > > I am a bit skeptical about the nature of my volcano plot, showing > quite high log odds and skewed. Have I, in the process of playing > around with the code, committed a mistake somewhere? > > > > Saket ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:4}}
ADD COMMENT

Login before adding your answer.

Traffic: 622 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6