Question: Bioconductor Digest, Vol 121, Issue 29

0

Matthew Thornton •

**330**wrote:
"bioconductor-request at r-project.org" <bioconductor-request at="" r-project.org=""> wrote:
Send Bioconductor mailing list submissions to
bioconductor at r-project.org
To subscribe or unsubscribe via the World Wide Web, visit
https://stat.ethz.ch/mailman/listinfo/bioconductor
or, via email, send a message with subject or body 'help' to
bioconductor-request at r-project.org
You can reach the person managing the list at
bioconductor-owner at r-project.org
When replying, please edit your Subject line so it is more specific
than "Re: Contents of Bioconductor digest..."
Today's Topics:
1. Heatmaps for Agilent Single Color 4x44k using limma and vsn.
(Cornish, Joseph (NIH/NIAID) [F])
2. Re: Heatmaps for Agilent Single Color 4x44k using limma and
vsn. (James W. MacDonald)
3. Re: Heatmaps for Agilent Single Color 4x44k using limma and
vsn. (Cornish, Joseph (NIH/NIAID) [F])
4. Re: Heatmaps for Agilent Single Color 4x44k using limma and
vsn. (Sean Davis)
5. Re: Heatmaps for Agilent Single Color 4x44k using limma and
vsn. (James W. MacDonald)
6. Re: Heatmaps for Agilent Single Color 4x44k using limma and
vsn. (Cornish, Joseph (NIH/NIAID) [F])
7. analysis of HuGene2.0-st array affymetrix -aroma package
(Andreia Fonseca)
8. Re: Heatmaps for Agilent Single Color 4x44k using limma and
vsn. (Sean Davis)
9. Interpreting DESeq2 results (Michael Muratet)
10. identifying drosophila miRNA targets (Fiona Ingleby)
11. Re: identifying drosophila miRNA targets (James W. MacDonald)
12. Re: Limit on number of sequence files for forging a BSgenome
(Blanchette, Marco)
13. Re: [R] Error in setMethod("combine"... was - Error when
installing globaltest package (Shields, Rusty (IMS))
14. Re: [R] Error in setMethod("combine"... was - Error when
installing globaltest package (Martin Morgan)
15. Re: [R] Error in setMethod("combine"... was - Error when
installing globaltest package (Shields, Rusty (IMS))
16. Re: Interpreting DESeq2 results (Michael Love)
17. Re: Interpreting DESeq2 results (somnath bandyopadhyay)
----------------------------------------------------------------------
Message: 1
Date: Thu, 28 Mar 2013 13:27:23 +0000
From: "Cornish, Joseph (NIH/NIAID) [F]" <joseph.cornish@nih.gov>
To: "bioconductor at r-project.org" <bioconductor at="" r-project.org="">
Subject: [BioC] Heatmaps for Agilent Single Color 4x44k using limma
and vsn.
Message-ID: <e68ea43fbf80174c85a92bc24efaa3b84b7bc3 at="" mlbxv01.nih.gov="">
Content-Type: text/plain
I am attempting to write a script that will allow me to do bulk
benchmarking of various methods for normalization and background
correction. I am concerned that I am not processing the data correctly
and this may be leading to the abnormal heatmaps that I am seeing. For
a test case, I am using an experiment where I have two different
conditions with two different states with two replicates per state.
The heatmaps concern me because of the uniformity between them, the
values are all completely green or completely red, there are
differences between genes within a state according to the heatmap.
In the examples I have seen people have suggested that I use the
coefficients from the eBayes fitting of the data, however in my
example the eBayes LMarray object does not have two columns in the
results, only on (not sure if this is a sign of errors on my part). I
have instead used the lmFit coefficients since there are column for
each matrix.
To further confuse someone who is new to R, the simplified example and
full code generate different heatmaps for the same data and same
methods (normexp for background, cyclic loess for norm). The
differences being in the simplified version the gene IDs show up on
the right side but in the full version, only row number appears. The
color for some genes changes between states as well.
My questions are:
Why does the bayes model only have one column?
Am I approaching the processing correctly?
Why are there differences between the resulting heatmaps of each
version?
I know that the idea is to keep the amount of posted code to a minimum
but I don't see any other way here.
Here is the simplified example I have made:
library(limma)
library(vsn)
library(gplots)
set.seed(0xabcd) #from vsn manual
#constants for reading agi outs from our imager
COL__ <- list(G="Raw intensity (tr.mean) {A}",Gb="Background
(tr.mean) {A}")
ANO__ <- c("Position", "Name", "ID", "Description", "GeneName")
#parse data
targets <- readTargets("test.csv", sep = ",")
gal <- readGAL("026806_D_20110524.gal")
layout <- getLayout(gal)
agi <- read.maimages(targets, columns = COL__,
annotation = ANO__,
green.only = TRUE)
design <- model.matrix(~0+factor(c(1,1,2,2)))
colnames(design) <- c("s1", "s2")
#process
bg <- backgroundCorrect(agi, method = "normexp")
norm <- normalizeBetweenArrays(bg, method = "cyclicloess")
avg <- avereps(norm, ID=norm$genes$ProbeName)
#fit
contmat <- makeContrasts(s2-s1, levels = design)
fit_lm <- lmFit(avg$E, design)
fit_be <- eBayes(contrasts.fit(fit_lm, contmat))
#results
difexp <- topTable(fit_be, coef = 1, adjust = "BH")
res <- decideTests(fit_be)
comp <- vennDiagram(res)
#plot
pdf("example-heatmap.pdf")
genes <- as.numeric(rownames(topTable(fit_be, n = 100)))
heatmap(fit_lm$coefficients[genes,], col=redgreen(75), key = TRUE)
dev.off()
The full version:
library(vsn)
library(limma)
library(gplots)
set.seed(0xabcd) #from the vsn manual
#constants
A_SUB__ <- 1
A_NXP__ <- 2
A_MIN__ <- 3
A_VSN__ <- 1
A_QNT__ <- 2
A_LWS__ <- 3
M_SUB__ <- "subtract"
M_NXP__ <- "normexp"
M_MIN__ <- "minimum"
M_VSN__ <- "vsn"
M_LWS__ <- "cyclicloess"
M_QNT__ <- "quantile"
COL__ <- list(G="Raw intensity (tr.mean) {A}",Gb="Background
(tr.mean) {A}")
ANO__ <- c("Position", "Name", "ID", "Description", "GeneName")
N_BGM__ <- 3 #number of background correction methods
N_NRM__ <- 3 #number of normalization methods
A_BGM__ <- c(A_SUB__, A_NXP__, A_MIN__)
A_NRM__ <- c(A_VSN__, A_QNT__, A_LWS__)
#utility methods
#applies subtraction background correction with limma
bg_subtract <- function(x){
return(list(norm = x$norm,
bg = M_SUB__,
agi = backgroundCorrect(x$agi, method = M_SUB__)))
}
#applies normexp backgroud correction with limma
bg_normexp <- function(x){
return(list(norm = x$norm,
bg = M_NXP__,
agi = backgroundCorrect(x$agi, method = M_NXP__)))
}
#applies minimum background correction with limma
bg_minimum <- function(x){
return(list(norm = x$norm,
bg = M_MIN__,
agi = backgroundCorrect(x$agi, method = M_MIN__)))
}
#applies VSN normalization with VSN package
nb_vsn <- function(x){
return(list(norm = M_VSN__,
bg = x$bg,
agi = normalizeVSN(x$agi)))
}
#applies VSN normalization with VSN package
nb_vsn <- function(x){
return(list(norm = M_VSN__,
bg = x$bg,
agi = normalizeVSN(x$agi)))
}
#applies cyclic lowess normalization with limma
nb_lowess <- function(x){
return(list(norm = M_LWS__,
bg = x$bg,
agi = normalizeBetweenArrays(x$agi, method =
M_LWS__)))
}
#applies quantile normalization with limma
nb_quantile <- function(x){
return(list(norm = M_QNT__,
bg = x$bg,
agi = normalizeBetweenArrays(x$agi, method =
M_QNT__)))
}
#generates a plot of MA values with limma
pl_ma <- function(x){
name <- paste(c(x[[1]]$bg, x[[1]]$norm, "-ma.pdf"), collapse="")
pdf(name)
plotMA(x[[1]]$agi$E)
dev.off()
}
#generates a plot of mean versus stdev with VS
pl_meansd <- function(x){
name <- paste(c(x[[1]]$bg, x[[1]]$norm, "-meansd.pdf"),
collapse="")
pdf(name)
meanSdPlot(x[[1]]$agi$E)
dev.off()
}
#generate volcanoplot of log-fold versus log-odds
pl_volcano <- function(x){
name <- paste(c(x[[1]]$bg, x[[1]]$norm, "-volcano.pdf"),
collapse="")
pdf(name)
volcanoplot(x[[1]]$bayes)
dev.off()
}
#generate plot of expression data over time
pl_lines <- function(x){
name <- paste(c(x[[1]]$bg, x[[1]]$norm, "-lines.pdf"),
collapse="")
pdf(name)
plotlines(x[[1]]$agi$E)
dev.off()
}
#generate heatmat of final results
pl_heatmap <- function(x, ntop){
name <- paste(c(x[[1]]$bg, x[[1]]$norm, "-heatmap.pdf"),
collapse="")
pdf(name)
genes <- as.numeric(rownames(topTable(x[[1]]$bayes, n = ntop)))
heatmap(x[[1]]$lm$coefficients[genes,],
col = redgreen(75),
key = TRUE)
dev.off()
}
#averages replicates of probes
m_repavg <- function(x){
return(list(agi = x[[1]]$agi,
avg = avereps(x[[1]]$agi,
ID=x[[1]]$agi$genes$ProbeName),
bg = x[[1]]$bg,
norm = x[[1]]$norm))
}
#generate contrasts
m_fit <- function(x, design){
contmat <- makeContrasts(s2-s1,
levels = design)
fit1 <- lmFit(x[[1]]$agi$E, design)
fit2 <- eBayes(contrasts.fit(fit1, contmat))
diff <- topTable(fit2, coef = 1, adjust = "BH")
res <- decideTests(fit2)
comp <- vennDiagram(res)
return(list(agi = x[[1]]$agi,
bg = x[[1]]$bg,
norm = x[[1]]$norm,
ave = x[[1]]$ave,
cont = contmat,
lm = fit1,
bayes = fit2,
diff = diff,
res = res,
comp = comp))
}
######################################################################
#########
# Function to parse array data into limma objects
# Args:
# tfile: target file (see limma documentation)
# tsep: limma defaults TSV, have to set manually for CSV and so
on
# galfile: GAL file for printer/layout
# Returns data object with targets, genes and printer layout
######################################################################
#########
parse <- function(tfile, tsep, galfile){
targets <- readTargets(tfile, sep=tsep)
gal <- readGAL(galfile)
layout <- getLayout(gal)
agi <- read.maimages(targets, columns = COL__,
annotation = ANO__,
green.only = TRUE)
bg <- "none"
norm <- "none"
desg <- model.matrix(~0+factor(c(1,1,2,2)))
colnames(desg) <- c("s1","s2")
d <- list(agi = agi,
gal = gal,
layout = layout,
targets = targets,
bg = bg,
desg = desg,
norm = norm)
processed <- matrix(NA, nrow = N_BGM__, ncol = N_NRM__)
processed <- apply(processed, 1:2, function(x){return(d)})
return(processed)
}
#parse command args
args <- commandArgs(trailingOnly = TRUE)
#load files into matrix
a <- parse(args[1], args[2], args[3])
#keepy a copy of a raw dataset
raw <- a[1,1]
print(raw[[1]]$desg)
#perform backgroung corrections
a[,A_SUB__] <- lapply(a[,A_SUB__], bg_subtract)
a[,A_NXP__] <- lapply(a[,A_NXP__], bg_normexp)
a[,A_MIN__] <- lapply(a[,A_MIN__], bg_minimum)
#perform normalization
a[A_VSN__,] <- lapply(a[A_VSN__,], nb_vsn)
a[A_LWS__,] <- lapply(a[A_LWS__,], nb_lowess)
a[A_QNT__,] <- lapply(a[A_QNT__,], nb_quantile)
#fit model and differentiate expression
a <- apply(a, 1:2, m_repavg)
a <- apply(a, 1:2, m_fit, design = raw[[1]]$desg)
#generate plots
apply(a, 1:2, pl_ma)
apply(a, 1:2, pl_meansd)
apply(a, 1:2, pl_volcano)
apply(a, 1:2, pl_lines)
apply(a, 1:2, pl_heatmap, args[4])
[[alternative HTML version deleted]]
------------------------------
Message: 2
Date: Thu, 28 Mar 2013 10:07:32 -0400
From: "James W. MacDonald" <jmacdon@uw.edu>
To: "Cornish, Joseph (NIH/NIAID) [F]" <joseph.cornish at="" nih.gov="">
Cc: "bioconductor at r-project.org" <bioconductor at="" r-project.org="">
Subject: Re: [BioC] Heatmaps for Agilent Single Color 4x44k using
limma and vsn.
Message-ID: <51544EA4.5080805 at uw.edu>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Hi Joseph,
On 3/28/2013 9:27 AM, Cornish, Joseph (NIH/NIAID) [F] wrote:
> I am attempting to write a script that will allow me to do bulk
benchmarking of various methods for normalization and background
correction. I am concerned that I am not processing the data correctly
and this may be leading to the abnormal heatmaps that I am seeing. For
a test case, I am using an experiment where I have two different
conditions with two different states with two replicates per state.
>
> The heatmaps concern me because of the uniformity between them, the
values are all completely green or completely red, there are
differences between genes within a state according to the heatmap.
>
> In the examples I have seen people have suggested that I use the
coefficients from the eBayes fitting of the data, however in my
example the eBayes LMarray object does not have two columns in the
results, only on (not sure if this is a sign of errors on my part). I
have instead used the lmFit coefficients since there are column for
each matrix.
>
> To further confuse someone who is new to R, the simplified example
and full code generate different heatmaps for the same data and same
methods (normexp for background, cyclic loess for norm). The
differences being in the simplified version the gene IDs show up on
the right side but in the full version, only row number appears. The
color for some genes changes between states as well.
>
> My questions are:
> Why does the bayes model only have one column?
Because you are computing one contrast (which is all you can do). One
contrast means one test which means one column of results. In other
words, the coefficient you are estimating in the contrasts.fit() step
is
the difference in means between the two groups, which is one value per
gene.
> Am I approaching the processing correctly?
I tried to read through that business below, but got some weird
affliction due to an apparent overdose of underscores and had to stop.
> Why are there differences between the resulting heatmaps of each
version?
I don't know, as I worried my brain might explode if I read any more.
But you should note that it is fairly unusual to plot coefficients in
a
heatmap, as what you are then presenting are log ratios, and most
people
can't understand that.
Instead you might consider getting the list of genes, and then
creating
a heatmap using the normalized expression values for each sample
instead. You then would want to scale in some way. We often sweep the
mean of the controls out of the matrix (by row), so the colors can
then
be interpreted as differences between the treated samples and
controls.
Best,
Jim
>
> I know that the idea is to keep the amount of posted code to a
minimum but I don't see any other way here.
>
> Here is the simplified example I have made:
> library(limma)
> library(vsn)
> library(gplots)
>
> set.seed(0xabcd) #from vsn manual
>
> #constants for reading agi outs from our imager
> COL__<- list(G="Raw intensity (tr.mean) {A}",Gb="Background
(tr.mean) {A}")
> ANO__<- c("Position", "Name", "ID", "Description", "GeneName")
>
> #parse data
> targets<- readTargets("test.csv", sep = ",")
> gal<- readGAL("026806_D_20110524.gal")
> layout<- getLayout(gal)
> agi<- read.maimages(targets, columns = COL__,
> annotation = ANO__,
> green.only = TRUE)
> design<- model.matrix(~0+factor(c(1,1,2,2)))
> colnames(design)<- c("s1", "s2")
>
> #process
> bg<- backgroundCorrect(agi, method = "normexp")
> norm<- normalizeBetweenArrays(bg, method = "cyclicloess")
> avg<- avereps(norm, ID=norm$genes$ProbeName)
>
> #fit
> contmat<- makeContrasts(s2-s1, levels = design)
> fit_lm<- lmFit(avg$E, design)
> fit_be<- eBayes(contrasts.fit(fit_lm, contmat))
>
> #results
> difexp<- topTable(fit_be, coef = 1, adjust = "BH")
> res<- decideTests(fit_be)
> comp<- vennDiagram(res)
>
> #plot
> pdf("example-heatmap.pdf")
> genes<- as.numeric(rownames(topTable(fit_be, n = 100)))
> heatmap(fit_lm$coefficients[genes,], col=redgreen(75), key = TRUE)
> dev.off()
>
>
> The full version:
> library(vsn)
> library(limma)
> library(gplots)
> set.seed(0xabcd) #from the vsn manual
>
> #constants
> A_SUB__<- 1
> A_NXP__<- 2
> A_MIN__<- 3
> A_VSN__<- 1
> A_QNT__<- 2
> A_LWS__<- 3
> M_SUB__<- "subtract"
> M_NXP__<- "normexp"
> M_MIN__<- "minimum"
> M_VSN__<- "vsn"
> M_LWS__<- "cyclicloess"
> M_QNT__<- "quantile"
> COL__<- list(G="Raw intensity (tr.mean) {A}",Gb="Background
(tr.mean) {A}")
> ANO__<- c("Position", "Name", "ID", "Description", "GeneName")
> N_BGM__<- 3 #number of background correction methods
> N_NRM__<- 3 #number of normalization methods
> A_BGM__<- c(A_SUB__, A_NXP__, A_MIN__)
> A_NRM__<- c(A_VSN__, A_QNT__, A_LWS__)
>
>
> #utility methods
> #applies subtraction background correction with limma
> bg_subtract<- function(x){
> return(list(norm = x$norm,
> bg = M_SUB__,
> agi = backgroundCorrect(x$agi, method = M_SUB__)))
> }
>
> #applies normexp backgroud correction with limma
> bg_normexp<- function(x){
> return(list(norm = x$norm,
> bg = M_NXP__,
> agi = backgroundCorrect(x$agi, method = M_NXP__)))
> }
>
> #applies minimum background correction with limma
> bg_minimum<- function(x){
> return(list(norm = x$norm,
> bg = M_MIN__,
> agi = backgroundCorrect(x$agi, method = M_MIN__)))
> }
>
> #applies VSN normalization with VSN package
> nb_vsn<- function(x){
> return(list(norm = M_VSN__,
> bg = x$bg,
> agi = normalizeVSN(x$agi)))
> }
>
> #applies VSN normalization with VSN package
> nb_vsn<- function(x){
> return(list(norm = M_VSN__,
> bg = x$bg,
> agi = normalizeVSN(x$agi)))
> }
>
> #applies cyclic lowess normalization with limma
> nb_lowess<- function(x){
> return(list(norm = M_LWS__,
> bg = x$bg,
> agi = normalizeBetweenArrays(x$agi, method =
M_LWS__)))
> }
>
> #applies quantile normalization with limma
> nb_quantile<- function(x){
> return(list(norm = M_QNT__,
> bg = x$bg,
> agi = normalizeBetweenArrays(x$agi, method =
M_QNT__)))
> }
>
> #generates a plot of MA values with limma
> pl_ma<- function(x){
> name<- paste(c(x[[1]]$bg, x[[1]]$norm, "-ma.pdf"), collapse="")
> pdf(name)
> plotMA(x[[1]]$agi$E)
> dev.off()
> }
>
> #generates a plot of mean versus stdev with VS
> pl_meansd<- function(x){
> name<- paste(c(x[[1]]$bg, x[[1]]$norm, "-meansd.pdf"),
collapse="")
> pdf(name)
> meanSdPlot(x[[1]]$agi$E)
> dev.off()
> }
>
> #generate volcanoplot of log-fold versus log-odds
> pl_volcano<- function(x){
> name<- paste(c(x[[1]]$bg, x[[1]]$norm, "-volcano.pdf"),
collapse="")
> pdf(name)
> volcanoplot(x[[1]]$bayes)
> dev.off()
> }
>
> #generate plot of expression data over time
> pl_lines<- function(x){
> name<- paste(c(x[[1]]$bg, x[[1]]$norm, "-lines.pdf"),
collapse="")
> pdf(name)
> plotlines(x[[1]]$agi$E)
> dev.off()
> }
>
> #generate heatmat of final results
> pl_heatmap<- function(x, ntop){
> name<- paste(c(x[[1]]$bg, x[[1]]$norm, "-heatmap.pdf"),
collapse="")
> pdf(name)
> genes<- as.numeric(rownames(topTable(x[[1]]$bayes, n = ntop)))
> heatmap(x[[1]]$lm$coefficients[genes,],
> col = redgreen(75),
> key = TRUE)
> dev.off()
> }
>
> #averages replicates of probes
> m_repavg<- function(x){
> return(list(agi = x[[1]]$agi,
> avg = avereps(x[[1]]$agi,
ID=x[[1]]$agi$genes$ProbeName),
> bg = x[[1]]$bg,
> norm = x[[1]]$norm))
> }
>
> #generate contrasts
> m_fit<- function(x, design){
> contmat<- makeContrasts(s2-s1,
> levels = design)
> fit1<- lmFit(x[[1]]$agi$E, design)
> fit2<- eBayes(contrasts.fit(fit1, contmat))
> diff<- topTable(fit2, coef = 1, adjust = "BH")
> res<- decideTests(fit2)
> comp<- vennDiagram(res)
> return(list(agi = x[[1]]$agi,
> bg = x[[1]]$bg,
> norm = x[[1]]$norm,
> ave = x[[1]]$ave,
> cont = contmat,
> lm = fit1,
> bayes = fit2,
> diff = diff,
> res = res,
> comp = comp))
> }
>
> ####################################################################
###########
> # Function to parse array data into limma objects
> # Args:
> # tfile: target file (see limma documentation)
> # tsep: limma defaults TSV, have to set manually for CSV and
so on
> # galfile: GAL file for printer/layout
> # Returns data object with targets, genes and printer layout
> ####################################################################
###########
> parse<- function(tfile, tsep, galfile){
> targets<- readTargets(tfile, sep=tsep)
> gal<- readGAL(galfile)
> layout<- getLayout(gal)
> agi<- read.maimages(targets, columns = COL__,
> annotation = ANO__,
> green.only = TRUE)
> bg<- "none"
> norm<- "none"
> desg<- model.matrix(~0+factor(c(1,1,2,2)))
> colnames(desg)<- c("s1","s2")
> d<- list(agi = agi,
> gal = gal,
> layout = layout,
> targets = targets,
> bg = bg,
> desg = desg,
> norm = norm)
>
> processed<- matrix(NA, nrow = N_BGM__, ncol = N_NRM__)
> processed<- apply(processed, 1:2, function(x){return(d)})
> return(processed)
> }
>
> #parse command args
> args<- commandArgs(trailingOnly = TRUE)
>
> #load files into matrix
> a<- parse(args[1], args[2], args[3])
>
> #keepy a copy of a raw dataset
> raw<- a[1,1]
> print(raw[[1]]$desg)
>
> #perform backgroung corrections
> a[,A_SUB__]<- lapply(a[,A_SUB__], bg_subtract)
> a[,A_NXP__]<- lapply(a[,A_NXP__], bg_normexp)
> a[,A_MIN__]<- lapply(a[,A_MIN__], bg_minimum)
>
> #perform normalization
> a[A_VSN__,]<- lapply(a[A_VSN__,], nb_vsn)
> a[A_LWS__,]<- lapply(a[A_LWS__,], nb_lowess)
> a[A_QNT__,]<- lapply(a[A_QNT__,], nb_quantile)
>
> #fit model and differentiate expression
> a<- apply(a, 1:2, m_repavg)
> a<- apply(a, 1:2, m_fit, design = raw[[1]]$desg)
>
> #generate plots
> apply(a, 1:2, pl_ma)
> apply(a, 1:2, pl_meansd)
> apply(a, 1:2, pl_volcano)
> apply(a, 1:2, pl_lines)
> apply(a, 1:2, pl_heatmap, args[4])
>
>
> [[alternative HTML version deleted]]
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
http://news.gmane.org/gmane.science.biology.informatics.conductor
--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
------------------------------
Message: 3
Date: Thu, 28 Mar 2013 14:22:32 +0000
From: "Cornish, Joseph (NIH/NIAID) [F]" <joseph.cornish@nih.gov>
To: "James W. MacDonald" <jmacdon at="" uw.edu="">
Cc: "bioconductor at r-project.org" <bioconductor at="" r-project.org="">
Subject: Re: [BioC] Heatmaps for Agilent Single Color 4x44k using
limma and vsn.
Message-ID: <e68ea43fbf80174c85a92bc24efaa3b84b7bdf at="" mlbxv01.nih.gov="">
Content-Type: text/plain; charset="us-ascii"
Hello James,
Are you referring to function names or constants? The constants are an
old habit I picked up long ago in my CS classes, there are a ton but
it makes them stand out for me in vim more readily. I've come to avoid
camel case since switching to linux, it makes searching for things a
nightmare, it can also help if I group functions with a prefix (eg
plot functions start with "pl_"). If you want I can find+replace all
of the constants with their values...
The differences came from using the unaveraged replicates in the
larger script versus the average in the example.
I'm not sure what you mean by "mean of the controls", would this
simply be filtering out the control spots?
I'll generate the heatmaps using the normalized expression values. I
suppose in that case it would be worthwhile to filter ahead of time by
p-value and fold change (the coef in topTable).
_________thanks_____________
_______Joe______
-----Original Message-----
From: James W. MacDonald [mailto:jmacdon@uw.edu]
Sent: Thursday, March 28, 2013 10:08 AM
To: Cornish, Joseph (NIH/NIAID) [F]
Cc: bioconductor at r-project.org
Subject: Re: [BioC] Heatmaps for Agilent Single Color 4x44k using
limma and vsn.
Hi Joseph,
On 3/28/2013 9:27 AM, Cornish, Joseph (NIH/NIAID) [F] wrote:
> I am attempting to write a script that will allow me to do bulk
benchmarking of various methods for normalization and background
correction. I am concerned that I am not processing the data correctly
and this may be leading to the abnormal heatmaps that I am seeing. For
a test case, I am using an experiment where I have two different
conditions with two different states with two replicates per state.
>
> The heatmaps concern me because of the uniformity between them, the
values are all completely green or completely red, there are
differences between genes within a state according to the heatmap.
>
> In the examples I have seen people have suggested that I use the
coefficients from the eBayes fitting of the data, however in my
example the eBayes LMarray object does not have two columns in the
results, only on (not sure if this is a sign of errors on my part). I
have instead used the lmFit coefficients since there are column for
each matrix.
>
> To further confuse someone who is new to R, the simplified example
and full code generate different heatmaps for the same data and same
methods (normexp for background, cyclic loess for norm). The
differences being in the simplified version the gene IDs show up on
the right side but in the full version, only row number appears. The
color for some genes changes between states as well.
>
> My questions are:
> Why does the bayes model only have one column?
Because you are computing one contrast (which is all you can do). One
contrast means one test which means one column of results. In other
words, the coefficient you are estimating in the contrasts.fit() step
is the difference in means between the two groups, which is one value
per gene.
> Am I approaching the processing correctly?
I tried to read through that business below, but got some weird
affliction due to an apparent overdose of underscores and had to stop.
> Why are there differences between the resulting heatmaps of each
version?
I don't know, as I worried my brain might explode if I read any more.
But you should note that it is fairly unusual to plot coefficients in
a heatmap, as what you are then presenting are log ratios, and most
people can't understand that.
Instead you might consider getting the list of genes, and then
creating a heatmap using the normalized expression values for each
sample instead. You then would want to scale in some way. We often
sweep the mean of the controls out of the matrix (by row), so the
colors can then be interpreted as differences between the treated
samples and controls.
Best,
Jim
>
> I know that the idea is to keep the amount of posted code to a
minimum but I don't see any other way here.
>
> Here is the simplified example I have made:
> library(limma)
> library(vsn)
> library(gplots)
>
> set.seed(0xabcd) #from vsn manual
>
> #constants for reading agi outs from our imager
> COL__<- list(G="Raw intensity (tr.mean) {A}",Gb="Background
(tr.mean)
> {A}")
> ANO__<- c("Position", "Name", "ID", "Description", "GeneName")
>
> #parse data
> targets<- readTargets("test.csv", sep = ",")
> gal<- readGAL("026806_D_20110524.gal")
> layout<- getLayout(gal)
> agi<- read.maimages(targets, columns = COL__,
> annotation = ANO__,
> green.only = TRUE)
> design<- model.matrix(~0+factor(c(1,1,2,2)))
> colnames(design)<- c("s1", "s2")
>
> #process
> bg<- backgroundCorrect(agi, method = "normexp")
> norm<- normalizeBetweenArrays(bg, method = "cyclicloess")
> avg<- avereps(norm, ID=norm$genes$ProbeName)
>
> #fit
> contmat<- makeContrasts(s2-s1, levels = design)
> fit_lm<- lmFit(avg$E, design)
> fit_be<- eBayes(contrasts.fit(fit_lm, contmat))
>
> #results
> difexp<- topTable(fit_be, coef = 1, adjust = "BH")
> res<- decideTests(fit_be)
> comp<- vennDiagram(res)
>
> #plot
> pdf("example-heatmap.pdf")
> genes<- as.numeric(rownames(topTable(fit_be, n = 100)))
> heatmap(fit_lm$coefficients[genes,], col=redgreen(75), key = TRUE)
> dev.off()
>
>
> The full version:
> library(vsn)
> library(limma)
> library(gplots)
> set.seed(0xabcd) #from the vsn manual
>
> #constants
> A_SUB__<- 1
> A_NXP__<- 2
> A_MIN__<- 3
> A_VSN__<- 1
> A_QNT__<- 2
> A_LWS__<- 3
> M_SUB__<- "subtract"
> M_NXP__<- "normexp"
> M_MIN__<- "minimum"
> M_VSN__<- "vsn"
> M_LWS__<- "cyclicloess"
> M_QNT__<- "quantile"
> COL__<- list(G="Raw intensity (tr.mean) {A}",Gb="Background
(tr.mean)
> {A}")
> ANO__<- c("Position", "Name", "ID", "Description", "GeneName")
> N_BGM__<- 3 #number of background correction methods
> N_NRM__<- 3 #number of normalization methods
> A_BGM__<- c(A_SUB__, A_NXP__, A_MIN__)
> A_NRM__<- c(A_VSN__, A_QNT__, A_LWS__)
>
>
> #utility methods
> #applies subtraction background correction with limma
> bg_subtract<- function(x){
> return(list(norm = x$norm,
> bg = M_SUB__,
> agi = backgroundCorrect(x$agi, method = M_SUB__)))
}
>
> #applies normexp backgroud correction with limma
> bg_normexp<- function(x){
> return(list(norm = x$norm,
> bg = M_NXP__,
> agi = backgroundCorrect(x$agi, method = M_NXP__)))
}
>
> #applies minimum background correction with limma
> bg_minimum<- function(x){
> return(list(norm = x$norm,
> bg = M_MIN__,
> agi = backgroundCorrect(x$agi, method = M_MIN__)))
}
>
> #applies VSN normalization with VSN package
> nb_vsn<- function(x){
> return(list(norm = M_VSN__,
> bg = x$bg,
> agi = normalizeVSN(x$agi))) }
>
> #applies VSN normalization with VSN package
> nb_vsn<- function(x){
> return(list(norm = M_VSN__,
> bg = x$bg,
> agi = normalizeVSN(x$agi))) }
>
> #applies cyclic lowess normalization with limma
> nb_lowess<- function(x){
> return(list(norm = M_LWS__,
> bg = x$bg,
> agi = normalizeBetweenArrays(x$agi, method =
> M_LWS__))) }
>
> #applies quantile normalization with limma
> nb_quantile<- function(x){
> return(list(norm = M_QNT__,
> bg = x$bg,
> agi = normalizeBetweenArrays(x$agi, method =
> M_QNT__))) }
>
> #generates a plot of MA values with limma
> pl_ma<- function(x){
> name<- paste(c(x[[1]]$bg, x[[1]]$norm, "-ma.pdf"), collapse="")
> pdf(name)
> plotMA(x[[1]]$agi$E)
> dev.off()
> }
>
> #generates a plot of mean versus stdev with VS
> pl_meansd<- function(x){
> name<- paste(c(x[[1]]$bg, x[[1]]$norm, "-meansd.pdf"),
collapse="")
> pdf(name)
> meanSdPlot(x[[1]]$agi$E)
> dev.off()
> }
>
> #generate volcanoplot of log-fold versus log-odds
> pl_volcano<- function(x){
> name<- paste(c(x[[1]]$bg, x[[1]]$norm, "-volcano.pdf"),
collapse="")
> pdf(name)
> volcanoplot(x[[1]]$bayes)
> dev.off()
> }
>
> #generate plot of expression data over time
> pl_lines<- function(x){
> name<- paste(c(x[[1]]$bg, x[[1]]$norm, "-lines.pdf"),
collapse="")
> pdf(name)
> plotlines(x[[1]]$agi$E)
> dev.off()
> }
>
> #generate heatmat of final results
> pl_heatmap<- function(x, ntop){
> name<- paste(c(x[[1]]$bg, x[[1]]$norm, "-heatmap.pdf"),
collapse="")
> pdf(name)
> genes<- as.numeric(rownames(topTable(x[[1]]$bayes, n = ntop)))
> heatmap(x[[1]]$lm$coefficients[genes,],
> col = redgreen(75),
> key = TRUE)
> dev.off()
> }
>
> #averages replicates of probes
> m_repavg<- function(x){
> return(list(agi = x[[1]]$agi,
> avg = avereps(x[[1]]$agi,
ID=x[[1]]$agi$genes$ProbeName),
> bg = x[[1]]$bg,
> norm = x[[1]]$norm))
> }
>
> #generate contrasts
> m_fit<- function(x, design){
> contmat<- makeContrasts(s2-s1,
> levels = design)
> fit1<- lmFit(x[[1]]$agi$E, design)
> fit2<- eBayes(contrasts.fit(fit1, contmat))
> diff<- topTable(fit2, coef = 1, adjust = "BH")
> res<- decideTests(fit2)
> comp<- vennDiagram(res)
> return(list(agi = x[[1]]$agi,
> bg = x[[1]]$bg,
> norm = x[[1]]$norm,
> ave = x[[1]]$ave,
> cont = contmat,
> lm = fit1,
> bayes = fit2,
> diff = diff,
> res = res,
> comp = comp))
> }
>
>
######################################################################
> ######### # Function to parse array data into limma objects # Args:
> # tfile: target file (see limma documentation)
> # tsep: limma defaults TSV, have to set manually for CSV and
so on
> # galfile: GAL file for printer/layout
> # Returns data object with targets, genes and printer layout
>
######################################################################
> #########
> parse<- function(tfile, tsep, galfile){
> targets<- readTargets(tfile, sep=tsep)
> gal<- readGAL(galfile)
> layout<- getLayout(gal)
> agi<- read.maimages(targets, columns = COL__,
> annotation = ANO__,
> green.only = TRUE)
> bg<- "none"
> norm<- "none"
> desg<- model.matrix(~0+factor(c(1,1,2,2)))
> colnames(desg)<- c("s1","s2")
> d<- list(agi = agi,
> gal = gal,
> layout = layout,
> targets = targets,
> bg = bg,
> desg = desg,
> norm = norm)
>
> processed<- matrix(NA, nrow = N_BGM__, ncol = N_NRM__)
> processed<- apply(processed, 1:2, function(x){return(d)})
> return(processed)
> }
>
> #parse command args
> args<- commandArgs(trailingOnly = TRUE)
>
> #load files into matrix
> a<- parse(args[1], args[2], args[3])
>
> #keepy a copy of a raw dataset
> raw<- a[1,1]
> print(raw[[1]]$desg)
>
> #perform backgroung corrections
> a[,A_SUB__]<- lapply(a[,A_SUB__], bg_subtract)
> a[,A_NXP__]<- lapply(a[,A_NXP__], bg_normexp)
> a[,A_MIN__]<- lapply(a[,A_MIN__], bg_minimum)
>
> #perform normalization
> a[A_VSN__,]<- lapply(a[A_VSN__,], nb_vsn)
> a[A_LWS__,]<- lapply(a[A_LWS__,], nb_lowess)
> a[A_QNT__,]<- lapply(a[A_QNT__,], nb_quantile)
>
> #fit model and differentiate expression
> a<- apply(a, 1:2, m_repavg)
> a<- apply(a, 1:2, m_fit, design = raw[[1]]$desg)
>
> #generate plots
> apply(a, 1:2, pl_ma)
> apply(a, 1:2, pl_meansd)
> apply(a, 1:2, pl_volcano)
> apply(a, 1:2, pl_lines)
> apply(a, 1:2, pl_heatmap, args[4])
>
>
> [[alternative HTML version deleted]]
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
------------------------------
Message: 4
Date: Thu, 28 Mar 2013 10:33:51 -0400
From: Sean Davis <sdavis2@mail.nih.gov>
To: "Cornish, Joseph (NIH/NIAID) [F]" <joseph.cornish at="" nih.gov="">
Cc: "bioconductor at r-project.org" <bioconductor at="" r-project.org="">,
"James
W. MacDonald" <jmacdon at="" uw.edu="">
Subject: Re: [BioC] Heatmaps for Agilent Single Color 4x44k using
limma and vsn.
Message-ID:
<caneavbkyv+mb61jvne0aklicxovxyvelusbhgowwr61_q9s-dw at="" mail.gmail.com="">
Content-Type: text/plain
On Thu, Mar 28, 2013 at 10:22 AM, Cornish, Joseph (NIH/NIAID) [F] <
joseph.cornish at nih.gov> wrote:
> Hello James,
>
> Are you referring to function names or constants? The constants are
an old
> habit I picked up long ago in my CS classes, there are a ton but it
makes
> them stand out for me in vim more readily. I've come to avoid camel
case
> since switching to linux, it makes searching for things a nightmare,
it can
> also help if I group functions with a prefix (eg plot functions
start with
> "pl_"). If you want I can find+replace all of the constants with
their
> values...
>
>
Hi, Joe.
I think I could help to translate part of what James said. If you
could
boil your example down to a few lines of reproducible code that show
the
problem, that would be simpler to interpret and easier to comment on
directly. Sometimes in the process of producing such a simplified
example,
I have found my own misunderstandings. It also sometimes helps to get
someone local to give a second pair of eyes.
> The differences came from using the unaveraged replicates in the
larger
> script versus the average in the example.
>
> I'm not sure what you mean by "mean of the controls", would this
simply be
> filtering out the control spots?
>
>
No. I think he means for you to subtract the mean of the control
samples
(or one group of samples) for each gene from all the normalized
expression
values.
> I'll generate the heatmaps using the normalized expression values. I
> suppose in that case it would be worthwhile to filter ahead of time
by
> p-value and fold change (the coef in topTable).
>
>
Yes.
Sean
>
> _________thanks_____________
> _______Joe______
>
> -----Original Message-----
> From: James W. MacDonald [mailto:jmacdon at uw.edu]
> Sent: Thursday, March 28, 2013 10:08 AM
> To: Cornish, Joseph (NIH/NIAID) [F]
> Cc: bioconductor at r-project.org
> Subject: Re: [BioC] Heatmaps for Agilent Single Color 4x44k using
limma
> and vsn.
>
> Hi Joseph,
>
>
> On 3/28/2013 9:27 AM, Cornish, Joseph (NIH/NIAID) [F] wrote:
> > I am attempting to write a script that will allow me to do bulk
> benchmarking of various methods for normalization and background
> correction. I am concerned that I am not processing the data
correctly and
> this may be leading to the abnormal heatmaps that I am seeing. For a
test
> case, I am using an experiment where I have two different conditions
with
> two different states with two replicates per state.
> >
> > The heatmaps concern me because of the uniformity between them,
the
> values are all completely green or completely red, there are
differences
> between genes within a state according to the heatmap.
> >
> > In the examples I have seen people have suggested that I use the
> coefficients from the eBayes fitting of the data, however in my
example the
> eBayes LMarray object does not have two columns in the results, only
on
> (not sure if this is a sign of errors on my part). I have instead
used the
> lmFit coefficients since there are column for each matrix.
> >
> > To further confuse someone who is new to R, the simplified example
and
> full code generate different heatmaps for the same data and same
methods
> (normexp for background, cyclic loess for norm). The differences
being in
> the simplified version the gene IDs show up on the right side but in
the
> full version, only row number appears. The color for some genes
changes
> between states as well.
> >
> > My questions are:
> > Why does the bayes model only have one column?
>
> Because you are computing one contrast (which is all you can do).
One
> contrast means one test which means one column of results. In other
words,
> the coefficient you are estimating in the contrasts.fit() step is
the
> difference in means between the two groups, which is one value per
gene.
>
> > Am I approaching the processing correctly?
>
> I tried to read through that business below, but got some weird
affliction
> due to an apparent overdose of underscores and had to stop.
>
> > Why are there differences between the resulting heatmaps of each
version?
>
> I don't know, as I worried my brain might explode if I read any
more.
> But you should note that it is fairly unusual to plot coefficients
in a
> heatmap, as what you are then presenting are log ratios, and most
people
> can't understand that.
>
> Instead you might consider getting the list of genes, and then
creating a
> heatmap using the normalized expression values for each sample
instead. You
> then would want to scale in some way. We often sweep the mean of the
> controls out of the matrix (by row), so the colors can then be
interpreted
> as differences between the treated samples and controls.
>
> Best,
>
> Jim
>
>
> >
> > I know that the idea is to keep the amount of posted code to a
minimum
> but I don't see any other way here.
> >
> > Here is the simplified example I have made:
> > library(limma)
> > library(vsn)
> > library(gplots)
> >
> > set.seed(0xabcd) #from vsn manual
> >
> > #constants for reading agi outs from our imager
> > COL__<- list(G="Raw intensity (tr.mean) {A}",Gb="Background
(tr.mean)
> > {A}")
> > ANO__<- c("Position", "Name", "ID", "Description", "GeneName")
> >
> > #parse data
> > targets<- readTargets("test.csv", sep = ",")
> > gal<- readGAL("026806_D_20110524.gal")
> > layout<- getLayout(gal)
> > agi<- read.maimages(targets, columns = COL__,
> > annotation = ANO__,
> > green.only = TRUE)
> > design<- model.matrix(~0+factor(c(1,1,2,2)))
> > colnames(design)<- c("s1", "s2")
> >
> > #process
> > bg<- backgroundCorrect(agi, method = "normexp")
> > norm<- normalizeBetweenArrays(bg, method = "cyclicloess")
> > avg<- avereps(norm, ID=norm$genes$ProbeName)
> >
> > #fit
> > contmat<- makeContrasts(s2-s1, levels = design)
> > fit_lm<- lmFit(avg$E, design)
> > fit_be<- eBayes(contrasts.fit(fit_lm, contmat))
> >
> > #results
> > difexp<- topTable(fit_be, coef = 1, adjust = "BH")
> > res<- decideTests(fit_be)
> > comp<- vennDiagram(res)
> >
> > #plot
> > pdf("example-heatmap.pdf")
> > genes<- as.numeric(rownames(topTable(fit_be, n = 100)))
> > heatmap(fit_lm$coefficients[genes,], col=redgreen(75), key = TRUE)
> > dev.off()
> >
> >
> > The full version:
> > library(vsn)
> > library(limma)
> > library(gplots)
> > set.seed(0xabcd) #from the vsn manual
> >
> > #constants
> > A_SUB__<- 1
> > A_NXP__<- 2
> > A_MIN__<- 3
> > A_VSN__<- 1
> > A_QNT__<- 2
> > A_LWS__<- 3
> > M_SUB__<- "subtract"
> > M_NXP__<- "normexp"
> > M_MIN__<- "minimum"
> > M_VSN__<- "vsn"
> > M_LWS__<- "cyclicloess"
> > M_QNT__<- "quantile"
> > COL__<- list(G="Raw intensity (tr.mean) {A}",Gb="Background
(tr.mean)
> > {A}")
> > ANO__<- c("Position", "Name", "ID", "Description", "GeneName")
> > N_BGM__<- 3 #number of background correction methods
> > N_NRM__<- 3 #number of normalization methods
> > A_BGM__<- c(A_SUB__, A_NXP__, A_MIN__)
> > A_NRM__<- c(A_VSN__, A_QNT__, A_LWS__)
> >
> >
> > #utility methods
> > #applies subtraction background correction with limma
> > bg_subtract<- function(x){
> > return(list(norm = x$norm,
> > bg = M_SUB__,
> > agi = backgroundCorrect(x$agi, method =
M_SUB__))) }
> >
> > #applies normexp backgroud correction with limma
> > bg_normexp<- function(x){
> > return(list(norm = x$norm,
> > bg = M_NXP__,
> > agi = backgroundCorrect(x$agi, method =
M_NXP__))) }
> >
> > #applies minimum background correction with limma
> > bg_minimum<- function(x){
> > return(list(norm = x$norm,
> > bg = M_MIN__,
> > agi = backgroundCorrect(x$agi, method =
M_MIN__))) }
> >
> > #applies VSN normalization with VSN package
> > nb_vsn<- function(x){
> > return(list(norm = M_VSN__,
> > bg = x$bg,
> > agi = normalizeVSN(x$agi))) }
> >
> > #applies VSN normalization with VSN package
> > nb_vsn<- function(x){
> > return(list(norm = M_VSN__,
> > bg = x$bg,
> > agi = normalizeVSN(x$agi))) }
> >
> > #applies cyclic lowess normalization with limma
> > nb_lowess<- function(x){
> > return(list(norm = M_LWS__,
> > bg = x$bg,
> > agi = normalizeBetweenArrays(x$agi, method =
> > M_LWS__))) }
> >
> > #applies quantile normalization with limma
> > nb_quantile<- function(x){
> > return(list(norm = M_QNT__,
> > bg = x$bg,
> > agi = normalizeBetweenArrays(x$agi, method =
> > M_QNT__))) }
> >
> > #generates a plot of MA values with limma
> > pl_ma<- function(x){
> > name<- paste(c(x[[1]]$bg, x[[1]]$norm, "-ma.pdf"),
collapse="")
> > pdf(name)
> > plotMA(x[[1]]$agi$E)
> > dev.off()
> > }
> >
> > #generates a plot of mean versus stdev with VS
> > pl_meansd<- function(x){
> > name<- paste(c(x[[1]]$bg, x[[1]]$norm, "-meansd.pdf"),
collapse="")
> > pdf(name)
> > meanSdPlot(x[[1]]$agi$E)
> > dev.off()
> > }
> >
> > #generate volcanoplot of log-fold versus log-odds
> > pl_volcano<- function(x){
> > name<- paste(c(x[[1]]$bg, x[[1]]$norm, "-volcano.pdf"),
collapse="")
> > pdf(name)
> > volcanoplot(x[[1]]$bayes)
> > dev.off()
> > }
> >
> > #generate plot of expression data over time
> > pl_lines<- function(x){
> > name<- paste(c(x[[1]]$bg, x[[1]]$norm, "-lines.pdf"),
collapse="")
> > pdf(name)
> > plotlines(x[[1]]$agi$E)
> > dev.off()
> > }
> >
> > #generate heatmat of final results
> > pl_heatmap<- function(x, ntop){
> > name<- paste(c(x[[1]]$bg, x[[1]]$norm, "-heatmap.pdf"),
collapse="")
> > pdf(name)
> > genes<- as.numeric(rownames(topTable(x[[1]]$bayes, n =
ntop)))
> > heatmap(x[[1]]$lm$coefficients[genes,],
> > col = redgreen(75),
> > key = TRUE)
> > dev.off()
> > }
> >
> > #averages replicates of probes
> > m_repavg<- function(x){
> > return(list(agi = x[[1]]$agi,
> > avg = avereps(x[[1]]$agi,
> ID=x[[1]]$agi$genes$ProbeName),
> > bg = x[[1]]$bg,
> > norm = x[[1]]$norm))
> > }
> >
> > #generate contrasts
> > m_fit<- function(x, design){
> > contmat<- makeContrasts(s2-s1,
> > levels = design)
> > fit1<- lmFit(x[[1]]$agi$E, design)
> > fit2<- eBayes(contrasts.fit(fit1, contmat))
> > diff<- topTable(fit2, coef = 1, adjust = "BH")
> > res<- decideTests(fit2)
> > comp<- vennDiagram(res)
> > return(list(agi = x[[1]]$agi,
> > bg = x[[1]]$bg,
> > norm = x[[1]]$norm,
> > ave = x[[1]]$ave,
> > cont = contmat,
> > lm = fit1,
> > bayes = fit2,
> > diff = diff,
> > res = res,
> > comp = comp))
> > }
> >
> >
######################################################################
> > ######### # Function to parse array data into limma objects #
Args:
> > # tfile: target file (see limma documentation)
> > # tsep: limma defaults TSV, have to set manually for CSV
and so on
> > # galfile: GAL file for printer/layout
> > # Returns data object with targets, genes and printer layout
> >
######################################################################
> > #########
> > parse<- function(tfile, tsep, galfile){
> > targets<- readTargets(tfile, sep=tsep)
> > gal<- readGAL(galfile)
> > layout<- getLayout(gal)
> > agi<- read.maimages(targets, columns = COL__,
> > annotation = ANO__,
> > green.only = TRUE)
> > bg<- "none"
> > norm<- "none"
> > desg<- model.matrix(~0+factor(c(1,1,2,2)))
> > colnames(desg)<- c("s1","s2")
> > d<- list(agi = agi,
> > gal = gal,
> > layout = layout,
> > targets = targets,
> > bg = bg,
> > desg = desg,
> > norm = norm)
> >
> > processed<- matrix(NA, nrow = N_BGM__, ncol = N_NRM__)
> > processed<- apply(processed, 1:2, function(x){return(d)})
> > return(processed)
> > }
> >
> > #parse command args
> > args<- commandArgs(trailingOnly = TRUE)
> >
> > #load files into matrix
> > a<- parse(args[1], args[2], args[3])
> >
> > #keepy a copy of a raw dataset
> > raw<- a[1,1]
> > print(raw[[1]]$desg)
> >
> > #perform backgroung corrections
> > a[,A_SUB__]<- lapply(a[,A_SUB__], bg_subtract)
> > a[,A_NXP__]<- lapply(a[,A_NXP__], bg_normexp)
> > a[,A_MIN__]<- lapply(a[,A_MIN__], bg_minimum)
> >
> > #perform normalization
> > a[A_VSN__,]<- lapply(a[A_VSN__,], nb_vsn)
> > a[A_LWS__,]<- lapply(a[A_LWS__,], nb_lowess)
> > a[A_QNT__,]<- lapply(a[A_QNT__,], nb_quantile)
> >
> > #fit model and differentiate expression
> > a<- apply(a, 1:2, m_repavg)
> > a<- apply(a, 1:2, m_fit, design = raw[[1]]$desg)
> >
> > #generate plots
> > apply(a, 1:2, pl_ma)
> > apply(a, 1:2, pl_meansd)
> > apply(a, 1:2, pl_volcano)
> > apply(a, 1:2, pl_lines)
> > apply(a, 1:2, pl_heatmap, args[4])
> >
> >
> > [[alternative HTML version deleted]]
> >
> > _______________________________________________
> > Bioconductor mailing list
> > Bioconductor at r-project.org
> > https://stat.ethz.ch/mailman/listinfo/bioconductor
> > Search the archives:
> > http://news.gmane.org/gmane.science.biology.informatics.conductor
>
> --
> James W. MacDonald, M.S.
> Biostatistician
> University of Washington
> Environmental and Occupational Health Sciences
> 4225 Roosevelt Way NE, # 100
> Seattle WA 98105-6099
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
[[alternative HTML version deleted]]
------------------------------
Message: 5
Date: Thu, 28 Mar 2013 10:45:21 -0400
From: "James W. MacDonald" <jmacdon@uw.edu>
To: "Cornish, Joseph (NIH/NIAID) [F]" <joseph.cornish at="" nih.gov="">
Cc: "bioconductor at r-project.org" <bioconductor at="" r-project.org="">
Subject: Re: [BioC] Heatmaps for Agilent Single Color 4x44k using
limma and vsn.
Message-ID: <51545781.7000100 at uw.edu>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Hi Joe,
I don't mean to be critical. If you like to use gobtons of
underscores,
then rock on. I would just find it very distracting.
Anyway, by 'mean of the controls', I mean the average value of the
control samples. But this could be generalized in any way. For
example,
your contrast was s2 - s1, so you could do
mat <- avg$E[<index of="" interesting="" genes="">,]
mn <- fit_lm[<index of="" interesting="" genes="">,"s1"]
swept <- sweep(mat, 1, mn, "-")
and then do the heatmap. In this case the colors would be interpreted
as
the log fold difference between any sample and the mean of the s1
samples.
But the general take home message here is that you can't just use the
raw expression values, as you will then have a solid color to your
heatmap. Other possibilities are to use the scale argument to
heatmap.2() to scale by row (convert to z-scores), in which case the
interpretation of the colors is less obvious.
Best,
Jim
On 3/28/2013 10:22 AM, Cornish, Joseph (NIH/NIAID) [F] wrote:
> Hello James,
>
> Are you referring to function names or constants? The constants are
an old habit I picked up long ago in my CS classes, there are a ton
but it makes them stand out for me in vim more readily. I've come to
avoid camel case since switching to linux, it makes searching for
things a nightmare, it can also help if I group functions with a
prefix (eg plot functions start with "pl_"). If you want I can
find+replace all of the constants with their values...
>
> The differences came from using the unaveraged replicates in the
larger script versus the average in the example.
>
> I'm not sure what you mean by "mean of the controls", would this
simply be filtering out the control spots?
>
> I'll generate the heatmaps using the normalized expression values. I
suppose in that case it would be worthwhile to filter ahead of time by
p-value and fold change (the coef in topTable).
>
>
> _________thanks_____________
> _______Joe______
>
> -----Original Message-----
> From: James W. MacDonald [mailto:jmacdon at uw.edu]
> Sent: Thursday, March 28, 2013 10:08 AM
> To: Cornish, Joseph (NIH/NIAID) [F]
> Cc: bioconductor at r-project.org
> Subject: Re: [BioC] Heatmaps for Agilent Single Color 4x44k using
limma and vsn.
>
> Hi Joseph,
>
>
> On 3/28/2013 9:27 AM, Cornish, Joseph (NIH/NIAID) [F] wrote:
>> I am attempting to write a script that will allow me to do bulk
benchmarking of various methods for normalization and background
correction. I am concerned that I am not processing the data correctly
and this may be leading to the abnormal heatmaps that I am seeing. For
a test case, I am using an experiment where I have two different
conditions with two different states with two replicates per state.
>>
>> The heatmaps concern me because of the uniformity between them, the
values are all completely green or completely red, there are
differences between genes within a state according to the heatmap.
>>
>> In the examples I have seen people have suggested that I use the
coefficients from the eBayes fitting of the data, however in my
example the eBayes LMarray object does not have two columns in the
results, only on (not sure if this is a sign of errors on my part). I
have instead used the lmFit coefficients since there are column for
each matrix.
>>
>> To further confuse someone who is new to R, the simplified example
and full code generate different heatmaps for the same data and same
methods (normexp for background, cyclic loess for norm). The
differences being in the simplified version the gene IDs show up on
the right side but in the full version, only row number appears. The
color for some genes changes between states as well.
>>
>> My questions are:
>> Why does the bayes model only have one column?
> Because you are computing one contrast (which is all you can do).
One contrast means one test which means one column of results. In
other words, the coefficient you are estimating in the contrasts.fit()
step is the difference in means between the two groups, which is one
value per gene.
>
>> Am I approaching the processing correctly?
> I tried to read through that business below, but got some weird
affliction due to an apparent overdose of underscores and had to stop.
>
>> Why are there differences between the resulting heatmaps of each
version?
> I don't know, as I worried my brain might explode if I read any
more.
> But you should note that it is fairly unusual to plot coefficients
in a heatmap, as what you are then presenting are log ratios, and most
people can't understand that.
>
> Instead you might consider getting the list of genes, and then
creating a heatmap using the normalized expression values for each
sample instead. You then would want to scale in some way. We often
sweep the mean of the controls out of the matrix (by row), so the
colors can then be interpreted as differences between the treated
samples and controls.
>
> Best,
>
> Jim
>
>
>> I know that the idea is to keep the amount of posted code to a
minimum but I don't see any other way here.
>>
>> Here is the simplified example I have made:
>> library(limma)
>> library(vsn)
>> library(gplots)
>>
>> set.seed(0xabcd) #from vsn manual
>>
>> #constants for reading agi outs from our imager
>> COL__<- list(G="Raw intensity (tr.mean) {A}",Gb="Background
(tr.mean)
>> {A}")
>> ANO__<- c("Position", "Name", "ID", "Description", "GeneName")
>>
>> #parse data
>> targets<- readTargets("test.csv", sep = ",")
>> gal<- readGAL("026806_D_20110524.gal")
>> layout<- getLayout(gal)
>> agi<- read.maimages(targets, columns = COL__,
>> annotation = ANO__,
>> green.only = TRUE)
>> design<- model.matrix(~0+factor(c(1,1,2,2)))
>> colnames(design)<- c("s1", "s2")
>>
>> #process
>> bg<- backgroundCorrect(agi, method = "normexp")
>> norm<- normalizeBetweenArrays(bg, method = "cyclicloess")
>> avg<- avereps(norm, ID=norm$genes$ProbeName)
>>
>> #fit
>> contmat<- makeContrasts(s2-s1, levels = design)
>> fit_lm<- lmFit(avg$E, design)
>> fit_be<- eBayes(contrasts.fit(fit_lm, contmat))
>>
>> #results
>> difexp<- topTable(fit_be, coef = 1, adjust = "BH")
>> res<- decideTests(fit_be)
>> comp<- vennDiagram(res)
>>
>> #plot
>> pdf("example-heatmap.pdf")
>> genes<- as.numeric(rownames(topTable(fit_be, n = 100)))
>> heatmap(fit_lm$coefficients[genes,], col=redgreen(75), key = TRUE)
>> dev.off()
>>
>>
>> The full version:
>> library(vsn)
>> library(limma)
>> library(gplots)
>> set.seed(0xabcd) #from the vsn manual
>>
>> #constants
>> A_SUB__<- 1
>> A_NXP__<- 2
>> A_MIN__<- 3
>> A_VSN__<- 1
>> A_QNT__<- 2
>> A_LWS__<- 3
>> M_SUB__<- "subtract"
>> M_NXP__<- "normexp"
>> M_MIN__<- "minimum"
>> M_VSN__<- "vsn"
>> M_LWS__<- "cyclicloess"
>> M_QNT__<- "quantile"
>> COL__<- list(G="Raw intensity (tr.mean) {A}",Gb="Background
(tr.mean)
>> {A}")
>> ANO__<- c("Position", "Name", "ID", "Description", "GeneName")
>> N_BGM__<- 3 #number of background correction methods
>> N_NRM__<- 3 #number of normalization methods
>> A_BGM__<- c(A_SUB__, A_NXP__, A_MIN__)
>> A_NRM__<- c(A_VSN__, A_QNT__, A_LWS__)
>>
>>
>> #utility methods
>> #applies subtraction background correction with limma
>> bg_subtract<- function(x){
>> return(list(norm = x$norm,
>> bg = M_SUB__,
>> agi = backgroundCorrect(x$agi, method =
M_SUB__))) }
>>
>> #applies normexp backgroud correction with limma
>> bg_normexp<- function(x){
>> return(list(norm = x$norm,
>> bg = M_NXP__,
>> agi = backgroundCorrect(x$agi, method =
M_NXP__))) }
>>
>> #applies minimum background correction with limma
>> bg_minimum<- function(x){
>> return(list(norm = x$norm,
>> bg = M_MIN__,
>> agi = backgroundCorrect(x$agi, method =
M_MIN__))) }
>>
>> #applies VSN normalization with VSN package
>> nb_vsn<- function(x){
>> return(list(norm = M_VSN__,
>> bg = x$bg,
>> agi = normalizeVSN(x$agi))) }
>>
>> #applies VSN normalization with VSN package
>> nb_vsn<- function(x){
>> return(list(norm = M_VSN__,
>> bg = x$bg,
>> agi = normalizeVSN(x$agi))) }
>>
>> #applies cyclic lowess normalization with limma
>> nb_lowess<- function(x){
>> return(list(norm = M_LWS__,
>> bg = x$bg,
>> agi = normalizeBetweenArrays(x$agi, method =
>> M_LWS__))) }
>>
>> #applies quantile normalization with limma
>> nb_quantile<- function(x){
>> return(list(norm = M_QNT__,
>> bg = x$bg,
>> agi = normalizeBetweenArrays(x$agi, method =
>> M_QNT__))) }
>>
>> #generates a plot of MA values with limma
>> pl_ma<- function(x){
>> name<- paste(c(x[[1]]$bg, x[[1]]$norm, "-ma.pdf"),
collapse="")
>> pdf(name)
>> plotMA(x[[1]]$agi$E)
>> dev.off()
>> }
>>
>> #generates a plot of mean versus stdev with VS
>> pl_meansd<- function(x){
>> name<- paste(c(x[[1]]$bg, x[[1]]$norm, "-meansd.pdf"),
collapse="")
>> pdf(name)
>> meanSdPlot(x[[1]]$agi$E)
>> dev.off()
>> }
>>
>> #generate volcanoplot of log-fold versus log-odds
>> pl_volcano<- function(x){
>> name<- paste(c(x[[1]]$bg, x[[1]]$norm, "-volcano.pdf"),
collapse="")
>> pdf(name)
>> volcanoplot(x[[1]]$bayes)
>> dev.off()
>> }
>>
>> #generate plot of expression data over time
>> pl_lines<- function(x){
>> name<- paste(c(x[[1]]$bg, x[[1]]$norm, "-lines.pdf"),
collapse="")
>> pdf(name)
>> plotlines(x[[1]]$agi$E)
>> dev.off()
>> }
>>
>> #generate heatmat of final results
>> pl_heatmap<- function(x, ntop){
>> name<- paste(c(x[[1]]$bg, x[[1]]$norm, "-heatmap.pdf"),
collapse="")
>> pdf(name)
>> genes<- as.numeric(rownames(topTable(x[[1]]$bayes, n =
ntop)))
>> heatmap(x[[1]]$lm$coefficients[genes,],
>> col = redgreen(75),
>> key = TRUE)
>> dev.off()
>> }
>>
>> #averages replicates of probes
>> m_repavg<- function(x){
>> return(list(agi = x[[1]]$agi,
>> avg = avereps(x[[1]]$agi,
ID=x[[1]]$agi$genes$ProbeName),
>> bg = x[[1]]$bg,
>> norm = x[[1]]$norm))
>> }
>>
>> #generate contrasts
>> m_fit<- function(x, design){
>> contmat<- makeContrasts(s2-s1,
>> levels = design)
>> fit1<- lmFit(x[[1]]$agi$E, design)
>> fit2<- eBayes(contrasts.fit(fit1, contmat))
>> diff<- topTable(fit2, coef = 1, adjust = "BH")
>> res<- decideTests(fit2)
>> comp<- vennDiagram(res)
>> return(list(agi = x[[1]]$agi,
>> bg = x[[1]]$bg,
>> norm = x[[1]]$norm,
>> ave = x[[1]]$ave,
>> cont = contmat,
>> lm = fit1,
>> bayes = fit2,
>> diff = diff,
>> res = res,
>> comp = comp))
>> }
>>
>>
######################################################################
>> ######### # Function to parse array data into limma objects # Args:
>> # tfile: target file (see limma documentation)
>> # tsep: limma defaults TSV, have to set manually for CSV and
so on
>> # galfile: GAL file for printer/layout
>> # Returns data object with targets, genes and printer layout
>>
######################################################################
>> #########
>> parse<- function(tfile, tsep, galfile){
>> targets<- readTargets(tfile, sep=tsep)
>> gal<- readGAL(galfile)
>> layout<- getLayout(gal)
>> agi<- read.maimages(targets, columns = COL__,
>> annotation = ANO__,
>> green.only = TRUE)
>> bg<- "none"
>> norm<- "none"
>> desg<- model.matrix(~0+factor(c(1,1,2,2)))
>> colnames(desg)<- c("s1","s2")
>> d<- list(agi = agi,
>> gal = gal,
>> layout = layout,
>> targets = targets,
>> bg = bg,
>> desg = desg,
>> norm = norm)
>>
>> processed<- matrix(NA, nrow = N_BGM__, ncol = N_NRM__)
>> processed<- apply(processed, 1:2, function(x){return(d)})
>> return(processed)
>> }
>>
>> #parse command args
>> args<- commandArgs(trailingOnly = TRUE)
>>
>> #load files into matrix
>> a<- parse(args[1], args[2], args[3])
>>
>> #keepy a copy of a raw dataset
>> raw<- a[1,1]
>> print(raw[[1]]$desg)
>>
>> #perform backgroung corrections
>> a[,A_SUB__]<- lapply(a[,A_SUB__], bg_subtract)
>> a[,A_NXP__]<- lapply(a[,A_NXP__], bg_normexp)
>> a[,A_MIN__]<- lapply(a[,A_MIN__], bg_minimum)
>>
>> #perform normalization
>> a[A_VSN__,]<- lapply(a[A_VSN__,], nb_vsn)
>> a[A_LWS__,]<- lapply(a[A_LWS__,], nb_lowess)
>> a[A_QNT__,]<- lapply(a[A_QNT__,], nb_quantile)
>>
>> #fit model and differentiate expression
>> a<- apply(a, 1:2, m_repavg)
>> a<- apply(a, 1:2, m_fit, design = raw[[1]]$desg)
>>
>> #generate plots
>> apply(a, 1:2, pl_ma)
>> apply(a, 1:2, pl_meansd)
>> apply(a, 1:2, pl_volcano)
>> apply(a, 1:2, pl_lines)
>> apply(a, 1:2, pl_heatmap, args[4])
>>
>>
>> [[alternative HTML version deleted]]
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
> --
> James W. MacDonald, M.S.
> Biostatistician
> University of Washington
> Environmental and Occupational Health Sciences
> 4225 Roosevelt Way NE, # 100
> Seattle WA 98105-6099
>
--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
------------------------------
Message: 6
Date: Thu, 28 Mar 2013 15:00:49 +0000
From: "Cornish, Joseph (NIH/NIAID) [F]" <joseph.cornish@nih.gov>
To: "Davis, Sean (NIH/NCI) [E]" <sdavis2 at="" mail.nih.gov="">
Cc: "bioconductor at r-project.org" <bioconductor at="" r-project.org="">,
"James
W. MacDonald" <jmacdon at="" uw.edu="">
Subject: Re: [BioC] Heatmaps for Agilent Single Color 4x44k using
limma and vsn.
Message-ID: <e68ea43fbf80174c85a92bc24efaa3b84b7c02 at="" mlbxv01.nih.gov="">
Content-Type: text/plain
Hello Davis,
I posted a simple example, which after fixing a problem in the
longer example no produces the same results. I only posted the longer
one because I wasn???t able to reproduce the results of the longer
version initially. I???ve been sort of banging my head on this for a
while so I missed the difference between ???agi??? and ???avg???.
I???m thankful anyone bothered to look after posting the code, so
thanks. If I sounded offended by the underscores, I was only admitting
that I know they???re terrible. c:
Thanks for clarifying the subtraction of the mean aspect. That has
been one aspect that has been troubling when it comes to learning this
analysis, there seem to be plenty of ways to do things but, without
experience, no way to tell what is better.
I will give the mean subtraction and the z-score scaling a try. Am I
reading too much into the usefulness of these heatmaps? In most papers
I???ve seen they seem to be there because the look nice or the
attached clustering might be of some use.
Thanks again!
From: Davis, Sean (NCI) On Behalf Of Davis, Sean (NIH/NCI) [E]
Sent: Thursday, March 28, 2013 10:34 AM
To: Cornish, Joseph (NIH/NIAID) [F]
Cc: James W. MacDonald; bioconductor at r-project.org
Subject: Re: [BioC] Heatmaps for Agilent Single Color 4x44k using
limma and vsn.
On Thu, Mar 28, 2013 at 10:22 AM, Cornish, Joseph (NIH/NIAID) [F]
<joseph.cornish at="" nih.gov<mailto:joseph.cornish="" at="" nih.gov="">> wrote:
Hello James,
Are you referring to function names or constants? The constants are an
old habit I picked up long ago in my CS classes, there are a ton but
it makes them stand out for me in vim more readily. I've come to avoid
camel case since switching to linux, it makes searching for things a
nightmare, it can also help if I group functions with a prefix (eg
plot functions start with "pl_"). If you want I can find+replace all
of the constants with their values...
Hi, Joe.
I think I could help to translate part of what James said. If you
could boil your example down to a few lines of reproducible code that
show the problem, that would be simpler to interpret and easier to
comment on directly. Sometimes in the process of producing such a
simplified example, I have found my own misunderstandings. It also
sometimes helps to get someone local to give a second pair of eyes.
The differences came from using the unaveraged replicates in the
larger script versus the average in the example.
I'm not sure what you mean by "mean of the controls", would this
simply be filtering out the control spots?
No. I think he means for you to subtract the mean of the control
samples (or one group of samples) for each gene from all the
normalized expression values.
I'll generate the heatmaps using the normalized expression values. I
suppose in that case it would be worthwhile to filter ahead of time by
p-value and fold change (the coef in topTable).
Yes.
Sean
_________thanks_____________
_______Joe______
-----Original Message-----
From: James W. MacDonald
[mailto:jmacdon@uw.edu<mailto:jmacdon@uw.edu>]
Sent: Thursday, March 28, 2013 10:08 AM
To: Cornish, Joseph (NIH/NIAID) [F]
Cc: bioconductor at r-project.org<mailto:bioconductor at="" r-project.org="">
Subject: Re: [BioC] Heatmaps for Agilent Single Color 4x44k using
limma and vsn.
Hi Joseph,
On 3/28/2013 9:27 AM, Cornish, Joseph (NIH/NIAID) [F] wrote:
> I am attempting to write a script that will allow me to do bulk
benchmarking of various methods for normalization and background
correction. I am concerned that I am not processing the data correctly
and this may be leading to the abnormal heatmaps that I am seeing. For
a test case, I am using an experiment where I have two different
conditions with two different states with two replicates per state.
>
> The heatmaps concern me because of the uniformity between them, the
values are all completely green or completely red, there are
differences between genes within a state according to the heatmap.
>
> In the examples I have seen people have suggested that I use the
coefficients from the eBayes fitting of the data, however in my
example the eBayes LMarray object does not have two columns in the
results, only on (not sure if this is a sign of errors on my part). I
have instead used the lmFit coefficients since there are column for
each matrix.
>
> To further confuse someone who is new to R, the simplified example
and full code generate different heatmaps for the same data and same
methods (normexp for background, cyclic loess for norm). The
differences being in the simplified version the gene IDs show up on
the right side but in the full version, only row number appears. The
color for some genes changes between states as well.
>
> My questions are:
> Why does the bayes model only have one column?
Because you are computing one contrast (which is all you can do). One
contrast means one test which means one column of results. In other
words, the coefficient you are estimating in the contrasts.fit() step
is the difference in means between the two groups, which is one value
per gene.
> Am I approaching the processing correctly?
I tried to read through that business below, but got some weird
affliction due to an apparent overdose of underscores and had to stop.
> Why are there differences between the resulting heatmaps of each
version?
I don't know, as I worried my brain might explode if I read any more.
But you should note that it is fairly unusual to plot coefficients in
a heatmap, as what you are then presenting are log ratios, and most
people can't understand that.
Instead you might consider getting the list of genes, and then
creating a heatmap using the normalized expression values for each
sample instead. You then would want to scale in some way. We often
sweep the mean of the controls out of the matrix (by row), so the
colors can then be interpreted as differences between the treated
samples and controls.
Best,
Jim
>
> I know that the idea is to keep the amount of posted code to a
minimum but I don't see any other way here.
>
> Here is the simplified example I have made:
> library(limma)
> library(vsn)
> library(gplots)
>
> set.seed(0xabcd) #from vsn manual
>
> #constants for reading agi outs from our imager
> COL__<- list(G="Raw intensity (tr.mean) {A}",Gb="Background
(tr.mean)
> {A}")
> ANO__<- c("Position", "Name", "ID", "Description", "GeneName")
>
> #parse data
> targets<- readTargets("test.csv", sep = ",")
> gal<- readGAL("026806_D_20110524.gal")
> layout<- getLayout(gal)
> agi<- read.maimages(targets, columns = COL__,
> annotation = ANO__,
> green.only = TRUE)
> design<- model.matrix(~0+factor(c(1,1,2,2)))
> colnames(design)<- c("s1", "s2")
>
> #process
> bg<- backgroundCorrect(agi, method = "normexp")
> norm<- normalizeBetweenArrays(bg, method = "cyclicloess")
> avg<- avereps(norm, ID=norm$genes$ProbeName)
>
> #fit
> contmat<- makeContrasts(s2-s1, levels = design)
> fit_lm<- lmFit(avg$E, design)
> fit_be<- eBayes(contrasts.fit(fit_lm, contmat))
>
> #results
> difexp<- topTable(fit_be, coef = 1, adjust = "BH")
> res<- decideTests(fit_be)
> comp<- vennDiagram(res)
>
> #plot
> pdf("example-heatmap.pdf")
> genes<- as.numeric(rownames(topTable(fit_be, n = 100)))
> heatmap(fit_lm$coefficients[genes,], col=redgreen(75), key = TRUE)
> dev.off()
>
>
> The full version:
> library(vsn)
> library(limma)
> library(gplots)
> set.seed(0xabcd) #from the vsn manual
>
> #constants
> A_SUB__<- 1
> A_NXP__<- 2
> A_MIN__<- 3
> A_VSN__<- 1
> A_QNT__<- 2
> A_LWS__<- 3
> M_SUB__<- "subtract"
> M_NXP__<- "normexp"
> M_MIN__<- "minimum"
> M_VSN__<- "vsn"
> M_LWS__<- "cyclicloess"
> M_QNT__<- "quantile"
> COL__<- list(G="Raw intensity (tr.mean) {A}",Gb="Background
(tr.mean)
> {A}")
> ANO__<- c("Position", "Name", "ID", "Description", "GeneName")
> N_BGM__<- 3 #number of background correction methods
> N_NRM__<- 3 #number of normalization methods
> A_BGM__<- c(A_SUB__, A_NXP__, A_MIN__)
> A_NRM__<- c(A_VSN__, A_QNT__, A_LWS__)
>
>
> #utility methods
> #applies subtraction background correction with limma
> bg_subtract<- function(x){
> return(list(norm = x$norm,
> bg = M_SUB__,
> agi = backgroundCorrect(x$agi, method = M_SUB__)))
}
>
> #applies normexp backgroud correction with limma
> bg_normexp<- function(x){
> return(list(norm = x$norm,
> bg = M_NXP__,
> agi = backgroundCorrect(x$agi, method = M_NXP__)))
}
>
> #applies minimum background correction with limma
> bg_minimum<- function(x){
> return(list(norm = x$norm,
> bg = M_MIN__,
> agi = backgroundCorrect(x$agi, method = M_MIN__)))
}
>
> #applies VSN normalization with VSN package
> nb_vsn<- function(x){
> return(list(norm = M_VSN__,
> bg = x$bg,
> agi = normalizeVSN(x$agi))) }
>
> #applies VSN normalization with VSN package
> nb_vsn<- function(x){
> return(list(norm = M_VSN__,
> bg = x$bg,
> agi = normalizeVSN(x$agi))) }
>
> #applies cyclic lowess normalization with limma
> nb_lowess<- function(x){
> return(list(norm = M_LWS__,
> bg = x$bg,
> agi = normalizeBetweenArrays(x$agi, method =
> M_LWS__))) }
>
> #applies quantile normalization with limma
> nb_quantile<- function(x){
> return(list(norm = M_QNT__,
> bg = x$bg,
> agi = normalizeBetweenArrays(x$agi, method =
> M_QNT__))) }
>
> #generates a plot of MA values with limma
> pl_ma<- function(x){
> name<- paste(c(x[[1]]$bg, x[[1]]$norm, "-ma.pdf"), collapse="")
> pdf(name)
> plotMA(x[[1]]$agi$E)
> dev.off()
> }
>
> #generates a plot of mean versus stdev with VS
> pl_meansd<- function(x){
> name<- paste(c(x[[1]]$bg, x[[1]]$norm, "-meansd.pdf"),
collapse="")
> pdf(name)
> meanSdPlot(x[[1]]$agi$E)
> dev.off()
> }
>
> #generate volcanoplot of log-fold versus log-odds
> pl_volcano<- function(x){
> name<- paste(c(x[[1]]$bg, x[[1]]$norm, "-volcano.pdf"),
collapse="")
> pdf(name)
> volcanoplot(x[[1]]$bayes)
> dev.off()
> }
>
> #generate plot of expression data over time
> pl_lines<- function(x){
> name<- paste(c(x[[1]]$bg, x[[1]]$norm, "-lines.pdf"),
collapse="")
> pdf(name)
> plotlines(x[[1]]$agi$E)
> dev.off()
> }
>
> #generate heatmat of final results
> pl_heatmap<- function(x, ntop){
> name<- paste(c(x[[1]]$bg, x[[1]]$norm, "-heatmap.pdf"),
collapse="")
> pdf(name)
> genes<- as.numeric(rownames(topTable(x[[1]]$bayes, n = ntop)))
> heatmap(x[[1]]$lm$coefficients[genes,],
> col = redgreen(75),
> key = TRUE)
> dev.off()
> }
>
> #averages replicates of probes
> m_repavg<- function(x){
> return(list(agi = x[[1]]$agi,
> avg = avereps(x[[1]]$agi,
ID=x[[1]]$agi$genes$ProbeName),
> bg = x[[1]]$bg,
> norm = x[[1]]$norm))
> }
>
> #generate contrasts
> m_fit<- function(x, design){
> contmat<- makeContrasts(s2-s1,
> levels = design)
> fit1<- lmFit(x[[1]]$agi$E, design)
> fit2<- eBayes(contrasts.fit(fit1, contmat))
> diff<- topTable(fit2, coef = 1, adjust = "BH")
> res<- decideTests(fit2)
> comp<- vennDiagram(res)
> return(list(agi = x[[1]]$agi,
> bg = x[[1]]$bg,
> norm = x[[1]]$norm,
> ave = x[[1]]$ave,
> cont = contmat,
> lm = fit1,
> bayes = fit2,
> diff = diff,
> res = res,
> comp = comp))
> }
>
>
######################################################################
> ######### # Function to parse array data into limma objects # Args:
> # tfile: target file (see limma documentation)
> # tsep: limma defaults TSV, have to set manually for CSV and
so on
> # galfile: GAL file for printer/layout
> # Returns data object with targets, genes and printer layout
>
######################################################################
> #########
> parse<- function(tfile, tsep, galfile){
> targets<- readTargets(tfile, sep=tsep)
> gal<- readGAL(galfile)
> layout<- getLayout(gal)
> agi<- read.maimages(targets, columns = COL__,
> annotation = ANO__,
> green.only = TRUE)
> bg<- "none"
> norm<- "none"
> desg<- model.matrix(~0+factor(c(1,1,2,2)))
> colnames(desg)<- c("s1","s2")
> d<- list(agi = agi,
> gal = gal,
> layout = layout,
> targets = targets,
> bg = bg,
> desg = desg,
> norm = norm)
>
> processed<- matrix(NA, nrow = N_BGM__, ncol = N_NRM__)
> processed<- apply(processed, 1:2, function(x){return(d)})
> return(processed)
> }
>
> #parse command args
> args<- commandArgs(trailingOnly = TRUE)
>
> #load files into matrix
> a<- parse(args[1], args[2], args[3])
>
> #keepy a copy of a raw dataset
> raw<- a[1,1]
> print(raw[[1]]$desg)
>
> #perform backgroung corrections
> a[,A_SUB__]<- lapply(a[,A_SUB__], bg_subtract)
> a[,A_NXP__]<- lapply(a[,A_NXP__], bg_normexp)
> a[,A_MIN__]<- lapply(a[,A_MIN__], bg_minimum)
>
> #perform normalization
> a[A_VSN__,]<- lapply(a[A_VSN__,], nb_vsn)
> a[A_LWS__,]<- lapply(a[A_LWS__,], nb_lowess)
> a[A_QNT__,]<- lapply(a[A_QNT__,], nb_quantile)
>
> #fit model and differentiate expression
> a<- apply(a, 1:2, m_repavg)
> a<- apply(a, 1:2, m_fit, design = raw[[1]]$desg)
>
> #generate plots
> apply(a, 1:2, pl_ma)
> apply(a, 1:2, pl_meansd)
> apply(a, 1:2, pl_volcano)
> apply(a, 1:2, pl_lines)
> apply(a, 1:2, pl_heatmap, args[4])
>
>
> [[alternative HTML version deleted]]
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org<mailto:bioconductor at="" r-project.org="">
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
_______________________________________________
Bioconductor mailing list
Bioconductor at r-project.org<mailto:bioconductor at="" r-project.org="">
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives:
http://news.gmane.org/gmane.science.biology.informatics.conductor
[[alternative HTML version deleted]]
------------------------------
Message: 7
Date: Thu, 28 Mar 2013 15:18:59 +0000
From: Andreia Fonseca <andreia.fonseca@gmail.com>
To: bioconductor <bioconductor at="" stat.math.ethz.ch="">
Subject: [BioC] analysis of HuGene2.0-st array affymetrix -aroma
package
Message-ID:
<cag43sel74j9eypo8izygsy7=c0jye5bjw5egcohkug9mmyp_fq at="" mail.gmail.com="">
Content-Type: text/plain
Dear all,
I am analysing HuGene2.0-st array affymetrix using aroma and limma
package.
I have 2 treatments and 3 arrays per condition. I have getting 11
genes
that are differentially expressed, but these loose significance after
applying fdr correction. I want to apply a filter out genes with
similar
distributions between treatments before applying fdr correction to see
I
am loosing significance due to overcorrection. Can someone advise me
on a
function or package to do this? I have explored genefilter and I could
not
find a function for this.
Thanks for the help.
Kind regards,
Andreia
----------------------------------------------------------------------
-------------------------
Andreia J. Amaral, PhD
BioFIG - Center for Biodiversity, Functional and Integrative Genomics
Instituto de Medicina Molecular
University of Lisbon
Tel: +352 217500000 (ext. office: 28253)
email:andreiaamaral at fm.ul.pt ; andreiaamaral at fc.ul.pt
[[alternative HTML version deleted]]
------------------------------
Message: 8
Date: Thu, 28 Mar 2013 11:47:12 -0400
From: Sean Davis <sdavis2@mail.nih.gov>
To: "Cornish, Joseph (NIH/NIAID) [F]" <joseph.cornish at="" nih.gov="">
Cc: "bioconductor at r-project.org" <bioconductor at="" r-project.org="">,
"James
W. MacDonald" <jmacdon at="" uw.edu="">
Subject: Re: [BioC] Heatmaps for Agilent Single Color 4x44k using
limma and vsn.
Message-ID:
<caneavb=xn9c2bjqk6gf5dptghbltgrcov+oh0t05qsmd1s92dg at="" mail.gmail.com="">
Content-Type: text/plain; charset=UTF-8
On Thu, Mar 28, 2013 at 11:00 AM, Cornish, Joseph (NIH/NIAID) [F]
<joseph.cornish at="" nih.gov=""> wrote:
> Hello Davis,
>
> I posted a simple example, which after fixing a problem in the
longer example no produces the same results. I only posted the longer
one because I wasn?t able to reproduce the results of the longer
version initially. I?ve been sort of banging my head on this for a
while so I missed the difference between ?agi? and ?avg?. I?m thankful
anyone bothered to look after posting the code, so thanks. If I
sounded offended by the underscores, I was only admitting that I know
they?re terrible. c:
>
> Thanks for clarifying the subtraction of the mean aspect. That has
been one aspect that has been troubling when it comes to learning this
analysis, there seem to be plenty of ways to do things but, without
experience, no way to tell what is better.
>
There is not a "right" way to do this since the only metric I have
seen for the evaluation of the color scale in a heatmap is subjective
and qualitative.
> I will give the mean subtraction and the z-score scaling a try. Am I
reading too much into the usefulness of these heatmaps? In most papers
I?ve seen they seem to be there because the look nice or the attached
clustering might be of some use.
>
Heatmaps are very good visualization tools. I would not try to attach
more utility to them than that.
Sean
> Thanks again!
>
> From: Davis, Sean (NCI) On Behalf Of Davis, Sean (NIH/NCI) [E]
> Sent: Thursday, March 28, 2013 10:34 AM
> To: Cornish, Joseph (NIH/NIAID) [F]
> Cc: James W. MacDonald; bioconductor at r-project.org
> Subject: Re: [BioC] Heatmaps for Agilent Single Color 4x44k using
limma and vsn.
>
>
>
> On Thu, Mar 28, 2013 at 10:22 AM, Cornish, Joseph (NIH/NIAID) [F]
<joseph.cornish at="" nih.gov<mailto:joseph.cornish="" at="" nih.gov="">> wrote:
> Hello James,
>
> Are you referring to function names or constants? The constants are
an old habit I picked up long ago in my CS classes, there are a ton
but it makes them stand out for me in vim more readily. I've come to
avoid camel case since switching to linux, it makes searching for
things a nightmare, it can also help if I group functions with a
prefix (eg plot functions start with "pl_"). If you want I can
find+replace all of the constants with their values...
>
> Hi, Joe.
>
> I think I could help to translate part of what James said. If you
could boil your example down to a few lines of reproducible code that
show the problem, that would be simpler to interpret and easier to
comment on directly. Sometimes in the process of producing such a
simplified example, I have found my own misunderstandings. It also
sometimes helps to get someone local to give a second pair of eyes.
>
> The differences came from using the unaveraged replicates in the
larger script versus the average in the example.
>
> I'm not sure what you mean by "mean of the controls", would this
simply be filtering out the control spots?
>
> No. I think he means for you to subtract the mean of the control
samples (or one group of samples) for each gene from all the
normalized expression values.
>
> I'll generate the heatmaps using the normalized expression values. I
suppose in that case it would be worthwhile to filter ahead of time by
p-value and fold change (the coef in topTable).
>
> Yes.
>
> Sean
>
>
> _________thanks_____________
> _______Joe______
>
> -----Original Message-----
> From: James W. MacDonald [mailto:jmacdon at uw.edu<mailto:jmacdon at="" uw.edu="">]
> Sent: Thursday, March 28, 2013 10:08 AM
> To: Cornish, Joseph (NIH/NIAID) [F]
> Cc: bioconductor at r-project.org<mailto:bioconductor at="" r-project.org="">
> Subject: Re: [BioC] Heatmaps for Agilent Single Color 4x44k using
limma and vsn.
>
> Hi Joseph,
>
>
> On 3/28/2013 9:27 AM, Cornish, Joseph (NIH/NIAID) [F] wrote:
>> I am attempting to write a script that will allow me to do bulk
benchmarking of various methods for normalization and background
correction. I am concerned that I am not processing the data correctly
and this may be leading to the abnormal heatmaps that I am seeing. For
a test case, I am using an experiment where I have two different
conditions with two different states with two replicates per state.
>>
>> The heatmaps concern me because of the uniformity between them, the
values are all completely green or completely red, there are
differences between genes within a state according to the heatmap.
>>
>> In the examples I have seen people have suggested that I use the
coefficients from the eBayes fitting of the data, however in my
example the eBayes LMarray object does not have two columns in the
results, only on (not sure if this is a sign of errors on my part). I
have instead used the lmFit coefficients since there are column for
each matrix.
>>
>> To further confuse someone who is new to R, the simplified example
and full code generate different heatmaps for the same data and same
methods (normexp for background, cyclic loess for norm). The
differences being in the simplified version the gene IDs show up on
the right side but in the full version, only row number appears. The
color for some genes changes between states as well.
>>
>> My questions are:
>> Why does the bayes model only have one column?
>
> Because you are computing one contrast (which is all you can do).
One contrast means one test which means one column of results. In
other words, the coefficient you are estimating in the contrasts.fit()
step is the difference in means between the two groups, which is one
value per gene.
>
>> Am I approaching the processing correctly?
>
> I tried to read through that business below, but got some weird
affliction due to an apparent overdose of underscores and had to stop.
>
>> Why are there differences between the resulting heatmaps of each
version?
>
> I don't know, as I worried my brain might explode if I read any
more.
> But you should note that it is fairly unusual to plot coefficients
in a heatmap, as what you are then presenting are log ratios, and most
people can't understand that.
>
> Instead you might consider getting the list of genes, and then
creating a heatmap using the normalized expression values for each
sample instead. You then would want to scale in some way. We often
sweep the mean of the controls out of the matrix (by row), so the
colors can then be interpreted as differences between the treated
samples and controls.
>
> Best,
>
> Jim
>
>
>>
>> I know that the idea is to keep the amount of posted code to a
minimum but I don't see any other way here.
>>
>> Here is the simplified example I have made:
>> library(limma)
>> library(vsn)
>> library(gplots)
>>
>> set.seed(0xabcd) #from vsn manual
>>
>> #constants for reading agi outs from our imager
>> COL__<- list(G="Raw intensity (tr.mean) {A}",Gb="Background
(tr.mean)
>> {A}")
>> ANO__<- c("Position", "Name", "ID", "Description", "GeneName")
>>
>> #parse data
>> targets<- readTargets("test.csv", sep = ",")
>> gal<- readGAL("026806_D_20110524.gal")
>> layout<- getLayout(gal)
>> agi<- read.maimages(targets, columns = COL__,
>> annotation = ANO__,
>> green.only = TRUE)
>> design<- model.matrix(~0+factor(c(1,1,2,2)))
>> colnames(design)<- c("s1", "s2")
>>
>> #process
>> bg<- backgroundCorrect(agi, method = "normexp")
>> norm<- normalizeBetweenArrays(bg, method = "cyclicloess")
>> avg<- avereps(norm, ID=norm$genes$ProbeName)
>>
>> #fit
>> contmat<- makeContrasts(s2-s1, levels = design)
>> fit_lm<- lmFit(avg$E, design)
>> fit_be<- eBayes(contrasts.fit(fit_lm, contmat))
>>
>> #results
>> difexp<- topTable(fit_be, coef = 1, adjust = "BH")
>> res<- decideTests(fit_be)
>> comp<- vennDiagram(res)
>>
>> #plot
>> pdf("example-heatmap.pdf")
>> genes<- as.numeric(rownames(topTable(fit_be, n = 100)))
>> heatmap(fit_lm$coefficients[genes,], col=redgreen(75), key = TRUE)
>> dev.off()
>>
>>
>> The full version:
>> library(vsn)
>> library(limma)
>> library(gplots)
>> set.seed(0xabcd) #from the vsn manual
>>
>> #constants
>> A_SUB__<- 1
>> A_NXP__<- 2
>> A_MIN__<- 3
>> A_VSN__<- 1
>> A_QNT__<- 2
>> A_LWS__<- 3
>> M_SUB__<- "subtract"
>> M_NXP__<- "normexp"
>> M_MIN__<- "minimum"
>> M_VSN__<- "vsn"
>> M_LWS__<- "cyclicloess"
>> M_QNT__<- "quantile"
>> COL__<- list(G="Raw intensity (tr.mean) {A}",Gb="Background
(tr.mean)
>> {A}")
>> ANO__<- c("Position", "Name", "ID", "Description", "GeneName")
>> N_BGM__<- 3 #number of background correction methods
>> N_NRM__<- 3 #number of normalization methods
>> A_BGM__<- c(A_SUB__, A_NXP__, A_MIN__)
>> A_NRM__<- c(A_VSN__, A_QNT__, A_LWS__)
>>
>>
>> #utility methods
>> #applies subtraction background correction with limma
>> bg_subtract<- function(x){
>> return(list(norm = x$norm,
>> bg = M_SUB__,
>> agi = backgroundCorrect(x$agi, method =
M_SUB__))) }
>>
>> #applies normexp backgroud correction with limma
>> bg_normexp<- function(x){
>> return(list(norm = x$norm,
>> bg = M_NXP__,
>> agi = backgroundCorrect(x$agi, method =
M_NXP__))) }
>>
>> #applies minimum background correction with limma
>> bg_minimum<- function(x){
>> return(list(norm = x$norm,
>> bg = M_MIN__,
>> agi = backgroundCorrect(x$agi, method =
M_MIN__))) }
>>
>> #applies VSN normalization with VSN package
>> nb_vsn<- function(x){
>> return(list(norm = M_VSN__,
>> bg = x$bg,
>> agi = normalizeVSN(x$agi))) }
>>
>> #applies VSN normalization with VSN package
>> nb_vsn<- function(x){
>> return(list(norm = M_VSN__,
>> bg = x$bg,
>> agi = normalizeVSN(x$agi))) }
>>
>> #applies cyclic lowess normalization with limma
>> nb_lowess<- function(x){
>> return(list(norm = M_LWS__,
>> bg = x$bg,
>> agi = normalizeBetweenArrays(x$agi, method =
>> M_LWS__))) }
>>
>> #applies quantile normalization with limma
>> nb_quantile<- function(x){
>> return(list(norm = M_QNT__,
>> bg = x$bg,
>> agi = normalizeBetweenArrays(x$agi, method =
>> M_QNT__))) }
>>
>> #generates a plot of MA values with limma
>> pl_ma<- function(x){
>> name<- paste(c(x[[1]]$bg, x[[1]]$norm, "-ma.pdf"),
collapse="")
>> pdf(name)
>> plotMA(x[[1]]$agi$E)
>> dev.off()
>> }
>>
>> #generates a plot of mean versus stdev with VS
>> pl_meansd<- function(x){
>> name<- paste(c(x[[1]]$bg, x[[1]]$norm, "-meansd.pdf"),
collapse="")
>> pdf(name)
>> meanSdPlot(x[[1]]$agi$E)
>> dev.off()
>> }
>>
>> #generate volcanoplot of log-fold versus log-odds
>> pl_volcano<- function(x){
>> name<- paste(c(x[[1]]$bg, x[[1]]$norm, "-volcano.pdf"),
collapse="")
>> pdf(name)
>> volcanoplot(x[[1]]$bayes)
>> dev.off()
>> }
>>
>> #generate plot of expression data over time
>> pl_lines<- function(x){
>> name<- paste(c(x[[1]]$bg, x[[1]]$norm, "-lines.pdf"),
collapse="")
>> pdf(name)
>> plotlines(x[[1]]$agi$E)
>> dev.off()
>> }
>>
>> #generate heatmat of final results
>> pl_heatmap<- function(x, ntop){
>> name<- paste(c(x[[1]]$bg, x[[1]]$norm, "-heatmap.pdf"),
collapse="")
>> pdf(name)
>> genes<- as.numeric(rownames(topTable(x[[1]]$bayes, n = ntop)))
>> heatmap(x[[1]]$lm$coefficients[genes,],
>> col = redgreen(75),
>> key = TRUE)
>> dev.off()
>> }
>>
>> #averages replicates of probes
>> m_repavg<- function(x){
>> return(list(agi = x[[1]]$agi,
>> avg = avereps(x[[1]]$agi,
ID=x[[1]]$agi$genes$ProbeName),
>> bg = x[[1]]$bg,
>> norm = x[[1]]$norm))
>> }
>>
>> #generate contrasts
>> m_fit<- function(x, design){
>> contmat<- makeContrasts(s2-s1,
>> levels = design)
>> fit1<- lmFit(x[[1]]$agi$E, design)
>> fit2<- eBayes(contrasts.fit(fit1, contmat))
>> diff<- topTable(fit2, coef = 1, adjust = "BH")
>> res<- decideTests(fit2)
>> comp<- vennDiagram(res)
>> return(list(agi = x[[1]]$agi,
>> bg = x[[1]]$bg,
>> norm = x[[1]]$norm,
>> ave = x[[1]]$ave,
>> cont = contmat,
>> lm = fit1,
>> bayes = fit2,
>> diff = diff,
>> res = res,
>> comp = comp))
>> }
>>
>>
######################################################################
>> ######### # Function to parse array data into limma objects # Args:
>> # tfile: target file (see limma documentation)
>> # tsep: limma defaults TSV, have to set manually for CSV and
so on
>> # galfile: GAL file for printer/layout
>> # Returns data object with targets, genes and printer layout
>>
######################################################################
>> #########
>> parse<- function(tfile, tsep, galfile){
>> targets<- readTargets(tfile, sep=tsep)
>> gal<- readGAL(galfile)
>> layout<- getLayout(gal)
>> agi<- read.maimages(targets, columns = COL__,
>> annotation = ANO__,
>> green.only = TRUE)
>> bg<- "none"
>> norm<- "none"
>> desg<- model.matrix(~0+factor(c(1,1,2,2)))
>> colnames(desg)<- c("s1","s2")
>> d<- list(agi = agi,
>> gal = gal,
>> layout = layout,
>> targets = targets,
>> bg = bg,
>> desg = desg,
>> norm = norm)
>>
>> processed<- matrix(NA, nrow = N_BGM__, ncol = N_NRM__)
>> processed<- apply(processed, 1:2, function(x){return(d)})
>> return(processed)
>> }
>>
>> #parse command args
>> args<- commandArgs(trailingOnly = TRUE)
>>
>> #load files into matrix
>> a<- parse(args[1], args[2], args[3])
>>
>> #keepy a copy of a raw dataset
>> raw<- a[1,1]
>> print(raw[[1]]$desg)
>>
>> #perform backgroung corrections
>> a[,A_SUB__]<- lapply(a[,A_SUB__], bg_subtract)
>> a[,A_NXP__]<- lapply(a[,A_NXP__], bg_normexp)
>> a[,A_MIN__]<- lapply(a[,A_MIN__], bg_minimum)
>>
>> #perform normalization
>> a[A_VSN__,]<- lapply(a[A_VSN__,], nb_vsn)
>> a[A_LWS__,]<- lapply(a[A_LWS__,], nb_lowess)
>> a[A_QNT__,]<- lapply(a[A_QNT__,], nb_quantile)
>>
>> #fit model and differentiate expression
>> a<- apply(a, 1:2, m_repavg)
>> a<- apply(a, 1:2, m_fit, design = raw[[1]]$desg)
>>
>> #generate plots
>> apply(a, 1:2, pl_ma)
>> apply(a, 1:2, pl_meansd)
>> apply(a, 1:2, pl_volcano)
>> apply(a, 1:2, pl_lines)
>> apply(a, 1:2, pl_heatmap, args[4])
>>
>>
>> [[alternative HTML version deleted]]
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org<mailto:bioconductor at="" r-project.org="">
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
> --
> James W. MacDonald, M.S.
> Biostatistician
> University of Washington
> Environmental and Occupational Health Sciences
> 4225 Roosevelt Way NE, # 100
> Seattle WA 98105-6099
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org<mailto:bioconductor at="" r-project.org="">
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
http://news.gmane.org/gmane.science.biology.informatics.conductor
>
>
> [[alternative HTML version deleted]]
>
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
http://news.gmane.org/gmane.science.biology.informatics.conductor
------------------------------
Message: 9
Date: Thu, 28 Mar 2013 11:00:17 -0500
From: Michael Muratet <mmuratet@hudsonalpha.org>
To: bioconductor at r-project.org
Subject: [BioC] Interpreting DESeq2 results
Message-ID: <14DFA994-3719-4FDF-BD71-4290D4149457 at hudsonalpha.org>
Content-Type: text/plain; charset=us-ascii
Greetings
I have an experiment:
> design(dse)
~ factor1 + factor2 + factor3
where factor1 has two levels, factor2 has three levels and factor3 has
three levels. I extract a gene of interest from the results for each
term (I've changed the indices to reflect the condition):
> lapply(resultsNames(dse),function(u) results(dse,u)["gene_A",])
[["Intercept"]]
baseMean log2FoldChange pvalue FDR
gene_A 1596.548 10.77485 3.309439e-216 7.025442e-216
[["factor1_level2"]]
baseMean log2FoldChange pvalue FDR
gene_A 1596.548 0.3386776 0.1307309 0.3587438
[["factor2_level2"]]
baseMean log2FoldChange pvalue FDR
gene_A 1596.548 -0.6882543 0.0613569 0.1007896
[["factor2_level3"]]
baseMean log2FoldChange pvalue FDR
gene_A 1596.548 0.2393368 0.513216 0.6589575
[["factor3_level2"]]
baseMean log2FoldChange pvalue FDR
gene_A 1596.548 0.1584153 0.6423634 0.8503163
[["factor3_level3]]
baseMean log2FoldChange pvalue FDR
gene_A 1596.548 -1.627898 1.823141e-06 0.001409384
I want to be sure I understand the output format. Is it true that the
coefficients (the vector beta) from the fit are the baseMean value
scaled by the log2FoldChange? Is the true intercept value
1596.548*2^10.77485=2797274.13?
mcols() tells me that the baseMean term is calculated over "all rows".
The baseMean is different for different genes although it is the same
for each gene across all the conditions, I'm not seeing how the rows
are selected.
Thanks
Mike
Michael Muratet, Ph.D.
Senior Scientist
HudsonAlpha Institute for Biotechnology
mmuratet at hudsonalpha.org
(256) 327-0473 (p)
(256) 327-0966 (f)
Room 4005
601 Genome Way
Huntsville, Alabama 35806
------------------------------
Message: 10
Date: Thu, 28 Mar 2013 16:30:12 +0000
From: Fiona Ingleby <fiona.ingleby@gmail.com>
To: "bioconductor at r-project.org" <bioconductor at="" r-project.org="">
Subject: [BioC] identifying drosophila miRNA targets
Message-ID: <04CBBC25-9E0F-4E2B-8008-D3EB093E2A10 at gmail.com>
Content-Type: text/plain
Hi everyone,
I am working with mRNA data from Affy 'drosophila2' arrays and miRNA
data from Affy 'mirna3' arrays. I have identified a list of
differentially expressed mRNAs and miRNAs. I'm having a bit of trouble
with some downstream analyses and I'm hoping someone might be able to
offer some help.
I would like to use my list of differentially expressed miRNAs to
access online databases (e.g. miRBase, microRNA.org?) and extract the
names of all the potential target mRNAs. Then I'd like to use this
list of mRNAs to look through my mRNA expression data. I'm aware of
packages like 'RmiR' and 'microRNA' which have built-in functions for
finding miRNA targets, but as far as I can tell, 'RmiR' uses miRNA
databases for humans only and 'microRNA' works with human and mouse
data only. So is there a package I am unaware of (or another
application of 'RmiR'/'microRNA' that I am unaware of) for looking at
drosophila data?
So far I have also considered the 'biomaRt' package to see if the
database query function on there can help me, but I haven't had much
luck. For instance, if I try an example list of miRNAs:
mirna<-c("dme-miR-1002","dme-miR-312","dme-miR-973")
library(biomaRt)
ensembl<-useMart("ensembl",dataset="dmelanogaster_gene_ensembl")
getBM(attributes="mirbase_accession",filters="mirbase_id",values=mirna
,mart=ensembl)
then 'logical(0)' is returned, as if there are no records for those
miRNAs - but by searching the database manually I know the records are
there.
Alternatively I can try:
miRNA <- getBM(c("mirbase_accession","mirbase_id", "ensembl_gene_id",
"start_position", "chromosome_name"), filters = c("with_mirbase"),
values = list(T), mart = ensembl)
which returns a table of various bits of information on miRNAs, but I
cannot adapt this command to just look at my list of miRNAs of
interest (ie. the 'mirna' vector above). I've included the
sessionInfo() output for these at the bottom of the email, but I
suspect my problem is more to do with the fact I'm not going about
this the right way (as opposed to a problem with package versions and
coding etc.). I'm not even sure that using 'biomaRt' will give me the
information I eventually want (the target mRNAs of these miRNAs), I
was just trying it out, to see what it was capable of in terms of
querying these databases. So I apologise for the vagueness. Since I
haven't managed to get very far by myself then it's difficult to be
more specific, but I'd really appreciate it if anyone could offer some
advice, even just to point me in the direction of a useful package
which might have gone unnoticed by me.
Many thanks,
Fiona
Dr Fiona C Ingleby
Postdoctoral Research Fellow
University of Sussex
Email: F.Ingleby at sussex.ac.uk
Website: fionaingleby.weebly.com
> sessionInfo()
R version 2.15.2 (2012-10-26)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] biomaRt_2.14.0 affy_1.36.1 Biobase_2.18.0
BiocGenerics_0.4.0
loaded via a namespace (and not attached):
[1] affyio_1.26.0 BiocInstaller_1.8.3 grid_2.15.2
lattice_0.20-14 Matrix_1.0-11 MCMCglmm_2.17
[7] preprocessCore_1.20.0 RCurl_1.95-4.1 tools_2.15.2
XML_3.95-0.2 zlibbioc_1.4.0
[[alternative HTML version deleted]]
------------------------------
Message: 11
Date: Thu, 28 Mar 2013 12:43:14 -0400
From: "James W. MacDonald" <jmacdon@uw.edu>
To: Fiona Ingleby <fiona.ingleby at="" gmail.com="">
Cc: "bioconductor at r-project.org" <bioconductor at="" r-project.org="">
Subject: Re: [BioC] identifying drosophila miRNA targets
Message-ID: <51547322.6020803 at uw.edu>
Content-Type: text/plain; charset=windows-1252; format=flowed
Hi Fiona,
I have a function called mirna2mrna (yeah, I know, lame function
name...) in my affycoretools package that does this, based on the
sanger
microcosm targets data that you can download here:
http://www.ebi.ac.uk/enright-srv/microcosm/cgi-
bin/targets/v5/download.pl
there is also a function makeHmap() that will create a heatmap with
the
miRNA/mRNA pairs, where the color of the cells is based on the
correlation between the two RNA species (with the intent to show
negative correlations, indicating that the miRNA is hypothetically
causing premature degradation of the mRNA).
I think the help pages for these two functions are reasonable, but
please let me know if you have any questions.
Best,
Jim
On 3/28/2013 12:30 PM, Fiona Ingleby wrote:
> Hi everyone,
>
> I am working with mRNA data from Affy 'drosophila2' arrays and miRNA
data from Affy 'mirna3' arrays. I have identified a list of
differentially expressed mRNAs and miRNAs. I'm having a bit of trouble
with some downstream analyses and I'm hoping someone might be able to
offer some help.
>
> I would like to use my list of differentially expressed miRNAs to
access online databases (e.g. miRBase, microRNA.org?) and extract the
names of all the potential target mRNAs. Then I'd like to use this
list of mRNAs to look through my mRNA expression data. I'm aware of
packages like 'RmiR' and 'microRNA' which have built-in functions for
finding miRNA targets, but as far as I can tell, 'RmiR' uses miRNA
databases for humans only and 'microRNA' works with human and mouse
data only. So is there a package I am unaware of (or another
application of 'RmiR'/'microRNA' that I am unaware of) for looking at
drosophila data?
>
> So far I have also considered the 'biomaRt' package to see if the
database query function on there can help me, but I haven't had much
luck. For instance, if I try an example list of miRNAs:
>
> mirna<-c("dme-miR-1002","dme-miR-312","dme-miR-973")
> library(biomaRt)
> ensembl<-useMart("ensembl",dataset="dmelanogaster_gene_ensembl")
> getBM(attributes="mirbase_accession",filters="mirbase_id",values=mir
na,mart=ensembl)
>
> then 'logical(0)' is returned, as if there are no records for those
miRNAs - but by searching the database manually I know the records are
there.
>
> Alternatively I can try:
>
> miRNA<- getBM(c("mirbase_accession","mirbase_id", "ensembl_gene_id",
"start_position", "chromosome_name"), filters = c("with_mirbase"),
values = list(T), mart = ensembl)
>
> which returns a table of various bits of information on miRNAs, but
I cannot adapt this command to just look at my list of miRNAs of
interest (ie. the 'mirna' vector above). I've included the
sessionInfo() output for these at the bottom of the email, but I
suspect my problem is more to do with the fact I'm not going about
this the right way (as opposed to a problem with package versions and
coding etc.). I'm not even sure that using 'biomaRt' will give me the
information I eventually want (the target mRNAs of these miRNAs), I
was just trying it out, to see what it was capable of in terms of
querying these databases. So I apologise for the vagueness. Since I
haven't managed to get very far by myself then it's difficult to be
more specific, but I'd really appreciate it if anyone could offer some
advice, even just to point me in the direction of a useful package
which might have gone unnoticed by me.
>
> Many thanks,
>
> Fiona
>
> Dr Fiona C Ingleby
> Postdoctoral Research Fellow
> University of Sussex
> Email: F.Ingleby at sussex.ac.uk
> Website: fionaingleby.weebly.com
>
>
>> sessionInfo()
> R version 2.15.2 (2012-10-26)
> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
>
> locale:
> [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] biomaRt_2.14.0 affy_1.36.1 Biobase_2.18.0
BiocGenerics_0.4.0
>
> loaded via a namespace (and not attached):
> [1] affyio_1.26.0 BiocInstaller_1.8.3 grid_2.15.2
lattice_0.20-14 Matrix_1.0-11 MCMCglmm_2.17
> [7] preprocessCore_1.20.0 RCurl_1.95-4.1 tools_2.15.2
XML_3.95-0.2 zlibbioc_1.4.0
> [[alternative HTML version deleted]]
>
>
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
http://news.gmane.org/gmane.science.biology.informatics.conductor
--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
------------------------------
Message: 12
Date: Thu, 28 Mar 2013 16:58:21 +0000
From: "Blanchette, Marco" <mab@stowers.org>
To: Kasper Daniel Hansen <kasperdanielhansen at="" gmail.com="">
Cc: "bioconductor at r-project.org" <bioconductor at="" r-project.org="">
Subject: Re: [BioC] Limit on number of sequence files for forging a
BSgenome
Message-ID: <1CE7C98DF956594C98E0956F4747178136D97D at
MBSRV02.sgc.loc>
Content-Type: text/plain; charset="us-ascii"
Kasper,
I see your line of thought, is there a particular fasta file causing
forgeBSgenomeDataPkg() to break?
The answer is no. Once I reach a certain number of fasta files, adding
one
more contig breaks the function. For instance, taking the first 454
contigs of C. brenneri breaks while removing the last or the first
fasta
file from the list (keeping only 453) compile without a problem
(neither
the last or the first fasta files are responsible for breaking the
function, the number of file is the trigger)
What's even more puzzling is that the number that breaks is not a
fixed
number. Selecting a random selection of contigs or changing genome
will
change the number that triggers the function to break... However it's
always around 440 files, which might be due to the size of the fasta
files
being all of very similar sizes.
Any clues?
-- Marco Blanchette, Ph.D.
Stowers Institute for Medical Research
1000 East 50th Street
Kansas City MO 64110
www.stowers.org
Tel: 816-926-4071
Cell: 816-726-8419
Fax: 816-926-2018
On 3/27/13 8:22 PM, "Kasper Daniel Hansen" <kasperdanielhansen at="" gmail.com="">
wrote:
>Marco,
>
>You are probably right in diagnosing the problem, but sometimes I
>think I have seen FASTA files with the entire sequence on a single
>line, instead of (say) 80 nucleotides and then a newline. I could
>believe that a really long contig on a single line without a newline,
>could cause an error like this. You could quickly check if there is a
>suspicious file by
> wc -l *
>and look for files with #lines like 2-3. Somehow 460 seems a weird
>number to fail at.
>
>This may not be your problem, and I am sure Herve will respond in due
>time.
>
>Best,
>Kasper
>
>On Wed, Mar 27, 2013 at 4:28 PM, Blanchette, Marco <mab at="" stowers.org="">
>wrote:
>> Hi,
>>
>> Is there a maximum number of sequence files (chromosomes or contigs
in
>>my case) that can be fed to the forgeBSgenomeDataPkg() function? I
am
>>trying to build a BSgenome for C. brenneri and C. japonica available
>>from EnsemblGenomes. These genomes are made from thousands of
contigs
>>with genes annotated to them. Currently, I get the following error
when
>>running "Error: Line longer than buffer size" when running on the
full
>>set of contigs. However, it works fine on a seed file containing a
>>subset of the contigs (I can forge a genome with 450 contigs but not
>>with 460!)
>>
>> Any suggestions will be appreciated (I can provide a toy example
but I
>>am not sure what would be the merit of it at this point)
>>
>> Thanks
>>
>> -- Marco Blanchette, Ph.D.
>> Stowers Institute for Medical Research
>> 1000 East 50th Street
>> Kansas City MO 64110
>> www.stowers.org
>>
>> Tel: 816-926-4071
>> Cell: 816-726-8419
>> Fax: 816-926-2018
>>
>> [[alternative HTML version deleted]]
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>>http://news.gmane.org/gmane.science.biology.informatics.conductor
------------------------------
Message: 13
Date: Thu, 28 Mar 2013 19:07:19 +0000
From: "Shields, Rusty (IMS)" <shieldsr@imsweb.com>
To: "bioconductor at r-project.org" <bioconductor at="" r-project.org="">
Subject: Re: [BioC] [R] Error in setMethod("combine"... was - Error
when installing globaltest package
Message-ID:
<e2e839b5bfa711438b481ad9906b0c340a6a70a8 at="" varuna.omni.imsweb.com="">
Content-Type: text/plain; charset="iso-8859-1"
Thanks Martin,
No apologies necessary, just happy to have some help.
Here's the info on Biobase:
> packageDescription("Biobase")
Package: Biobase
Title: Biobase: Base functions for Bioconductor
Version: 2.18.0
Author: R. Gentleman, V. Carey, M. Morgan, S. Falcon
Description: Functions that are needed by many other packages or which
replace R functions.
Suggests: tools, tkWidgets, ALL
Depends: R (>= 2.10), BiocGenerics (>= 0.3.2), utils
Imports: methods, BiocGenerics
Maintainer: Bioconductor Package Maintainer
<maintainer at="" bioconductor.org="">
License: Artistic-2.0
Collate: tools.R strings.R environment.R vignettes.R packages.R
AllGenerics.R VersionsClass.R VersionedClasses.R
methods-VersionsNull.R methods-VersionedClass.R DataClasses.R
methods-aggregator.R methods-container.R methods-MIAxE.R
methods-MIAME.R methods-AssayData.R
methods-AnnotatedDataFrame.R methods-eSet.R
methods-ExpressionSet.R methods-MultiSet.R methods-SnpSet.R
methods-NChannelSet.R anyMissing.R rowOp-methods.R
updateObjectTo.R methods-ScalarObject.R zzz.R
LazyLoad: yes
biocViews: Infrastructure, Bioinformatics
Packaged: 2012-10-02 02:41:50 UTC; biocbuild
Built: R 2.14.0; x86_64-unknown-linux-gnu; 2013-03-22 15:01:20 UTC;
unix
-- File: /usr/local/R-2.14.0/lib64/R/library/Biobase/Meta/package.rds
-----Original Message-----
From: Martin Morgan [mailto:mtmorgan@fhcrc.org]
Sent: Thursday, March 28, 2013 2:52 PM
To: Shields, Rusty (IMS)
Cc: r-help at r-project.org
Subject: Re: [R] Error in setMethod("combine"... was - Error when
installing globaltest package
On 3/28/2013 11:05 AM, Shields, Rusty (IMS) wrote:
> Hi All,
>
> I posted this on the bioconductor list and didn't get a response
there, so I'm hoping someone here can help.
>
> I don't know a heck of a lot about R, so I apologize if this seems
like a trivial issue. This error comes up when trying to install the
bioconductor globaltest package.
>
Sorry that you didn't get a response on the Bioc mailing list; I'd
actually
suggest returning to the original thread there and I'll see that it
gets answered.
My guess is that you have a version of Biobase that is too new
compared to the
version (2.14.0) expected in the version of Bioconductor you are
using. What
does packageDescription("Biobase") say?
Martin
> Any clues?
>
> Thanks!
> Rusty
>
> -----Original Message-----
> From: bioconductor-bounces at r-project.org [mailto:bioconductor-
bounces at r-project.org] On Behalf Of Shields, Rusty (IMS)
> Sent: Tuesday, March 26, 2013 12:34 PM
> To: bioconductor at r-project.org
> Subject: [BioC] Error when installing globaltest package
>
> Hi all,
>
> I've run into a problem when attempting to install the globaltest
package from Bioconductor. I'm using R 2.14.0 on 64bit SLES 11. Let
me know what other information you might need about my system to
troubleshoot this.
>
> Using the method described for this installation on the Bioconductor
website:
>
> source("http://bioconductor.org/biocLite.R")
> biocLite("globaltest")
>
> I get and the following result, which I can't find an reference to
on the list archives:
>
>> biocLite("globaltest")
> BioC_mirror: 'http://www.bioconductor.org'
> Using R version 2.14, BiocInstaller version 1.2.1.
> Installing package(s) 'globaltest'
> trying URL 'http://www.bioconductor.org/packages/2.9/bioc/src/contri
b/globaltest_5.8.1.tar.gz'
> Content type 'application/x-gzip' length 956376 bytes (933 Kb)
> opened URL
> ==================================================
> downloaded 933 Kb
>
> * installing *source* package ?globaltest? ...
> ** R
> ** data
> ** inst
> ** preparing package for lazy loading
> Warning in .simpleDuplicateClass(def, prev) :
> A specification for class ?data.frameOrNULL?
> Creating a generic
function for ?sort? from package ?base? in package ?globaltest?
> Creating a generic function for ?model.matrix? from package ?stats?
in package ?globaltest?
> Creating a generic function for ?coefficients? from package ?stats?
in package ?globaltest?
> Creating a generic function for ?fitted.values? from package ?stats?
in package ?globaltest?
> Creating a generic function for ?residuals? from package ?stats? in
package ?globaltest?
> Error in setMethod("combine", signature(x = "gt.result", y =
"gt.result"), :
> no existing definition for function ?combine?
> Error : unable to load R code in package ?globaltest?
> ERROR: lazy loading failed for package ?globaltest?
> * removing ?/usr/local/R-2.14.0/lib64/R/library/globaltest?
>
> The downloaded packages are in
> ?/tmp/RtmpdCwjNB/downloaded_packages?
> Updating HTML index of packages in '.Library'
> Making packages.html ... done
> Old packages: 'caret'
> Update all/some/none? [a/s/n]:
>
> ________________________________
>
> Information in this e-mail may be confidential. It is intended only
for the addressee(s) identified above. If you are not the
addressee(s), or an employee or agent of the addressee(s), please note
that any dissemination, distribution, or copying of this communication
is strictly prohibited. If you have received this e-mail in error,
please notify the sender of the error.
>
> [[alternative HTML version deleted]]
>
>
> ________________________________
>
> Information in this e-mail may be confidential. It is intended only
for the addressee(s) identified above. If you are not the
addressee(s), or an employee or agent of the addressee(s), please note
that any dissemination, distribution, or copying of this communication
is strictly prohibited. If you have received this e-mail in error,
please notify the sender of the error.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-
guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
--
Dr. Martin Morgan, PhD
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
________________________________
Information in this e-mail may be confidential. It is intended only
for the addressee(s) identified above. If you are not the
addressee(s), or an employee or agent of the addressee(s), please note
that any dissemination, distribution, or copying of this communication
is strictly prohibited. If you have received this e-mail in error,
please notify the sender of the error.
------------------------------
Message: 14
Date: Thu, 28 Mar 2013 12:26:11 -0700
From: Martin Morgan <mtmorgan@fhcrc.org>
To: "Shields, Rusty (IMS)" <shieldsr at="" imsweb.com="">
Cc: "bioconductor at r-project.org" <bioconductor at="" r-project.org="">
Subject: Re: [BioC] [R] Error in setMethod("combine"... was - Error
when installing globaltest package
Message-ID: <51549953.6080003 at fhcrc.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
On 03/28/2013 12:07 PM, Shields, Rusty (IMS) wrote:
> Thanks Martin,
>
> No apologies necessary, just happy to have some help.
>
> Here's the info on Biobase:
>
>> packageDescription("Biobase")
> Package: Biobase
> Title: Biobase: Base functions for Bioconductor
> Version: 2.18.0
yes, this is the version of Biobase from the current version of R /
Bioconductor, whereas you're using a relatively old R / Biocondcutor
(R 2.14,
Bioc 2.9, from the 'Previous versions' box at
http://bioconductor.org/help/)
where Biobase is at version 2.14. This doesn't bode well, as other
packages are
also likely at incorrect versions.
The easiest solution is to upgrade to a recent R, and start again with
biocLite('globaltest').
You could also try biocLite("Biobase") to get the correct version,
then
biocLite("globaltest"), but if other packages are also installed
incorrectly...
Something like
source("http://bioconductor.org/biocLite.R")
avail =
available.packages(contriburl=contrib.url(biocinstallRepos()))
inst = installed.packages(priority="NA")
idx = rownames(avail) %in% rownames(inst)
vers = avail[idx, "Version"]
vers[vers < inst[names(vers), "Version"]]
will give you the too-new packages that need to be re-installed.
It's hard to know how your system got to its current state; one
candidate is
that .libPaths() includes a path shared by several versions of R. but
using
biocLite() to install packages should help avoiding that in the
future.
Martin
> Author: R. Gentleman, V. Carey, M. Morgan, S. Falcon
> Description: Functions that are needed by many other packages or
which
> replace R functions.
> Suggests: tools, tkWidgets, ALL
> Depends: R (>= 2.10), BiocGenerics (>= 0.3.2), utils
> Imports: methods, BiocGenerics
> Maintainer: Bioconductor Package Maintainer
> <maintainer at="" bioconductor.org="">
> License: Artistic-2.0
> Collate: tools.R strings.R environment.R vignettes.R packages.R
> AllGenerics.R VersionsClass.R VersionedClasses.R
> methods-VersionsNull.R methods-VersionedClass.R
DataClasses.R
> methods-aggregator.R methods-container.R methods-MIAxE.R
> methods-MIAME.R methods-AssayData.R
> methods-AnnotatedDataFrame.R methods-eSet.R
> methods-ExpressionSet.R methods-MultiSet.R methods-SnpSet.R
> methods-NChannelSet.R anyMissing.R rowOp-methods.R
> updateObjectTo.R methods-ScalarObject.R zzz.R
> LazyLoad: yes
> biocViews: Infrastructure, Bioinformatics
> Packaged: 2012-10-02 02:41:50 UTC; biocbuild
> Built: R 2.14.0; x86_64-unknown-linux-gnu; 2013-03-22 15:01:20 UTC;
> unix
>
> -- File:
/usr/local/R-2.14.0/lib64/R/library/Biobase/Meta/package.rds
>
> -----Original Message-----
> From: Martin Morgan [mailto:mtmorgan at fhcrc.org]
> Sent: Thursday, March 28, 2013 2:52 PM
> To: Shields, Rusty (IMS)
> Cc: r-help at r-project.org
> Subject: Re: [R] Error in setMethod("combine"... was - Error when
installing globaltest package
>
> On 3/28/2013 11:05 AM, Shields, Rusty (IMS) wrote:
>> Hi All,
>>
>> I posted this on the bioconductor list and didn't get a response
there, so I'm hoping someone here can help.
>>
>> I don't know a heck of a lot about R, so I apologize if this seems
like a trivial issue. This error comes up when trying to install the
bioconductor globaltest package.
>>
>
> Sorry that you didn't get a response on the Bioc mailing list; I'd
actually
> suggest returning to the original thread there and I'll see that it
gets answered.
>
> My guess is that you have a version of Biobase that is too new
compared to the
> version (2.14.0) expected in the version of Bioconductor you are
using. What
> does packageDescription("Biobase") say?
>
> Martin
>
>
>
>
>> Any clues?
>>
>> Thanks!
>> Rusty
>>
>> -----Original Message-----
>> From: bioconductor-bounces at r-project.org [mailto:bioconductor-
bounces at r-project.org] On Behalf Of Shields, Rusty (IMS)
>> Sent: Tuesday, March 26, 2013 12:34 PM
>> To: bioconductor at r-project.org
>> Subject: [BioC] Error when installing globaltest package
>>
>> Hi all,
>>
>> I've run into a problem when attempting to install the globaltest
package from Bioconductor. I'm using R 2.14.0 on 64bit SLES 11. Let
me know what other information you might need about my system to
troubleshoot this.
>>
>> Using the method described for this installation on the
Bioconductor website:
>>
>> source("http://bioconductor.org/biocLite.R")
>> biocLite("globaltest")
>>
>> I get and the following result, which I can't find an reference to
on the list archives:
>>
>>> biocLite("globaltest")
>> BioC_mirror: 'http://www.bioconductor.org'
>> Using R version 2.14, BiocInstaller version 1.2.1.
>> Installing package(s) 'globaltest'
>> trying URL 'http://www.bioconductor.org/packages/2.9/bioc/src/contr
ib/globaltest_5.8.1.tar.gz'
>> Content type 'application/x-gzip' length 956376 bytes (933 Kb)
>> opened URL
>> ==================================================
>> downloaded 933 Kb
>>
>> * installing *source* package ?globaltest? ...
>> ** R
>> ** data
>> ** inst
>> ** preparing package for lazy loading
>> Warning in .simpleDuplicateClass(def, prev) :
>> A specification for class ?data.frameOrNULL?
>> Creating a generic
function for ?sort? from package ?base? in package ?globaltest?
>> Creating a generic function for ?model.matrix? from package ?stats?
in package ?globaltest?
>> Creating a generic function for ?coefficients? from package ?stats?
in package ?globaltest?
>> Creating a generic function for ?fitted.values? from package
?stats? in package ?globaltest?
>> Creating a generic function for ?residuals? from package ?stats? in
package ?globaltest?
>> Error in setMethod("combine", signature(x = "gt.result", y =
"gt.result"), :
>> no existing definition for function ?combine?
>> Error : unable to load R code in package ?globaltest?
>> ERROR: lazy loading failed for package ?globaltest?
>> * removing ?/usr/local/R-2.14.0/lib64/R/library/globaltest?
>>
>> The downloaded packages are in
>> ?/tmp/RtmpdCwjNB/downloaded_packages?
>> Updating HTML index of packages in '.Library'
>> Making packages.html ... done
>> Old packages: 'caret'
>> Update all/some/none? [a/s/n]:
>>
>> ________________________________
>>
>> Information in this e-mail may be confidential. It is intended only
for the addressee(s) identified above. If you are not the
addressee(s), or an employee or agent of the addressee(s), please note
that any dissemination, distribution, or copying of this communication
is strictly prohibited. If you have received this e-mail in error,
please notify the sender of the error.
>>
>> [[alternative HTML version deleted]]
>>
>>
>> ________________________________
>>
>> Information in this e-mail may be confidential. It is intended only
for the addressee(s) identified above. If you are not the
addressee(s), or an employee or agent of the addressee(s), please note
that any dissemination, distribution, or copying of this communication
is strictly prohibited. If you have received this e-mail in error,
please notify the sender of the error.
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-
guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>
> --
> Dr. Martin Morgan, PhD
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N.
> PO Box 19024 Seattle, WA 98109
>
> ________________________________
>
> Information in this e-mail may be confidential. It is intended only
for the addressee(s) identified above. If you are not the
addressee(s), or an employee or agent of the addressee(s), please note
that any dissemination, distribution, or copying of this communication
is strictly prohibited. If you have received this e-mail in error,
please notify the sender of the error.
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
http://news.gmane.org/gmane.science.biology.informatics.conductor
>
--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
Location: Arnold Building M1 B861
Phone: (206) 667-2793
------------------------------
Message: 15
Date: Thu, 28 Mar 2013 20:00:24 +0000
From: "Shields, Rusty (IMS)" <shieldsr@imsweb.com>
To: Martin Morgan <mtmorgan at="" fhcrc.org="">
Cc: "bioconductor at r-project.org" <bioconductor at="" r-project.org="">
Subject: Re: [BioC] [R] Error in setMethod("combine"... was - Error
when installing globaltest package
Message-ID:
<e2e839b5bfa711438b481ad9906b0c340a6a78ad at="" varuna.omni.imsweb.com="">
Content-Type: text/plain; charset="iso-8859-1"
Correcting the version of Biobase seems to have fixed it, and after
that it doesn't appear that anything else is of the wrong version -
based on the code that you provided below.
globaltest now installs successfully through BiocLite.
As for how the wrong version of Biobase was installed, I have to take
the blame for that. Rather than following the instructions on the
bioconductor website to install using BiocLite, I made the mistake of
downloading the package source manually and installing via 'R CMD
INSTALL ...'. I believe I used the biocLite method to install all of
the other packages. It has been a very educational few days.
Thanks again.
-----Original Message-----
From: Martin Morgan [mailto:mtmorgan@fhcrc.org]
Sent: Thursday, March 28, 2013 3:26 PM
To: Shields, Rusty (IMS)
Cc: bioconductor at r-project.org
Subject: Re: [BioC] [R] Error in setMethod("combine"... was - Error
when installing globaltest package
On 03/28/2013 12:07 PM, Shields, Rusty (IMS) wrote:
> Thanks Martin,
>
> No apologies necessary, just happy to have some help.
>
> Here's the info on Biobase:
>
>> packageDescription("Biobase")
> Package: Biobase
> Title: Biobase: Base functions for Bioconductor
> Version: 2.18.0
yes, this is the version of Biobase from the current version of R /
Bioconductor, whereas you're using a relatively old R / Biocondcutor
(R 2.14,
Bioc 2.9, from the 'Previous versions' box at
http://bioconductor.org/help/)
where Biobase is at version 2.14. This doesn't bode well, as other
packages are
also likely at incorrect versions.
The easiest solution is to upgrade to a recent R, and start again with
biocLite('globaltest').
You could also try biocLite("Biobase") to get the correct version,
then
biocLite("globaltest"), but if other packages are also installed
incorrectly...
Something like
source("http://bioconductor.org/biocLite.R")
avail =
available.packages(contriburl=contrib.url(biocinstallRepos()))
inst = installed.packages(priority="NA")
idx = rownames(avail) %in% rownames(inst)
vers = avail[idx, "Version"]
vers[vers < inst[names(vers), "Version"]]
will give you the too-new packages that need to be re-installed.
It's hard to know how your system got to its current state; one
candidate is
that .libPaths() includes a path shared by several versions of R. but
using
biocLite() to install packages should help avoiding that in the
future.
Martin
> Author: R. Gentleman, V. Carey, M. Morgan, S. Falcon
> Description: Functions that are needed by many other packages or
which
> replace R functions.
> Suggests: tools, tkWidgets, ALL
> Depends: R (>= 2.10), BiocGenerics (>= 0.3.2), utils
> Imports: methods, BiocGenerics
> Maintainer: Bioconductor Package Maintainer
> <maintainer at="" bioconductor.org="">
> License: Artistic-2.0
> Collate: tools.R strings.R environment.R vignettes.R packages.R
> AllGenerics.R VersionsClass.R VersionedClasses.R
> methods-VersionsNull.R methods-VersionedClass.R
DataClasses.R
> methods-aggregator.R methods-container.R methods-MIAxE.R
> methods-MIAME.R methods-AssayData.R
> methods-AnnotatedDataFrame.R methods-eSet.R
> methods-ExpressionSet.R methods-MultiSet.R methods-SnpSet.R
> methods-NChannelSet.R anyMissing.R rowOp-methods.R
> updateObjectTo.R methods-ScalarObject.R zzz.R
> LazyLoad: yes
> biocViews: Infrastructure, Bioinformatics
> Packaged: 2012-10-02 02:41:50 UTC; biocbuild
> Built: R 2.14.0; x86_64-unknown-linux-gnu; 2013-03-22 15:01:20 UTC;
> unix
>
> -- File:
/usr/local/R-2.14.0/lib64/R/library/Biobase/Meta/package.rds
>
> -----Original Message-----
> From: Martin Morgan [mailto:mtmorgan at fhcrc.org]
> Sent: Thursday, March 28, 2013 2:52 PM
> To: Shields, Rusty (IMS)
> Cc: r-help at r-project.org
> Subject: Re: [R] Error in setMethod("combine"... was - Error when
installing globaltest package
>
> On 3/28/2013 11:05 AM, Shields, Rusty (IMS) wrote:
>> Hi All,
>>
>> I posted this on the bioconductor list and didn't get a response
there, so I'm hoping someone here can help.
>>
>> I don't know a heck of a lot about R, so I apologize if this seems
like a trivial issue. This error comes up when trying to install the
bioconductor globaltest package.
>>
>
> Sorry that you didn't get a response on the Bioc mailing list; I'd
actually
> suggest returning to the original thread there and I'll see that it
gets answered.
>
> My guess is that you have a version of Biobase that is too new
compared to the
> version (2.14.0) expected in the version of Bioconductor you are
using. What
> does packageDescription("Biobase") say?
>
> Martin
>
>
>
>
>> Any clues?
>>
>> Thanks!
>> Rusty
>>
>> -----Original Message-----
>> From: bioconductor-bounces at r-project.org [mailto:bioconductor-
bounces at r-project.org] On Behalf Of Shields, Rusty (IMS)
>> Sent: Tuesday, March 26, 2013 12:34 PM
>> To: bioconductor at r-project.org
>> Subject: [BioC] Error when installing globaltest package
>>
>> Hi all,
>>
>> I've run into a problem when attempting to install the globaltest
package from Bioconductor. I'm using R 2.14.0 on 64bit SLES 11. Let
me know what other information you might need about my system to
troubleshoot this.
>>
>> Using the method described for this installation on the
Bioconductor website:
>>
>> source("http://bioconductor.org/biocLite.R")
>> biocLite("globaltest")
>>
>> I get and the following result, which I can't find an reference to
on the list archives:
>>
>>> biocLite("globaltest")
>> BioC_mirror: 'http://www.bioconductor.org'
>> Using R version 2.14, BiocInstaller version 1.2.1.
>> Installing package(s) 'globaltest'
>> trying URL 'http://www.bioconductor.org/packages/2.9/bioc/src/contr
ib/globaltest_5.8.1.tar.gz'
>> Content type 'application/x-gzip' length 956376 bytes (933 Kb)
>> opened URL
>> ==================================================
>> downloaded 933 Kb
>>
>> * installing *source* package ?globaltest? ...
>> ** R
>> ** data
>> ** inst
>> ** preparing package for lazy loading
>> Warning in .simpleDuplicateClass(def, prev) :
>> A specification for class ?data.frameOrNULL?
>> Creating a generic
function for ?sort? from package ?base? in package ?globaltest?
>> Creating a generic function for ?model.matrix? from package ?stats?
in package ?globaltest?
>> Creating a generic function for ?coefficients? from package ?stats?
in package ?globaltest?
>> Creating a generic function for ?fitted.values? from package
?stats? in package ?globaltest?
>> Creating a generic function for ?residuals? from package ?stats? in
package ?globaltest?
>> Error in setMethod("combine", signature(x = "gt.result", y =
"gt.result"), :
>> no existing definition for function ?combine?
>> Error : unable to load R code in package ?globaltest?
>> ERROR: lazy loading failed for package ?globaltest?
>> * removing ?/usr/local/R-2.14.0/lib64/R/library/globaltest?
>>
>> The downloaded packages are in
>> ?/tmp/RtmpdCwjNB/downloaded_packages?
>> Updating HTML index of packages in '.Library'
>> Making packages.html ... done
>> Old packages: 'caret'
>> Update all/some/none? [a/s/n]:
>>
>> ________________________________
>>
>> Information in this e-mail may be confidential. It is intended only
for the addressee(s) identified above. If you are not the
addressee(s), or an employee or agent of the addressee(s), please note
that any dissemination, distribution, or copying of this communication
is strictly prohibited. If you have received this e-mail in error,
please notify the sender of the error.
>>
>> [[alternative HTML version deleted]]
>>
>>
>> ________________________________
>>
>> Information in this e-mail may be confidential. It is intended only
for the addressee(s) identified above. If you are not the
addressee(s), or an employee or agent of the addressee(s), please note
that any dissemination, distribution, or copying of this communication
is strictly prohibited. If you have received this e-mail in error,
please notify the sender of the error.
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-
guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>
> --
> Dr. Martin Morgan, PhD
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N.
> PO Box 19024 Seattle, WA 98109
>
> ________________________________
>
> Information in this e-mail may be confidential. It is intended only
for the addressee(s) identified above. If you are not the
addressee(s), or an employee or agent of the addressee(s), please note
that any dissemination, distribution, or copying of this communication
is strictly prohibited. If you have received this e-mail in error,
please notify the sender of the error.
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
http://news.gmane.org/gmane.science.biology.informatics.conductor
>
--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
Location: Arnold Building M1 B861
Phone: (206) 667-2793
________________________________
Information in this e-mail may be confidential. It is intended only
for the addressee(s) identified above. If you are not the
addressee(s), or an employee or agent of the addressee(s), please note
that any dissemination, distribution, or copying of this communication
is strictly prohibited. If you have received this e-mail in error,
please notify the sender of the error.
------------------------------
Message: 16
Date: Thu, 28 Mar 2013 22:19:57 +0100
From: Michael Love <michaelisaiahlove@gmail.com>
To: Michael Muratet <mmuratet at="" hudsonalpha.org="">
Cc: bioconductor at r-project.org
Subject: Re: [BioC] Interpreting DESeq2 results
Message-ID:
<cadqzidv-qp+=nm_goxj2aso+2maevq_i_y0mfgu1vv3o8fzy1g at="" mail.gmail.com="">
Content-Type: text/plain
Hi Michael,
The baseMean column is not on the log scale; it is the mean of
normalized
counts for a gene. The intercept from the GLM is labelled intercept in
mcols(dse).
Mike
On Mar 28, 2013 5:00 PM, "Michael Muratet" <mmuratet at="" hudsonalpha.org=""> wrote:
> Greetings
>
> I have an experiment:
>
> > design(dse)
> ~ factor1 + factor2 + factor3
>
> where factor1 has two levels, factor2 has three levels and factor3
has
> three levels. I extract a gene of interest from the results for each
term
> (I've changed the indices to reflect the condition):
>
> > lapply(resultsNames(dse),function(u) results(dse,u)["gene_A",])
> [["Intercept"]]
> baseMean log2FoldChange pvalue FDR
> gene_A 1596.548 10.77485 3.309439e-216 7.025442e-216
> [["factor1_level2"]]
> baseMean log2FoldChange pvalue FDR
> gene_A 1596.548 0.3386776 0.1307309 0.3587438
> [["factor2_level2"]]
> baseMean log2FoldChange pvalue FDR
> gene_A 1596.548 -0.6882543 0.0613569 0.1007896
> [["factor2_level3"]]
> baseMean log2FoldChange pvalue FDR
> gene_A 1596.548 0.2393368 0.513216 0.6589575
> [["factor3_level2"]]
> baseMean log2FoldChange pvalue FDR
> gene_A 1596.548 0.1584153 0.6423634 0.8503163
> [["factor3_level3]]
> baseMean log2FoldChange pvalue FDR
> gene_A 1596.548 -1.627898 1.823141e-06 0.001409384
>
> I want to be sure I understand the output format. Is it true that
the
> coefficients (the vector beta) from the fit are the baseMean value
scaled
> by the log2FoldChange? Is the true intercept value
> 1596.548*2^10.77485=2797274.13?
>
> mcols() tells me that the baseMean term is calculated over "all
rows". The
> baseMean is different for different genes although it is the same
for each
> gene across all the conditions, I'm not seeing how the rows are
selected.
>
> Thanks
>
> Mike
>
> Michael Muratet, Ph.D.
> Senior Scientist
> HudsonAlpha Institute for Biotechnology
> mmuratet at hudsonalpha.org
> (256) 327-0473 (p)
> (256) 327-0966 (f)
>
> Room 4005
> 601 Genome Way
> Huntsville, Alabama 35806
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
[[alternative HTML version deleted]]
------------------------------
Message: 17
Date: Thu, 28 Mar 2013 17:50:04 -0400
From: somnath bandyopadhyay <genome1976@hotmail.com>
Cc: "bioconductor at r-project.org" <bioconductor at="" r-project.org="">
Subject: Re: [BioC] Interpreting DESeq2 results
Message-ID: <bay175-w24d794c2988d81c509ef9fcdd20 at="" phx.gbl="">
Content-Type: text/plain
Will the LIMMA package work on sparse matrices?
Thanks,Som.
[[alternative HTML version deleted]]
------------------------------
_______________________________________________
Bioconductor mailing list
Bioconductor at r-project.org
https://stat.ethz.ch/mailman/listinfo/bioconductor
End of Bioconductor Digest, Vol 121, Issue 29