Question: Agi4x44PreProcess for Bioconductor version 3.0

Hello everybody!

I am trying to install Agi4x44PreProcess on a Windows machine using R version 3.1.2 and Bioconductor version 3.0 (BiocInstaller 1.16.1).

As I understand it, Agi4x44PreProcess depends on R (>= 2.10), yet I am unable to install it.
The error message reads: "package ‘Agi4x44PreProcess’ is not available (for R version 3.1.2)"

Is there a known workaround that provides the functions of this package? Or is downgrading R to version 2.10 the only option available?

Thank you in advance,
D.L. Lindenwald

ADD COMMENTlink modified 2.4 years ago • written 2.4 years ago by dimitri.leonid.lindenwald50
United States
James W. MacDonald43k wrote:

The package you are looking for was removed from Bioconductor as of 2.14 (so the last version of Bioc to contain this package was two releases ago). Not sure why, but do note that the limma package does pretty much everything that Agi4x44PreProcess did, so you could just use limma instead.

ADD COMMENTlink written 2.4 years ago by James W. MacDonald43k

Mr. MacDonald, thank you very much for the quick response.

The function CV.rep.probes (which lets you assess reproducibility) is something I have only found in the Agi4x44PreProcess package. Are other procedures available to compute the coefficient of variation from replicated non-control probes?

ADD REPLYlink modified 2.4 years ago by Gordon Smyth30k • written 2.4 years ago by dimitri.leonid.lindenwald50

I don't know of one, but it wouldn't be difficult to roll your own. Say we have an EList that we created using read.maimages(), and we called it dat.olf.

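> # Flag the non-control probes (ControlType 0)
> ind <- dat.olf$genes$ControlType %in% 0
> # Probe names of the non-control probes
> z <- dat.olf$genes$ProbeName[ind]
> # Group row numbers by probe name (assumes the row names are the row numbers)
> zz <- split(as.numeric(row.names(dat.olf$genes))[ind],z)
> # Keep probes spotted exactly five times (specific to this array design)
> zzz <- zz[sapply(zz, length) == 5]
> # For each replicated probe, the per-array CV: sd divided by mean
> cvs <- lapply(zzz, function(x) apply(dat.olf$E[x,], 2, sd, na.rm = TRUE)/colMeans(dat.olf$E[x,]))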
> cvs <-
ADD REPLYlink written 2.4 years ago by James W. MacDonald43k
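
A self-contained sketch of the approach described above, run on simulated data rather than a real EList. The object `dat`, the probe names, and the final median-per-array summary are assumptions made for illustration; the truncated last line of the original snippet is not reconstructed here. The sketch keeps every probe that occurs more than once instead of requiring exactly five replicates, and it uses `which(ind)` so it does not rely on numeric row names.

```r
# Simulated stand-in for an EList-like object (assumption: real data would
# come from read.maimages(); only the pieces used below are mimicked).
set.seed(1)
probe_names <- c(rep(c("P1", "P2"), each = 5), paste0("S", 1:6), rep("CTRL", 4))
control_type <- c(rep(0, 16), rep(1, 4))
E <- matrix(rnorm(length(probe_names) * 3, mean = 100, sd = 5),
            ncol = 3, dimnames = list(NULL, c("arr1", "arr2", "arr3")))
dat <- list(E = E,
            genes = data.frame(ProbeName = probe_names,
                               ControlType = control_type,
                               stringsAsFactors = FALSE))

# Row indices of the non-control probes, grouped by probe name.
ind <- dat$genes$ControlType %in% 0
rows_by_probe <- split(which(ind), dat$genes$ProbeName[ind])

# Keep only probes that occur more than once on the array.
n_spots <- vapply(rows_by_probe, length, integer(1))
replicated <- rows_by_probe[n_spots > 1]

# CV per replicated probe and array: sd / mean over the replicate spots.
cv_one_probe <- function(rows) {
  x <- dat$E[rows, , drop = FALSE]
  apply(x, 2, sd, na.rm = TRUE) / colMeans(x, na.rm = TRUE)
}
cv_mat <- do.call(rbind, lapply(replicated, cv_one_probe))

# One summary value per array: the median CV across replicated probes.
median_cv <- apply(cv_mat, 2, median, na.rm = TRUE)
print(median_cv)
```

Note that a CV computed on log2 intensities is not directly comparable to one computed on raw intensities, so it is worth checking which scale the expression matrix is on before interpreting the values.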

Great! Thank you, you are very helpful!
Being a beginner I will need some time to fully understand what you did here :)

Edit_01: especially concerning the following line
 zzz <- zz[sapply(zz, length) == 5]

It returns a named list of length zero, probably because "sapply(zz, length) == 5" yields the logical value FALSE for every probe name.

ADD REPLYlink modified 2.4 years ago • written 2.4 years ago by dimitri.leonid.lindenwald50

Right. I am using a different array than you are, so I should have made that a bit more robust to different arrays. For my array, the duplicated (non-control) probes are all repeated five times each, so that makes sense for me. However, your array is obviously different. If you do

table(sapply(zz, length))

You will get a named vector, where the names correspond to the number of repeats, and the values correspond to the number of times a given number of repeats were observed. So for my array, I get

> table(sapply(zz, length))

1        5
14356    50

Which means that I have 14,356 probes that are only found one time on the array, and 50 probes that are found 5 times each. To make my code more portable, I could have done

reps <- as.numeric(names(table(sapply(zz, length))))
reps <- reps[reps > 1]
zzz <- zz[sapply(zz, length) %in% reps]


ADD REPLYlink written 2.4 years ago by James W. MacDonald43k
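
To make the tabulation step above concrete, here is a small sketch on a made-up `zz` list (the probe labels and row indices are hypothetical). It shows the approach from the post next to a shorter form that keeps any probe with more than one spot; on this toy input both give the same result.

```r
# Hypothetical row-index lists, standing in for the 'zz' object above.
zz <- list(A = 1:5, B = 6L, C = 7:9, D = 10L, E = 11:15)

# Number of spots per probe, and how often each count occurs.
n_reps <- sapply(zz, length)
print(table(n_reps))

# Approach from the post: collect every repeat count greater than one.
reps <- as.numeric(names(table(n_reps)))
reps <- reps[reps > 1]
zzz <- zz[n_reps %in% reps]

# Shorter form: keep any probe spotted more than once.
zzz_short <- zz[n_reps > 1]
stopifnot(identical(zzz, zzz_short))
```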

Now I understand a bit more! Thank you a lot for your time.

However, the lapply step cvs <- lapply(...) produced the error "dim(X) must have a positive length". I looked for the source of the error and found that $E does not correspond to any data in my RGList.

My intuitive approach was to change $E to $R for gProcessedSignal. This produces a data frame listing each microarray with a corresponding numerical value (the median CV).

The confusing part is that my microarrays are printed multiple times with varying median CVs, in batches separated by strings like this:

can 1W G                                                              0.08405851
can 1W K                                                              0.07700873
can 2W G                                                              0.14134329
can 2W K                                                              0.13306410
can 3W G                                                              0.25818046
can 3W K                                                              0.18594718
cat 1W G                                                              0.18594324
cat 1W K                                                              0.09951136
cat 2W G                                                              0.29593127
cat 2W K                                                              0.30717095
cat 3W G                                                              0.45109317
cat 3W K                                                              0.09883565
Kon 1M G                                                              0.04542627
Kon 1W K                                                              0.34784651
Kon 2W G                                                              0.14887225
Kon 2W K                                                              0.04292610
Kon 3W G                                                              0.29127059
Kon 3W K                                                              0.08773394
can 1W G                                                               0.02561703
can 1W K                                                               0.03003305
can 2W G                                 

Was using $R wrong? Or does my result still make sense?

ADD REPLYlink written 2.4 years ago by dimitri.leonid.lindenwald50
Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
Gordon Smyth30k wrote:

Two points.

First, plots are better than numerical summaries, so (using limma and continuing James' example) try this:

plotMA(dat.olf, array=1, status=dat.olf$genes$ControlType)
plotMA(dat.olf, array=2, status=dat.olf$genes$ControlType)

and so on for all the arrays in your experiment.

Second, limma assesses reproducibility from the standard deviations of the normalized log2-intensities as part of the DE analysis. So reproducibility will be evaluated directly anyway.


ADD COMMENTlink modified 2.4 years ago • written 2.4 years ago by Gordon Smyth30k

Mr. Smyth, thank you for the answer!

I intended to use reproducibility as a means of identifying outliers or arrays of insufficient quality prior to further analysis (time-course experiment).
Interpreting MA-plots for this purpose seems a bit difficult to me: they are very visual, but lack a definitive criterion (like a CV value) for choosing one array over another... Or am I wrong?

ADD REPLYlink written 2.4 years ago by dimitri.leonid.lindenwald50

It's not really necessary to remove low quality arrays from an analysis if you are using limma. Simply use arrayWeights() to get estimates of array-level weights, and feed those values into your call to lmFit(). This eliminates the need to make ad hoc decisions about which arrays to exclude.

And note that you shouldn't be using an RGList for this step, particularly if you are using Agilent 4x44 arrays (which are in my experience single color), so you should be reading in using read.maimages() with the argument green.only = TRUE.

I find it useful to do PCA plots of the arrays, and then append the array weights to the plot to see which arrays are being down-weighted. You can do this using the affycoretools package:

wts <- arrayWeights(dat.olf, design)


plotPCA(dat.olf$E, <other arguments>, addtext = round(wts, 2))
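
A minimal sketch of the weighting step described above, assuming limma is installed. The expression matrix and the two-group design are simulated stand-ins, not the poster's data; in a real analysis `E` would be the normalized log2 intensities and `design` would reflect the actual experiment.

```r
library(limma)

# Simulated log2 expression matrix: 200 probes x 6 arrays (assumption:
# stands in for dat.olf$E after background correction and normalization).
set.seed(42)
E <- matrix(rnorm(200 * 6, mean = 8, sd = 1), nrow = 200,
            dimnames = list(paste0("probe", 1:200), paste0("arr", 1:6)))
E[, 6] <- E[, 6] + rnorm(200, sd = 3)   # make the last array noisier

# Hypothetical two-group design, three arrays per group.
group <- factor(rep(c("ctrl", "trt"), each = 3))
design <- model.matrix(~ group)

# Estimate one quality weight per array, then pass the weights to lmFit().
wts <- arrayWeights(E, design)
fit <- lmFit(E, design, weights = wts)
fit <- eBayes(fit)

print(round(wts, 2))
```

One would expect the deliberately noisy array to receive a lower weight than the others, which is how a poor-quality array gets down-weighted instead of being dropped by hand.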


ADD REPLYlink written 2.4 years ago by James W. MacDonald43k