Agi4x44PreProcess for Bioconductor version 3.0
@dimitrileonidlindenwald-7183
Germany

Hello everybody!

I am trying to install Agi4x44PreProcess on a Windows machine using R version 3.1.2 and Bioconductor version 3.0 (BiocInstaller 1.16.1).

As I understand it, Agi4x44PreProcess depends on R (>= 2.10), yet I am unable to install it.
The error message reads: "package ‘Agi4x44PreProcess’ is not available (for R version 3.1.2)"

Is there a known workaround that provides the functions of this package? Or is downgrading R to version 2.10 the only option available?

Thank you in advance,
D.L. Lindenwald
 

@james-w-macdonald-5106
United States

The package you are looking for was removed from Bioconductor as of 2.14 (so the last version of Bioc to contain this package was two releases ago). Not sure why, but do note that the limma package does pretty much everything that Agi4x44PreProcess did, so you could just use limma instead.


Mr. MacDonald, thank you very much for the quick response.

The function CV.rep.probes (which lets you assess reproducibility) is something I have only found in the Agi4x44PreProcess package. Are there other procedures available to compute the coefficient of variation from replicated non-control probes?


I don't know of one, but it wouldn't be difficult to roll your own. Say we have an EList that we created using read.maimages(), and we called it dat.olf.

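> ind <- dat.olf$genes$ControlType %in% 0   # TRUE for non-control probes
> z <- dat.olf$genes$ProbeName[ind]
> zz <- split(which(ind), z)   # row indices of each probe, grouped by probe name
> zzz <- zz[sapply(zz, length) == 5]   # probes replicated five times (true for my array)
> # per-array CV (sd/mean) across the replicates of each probe
> cvs <- lapply(zzz, function(x) apply(dat.olf$E[x,], 2, sd, na.rm = TRUE)/colMeans(dat.olf$E[x,], na.rm = TRUE))
> cvs <- as.data.frame(cvs)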
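
For anyone who wants to try the idea without an Agilent data set at hand, here is a minimal base-R sketch on made-up data. The objects `genes` and `E` are hypothetical stand-ins for `dat.olf$genes` and `dat.olf$E`, and the final median step is only one possible way to get a single number per array; it is not what CV.rep.probes is confirmed to do.

```r
# Toy stand-ins for dat.olf$genes and dat.olf$E (simulated, not real array data)
set.seed(1)
genes <- data.frame(
  ProbeName   = c(rep("A_1", 5), rep("A_2", 5), "A_3", "ctrl"),
  ControlType = c(rep(0, 11), 1),
  stringsAsFactors = FALSE
)
E <- matrix(rnorm(12 * 3, mean = 10, sd = 0.5), nrow = 12,
            dimnames = list(NULL, c("array1", "array2", "array3")))

ind  <- genes$ControlType %in% 0                 # non-control probes
rows <- split(which(ind), genes$ProbeName[ind])  # row indices per probe name
rows <- rows[lengths(rows) > 1]                  # keep replicated probes only

# CV = sd / mean; result has one row per probe and one column per array
cv <- t(sapply(rows, function(i) {
  apply(E[i, , drop = FALSE], 2, sd, na.rm = TRUE) /
    colMeans(E[i, , drop = FALSE], na.rm = TRUE)
}))

median_cv <- apply(cv, 2, median)                # one summary value per array
```

Using `drop = FALSE` keeps the subset a matrix even when a probe has a single row, which avoids "dim(X) must have a positive length" errors from apply().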

Great! Thank you, you are very helpful!
Being a beginner, I will need some time to fully understand what you did here :)


Edit_01: especially concerning the following line
 zzz <- zz[sapply(zz, length) == 5]

It returns a named list of length zero, probably because "sapply(zz, length) == 5" produces only probe names paired with the logical value FALSE.


Right. I am using a different array than you are, so I should have made that a bit more robust to different arrays. For my array, the duplicated (non-control) probes are all repeated five times each, so that makes sense for me. However, your array is obviously different. If you do

table(sapply(zz, length))

You will get a named vector, where the names correspond to the number of repeats, and the values correspond to the number of times a given number of repeats were observed. So for my array, I get

> table(sapply(zz, length))

    1     5
14356    50

Which means that I have 14,356 probes that are only found one time on the array, and 50 probes that are found 5 times each. To make my code more portable, I could have done

reps <- as.numeric(names(table(sapply(zz, length))))
reps <- reps[reps > 1]
zzz <- zz[sapply(zz, length) %in% reps]
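
As a quick illustration of the selection above, here is a small sketch on a made-up `zz` (the probe names and row indices are hypothetical). It also shows that filtering on "more than one copy" directly gives the same result as going through the names of the table.

```r
# Hypothetical split() result: probe name -> row indices on the array
zz <- list(A_1 = 1:5, A_2 = 6:15, A_3 = 16L, A_4 = 17L)

n_reps <- lengths(zz)   # number of copies of each probe
table(n_reps)           # how many probes occur once, five times, ten times

# Selection as written in the post
reps <- as.numeric(names(table(n_reps)))
reps <- reps[reps > 1]
zzz  <- zz[n_reps %in% reps]

# Equivalent shortcut: keep every probe that occurs more than once
zzz2 <- zz[n_reps > 1]
names(zzz)
```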

 


Now I understand a bit more! Thank you a lot for your time.

However, the lapply step cvs <- lapply(...) produced the error "dim(X) must have a positive length". I looked for the source of the error and found that the component $E does not correspond to any data in my RGList.

My intuitive approach was to change $E to $R, which holds gProcessedSignal. This produces a data frame listing each of my microarrays with a corresponding numerical value (the median CV).

The confusing part is that my microarrays are printed multiple times with varying median CVs, in batches separated by strings like this:

         structure.c.0.0840585083911646..0.0770087297194007..0.141343292808047..
can 1W G                                                              0.08405851
can 1W K                                                              0.07700873
can 2W G                                                              0.14134329
can 2W K                                                              0.13306410
can 3W G                                                              0.25818046
can 3W K                                                              0.18594718
cat 1W G                                                              0.18594324
cat 1W K                                                              0.09951136
cat 2W G                                                              0.29593127
cat 2W K                                                              0.30717095
cat 3W G                                                              0.45109317
cat 3W K                                                              0.09883565
Kon 1M G                                                              0.04542627
Kon 1W K                                                              0.34784651
Kon 2W G                                                              0.14887225
Kon 2W K                                                              0.04292610
Kon 3W G                                                              0.29127059
Kon 3W K                                                              0.08773394
         structure.c.0.0256170315904253..0.0300330532944272..0.0228013445289736..
can 1W G                                                               0.02561703
can 1W K                                                               0.03003305
can 2W G                                 

Was using $R wrong? Or does my result still make sense?

@gordon-smyth
WEHI, Melbourne, Australia

Two points.

First, plots are better than numerical summaries, so (using limma and continuing James' example) try this:

plotMA(dat.olf, array=1, status=dat.olf$genes$ControlType)
plotMA(dat.olf, array=2, status=dat.olf$genes$ControlType)

and so on for all the arrays in your experiment.

Second, limma assesses reproducibility from the standard deviations of the normalized log2-intensities as part of the DE analysis. So reproducibility will be evaluated directly anyway.

 


Mr. Smyth, thank you for the answer!

I intended to use reproducibility as a means of identifying outliers or arrays of insufficient quality prior to further analysis (time-course experiment).
Interpreting MA-plots for this purpose seems a bit difficult to me: they are very visual, but they lack a definitive criterion (like a CV value) for choosing one array over another... Or am I wrong?


It's not really necessary to remove low quality arrays from an analysis if you are using limma. Simply use arrayWeights() to get estimates of array-level weights, and feed those values into your call to lmFit(). This eliminates the need to make ad hoc decisions about which arrays to exclude.

And note that you shouldn't be using an RGList for this step, particularly if you are using Agilent 4x44 arrays (which are in my experience single color), so you should be reading in using read.maimages() with the argument green.only = TRUE.

I find it useful to do PCA plots of the arrays, and then append the array weights to the plot to see which arrays are being down-weighted. You can do this using the affycoretools package:

wts <- arrayWeights(dat.olf, design)

library(affycoretools)

plotPCA(dat.olf$E, <other arguments>, addtext = round(wts, 2))
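
To see the down-weighting in action without real arrays, here is a minimal sketch on simulated log2 intensities. The matrix `E`, the two-group `design`, and the extra noise added to the sixth array are all assumptions made for illustration; with real data you would pass your own EList and design matrix instead.

```r
library(limma)

set.seed(42)
# Simulated log2 intensities: 200 probes x 6 arrays, two groups of three
E <- matrix(rnorm(200 * 6, mean = 8, sd = 0.3), nrow = 200,
            dimnames = list(paste0("probe", 1:200), paste0("array", 1:6)))
E[, 6] <- E[, 6] + rnorm(200, sd = 1.5)   # make the last array much noisier

group  <- factor(rep(c("ctrl", "trt"), each = 3))
design <- model.matrix(~ group)

wts <- arrayWeights(E, design)            # one relative quality weight per array
fit <- lmFit(E, design, weights = wts)    # noisy arrays are down-weighted, not dropped
fit <- eBayes(fit)

round(wts, 2)                             # the noisy array should get the smallest weight
```

The point of the design is that no array has to be thrown away on the basis of a cutoff: an array that disagrees with the others simply contributes less to the fitted model.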


 
