Question: Probe summarization of Human HT-12 V4 BeadChip arrays
11 months ago by Seymoo, Oslo
Seymoo wrote:

I am not familiar with Illumina arrays and need some hints, because I am trying to work with a data set from the Human HT-12 V4 BeadChip array deposited at GEO: "GSE73255".

I am following two approaches to get the data. The first is:

library(GEOquery)

gse <- getGEO(filename=filenm)

head(exprs(gse))

The second is the approach explained on GEO:

gset <- getGEO("GSE73255", GSEMatrix =TRUE, getGPL=FALSE)
if (length(gset) > 1) idx <- grep("GPL6947", attr(gset, "names")) else idx <- 1
gset <- gset[[idx]]
gset <- exprs(gset)


Based on pData(gset)$data_processing, this file has been normalized with the Bioconductor (3.0) lumi pipeline using loess normalization, if I am not mistaken. When I try to summarize the expression to have one probe per gene using beadarray as follows:

library("illuminaHumanv4.db")
library(beadarray)
summaryData <- as(gse, "ExpressionSetIllumina")

or

summaryData <- as(gset, "ExpressionSetIllumina")

I get this error in R:

Error in object@channelData[[1]] : subscript out of bounds

What am I doing wrong at this stage? I would also like to know whether I can use the raw data and perform RMA normalization on this type of data.

I would appreciate it if anyone could help me with the answer.

modified 11 months ago by James W. MacDonald • written 11 months ago by Seymoo

I can only say that I use limma code for importing and normalising this type of array and then use the limma avereps function to average to genes. It is easy; I never got on well with beadarray for some reason.

written 11 months ago by chris86

Thanks for the hints, Chris! I have always worked with Affy arrays, so I do not know much about BeadArrays. But I am going to look into what you have suggested.

modified 11 months ago • written 11 months ago by Seymoo

What version of Bioconductor are you using? It seems fine for me on Bioconductor 3.6.

library(GEOquery)
gse <- getGEO("GSE33126")[[1]]
eset <- as(gse, "ExpressionSetIllumina")
sessionInfo()

I wouldn't recommend averaging the probes for the same gene, though. Some of the probes on these arrays can be badly annotated, so by averaging you can dilute the signal for the gene. If you really want one measurement for a gene, what I usually do is pick the probe with the highest variance. By converting the GEOquery object to a beadarray one, you get all the information about the probe annotation:

table(fData(eset)$PROBEQUALITY)


Hi @Mark,

I am using

BioC_mirror: https://bioconductor.org
Using Bioconductor 3.4 (BiocInstaller 1.24.0), R 3.3.2 (2016-10-31)

I have not managed to solve the problem yet. I managed to download the data with

gse <- getGEO("GSE73255", GSEMatrix = FALSE)

but

eset <- as(gse, "ExpressionSetIllumina")

gives the previous error!
Would it be possible for you to try with GSE73255 instead? I would also appreciate it if you could explain how I can pick the probe with the highest variance for each gene.
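
For reference, the "pick the probe with the highest variance" suggestion can be sketched in base R as below. This is only an illustration on a made-up toy matrix; the probe names, gene labels and values are all hypothetical. With real data, the expression matrix would come from exprs() and the gene labels from a probe annotation column (which column holds the gene symbol is an assumption to check against fData for your object).

```r
# Toy log-expression matrix: 5 probes x 4 samples (all values are made up).
expr <- matrix(
  c(7.1, 7.0, 7.2, 7.1,   # p1, gene A, low variance
    6.0, 9.0, 5.5, 9.5,   # p2, gene A, high variance
    8.0, 8.1, 8.0, 8.2,   # p3, gene B (only probe for this gene)
    5.0, 6.0, 7.0, 8.0,   # p4, gene C, high variance
    5.0, 5.1, 5.0, 5.1),  # p5, gene C, low variance
  nrow = 5, byrow = TRUE,
  dimnames = list(paste0("p", 1:5), paste0("s", 1:4))
)

# Hypothetical probe-to-gene mapping, one entry per row of expr.
gene <- c("A", "A", "B", "C", "C")

# Variance of each probe across samples.
probe_var <- apply(expr, 1, var)

# For each gene, keep the name of the probe with the largest variance.
# split() silently drops probes whose gene label is NA.
best_probe <- vapply(
  split(names(probe_var), gene),
  function(p) p[which.max(probe_var[p])],
  character(1)
)

# One row per gene, labelled by gene.
expr_gene <- expr[best_probe, , drop = FALSE]
rownames(expr_gene) <- names(best_probe)
```

Unlike averaging, this keeps a single measured probe per gene, so a poorly annotated probe does not dilute the signal of a good one. Filtering on the probe quality annotation before this step is a separate choice.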

11 months ago by James W. MacDonald, United States
James W. MacDonald wrote:

These data are from Illumina arrays, so by definition you cannot run RMA! That algorithm is intended for Affymetrix arrays, not Illumina.

You can use getGEOSuppFiles to download the raw data, but those data are simply a file where they have summarized the beads to an average detection value, as well as the detection p-value, so you don't get the IDAT files, and you will have to figure out how to stuff those data into a useful container. Probably the easiest thing to do would be to extract the AVG_Signal columns, put them into a limma EList object, and then normalize using a loess normalization.
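
A minimal sketch of that last step, assuming the supplementary table has already been read into a data frame. The data frame below is a made-up stand-in: the ID_REF column, the sample prefixes and all values are hypothetical, and the real file layout for GSE73255 has to be checked by hand. The sketch normalizes the log2 matrix with limma's cyclic loess first and then wraps the result in an EList, which is a small reordering of the steps described above.

```r
library(limma)

# Made-up stand-in for the non-normalized supplementary table:
# one AVG_Signal and one detection p-value column per sample.
set.seed(1)
n <- 200
raw <- data.frame(
  ID_REF            = paste0("probe", seq_len(n)),  # hypothetical probe IDs
  S1.AVG_Signal     = rexp(n, 1 / 200) + 50,
  S1.Detection.Pval = runif(n),
  S2.AVG_Signal     = rexp(n, 1 / 250) + 50,
  S2.Detection.Pval = runif(n),
  S3.AVG_Signal     = rexp(n, 1 / 180) + 50,
  S3.Detection.Pval = runif(n)
)

# Extract the AVG_Signal columns and move to the log2 scale.
sig_cols <- grep("AVG_Signal", colnames(raw), fixed = TRUE)
E <- log2(as.matrix(raw[, sig_cols]))
rownames(E) <- raw$ID_REF

# Cyclic loess normalization between arrays on the log2 matrix.
E_norm <- normalizeBetweenArrays(E, method = "cyclicloess")

# Store the normalized values in a limma EList container.
el <- new("EList", list(E = E_norm,
                        genes = data.frame(ProbeID = raw$ID_REF)))
```

The resulting EList can be passed to lmFit like any other limma expression object. No background correction is done here; whether that is appropriate depends on how the deposited values were produced.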

If you don't know what all that means, you would be better off to find somebody local who can help, as this is a non-trivial exercise for a newcomer.