Question: select negative control probes with GEOquery
2.4 years ago by
goldberg.jm10 wrote:

Hi All,

Using GEOquery and MBCB, I would like to background correct raw data from an experiment on the GPL10558 platform. Normalized data and metadata were obtained using gset <- getGEO(accession, GSEMatrix = TRUE). Raw data were obtained using getGEOSuppFiles(accession, baseDir = base_directory) followed by not_normal <- read.table(*_non-normalized.txt.gz, header=TRUE, sep="\t").

For the GPL10558 platform, negative control probes are flagged 'permuted_negative' in the 'Symbol' column of the *.soft file.

Using GEOquery, how do I access the 'permuted_negative' information within the getGEO object to get a list of IDs (like ILMN_1343296) to identify negative control rows in the non-normalized data table?

I am hoping this question is so simple you won't need the code, but here it is anyway:

library(GEOquery)
gset <- getGEO("GSE34404", GSEMatrix = TRUE)
# By default getGEOSuppFiles() saves into a subdirectory named after the accession
getGEOSuppFiles('GSE34404', baseDir = getwd())
not_normal <- read.table(file.path('GSE34404', 'GSE34404_non-normalized.txt.gz'), header=TRUE, sep="\t")
row.names(not_normal) <- not_normal[,1]   # first column holds the probe IDs
not_normal <- not_normal[,-1]
not_normal <- as.matrix(not_normal)
raw_vals.mat <- apply(not_normal, 2, as.numeric)
row.names(raw_vals.mat) <- row.names(not_normal)  # table to be subsetted

Once I have the subsetted matrix, it is straightforward to background correct using MBCB. I could do the required subsetting by reading the soft file and using basic R code, but my guess is that there is a super-convenient way to do this built into GEOquery.
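For reference, the basic R subsetting mentioned above might look roughly like the sketch below. It runs on small made-up tables: the probe IDs other than ILMN_1343296, the gene symbols, and all intensities are hypothetical, and the annotation data frame `annot` with `ID` and `Symbol` columns is an assumption based on the description of the soft file. With real data, `annot` could plausibly come from `fData(gset[[1]])` or from the platform record via `Table(getGEO("GPL10558"))`, but that is not tested here.

```r
# Hypothetical stand-in for the platform annotation (ID and Symbol columns assumed).
annot <- data.frame(
  ID = c("ILMN_1343296", "ILMN_0000001", "ILMN_0000002", "ILMN_0000003"),
  Symbol = c("permuted_negative", "GENE_A", "permuted_negative", "GENE_B"),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for raw_vals.mat: probes in rows, samples in columns.
raw_vals.mat <- matrix(
  c(95, 1200, 310, 102, 1150, 290),
  nrow = 3,
  dimnames = list(c("ILMN_1343296", "ILMN_0000001", "ILMN_0000003"),
                  c("sample1", "sample2"))
)

# IDs flagged as negative controls in the annotation.
neg_ids <- annot$ID[annot$Symbol == "permuted_negative"]

# Keep only the control IDs that actually occur in the raw table;
# intersect() returns an empty vector if none of them are present.
present <- intersect(neg_ids, rownames(raw_vals.mat))

# drop = FALSE keeps the result a matrix even when a single row matches.
neg_mat <- raw_vals.mat[present, , drop = FALSE]
```

Using `intersect()` rather than indexing by `neg_ids` directly avoids a "subscript out of bounds" error when some control IDs are missing from the raw table.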

Thank you,

Jon Goldberg


2.4 years ago by
Gordon Smyth (Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia) wrote:

Unless I am missing something, you cannot get the negative control probe intensities for GSE34404 because the experimenters did not upload them to GEO in the first place. You will find that there is no row in the expression data for negative control probes like ILMN_1343296. This arises because Illumina's GenomeStudio software exports the control probe profiles into a file separate from that of the regular probes, and experimenters do not usually upload the control probe file to GEO. GEO's upload system doesn't provide any straightforward way to do that.

As far as I know, there is only one way to get at the negative control probes, and this is to infer them from the detection p-values. You can do this as follows. I will assume that you have downloaded and unzipped the supplementary file containing the non-normalized intensity values:

library(limma)
x <- read.ilmn("GSE34404_non-normalized.txt")  # reads intensities and detection p-values
y <- neqc(x)                                   # background correct and normalize

This method implements neqc normalization, which is essentially the same as the "non-parametric" method of the mbcb package. See this article:

for a discussion of neqc and the non-parametric mbcb method.

The neqc() function infers what the negative control probe values would have been from Illumina's detection p-values. The detection p-values are read automatically by the read.ilmn() function and stored as part of the expression object. The background corrected and normalized expression values are stored in y$E, although you don't usually need to know this. See also case study 17.3 in the limma User's Guide.
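To see why detection p-values carry information about the negative controls, here is a toy illustration in base R. It is not limma's implementation. It assumes the usual definition of Illumina's detection p-value as the fraction of negative control probes that are brighter than the probe in question, and all intensities are simulated rather than taken from GSE34404.

```r
# Toy illustration only: simulated intensities, not data from GSE34404.
set.seed(1)
neg <- rnorm(500, mean = 100, sd = 10)   # hypothetical negative control intensities
regular <- c(80, 100, 120, 500)          # hypothetical regular probe intensities

# Detection p-value: fraction of negative controls brighter than the probe.
detection_p <- sapply(regular, function(v) mean(neg > v))

# Going the other way: a probe with detection p-value p sits near the
# (1 - p) quantile of the negative control distribution, so probes with
# p strictly between 0 and 1 trace out that distribution.
mid <- detection_p > 0 & detection_p < 1
implied_quantile <- quantile(neg, probs = 1 - detection_p[mid])
```

In this sketch the regular probes with intermediate p-values land close to the corresponding quantiles of `neg`, while a very bright probe (p-value of 0) says nothing about the controls beyond being above all of them.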

