Question

select negative control probes with GEOquery

Hi All,

Using GEOquery and MBCB, I would like to background correct raw data from an experiment on the GPL10558 platform. I obtained the normalized data and metadata with gset <- getGEO(accession, GSEMatrix = TRUE). I obtained the raw data with getGEOSuppFiles(accession, baseDir = base_directory), followed by not_normal <- read.table("*_non-normalized.txt.gz", header = TRUE, sep = "\t").

For the GPL10558 platform, negative control probes are flagged 'permuted_negative' in the 'Symbol' column of the *.soft file.

Using GEOquery, how do I access the 'permuted_negative' information within the getGEO object to get a list of IDs (like ILMN_1343296) to identify negative control rows in the non-normalized data table?

I am hoping this question is so simple you won't need the code, but here it is anyway:

library(GEOquery)
gset <- getGEO("GSE34404", GSEMatrix = TRUE)
# By default getGEOSuppFiles() saves into a subdirectory named after the accession
getGEOSuppFiles("GSE34404", baseDir = getwd())
not_normal <- read.table(file.path("GSE34404", "GSE34404_non-normalized.txt.gz"), header = TRUE, sep = "\t")
row.names(not_normal) <- not_normal[, 1]  # first column holds the probe IDs
not_normal <- not_normal[, -1]
not_normal <- as.matrix(not_normal)
raw_vals.mat <- apply(not_normal, 2, as.numeric)
row.names(raw_vals.mat) <- row.names(not_normal)  # table to be subsetted

Once I have the subsetted matrix, it is straightforward to background correct using MBCB. I could do the required subsetting by reading the soft file and using basic R code, but my guess is that there is a super-convenient way to do this built into GEOquery.

Thank you,

Jon Goldberg

GEOquery MBCB limma

score 4 · Accepted Answer · 2016-02-18

Unless I am missing something, you cannot get the negative control probe intensities for GSE34404 because the experimenters did not upload them to GEO in the first place. You will find that there is no row in the expression data for negative control probes like ILMN_1343296. This arises because Illumina's GenomeStudio software exports the control probe profiles into a separate file from the regular probe profiles, and experimenters do not usually upload the control probe file to GEO. GEO's upload system doesn't provide any straightforward way to do that.

As far as I know, there is only one way to get at the negative control probes, and this is to infer them from the detection p-values. You can do this as follows. I will assume that you have downloaded and unzipped the supplementary file containing the non-normalized intensity values:

library(limma)
x <- read.ilmn("GSE34404_non-normalized.txt")
y <- neqc(x)

This code applies neqc normalization, which is essentially the same as the "non-parametric" method of the MBCB package. See this article:

https://nar.oxfordjournals.org/content/38/22/e204

for a discussion of neqc and the non-parametric mbcb method.

The neqc() function infers what the negative control probe values would have been from Illumina's detection p-values. The detection p-values are read automatically by the read.ilmn() function and stored as part of the expression object. The background corrected and normalized expression values are stored in y$E, although you don't usually need to know this. See also case study 17.3 in the limma User's Guide.
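Supplementary files on GEO do not always use GenomeStudio's default column names, so it can help to look at the header before calling read.ilmn(). The following is a minimal sketch in base R, not part of limma or GEOquery: peek_header() is a hypothetical helper, and it assumes the column names sit on the first line of a tab-delimited file. The probeid, expr and other.columns arguments of read.ilmn() are real, but the values shown in the commented usage are placeholders to be replaced with whatever the header actually contains.

```r
# Hypothetical helper: return the column names from the first line of a
# tab-delimited file. gzfile() reads both gzipped and uncompressed files.
peek_header <- function(path, sep = "\t") {
  con <- gzfile(path, open = "rt")
  on.exit(close(con))
  strsplit(readLines(con, n = 1), sep, fixed = TRUE)[[1]]
}

# Usage sketch (file name from this thread; column patterns are placeholders):
# hdr <- peek_header("GSE34404_non-normalized.txt.gz")
# print(hdr)
# library(limma)
# x <- read.ilmn("GSE34404_non-normalized.txt",
#                probeid = "ID_REF",            # placeholder: probe ID column
#                expr = "SIGNAL_PATTERN",       # placeholder: intensity columns
#                other.columns = "Detection")   # pattern for detection p-value columns
# y <- neqc(x)
```

If the header shows no detection p-value columns, neqc() has nothing to infer the negative controls from, and this approach will not work for that file.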