Question: select negative control probes with GEOquery
goldberg.jm wrote (21 months ago):

Hi All,

Using GEOquery and MBCB, I would like to background correct raw data from an experiment on the GPL10558 platform. Normalized data and metadata were obtained using gset <- getGEO(accession, GSEMatrix = TRUE). Raw data were obtained using getGEOSuppFiles(accession, baseDir = base_directory), followed by not_normal <- read.table(*_non-normalized.txt.gz, header=TRUE, sep="\t").

For the GPL10558 platform, negative control probes are flagged 'permuted_negative' in the 'Symbol' column of the *.soft file.

Using GEOquery, how do I access the 'permuted_negative' information within the getGEO object to get a list of IDs (like ILMN_1343296) to identify negative control rows in the non-normalized data table?

I am hoping this question is so simple you won't need the code, but here it is anyway:

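library(GEOquery)
# Normalized data and metadata (a list of ExpressionSet objects)
gset <- getGEO("GSE34404", GSEMatrix = TRUE)
# By default the supplementary files are saved in a GSE34404/ subdirectory of baseDir
getGEOSuppFiles("GSE34404", baseDir = getwd())
not_normal <- read.table(file.path("GSE34404", "GSE34404_non-normalized.txt.gz"), header = TRUE, sep = "\t")
# Use the first column (probe IDs) as row names, then drop it
row.names(not_normal) <- not_normal[, 1]
not_normal <- not_normal[, -1]
not_normal <- as.matrix(not_normal)
# Coerce every column to numeric; apply() drops the row names, so restore them
raw_vals.mat <- apply(not_normal, 2, as.numeric)
row.names(raw_vals.mat) <- row.names(not_normal)  # table to be subsetted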

Once I have the subsetted matrix, it is straightforward to background correct using MBCB. I could do the required subsetting by reading the soft file and using basic R code, but my guess is that there is a super-convenient way to do this built into GEOquery.

Thank you,

Jon Goldberg

 

Gordon Smyth (Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia) wrote (21 months ago):

Unless I am missing something, you cannot get the negative control probe intensities for GSE34404 because the experimenters did not upload them to GEO in the first place. You will find that there is no row in the expression data for negative control probes like ILMN_1343296. This arises because Illumina's GenomeStudio software exports the control probe profiles into a separate file from the regular probe profiles, and experimenters do not usually upload the control probe file to GEO. GEO's upload system doesn't provide any straightforward way to do that.
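
The check described above can be sketched in base R. This is only a toy illustration: the annotation table and intensity matrix below are made-up stand-ins (in a real session the annotation would presumably come from something like fData(gset[[1]]), assuming it has ID and Symbol columns), and the ILMN_TOY_* probe IDs are hypothetical.

```r
# Toy stand-in for the platform annotation table (assumed columns: ID, Symbol)
annot <- data.frame(
  ID     = c("ILMN_1343296", "ILMN_TOY_1", "ILMN_TOY_2"),
  Symbol = c("permuted_negative", "GENE_A", "GENE_B"),
  stringsAsFactors = FALSE
)

# Toy stand-in for raw_vals.mat: it contains regular probes only
raw_vals.mat <- matrix(
  c(120.5, 98.2,
    310.7, 87.9),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("ILMN_TOY_1", "ILMN_TOY_2"), c("S1", "S2"))
)

# IDs flagged as negative controls in the annotation
neg_ids <- annot$ID[annot$Symbol == "permuted_negative"]

# Which of those IDs actually have a row in the intensity matrix?
present <- intersect(neg_ids, rownames(raw_vals.mat))
length(present)

# Subsetting by the IDs that are present gives an empty matrix here
neg_mat <- raw_vals.mat[present, , drop = FALSE]
dim(neg_mat)
```

If length(present) is zero for the real data, the negative control rows were never uploaded and no amount of annotation lookup will recover them.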

As far as I know, there is only one way to get at the negative control probes, and this is to infer them from the detection p-values. You can do this as follows. I will assume that you have downloaded and unzipped the supplementary file containing the non-normalized intensity values:

library(limma)
x <- read.ilmn("GSE34404_non-normalized.txt")
y <- neqc(x)
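
The snippet above assumes the supplementary file has already been unzipped. A minimal base R sketch of that step follows; gunzip_text() is a hypothetical helper written for this illustration (it is not part of GEOquery or limma), and the demo file contents, including the column names, are made up.

```r
# Hypothetical helper: decompress a gzipped text file next to the original
gunzip_text <- function(gz_path, out_path = sub("\\.gz$", "", gz_path)) {
  con <- gzfile(gz_path, open = "rt")
  on.exit(close(con))
  writeLines(readLines(con), out_path)
  invisible(out_path)
}

# Demo on a small temporary gzipped file with toy contents
tmp_gz <- tempfile(fileext = ".txt.gz")
con <- gzfile(tmp_gz, open = "wt")
writeLines(c("ID_REF\tS1\tDetection Pval",
             "ILMN_TOY_1\t120.5\t0.01"), con)
close(con)

out <- gunzip_text(tmp_gz)
readLines(out)
```

For the real data the call would be something like gunzip_text(file.path("GSE34404", "GSE34404_non-normalized.txt.gz")), assuming the file was saved in the default GSE34404/ subdirectory.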

The neqc() call above applies neqc normalization, which is essentially the same as the "non-parametric" method of the MBCB package. See this article:

https://nar.oxfordjournals.org/content/38/22/e204

for a discussion of neqc and the non-parametric MBCB method.

The neqc() function infers what the negative control probe values would have been from Illumina's detection p-values. The detection p-values are read automatically by the read.ilmn() function and stored as part of the expression object. The background corrected and normalized expression values are stored in y$E, although you don't usually need to know this. See also case study 17.3 in the limma User's Guide.
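
To give a feel for how negative control values can be inferred from detection p-values, here is a toy sketch. It is not limma's actual implementation. It assumes that a probe's detection p-value is the proportion of negative control probes with intensity greater than that probe, and all intensities below are made up.

```r
# Toy negative control intensities (unknown to us when working from GEO data)
neg <- c(90, 100, 110, 120)

# Toy regular probe intensities
x <- c(p1 = 85, p2 = 95, p3 = 105, p4 = 115, p5 = 125, p6 = 300)

# Detection p-value under the stated assumption:
# proportion of negative controls brighter than the probe
det_p <- sapply(x, function(v) mean(neg > v))

# Now pretend that only x and det_p are available.
# Sort probes by intensity; wherever the detection p-value drops between
# two neighbouring probes, a negative control must lie between them.
o  <- order(x)
xs <- x[o]
ps <- det_p[o]
drop_at <- which(diff(ps) < 0)

# Estimate each hidden negative control as the midpoint of its bracket
# (this toy case has exactly one control per bracket)
inferred <- unname((xs[drop_at] + xs[drop_at + 1]) / 2)
inferred
```

With thousands of regular probes the brackets become very narrow, which is why the inferred values can stand in for the missing control probe file when estimating the background.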

 
