Entering edit mode
Maxim
▴
170
@maxim-3843
Last seen 10.2 years ago
Hi,
I have a question concerning the analysis of some affymetrix chips. I
downloaded some of the data from GEO GSE11324 (see below). In doing so
I'm
stuck after I identified the probesets with significant changes. I
have
problems in assigning probeset specific gene names as well as getting
the
genomic coordinates. Furthermore I have no clue how to deal with the
fact,
that most genes have different probesets with differential
transcriptional
outcomes.
I did this based on the affy and limma manuals like:
targets file:
Name FileName Target
0h1 GSM286031.CEL control
0h2 GSM286032.CEL control
0h3 GSM286033.CEL control
3h1 GSM286034.CEL three
3h2 GSM286035.CEL three
3h3 GSM286036.CEL three
6h1 GSM286037.CEL six
6h2 GSM286038.CEL six
6h3 GSM286039.CEL six
library(affy)
library(limma)
library(vsn)
pd <- read.AnnotatedDataFrame("er_for_affy.txt", header = TRUE,
row.names =
2)
pData(pd)
#### load
a <- ReadAffy(filenames = rownames(pData(pd)), phenoData = pd, verbose
=
TRUE)
#### normalize
x <- expresso(a, bg.correct = FALSE, normalize.method = "vsn",
normalize.param
= list(subsample = 1000), pmcorrect.method = "pmonly", summary.method
=
"medianpolish")
### genes with highest variation
library(hgu133plus2.db)
rsd <- apply(exprs(x), 1, sd)
sel <- order(rsd, decreasing = TRUE)[1:250]
u<-mget(row.names(exprs(x)[sel,]),hgu133plus2SYMBOL)
heatmap(exprs(x)[sel, ], labRow=u)
### in this case it works to extract the gene symbol
### limma contrasts
design <- model.matrix(~ -1+factor(c(1,1,1,2,2,2,3,3,3)))
colnames(design) <- c("control", "three", "six")
fit <- lmFit(x, design)
meanSdPlot(x)
contrast.matrix <- makeContrasts(three-control, six-control,
levels=design)
fit2 <- contrasts.fit(fit, contrast.matrix)
fit2 <- eBayes(fit2)
#### top list
topTable(fit2, coef=1, adjust="BH", number=20, sort.by="M")
library(hgu133plus2.db)
u<-mget(row.names(fit2),hgu133plus2SYMBOL)
How can I produce a topTable result with according gene names, somehow
I do
not understand the genelist argument? Obviously I still do not really
understand how to manipulate R-data objects. I tried:
d<-topTable(fit2, coef=1, adjust="BH", number=6600, sort.by
="P",genelist=fit$genes)
symbol<-mget(d[,1], hgu133plus2SYMBOL)
cbind(d, symbol)
but I get
Error in data.frame(..., check.names = FALSE) :
arguments imply differing number of rows: 6600, 1????
anyhow I feel this might be way more complicated than actually
necessary.
Next, I would like to produce a standard clustering of the "fold
changes"
observed within (averaged) contrasts 1 (three - control) and 2 (six -
control) and a heatmap presentation of the results. How to extract for
example all fold-changes of those genes with a p-value<0.001 in at
least one
of the two contrasts?
The coordinates of the genes I seem to get with
v<-mget(row.names(fit2),hgu133plus2CHRLOC)
v<-mget(row.names(fit2),hgu133plus2CHRLOCEND)
But again I do not know, how to implement it into my fit2 object or
topTable
results. Furthermore there are many probesets with multiple mappings,
should
these not be excluded from the analysis?
Actually, in the end I'd like to get the corresponding genes'
coordinates in
a way, that the maximum region size is given, eg in case of genes with
multiple TSSs and TESs - does this mean I have to write a little
function,
that checks the strand of a gene and then returns min and max for gene
start
and end?
As mentioned above, I do not know how to deal with the fact, that many
genes
are represented with mutliple probesets, often with different fold
changes -
is there a general recipe to deal with this question? Furthermore
there are
many probesets with multiple mappings, should these not be excluded
from the
analysis?
I know it's a lot of questions, so is there a general source of
information,
that may help me to overcome the hurdles?
Maxim
[[alternative HTML version deleted]]