Question

stuck with affy/limma

0

Entering edit mode

Maxim ▴ 170

@maxim-3843

Last seen 11.4 years ago

Hi, I have a question concerning the analysis of some affymetrix chips. I downloaded some of the data from GEO GSE11324 (see below). In doing so I'm stuck after I identified the probesets with significant changes. I have problems in assigning probeset specific gene names as well as getting the genomic coordinates. Furthermore I have no clue how to deal with the fact, that most genes have different probesets with differential transcriptional outcomes. I did this based on the affy and limma manuals like: targets file: Name FileName Target 0h1 GSM286031.CEL control 0h2 GSM286032.CEL control 0h3 GSM286033.CEL control 3h1 GSM286034.CEL three 3h2 GSM286035.CEL three 3h3 GSM286036.CEL three 6h1 GSM286037.CEL six 6h2 GSM286038.CEL six 6h3 GSM286039.CEL six library(affy) library(limma) library(vsn) pd <- read.AnnotatedDataFrame("er_for_affy.txt", header = TRUE, row.names = 2) pData(pd) #### load a <- ReadAffy(filenames = rownames(pData(pd)), phenoData = pd, verbose = TRUE) #### normalize x <- expresso(a, bg.correct = FALSE, normalize.method = "vsn", normalize.param = list(subsample = 1000), pmcorrect.method = "pmonly", summary.method = "medianpolish") ### genes with highest variation library(hgu133plus2.db) rsd <- apply(exprs(x), 1, sd) sel <- order(rsd, decreasing = TRUE)[1:250] u<-mget(row.names(exprs(x)[sel,]),hgu133plus2SYMBOL) heatmap(exprs(x)[sel, ], labRow=u) ### in this case it works to extract the gene symbol ### limma contrasts design <- model.matrix(~ -1+factor(c(1,1,1,2,2,2,3,3,3))) colnames(design) <- c("control", "three", "six") fit <- lmFit(x, design) meanSdPlot(x) contrast.matrix <- makeContrasts(three-control, six-control, levels=design) fit2 <- contrasts.fit(fit, contrast.matrix) fit2 <- eBayes(fit2) #### top list topTable(fit2, coef=1, adjust="BH", number=20, sort.by="M") library(hgu133plus2.db) u<-mget(row.names(fit2),hgu133plus2SYMBOL) How can I produce a topTable result with according gene names, somehow I do not understand the genelist argument? Obviously I still do not really understand how to manipulate R-data objects. I tried: d<-topTable(fit2, coef=1, adjust="BH", number=6600, sort.by ="P",genelist=fit$genes) symbol<-mget(d[,1], hgu133plus2SYMBOL) cbind(d, symbol) but I get Error in data.frame(..., check.names = FALSE) : arguments imply differing number of rows: 6600, 1???? anyhow I feel this might be way more complicated than actually necessary. Next, I would like to produce a standard clustering of the "fold changes" observed within (averaged) contrasts 1 (three - control) and 2 (six - control) and a heatmap presentation of the results. How to extract for example all fold-changes of those genes with a p-value<0.001 in at least one of the two contrasts? The coordinates of the genes I seem to get with v<-mget(row.names(fit2),hgu133plus2CHRLOC) v<-mget(row.names(fit2),hgu133plus2CHRLOCEND) But again I do not know, how to implement it into my fit2 object or topTable results. Furthermore there are many probesets with multiple mappings, should these not be excluded from the analysis? Actually, in the end I'd like to get the corresponding genes' coordinates in a way, that the maximum region size is given, eg in case of genes with multiple TSSs and TESs - does this mean I have to write a little function, that checks the strand of a gene and then returns min and max for gene start and end? As mentioned above, I do not know how to deal with the fact, that many genes are represented with mutliple probesets, often with different fold changes - is there a general recipe to deal with this question? Furthermore there are many probesets with multiple mappings, should these not be excluded from the analysis? I know it's a lot of questions, so is there a general source of information, that may help me to overcome the hurdles? Maxim [[alternative HTML version deleted]]

Clustering affy limma Clustering affy limma • 1.3k views

ADD COMMENT • link 15.9 years ago Maxim ▴ 170