stuck with affy/ limma

0

Entering edit mode

Maxim ▴ 170

@maxim-3843

Last seen 10.2 years ago

Hi, I have a question concerning the analysis of some affymetrix chips. I downloaded some of the data from GEO GSE11324 (see below). In doing so I'm stuck after I identified the probesets with significant changes. I have problems in assigning probeset specific gene names as well as getting the genomic coordinates. Furthermore I have no clue how to deal with the fact, that most genes have different probesets with differential transcriptional outcomes. I did this based on the affy and limma manuals like: targets file: Name FileName Target 0h1 GSM286031.CEL control 0h2 GSM286032.CEL control 0h3 GSM286033.CEL control 3h1 GSM286034.CEL three 3h2 GSM286035.CEL three 3h3 GSM286036.CEL three 6h1 GSM286037.CEL six 6h2 GSM286038.CEL six 6h3 GSM286039.CEL six library(affy) library(limma) library(vsn) pd <- read.AnnotatedDataFrame("er_for_affy.txt", header = TRUE, row.names = 2) pData(pd) #### load a <- ReadAffy(filenames = rownames(pData(pd)), phenoData = pd, verbose = TRUE) #### normalize x <- expresso(a, bg.correct = FALSE, normalize.method = "vsn", normalize.param = list(subsample = 1000), pmcorrect.method = "pmonly", summary.method = "medianpolish") ### genes with highest variation library(hgu133plus2.db) rsd <- apply(exprs(x), 1, sd) sel <- order(rsd, decreasing = TRUE)[1:250] u<-mget(row.names(exprs(x)[sel,]),hgu133plus2SYMBOL) heatmap(exprs(x)[sel, ], labRow=u) ### in this case it works to extract the gene symbol ### limma contrasts design <- model.matrix(~ -1+factor(c(1,1,1,2,2,2,3,3,3))) colnames(design) <- c("control", "three", "six") fit <- lmFit(x, design) meanSdPlot(x) contrast.matrix <- makeContrasts(three-control, six-control, levels=design) fit2 <- contrasts.fit(fit, contrast.matrix) fit2 <- eBayes(fit2) #### top list topTable(fit2, coef=1, adjust="BH", number=20, sort.by="M") library(hgu133plus2.db) u<-mget(row.names(fit2),hgu133plus2SYMBOL) How can I produce a topTable result with according gene names, somehow I do not understand the genelist argument? Next, I would like to produce a standard clustering of the "fold changes" observed within (averaged) contrasts 1 (three - control) and 2 (six - control) and a heatmap presentation of the results. How to extract for example all fold-changes of those genes with a p-value<0.001 in at least one of the two contrasts? The coordinates of the genes I seem to get with v<-mget(row.names(fit2),hgu133plus2CHRLOC) v<-mget(row.names(fit2),hgu133plus2CHRLOCEND) But again I do not know, how to implement it into my fit2 object or topTable results. Furthermore there are many probesets with multiple mappings, should these not be excluded from the analysis? Actually, in the end I'd like to get the corresponding genes' coordinates in a way, that the maximum region size is given, eg in case of genes with multiple TSSs. As mentioned above, I do not know how to deal with the fact, that many genes are represented with mutliple probesets, often with different fold changes - is there a general recipe to deal with this question? Furthermore there are many probesets with multiple mappings, should these not be excluded from the analysis? I know it's a lot of questions, so is there a general source of information, that may help me to overcome the hurdles? Maxim [[alternative HTML version deleted]]

Clustering affy limma Clustering affy limma • 1.3k views

ADD COMMENT • link updated 14.7 years ago by James W. MacDonald 67k • written 14.7 years ago by Maxim ▴ 170

0

Entering edit mode

James W. MacDonald 67k

@james-w-macdonald-5106

Last seen 2 days ago

United States

Maxim wrote: > Hi, > > I have a question concerning the analysis of some affymetrix chips. I > downloaded some of the data from GEO GSE11324 (see below). In doing so I'm > stuck after I identified the probesets with significant changes. I have > problems in assigning probeset specific gene names as well as getting the > genomic coordinates. Furthermore I have no clue how to deal with the fact, > that most genes have different probesets with differential transcriptional > outcomes. > > > I did this based on the affy and limma manuals like: > > targets file: > Name FileName Target > 0h1 GSM286031.CEL control > 0h2 GSM286032.CEL control > 0h3 GSM286033.CEL control > 3h1 GSM286034.CEL three > 3h2 GSM286035.CEL three > 3h3 GSM286036.CEL three > 6h1 GSM286037.CEL six > 6h2 GSM286038.CEL six > 6h3 GSM286039.CEL six > > > library(affy) > library(limma) > library(vsn) > > pd <- read.AnnotatedDataFrame("er_for_affy.txt", header = TRUE, row.names = > 2) > pData(pd) > #### load > a <- ReadAffy(filenames = rownames(pData(pd)), phenoData = pd, verbose = > TRUE) > #### normalize > x <- expresso(a, bg.correct = FALSE, normalize.method = "vsn", > normalize.param > = list(subsample = 1000), pmcorrect.method = "pmonly", summary.method = > "medianpolish") > ### genes with highest variation > library(hgu133plus2.db) > rsd <- apply(exprs(x), 1, sd) > sel <- order(rsd, decreasing = TRUE)[1:250] > > > u<-mget(row.names(exprs(x)[sel,]),hgu133plus2SYMBOL) > heatmap(exprs(x)[sel, ], labRow=u) > ### in this case it works to extract the gene symbol > > > ### limma contrasts > design <- model.matrix(~ -1+factor(c(1,1,1,2,2,2,3,3,3))) > colnames(design) <- c("control", "three", "six") > fit <- lmFit(x, design) > meanSdPlot(x) > contrast.matrix <- makeContrasts(three-control, six-control, levels=design) > fit2 <- contrasts.fit(fit, contrast.matrix) > fit2 <- eBayes(fit2) > #### top list > topTable(fit2, coef=1, adjust="BH", number=20, sort.by="M") > library(hgu133plus2.db) > u<-mget(row.names(fit2),hgu133plus2SYMBOL) > > How can I produce a topTable result with according gene names, somehow I do > not understand the genelist argument? The genelist argument expects you to supply a vector of gene names (which is what it says in the help page for this function). genelist <- unlist(mget(featureNames(x), hgu133plus2SYMBOL)) topTable(fit2, coef = 1, number = 20, sort.by = "M", genelist = genelist) > > Next, I would like to produce a standard clustering of the "fold changes" > observed within (averaged) contrasts 1 (three - control) and 2 (six - > control) and a heatmap presentation of the results. How to extract for > example all fold-changes of those genes with a p-value<0.001 in at least one > of the two contrasts? rslt <- decideTests(fit2, p.value = 0.001) ind <- apply(rslt, 1, function(x) any(x != 0)) fit2$coef[ind,] > > The coordinates of the genes I seem to get with > v<-mget(row.names(fit2),hgu133plus2CHRLOC) > v<-mget(row.names(fit2),hgu133plus2CHRLOCEND) > But again I do not know, how to implement it into my fit2 object or topTable > results. Furthermore there are many probesets with multiple mappings, should > these not be excluded from the analysis? That's up to you. > > Actually, in the end I'd like to get the corresponding genes' coordinates in > a way, that the maximum region size is given, eg in case of genes with > multiple TSSs. > > As mentioned above, I do not know how to deal with the fact, that many genes > are represented with mutliple probesets, often with different fold changes - > is there a general recipe to deal with this question? Furthermore there are > many probesets with multiple mappings, should these not be excluded from the > analysis? I believe the BioC case studies cover some of these points. One method you might consider is to select the reporters with the largest value for whatever test statistic you are using. See the findLargest() function in the genefilter package. Another alternative is to use the MBNI remapped cdfs, which essentially removes these redundancies. Best, Jim > > I know it's a lot of questions, so is there a general source of information, > that may help me to overcome the hurdles? > > Maxim > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor -- James W. MacDonald, M.S. Biostatistician Hildebrandt Lab 8220D MSRB III 1150 W. Medical Center Drive Ann Arbor MI 48109-5646 734-936-8662 ********************************************************** Electronic Mail is not secure, may not be read every day, and should not be used for urgent or sensitive issues

ADD COMMENT • link 14.7 years ago James W. MacDonald 67k

Login before adding your answer.