Question

I need AGI codes as my probeset IDs


Angel (@angel-7981), Berlin

Hi,

I have some CEL files and processed them as shown below, but the row names I get are probeset IDs such as X244901_at, whereas I need AGI codes, even though I used a custom CDF.

library(affy)
library(vsn)
library(limma)
library(altcdfenvs)
library(simpleaffy)
# listing the cel files
celFiles <- list.celfiles()
# assigning the cel files to affyraw variable
affyraw=ReadAffy(filenames = celFiles)
# making cdf file
tmp.env=make.cdf.env("ATH1121501_At_TAIRG.cdf")
# performing vsn normalization
vsn.data <- expresso(affyraw, normalize.method="vsn", bg.correct=F, pmcorrect.method="pmonly", summary.method="medianpolish")
# examining the normalization
boxplot(affyraw,col="red")
plot(exprs(affyraw)[,1:2], log = "xy", pch=".",
main="all")
# writing the result
write.table(vsn.data, file = "vsn1.txt", dec = ".", sep = "\t", quote = FALSE)

head(vsn.data[,1:2])
ExpressionSet (storageMode: lockedEnvironment)
assayData: 1 features, 2 samples
element names: exprs, se.exprs
protocolData
sampleNames: Col-0 24h primed.CEL.CEL Col-0 24h unprimed.CEL.CEL
varLabels: ScanDate
varMetadata: labelDescription
phenoData
sampleNames: Col-0 24h primed.CEL.CEL Col-0 24h unprimed.CEL.CEL
varLabels: sample
varMetadata: labelDescription
featureData: none
experimentData: use 'experimentData(object)'
Annotation: ath1121501
>

What am I doing wrong in the code above? I am also worried that I may have written an incomplete normalized file.

Tags: affy, cdf, probe



James W. MacDonald (@james-w-macdonald-5106), United States

Well, you are sort of doing random things here. First off, you don't need to generate your own cdfenv; you can just get it from MBNI. We used to provide a way to do this via biocLite(), but I guess that went away.

> download.file("http://mbni.org/customcdf/20.0.0/tairg.download/ath1121501attairgcdf_20.0.0.tar.gz", "ath1121501attairgcdf_20.0.0.tar.gz")
trying URL 'http://mbni.org/customcdf/20.0.0/tairg.download/ath1121501attairgcdf_20.0.0.tar.gz'
Content type 'application/x-gzip' length 1573285 bytes (1.5 MB)
==================================================
downloaded 1.5 MB

> install.packages("ath1121501attairgcdf_20.0.0.tar.gz", repos=NULL, type="source")
* installing *source* package ‘ath1121501attairgcdf’ ...
** R
** data
** inst
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded
* DONE (ath1121501attairgcdf)

then you can do

abatch <- ReadAffy(cdfname = "ath1121501attairgcdf")

eset <- justvsn(abatch)

And if you really think you should export the data you can do

write.exprs(eset, "vsn1.txt")

But I would instead recommend you continue with your analysis inside of R rather than whatever you were planning to do with that file.


Thank you.

I insist on writing the data to a text file because I need the normalized data as input for another tool for GRN (gene regulatory network) inference. Anyway, I did the following:

> setwd("/usr/data/nfs6/izadi/Fereshteh thesis2/Data/Microarray/CEL files")

> download.file("http://mbni.org/customcdf/20.0.0/tairg.download/ath1121501attairgcdf_20.0.0.tar.gz", "ath1121501attairgcdf_20.0.0.tar.gz")
trying URL 'http://mbni.org/customcdf/20.0.0/tairg.download/ath1121501attairgcdf_20.0.0.tar.gz'
Content type 'application/x-gzip' length 1573285 bytes (1.5 MB)
==================================================
downloaded 1.5 MB

> install.packages("ath1121501attairgcdf_20.0.0.tar.gz", repos=NULL, type="source")
Installing package into ‘/usr/people/home/izadi/R/x86_64-redhat-linux-gnu-library/3.2’
(as ‘lib’ is unspecified)
* installing *source* package ‘ath1121501attairgcdf’ ...
** R
** data
** inst
** preparing package for lazy loading
Creating a generic function for ‘nchar’ from package ‘base’ in package ‘S4Vectors’
** help
*** installing help indices
  converting help for package ‘ath1121501attairgcdf’
    finding HTML links ... done
    ath1121501attairgcdf                    html  
    ath1121501attairgdim                    html  
    geometry                                html  
** building package indices
** testing if installed package can be loaded
Creating a generic function for ‘nchar’ from package ‘base’ in package ‘S4Vectors’
* DONE (ath1121501attairgcdf)



> library(affy)
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: ‘BiocGenerics’

The following objects are masked from ‘package:parallel’:

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ, clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following object is masked from ‘package:stats’:

    xtabs

The following objects are masked from ‘package:base’:

    anyDuplicated, append, as.data.frame, as.vector, cbind, colnames, do.call, duplicated, eval, evalq, Filter, Find, get,
    intersect, is.unsorted, lapply, Map, mapply, match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
    rbind, Reduce, rep.int, rownames, sapply, setdiff, sort, table, tapply, union, unique, unlist, unsplit

Loading required package: Biobase
Welcome to Bioconductor

    Vignettes contain introductory material; view with 'browseVignettes()'. To cite Bioconductor, see 'citation("Biobase")',
    and for packages 'citation("pkgname")'.

> library(vsn)


> celFiles <- list.celfiles()
> abatch <- ReadAffy(filenames = celFiles, cdfname = "ath1121501attaircdf")
> eset <- justvsn(abatch)
vsn2: 506944 x 164 matrix (1 stratum). Please use 'meanSdPlot' to verify the fit.
> boxplot(eset,col="red")
Error in getCdfInfo(object) : 
  Could not obtain CDF environment, problems encountered:
Specified environment does not contain ath1121501attaircdf
Library - package ath1121501attaircdf not installed
Bioconductor - ath1121501attaircdf not available

> write.table(eset, file = "eset.txt", dec = ".", sep = "\t", quote = FALSE)
Error in as.data.frame.default(x[[i]], optional = TRUE) : 
  cannot coerce class "structure("AffyBatch", package = "affy")" to a data.frame

I only need a VSN-normalized file whose row names are AGI codes, not *_at probeset IDs.

Reply by Angel.

Accepted Answer (score 1, 2016-02-14)

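# Load affy (reading and summarizing CEL files) and vsn (normalization)
library(affy)
library(vsn)

# Read all CEL files in the current working directory
Data <- ReadAffy()

# VSN normalization, no background correction, PM-only values, median-polish summarization
eset <- expresso(Data, normalize.method = "vsn", bg.correct = FALSE, pmcorrect.method = "pmonly", summary.method = "medianpolish")

# Matrix of normalized values: probeset IDs as row names, one column per CEL file
norm.data <- exprs(eset)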

# The norm.data R object contains the normalized expression values for every probeset on the ATH1 arrays used in this example. To convert the probeset IDs to Arabidopsis gene identifiers, download the file ftp://ftp.arabidopsis.org/home/tair/Microarrays/Affymetrix/affy_ATH1_array_elements-2010-12-20.txt from the TAIR database and place it in the folder with the microarray data. To avoid ambiguous probeset associations (i.e. probesets that match multiple genes), we only used probesets that match a single gene in the Arabidopsis genome.
affy_names <- read.delim("affy_ATH1_array_elements-2010-12-20.txt", header = TRUE)

# Select the columns that contain the probeset ID and corresponding AGI number. Please note that the positions used to index the matrix depend on the input format of the array elements file. You can change these numbers to index the corresponding columns if you are using a different format:
probe_agi<-as.matrix(affy_names[,c(1,5)])
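As the comment says, the positions 1 and 5 depend on the file layout. A minimal sketch of selecting the two columns by name instead, so that a different layout fails loudly rather than silently picking the wrong column. It runs on a tiny made-up excerpt; the column names array_element_name and locus are an assumption about the TAIR file header, so check colnames(affy_names) on the real file first.

```r
# Tiny made-up excerpt in a tab-delimited layout. The header names used here
# ("array_element_name", "locus") are ASSUMED, not taken from the real file.
txt <- paste("array_element_name\tarray_element_type\torganism\tis_control\tlocus",
             "p1_at\tprobeset\tArabidopsis thaliana\tno\tAT1G00010",
             "p2_at\tprobeset\tArabidopsis thaliana\tno\tAT2G00020;AT2G00030",
             sep = "\n")
affy_names <- read.delim(text = txt, header = TRUE, stringsAsFactors = FALSE)

# Select by name; stop early if the expected columns are missing
wanted <- c("array_element_name", "locus")
stopifnot(all(wanted %in% colnames(affy_names)))
probe_agi <- as.matrix(affy_names[, wanted])
print(probe_agi)
```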

# To associate the probeset with the corresponding AGI locus:
normalized.names<-merge(probe_agi,norm.data,by.x=1,by.y=0)[,-1]
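To see what by.x = 1, by.y = 0 and the trailing [,-1] do, here is the same call on toy objects (the probeset IDs, AGI codes and values are invented for illustration). by.y = 0 tells merge() to match on the row names of norm.data, the default inner join drops probesets that are missing from either side, and the key column placed first in the result is what [,-1] removes.

```r
# Made-up stand-in for exprs(eset): probeset IDs as row names, samples as columns
norm.data <- matrix(c(8.1, 7.9,
                      5.2, 5.6,
                      9.4, 9.0),
                    nrow = 3, byrow = TRUE,
                    dimnames = list(c("p1_at", "p2_at", "p3_at"),
                                    c("primed", "unprimed")))

# Made-up stand-in for probe_agi: probeset ID and locus field
probe_agi <- cbind(probeset = c("p1_at", "p2_at", "p3_at", "p4_at"),
                   agi      = c("AT1G00010", "AT2G00020;AT2G00030", "no_match", "AT3G00040"))

# by.x = 1: key is the first column of probe_agi
# by.y = 0: key is the row names of norm.data
# p4_at has no expression row, so the inner join drops it
merged <- merge(probe_agi, norm.data, by.x = 1, by.y = 0)

# The first column is the probeset key; drop it, keeping AGI + expression values
normalized.names <- merged[, -1]
print(normalized.names)
```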

# To remove probesets that do not match the Arabidopsis genome:
normalized.arabidopsis <-normalized.names[grep("AT",normalized.names[,1]),]

# To remove ambiguous probes:
normalized.arabidopsis.unambiguous <- normalized.arabidopsis[grep(pattern = ";", normalized.arabidopsis[, 1], invert = TRUE), ]

# In some cases, multiple probes match the same gene, due to updates in the annotation of the genome. To remove duplicated genes in the matrix:
normalized.agi.final<-normalized.arabidopsis.unambiguous[!duplicated(normalized.arabidopsis.unambiguous[,1]),]
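The three filtering steps can be checked on a small made-up merged table. This sketch swaps the grep() index vectors for logical grepl() masks combined in one subset; on this data it selects the same rows as the step-by-step version, and the choice is purely stylistic. The locus strings and values are invented.

```r
# Made-up merged table: locus field first, expression values after
normalized.names <- data.frame(
  agi      = c("AT1G00010", "AT2G00020;AT2G00030", "no_match", "AT1G00010", "AT3G00040"),
  primed   = c(8.1, 5.2, 9.4, 7.7, 6.3),
  unprimed = c(7.9, 5.6, 9.0, 7.5, 6.1),
  stringsAsFactors = FALSE
)

# Same pattern as the answer: keep rows whose locus field contains "AT"
in.genome <- grepl("AT", normalized.names[, 1])
# Drop ambiguous rows, i.e. several loci separated by ";"
unambiguous <- !grepl(";", normalized.names[, 1], fixed = TRUE)
filtered <- normalized.names[in.genome & unambiguous, ]

# Keep only the first row for each AGI code
normalized.agi.final <- filtered[!duplicated(filtered[, 1]), ]
print(normalized.agi.final)
```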

# To assign the AGI number as row name:
rownames(normalized.agi.final)<-normalized.agi.final[,1]
normalized.agi.final<-normalized.agi.final[,-1]
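The duplicate-removal step has to come before these two lines because data frame row names must be unique. A small made-up example showing the assignment rejected when an AGI code is duplicated and succeeding once duplicates are dropped; drop = FALSE is an addition of this sketch that keeps a data frame even if there is only one sample column.

```r
# Made-up table that still carries a duplicated AGI code in its first column
tab <- data.frame(agi      = c("AT1G00010", "AT1G00010", "AT3G00040"),
                  primed   = c(8.1, 7.7, 6.3),
                  unprimed = c(7.9, 7.5, 6.1),
                  stringsAsFactors = FALSE)

# Data frame row names must be unique, so this assignment raises an error
attempt <- tryCatch({
  suppressWarnings(rownames(tab) <- tab[, 1])
  "assigned"
}, error = function(e) "rejected: duplicated row names")
print(attempt)

# After removing duplicates, the same two steps as in the answer work
dedup <- tab[!duplicated(tab[, 1]), ]
rownames(dedup) <- dedup[, 1]
dedup <- dedup[, -1, drop = FALSE]
print(dedup)
```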

# The resulting gene expression dataset has unique row identifiers (i.e. AGI loci) and one column of expression values per experiment (CEL file)

# To export this data matrix from R to a tab-delimited file, use the following command. The file is written to the current working directory (change it beforehand with setwd() if needed):
write.table (normalized.agi.final,"vsn.txt", sep="\t",col.names=NA,quote=F)
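col.names = NA is what makes the header line up with the data: it writes an empty first header cell above the row-name column. A quick check on a made-up two-gene table written to a temporary file, then read back with row.names = 1, which is one way to confirm the file looks right before handing it to the GRN tool.

```r
# Made-up final table with AGI codes already in the row names
normalized.agi.final <- data.frame(primed   = c(8.1, 6.3),
                                   unprimed = c(7.9, 6.1),
                                   row.names = c("AT1G00010", "AT3G00040"))

out <- tempfile(fileext = ".txt")
# col.names = NA: empty first header cell above the row names
# quote = FALSE: identifiers and headers written without quotes
write.table(normalized.agi.final, out, sep = "\t", col.names = NA, quote = FALSE)

lines <- readLines(out)
print(lines)

# row.names = 1 turns the AGI column back into row names when reading
back <- read.delim(out, row.names = 1)
print(back)
```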