affy hugene 2.1 st


Dario Greco ▴ 310

@dario-greco-1536

Last seen 11.4 years ago

Dear friends,

we are starting a project using the new Affymetrix Human Gene 2.1 ST chips. I would like to know:

1) Does anyone have any experience with them yet? Any opinions or particular notes on analysing them?
2) What is the BioC roadmap for including the cdf/annotation packages for this?
3) What is the roadmap for the alternative cdf packages?

Thank you so much for your kind reply.

Cheers,
Dario

cdf cdf • 3.0k views

updated 13.5 years ago by Guest User ★ 13k • written 13.5 years ago by Dario Greco ▴ 310


James W. MacDonald 68k

@james-w-macdonald-5106

Last seen 8 hours ago

United States

Hi Dario,

On 8/20/2012 1:36 AM, Dario Greco wrote:
> dear friends,
> we are starting a project using the new affymetrix human gene 2.1 st chips.
> i would like to know:
> 1) does anyone have yet any experience with them? any opinion/particular note analysing them?
> 2) what is the bioc roadmap for including the cdf/annotation packages for this?

There won't be a cdf package created by us. Instead there will be a pd.hugene.2.0.st.v1 package, intended for use with the oligo package.

Note that there hasn't been an unsupported cdf file for any Gene ST chips after version 1.0, although Philip de Groot has been making cdf packages for the 1.1 chips, and may well make them for the 2.0 and 2.1.

http://nmg-r.bioinformatics.nl/NuGO_R.html

As for a roadmap, these packages will be part of the new BioC release.

> 3) what is the roadmap for the alternative cdf packages?

We don't make those; Manhong Dai at MBNI does. I suggest you ask him. You can get his email off their website:

http://brainarray.mbni.med.umich.edu/Brainarray/Database/CustomCDF/CDF_download.asp

Best,

Jim

> thanks you so much for your kind reply.
> cheers
> dario

--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
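For readers who take the oligo route mentioned in this reply, a minimal sketch of reading the CEL files and summarizing them with RMA could look like the following. It assumes that the oligo package and the platform design (pd.*) package matching the array are installed. The function name `rma_with_oligo` and the argument `cel_dir` are placeholders introduced here for illustration; they do not come from the thread.

```r
# Sketch only: assumes Bioconductor's oligo package and the matching
# pd.* platform design package are installed on this machine.
# `rma_with_oligo` and `cel_dir` are hypothetical names.
rma_with_oligo <- function(cel_dir) {
  # Collect all CEL files in the directory (case-insensitive extension)
  cel_files <- list.files(cel_dir, pattern = "\\.CEL$",
                          ignore.case = TRUE, full.names = TRUE)
  if (length(cel_files) == 0) {
    stop("no CEL files found in ", cel_dir)
  }
  if (!requireNamespace("oligo", quietly = TRUE)) {
    stop("the 'oligo' package is required for this sketch")
  }
  # read.celfiles() picks the annotation package from the CEL header
  raw <- oligo::read.celfiles(cel_files)
  # Summarize to transcript clusters ("core" level) with RMA
  eset <- oligo::rma(raw, target = "core")
  # Return the matrix of log2 expression values
  Biobase::exprs(eset)
}
```

Calling `rma_with_oligo("/path/to/celfiles")` would return one row per transcript cluster and one column per CEL file, provided the annotation package for the chip is available.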

written 13.5 years ago by James W. MacDonald 68k


Guest User ★ 13k

@guest-user-4897

Last seen 11.4 years ago

Dear Dario,

For your purposes you can use package "xps", which I have just tested with the Human Gene 2.0 ST Array Data Set, available for download from:
http://www.affymetrix.com/support/downloads/demo_data/human2_0.zip

1, However, first you need to download the corresponding Affymetrix library files and annotation files for HuGene-2_1-st. You need these files to create the ROOT scheme file as follows:

### new R session: load library xps
library(xps)

### define directories:
# directory containing Affymetrix library files
libdir <- "/Volumes/GigaDrive/Affy/libraryfiles"
# directory containing Affymetrix annotation files
anndir <- "/Volumes/GigaDrive/Affy/Annotation"
# directory to store ROOT scheme files
scmdir <- "/Volumes/GigaDrive/CRAN/Workspaces/Schemes"

# HuGene-2_1-st:
# use corrected annotation files
scheme.hugene21st.na32 <- import.exon.scheme("hugene21stv1", filedir = file.path(scmdir, "na32"),
    file.path(libdir, "HuGene-2_1-st", "HuGene-2_1-st.clf"),
    file.path(libdir, "HuGene-2_1-st", "HuGene-2_1-st.pgf"),
    file.path(anndir, "HuGene-2_1-st-v1.na32.hg19.probeset.csv", "HuGene-2_1-st-v1.na32.hg19.probeset.corr.csv"),
    file.path(anndir, "HuGene-2_1-st-v1.na32.hg19.transcript.csv", "HuGene-2_1-st-v1.na32.hg19.transcript.corr.csv"))

Since the Affymetrix annotation files for the new HuGene_2.x arrays are missing the AFFX controls, you first need to add these controls. For this purpose I have created a Perl script (shown below) which adds the missing AFFX probesets and creates the corrected annotation files:
- HuGene-2_1-st-v1.na32.hg19.probeset.corr.csv
- HuGene-2_1-st-v1.na32.hg19.transcript.corr.csv

Note: Affymetrix has promised to add the missing AFFX controls in version na33 of the annotation files.

Alternatively, I can send you the finished ROOT scheme file "hugene21stv1.root"; however, it has a size of 52 MB.
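Before building the scheme, it may be worth confirming that the corrected annotation files really contain the AFFX control rows. The following is a minimal base-R sketch, assuming those rows can be recognized by the literal string `control->affx` (the same pattern the Perl script matches). The function name `count_affx_rows` and the tiny demo file with made-up IDs are illustrations only, not part of xps or of the Affymetrix files.

```r
# Count the lines of an annotation file that mention "control->affx".
# `count_affx_rows` is a hypothetical helper, not an xps function.
count_affx_rows <- function(path) {
  lines <- readLines(path, warn = FALSE)
  sum(grepl("control->affx", lines, fixed = TRUE))
}

# Made-up demo file; real annotation files have many more columns.
demo_file <- tempfile(fileext = ".csv")
writeLines(c(
  "\"probeset_id\",\"category\"",
  "\"16650001\",\"main\"",
  "\"17127787\",\"control->affx\"",
  "\"17127789\",\"control->affx\""
), demo_file)

n_affx <- count_affx_rows(demo_file)
print(n_affx)
```

A count of zero for one of the `.corr.csv` files would suggest that the Perl script did not run against the intended input.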
2, After the creation of the ROOT scheme file "hugene21stv1.root" you are ready to import the CEL-files as follows:

### new R session: load library xps
library(xps)

### define directories:
# directory of ROOT scheme files
scmdir <- "/Volumes/GigaDrive/CRAN/Workspaces/Schemes/na32"
# directory to store ROOT raw data files
datdir <- "/Volumes/GigaDrive/CRAN/Workspaces/ROOTData"
# directory containing Tissues CEL files
celdir <- "/Volumes/GigaDrive/ChipData/Exon/HuGene2/human2.0/HuGene2.1_Plate"

### HuGene-2_1-st data: import raw data
# first, import ROOT scheme file
scheme.genome <- root.scheme(file.path(scmdir, "hugene21stv1.root"))

# subset of CEL files to import
celfiles <- c("Liver_HuGene-2_1_GT_Rep1_A03_MC.CEL", "Liver_HuGene-2_1_GT_Rep2_D06_MC.CEL", "Liver_HuGene-2_1_GT_Rep3_F02_MC.CEL",
              "Spleen_HuGene-2_1_GT_Rep1_A11_MC.CEL", "Spleen_HuGene-2_1_GT_Rep2_C07_MC.CEL", "Spleen_HuGene-2_1_GT_Rep3_F04_MC.CEL")
# rename CEL files
celnames <- c("LiverRep1", "LiverRep2", "LiverRep3", "SpleenRep1", "SpleenRep2", "SpleenRep3")

# import CEL files
data.genome <- import.data(scheme.genome, "HuTissuesGenome21", filedir=datdir, celdir=celdir, celfiles=celfiles, celnames=celnames)

3, Now you are ready to convert the data to expression levels using RMA:

### new R session: load library xps
library(xps)

### first, load ROOT scheme file and ROOT data file
scmdir <- "/Volumes/GigaDrive/CRAN/Workspaces/Schemes/na32"
scheme.genome <- root.scheme(file.path(scmdir, "hugene21stv1.root"))
datdir <- "/Volumes/GigaDrive/CRAN/Workspaces/ROOTData"
data.genome <- root.data(scheme.genome, paste(datdir, "HuTissuesGenome21_cel.root", sep="/"))

### preprocess raw data ###
datdir <- getwd()

# 1. RMA
data.rma <- rma(data.genome, "HuGene21RMAcore", filedir=datdir, tmpdir="", background="antigenomic", normalize=TRUE, exonlevel="core+affx")

# 2. DABG detection call
call.dabg <- dabg.call(data.genome, "HuGene21DABGcore", filedir=datdir, exonlevel="core+affx")

# get data.frames
expr.rma <- validData(data.rma)
pval.dabg <- pvalData(call.dabg)
pres.dabg <- presCall(call.dabg)

# density plots
hist(data.rma)

# boxplots
boxplot(data.rma)

# export expression data
export.expr(data.rma, treename = "*", treetype="mdp", varlist="fUnitName:fName:fSymbol:fLevel", outfile="HuGene21RMAcoreNamesSymbols.txt", sep="\t", as.dataframe=FALSE, verbose=TRUE)

I hope this info is helpful for you; below you find the Perl script.

Best regards,
Christian

_._._._._._._._._._._._._._._._._._
C.h.r.i.s.t.i.a.n S.t.r.a.t.o.w.a
V.i.e.n.n.a A.u.s.t.r.i.a
e.m.a.i.l: cstrato at aon.at
_._._._._._._._._._._._._._._._._._

### BEGIN perlscript "HuGene21_update_AFFX.pl" ###
#!/usr/bin/perl
#
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Perl script to update AFFX controls of HuGene-2_1-st annotation files
#
# Copyright (c) 2012-2012 Christian Stratowa, Vienna, Austria.
# All rights reserved.
#
# save HuGene-2_1-st pgf-file and annotation files in current directory
# and run:
# > perl HuGene21_update_AFFX.pl
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
use strict;
use warnings;

# get current working dir
use Cwd;

# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# initialize constants
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# input file names
my $in_pgf = "/Volumes/GigaDrive/Affy/libraryfiles/HuGene-2_1-st/HuGene-2_1-st.pgf";
my $in_annot_tc = "/Volumes/GigaDrive/Affy/Annotation/HuGene-2_1-st-v1.na32.hg19.transcript.csv/HuGene-2_1-st-v1.na32.hg19.transcript.csv";
my $in_annot_ps = "/Volumes/GigaDrive/Affy/Annotation/HuGene-2_1-st-v1.na32.hg19.probeset.csv/HuGene-2_1-st-v1.na32.hg19.probeset.csv";

# output file names
my $out_affx = "HuGene21.affx.csv";
my $out_annot_tc = "HuGene-2_1-st-v1.na32.hg19.transcript.corr.csv";
my $out_annot_ps = "HuGene-2_1-st-v1.na32.hg19.probeset.corr.csv";

# predefined strings
my $na = "---";
my $beg_assignment_tc = "--- // --- // ";
my $end_assignment_tc = " // --- // --- // --- // --- // --- // ---";
my $beg_assignment_ps = "--- // ";
my $end_assignment_ps = " // --- // --- // --- // ---";

# variables
my @array;
my $idx;

# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# read pgf-file and put control->affx into array
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
print("reading pgf-file and storing control->affx in array ... ");
open(INFILE, $in_pgf) or die("Couldn't read $in_pgf: $!");

# fill array with [probeset_id, mrna_assignment, category, line_nr]
$idx = 0;
while (my $line = <INFILE>) {
    $idx++;
    if ($line =~ /control->affx/) {
        chomp($line);
        $line =~ s/\r//; # remove optional carriage return character
        my @tmp = split(/\t/, $line);
        push @array, [@tmp, $idx];
    }#if
}#while
# sentinel entry marking the end of the file
push @array, [0, "NA", "NA", $idx+1];
close(INFILE) or die("Couldn't close $in_pgf: $!");

# replace "line_nr" with "total_probes"
for (my $i=0; $i<$#array; $i++) {
    $array[$i][3] = ($array[$i+1][3] - $array[$i][3] - 1)/2;
    #very dirty workaround (would need to find number of lines between probeset_ids)
#    if ($array[$i][3] > 100) {$array[$i][3] = $array[$i-1][3];}
    if ($array[$i][3] > 100) {$array[$i][3] = 20;}
}#for
print("done.\n");

# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# write control->affx array to out_affx (for testing purposes only)
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
print("writing control->affx to $out_affx ... ");
open(OUTFILE, ">$out_affx") or die("Couldn't open $out_affx: $!");
for (my $i=0; $i<$#array; $i++) {
    my $tmp = join("\",\"", @{$array[$i]});
#    print(OUTFILE "\"$tmp\"\n");
    print(OUTFILE "\"$tmp\"\r\n");
}#for
close(OUTFILE) or die("Couldn't close $out_affx: $!");
print("done.\n");

# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# update control->affx lines of transcript annotation file out_annot_tc
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
print("appending control->affx lines to $out_annot_tc ... ");
open(OUTFILE, ">$out_annot_tc") or die("Couldn't open $out_annot_tc: $!");
open(INFILE, $in_annot_tc) or die("Couldn't read $in_annot_tc: $!");

# delete old control->affx lines
while (<INFILE>) {
    if (/control->affx/) {next;}
    print(OUTFILE $_);
}#while

# append new control->affx lines
for (my $i=0; $i<$#array; $i++) {
    my $afx = join("", $beg_assignment_tc, $array[$i][2], $end_assignment_tc);
    my $tmp = join("\",\"", $array[$i][0], $array[$i][0], $na, $na, 0, 0, $array[$i][3], $na, $afx, $na, $na, $na, $na, $na, $na, $na, $na, $array[$i][1]);
    print(OUTFILE "\"$tmp\"\n");
#    print(OUTFILE "\"$tmp\"\r\n");
}#for
close(INFILE) or die("Couldn't close $in_annot_tc: $!");
close(OUTFILE) or die("Couldn't close $out_annot_tc: $!");
print("done.\n");

# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# update control->affx lines of probeset annotation file out_annot_ps
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
print("appending control->affx lines to $out_annot_ps ... ");
open(OUTFILE, ">$out_annot_ps") or die("Couldn't open $out_annot_ps: $!");
open(INFILE, $in_annot_ps) or die("Couldn't read $in_annot_ps: $!");

# delete old control->affx lines
while (<INFILE>) {
    if (/control->affx/) {next;}
    print(OUTFILE $_);
}#while

# append new control->affx lines
for (my $i=0; $i<$#array; $i++) {
    my $afx = join("", $beg_assignment_ps, $array[$i][2], $end_assignment_ps);
    my $tmp = join("\",\"", $array[$i][0], $na, $na, 0, 0, $array[$i][3], 0, 0, 0, $na, $afx, 0, 0, 0, 0, $na, $na, $na, $na, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, $array[$i][1]);
    print(OUTFILE "\"$tmp\"\n");
#    print(OUTFILE "\"$tmp\"\r\n");
}#for
close(INFILE) or die("Couldn't close $in_annot_ps: $!");
close(OUTFILE) or die("Couldn't close $out_annot_ps: $!");
print("done.\n");
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
### END perlscript "HuGene21_update_AFFX.pl" ###

On 8/20/12 7:36 AM, Dario Greco wrote:
> dear friends,
> we are starting a project using the new affymetrix human gene 2.1 st chips.
> i would like to know:
> 1) does anyone have yet any experience with them? any opinion/particular note analysing them?
> 2) what is the bioc roadmap for including the cdf/annotation packages for this?
> 3) what is the roadmap for the alternative cdf packages?
>
> thanks you so much for your kind reply.
> cheers
> dario

-- output of sessionInfo():

> sessionInfo()
R version 2.15.0 (2012-03-30)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

locale:
[1] C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] xps_1.17.1

loaded via a namespace (and not attached):
[1] tools_2.15.0
>

--
Sent via the guest posting facility at bioconductor.org.
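The Perl script estimates total_probes from the distance in lines between consecutive control->affx probesets, which its own comment calls a very dirty workaround. A possible alternative is to count the probe lines directly. The sketch below is written in base R and assumes the usual PGF layout, in which probeset lines have no leading tab, atom lines have one, and probe lines have two; that layout is an assumption here and should be checked against the actual pgf-file. The function name `count_probes_per_probeset` and the demo lines with made-up IDs are hypothetical.

```r
# Count probe lines per probeset in a PGF file read with readLines().
# Assumed layout: probeset lines have no leading tab, atom lines one,
# probe lines two. `count_probes_per_probeset` is a hypothetical helper.
count_probes_per_probeset <- function(pgf_lines) {
  # Drop header/comment lines (starting with '#') and empty lines
  body <- pgf_lines[!startsWith(pgf_lines, "#") & nzchar(pgf_lines)]
  is_probeset <- !startsWith(body, "\t")
  is_probe <- startsWith(body, "\t\t")
  # Running index of the probeset that each line belongs to
  owner <- cumsum(is_probeset)
  fields <- strsplit(body[is_probeset], "\t", fixed = TRUE)
  data.frame(
    probeset_id = vapply(fields, `[`, character(1), 1),
    type = vapply(fields, `[`, character(1), 2),
    total_probes = tabulate(owner[is_probe], nbins = sum(is_probeset)),
    stringsAsFactors = FALSE
  )
}

# Made-up demo input: two AFFX probesets with two and one probes
demo_pgf <- c(
  "#%header0=probeset_id\ttype\tprobeset_name",
  "17127787\tcontrol->affx\tAFFX-demo-1",
  "\t1",
  "\t\t101\tpm:st\t12\t25\t13\tACGT",
  "\t\t102\tpm:st\t12\t25\t13\tACGT",
  "17127789\tcontrol->affx\tAFFX-demo-2",
  "\t2",
  "\t\t103\tpm:st\t12\t25\t13\tACGT"
)
probe_counts <- count_probes_per_probeset(demo_pgf)
print(probe_counts)
```

Filtering the result with `probe_counts[probe_counts$type == "control->affx", ]` would give the per-probeset counts that the script otherwise approximates, without the cap of 20.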

written 13.5 years ago by Guest User ★ 13k
