affy hugene 2.1 st
Dario Greco ▴ 310
@dario-greco-1536
Dear friends,

We are starting a project using the new Affymetrix Human Gene 2.1 ST chips. I would like to know:
1) Does anyone have any experience with them yet? Any opinions or particular notes on analysing them?
2) What is the BioC roadmap for including the CDF/annotation packages for this chip?
3) What is the roadmap for the alternative CDF packages?

Thank you so much for your kind reply.
Cheers,
Dario
@james-w-macdonald-5106
United States
Hi Dario,

On 8/20/2012 1:36 AM, Dario Greco wrote:
> dear friends,
> we are starting a project using the new affymetrix human gene 2.1 st chips.
> i would like to know:
> 1) does anyone have yet any experience with them? any opinion/particular note analysing them?
> 2) what is the bioc roadmap for including the cdf/annotation packages for this?

There won't be a CDF package created by us. Instead there will be a pd.hugene.2.0.st.v1 package, intended for use with the oligo package. Note that there hasn't been an unsupported CDF file for any Gene ST chip after version 1.0. However, Philip de Groot has been making CDF packages for the 1.1 chips and may well make them for the 2.0 and 2.1 chips:

http://nmg-r.bioinformatics.nl/NuGO_R.html

As for a roadmap, these packages will be part of the new BioC release.

> 3) what is the roadmap for the alternative cdf packages?

We don't make those; Manhong Dai at MBNI does. I suggest you ask him. You can get his email address from their website:

http://brainarray.mbni.med.umich.edu/Brainarray/Database/CustomCDF/CDF_download.asp

Best,

Jim

--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
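For the oligo route Jim describes, a minimal sketch might look like the following. It assumes the following:
- oligo is installed.
- The matching platform-design package is available; for the 2.1 chip it is presumably named pd.hugene.2.1.st, but that exact name is an assumption, since the reply mentions pd.hugene.2.0.st.v1.
- All CEL files sit in one directory.

`run_oligo_rma` is a hypothetical helper name, not part of oligo.

```r
# Hypothetical helper (not part of oligo): summarise a directory of HuGene 2.x
# CEL files with oligo's RMA. Assumes the Bioconductor package oligo is
# installed; read.celfiles() looks up the matching pd.* platform-design
# package from the CEL file headers.
run_oligo_rma <- function(celdir, target = "core") {
  if (!dir.exists(celdir)) {
    stop("CEL directory not found: ", celdir)
  }
  if (!requireNamespace("oligo", quietly = TRUE)) {
    stop("the Bioconductor package 'oligo' is required")
  }
  celfiles <- oligo::list.celfiles(celdir, full.names = TRUE)
  if (length(celfiles) == 0) {
    stop("no CEL files found in ", celdir)
  }
  raw <- oligo::read.celfiles(celfiles)   # GeneFeatureSet
  # background correction, quantile normalisation and median-polish
  # summarisation at the transcript-cluster ("core") level
  oligo::rma(raw, target = target)
}
```

Usage would then be `eset <- run_oligo_rma("/path/to/cel/dir")`, with the log2 expression matrix available via `Biobase::exprs(eset)`.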
Guest User ★ 13k
@guest-user-4897
Dear Dario,

For your purposes you can use the package "xps", which I have just tested with the Human Gene 2.0 ST Array Data Set. It is available for download from:
http://www.affymetrix.com/support/downloads/demo_data/human2_0.zip

1. First, you need to download the corresponding Affymetrix library files and annotation files for HuGene-2_1-st. You need these files to create the ROOT scheme file as follows:

### new R session: load library xps
library(xps)

### define directories:
# directory containing Affymetrix library files
libdir <- "/Volumes/GigaDrive/Affy/libraryfiles"
# directory containing Affymetrix annotation files
anndir <- "/Volumes/GigaDrive/Affy/Annotation"
# directory to store ROOT scheme files
scmdir <- "/Volumes/GigaDrive/CRAN/Workspaces/Schemes"

# HuGene-2_1-st:
# use corrected annotation files
scheme.hugene21st.na32 <- import.exon.scheme("hugene21stv1", filedir = file.path(scmdir, "na32"),
                            file.path(libdir, "HuGene-2_1-st", "HuGene-2_1-st.clf"),
                            file.path(libdir, "HuGene-2_1-st", "HuGene-2_1-st.pgf"),
                            file.path(anndir, "HuGene-2_1-st-v1.na32.hg19.probeset.csv", "HuGene-2_1-st-v1.na32.hg19.probeset.corr.csv"),
                            file.path(anndir, "HuGene-2_1-st-v1.na32.hg19.transcript.csv", "HuGene-2_1-st-v1.na32.hg19.transcript.corr.csv"))

The Affymetrix annotation files for the new HuGene-2.x arrays are missing the AFFX controls, so you first need to add these controls. For this purpose I have created a Perl script (shown below). It adds the missing AFFX probesets and creates the corrected annotation files:
- HuGene-2_1-st-v1.na32.hg19.probeset.corr.csv
- HuGene-2_1-st-v1.na32.hg19.transcript.corr.csv

Note: Affymetrix has promised to add the missing AFFX controls in version na33 of the annotation files.

Alternatively, I can send you the finished ROOT scheme file "hugene21stv1.root". However, it is 52 MB in size.
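Before building the scheme, you might want to check which AFFX controls an annotation file actually lacks. One option is to compare the control->affx probeset IDs in the PGF file with the IDs in the annotation CSV. Below is a minimal base-R sketch, with these assumptions:
- PGF probeset lines are unindented and tab-separated, with the probeset ID in the first field and the type in the second.
- Comment lines start with "#".
- The first CSV column holds the IDs.

Both function names are hypothetical and not part of xps.

```r
# Hypothetical helpers (not part of xps): find control->affx probesets listed
# in a PGF file that have no row in an Affymetrix annotation CSV.
# Assumed PGF probeset line: probeset_id <TAB> type [<TAB> probeset_name],
# with no leading tab; atom/probe lines start with tabs, comments with "#".
affx_ids_from_pgf <- function(pgf_lines) {
  lines  <- sub("\r$", "", pgf_lines)                 # drop CR of CRLF endings
  is_ps  <- nzchar(lines) & !grepl("^(#|\t)", lines)  # probeset lines only
  fields <- strsplit(lines[is_ps], "\t", fixed = TRUE)
  ids    <- vapply(fields, `[`, character(1), 1)
  types  <- vapply(fields, function(f) if (length(f) >= 2) f[2] else NA_character_,
                   character(1))
  ids[!is.na(types) & types == "control->affx"]
}

missing_affx <- function(pgf_lines, annot_csv) {
  # comment.char = "#" skips the "#%" header lines of Affymetrix CSV files
  annot <- read.csv(annot_csv, comment.char = "#", colClasses = "character")
  setdiff(affx_ids_from_pgf(pgf_lines), annot[[1]])
}
```

In practice `pgf_lines` would come from `readLines()` on the .pgf file. The returned IDs are the probesets a corrected annotation file would need to add.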
2. After creating the ROOT scheme file "hugene21stv1.root", you are ready to import the CEL files as follows:

### new R session: load library xps
library(xps)

### define directories:
# directory of ROOT scheme files
scmdir <- "/Volumes/GigaDrive/CRAN/Workspaces/Schemes/na32"
# directory to store ROOT raw data files
datdir <- "/Volumes/GigaDrive/CRAN/Workspaces/ROOTData"
# directory containing Tissues CEL files
celdir <- "/Volumes/GigaDrive/ChipData/Exon/HuGene2/human2.0/HuGene2.1_Plate"

### HuGene-2_1-st data: import raw data
# first, import ROOT scheme file
scheme.genome <- root.scheme(file.path(scmdir, "hugene21stv1.root"))

# subset of CEL files to import
celfiles <- c("Liver_HuGene-2_1_GT_Rep1_A03_MC.CEL", "Liver_HuGene-2_1_GT_Rep2_D06_MC.CEL", "Liver_HuGene-2_1_GT_Rep3_F02_MC.CEL",
              "Spleen_HuGene-2_1_GT_Rep1_A11_MC.CEL", "Spleen_HuGene-2_1_GT_Rep2_C07_MC.CEL", "Spleen_HuGene-2_1_GT_Rep3_F04_MC.CEL")
# rename CEL files
celnames <- c("LiverRep1", "LiverRep2", "LiverRep3", "SpleenRep1", "SpleenRep2", "SpleenRep3")

# import CEL files
data.genome <- import.data(scheme.genome, "HuTissuesGenome21", filedir=datdir, celdir=celdir, celfiles=celfiles, celnames=celnames)

3. Now you are ready to convert the data to expression levels using RMA:

### new R session: load library xps
library(xps)

### first, load ROOT scheme file and ROOT data file
scmdir <- "/Volumes/GigaDrive/CRAN/Workspaces/Schemes/na32"
scheme.genome <- root.scheme(file.path(scmdir, "hugene21stv1.root"))
datdir <- "/Volumes/GigaDrive/CRAN/Workspaces/ROOTData"
data.genome <- root.data(scheme.genome, paste(datdir, "HuTissuesGenome21_cel.root", sep="/"))

### preprocess raw data ###
datdir <- getwd()

# 1. RMA
data.rma <- rma(data.genome, "HuGene21RMAcore", filedir=datdir, tmpdir="", background="antigenomic",
                normalize=TRUE, exonlevel="core+affx")

# 2. DABG detection call
call.dabg <- dabg.call(data.genome, "HuGene21DABGcore", filedir=datdir, exonlevel="core+affx")

# get data.frames
expr.rma  <- validData(data.rma)
pval.dabg <- pvalData(call.dabg)
pres.dabg <- presCall(call.dabg)

# density plots
hist(data.rma)

# boxplots
boxplot(data.rma)

# export expression data
export.expr(data.rma, treename = "*", treetype="mdp", varlist="fUnitName:fName:fSymbol:fLevel",
            outfile="HuGene21RMAcoreNamesSymbols.txt", sep="\t", as.dataframe=FALSE, verbose=TRUE)

I hope this info is helpful for you; below you will find the Perl script.

Best regards,
Christian
_._._._._._._._._._._._._._._._._._
C.h.r.i.s.t.i.a.n   S.t.r.a.t.o.w.a
V.i.e.n.n.a           A.u.s.t.r.i.a
e.m.a.i.l:        cstrato at aon.at
_._._._._._._._._._._._._._._._._._

### BEGIN perlscript "HuGene21_update_AFFX.pl" ###
#!/usr/bin/perl
#
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Perl script to update AFFX controls of HuGene-2_1-st annotation files
#
# Copyright (c) 2012-2012 Christian Stratowa, Vienna, Austria.
# All rights reserved.
#
# save HuGene-2_1-st pgf-file and annotation files in current directory
# and run:
# > perl HuGene21_update_AFFX.pl
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

use strict;
use warnings;

# get current working dir
use Cwd;

# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# initialize constants
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

# input file names
my $in_pgf      = "/Volumes/GigaDrive/Affy/libraryfiles/HuGene-2_1-st/HuGene-2_1-st.pgf";
my $in_annot_tc = "/Volumes/GigaDrive/Affy/Annotation/HuGene-2_1-st-v1.na32.hg19.transcript.csv/HuGene-2_1-st-v1.na32.hg19.transcript.csv";
my $in_annot_ps = "/Volumes/GigaDrive/Affy/Annotation/HuGene-2_1-st-v1.na32.hg19.probeset.csv/HuGene-2_1-st-v1.na32.hg19.probeset.csv";

# output file names
my $out_affx     = "HuGene21.affx.csv";
my $out_annot_tc = "HuGene-2_1-st-v1.na32.hg19.transcript.corr.csv";
my $out_annot_ps = "HuGene-2_1-st-v1.na32.hg19.probeset.corr.csv";

# predefined strings
my $na = "---";
my $beg_assignment_tc = "--- // --- // ";
my $end_assignment_tc = " // --- // --- // --- // --- // --- // ---";
my $beg_assignment_ps = "--- // ";
my $end_assignment_ps = " // --- // --- // --- // ---";

# variables
my @array;
my $idx;

# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# read pgf-file and put control->affx into array
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
print("reading pgf-file and storing control->affx in array ... ");
open(INFILE, $in_pgf) or die("Couldn't read $in_pgf: $!");

# fill array with [probeset_id, mrna_assignment, category, line_nr]
$idx = 0;
while (my $line = <INFILE>) {
  $idx++;
  if ($line =~ /control->affx/) {
    chomp($line);
    $line =~ s/\r//; # remove optional carriage return character
    my @tmp = split(/\t/, $line);
    push @array, [@tmp, $idx];
  }#if
}#while
# sentinel entry marking the end of the file
push @array, [0, "NA", "NA", $idx+1];
close(INFILE) or die("Couldn't close $in_pgf: $!");

# replace "line_nr" with "total_probes"
for (my $i=0; $i<$#array; $i++) {
  $array[$i][3] = ($array[$i+1][3] - $array[$i][3] - 1)/2;
  #very dirty workaround (would need to find number of lines between probeset_ids)
  # if ($array[$i][3] > 100) {$array[$i][3] = $array[$i-1][3];}
  if ($array[$i][3] > 100) {$array[$i][3] = 20;}
}#for
print("done.\n");

# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# write control->affx array to out_affx (for testing purposes only)
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
print("writing control->affx to $out_affx ... ");
open(OUTFILE, ">$out_affx") or die("Couldn't open $out_affx: $!");
for (my $i=0; $i<$#array; $i++) {
  my $tmp = join("\",\"", @{$array[$i]});
#  print(OUTFILE "\"$tmp\"\n");
  print(OUTFILE "\"$tmp\"\r\n");
}#for
close(OUTFILE) or die("Couldn't close $out_affx: $!");
print("done.\n");

# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# update control->affx lines of transcript annotation file out_annot_tc
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
print("appending control->affx lines to $out_annot_tc ... ");
open(OUTFILE, ">$out_annot_tc") or die("Couldn't open $out_annot_tc: $!");
open(INFILE, $in_annot_tc) or die("Couldn't read $in_annot_tc: $!");

# delete old control->affx lines
while (<INFILE>) {
  if (/control->affx/) {next;}
  print(OUTFILE $_);
}#while

# append new control->affx lines
for (my $i=0; $i<$#array; $i++) {
  my $afx = join("", $beg_assignment_tc, $array[$i][2], $end_assignment_tc);
  my $tmp = join("\",\"", $array[$i][0], $array[$i][0], $na, $na, 0, 0, $array[$i][3], $na, $afx,
                 $na, $na, $na, $na, $na, $na, $na, $na, $array[$i][1]);
  print(OUTFILE "\"$tmp\"\n");
#  print(OUTFILE "\"$tmp\"\r\n");
}#for
close(INFILE) or die("Couldn't close $in_annot_tc: $!");
close(OUTFILE) or die("Couldn't close $out_annot_tc: $!");
print("done.\n");

# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# update control->affx lines of probeset annotation file out_annot_ps
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
print("appending control->affx lines to $out_annot_ps ... ");
open(OUTFILE, ">$out_annot_ps") or die("Couldn't open $out_annot_ps: $!");
open(INFILE, $in_annot_ps) or die("Couldn't read $in_annot_ps: $!");

# delete old control->affx lines
while (<INFILE>) {
  if (/control->affx/) {next;}
  print(OUTFILE $_);
}#while

# append new control->affx lines
for (my $i=0; $i<$#array; $i++) {
  my $afx = join("", $beg_assignment_ps, $array[$i][2], $end_assignment_ps);
  my $tmp = join("\",\"", $array[$i][0], $na, $na, 0, 0, $array[$i][3], 0, 0, 0, $na, $afx,
                 0, 0, 0, 0, $na, $na, $na, $na,
                 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
                 $array[$i][1]);
  print(OUTFILE "\"$tmp\"\n");
#  print(OUTFILE "\"$tmp\"\r\n");
}#for
close(INFILE) or die("Couldn't close $in_annot_ps: $!");
close(OUTFILE) or die("Couldn't close $out_annot_ps: $!");
print("done.\n");

# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
### END perlscript "HuGene21_update_AFFX.pl" ###
-- output of sessionInfo():

> sessionInfo()
R version 2.15.0 (2012-03-30)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] xps_1.17.1

loaded via a namespace (and not attached):
[1] tools_2.15.0
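The Perl script in this post estimates total_probes from the gap in line numbers between consecutive control->affx probesets. It assumes two lines per probe, and falls back to a hard-coded 20 when the estimate exceeds 100. That can happen, for instance, when the next control probeset or the end of the file is far away.

A more direct alternative is to count the probe lines that follow each probeset line. Below is a minimal base-R sketch, assuming the standard PGF layout:
- Probeset lines are unindented.
- Atom lines start with one tab.
- Probe lines start with two tabs.
- Header and comment lines start with "#".

`count_pgf_probes` is a hypothetical helper, not an xps function.

```r
# Hypothetical helper: number of probes per probeset of a given type in a PGF
# file. Assumed layout: probeset lines have no leading tab, atom lines start
# with one tab, probe lines with two tabs, comment lines with "#".
count_pgf_probes <- function(pgf_lines, type = "control->affx") {
  lines <- sub("\r$", "", pgf_lines)
  lines <- lines[nzchar(lines) & !startsWith(lines, "#")]
  is_ps    <- !startsWith(lines, "\t")       # probeset lines
  is_probe <- startsWith(lines, "\t\t")      # probe lines
  ps_index <- cumsum(is_ps)                  # probeset each line belongs to
  # count probe lines per probeset (index 0, i.e. before any probeset, is ignored)
  probes   <- tabulate(ps_index[is_probe], nbins = sum(is_ps))
  fields   <- strsplit(lines[is_ps], "\t", fixed = TRUE)
  ids      <- vapply(fields, `[`, character(1), 1)
  types    <- vapply(fields, function(f) if (length(f) >= 2) f[2] else NA_character_,
                     character(1))
  keep <- !is.na(types) & types == type
  setNames(probes[keep], ids[keep])
}
```

With `pgf_lines <- readLines("HuGene-2_1-st.pgf")`, the named integer vector returned could replace the line-gap estimate. No fallback constant is needed, because the count no longer depends on how far apart the control probesets are.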