Hi!
I want to make a Venn diagram for 2 samples (3 replicates each) and a list of differentially expressed genes from HuGene 2.0 CEL files, and I'm using the oligo package.
Does anyone have a pipeline? Or can someone help with the following:
library(oligo)
celdir <- 'path of my cell files' # directory that holds the CEL files
setwd(celdir) # set working directory
# create sampleNames from the list of celFiles - substitute file names with sample names
raw <- list.celfiles(celdir, full.names=TRUE)
filename2sample <- 'path of the txt'
fn2sp <- read.table(filename2sample, header=TRUE)
# convertFilename2Sample is my own helper function (not shown here)
sNames <- unlist(lapply(raw, convertFilename2Sample, tabledict=fn2sp)) # co
# I found this for reading the CEL files
affyGeneFS <- read.celfiles(raw)
norm <- rma(affyGeneFS, target = "probeset")
Background correcting
Normalizing
Calculating Expression
Warning message:
'isIdCurrent' is deprecated.
Use 'dbIsValid' instead.
See help("Deprecated")
What do I have to do about this warning message? Or can I just continue working and leave it as it is?
expr<- exprs(norm)
After this step, can I use limma the same way as with the Affymetrix U133 Plus arrays?
And for U133 there was a way to get the annotation:
library(hgu133plus2.db)
transcriptclusterid2symbol_many <- select(hgu133plus2.db,
keys = keys(hgu133plus2.db),
columns=c("PROBEID","ENTREZID","SYMBOL","MAP"),
keytype="PROBEID")
Is there an equivalent for HuGene 2.0?
Cheers,
Anna
ps.
> sessionInfo()
R version 3.1.3 (2015-03-09)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 8 x64 (build 9200)
locale:
[1] LC_COLLATE=Hungarian_Hungary.1250 LC_CTYPE=Hungarian_Hungary.1250
[3] LC_MONETARY=Hungarian_Hungary.1250 LC_NUMERIC=C
[5] LC_TIME=Hungarian_Hungary.1250
attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] pd.hugene.2.0.st_3.10.0 RSQLite_1.0.0 DBI_0.3.1
[4] oligo_1.30.0 Biostrings_2.34.1 XVector_0.6.0
[7] IRanges_2.0.1 S4Vectors_0.4.0 oligoClasses_1.28.0
[10] pheatmap_1.0.2 limma_3.22.7 optparse_1.3.0
[13] hgu133plus2cdf_2.15.0 affy_1.44.0 Biobase_2.26.0
[16] BiocGenerics_0.12.1 BiocInstaller_1.16.2
loaded via a namespace (and not attached):
[1] affxparser_1.38.0 affyio_1.34.0 AnnotationDbi_1.28.2 bit_1.1-12
[5] codetools_0.2-11 colorspace_1.2-6 ff_2.2-13 foreach_1.4.2
[9] GenomeInfoDb_1.2.4 GenomicRanges_1.18.4 getopt_1.20.0 grid_3.1.3
[13] gtable_0.1.2 iterators_1.0.7 munsell_0.4.2 plyr_1.8.1
[17] preprocessCore_1.28.0 RColorBrewer_1.1-2 Rcpp_0.11.5 scales_0.2.4
[21] splines_3.1.3 tools_3.1.3 zlibbioc_1.12.0
R doesn't think plus is minus. The topTable() function is sorting your data based on the evidence for differential expression, regardless of direction. This is what most people want. If you put the probesets with negative t-statistics at the bottom, then you will be sorting the table so that the interesting genes are at the top AND the bottom of the table, rather than having them all at the top.
But if you really want to do this, you can use order() on the t-statistic column of the topTable() output.
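As a rough illustration of the difference between the two orderings, here is a minimal sketch in base R. The data frame `tt` is a made-up stand-in for a topTable() result (the IDs and numbers are invented for illustration, not taken from any real fit); only the column names `logFC`, `t` and `P.Value` follow what topTable() normally returns.

```r
# Hypothetical stand-in for a topTable() result; all values are invented.
tt <- data.frame(
  ID      = c("ps1", "ps2", "ps3", "ps4"),
  logFC   = c(2.1, -1.8, 0.4, -0.2),
  t       = c(9.5, -8.7, 1.2, -0.6),
  P.Value = c(1e-6, 3e-6, 0.25, 0.56),
  stringsAsFactors = FALSE
)

# Ordered by evidence (smallest p-value first): strongly up- and
# down-regulated probesets both end up at the top of the table.
by_evidence <- tt[order(tt$P.Value), ]

# Ordered by signed t-statistic: most positive first, most negative last,
# so the interesting probesets sit at the top AND the bottom.
by_signed_t <- tt[order(tt$t, decreasing = TRUE), ]

print(by_evidence$ID)
print(by_signed_t$ID)
```

With a real analysis you would replace `tt` with your own topTable() output and apply the same order() call to it.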
And please note that not all genes have gene symbols! The gene symbols that you get from the annotation packages are all the gene symbols that map from probeset ID -> Entrez Gene ID -> HUGO. There are not gene symbols, in general, for the untranslated content on this array (e.g., miRNA, snoRNA, scaRNA, lincRNA).
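In practice, probesets without a gene symbol show up as NA in the annotation table. The sketch below shows one way to count them and set them aside, using base R only. The table `annot` is hypothetical: its IDs and symbols are placeholders, and it only mimics the PROBEID/SYMBOL columns you would get back from an annotation lookup.

```r
# Hypothetical annotation table; IDs and symbols are placeholders.
# NA marks a probeset for which no gene symbol is available.
annot <- data.frame(
  PROBEID = c("id1", "id2", "id3", "id4"),
  SYMBOL  = c("GENE_A", NA, "GENE_B", NA),
  stringsAsFactors = FALSE
)

# Flag the probesets that do have a symbol.
has_symbol <- !is.na(annot$SYMBOL)

# How many probesets are unannotated?
n_missing <- sum(!has_symbol)

# Keep only the annotated probesets, e.g. for a gene-level list.
annotated <- annot[has_symbol, ]

print(n_missing)
print(annotated)
```

Whether you drop the unannotated probesets or keep them under their probeset ID depends on whether the non-coding content matters for your question.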