HuGene 2.0 analysis with CEL files, oligo package
kanacska ▴ 10
@kanacska-7375
Last seen 8.6 years ago
Hungary

Hi!

I want to make a Venn diagram for 2 samples (3 replicates each) and a list of differentially expressed genes from HuGene 2.0 CEL files, and I'm using the oligo package.

Does anyone have a pipeline? Or can someone help with the following:

library(oligo)

celdir <- "<celdir path>" # placeholder: path to the directory with the CEL files
setwd(celdir) # set working directory


# create sampleNames from the list of celFiles - substitute file names with sample names
raw <- list.celfiles('path of my cell files', full.names=TRUE)
filename2sample <- 'path of the txt'
fn2sp <- read.table(filename2sample, header=TRUE)
# convertFilename2Sample is not defined in this post
sNames <- unlist(lapply(raw, convertFilename2Sample, tabledict=fn2sp)) # co

# I found this for reading the CEL files

affyGeneFS <- read.celfiles(raw)
norm <- rma(affyGeneFS, target = "probeset")

Background correcting
Normalizing
Calculating Expression
Warning message:
'isIdCurrent' is deprecated.
Use 'dbIsValid' instead.
See help("Deprecated") 

What do I have to do about this warning message? Or, if I can keep working, can I just leave it as it is?

expr<- exprs(norm)

After this step, can I use limma in the same way as for Affy U133 Plus arrays?

And for U133 there was a way to get the annotation:

library(hgu133plus2.db)

transcriptclusterid2symbol_many <- select(hgu133plus2.db,
                                          keys = keys(hgu133plus2.db),
                                          columns=c("PROBEID","ENTREZID","SYMBOL","MAP"),
                                          keytype="PROBEID")

Is there an equivalent for HuGene 2.0?

Cheers, 

Anna

ps. 

> sessionInfo()
R version 3.1.3 (2015-03-09)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 8 x64 (build 9200)

locale:
[1] LC_COLLATE=Hungarian_Hungary.1250  LC_CTYPE=Hungarian_Hungary.1250   
[3] LC_MONETARY=Hungarian_Hungary.1250 LC_NUMERIC=C                      
[5] LC_TIME=Hungarian_Hungary.1250    

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] pd.hugene.2.0.st_3.10.0 RSQLite_1.0.0           DBI_0.3.1              
 [4] oligo_1.30.0            Biostrings_2.34.1       XVector_0.6.0          
 [7] IRanges_2.0.1           S4Vectors_0.4.0         oligoClasses_1.28.0    
[10] pheatmap_1.0.2          limma_3.22.7            optparse_1.3.0         
[13] hgu133plus2cdf_2.15.0   affy_1.44.0             Biobase_2.26.0         
[16] BiocGenerics_0.12.1     BiocInstaller_1.16.2   

loaded via a namespace (and not attached):
 [1] affxparser_1.38.0     affyio_1.34.0         AnnotationDbi_1.28.2  bit_1.1-12           
 [5] codetools_0.2-11      colorspace_1.2-6      ff_2.2-13             foreach_1.4.2        
 [9] GenomeInfoDb_1.2.4    GenomicRanges_1.18.4  getopt_1.20.0         grid_3.1.3           
[13] gtable_0.1.2          iterators_1.0.7       munsell_0.4.2         plyr_1.8.1           
[17] preprocessCore_1.28.0 RColorBrewer_1.1-2    Rcpp_0.11.5           scales_0.2.4         
[21] splines_3.1.3         tools_3.1.3           zlibbioc_1.12.0      

microarray annotation hgu133plus2 hugene20 • 2.4k views
@james-w-macdonald-5106
Last seen 13 hours ago
United States

Three things:

1.) You probably don't want to summarize a Gene ST array at the probeset level, so just use

norm <- rma(affyGeneFS)

And you don't have to worry about the warning messages.

2.) There is no need to extract the expression values from your 'norm' ExpressionSet. The limma package will do that for you. And you can use limma, just as you would for any other Affy arrays.

3.) The annotation for this array is supplied in the hugene20sttranscriptcluster.db package (if you summarize at the 'core' level, which I highly recommend). If you have used limma, then you will have an MArrayLM object that you can annotate.

library(limma)
library(hugene20sttranscriptcluster.db)

design <- model.matrix(<args go here>)
contrast <- makeContrasts(<args go here>)

fit <- lmFit(norm, design)
fit2 <- contrasts.fit(fit, contrast)
fit2 <- eBayes(fit2)

## map the row names of the fit (transcript cluster IDs) to gene annotation
gns <- select(hugene20sttranscriptcluster.db, row.names(fit2$coef), c("ENTREZID","SYMBOL","MAP"))

## here you need to account for the one-to-many mappings that result. My usual solution is
gns <- gns[!duplicated(gns[,1]),]
fit2$genes <- gns

## check that the annotation rows line up with the rows of the fit
if(!isTRUE(all.equal(row.names(fit2$coef), fit2$genes$PROBEID)))
    stop("There has been an error when annotating these data, check the gns object!\n")

Then topTable() will output tables with the annotation appended.
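
To see what the de-duplication step does, here is a small sketch in base R. The probeset IDs and the mapping table are made up for illustration: 'ids' stands in for row.names(fit2$coef) and 'gns' for the data.frame returned by select(). Only the duplicated() idiom and the all.equal() check come from the code above.

```r
# Toy stand-ins (hypothetical values, not real array annotation)
ids <- c("p1", "p2", "p3")
gns <- data.frame(
  PROBEID  = c("p1", "p2", "p2", "p3"),   # "p2" maps to two genes
  ENTREZID = c("101", "202", "203", NA),
  SYMBOL   = c("GENEA", "GENEB", "GENEC", NA),
  stringsAsFactors = FALSE
)

# Keep only the first row for each probeset ID
gns_unique <- gns[!duplicated(gns$PROBEID), ]

# Same sanity check as above: one annotation row per probeset, in the same order
stopifnot(isTRUE(all.equal(ids, gns_unique$PROBEID)))
gns_unique
```

Keeping the first mapping is an arbitrary choice; it simply guarantees one annotation row per row of the fit.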

 

kanacska ▴ 10
@kanacska-7375
Last seen 8.6 years ago
Hungary

Dear James,

Thank you for your quick help!

Unfortunately I have another question. When I order my topTable with sort.by = 't', I get:

             logFC   AveExpr         t      P.Value   adj.P.Val        B
17047868 -3.528982  7.859722 -23.23726 3.602484e-08 0.001603732 4.176879
17059383 -3.599709  8.576114 -21.67394 5.982178e-08 0.001603732 4.093493
17056791  1.562905  7.379065  17.19374 3.204670e-07 0.005727493 3.739025
17059491 -3.694398 10.375221 -14.72316 9.775825e-07 0.013103760 3.423587
16996017 -1.823099  7.487446 -13.84718 1.515446e-06 0.016250736 3.279348
16675045  1.534715  6.423220  12.85058 2.578229e-06 0.023039484 3.087940

..........................

R seems to treat plus as minus. What do I have to do so that my list is in the right order, from minus to plus?

Also: I get the gene symbol list, but in some places it says NA. What do I have to do to get the gene symbols for all of the probe IDs in R?

 

Thank you 

Anna


R doesn't think plus is minus. The topTable() function is sorting your data based on the evidence for differential expression, regardless of direction. This is what most people want. If you put the probesets with negative t-statistics at the bottom, then you will be sorting the table so that the interesting genes are at the top AND the bottom of the table, rather than having them all at the top.

But if you really want to do this, you can use order()

tab <- topTable(<args go here>)

tab.sorted <- tab[order(tab$logFC, decreasing = TRUE),]
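
For the ordering asked about (from the most negative to the most positive t-statistic), the same order() idiom works on the t column. A minimal sketch, using a hand-made data frame with three rows copied from the table in the question in place of real topTable() output:

```r
# Hand-made stand-in for topTable() output (values taken from the question)
tab <- data.frame(
  logFC = c(-3.528982, 1.562905, -3.694398),
  t     = c(-23.23726, 17.19374, -14.72316),
  row.names = c("17047868", "17056791", "17059491")
)

# order() sorts ascending by default, so negative t-statistics come first
tab.by.t <- tab[order(tab$t), ]
tab.by.t
```

With real data you would probably call topTable() with number = Inf first, so that all probesets are in the table before it is re-sorted.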

And please note that not all genes have gene symbols! The gene symbols that you get from the annotation packages are all the gene symbols that map from probeset ID -> Entrez Gene ID -> HUGO. There are not gene symbols, in general, for the untranslated content on this array (e.g., miRNA, snoRNA, scaRNA, lincRNA).
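
If the NA entries get in the way in tables or plots, one option is to fall back to the probeset ID as a label. This is only a sketch in base R: the data frame is invented, and the 'label' column is a name introduced here, not something limma or the annotation package creates.

```r
# Invented annotation table; "p2" has no gene symbol
gns <- data.frame(
  PROBEID = c("p1", "p2", "p3"),
  SYMBOL  = c("GENEA", NA, "GENEC"),
  stringsAsFactors = FALSE
)

# Use the symbol where there is one, otherwise the probeset ID
gns$label <- ifelse(is.na(gns$SYMBOL), gns$PROBEID, gns$SYMBOL)

# Probesets without a symbol, e.g. to look them up separately
no_symbol <- gns$PROBEID[is.na(gns$SYMBOL)]
gns
```

This does not create symbols that do not exist; it only makes the unannotated probesets easier to identify.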

 
