Annotation of Affy Porcine Gene 1.0 ST array data
5
0
Entering edit mode
aminbau14 • 0
@aminbau14-7116
Last seen 10.7 years ago
Germany

Dear List

I am beginner of R/bioconductor. I am analyzing a microarray dataset obtained from Affymetrix Porcine Gene 1.0 ST array. With the help of posts of the site, I have summarized the expression values for each array probes using  ‘oligo’ package in R v3.1.2.  However, I got every time the warning message below while RMA normalization. 

> eset <- oligo::rma(oligoRaw)

Background correcting

Normalizing

Calculating Expression

Warning message:

'isIdCurrent' is deprecated.

Use 'dbIsValid' instead.

See help("Deprecated") 

What does it mean? how can fix it?

 

While loading platform design package I got the warning message below-

> librarypd.porgene.1.0.st)

Warning message:

package ‘pd.porgene.1.0.st’ was built under R version 3.2.0 

Does this making wrong in downstream process? As i am using R v 3.1.2

  

Secondly, the big battle I faced in annotation and biological interpretation of the data. I tried with ‘porcine.db’ package for annotation but not sure is it the right one or not?

With limma package I can run the lmFit function and get topTable but that contains only Affy probeset  ID of eight digit number (e.g  ‘15192707’, ‘15212448’,…).  

 

How can I get the list of deferentially expressed genes with biological info like gene name, symbol, ensembl_id, fold change and adj p value ?  How can I filter genes which up and down regulated in contrast group of interest?

Please help me with the suggestions/tips/codes for way to go.  Thanks very much in advance.  

Here is the code i am trying with-

#load libraby

library(oligo)

library(limma)

library(porcine.db)

library(pd.porgene.1.0.st)


#nomalization

oligoRaw <- read.celfiles(list.celfiles())

eset <- oligo::rma(oligoRaw)

write.exprs(eset,file="data.normal.txt")


#linear model fit

design <- model.matrix(~ 0+factor(c(1,1,1,2,2,2,3,3,3, 4,4,4,5,5,5,6,6,6,7,7,7)))

colnames(design) <- c("cont.0h", "cont.6h", "vac.6h", "cont.24h", "vac.24h", "cont.72h", "vac.72h")

rownames(design)<-colnames(eset)

fit <- lmFit(eset, design)


# contrast groups for Diff expressed genes

cont.dif1<-makeContrasts(Dif24hr=(vac.24h-cont.0h), levels=design)

fit1 <- contrasts.fit(fit, cont.dif1)

fit.eBayes <- eBayes(fit1)

topTable(fit.eBayes, coef=1, adjust="BH")

results <- decideTests(fit.eBayes)

vennDiagram(results)


#Annotation

my_frame <- data.frame(exprs(eset))

porcine()

Annot <- data.frame(ACCNUM=sapply(contents(porcineACCNUM), paste, collapse=", "), SYMBOL=sapply(contents(porcineSYMBOL), paste, collapse=", "), DESC=sapply(contents(porcineGENENAME), paste, collapse=", "))

all <- merge(Annot, my_frame, by.x=0, by.y=0, all=T)

write.table(all,file="data.annoted.txt",sep="\t")

summary(all)

sessionInfo()

R version 3.1.2 (2014-10-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods  
[9] base     

other attached packages:
 [1] genefilter_1.48.1        pd.porgene.1.0.st_3.10.0 porcine.db_3.0.0        
 [4] org.Ss.eg.db_3.0.0       RSQLite_1.0.0            DBI_0.3.1               
 [7] AnnotationDbi_1.28.1     GenomeInfoDb_1.2.3       limma_3.22.1            
[10] oligo_1.30.0             Biostrings_2.34.1        XVector_0.6.0           
[13] IRanges_2.0.1            S4Vectors_0.4.0          Biobase_2.26.0          
[16] oligoClasses_1.28.0      BiocGenerics_0.12.1     

loaded via a namespace (and not attached):
 [1] affxparser_1.38.0     affyio_1.34.0         annotate_1.44.0      
 [4] BiocInstaller_1.16.1  bit_1.1-12            codetools_0.2-9      
 [7] ff_2.2-13             foreach_1.4.2         GenomicRanges_1.18.3 
[10] iterators_1.0.7       preprocessCore_1.28.0 splines_3.1.2        
[13] survival_2.37-7       tools_3.1.2           XML_3.98-1.1         
[16] xtable_1.7-4          zlibbioc_1.12.0 

 

Please help me with the suggestions/tips/codes for way to go.  Thanks very much in advance.  

Sincerely

Amin

limma oligo annotation • 4.3k views
ADD COMMENT
2
Entering edit mode
@james-w-macdonald-5106
Last seen 1 day ago
United States

You don't have to worry about the warning for 'isIdCurrent' this has to do with some underlying code changes that won't affect your results.

It looks like you have a devel version of pd.porgene.1.0.st. You can fix that by doing

library(BiocInstaller)
biocLite("pd.porgene.1.0.st")

##Also do this

biocLite("pig.db0")

The annotation question is a bit more tricky. Ideally you would have a porgene10sttranscriptcluster.db package to annotate with, but we haven't built an annotation package for this array. That said, it isn't that difficult to build your own. You first need to go to the Affy website (requires a free registration) http://www.affymetrix.com/Auth/analysis/downloads/na34/wtgene/PorGene-1_0-st-v1.na34.sscr2.transcript.csv.zip

Unzip this file somewhere convenient.

Then go to this gist:

 

EDIT:  Now that's cool ^^^^^^^^^^^^^ Didn't expect that to happen.

probably the easiest thing to do is click the 'view raw' button at the bottom right of the code, then copy-paste into a text file, and save as makeTranscriptPkg.R, in the same directory that contains your PorGene-1_0-st-v1.na34.sscr2.transcript.csv file.

Now open R, change the directory to wherever you put those files, and source() the makeTranscriptPkg.R file. You can now build an annotation package by doing

pv <- parseCsvFiles("PorGene-1_0-st-v1.na34.sscr2.transcript.csv")

makePkg(pv$outname, pv$outsub, "8.2.1", "PIGCHIP_DB")

install.packages("porgene10sttranscriptcluster.db", repos = NULL, type = "source")
ADD COMMENT
0
Entering edit mode

i am getting the following Error: could not find function "parseCsv" when i used above code at  pv <- parseCsv("PorGene-1_0-st-v1.na34.sscr2.transcript.csv") and Error in switch(outsub, transcript = paste0(outname, "st", outsub, "cluster"),  : EXPR must be a length 1 vector at makePkg(pv$outname, pv$outsub, "8.2.1", "PIGCHIP_DB")

 

 

ADD REPLY
0
Entering edit mode

Quick reply: to me it seems that the Gist code has slightly changed since James posted his answer, so you should use parseCsvFiles ( thus with 'Files' added to the name of the function).

... which BTW was already noticed/mentioned one post below this one...... (A: Annotation of Affy Porcine Gene 1.0 ST array data)

ADD REPLY
0
Entering edit mode
aminbau14 • 0
@aminbau14-7116
Last seen 10.7 years ago
Germany

Hi James,

Great! Many thanks for the support. I could built my package with the code you provided and it worked well except a typo, should 'parseCsv' be 'parseCsvFiles' in the first command after running the source () function.

 

Best regards

Amin

ADD COMMENT
0
Entering edit mode
aminbau14 • 0
@aminbau14-7116
Last seen 10.7 years ago
Germany

Hi James and others,

I am curious to look for what comes out about alternate splicing events from my PorGene 1.0 ST array data set. After normalized with 'probeset' as target with oligo package, i could find some deferentially expressed probes/exons with limma, now i need to annotate them.

Could you please help me providing with the code for building package for exon level annotation (e.g. something like 'porgene10stprobeset.db').

 

Many thanks in advance.

Best regards

Amin

ADD COMMENT
0
Entering edit mode

Just use the code above, but instead of using the transcript csv file, use the probeset csv file. But do note that the Gene ST arrays are (IMO) sort of compromised when it comes to probeset-level summarization, because a large proportion of the probesets have three or fewer probes/probeset.

> library(pd.porgene.1.0.st)

> con <- db(pd.porgene.1.0.st)

> z <- dbGetQuery(con, "select * from pmfeature;")
> table(table(z$fsetid))

    1     2     3     4     5     6     7     8     9    10    11    12    13
35931 42591 23674 13298  7779  2362  3111  1472  1953   827   675   629   967
   14    15    16    17    18    19    20    21    22    23    24    25    26
  267   429   220   226   235   235   567   324   373   408   600  4689   115
   27    28    29    30    31    32    33    34    35    38    40    43    44
  175   223    26   188    11    13    10     3    10     1     1     1     1
   50    52   268   322   407   585   697   703   813   849   873   912   914
    2     1     1     1     1     1     1     1     1     1     1     1     1
  940   942   949   952   959   960   963   968   973
    1     1     1     1     1     2     1     1     1

> cumsum(zz)/sum(zz)
        1         2         3         4         5         6         7         8
0.2484099 0.5428639 0.7065347 0.7984707 0.8522510 0.8685808 0.8900888 0.9002655
        9        10        11        12        13        14        15        16
0.9137676 0.9194851 0.9241517 0.9285003 0.9351857 0.9370316 0.9399975 0.9415185
       17        18        19        20        21        22        23        24
0.9430809 0.9447056 0.9463303 0.9502503 0.9524903 0.9550690 0.9578897 0.9620378
       25        26        27        28        29        30        31        32
0.9944554 0.9952504 0.9964603 0.9980020 0.9981817 0.9994815 0.9995575 0.9996474
       33        34        35        38        40        43        44        50
0.9997165 0.9997373 0.9998064 0.9998133 0.9998202 0.9998272 0.9998341 0.9998479
       52       268       322       407       585       697       703       813
0.9998548 0.9998617 0.9998686 0.9998756 0.9998825 0.9998894 0.9998963 0.9999032
      849       873       912       914       940       942       949       952
0.9999101 0.9999170 0.9999240 0.9999309 0.9999378 0.9999447 0.9999516 0.9999585
      959       960       963       968       973
0.9999654 0.9999793 0.9999862 0.9999931 1.0000000
>

So about 71% of probesets have 3 or fewer probes...

In addition, the probeset level of summarization isn't actually by exon - it's by what Affymetrix calls a probe set region (PSR), which may be the entire exon, or it may be just a portion of an exon, depending on how long the exon in question might be.

So you cannot blindly attribute signal from a given PSR to an exon - you have to explore further in order to determine exactly what the PSR is supposed to be measuring.

In addition, figuring out alternative splicing is more difficult than you might imagine. The Affy TAC software is supposed to be able to do alternative splicing with the Gene ST arrays, but they fit a 'splicing index' which is pretty similar to fitting a two-way ANOVA model with a 'gene' level measure and an 'exon' level measure, and looking for significant interactions. The problem with that idea is that the likelihood of coming up significant is directly correlated with exon length (e.g., longer exons, which have more probesets, have inherently lower variability, so they are more likely to appear significantly different than shorter exons, which have fewer probes and tend to have lower apparent expression and higher variability). In the end, most of the evidence for transcript variants has to do with long exons, and in particular, long exons in long genes with lots of exons. Short genes and/or short exons are usually too variable to attain significance.

 

ADD REPLY
0
Entering edit mode
alakatos ▴ 130
@alakatos-6983
Last seen 6.1 years ago
United States

Hello,

I need to build an annotation package for  my GeneChip® Mouse Transcriptome Assay 1.0 array.

I followed  James' code but  it did not work for this platform. 

Would you please help  to make it work?

Thank you in advance.  

Anita

>pv <- parseCsvFiles("MTA-1_0.na34.1.mm10.transcript.csv")

>ls()

[1] "makePkg"       "parseCsvFiles" "pv"           
>pv
$outname
[1] "mta10"

$outsub
[1] "transcript"
>makePkg(pv$outname, pv$outsub, "r1", "MOUSECHIP_DB")

Error in sqliteSendQuery(con, statement, bind.data) : error in statement: table metadata already exists

ADD COMMENT
0
Entering edit mode

The code will fail if you already have a partially-built package in your working directory. You need to remove those files first, then run the code:

unlink(dir(".", "db$|sqlite$"), recursive = TRUE)

makePkg(pv$outname, pv$outsub, "r1", "MOUSECHIP_DB")

ADD REPLY
0
Entering edit mode
alakatos ▴ 130
@alakatos-6983
Last seen 6.1 years ago
United States
James, 
Another error message.
Would you please help?
Thank you.
Anita 

>unlink(dir(".", "db$|sqlite$"), recursive = TRUE)

> makePkg(pv$outname, pv$outsub, "r1", "MOUSECHIP_DB")
baseMapType is gb or gbNRef

Error in sqliteSendQuery(con, statement, bind.data) : error in statement: no such table: src.accession

In addition: Warning message: 'dbBeginTransaction' is deprecated. Use 'dbBegin' instead. See help("Deprecated")

ADD COMMENT
0
Entering edit mode

How about

library(BiocInstaller)

biocLite("mta10sttranscriptcluster.db")

 

ADD REPLY
0
Entering edit mode

It works. Thank you very much for making it.

Anita 

 

ADD REPLY
0
Entering edit mode

Hello,

I processed the my GeneChip® Mouse Transcriptome Assay 1.0 (MTA10) arrays.

Command:

library(affy)

raw <- ReadAffy(cdfname = "mta10.r1.genecdf", phenoData="phenotype.txt")

boxplot(raw)

Warning message: In data.row.names(row.names, rowsi, i) : some row.names duplicated: 16,17,18,23,24,25,26,27,28,29,31,33,34,35,36,38,42,43,44,45,47,49,50,51,54,55,56,62,64,66,67,69,70,71,73,74,76,81,82,83,85,87,88,89,91,92,94,96,97,98,99,101,102,105,108,109,112,115,117,118,119,121,124,125,127,128,129,131,132,133,134,135,137,138,139,140,142,144,145,146,147,151,152,153,154,155,156,159,161,162,166,169,176,180,181,183,186,190,191,192,193,194,195,197,198,203,208,211,212,213,215,220,222,223,224,226,228,229,232,233,236,238,241,242,244,245,246,247,248,251,254,255,257,261,262,266,267,270,273,274,275,276,278,280,284,286,287,288,293,294,295,297,298,299,302,308,309,312,314,316,318,321,323,326,327,328,330,331,333,336,338,339,340,341,344,346,349,354,356,357,359,360,361,362,364,365,366,368,370,374,379,380,381,382,383,386,387,391,392,393,396,398,399,400,403,404,405,409,410,413,415,416,420,422,424,425,426,428,429,430,434,435,436,439,440,441,442,443,444,446,447,450,452,453,454,458,460,461,464,465,466,467,470,475,478,480,482,484,486,487,490,491,493,494,495,498, [... truncated]

eset = rma(raw, target="core")
annotation(eset) <- "mta10sttranscriptcluster.db"

              0215F-02_01-(MTA-1_0).CEL        0215F-02_02-(MTA-1_0).CEL      etc    
20550008              5.609107                                    5.647651
20550009              6.488896                                    6.488896
20550010              6.284914                                    6.284914
20550011              6.738269                                    6.738269
20550012              6.668315                                    6.668315 

In addition, the same files were  processed  in Affy Expression Console (SST-RMA Gene level).

When I compared the 2 results,  the boxplots  (Expression console (25% ~ 4.2; median ~6; 75%~ 8), R(affy)(25% ~ 6.25; median ~6.6; 75%~ 7) and the probe_ids were different. I double-checked  the probe-ids in the original MTA10 annotation file. 

Probe_ids in the MTA10 annotation file:

transcript_cluster_id

probeset_id
TC0100000001.mm.1 TC0100000001.mm.1
TC0100000002.mm.1 TC0100000002.mm.1
TC0100000003.mm.1 TC0100000003.mm.1
TC0100000004.mm.1 TC0100000004.mm.1
TC0100000005.mm.1 TC0100000005.mm.1
TC0100000006.mm.1 TC0100000006.mm.1
TC0100000007.mm.1 TC0100000007.mm.1
TC0100000008.mm.1 TC0100000008.mm.1 etc.
                   

 

question 1:

I am wondering  if I used the correct  tool ("affy") to process this type of array?

question 2:

Why is there a discrepancy between R processed and  Affy expression Console  processed result with regards to  expression values and annotation.

question 3:

Is there any better tool(s)  (e.g "puma") or recommendation  to process this array ?

                 

Thank you for your help in advance.

Anita

P.S: the completion  mta10cdf file is posted  (C: mta10cdf not installed Bioconductor )

                 
R version 3.1.2 (2014-10-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] oligo_1.30.0                      Biostrings_2.34.1                 XVector_0.6.0                    
 [4] oligoClasses_1.28.0               mta10sttranscriptcluster.db_8.2.0 org.Mm.eg.db_3.0.0               
 [7] mta10.r1.genecdf_1.42.0           affy_1.44.0                       biomaRt_2.22.0                   
[10] gplots_2.16.0                     ggplot2_1.0.0                     GGally_0.5.0                     
[13] hwriter_1.3.2                     ReportingTools_2.6.0              RSQLite_1.0.0                    
[16] DBI_0.3.1                         knitr_1.9                         limma_3.22.6                     
[19] annotate_1.44.0                   XML_3.98-1.1                      AnnotationDbi_1.28.1             
[22] GenomeInfoDb_1.2.4                IRanges_2.0.1                     S4Vectors_0.4.0                  
[25] genefilter_1.48.1                 Biobase_2.26.0                    BiocGenerics_0.12.1              
[28] RColorBrewer_1.1-2               
                 
                   
 
ADD REPLY
0
Entering edit mode

I didn't notice before that you hijacked this thread. Please don't ask random questions on an existing thread that have nothing to do with that thread. If you have a question, open a new thread.

ADD REPLY
0
Entering edit mode

Players are allowed to play the game without Melon Sandbox any restrictions because it has no set goals or objectives and no failure penalties.

ADD REPLY

Login before adding your answer.

Traffic: 931 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6