Question

phenotypic data and (.cel) file

0

Entering edit mode

nia ▴ 30

@nia-12707

Last seen 4.1 years ago

Firstly i get the (.tar) file from geo db then phenotypic data by using R ,I do data normalization with this method but this time is not working and secondly the .cel files are not opening in text format while i work before on GSE14325 and its .cel files was opened in text format.kindly tell me the possible solution for that.

For phenotypic data:

library(GEOquery)

gse <- getGEO('GSE94358',GSEMatrix=TRUE)

pheno.df <- pData(phenoData(gse[[1]]))

pData(gse[[1]])

R script for data normalization:

library(affy)
library(affycoretools)
d1<-ReadAffy()
pData(d1)<-read.table("PhenoData.txt",row.names=1,header = T,sep="\t");
pData(d1)
           Sample_title                                                                                                                                                                                                                                                        
GSM2473987        A_set
GSM2473988        A_set
GSM2473989       AS_set
GSM2473990       AS_set
GSM2473991       AS_set
GSM2473992        F_set
GSM2473993        F_set
GSM2473994        S_set
GSM2473995        S_set
GSM2473996        S_set
                                                                                                Sample_source_name_ch1
GSM2473987                             A - adherent cells in high attachment flask with 10% FBS (DMEM/F12 base medium)
GSM2473988                             A - adherent cells in high attachment flask with 10% FBS (DMEM/F12 base medium)
GSM2473989 AS - Adherent cells in high attachment flask with serum-free stem cell media (DMEM/F12 with growth factors)
GSM2473990 AS - Adherent cells in high attachment flask with serum-free stem cell media (DMEM/F12 with growth factors)
GSM2473991 AS - Adherent cells in high attachment flask with serum-free stem cell media (DMEM/F12 with growth factors)
GSM2473992   F - Floating cells in low attachment flask with serum-free stem cell media (DMEM/F12 with growth factors)
GSM2473993   F - Floating cells in low attachment flask with serum-free stem cell media (DMEM/F12 with growth factors)
GSM2473994   S - adherent cells in low attachment flask with serum-free stem cell media (DMEM/F12 with growth factors)
GSM2473995   S - adherent cells in low attachment flask with serum-free stem cell media (DMEM/F12 with growth factors)
GSM2473996   S - adherent cells in low attachment flask with serum-free stem cell media (DMEM/F12 with growth factors)
                Sample_characteristics_ch1
GSM2473987   date rna collect: 2015-02-05 
GSM2473988    date rna collect: 2015-02-05
GSM2473989 date rna collect: 2015-02-05   
GSM2473990   date rna collect: 2015-02-02 
GSM2473991    date rna collect: 2015-01-30
GSM2473992    date rna collect: 2015-02-02
GSM2473993   date rna collect: 2015-01-30 
GSM2473994    date rna collect: 2015-02-05
GSM2473995    date rna collect: 2015-02-02
GSM2473996    date rna collect: 2015-01-30
                                                                                                                                    Sample_description
GSM2473987                                                                expression levels for cells expanded in adherent flasks in traditional media
GSM2473988                                                                expression levels for cells expanded in adherent flasks in traditional media
GSM2473989                                                                  expression levels for cells expanded in adherent flasks in stem cell media
GSM2473990                                                                  expression levels for cells expanded in adherent flasks in stem cell media
GSM2473991                                                                  expression levels for cells expanded in adherent flasks in stem cell media
GSM2473992 expression levels for floating cells obtained from adherent cultures that were expanded in ultra-low attachment flasks with stem cell media
GSM2473993 expression levels for floating cells obtained from adherent cultures that were expanded in ultra-low attachment flasks with stem cell media
GSM2473994                expression levels for cells obtained from adherent cultures and expanded in ultra-low attachment flasks with stem cell media
GSM2473995                expression levels for cells obtained from adherent cultures and expanded in ultra-low attachment flasks with stem cell media
GSM2473996                expression levels for cells obtained from adherent cultures and expanded in ultra-low attachment flasks with stem cell media
> d1

AffyBatch object
size of arrays=1164x1164 features (25 kb)
cdf=HG-U133_Plus_2 (54675 affyids)
number of samples=10
number of genes=54675
annotation=hgu133plus2
notes=
Warning messages:
1: replacing previous import 'AnnotationDbi::tail' by 'utils::tail' when loading 'hgu133plus2cdf' 
2: replacing previous import 'AnnotationDbi::head' by 'utils::head' when loading 'hgu133plus2cdf' 
> eset.rma <- rma(d1)
Background correcting
Normalizing
Calculating Expression
Error in validObject(.Object) : 
  invalid class "ExpressionSet" object: sampleNames differ between phenoData and protocolData
> boxplot(exprs(eset.rma))
Error in exprs(eset.rma) : object 'eset.rma' not found
> sessionInfo()
R version 3.3.3 (2017-03-06)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: openSUSE 13.1 (Bottle) (x86_64)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
[1] hgu133plus2cdf_2.18.0 affycoretools_1.46.5  affy_1.52.0          
[4] Biobase_2.34.0        BiocGenerics_0.20.0  

loaded via a namespace (and not attached):
  [1] Category_2.40.0               bitops_1.0-6                 
  [3] RColorBrewer_1.1-2            httr_1.2.1                   
  [5] GenomeInfoDb_1.10.3           gcrma_2.46.0                 
  [7] tools_3.3.3                   backports_1.0.5              
  [9] R6_2.2.0                      affyio_1.44.0                
 [11] KernSmooth_2.23-15            rpart_4.1-10                 
 [13] Hmisc_4.0-2                   DBI_0.6                      
 [15] lazyeval_0.2.0                colorspace_1.3-2             
 [17] nnet_7.3-12                   gridExtra_2.2.1              
 [19] GGally_1.3.0                  DESeq2_1.14.1                
 [21] bit_1.1-12                    preprocessCore_1.36.0        
 [23] graph_1.52.0                  htmlTable_1.9                
 [25] rtracklayer_1.34.2            ggbio_1.22.4                 
 [27] caTools_1.17.1                scales_0.4.1                 
 [29] checkmate_1.8.2               genefilter_1.56.0            
 [31] RBGL_1.50.0                   stringr_1.2.0                
 [33] digest_0.6.12                 Rsamtools_1.26.1             
 [35] foreign_0.8-67                R.utils_2.5.0                
 [37] AnnotationForge_1.16.1        XVector_0.14.1               
 [39] base64enc_0.1-3               dichromat_2.0-0              
 [41] htmltools_0.3.5               ensembldb_1.6.2              
 [43] limma_3.30.13                 BSgenome_1.42.0              
 [45] htmlwidgets_0.8               PFAM.db_3.4.0                
 [47] RSQLite_1.1-2                 BiocInstaller_1.24.0         
 [49] shiny_1.0.0                   GOstats_2.40.0               
 [51] hwriter_1.3.2                 gtools_3.5.0                 
 [53] BiocParallel_1.8.1            R.oo_1.21.0                  
 [55] acepack_1.4.1                 VariantAnnotation_1.20.3     
 [57] RCurl_1.95-4.8                magrittr_1.5                 
 [59] GO.db_3.4.0                   Formula_1.2-1                
 [61] oligoClasses_1.36.0           Matrix_1.2-8                 
 [63] Rcpp_0.12.10                  munsell_0.4.3                
 [65] S4Vectors_0.12.2              R.methodsS3_1.7.1            
 [67] stringi_1.1.3                 yaml_2.1.14                  
 [69] edgeR_3.16.5                  SummarizedExperiment_1.4.0   
 [71] zlibbioc_1.20.0               gplots_3.0.1                 
 [73] plyr_1.8.4                    AnnotationHub_2.6.5          
 [75] grid_3.3.3                    gdata_2.17.0                 
 [77] ReportingTools_2.14.0         lattice_0.20-35              
 [79] Biostrings_2.42.1             splines_3.3.3                
 [81] GenomicFeatures_1.26.3        annotate_1.52.1              
 [83] locfit_1.5-9.1                knitr_1.15.1                 
 [85] GenomicRanges_1.26.4          codetools_0.2-15             
 [87] geneplotter_1.52.0            reshape2_1.4.2               
 [89] biomaRt_2.30.0                stats4_3.3.3                 
 [91] XML_3.98-1.5                  biovizBase_1.22.0            
 [93] latticeExtra_0.6-28           data.table_1.10.4            
 [95] foreach_1.4.3                 httpuv_1.3.3                 
 [97] gtable_0.2.0                  reshape_0.8.6                
 [99] assertthat_0.1                ggplot2_2.2.1                
[101] mime_0.5                      xtable_1.8-2                 
[103] ff_2.2-13                     survival_2.41-2              
[105] OrganismDbi_1.16.0            tibble_1.2                   
[107] iterators_1.0.8               GenomicAlignments_1.10.1     
[109] AnnotationDbi_1.36.2          memoise_1.0.0                
[111] IRanges_2.8.2                 cluster_2.0.6                
[113] interactiveDisplayBase_1.12.0 GSEABase_1.36.0

.cel phenotype extraction • 1.5k views

ADD COMMENT • link updated 7.1 years ago by James W. MacDonald 65k • written 7.1 years ago by nia ▴ 30

score 1 · Answer 1 · 2017-04-07

1

Entering edit mode

James W. MacDonald 65k

@james-w-macdonald-5106

Last seen 1 hour ago

United States

If you look at the information for the GEO data you are trying to use, the authors summarized the data using rma already. So there is no profit in getting the raw data and re-processing, as you will get the exact same results that you would get if you just used your original 'gse' object.

ADD COMMENT • link 7.1 years ago James W. MacDonald 65k

0

Entering edit mode

so if i do the LIMMA step directly so how to get the entrez id in up and down annotated table.

library(affy)
library(affycoretools)
d1<-ReadAffy()
pData(d1)<-read.table(“pd.txt",row.names=1,header = T,sep="\t");
pData(d1)
d1

library(limma)
pData(d1)[,1 ]
group<- factor(pData(d1)[,1 ], levels =levels(pData(d1)[,1]))
design<- model.matrix(~group)
fit1 <-lmFit(d1, design)
fit1 <-eBayes(fit1)
tab50 <- topTable(fit1, coef = 2, adjust = "fdr",n = 50)
head(tab50, n=2)
probeList <- rownames(exprs(d1));
if (require(hgu133a.db) & require(annotate))
{
geneSymbol <- getSYMBOL(probeList, 'hgu133a.db')
geneName <- sapply(lookUp(probeList, 'hgu133a.db',
'GENENAME'), function(x) x[1])
EntrezID <- sapply(lookUp(probeList, 'hgu133a.db',
'ENTREZID'), function(x) x[1])
} 
numGenes <- nrow(d1);
annotated_table <- topTable (fit1, coef=2,number=numGenes,genelist=fit1$genes);
annotated_table$FC <- ifelse (annotated_table$logFC > 0,
2^annotated_table$logFC, -1/2^annotated_table$logFC);

colnames(annotated_table)[colnames(annotated_table)=="FC"]<-"FoldChange DownS/Norm";

UP_annotated_table <-annotated_table[(annotated_table[,"FoldChange DownS/Norm"] > 0),]
DOWN_annotated_table <-annotated_table[(annotated_table[,"FoldChange DownS/Norm"] < 0),]
head(UP_annotated_table,n=10)
head(DOWN_annotated_table,n=10)

ADD REPLY • link 7.0 years ago nia ▴ 30

0

Entering edit mode

so if i do the LIMMA step directly so how to get the entrez id in up and down annotated table.

library(affy)
library(affycoretools)
d1<-ReadAffy()
pData(d1)<-read.table(“pd.txt",row.names=1,header = T,sep="\t");
pData(d1)
d1

library(limma)
pData(d1)[,1 ]
group<- factor(pData(d1)[,1 ], levels =levels(pData(d1)[,1]))
design<- model.matrix(~group)
fit1 <-lmFit(d1, design)
fit1 <-eBayes(fit1)
tab50 <- topTable(fit1, coef = 2, adjust = "fdr",n = 50)
head(tab50, n=2)
probeList <- rownames(exprs(d1));
if (require(hgu133a.db) & require(annotate))
{
geneSymbol <- getSYMBOL(probeList, 'hgu133a.db')
geneName <- sapply(lookUp(probeList, 'hgu133a.db',
'GENENAME'), function(x) x[1])
EntrezID <- sapply(lookUp(probeList, 'hgu133a.db',
'ENTREZID'), function(x) x[1])
} 
numGenes <- nrow(d1);
annotated_table <- topTable (fit1, coef=2,number=numGenes,genelist=fit1$genes);
annotated_table$FC <- ifelse (annotated_table$logFC > 0,
2^annotated_table$logFC, -1/2^annotated_table$logFC);

colnames(annotated_table)[colnames(annotated_table)=="FC"]<-"FoldChange DownS/Norm";

UP_annotated_table <-annotated_table[(annotated_table[,"FoldChange DownS/Norm"] > 0),]
DOWN_annotated_table <-annotated_table[(annotated_table[,"FoldChange DownS/Norm"] < 0),]
head(UP_annotated_table,n=10)
head(DOWN_annotated_table,n=10)

ADD REPLY • link 7.0 years ago nia ▴ 30

0

Entering edit mode

> gse <- gse[[1]]
> fData(gse) <- fData(gse)[,c(1,2,10:12)]
> grp <- factor(sapply(strsplit(as.character(pData(gse)[,1]), " "), "[", 1))
> grp
 [1] A_set  A_set  AS_set AS_set AS_set F_set  F_set  S_set  S_set  S_set
Levels: A_set AS_set F_set S_set
> design <- model.matrix(~grp)
> fit <- lmFit(gse, design)
> fit2 <- eBayes(fit)

> tt <- topTable(fit2, 2, 50)

> UP <- subset(tt, tt$logFC > 0)

> DOWN <- subset(tt, tt$logFC < 0)

> head(DOWN)
                       ID   GB_ACC
210004_at       210004_at AF035776
236646_at       236646_at BE301029
1560477_a_at 1560477_a_at AK054643
                                                            Gene.Title
210004_at    oxidized low density lipoprotein (lectin-like) receptor 1
236646_at                                    transmembrane protein 52B
1560477_a_at                  sterile alpha motif domain containing 11
             Gene.Symbol ENTREZ_GENE_ID     logFC  AveExpr         t
210004_at           OLR1           4973 -2.495755 3.516644 -14.90119
236646_at        TMEM52B         120939 -1.991936 3.200324 -12.37906
1560477_a_at      SAMD11         148398 -1.385055 5.790050 -11.81502
                  P.Value  adj.P.Val        B
210004_at    3.134046e-07 0.01713540 3.500307
236646_at    1.353597e-06 0.03551374 3.055998
1560477_a_at 1.948628e-06 0.03551374 2.926959

ADD REPLY • link 7.0 years ago James W. MacDonald 65k

0

Entering edit mode

Thankyou James it works i need one more help.

i want to identify and retrieve all DEGs from the data and set the p value<0.05 and fdr<0.01 how can i do this?

" The limma package in R wasused to identify DEGs. Altered expression of probes was determined using t-tests, and theBenjamini-Hochberg method14 was used for multiple test corrections. Probes with expression changes of p < 0.05 and corresponding False Discovery Rate (FDR) < 0.01 were considered to be statistically significant."

ADD REPLY • link 7.0 years ago nia ▴ 30

1

Entering edit mode

You will get farther, faster if you learn how to figure things out for yourself. For example, the help for topTable has information that would be relevant for your question. I'll leave it up to you to read that help page and figure out how to do this.

I should however note that a p-value of 0.05 and an FDR of 0.01 are mutually exclusive, so whomever you are quoting is saying something nonsensical. In other words, if you have a vector of p-values and then you use p.adjust to compute FDR, the resulting FDR values will be greater than or equal to the original p-values, but NEVER smaller.

ADD REPLY • link 7.0 years ago James W. MacDonald 65k