Question

Really weird results when I filter this new data that I have, script has worked on two previous data sets.

0

Entering edit mode

hakimelakhrass • 0

@hakimelakhrass-9661

Last seen 7.1 years ago

So this is my script and I have used it in the past. The filtering and then getting my list of DEG. The list contains only pos_controls from affymetrix but no main probes. Which I do not understand because I thought I removed them with the filter step which works on my other 2 datasets. Thanks!

mydir <- "C:\\Users\\hakim\\Documents\\hanane_data"

#listing the files from directory using special CEL file read function
celList <- list.celfiles(mydir, full.names=TRUE)
#reading data from cellist and setting annotation package to approiate one for this microarray
rawData <- read.celfiles(celList, pkgname='pd.hugene.2.0.st')
#normalizing the data using RMA algorithm
normData <- rma(rawData, target="core")
#retreaving feature data
featureData(normData) <- getNetAffx(normData, "transcript")

#the respective experimental groups of your data
group <- factor(c((rep.int(0,4)),rep.int(1,4),rep.int(2,4)
                  ,rep.int(3,4),rep.int(4,4),(rep.int(5,4)),rep.int(6,4),rep.int(7,4)))
#design and contrast matrix of the data
design <- model.matrix(~ 0 + group)
colnames(design) <- c("ID01","I01","ID005","I005", "ID3","I3","CtlID","CtlI")
contrast <- makeContrasts( "ID01-ID005","ID01-ID3","ID005-ID3","ID01-CtlID","ID005-CtlID","ID3-CtlID",
                           "I01-I005","I01-I3","I005-I3","I01-CtlI","I005-CtlI","I3-CtlI",
                           "ID01-I01","ID005-I005","ID3-I3","CtlI-CtlID",
                           levels= design )
eset <- getMainProbes(normData)
normData.filtered <- nsFilter(eset, require.entrez = FALSE,
                              remove.dupEntrez = FALSE)

normfit <-eBayes( contrasts.fit( lmFit(normData.filtered$eset, design), contrast) )

#getting the list of probes
probeset.list <-topTable(normfit,coef="ID01-ID3",number=100000, adjust="BH", lfc=1)

R version 3.2.3 (2015-12-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] hugenedeg_1.0                        affycoretools_1.42.0                 hugene20sttranscriptcluster.db_8.4.0
 [4] org.Hs.eg.db_3.2.3                   annotate_1.48.0                      XML_3.98-1.3                        
 [7] AnnotationDbi_1.32.3                 genefilter_1.52.1                    pd.hugene.2.0.st_3.14.1             
[10] oligo_1.34.2                         Biostrings_2.38.4                    XVector_0.10.0                      
[13] IRanges_2.4.7                        S4Vectors_0.8.11                     oligoClasses_1.32.0                 
[16] limma_3.26.8                         Biobase_2.30.0                       BiocGenerics_0.16.1                 
[19] RSQLite_1.0.0                        DBI_0.3.1                           

loaded via a namespace (and not attached):
 [1] Category_2.36.0            bitops_1.0-6               RColorBrewer_1.1-2         GenomeInfoDb_1.6.3        
 [5] gcrma_2.42.0               tools_3.2.3                affyio_1.40.0              KernSmooth_2.23-15        
 [9] rpart_4.1-10               Hmisc_3.17-2               colorspace_1.2-6           nnet_7.3-12               
[13] gridExtra_2.0.0            GGally_1.0.1               DESeq2_1.10.1              bit_1.1-12                
[17] preprocessCore_1.32.0      graph_1.48.0               rtracklayer_1.30.2         ggbio_1.18.5              
[21] caTools_1.17.1             scales_0.3.0               affy_1.48.0                RBGL_1.46.0               
[25] stringr_1.0.0              Rsamtools_1.22.0           foreign_0.8-66             R.utils_2.2.0             
[29] AnnotationForge_1.12.2     dichromat_2.0-0            BSgenome_1.38.0            PFAM.db_3.2.2             
[33] BiocInstaller_1.20.1       GOstats_2.36.0             hwriter_1.3.2              gtools_3.5.0              
[37] BiocParallel_1.4.3         R.oo_1.20.0                acepack_1.3-3.3            VariantAnnotation_1.16.4  
[41] RCurl_1.95-4.7             magrittr_1.5               GO.db_3.2.2                Formula_1.2-1             
[45] futile.logger_1.4.1        Matrix_1.2-3               Rcpp_0.12.3                munsell_0.4.3             
[49] R.methodsS3_1.7.1          stringi_1.0-1              edgeR_3.12.0               SummarizedExperiment_1.0.2
[53] zlibbioc_1.16.0            gplots_2.17.0              plyr_1.8.3                 grid_3.2.3                
[57] affxparser_1.42.0          gdata_2.17.0               ReportingTools_2.10.0      lattice_0.20-33           
[61] splines_3.2.3              GenomicFeatures_1.22.13    locfit_1.5-9.1             knitr_1.12.3              
[65] GenomicRanges_1.22.4       geneplotter_1.48.0         reshape2_1.4.1             codetools_0.2-14          
[69] biomaRt_2.26.1             futile.options_1.0.0       RcppArmadillo_0.6.500.4.0  biovizBase_1.18.0         
[73] latticeExtra_0.6-28        lambda.r_1.1.7             foreach_1.4.3              gtable_0.1.2              
[77] reshape_0.8.5              ggplot2_2.0.0              xtable_1.8-2               ff_2.2-13                 
[81] survival_2.38-3            OrganismDbi_1.12.1         iterators_1.0.8            GenomicAlignments_1.6.3   
[85] cluster_2.0.3              GSEABase_1.32.0

microarray limma genefilter • 953 views

ADD COMMENT • link updated 8.2 years ago by James W. MacDonald 65k • written 8.2 years ago by hakimelakhrass • 0

score 0 · Answer 1 · 2016-03-03

Debugging your own scripts is really up to you as an analyst. Things look OK to me however:

> eset2 <- rma(read.celfiles(list.celfiles())
> annotation(eset2)
 [1] "pd.mogene.2.0.st" 
> featureData(eset2) <- getNetAffx(eset2, "transcript")
> library(affycoretools)    
> eset2 <- getMainProbes(eset2)
> table(pData(featureData(eset2))$category)
           main normgene->exon  
          33793           1352

And you should expect some normgene->exon probesets in this table, as they do double-duty as normgene->exon and main probesets on this array, and the Affy transcript csv labels them as normgene rather than main probesets.