Question: Really weird results when I filter this new data that I have, script has worked on two previous data sets.
gravatar for hakimelakhrass
19 months ago by
hakimelakhrass0 wrote:

So this is my script and I have used it in the past. The filtering and then getting my list of DEG. The list contains only pos_controls from affymetrix but no main probes. Which I do not understand because I thought I removed them with the filter step which works on my other 2 datasets. Thanks!

mydir <- "C:\\Users\\hakim\\Documents\\hanane_data"

#listing the files from directory using special CEL file read function
celList <- list.celfiles(mydir, full.names=TRUE)
#reading data from cellist and setting annotation package to approiate one for this microarray
rawData <- read.celfiles(celList, pkgname='')
#normalizing the data using RMA algorithm
normData <- rma(rawData, target="core")
#retreaving feature data
featureData(normData) <- getNetAffx(normData, "transcript")

#the respective experimental groups of your data
group <- factor(c((,4)),,4),,4)
#design and contrast matrix of the data
design <- model.matrix(~ 0 + group)
colnames(design) <- c("ID01","I01","ID005","I005", "ID3","I3","CtlID","CtlI")
contrast <- makeContrasts( "ID01-ID005","ID01-ID3","ID005-ID3","ID01-CtlID","ID005-CtlID","ID3-CtlID",
                           levels= design )
eset <- getMainProbes(normData)
normData.filtered <- nsFilter(eset, require.entrez = FALSE,
                              remove.dupEntrez = FALSE)

normfit <-eBayes( lmFit(normData.filtered$eset, design), contrast) )

#getting the list of probes
probeset.list <-topTable(normfit,coef="ID01-ID3",number=100000, adjust="BH", lfc=1)

R version 3.2.3 (2015-12-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] hugenedeg_1.0                        affycoretools_1.42.0                 hugene20sttranscriptcluster.db_8.4.0
 [4]                   annotate_1.48.0                      XML_3.98-1.3                        
 [7] AnnotationDbi_1.32.3                 genefilter_1.52.1                    pd.hugene.2.0.st_3.14.1             
[10] oligo_1.34.2                         Biostrings_2.38.4                    XVector_0.10.0                      
[13] IRanges_2.4.7                        S4Vectors_0.8.11                     oligoClasses_1.32.0                 
[16] limma_3.26.8                         Biobase_2.30.0                       BiocGenerics_0.16.1                 
[19] RSQLite_1.0.0                        DBI_0.3.1                           

loaded via a namespace (and not attached):
 [1] Category_2.36.0            bitops_1.0-6               RColorBrewer_1.1-2         GenomeInfoDb_1.6.3        
 [5] gcrma_2.42.0               tools_3.2.3                affyio_1.40.0              KernSmooth_2.23-15        
 [9] rpart_4.1-10               Hmisc_3.17-2               colorspace_1.2-6           nnet_7.3-12               
[13] gridExtra_2.0.0            GGally_1.0.1               DESeq2_1.10.1              bit_1.1-12                
[17] preprocessCore_1.32.0      graph_1.48.0               rtracklayer_1.30.2         ggbio_1.18.5              
[21] caTools_1.17.1             scales_0.3.0               affy_1.48.0                RBGL_1.46.0               
[25] stringr_1.0.0              Rsamtools_1.22.0           foreign_0.8-66             R.utils_2.2.0             
[29] AnnotationForge_1.12.2     dichromat_2.0-0            BSgenome_1.38.0            PFAM.db_3.2.2             
[33] BiocInstaller_1.20.1       GOstats_2.36.0             hwriter_1.3.2              gtools_3.5.0              
[37] BiocParallel_1.4.3         R.oo_1.20.0                acepack_1.3-3.3            VariantAnnotation_1.16.4  
[41] RCurl_1.95-4.7             magrittr_1.5               GO.db_3.2.2                Formula_1.2-1             
[45] futile.logger_1.4.1        Matrix_1.2-3               Rcpp_0.12.3                munsell_0.4.3             
[49] R.methodsS3_1.7.1          stringi_1.0-1              edgeR_3.12.0               SummarizedExperiment_1.0.2
[53] zlibbioc_1.16.0            gplots_2.17.0              plyr_1.8.3                 grid_3.2.3                
[57] affxparser_1.42.0          gdata_2.17.0               ReportingTools_2.10.0      lattice_0.20-33           
[61] splines_3.2.3              GenomicFeatures_1.22.13    locfit_1.5-9.1             knitr_1.12.3              
[65] GenomicRanges_1.22.4       geneplotter_1.48.0         reshape2_1.4.1             codetools_0.2-14          
[69] biomaRt_2.26.1             futile.options_1.0.0       RcppArmadillo_0.6.500.4.0  biovizBase_1.18.0         
[73] latticeExtra_0.6-28        lambda.r_1.1.7             foreach_1.4.3              gtable_0.1.2              
[77] reshape_0.8.5              ggplot2_2.0.0              xtable_1.8-2               ff_2.2-13                 
[81] survival_2.38-3            OrganismDbi_1.12.1         iterators_1.0.8            GenomicAlignments_1.6.3   
[85] cluster_2.0.3              GSEABase_1.32.0 


ADD COMMENTlink modified 19 months ago by James W. MacDonald44k • written 19 months ago by hakimelakhrass0
gravatar for James W. MacDonald
19 months ago by
United States
James W. MacDonald44k wrote:

Debugging your own scripts is really up to you as an analyst. Things look OK to me however:

> eset2 <- rma(read.celfiles(list.celfiles())
> annotation(eset2)
 [1] "" 
> featureData(eset2) <- getNetAffx(eset2, "transcript")
> library(affycoretools)    
> eset2 <- getMainProbes(eset2)
> table(pData(featureData(eset2))$category)
           main normgene->exon  
          33793           1352             

And you should expect some normgene->exon probesets in this table, as they do double-duty as normgene->exon and main probesets on this array, and the Affy transcript csv labels them as normgene rather than main probesets.

ADD COMMENTlink written 19 months ago by James W. MacDonald44k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 190 users visited in the last hour