Question

How to annotate mir4.1 arrays?

richardallenfriedmanbrooklyn ▴ 20

United States

Dear list,

I am analyzing an Affymetrix miRNA 4.1 dataset using the pd.mirna.4.1 package, which I installed by following the instructions in this post:

Affymetrix miRNA4.1 / oligo package / pd.mirna.4.1

library(devtools)
install_github("soumyabrataghosh/pd.mirna.4.1")

I am getting only probeset IDs, but no miRNA names or Entrez Gene IDs. Here is my session:

> library(oligo)
Loading required package: BiocGenerics

Attaching package: ‘BiocGenerics’

The following objects are masked from ‘package:stats’:

    IQR, mad, sd, var, xtabs

The following objects are masked from ‘package:base’:

    anyDuplicated, aperm, append, as.data.frame, basename, cbind, colnames, dirname, do.call,
    duplicated, eval, evalq, Filter, Find, get, grep, grepl, intersect, is.unsorted, lapply, Map,
    mapply, match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank, rbind, Reduce,
    rownames, sapply, setdiff, sort, table, tapply, union, unique, unsplit, which.max, which.min

Loading required package: oligoClasses
Welcome to oligoClasses version 1.60.0
Loading required package: Biobase
Welcome to Bioconductor

    Vignettes contain introductory material; view with 'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.

Loading required package: Biostrings
Loading required package: S4Vectors
Loading required package: stats4

Attaching package: ‘S4Vectors’

The following objects are masked from ‘package:base’:

    expand.grid, I, unname

Loading required package: IRanges
Loading required package: XVector
Loading required package: GenomeInfoDb

Attaching package: ‘Biostrings’

The following object is masked from ‘package:base’:

    strsplit

===================================================================================================================
Welcome to oligo version 1.62.2
===================================================================================================================
> library(affycoretools)
Registered S3 method overwritten by 'GGally':
  method from   
  +.gg   ggplot2

> library(limma)

Attaching package: 'limma'

The following object is masked from 'package:oligo':

    backgroundCorrect

The following object is masked from 'package:BiocGenerics':

    plotMA
> library(pd.mirna.4.1)
Loading required package: RSQLite
Loading required package: DBI
> celfiles  <-  list.celfiles("data",full.names=TRUE)
> raw<-  read.celfiles(celfiles,pkgname="pd.mirna.4.1")
Platform design info loaded.
Reading in : data/a1.ctr.exo.fadu.CEL
.

Reading in : data/e3.tgfb.exo.fadu.CEL
> probeset.eset<-annotateEset(probeset.eset, pd.mirna.4.1, columns = c("PROBEID", "ENTREZID", "SYMBOL", "GENENAME"))
Error: There is no annotation object provided with the pd.mirna.4.1 package.

> sessionInfo( )
R version 4.2.3 (2023-03-15)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Monterey 12.6

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] pd.mirna.4.1_0.1     DBI_1.1.3            RSQLite_2.3.1        limma_3.54.2         affycoretools_1.70.0
 [6] oligo_1.62.2         Biostrings_2.66.0    GenomeInfoDb_1.34.9  XVector_0.38.0       IRanges_2.32.0      
[11] S4Vectors_0.36.2     Biobase_2.58.0       oligoClasses_1.60.0  BiocGenerics_0.44.0 

loaded via a namespace (and not attached):
  [1] backports_1.4.1             GOstats_2.64.0              Hmisc_5.0-1                
  [4] BiocFileCache_2.6.1         plyr_1.8.8                  lazyeval_0.2.2             
  [7] GSEABase_1.60.0             splines_4.2.3               BiocParallel_1.32.6        
 [10] ggplot2_3.4.2               digest_0.6.31               foreach_1.5.2              
 [13] ensembldb_2.22.0            htmltools_0.5.5             GO.db_3.16.0               
 [16] fansi_1.0.4                 magrittr_2.0.3              checkmate_2.2.0            
 [19] memoise_2.0.1               BSgenome_1.66.3             cluster_2.1.4              
 [22] gcrma_2.70.0                annotate_1.76.0             matrixStats_0.63.0         
 [25] R.utils_2.12.2              ggbio_1.46.0                prettyunits_1.1.1          
 [28] colorspace_2.1-0            blob_1.2.4                  rappdirs_0.3.3             
 [31] xfun_0.39                   dplyr_1.1.2                 crayon_1.5.2               
 [34] RCurl_1.98-1.12             jsonlite_1.8.4              graph_1.76.0               
 [37] genefilter_1.80.3           survival_3.5-5              VariantAnnotation_1.44.1   
 [40] iterators_1.0.14            glue_1.6.2                  gtable_0.3.3               
 [43] zlibbioc_1.44.0             DelayedArray_0.24.0         Rgraphviz_2.42.0           
 [46] scales_1.2.1                GGally_2.1.2                edgeR_3.40.2               
 [49] Rcpp_1.0.10                 xtable_1.8-4                progress_1.2.2             
 [52] htmlTable_2.4.1             foreign_0.8-84              bit_4.0.5                  
 [55] OrganismDbi_1.40.0          preprocessCore_1.60.2       Formula_1.2-5              
 [58] AnnotationForge_1.40.2      htmlwidgets_1.6.2           httr_1.4.5                 
 [61] gplots_3.1.3                RColorBrewer_1.1-3          ff_4.0.9                   
 [64] R.methodsS3_1.8.2           pkgconfig_2.0.3             reshape_0.8.9              
 [67] XML_3.99-0.14               nnet_7.3-19                 dbplyr_2.3.2               
 [70] locfit_1.5-9.7              utf8_1.2.3                  tidyselect_1.2.0           
 [73] rlang_1.1.1                 reshape2_1.4.4              AnnotationDbi_1.60.2       
 [76] munsell_0.5.0               tools_4.2.3                 cachem_1.0.8               
 [79] cli_3.6.1                   generics_0.1.3              evaluate_0.20              
 [82] stringr_1.5.0               fastmap_1.1.1               yaml_2.3.7                 
 [85] knitr_1.42                  bit64_4.0.5                 caTools_1.18.2             
 [88] KEGGREST_1.38.0             AnnotationFilter_1.22.0     RBGL_1.74.0                
 [91] R.oo_1.25.0                 xml2_1.3.4                  biomaRt_2.54.1             
 [94] compiler_4.2.3              rstudioapi_0.14             filelock_1.0.2             
 [97] curl_5.0.0                  png_0.1-8                   affyio_1.68.0              
[100] PFAM.db_3.16.0              tibble_3.2.1                geneplotter_1.76.0         
[103] stringi_1.7.12              Glimma_2.8.0                GenomicFeatures_1.50.4     
[106] lattice_0.21-8              ProtGenerics_1.30.0         Matrix_1.5-4               
[109] vctrs_0.6.2                 pillar_1.9.0                lifecycle_1.0.3            
[112] BiocManager_1.30.20         data.table_1.14.8           bitops_1.0-7               
[115] rtracklayer_1.58.0          GenomicRanges_1.50.2        affy_1.76.0                
[118] hwriter_1.3.2.1             R6_2.5.1                    BiocIO_1.8.0               
[121] KernSmooth_2.23-21          gridExtra_2.3               affxparser_1.70.0          
[124] codetools_0.2-19            dichromat_2.0-0.1           gtools_3.9.4               
[127] SummarizedExperiment_1.28.0 DESeq2_1.38.3               Category_2.64.0            
[130] rjson_0.2.21                ReportingTools_2.38.0       GenomicAlignments_1.34.1   
[133] Rsamtools_2.14.0            GenomeInfoDbData_1.2.9      parallel_4.2.3             
[136] hms_1.1.3                   grid_4.2.3                  rpart_4.1.19               
[139] rmarkdown_2.21              MatrixGenerics_1.10.0       biovizBase_1.46.0          
[142] base64enc_0.1-3             restfulr_0.0.15            
>

How do I get the miRNA symbols and Entrez Gene IDs corresponding to the probeset IDs?

Thanks and best wishes,

Richard Friedman.

Columbia University Cancer Center

Dear List,

I ended up reading in the annotation CSV file from Affymetrix, subsetting it to the columns I needed, and merging it with the topTable output from limma.

Best wishes,

Rich
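
For readers who want to follow the same route, here is a minimal sketch of the merge step in base R. It uses toy data frames in place of the real annotation CSV and the real topTable result, and the column names (ProbeSetID, Name) and IDs are hypothetical placeholders, not the vendor's actual headers.

```r
# Toy stand-in for the subset of the vendor annotation CSV
# (column names and values are hypothetical placeholders).
anno <- data.frame(
  ProbeSetID = c("ps1", "ps2", "ps3"),
  Name = c("name-A", "name-B", "name-C"),
  stringsAsFactors = FALSE
)

# Toy stand-in for limma::topTable() output; probeset IDs are the row names.
tt <- data.frame(
  logFC = c(1.8, -0.9),
  adj.P.Val = c(0.01, 0.20),
  row.names = c("ps3", "ps1")
)

# Move the row names into a column so merge() has a key on both sides.
tt$ProbeSetID <- rownames(tt)

# all.x = TRUE keeps every topTable row even if the annotation has no match.
annotated <- merge(tt, anno, by = "ProbeSetID", all.x = TRUE)

# merge() sorts by the key, so restore the ranking by adjusted p-value.
annotated <- annotated[order(annotated$adj.P.Val), ]
print(annotated)
```

With real data, `tt` would come from `topTable()` and `anno` from `read.csv()` on the downloaded annotation file.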


score 1 · Answer 1 · 2023-05-09

You haven't run rma on your data yet, so there is no ExpressionSet to annotate. Once you have run rma, you can annotate using the CSV file from ThermoFisher (you will need a login to download it).

> eset <- rma(raw)
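## note that you need to specify comment.char!
> anno <- read.csv("TFS-Assets_LSG_Support-Files_miRNA-4_1-st-v1-annotations-20160922-csv/miRNA-4_1-st-v1.annotations.20160922.csv", comment.char = "#")
## keep columns 2 to 4: the probeset ID plus two annotation columns
> anno <- anno[,2:4]
## column 1 of anno holds the probeset IDs; columns 2 and 3 are added as annotation
> eset <- annotateEset(eset, anno, probecol = 1, annocols = 2:3)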
## et voila!
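
To see why comment.char matters, here is a small self-contained sketch. It assumes, as the note above implies, that the annotation file starts with metadata lines prefixed by "#"; the toy text below is made up and is not the real file content. read.csv defaults to comment.char = "", so without the argument those lines would be parsed as data.

```r
# Toy CSV text imitating a file whose first lines are '#'-prefixed metadata
# (contents are invented for illustration).
csv_text <- paste(
  "#%toy-header-line=1",
  "#%toy-header-line=2",
  "id,name",
  "ps1,name-A",
  "ps2,name-B",
  sep = "\n"
)

# comment.char = "#" makes read.csv() skip the metadata lines,
# so the "id,name" row is used as the header.
anno <- read.csv(text = csv_text, comment.char = "#", stringsAsFactors = FALSE)
print(anno)
```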

As an aside, this is all documented in the help page for annotateEset:

Usage:

     annotateEset(object, x, ...)

     ## S4 method for signature 'ExpressionSet,ChipDb'
     annotateEset(
       object,
       x,
       columns = c("PROBEID", "ENTREZID", "SYMBOL", "GENENAME"),
       multivals = "first"
     )

     ## S4 method for signature 'ExpressionSet,AffyGenePDInfo'
     annotateEset(object, x, type = "core", ...)

     ## S4 method for signature 'ExpressionSet,AffyHTAPDInfo'
     annotateEset(object, x, type = "core", ...)

     ## S4 method for signature 'ExpressionSet,AffyExonPDInfo'
     annotateEset(object, x, type = "core", ...)

     ## S4 method for signature 'ExpressionSet,AffyExpressionPDInfo'
     annotateEset(object, x, type = "core", ...)

     ## S4 method for signature 'ExpressionSet,character'
     annotateEset(object, x, ...)

     ## S4 method for signature 'ExpressionSet,data.frame'
     annotateEset(object, x, probecol = NULL, annocols = NULL, ...) <------------- This part here

Arguments:

  object: An ExpressionSet to which we want to add annotation.

       x: Either a ChipDb package (e.g.,
          hugene10sttranscriptcluster.db), or a pdInfoPackage object
          (e.g., pd.hugene.1.0.st.v1).

     ...: Allow users to pass in arbitrary arguments. Particularly
          useful for passing in columns, multivals, and type arguments
          for methods.

 columns: For ChipDb method; what annotation data to add. Use the
          'columns' function to see what choices you have. By default
          we get the ENTREZID, SYMBOL and GENENAME.

multivals: For ChipDb method; this is passed to 'mapIds' to control how
          1:many mappings are handled. The default is 'first', which
          takes just the first result. Other valid values are 'list'
          and 'CharacterList', which return all mapped results.

    type: For pdInfoPackages; either 'core' or 'probeset',
          corresponding to the 'target' argument used in the call to
          'rma'.

probecol: Column of the data.frame that contains the probeset IDs. Can <---------------- As well as this entry and the following one
          be either numeric (the column number) or character (the
          column header).

annocols: Column(s) of the data.frame to use for annotating. Can be a
          vector of numbers (which column numbers to use) or a
          character vector (vector of column names).
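
As a rough illustration of what probecol and annocols mean, the base R sketch below lines up annotation rows with the rows of an expression matrix. It is not the affycoretools implementation, only a guess at the general idea under toy data; all IDs and column names are hypothetical.

```r
# Toy expression matrix; row names play the role of featureNames(eset).
expr <- matrix(
  c(7.1, 6.8, 5.2, 5.5, 9.0, 9.3),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("ps2", "ps3", "ps1"), c("sampleA", "sampleB"))
)

# Toy annotation table in a different order, with one ID (ps3) missing.
anno <- data.frame(
  id = c("ps1", "ps2", "ps4"),
  name = c("name-A", "name-B", "name-D"),
  accession = c("acc-A", "acc-B", "acc-D"),
  stringsAsFactors = FALSE
)

probecol <- 1    # column of anno that holds the probeset IDs
annocols <- 2:3  # columns of anno to attach as annotation

# match() finds, for each expression row, the matching annotation row (NA if none).
idx <- match(rownames(expr), anno[[probecol]])
feature_anno <- anno[idx, annocols, drop = FALSE]
rownames(feature_anno) <- rownames(expr)
print(feature_anno)
```

Probesets absent from the annotation table (ps3 here) end up with NA annotation rather than being dropped, so the result keeps the same row order as the expression data.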