Dear administrators,
I'm analyzing an apple (Malus x domestica) fruit dataset that I downloaded from NCBI GEO, accession GSE24523. The experiment used a NimbleGen microarray, platform GPL11164 (NimbleGen design name: 080501_GDR_Malus_EST-V4_EXP; NimbleGen design ID: 7552).
Using the oligo and pdInfoBuilder packages I created an eset after converting the pair files to the xys files:
> eset
ExpressionSet (storageMode: lockedEnvironment)
assayData: 193586 features, 24 samples
element names: exprs
protocolData
rowNames: GSM618107_14418002_532.xys GSM618108_12742302_532.xys ... GSM618130_12782502_532.xys (24 total)
varLabels: exprs dates
varMetadata: labelDescription channel
phenoData
rowNames: GSM618107_14418002_532.xys GSM618108_12742302_532.xys ... GSM618130_12782502_532.xys (24 total)
varLabels: Timepoint
varMetadata: channel labelDescription
featureData: none
experimentData: use 'experimentData(object)'
Annotation: pd.080501.gdr.malus.est.v4.exp
I'm able to construct volcano plots and use limma to test for differential expression. However, I'm not able to obtain feature data for the assay data features. I do have the pmSequence nucleotide strings (width and sequence):
> head(pmSeqHC)
  A DNAStringSet instance of length 6
    width seq
[1]    65 GTGTGAAACATGTTTGGGCACCATCAAATCTCAGACTATTATCTTTAGATGATAACATGATTCAA
[2]    54 TCTTTTCGGAATGTGAGAAATAGTGGTGTTTTCACTTTCTCTCGCATTTCAACT
[3]    50 GCCGTAACAACCCTGTGTGAGGCGGATTATGAAACAAGTAAAATGAGTGT
[4]    51 ATCTCTAGAATCTCAAGGTGGCCCTCTCTCACTGCTGCTGTTGTCGCGAAG
[5]    57 CAAAGACTTGTCCTTCTTTATATTCTATTCAGCACTTTTGCTCTCGCCAGTGGCATT
[6]    55 GTCCACTGAGGAATTTATTTCTATGGAATTCCGTTTTACTTTAAGAGATAATGGA
As well as the feature names:
> head(ID)
[1] "add_Affy_dap_1_1031" "add_Affy_dap_1_109" "add_Affy_dap_1_1096" "add_Affy_dap_1_1198" "add_Affy_dap_1_1249" "add_Affy_dap_1_1300"
And the limma result rownames:
> head(tab)
                                logFC  AveExpr         t      P.Value    adj.P.Val        B
Contig9003_2_r_204_2_696     2.476124 5.441725 13.398555 1.322361e-12 2.559905e-07 17.05329
Malus_CN941480_2_f_292_5_373 1.764380 5.897758 10.625992 1.558974e-10 1.508977e-05 13.27949
Contig23165_2_r_145_8_238    1.639928 9.028845  9.973012 5.425434e-10 3.011584e-05 12.24010
Contig20209_1_f_261_1_795    2.927348 6.719980  9.902833 6.222731e-10 3.011584e-05 12.12464
Contig16255_2_f_213_3_301    2.496227 7.413003  9.229728 2.391889e-09 9.260724e-05 10.97910
Contig16255_1_r_157_2_270    2.584128 7.658646  9.133751 2.911828e-09 9.394820e-05 10.81005
My current issue is that I can't find any feature data online when using biomaRt to annotate the results. I've tried plants.ensembl.org as well as plantdb.org. I also tried rosaceae.org, but they don't seem to have the feature data in their database, even though this group constructed the NimbleGen array. Below is one of the many variations of code I tried in order to map the array features to annotations:
ID = featureNames(eset)
length(ID)
[1] 193586
plantbase <- useMart(biomart = "plants_mart", host = "plants.ensembl.org")
plantbase <- useDataset(mart = plantbase, dataset = "ptrichocarpa_eg_gene")
listDatasets(plantbase)
listAttributes(plantbase)
listFilters(plantbase)
getBM(attributes = c("ensembl_transcript_id", "uniprot_swissprot_accession"),
filters="ensembl_transcript_id",
values= ID[1:5],
mart=plantbase)
This is the typical output I get after my query, and is without errors:
> getBM(attributes = c("ensembl_transcript_id", "uniprot_swissprot_accession"),
+ filters="ensembl_transcript_id",
+ values= ID[1:5],
+ mart=plantbase)
[1] ensembl_transcript_id uniprot_swissprot_accession
<0 rows> (or 0-length row.names)
Any help or insight is greatly appreciated. Thanks, Franklin
> sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: i386-w64-mingw32/i386 (32-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
 [1] R6_2.2.0                             rentrez_1.0.4                        annotate_1.50.0                      XML_3.98-1.4
 [5] AnnotationDbi_1.34.4                 biomaRt_2.28.0                       genefilter_1.54.2                    pd.080501.gdr.malus.est.v4.exp_0.0.1
 [9] limma_3.28.18                        pdInfoBuilder_1.36.0                 affxparser_1.44.0                    RSQLite_1.0.0
[13] DBI_0.5                              oligo_1.36.1                         Biostrings_2.40.2                    XVector_0.12.1
[17] IRanges_2.6.1                        S4Vectors_0.10.3                     Biobase_2.32.0                       oligoClasses_1.34.0
[21] BiocGenerics_0.18.0

loaded via a namespace (and not attached):
 [1] bit_1.1-12                 codetools_0.2-14           preprocessCore_1.34.0      splines_3.3.1              curl_1.2
 [6] grid_3.3.1                 bitops_1.0-6               httr_1.2.1                 survival_2.39-4            zlibbioc_1.18.0
[11] lattice_0.20-33            foreach_1.4.3              iterators_1.0.8            Matrix_1.2-6               KernSmooth_2.23-15
[16] jsonlite_1.0               BiocInstaller_1.22.3       SummarizedExperiment_1.2.3 RCurl_1.95-4.8             tools_3.3.1
[21] affyio_1.42.0              GenomeInfoDb_1.8.3         GenomicRanges_1.24.2       ff_2.2-13                  xtable_1.8-2
First of all, I don't know anything about the organism you are investigating. However, a simple Google search turned up this page at rosaceae.org, https://www.rosaceae.org/species/malus/malus_spp/unigene_v4 , which seems to contain (a little) more annotation info for the v4 release. Specifically, the download section offers BLAST results files that map GDR_IDs (e.g. Malus_v4_Contig1) to, for example, SwissProt IDs. I think this is the kind of info you are after.
Having said this, I realize the info is rather outdated. Although a V5 annotation release seems to be available, it is not clear to me how it relates to the NimbleGen array you would like to analyze. Other than getting in touch with the people who maintain the GDR, I cannot think of a solution for this.
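For what it's worth, here is a rough sketch of how such a BLAST results file might be joined to the probe IDs from the limma table. Everything about the file layout is an assumption: the `blast_hits` table, its column names, and the accessions are hypothetical placeholders, and the rule for cutting a probe ID down to a sequence name is only inferred from the look of the IDs, not from any GDR documentation.

```r
# Probe IDs copied from the limma output in the question.
probe_ids <- c("Contig9003_2_r_204_2_696",
               "Malus_CN941480_2_f_292_5_373",
               "Contig23165_2_r_145_8_238")

# ASSUMPTION: the sequence name is everything before the first
# "_<number>_<f|r>_" block. Inferred from the IDs, not documented.
seq_name <- sub("_[0-9]+_[fr]_.*$", "", probe_ids)

# ASSUMPTION: contigs carry a "Malus_v4_" prefix in the GDR files
# (as in the Malus_v4_Contig1 example); other names are left as-is.
probe_map <- data.frame(
  probe_id = probe_ids,
  gdr_id   = ifelse(grepl("^Contig", seq_name),
                    paste0("Malus_v4_", seq_name), seq_name),
  stringsAsFactors = FALSE
)

# HYPOTHETICAL stand-in for a parsed BLAST results file; the real
# column names and accessions will differ.
blast_hits <- data.frame(
  gdr_id    = c("Malus_v4_Contig9003", "Malus_v4_Contig23165"),
  swissprot = c("HYPOTHETICAL_ACC_1", "HYPOTHETICAL_ACC_2"),
  stringsAsFactors = FALSE
)

# Left join: probes without a BLAST hit are kept with NA.
annotated <- merge(probe_map, blast_hits, by = "gdr_id", all.x = TRUE)
print(annotated)
```

If the real file uses different identifiers, only the construction of `gdr_id` should need to change; the `merge()` step stays the same.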
This is always the problem with a non-model organism. You can use whatever annotations someone else did (and for which you have little knowledge of how they did the annotations), or you can 'roll your own'. It wouldn't take much time to blast those sequences against nt, for what that's worth.
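As a starting point for the 'roll your own' route, the probe sequences first have to be written out as FASTA. Below is a minimal base-R sketch of that step; the two sequences are taken from the head(pmSeqHC) output in the question, but pairing them with these particular feature names is purely for illustration, and the file name is arbitrary.

```r
# Toy stand-ins for featureNames(eset) and the PM sequences.
# ASSUMPTION: names and sequences are in the same order; verify this
# on the real objects before trusting the headers.
ids  <- c("add_Affy_dap_1_1031", "add_Affy_dap_1_109")
seqs <- c("GCCGTAACAACCCTGTGTGAGGCGGATTATGAAACAAGTAAAATGAGTGT",
          "ATCTCTAGAATCTCAAGGTGGCCCTCTCTCACTGCTGCTGTTGTCGCGAAG")

# Interleave ">header" and sequence lines: header1, seq1, header2, seq2.
fasta_lines <- as.vector(rbind(paste0(">", ids), seqs))

# Write to a temporary file (arbitrary name).
fasta_file <- file.path(tempdir(), "probes.fasta")
writeLines(fasta_lines, fasta_file)

cat(readLines(fasta_file), sep = "\n")
```

With a named DNAStringSet, `Biostrings::writeXStringSet()` should give the same kind of file in one call. The FASTA file can then be run through BLAST+, for example something along the lines of `blastn -query probes.fasta -db nt -remote -outfmt 6 -out hits.tsv`, although for ~190k probes a local database is probably more practical than remote searches.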
Thanks Guido. The MDP numbers don't match any database other than rosaceae-GDR. As per James's comment, I rolled my own, and did the heavy lifting to get the Entrez Gene IDs. It was tedious, but I learned a lot while doing it. Thanks guys!!
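For anyone landing here later, one way the final step could look is sketched below: attaching a home-made probe-to-gene lookup to the limma table. This is not necessarily how it was done in this thread. The `annot` table is hypothetical and the Entrez Gene IDs in it are placeholders, not real annotations.

```r
# First rows of the limma table from the question (values rounded).
tab <- data.frame(
  logFC     = c(2.476, 1.764, 1.640),
  adj.P.Val = c(2.56e-07, 1.51e-05, 3.01e-05),
  row.names = c("Contig9003_2_r_204_2_696",
                "Malus_CN941480_2_f_292_5_373",
                "Contig23165_2_r_145_8_238")
)

# HYPOTHETICAL lookup built from parsed BLAST hits; IDs are placeholders.
annot <- data.frame(
  probe_id  = c("Contig9003_2_r_204_2_696", "Contig23165_2_r_145_8_238"),
  entrez_id = c("PLACEHOLDER_1", "PLACEHOLDER_2"),
  stringsAsFactors = FALSE
)

# match() preserves the row order of `tab`; probes without a hit get NA.
tab$entrez_id <- annot$entrez_id[match(rownames(tab), annot$probe_id)]
print(tab)
```

Using `match()` rather than `merge()` keeps the table sorted by significance. The same lookup could probably also be stored in `fData(eset)` before fitting, so that the annotation travels with the limma results.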