SRAmetadb Bioconductor package; study record count low for 2013
@hugh-shanahan-5565
Thanks Jack!

Sent from my phone, so apologies for my brevity and frequent typos.

-------- Original message --------
From: Jack Zhu <zhujack@mail.nih.gov>
Date: 17/06/2014 20:31 (GMT+00:00)
To: Sean Davis <sdavis2@mail.nih.gov>, "Shanahan, Hugh" <hugh.shanahan@rhul.ac.uk>, bioconductor@r-project.org, Jamie.Al-Nasir.2012@live.rhul.ac.uk
Subject: Re: [BioC] SRAmetadb Bioconductor package; study record count low for 2013

Hi Jamie and all,

I modified my code to pull data from a curated table provided by the SRA (SRA_Accessions.tab), and I think the missing 'submission_date' values in the submission table have now been fixed:

  strftime('%Y', s.submission_date) count(*)
1                              2008      348
2                              2009     1260
3                              2010     2865
4                              2011     4276
5                              2012     6606
6                              2013    15309
7                              2014     7706

Please let me know if you still see any problems or have any questions.

Thanks.

Jack

----
Yuelin Jack Zhu
Genetics Branch/CCR/NCI/NIH
Tel: (301)496-4527
FAX: (301) 402-3241
E-mail: zhujack@mail.nih.gov

On Sun, Jun 8, 2014 at 11:11 AM, Jack Zhu <zhujack@mail.nih.gov> wrote:
> Hi all,
>
> Regarding the studies missing from the 2013 and 2014 submission_date counts
> in the SRAdb SQLite database, I did some investigation and found the reason.
> The metadata in SRAdb is mainly parsed from the XML files of the SRA
> submissions, and this is true of the submission table as well. However,
> quite a few submission XML files don't have a submission date, e.g.
>
> ftp://ftp-trace.ncbi.nih.gov/sra/Submissions/SRA157/SRA157949/
>
>   SRA157949.experiment.xml
>   SRA157949.submission.xml
>
> So it seems all the study and submission records are there, but some
> submission records just don't have a submission date. I am looking into
> the possibility of adding dates for those records.
>
> Jamie, thanks for finding this; I will keep you updated.
>
> Jack
>
>
> On Fri, Jun 6, 2014 at 3:49 PM, Sean Davis <sdavis2@mail.nih.gov> wrote:
>> Hi, Jack.
>>
>> I took a look at this, and it does appear that the number of
>> submissions is very low for 2013. Also, I could not find any 2014
>> submissions listed. This was using the June 1, 2014 SQLite file.
>>
>> Sean
>>
>>
>> ---------- Forwarded message ----------
>> From: Al-Nasir, Jamie (2012) <jamie.al-nasir.2012@live.rhul.ac.uk>
>> Date: Thu, Jun 5, 2014 at 2:20 PM
>> Subject: [BioC] SRAmetadb Bioconductor package; study record count low for 2013
>> To: "bioconductor@r-project.org" <bioconductor@r-project.org>
>> Cc: "Shanahan, Hugh" <hugh.shanahan@rhul.ac.uk>
>>
>> Hello,
>>
>> I have been looking at the SRA (Sequence Read Archive) SQLite database
>> provided as a Bioconductor package for R.
>>
>> My question concerns top-level studies, which are found in the study
>> table and dated in the submission table.
>>
>> Why are there so few top-level study entries for 2013 compared with
>> 2011 and 2012?
>>
>> The SQL queries I have written, which join the submission and study
>> tables to obtain the submission_date, yield the following counts of
>> top-level studies by year:
>>
>> 2005|64
>> 2006|38
>> 2007|94
>> 2008|269
>> 2009|893
>> 2010|2631
>> 2011|4077
>> 2012|5208
>> 2013|724
>>
>> As you can see, the number of studies in the metadata falls off sharply
>> in 2013. I have been using the SRAdb Bioconductor SQLite database, whose
>> metaInfo table gives a creation timestamp of 2013-12-03 08:29:26.
>>
>> I would really appreciate any useful thoughts on this.
>>
>> Best regards,
>>
>> Jamie
>>
>> Jamie Al-Nasir MPharm (Hons)
>> Department of Computer Science
>> Centre for Systems and Synthetic Biology
>> Mobile: +44 (0)759 4800 229
>> Web: http://jamie.al-nasir.com/
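The diagnosis in this thread (the study and submission records are present, but some submissions lack a submission_date) can be reproduced on a toy database. The sketch below uses Python's built-in sqlite3 module rather than the R package. It assumes a heavily simplified two-table schema; the column layout, the sample accessions, and the BACKFILL list (standing in for dates taken from a curated source such as SRA_Accessions.tab) are all hypothetical and are not the real SRAdb schema or Jack's actual code.

```python
import sqlite3

# Hypothetical, heavily simplified schema (assumption): the real SRAdb
# `study` and `submission` tables have many more columns. Only the columns
# needed for a count-by-year join are modelled here.
SCHEMA = """
CREATE TABLE submission (
    submission_accession TEXT PRIMARY KEY,
    submission_date      TEXT
);
CREATE TABLE study (
    study_accession      TEXT PRIMARY KEY,
    submission_accession TEXT
);
"""

# Made-up sample rows. SRA000003 has no date, mimicking a submission
# whose XML file lacks a submission date.
SUBMISSIONS = [
    ("SRA000001", "2012-03-01"),
    ("SRA000002", "2013-05-20"),
    ("SRA000003", None),
]
STUDIES = [
    ("SRP000001", "SRA000001"),
    ("SRP000002", "SRA000002"),
    ("SRP000003", "SRA000003"),
]

# Hypothetical (accession, date) pairs standing in for dates taken from a
# curated source such as SRA_Accessions.tab; the values are invented.
BACKFILL = [("SRA000003", "2013-11-02")]

# Studies per submission year. Rows with a NULL date are grouped under a
# NULL year instead of being silently dropped.
COUNT_BY_YEAR = """
SELECT strftime('%Y', s.submission_date) AS submission_year,
       count(*) AS n
FROM study AS st
JOIN submission AS s
  ON st.submission_accession = s.submission_accession
GROUP BY submission_year
ORDER BY submission_year
"""


def build_db():
    """Create an in-memory database holding the toy tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO submission VALUES (?, ?)", SUBMISSIONS)
    conn.executemany("INSERT INTO study VALUES (?, ?)", STUDIES)
    return conn


def count_by_year(conn):
    """Return (year, count) rows; SQLite sorts the NULL year first."""
    return conn.execute(COUNT_BY_YEAR).fetchall()


def backfill_dates(conn, rows):
    """Fill in missing submission dates only; existing dates are kept."""
    conn.executemany(
        "UPDATE submission SET submission_date = ? "
        "WHERE submission_accession = ? AND submission_date IS NULL",
        [(date, acc) for acc, date in rows],
    )


if __name__ == "__main__":
    conn = build_db()
    print(count_by_year(conn))  # one study falls into the NULL-year group
    backfill_dates(conn, BACKFILL)
    print(count_by_year(conn))  # the backfilled study now counts towards 2013
```

Grouping on the NULL year, rather than filtering with WHERE s.submission_date IS NOT NULL, makes undated records appear as their own row instead of silently shrinking a year's total. That makes it one simple way to check a given copy of the database for the kind of gap discussed above.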