Hugh Shanahan
Thanks Jack !
Sent from my phone so apologies for my brevity and frequent typos.
-------- Original message --------
From: Jack Zhu <zhujack@mail.nih.gov>
Date: 17/06/2014 20:31 (GMT+00:00)
To: Sean Davis <sdavis2@mail.nih.gov>, "Shanahan, Hugh" <hugh.shanahan@rhul.ac.uk>,
    bioconductor@r-project.org, Jamie.Al-Nasir.2012@live.rhul.ac.uk
Subject: Re: [BioC] SRAmetadb Bioconductor package; study record count low for 2013
Hi Jamie and all,
By modifying my code and pulling data from a curated table
(SRA_Accessions.tab) from the SRA, I think the missing
'submission_date' values in the submission table have been fixed:
  strftime('%Y', s.submission_date) count(*)
1                              2008      348
2                              2009     1260
3                              2010     2865
4                              2011     4276
5                              2012     6606
6                              2013    15309
7                              2014     7706
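
For anyone who wants to re-run a per-year tally like the one above against their own copy of the database, here is a minimal sketch in Python. It uses a tiny in-memory stand-in rather than the real SQLite file. The table and column names (`study`, `submission`, `submission_accession`, `submission_date`) are assumptions based on the discussion in this thread and may not match the actual schema exactly, and the rows are invented toy data.

```python
import sqlite3

def studies_per_year(conn):
    """Count studies per submission year by joining study to submission.

    Studies whose submission has no date end up in a group keyed by None.
    """
    sql = """
        SELECT strftime('%Y', s.submission_date) AS year, count(*) AS n
        FROM study st
        JOIN submission s
          ON st.submission_accession = s.submission_accession
        GROUP BY year
        ORDER BY year
    """
    return conn.execute(sql).fetchall()

# Toy stand-in for the real database (assumed schema, invented rows).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE submission (submission_accession TEXT, submission_date TEXT);
    CREATE TABLE study (study_accession TEXT, submission_accession TEXT);
""")
conn.executemany(
    "INSERT INTO submission VALUES (?, ?)",
    [
        ("SRA000001", "2012-03-01 00:00:00"),
        ("SRA000002", "2013-07-15 00:00:00"),
        ("SRA000003", None),  # a submission record with no date
    ],
)
conn.executemany(
    "INSERT INTO study VALUES (?, ?)",
    [
        ("SRP000001", "SRA000001"),
        ("SRP000002", "SRA000002"),
        ("SRP000003", "SRA000002"),
        ("SRP000004", "SRA000003"),
    ],
)

rows = studies_per_year(conn)
print(rows)  # [(None, 1), ('2012', 1), ('2013', 2)]
```

To try it on a real file, replace `":memory:"` with the path to the downloaded SQLite database and drop the toy inserts. A large `None` group would suggest that dates are still missing.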
Please let me know if you still see any problems or have any
questions. Thanks.
Jack
----
Yuelin Jack Zhu
Genetics Branch/CCR/NCI/NIH
Tel: (301)496-4527
FAX: (301) 402-3241
E-mail: zhujack@mail.nih.gov
On Sun, Jun 8, 2014 at 11:11 AM, Jack Zhu <zhujack@mail.nih.gov> wrote:
> Hi all,
>
> Regarding missing studies by submission_date for 2013 and 2014 in the
> SRAdb SQLite database, I did some investigation and found the reason.
> The metadata in the SRAdb is mainly parsed from the XML files of the
> SRA submissions, and this is also true of the submission table. But I see
> that quite a few submission XML files don't have a submission date, e.g.
>
> ftp://ftp-trace.ncbi.nih.gov/sra/Submissions/SRA157/SRA157949/
>
>  SRA157949.experiment.xml
>  SRA157949.submission.xml
>
> So it seems all the study and submission records are there, but some
> submission records just don't have a submission date. I am looking into the
> possibility of adding dates for those records.
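
One way to see how many submission records are affected is to list the rows whose date is empty. The following is a minimal sketch in Python, assuming a `submission` table with `submission_accession` and `submission_date` columns (names taken from this thread, not checked against the real schema). SRA157949 is the accession quoted above; the other rows are invented toy data.

```python
import sqlite3

def submissions_missing_date(conn):
    """Return accessions whose submission_date is NULL or blank."""
    sql = """
        SELECT submission_accession
        FROM submission
        WHERE submission_date IS NULL OR trim(submission_date) = ''
        ORDER BY submission_accession
    """
    return [row[0] for row in conn.execute(sql)]

# Toy stand-in for the real database (assumed schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE submission (submission_accession TEXT, submission_date TEXT)"
)
conn.executemany(
    "INSERT INTO submission VALUES (?, ?)",
    [
        ("SRA000001", "2012-03-01 00:00:00"),  # invented, has a date
        ("SRA000002", ""),                     # invented, blank date
        ("SRA157949", None),                   # accession cited in the thread
    ],
)

missing = submissions_missing_date(conn)
print(missing)  # ['SRA000002', 'SRA157949']
```

Checking for both NULL and blank strings is a defensive choice, since it is not stated here how a missing date is stored.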
>
> Jamie, thanks for the finding and I will keep you updated.
>
> Jack
>
>
> On Fri, Jun 6, 2014 at 3:49 PM, Sean Davis <sdavis2@mail.nih.gov> wrote:
>> Hi, Jack.
>>
>> I took a look at this and it does appear that the number of
>> submissions is very low for 2013. Also, there are no 2014 submissions
>> listed that I could find. This was using the June 1, 2014 sqlite
>> file.
>>
>> Sean
>>
>>
>>
>> ---------- Forwarded message ----------
>> From: Al-Nasir, Jamie (2012) <jamie.al-nasir.2012@live.rhul.ac.uk>
>> Date: Thu, Jun 5, 2014 at 2:20 PM
>> Subject: [BioC] SRAmetadb Bioconductor package; study record count low for 2013
>> To: "bioconductor@r-project.org" <bioconductor@r-project.org>
>> Cc: "Shanahan, Hugh" <hugh.shanahan@rhul.ac.uk>
>>
>>
>> Hello,
>>
>>
>> I have been looking at the SRA (Sequence Read Archive) SQLite database
>> provided as a Bioconductor package for R.
>>
>> My question concerns top-level studies, which are found in the study table
>> and dated in the submission table.
>>
>> Why are there so few entries for top-level studies for 2013
>> compared with 2011 and 2012?
>>
>> The SQL queries I have written, which join the submission table and the
>> study table in order to obtain the submission_date, yield the following
>> counts of top-level studies by year:
>>
>>
>> 2005|64
>> 2006|38
>> 2007|94
>> 2008|269
>> 2009|893
>> 2010|2631
>> 2011|4077
>> 2012|5208
>> 2013|724
>>
>>
>> As one can see, the number of studies in the metadata falls off in 2013.
>>
>> I have been using the SRAdb Bioconductor SQLite database, which has
>> the creation timestamp of 2013-12-03 08:29:26 in the metaInfo table.
>>
>>
>> I would really appreciate any thoughts on this.
>>
>>
>> Best regards,
>>
>> Jamie
>>
>> Jamie Al-Nasir MPharm (Hons)
>> Department of Computer Science
>> Centre for Systems and Synthetic Biology
>> Mobile: +44 (0)759 4800 229
>> Web: http://jamie.al-nasir.com/
>>
>>        [[alternative HTML version deleted]]
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor@r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor