Question

According to Recount2 metadata and Recount2 website, SRP000941 has 345 samples, but after downloading the study the rse_gene object only has 343 samples?


Kayla • 0


I am trying to use multiple projects from recount2 and can't figure out why I only get 343 samples when I download SRP000941, even though all_metadata() and the recount2 website both report 345 samples for that project. The R code showing my issue is posted below. Thanks in advance for any help.

# packages the code below relies on
library(recount)
library(dplyr)

# download all metadata from recount, make into tibble called tblmetadata
metadata <- all_metadata()
tblmetadata <- as_tibble(metadata)

# do some metadata processing, decide SRP000941 is one of the studies I want to use
# according to processing (and recount website) SRP000941 has 345 samples
test <- tblmetadata %>% filter(project == "SRP000941")
dim(test)
# dim(test) outputs 345 21

# download and load rse_gene from SRP000941
download_study("SRP000941", type = "rse-gene")
load(file.path("SRP000941", "rse_gene.Rdata"))

# check dimensions of rse_gene just downloaded
dim(rse_gene)
# output is 58037 343
dim(assays(rse_gene)$counts)
# again output is 58037 343

recount • 1.2k views

ADD COMMENT • link updated 6.1 years ago by Leonardo Collado Torres ★ 1.0k • written 6.1 years ago by Kayla • 0

score 0 · Answer 1 · 2018-03-16

Hi,

This question is related to https://github.com/leekgroup/recount-website/issues/11.

The following R code

library('recount')
metadata <- all_metadata()
m <- subset(metadata, project == 'SRP000941')

download_study("SRP000941", type = "rse-gene")
load(file.path("SRP000941", "rse_gene.Rdata"))

m$run[!m$run %in% rse_gene$run]
m[which(!m$run %in% rse_gene$run), ]

shows that 0 reads were downloaded for the 2 missing samples (see the reads_downloaded column):

DataFrame with 2 rows and 21 columns
      project      sample  experiment         run read_count_as_reported_by_sra reads_downloaded
  <character> <character> <character> <character>                     <integer>        <integer>
1   SRP000941   SRS366515   SRX263859  SRR1220437                      36581388                0
2   SRP000941   SRS366515   SRX263859  SRR1220440                      36368682                0
  proportion_of_reads_reported_by_sra_downloaded paired_end sra_misreported_paired_end mapped_read_count
                                       <numeric>  <logical>                  <logical>         <integer>
1                                              0       TRUE                      FALSE                 0
2                                              0       TRUE                      FALSE                 0
        auc sharq_beta_tissue sharq_beta_cell_type biosample_submission_date biosample_publication_date
  <numeric>       <character>          <character>               <character>                <character>
1        NA         stem cell                  ips   2012-10-02T09:07:19.193    2013-05-02T11:05:43.390
2        NA         stem cell                  ips   2012-10-02T09:07:19.193    2013-05-02T11:05:43.390
    biosample_update_date avg_read_length geo_accession bigwig_file       title characteristics
              <character>       <integer>   <character> <character> <character> <CharacterList>
1 2015-01-30T10:35:23.203             200            NA          NA          NA              NA
2 2015-01-30T10:35:23.203             200            NA          NA          NA              NA

That is why they are not in the final rse_gene file. As to why they weren't downloaded, Abhinav Nellore might have an answer.
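The same check can be done from the metadata alone, before downloading anything. The sketch below is only illustrative: the helper name split_by_downloaded_reads is hypothetical (it is not part of recount), and it assumes that a run with zero downloaded reads is absent from rse_gene, which is what was observed for these two runs but has not been verified for other projects. The toy table reuses the two run accessions from the output above; the rows RUN_A and RUN_B and their read counts are made up.

```r
# Hypothetical helper: split one project's metadata rows into runs that
# have downloaded reads and runs that do not.
split_by_downloaded_reads <- function(metadata, project_id) {
  m <- metadata[metadata$project == project_id, , drop = FALSE]
  # Treat NA the same as zero reads: no usable data for that run
  has_reads <- !is.na(m$reads_downloaded) & m$reads_downloaded > 0
  list(
    expected_runs = m$run[has_reads],  # runs assumed to appear in rse_gene
    missing_runs  = m$run[!has_reads]  # runs assumed to be absent
  )
}

# Toy metadata: the two SRR rows mirror the output above; RUN_A and RUN_B
# are made-up placeholders with made-up read counts.
toy <- data.frame(
  project = rep("SRP000941", 4),
  run = c("SRR1220437", "SRR1220440", "RUN_A", "RUN_B"),
  reads_downloaded = c(0L, 0L, 1000L, 2000L),
  stringsAsFactors = FALSE
)

res <- split_by_downloaded_reads(toy, "SRP000941")
res$missing_runs
length(res$expected_runs)
```

With the real metadata, the call would be split_by_downloaded_reads(all_metadata(), "SRP000941"), and length(res$expected_runs) could then be compared with ncol(rse_gene).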

Best, Leonardo