According to Recount2 metadata and Recount2 website, SRP000941 has 345 samples, but after downloading the study the rse_gene object only has 343 samples?
Kayla • 0
@kayla-15261

I am trying to use multiple projects from recount2 and can't figure out why I get only 343 samples when I download SRP000941, even though all_metadata() and the recount2 website both list 345 samples for it. The R code showing the issue is below. Thanks in advance for any help.

# Load the packages used below
library(recount)
library(dplyr)

# Download all metadata from recount2 and convert it to a tibble called tblmetadata
metadata <- all_metadata()
tblmetadata <- as_tibble(metadata)

# After some metadata processing, SRP000941 is one of the studies I want to use.
# According to the metadata (and the recount2 website), SRP000941 has 345 samples.
test <- tblmetadata %>% filter(project == "SRP000941")
dim(test)
# dim(test) outputs 345 21

#download and load rse_gene from SRP000941
download_study("SRP000941", type = "rse-gene")
load(file.path("SRP000941", "rse_gene.Rdata"))

#check dimensions of rse_gene just downloaded
dim(rse_gene)
#output is 58037 343
dim(assays(rse_gene)$counts)
#again output is 58037 343

@lcolladotor

Hi,

This question is related to https://github.com/leekgroup/recount-website/issues/11.

The following R code

library('recount')
metadata <- all_metadata()
m <- subset(metadata, project == 'SRP000941')

download_study("SRP000941", type = "rse-gene")
load(file.path("SRP000941", "rse_gene.Rdata"))

m$run[!m$run %in% rse_gene$run]
m[which(!m$run %in% rse_gene$run), ]

shows that 0 reads were downloaded for the 2 missing samples:

DataFrame with 2 rows and 21 columns
      project      sample  experiment         run read_count_as_reported_by_sra reads_downloaded
  <character> <character> <character> <character>                     <integer>        <integer>
1   SRP000941   SRS366515   SRX263859  SRR1220437                      36581388                0
2   SRP000941   SRS366515   SRX263859  SRR1220440                      36368682                0
  proportion_of_reads_reported_by_sra_downloaded paired_end sra_misreported_paired_end mapped_read_count
                                       <numeric>  <logical>                  <logical>         <integer>
1                                              0       TRUE                      FALSE                 0
2                                              0       TRUE                      FALSE                 0
        auc sharq_beta_tissue sharq_beta_cell_type biosample_submission_date biosample_publication_date
  <numeric>       <character>          <character>               <character>                <character>
1        NA         stem cell                  ips   2012-10-02T09:07:19.193    2013-05-02T11:05:43.390
2        NA         stem cell                  ips   2012-10-02T09:07:19.193    2013-05-02T11:05:43.390
    biosample_update_date avg_read_length geo_accession bigwig_file       title characteristics
              <character>       <integer>   <character> <character> <character> <CharacterList>
1 2015-01-30T10:35:23.203             200            NA          NA          NA              NA
2 2015-01-30T10:35:23.203             200            NA          NA          NA              NA

That is why they are not in the final rse_gene file. As to why they weren't downloaded, Abhinav Nellore might have an answer.
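
Since the question involves several projects, the same check can be run on the metadata before downloading anything. The snippet below is only a minimal sketch: it uses a small toy data frame in place of the real all_metadata() output (the third run ID and its read count are made up for illustration), and it assumes that runs with reads_downloaded equal to 0 are the ones left out of rse_gene, which is what was observed for SRP000941 but has not been verified here for other projects.

```r
# Toy stand-in for the data frame returned by all_metadata(); only the
# columns used here are included. The third run ID and its read count
# are hypothetical.
metadata <- data.frame(
  project = c("SRP000941", "SRP000941", "SRP000941"),
  run = c("SRR1220437", "SRR1220440", "SRR0000001"),
  reads_downloaded = c(0L, 0L, 1000L),
  stringsAsFactors = FALSE
)

m <- subset(metadata, project == "SRP000941")

# Runs with no downloaded reads: these are the ones assumed to be
# absent from rse_gene.
missing_runs <- m$run[which(m$reads_downloaded == 0)]

# Number of samples expected in the downloaded object under that assumption.
expected_n <- length(which(m$reads_downloaded > 0))

print(missing_runs)
print(expected_n)
```

With the real metadata, comparing expected_n against ncol(rse_gene) for each project should show quickly whether any other discrepancy has the same cause.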

Best, Leonardo


Thank you! Really appreciate your help.


No problem =)
