Question: Help inputting Sailfish TPM files into tximport
sjmonkley (Sweden) wrote, 3.0 years ago:

Hi 

I am trying to run tximport on transcript TPM files from Sailfish. The problem is that my files are not organised as they are in the vignette, with the quant.sf file for each sample in a separate folder. Instead, they had all been combined into one file with a separate column for sample.

I have managed to extract the quant data for each sample into a separate file, but when I try to use those files as input I get:

> all(file.exists(files))
[1] FALSE

I understand this means the data are not organised in the way tximport wants, but it's difficult to tell exactly what the problem is without understanding the expected input requirements, and these are not made clear in the vignette.
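As a rough sketch of the layout the vignette assumes (one folder per sample, each holding a file called quant.sf, plus a named character vector of paths), the base-R code below builds a throwaway copy of that layout in a temporary directory. The sample folder names and the quant.sf header line are hypothetical placeholders; only base R functions are used.

```r
# Hypothetical sample folder names; in real use these would be your own sample IDs
samples <- c("sampleA", "sampleB", "sampleC")
# Temporary base directory standing in for the real data directory
base_dir <- file.path(tempdir(), "quant_demo")

# Create one folder per sample, each with a placeholder quant.sf (header line only)
for (s in samples) {
  dir.create(file.path(base_dir, s), recursive = TRUE, showWarnings = FALSE)
  writeLines("Name\tLength\tEffectiveLength\tTPM\tNumReads",
             file.path(base_dir, s, "quant.sf"))
}

# Named vector of paths, the same shape as the files object in the vignette
files <- file.path(base_dir, samples, "quant.sf")
names(files) <- paste0("sample", seq_along(samples))

# Should be TRUE before the paths are handed to tximport
all(file.exists(files))
```

The names attached to files become the sample (column) names of the matrices tximport builds, so they are worth setting deliberately.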

It would be useful if there were a way to input the quant.sf data in a different way, and some guidance on how that input should look would be helpful for those of us who are not very proficient in R.

Thanks

Sue

Tags: sailfish, tximport

Hi Sue,

Could you provide some details about how your data are organized and the commands you're currently attempting to use (that are failing)? It seems that R believes that at least some of the expected files do not exist at the provided locations, which you can check outside of the tximport package.
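A minimal sketch of that check, using only base R: test each path individually and keep the sample names, so it is obvious which entries R cannot find. The two paths here are hypothetical (one temporary file that exists, one that does not).

```r
# Hypothetical paths: one file that exists (a fresh temporary file) and one that does not
present <- tempfile(fileext = ".sf")
invisible(file.create(present))
files <- c(sample1 = present,
           sample2 = file.path(tempdir(), "no_such_folder", "quant.sf"))

# One TRUE/FALSE per path; setNames() labels each result with its sample name
exists_per_file <- setNames(file.exists(files), names(files))
print(exists_per_file)

# Keep only the entries R cannot find
missing_files <- files[!exists_per_file]
print(missing_files)
```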

--Rob

Written 3.0 years ago by rob.patro
Answer by Michael Love (United States), 3.0 years ago:

tximport performs two fairly simple operations: (1) reading in the quantification data from multiple files and forming matrices, and (2) summarizing transcript-level abundance, length and count matrices into gene-level matrices.
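To illustrate the idea behind step (2), here is a base-R sketch that collapses a transcript-level counts matrix onto genes with rowsum(), using a transcript-to-gene table. The IDs and numbers are made up, and this is a simplification rather than tximport's own code: tximport also summarizes abundances and computes abundance-weighted gene lengths, which this sketch does not do.

```r
# Made-up transcript-level counts: rows = transcripts, columns = samples
tx_counts <- matrix(c(10, 5, 0, 20,
                      12, 3, 1, 18),
                    nrow = 4,
                    dimnames = list(c("tx1", "tx2", "tx3", "tx4"),
                                    c("sample1", "sample2")))

# Hypothetical transcript-to-gene mapping (first column transcript, second gene)
tx2gene <- data.frame(tx   = c("tx1", "tx2", "tx3", "tx4"),
                      gene = c("geneA", "geneA", "geneB", "geneB"),
                      stringsAsFactors = FALSE)

# Look up the gene for each matrix row, in the matrix's own row order
gene_of_row <- tx2gene$gene[match(rownames(tx_counts), tx2gene$tx)]

# Sum the rows of transcripts that belong to the same gene
gene_counts <- rowsum(tx_counts, group = gene_of_row)
gene_counts
```

Here geneA gets 15 counts in both samples, and geneB gets 20 and 19.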

If you've already combined the individual samples yourself, then you don't need help with (1).

If you want to use Sailfish and then the downstream gene-level statistical packages in Bioconductor, I'd recommend you just generate the quant.sf files the standard way and don't manipulate them before running tximport. This way you won't run into any unexpected problems. And luckily Sailfish is very fast so it's not that much effort to regenerate the quant.sf files in case you no longer have access to them :)

Answer by sjmonkley (Sweden), 3.0 years ago:

Thanks Michael/Rob

As I don't do the Sailfish processing myself (it is part of an in-house pipeline), getting the data reprocessed would take longer than I thought it would take to reorganize the files. I'm thinking not right now!

When I said the Sailfish output I have has been combined, I meant into one data frame for all samples, summarised by transcript.

My understanding is that the tximport function does more than just form the matrices: it also converts to gene-level summation, and the input to summarizeToGene (txi) is a list of matrices. So I don't see how I can use my data frame without converting it somehow. Or can I use my data as input some other way?
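If the combined data frame is in "long" format (one row per transcript per sample), base R can reshape it into transcript-by-sample matrices. The column names Name, sample, TPM and NumReads below are assumptions about how such a table might look. tximport's own output is a list holding matrices named abundance, counts and length (plus some metadata), but whether a hand-built list is accepted by summarizeToGene may depend on the tximport version, so treat this only as a sketch of the reshaping step.

```r
# Made-up long-format table: one row per transcript per sample
long <- data.frame(
  Name     = c("tx1", "tx2", "tx1", "tx2"),
  sample   = c("s1", "s1", "s2", "s2"),
  TPM      = c(100, 50, 80, 70),
  NumReads = c(10, 5, 8, 7),
  stringsAsFactors = FALSE
)

# Reshape one value column into a transcript x sample matrix;
# sum() only matters if a transcript appears twice for the same sample
to_matrix <- function(df, value) {
  m <- tapply(df[[value]], list(df$Name, df$sample), sum)
  m[is.na(m)] <- 0  # a transcript missing from a sample gets 0
  m
}

abundance <- to_matrix(long, "TPM")
counts    <- to_matrix(long, "NumReads")
abundance
```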

Rob: the files exist, but I suspect they are not formatted as tximport requires. They have 2 extra columns (sample and a duplicated ID column), and they are all in one folder rather than having the same file name in separate folders. The column names also differ slightly. I could fix all of these, but it's fiddly, and I am not sure whether it will help or whether the problem is something else.
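The clean-up described above could be scripted roughly as follows: split the combined table by sample, drop the extra columns, and write each piece as a tab-separated quant.sf inside its own folder. The column names (including the duplicated ID column, called Name.1 here) are hypothetical, and the quant.sf column set used (Name, Length, EffectiveLength, TPM, NumReads) is an assumption that should be checked against a real quant.sf from the same Sailfish version.

```r
# Made-up combined table with an extra sample column and a duplicated ID column
combined <- data.frame(
  Name            = c("tx1", "tx2", "tx1", "tx2"),
  Length          = c(1500, 900, 1500, 900),
  EffectiveLength = c(1300, 700, 1300, 700),
  TPM             = c(100, 50, 80, 70),
  NumReads        = c(10, 5, 8, 7),
  sample          = c("s1", "s1", "s2", "s2"),
  Name.1          = c("tx1", "tx2", "tx1", "tx2"),
  stringsAsFactors = FALSE
)

# Columns assumed to make up a Sailfish quant.sf, in this order
keep <- c("Name", "Length", "EffectiveLength", "TPM", "NumReads")
out_dir <- file.path(tempdir(), "per_sample")

# One folder per sample, each containing its own quant.sf
for (s in unique(combined$sample)) {
  sample_dir <- file.path(out_dir, s)
  dir.create(sample_dir, recursive = TRUE, showWarnings = FALSE)
  write.table(combined[combined$sample == s, keep],
              file = file.path(sample_dir, "quant.sf"),
              sep = "\t", quote = FALSE, row.names = FALSE)
}

# Named vector of paths pointing at the new files
files <- file.path(out_dir, unique(combined$sample), "quant.sf")
names(files) <- unique(combined$sample)
all(file.exists(files))
```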

Apologies if I have misunderstood; I am actually a biologist by training and am on a steep learning curve!

Sue

 


OK, after speaking to the people who ran the pipeline, I realised I could get the quant.sf files, and I have done so.

However, despite setting dir and files as in the vignette, with what look like the correct paths, I still get the error below (there are actually 26 samples, but I only show the first 6):

> all(file.exists(files))
[1] FALSE
> files
                                                                                  sample1 
               "C:/Users/KGXB111/AZ_IPSD_data/Input_files/CD14M1_02_S2/sailfish/quant.sf" 
                                                                                  sample2 
               "C:/Users/KGXB111/AZ_IPSD_data/Input_files/CD14M2_08_S8/sailfish/quant.sf" 
                                                                                  sample3 
              "C:/Users/KGXB111/AZ_IPSD_data/Input_files/CD14M3_16_S16/sailfish/quant.sf" 
                                                                                  sample4 
               "C:/Users/KGXB111/AZ_IPSD_data/Input_files/CD14P1_01_S1/sailfish/quant.sf" 
                                                                                  sample5 
               "C:/Users/KGXB111/AZ_IPSD_data/Input_files/CD14P2_07_S7/sailfish/quant.sf" 
                                                                                  sample6 
              "C:/Users/KGXB111/AZ_IPSD_data/Input_files/CD14P3_15_S15/sailfish/quant.sf" 

Here is the path of the folder containing the quant.sf file for the first sample:

C:\Users\KGXB111\AZ_IPSD_data\Input_files\CD14M1_02_S2\sailfish

I cannot see what is wrong with the files list or the names/paths. When I run the vignette example with the Sailfish data from tximportData, I don't get the error:

> all(file.exists(files))
[1] TRUE
> files
                                                                            sample1 
"C:/Users/KGXB111/R-3.3.1/library/tximportData/extdata/sailfish/ERR188297/quant.sf" 
                                                                            sample2 
"C:/Users/KGXB111/R-3.3.1/library/tximportData/extdata/sailfish/ERR188088/quant.sf" 
                                                                            sample3 
"C:/Users/KGXB111/R-3.3.1/library/tximportData/extdata/sailfish/ERR188329/quant.sf" 
                                                                            sample4 
"C:/Users/KGXB111/R-3.3.1/library/tximportData/extdata/sailfish/ERR188288/quant.sf" 
                                                                            sample5 
"C:/Users/KGXB111/R-3.3.1/library/tximportData/extdata/sailfish/ERR188021/quant.sf" 
                                                                            sample6 
"C:/Users/KGXB111/R-3.3.1/library/tximportData/extdata/sailfish/ERR188356/quant.sf" 

 

Any idea how to track down the problem?

Sue

Written 3.0 years ago by sjmonkley

So this is R telling you that the path is not correct:

> all(file.exists(files))
[1] FALSE

This means that the files do not exist at the locations specified. You can run the following to see whether all of the files, or only a subset, are missing at the specified paths:

file.exists(files)
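When file.exists() reports FALSE for a path that looks right, a useful next step is to check whether the folder itself exists and what it actually contains. The sketch below fakes one kind of mismatch (a file saved as quant.sf.txt, an extension Windows Explorer can hide by default); the folder and file names are hypothetical, and this is not a diagnosis of the specific paths above.

```r
# Hypothetical folder in which the file was saved as "quant.sf.txt" rather than "quant.sf"
sample_dir <- file.path(tempdir(), "demo_sample", "sailfish")
dir.create(sample_dir, recursive = TRUE, showWarnings = FALSE)
invisible(file.create(file.path(sample_dir, "quant.sf.txt")))

expected <- file.path(sample_dir, "quant.sf")
file.exists(expected)   # FALSE: no file with exactly this name
dir.exists(sample_dir)  # TRUE: the folder itself is found
list.files(sample_dir)  # shows what the folder actually contains

# For a whole vector of paths, list each folder whose expected file is missing
files <- c(sample1 = expected)
lapply(dirname(files[!file.exists(files)]), list.files)
```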
Written 3.0 years ago by Michael Love