Help inputting Sailfish TPM files into tximport
sjmonkley ▴ 30
@sjmonkley-8860
Last seen 7 weeks ago
Sweden

Hi

I am trying to run tximport on transcript TPM files from Sailfish. The problem is that my files are not organised as they are in the vignette, with the quant.sf file for each sample in a separate folder. Instead they have all been combined into one file with a separate column for each sample.

I have managed to extract the quant data for each sample into a separate file; however, when I try to use those files as input I get the message:

all(file.exists(files))
[1] FALSE

I understand this means the data are not organised in the way tximport wants, but it's difficult to tell exactly what the problem is without understanding the expected input requirements, and these are not made clear in the vignette.

It would be useful if there were a way to input all the quant.sf files in a different way, and some guidance on how that should look would be helpful for those of us not super proficient at R.

Thanks

Sue

tximport sailfish • 1.0k views

Hi Sue,

Could you provide some details about how your data is organized and the commands you're currently attempting to use (that are failing)? It seems that R believes that at least some of the expected files do not exist in the provided locations, which you can check outside of the tximport package.

--Rob

@mikelove
Last seen 2 days ago
United States

tximport performs two fairly simple operations: 1) reading in the quantification data from multiple files and forming matrices and 2) summarizing transcript-level abundances, lengths and counts matrices into gene-level matrices.

If you've already combined the individual samples yourself, then you don't need help with (1).

If you want to use Sailfish and then the downstream gene-level statistical packages in Bioconductor, I'd recommend you just generate the quant.sf files the standard way and don't manipulate them before running tximport. This way you won't run into any unexpected problems. And luckily Sailfish is very fast so it's not that much effort to regenerate the quant.sf files in case you no longer have access to them :)
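For readers who do already have a combined transcript-level matrix, operation (2) can be sketched in base R for the abundance part alone. This is only a minimal sketch under stated assumptions, not what tximport does internally: the matrix `tpm`, the sample names, and the `tx2gene` table below are all made-up illustrations, and the sketch ignores the lengths and counts matrices that tximport also carries along.

```r
# Hypothetical transcript-by-sample TPM matrix (all values are made up).
tpm <- matrix(
  c(10, 5, 0, 2,    # sampleA
    12, 4, 1, 3),   # sampleB
  nrow = 4,
  dimnames = list(c("tx1", "tx2", "tx3", "tx4"), c("sampleA", "sampleB"))
)

# Hypothetical transcript-to-gene map: transcript ID first, gene ID second.
tx2gene <- data.frame(
  TXNAME = c("tx1", "tx2", "tx3", "tx4"),
  GENEID = c("geneX", "geneX", "geneY", "geneY"),
  stringsAsFactors = FALSE
)

# Look up the gene for each row of the matrix, in matrix row order.
gene_ids <- tx2gene$GENEID[match(rownames(tpm), tx2gene$TXNAME)]
stopifnot(!anyNA(gene_ids))  # every transcript must have a gene

# Sum transcript TPMs within each gene, separately for each sample.
gene_tpm <- rowsum(tpm, group = gene_ids)
print(gene_tpm)
```

The result has one row per gene and one column per sample. Gene-level counts for downstream statistical packages need more care than this, which is one reason to prefer the standard quant.sf route described above.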

sjmonkley ▴ 30
@sjmonkley-8860
Last seen 7 weeks ago
Sweden

Thanks Michael/Rob

Since I don't do the Sailfish processing myself (it is part of an in-house pipeline), getting the data reprocessed will take longer than I thought it would take to reorganize the files. Am thinking not now!

When I said the Sailfish output I have has been combined, I meant into one data frame for all samples, summarised by transcript.

My understanding is that the tximport function does more than just form the matrices: it also converts to gene-level summation, and the input to summarizeToGene (txi) is a list of matrices, so I don't see how I can use my data frame without converting it somehow. Or can I use my data as input somehow?

Rob, the files exist but I suspect they are not formatted as tximport requires: they have 2 extra columns (sample and a duplicated ID column) and are all in one folder rather than having the same file name in separate folders. Also, the column names differ slightly. All these are things that I could fix, but it's fiddly and I am not sure if it will help or if the problem is something else.

Apologies if I have misunderstood. I am actually a biologist by training and am on a steep learning curve!

Sue
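As an illustration of the conversion asked about above, a combined data frame can be reshaped into a transcript-by-sample matrix in base R. This is a rough sketch only: the actual layout of the combined file is not shown in this thread, so the long layout (one row per transcript per sample) and the column names `sample`, `Name` and `TPM` are assumptions, and the values are made up.

```r
# Hypothetical combined data frame in long format (values are made up).
combined <- data.frame(
  sample = rep(c("sampleA", "sampleB"), each = 3),
  Name   = rep(c("tx1", "tx2", "tx3"), times = 2),
  TPM    = c(10, 5, 0, 12, 4, 1),
  stringsAsFactors = FALSE
)

# Each transcript should appear at most once per sample; otherwise the
# reshape below would silently add the duplicated rows together.
stopifnot(!anyDuplicated(combined[c("Name", "sample")]))

# Rows become transcripts, columns become samples.
tpm <- tapply(combined$TPM, list(combined$Name, combined$sample), sum)
print(tpm)
```

If the combined data frame is already wide (one row per transcript, one column per sample), calling `as.matrix()` on the numeric columns and setting the row names to the transcript IDs would be enough.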


OK, after speaking to the people who ran the pipeline I realised I could get the quant.sf files, and I have done so.

However, despite setting the directory and files as in the vignette, and these having the correct paths etc., I still get the error (below; there are actually 26 samples but I have only shown the first 6):

> all(file.exists(files))
[1] FALSE
> files
sample1
"C:/Users/KGXB111/AZ_IPSD_data/Input_files/CD14M1_02_S2/sailfish/quant.sf"
sample2
"C:/Users/KGXB111/AZ_IPSD_data/Input_files/CD14M2_08_S8/sailfish/quant.sf"
sample3
"C:/Users/KGXB111/AZ_IPSD_data/Input_files/CD14M3_16_S16/sailfish/quant.sf"
sample4
"C:/Users/KGXB111/AZ_IPSD_data/Input_files/CD14P1_01_S1/sailfish/quant.sf"
sample5
"C:/Users/KGXB111/AZ_IPSD_data/Input_files/CD14P2_07_S7/sailfish/quant.sf"
sample6
"C:/Users/KGXB111/AZ_IPSD_data/Input_files/CD14P3_15_S15/sailfish/quant.sf" 

Here is the path of the 1st sample, where the quant.sf file is:

C:\Users\KGXB111\AZ_IPSD_data\Input_files\CD14M1_02_S2\sailfish

I cannot see what is wrong with the files list or the names/paths. I ran the tximportData Sailfish data from the vignette and I don't get the error message:

> all(file.exists(files))
[1] TRUE
> files
sample1
"C:/Users/KGXB111/R-3.3.1/library/tximportData/extdata/sailfish/ERR188297/quant.sf"
sample2
"C:/Users/KGXB111/R-3.3.1/library/tximportData/extdata/sailfish/ERR188088/quant.sf"
sample3
"C:/Users/KGXB111/R-3.3.1/library/tximportData/extdata/sailfish/ERR188329/quant.sf"
sample4
"C:/Users/KGXB111/R-3.3.1/library/tximportData/extdata/sailfish/ERR188288/quant.sf"
sample5
"C:/Users/KGXB111/R-3.3.1/library/tximportData/extdata/sailfish/ERR188021/quant.sf"
sample6
"C:/Users/KGXB111/R-3.3.1/library/tximportData/extdata/sailfish/ERR188356/quant.sf" 

Any idea how to get to the problem?

Sue


So this is R telling you that the path is not correct:

> all(file.exists(files))
[1] FALSE

This means that the files do not exist at the location specified. You can run this to see if all or only a subset of the files do not exist at the path specified:

file.exists(files)
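
Going one step further, the check can be narrowed down to exactly which paths are missing, and whether their parent folders exist at all. The sketch below uses only base R and builds a small throwaway folder layout so that it runs on its own; the sample names `s1` to `s3` and the `Input_files_demo` folder are hypothetical stand-ins for your own `files` vector.

```r
# Throwaway demo layout: s1 and s2 get a quant.sf, s3 deliberately does not.
base_dir <- file.path(tempdir(), "Input_files_demo")
samples <- c("s1", "s2", "s3")
for (s in samples[1:2]) {
  dir.create(file.path(base_dir, s, "sailfish"),
             recursive = TRUE, showWarnings = FALSE)
  file.create(file.path(base_dir, s, "sailfish", "quant.sf"))
}

# Named vector of expected paths, built the same way as in the vignette.
files <- file.path(base_dir, samples, "sailfish", "quant.sf")
names(files) <- samples

# Which paths does R fail to find?
missing_files <- files[!file.exists(files)]
print(missing_files)

# For each missing file, does its parent folder exist?
# FALSE points to a wrong folder name rather than a wrong file name.
parent_ok <- setNames(dir.exists(dirname(missing_files)), names(missing_files))
print(parent_ok)

# What is actually present under the base folder?
print(list.files(base_dir))
```

With your own data you would skip the demo setup and run the last three steps on your existing `files` vector. Comparing the output of `list.files()` with the folder names in `files` often shows where the two differ.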