tximport samples.txt file mandatory vs. optional colnames?
AKSR ▴ 20
@aksr-5026
Last seen 4.4 years ago

Greetings!

I am looking for help with formatting the samples.txt file used with tximport. But first, some context:

CONTEXT: I have Salmon-generated quantification data as ~150 quant.sf files. No issue there. Now I need to read these in and pass them to the next step, DESeq2, and I am trying to use tximport for that.

OBSERVATION: The tximportData vignette (at this link) shows the following:

> colnames(samples)
[1] "pop"        "center"     "assay"      "sample"     "experiment"
[6] "run"

and

> colnames(samples.ext)
 [1] "Source.Name"                            
 [2] "Comment.ENA_SAMPLE."                    
 [3] "Characteristics.Organism."              
 [4] "Term.Source.REF"                        
 [5] "Term.Accession.Number"                  
 [6] "Characteristics.Strain."                
 [7] "Characteristics.population."            
 [8] "Comment.1000g.Phase1.Genotypes."        
 [9] "Protocol.REF"                           
[10] "Protocol.REF.1"                         
[11] "Extract.Name"                           
[12] "Comment.LIBRARY_SELECTION."             
[13] "Comment.LIBRARY_SOURCE."                
[14] "Comment.SEQUENCE_LENGTH."               
[15] "Comment.LIBRARY_STRATEGY."              
[16] "Comment.LIBRARY_LAYOUT."                
[17] "Comment.NOMINAL_LENGTH."                
[18] "Comment.NOMINAL_SDEV."                  
[19] "Protocol.REF.2"                         
[20] "Performer"                              
[21] "Assay.Name"                             
[22] "Technology.Type"                        
[23] "Comment.ENA_EXPERIMENT."                
[24] "Comment.READ_INDEX_1_BASE_COORD."       
[25] "Protocol.REF.3"                         
[26] "Scan.Name"                              
[27] "Comment.SUBMITTED_FILE_NAME."           
[28] "Comment.ENA_RUN."                       
[29] "Comment.FASTQ_URI."                     
[30] "Protocol.REF.4"                         
[31] "Derived.Array.Data.File"                
[32] "Comment..Derived.ArrayExpress.FTP.file."
[33] "Factor.Value.population."               
[34] "Factor.Value.laboratory."               
[35] "date"

QUESTIONS:

  • What is the minimum set of columns, and which header names, are acceptable at this step?
  • Do header names have to be chosen from a list of colnames that tximport accepts, or not?
  • If I were concerned about batch effects, what additional column(s) would I need to add as factors, apart from date and lab?
  • Where can I find examples of both minimal and highly detailed samples.txt files, and perhaps one or two in between?

Thanks in advance!

tximport salmon deseq2 • 1.7k views
@mikelove
Last seen 1 day ago
United States

I think the answer is in ?tximport: the only thing you are required to supply is a vector of files, i.e. the paths to the quant.sf files on your machine (it is best to leave these in their original directories). The rest of the metadata that you include in downstream analysis (which could involve any number of Bioconductor packages) is up to you.
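
To make that concrete, here is a minimal sketch rather than a definitive recipe. The directory name salmon_quants, the sample IDs, and the condition and batch columns are all hypothetical; the column names are arbitrary because tximport never reads the sample table itself. The sketch uses txOut = TRUE to stay at transcript level; gene-level summarization would additionally need a tx2gene table, which is not shown here.

```r
# Hypothetical layout: one Salmon output directory per sample under quant_dir.
quant_dir <- "salmon_quants"  # assumed directory name, adjust to your setup

# Sample table: column names are free-form and chosen by you (all hypothetical).
samples <- data.frame(
  sample    = c("s1", "s2", "s3", "s4"),
  condition = factor(c("ctrl", "ctrl", "trt", "trt")),
  batch     = factor(c("b1", "b2", "b1", "b2")),
  stringsAsFactors = FALSE
)
rownames(samples) <- samples$sample

# The only input tximport needs: a vector of paths, named here by sample ID.
files <- file.path(quant_dir, samples$sample, "quant.sf")
names(files) <- samples$sample

# Run the import only if the packages and the files are actually present.
if (requireNamespace("tximport", quietly = TRUE) && all(file.exists(files))) {
  # txOut = TRUE keeps transcript-level estimates, so no tx2gene is required.
  txi <- tximport::tximport(files, type = "salmon", txOut = TRUE)

  if (requireNamespace("DESeq2", quietly = TRUE)) {
    # The metadata columns only matter here, in the downstream design formula.
    dds <- DESeq2::DESeqDataSetFromTximport(
      txi, colData = samples, design = ~ batch + condition
    )
  }
}
```

Any extra columns (date, lab, library prep, and so on) can simply be added to the sample table and referenced in the design formula if you want to model them.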
