Greetings!
I am looking for help with formatting the samples.txt file used with tximport. But first, some context:
CONTEXT: I have Salmon-generated quantification data as roughly 150 quant.sf files, and there are no issues there. I now need to read these in so I can feed them to the downstream DESeq2 step, and I am trying to use tximport for that.
OBSERVATION: The tximportData vignette at this link shows the following:
> colnames(samples)
[1] "pop" "center" "assay" "sample" "experiment"
[6] "run"
and
> colnames(samples.ext)
[1] "Source.Name"
[2] "Comment.ENA_SAMPLE."
[3] "Characteristics.Organism."
[4] "Term.Source.REF"
[5] "Term.Accession.Number"
[6] "Characteristics.Strain."
[7] "Characteristics.population."
[8] "Comment.1000g.Phase1.Genotypes."
[9] "Protocol.REF"
[10] "Protocol.REF.1"
[11] "Extract.Name"
[12] "Comment.LIBRARY_SELECTION."
[13] "Comment.LIBRARY_SOURCE."
[14] "Comment.SEQUENCE_LENGTH."
[15] "Comment.LIBRARY_STRATEGY."
[16] "Comment.LIBRARY_LAYOUT."
[17] "Comment.NOMINAL_LENGTH."
[18] "Comment.NOMINAL_SDEV."
[19] "Protocol.REF.2"
[20] "Performer"
[21] "Assay.Name"
[22] "Technology.Type"
[23] "Comment.ENA_EXPERIMENT."
[24] "Comment.READ_INDEX_1_BASE_COORD."
[25] "Protocol.REF.3"
[26] "Scan.Name"
[27] "Comment.SUBMITTED_FILE_NAME."
[28] "Comment.ENA_RUN."
[29] "Comment.FASTQ_URI."
[30] "Protocol.REF.4"
[31] "Derived.Array.Data.File"
[32] "Comment..Derived.ArrayExpress.FTP.file."
[33] "Factor.Value.population."
[34] "Factor.Value.laboratory."
[35] "date"
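To make the question concrete, here is a minimal sketch of how I currently understand the sample table to be used. As far as I can tell from the vignette, tximport itself only takes a named vector of file paths, and the table just helps build that vector. The directory name salmon_out, the column names sample and condition, and all values are hypothetical; tx2gene is a transcript-to-gene mapping I have not built here.

```r
# Hypothetical layout: one Salmon output directory per sample under quant_dir.
quant_dir <- "salmon_out"  # assumed directory name, not from the vignette

# Smallest table I can imagine: a sample ID plus one variable of interest.
samples <- data.frame(
  sample    = c("S1", "S2", "S3", "S4"),
  condition = c("ctrl", "ctrl", "treated", "treated"),
  stringsAsFactors = FALSE
)

# Round-trip through a tab-delimited samples.txt, as it would exist on disk.
samples_file <- file.path(tempdir(), "samples.txt")
write.table(samples, samples_file, sep = "\t", quote = FALSE, row.names = FALSE)
samples <- read.table(samples_file, header = TRUE, sep = "\t",
                      stringsAsFactors = FALSE)

# Build a named vector of quant.sf paths, one per row of the table.
files <- file.path(quant_dir, samples$sample, "quant.sf")
names(files) <- samples$sample

# Not run here: needs the tximport package, real quant.sf files and a tx2gene table.
# library(tximport)
# txi <- tximport(files, type = "salmon", tx2gene = tx2gene)
```

If this reading is right, the header names matter only to my own code (here samples$sample), but I would like confirmation.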
QUESTIONS:
- What is the minimum set of columns, and what header names, are acceptable at this step?
- Do the header names have to be chosen from a list of column names that tximport accepts, or not?
- If I were concerned about batch effects, what additional column(s) would I need to add as factors, apart from date and lab?
- Where can I find examples of both minimal and highly detailed samples.txt files, and perhaps one or two in between?
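To show what I mean by adding batch columns as factors, here is a sketch with made-up values. The column names date and lab come from the vignette output above; sample, condition and every value are hypothetical. My understanding, which may be wrong, is that these columns would enter the DESeq2 design formula rather than anything in tximport, so the sketch only checks that the resulting design matrix is full rank.

```r
# Hypothetical sample table with two candidate batch columns (date, lab).
samples <- data.frame(
  sample    = paste0("S", 1:8),
  condition = rep(c("ctrl", "treated"), times = 4),
  date      = rep(c("2020-01-10", "2020-02-14"), each = 4),
  lab       = rep(c("labA", "labB"), each = 2, times = 2),
  stringsAsFactors = FALSE
)

# Treat the batch variables and the condition as factors.
samples$condition <- factor(samples$condition)
samples$date      <- factor(samples$date)
samples$lab       <- factor(samples$lab)

# Design matrix for batch terms plus the condition of interest.
design <- model.matrix(~ date + lab + condition, data = samples)

# A rank below the number of columns would mean batch and condition are confounded.
full_rank <- qr(design)$rank == ncol(design)

# Not run here: needs DESeq2 and a txi object returned by tximport.
# library(DESeq2)
# dds <- DESeqDataSetFromTximport(txi, colData = samples,
#                                 design = ~ date + lab + condition)
```

With real data I would expect the rank check to fail if, say, every sample from one lab also had the same condition.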
Thanks in advance!