Hello, I am trying to read multiple .seq files produced by sequencing, so that I can then find ORFs, translate them, and align them. I don't know how to read a .seq file. I tried read.table(), but it imports the file as multiple rows, even though the entire file contains a single sequence. Thanks for your help.
I already tried to import it as FASTA, but it gives me this error:
Error in .Call2("fasta_index", filexp_list, nrec, skip, seek.first.rec, : reading FASTA file Luis1_001_E06.seq: ">" expected at beginning of line 1
The file contains only the bases, as follows:
    GTCCCCGCCGAAATTATTACGACTCACTATAGGGGATTGTGAGCGGATAACAATTCCCCTCTAGAAATAATTTTGTTTAA
    CTTTAAGAAGGAGATATACCATGGGCAGCAGCCATCACCATCATCACCACAGCCAGGATCCAATCGTGCGCCGCAAGCTG
I could manually add a ">NAME" header line, but I have many files and doing that by hand would take a long time. I don't know how to do it for all of them at once.
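For the many-files case, one option is to skip the header editing entirely and build the named sequences in R. The following is only a minimal sketch: `read_seq_file()` and `read_seq_dir()` are hypothetical helper names, the output file name `all_sequences.fasta` is made up for illustration, and it assumes each .seq file holds exactly one sequence, possibly wrapped over several lines.

```r
# Hypothetical helper: read one bare-sequence file into a single string.
read_seq_file <- function(path) {
  lines <- readLines(path, warn = FALSE)
  # Remove any whitespace inside the lines, then join all lines together
  paste(gsub("\\s+", "", lines), collapse = "")
}

# Hypothetical helper: read every .seq file in a directory and
# name each sequence after its file (without the .seq extension).
read_seq_dir <- function(dir = ".") {
  files <- list.files(dir, pattern = "\\.seq$", full.names = TRUE)
  seqs <- vapply(files, read_seq_file, character(1))
  names(seqs) <- sub("\\.seq$", "", basename(files))
  seqs
}

seqs <- read_seq_dir(".")

# If Biostrings is installed, the named character vector converts directly,
# and the names become the FASTA headers when the set is written out.
if (requireNamespace("Biostrings", quietly = TRUE) && length(seqs) > 0) {
  dna <- Biostrings::DNAStringSet(seqs)
  Biostrings::writeXStringSet(dna, "all_sequences.fasta")  # assumed output name
}
```

Because the names travel with the `DNAStringSet`, there should be no need to add ">NAME" lines to the original files by hand.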
So here's a contrived example
Hello, thank you very much, I did not know the scan() function. Before your answer I had found another way to import the file:
    F1 <- readLines("file.seq")
    F1 <- paste(F1, collapse = "")
    F1 <- DNAStringSet(F1)
    F1
The result was this:
[1] 1267 GTCCCCGCCGAAATTATTACGACTCACTATAGGGGAT...GCGCTTCTTTCTTGTGGTATAGGCATAATGATGATGC
With the code you shared, I got this result:
DNAStringSet object of length 16:
       width seq
   [1]    80 GTCCCCGCCGAAATTATTACGACTCACTATAGGGGAT...GGATAACAATTCCCCTCTAGAAATAATTTTGTTTAA
   [2]    80 CTTTAAGAAGGAGATATACCATGGGCAGCAGCCATCA...CACCACAGCCAGGATCCAATCGTGCGCCGCAAGCTG
   [3]    80 ACCGGTTATGTTGGCTTCGCTAACCTGCCGAACCAGT...CAAATCCGTGCGCAAAGGTTTCAATTTCAACGTCAT
   ...   ... ...
  [16]    67 TGGCGAGGGCGGGAATTAAACAATAGTTTTGCGCTTCTTTCTTGTGGTATAGGCATAATGATGATGC
So I think it reads each line as a separate sequence, presumably because the file contains line breaks. I then modified the code you shared like this:
    z <- scan("notFASTA.seq", "c")   # one element per whitespace-separated chunk (here, per line)
    zz <- paste(z, collapse = "")    # join the chunks into a single string
    zz <- DNAStringSet(zz)           # one sequence instead of 16
    zz
Thank you very much for your help, a big hug!
*Excuse me, one more question: what is the difference between scan() and readLines()?
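As a rough illustration of that difference: readLines() returns one string per line of the file, while scan() with a character `what` splits on any whitespace by default, so it returns one string per token. For a file whose lines contain no spaces, the two give the same pieces. The small sketch below uses a made-up temporary file, not the real data, to show where they diverge.

```r
# Write a small throwaway file: two lines, the first one containing a space
tmp <- tempfile(fileext = ".seq")
writeLines(c("ACGT TTGA", "CCGG"), tmp)

by_line  <- readLines(tmp)                               # one element per line
by_token <- scan(tmp, what = "character", quiet = TRUE)  # one element per whitespace-separated token

length(by_line)   # 2
length(by_token)  # 3

paste(by_line, collapse = "")   # "ACGT TTGACCGG" (the space inside the line survives)
paste(by_token, collapse = "")  # "ACGTTTGACCGG"
```

So if a file might contain spaces or tabs within a line, the scan() route strips them as part of reading, whereas with readLines() they would have to be removed separately before calling DNAStringSet().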