Question

Repeated tag sequence error


I am using edgeR for RNA-seq analysis, and I frequently get a "Repeated tag sequences" error. Kindly help me with this.



```
> x <- readDGE(files, columns=c(1,2))
Error in readDGE(files, columns = c(1, 2)) : 
  Repeated tag sequences in GSM4321761_WT_rep1.txt
```


```
> sessionInfo()
R version 4.1.1 (2021-08-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)
```


Written 2.6 years ago by N; updated 2.6 years ago by Gordon Smyth.


You apparently have duplicate entries in your input file. Did you see the post "edgeR import error"? It describes some ways to check for them.

Reply by Guido Hooiveld, 2.6 years ago.
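A minimal sketch of such a check, assuming each sample file is tab-delimited with a header row and the feature IDs in the first column (readDGE's default `columns=c(1,2)` takes the IDs from column 1). The helper name `count_duplicate_ids` and the two temporary files are hypothetical, made up for illustration; only base R is used.

```r
# Hypothetical helper: for each file, count the rows whose first-column ID
# repeats an ID already seen earlier in the same file.
count_duplicate_ids <- function(files) {
  vapply(files, function(f) {
    x <- read.delim(f, stringsAsFactors = FALSE)
    sum(duplicated(x[[1]]))
  }, integer(1))
}

# Two small temporary files standing in for real sample files
clean <- tempfile(fileext = ".txt")
dupes <- tempfile(fileext = ".txt")
writeLines(c("tracking_id\tcount", "geneA\t10", "geneB\t5"), clean)
writeLines(c("tracking_id\tcount", "geneA\t10", "geneB\t5", "geneA\t7"), dupes)

# Named integer vector: one duplicate count per file path
n_dup <- count_duplicate_ids(c(clean, dupes))
n_dup

# Files that readDGE() would be expected to reject
names(n_dup)[n_dup > 0]
```

On real data, `files` would be the same character vector passed to readDGE(); any file with a non-zero count is one that readDGE() is expected to refuse.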


Yes sir, I have seen that post. I am using R for the first time, and I am not able to debug the error with the code from that post, `x <- read.delim("tmp1.txt", stringsAsFactors = FALSE); any(duplicated(x[,1]))`. Kindly give me some suggestions. Here is what I ran:

```
> x <- readDGE(files, stringAsfactors=FALSE)
Error in read.table(file = file, header = header, sep = sep, quote = quote,  : 
  unused argument (stringAsfactors = FALSE)
```
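The "unused argument" error above is a spelling problem, separate from the duplicate issue: R argument names are case-sensitive, and the read.table() argument is `stringsAsFactors`. As the error message shows, the misspelled name was passed on to read.table(), which has no argument by that name. The base-R sketch below reproduces both cases on a small temporary file whose contents are illustrative only. Correcting the spelling would not make the repeated-tag error go away, because the duplicates are still in the file.

```r
# Write a tiny tab-delimited file to read back (illustrative data only)
tmp <- tempfile(fileext = ".txt")
writeLines(c("tracking_id\tFPKM", "geneA\t1.5", "geneB\t2.0"), tmp)

# Correct spelling: read.delim() forwards stringsAsFactors to read.table()
ok <- read.delim(tmp, stringsAsFactors = FALSE)
str(ok)

# Misspelled name: read.table() has no such argument, so R stops with
# "unused argument (stringAsfactors = FALSE)"; capture the message instead
msg <- tryCatch(
  read.delim(tmp, stringAsfactors = FALSE),
  error = function(e) conditionMessage(e)
)
print(msg)
```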

```
> x <- read.delim("GSM4321761_WT_rep1.txt", stringsAsFactors = FALSE)
> any(duplicated(x[,1]))
[1] TRUE
> x[duplicated(x[,1]),]
         tracking_id     FPKM
11616 LOC_Os02g55670  7.75375
28033 LOC_Os06g07923 12.15650
45853 LOC_Os10g22310  0.00000
46591 LOC_Os10g31460 15.15640
46593 LOC_Os10g31460 54.53610
> x <- readDGE(files)
Error in readDGE(files) : Repeated tag sequences in GSM4321761_WT_rep1.txt
```

Reply by N, 2.6 years ago.
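Note that duplicated() flags only the second and later occurrences of a value, so the listing above shows LOC_Os10g31460 twice, which means that ID occurs at least three times in the file. A minimal base-R sketch for listing every copy of each repeated ID, using a toy data frame with the same column names as the GEO file (the IDs and values are made up):

```r
# Toy table with the same column names as the GEO file (values are made up)
x <- data.frame(
  tracking_id = c("g1", "g2", "g1", "g3", "g3", "g3"),
  FPKM        = c(1.5, 2.0, 3.1, 0.0, 4.2, 5.5),
  stringsAsFactors = FALSE
)
ids <- x$tracking_id

# duplicated() alone marks only the later copies; OR-ing it with the
# fromLast = TRUE version also marks the first copy of each repeated ID
all_copies <- duplicated(ids) | duplicated(ids, fromLast = TRUE)
x[all_copies, ]

# Number of rows for each ID that occurs more than once
id_counts <- table(ids)
id_counts[id_counts > 1]
```

Replacing the toy `x` with the data frame returned by read.delim() would show every row that shares a tracking_id, not just the later copies.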

score 2 · Answer 1 · 2021-09-07

The problem is not with the R code but with the data you are trying to read.

I am going to guess that you are trying to read files from the GEO series GSE145579: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE145579

The data provided by this GEO series contains two fatal errors:

First, there are multiple rows that repeat the feature name (tracking_id) of an earlier row, which is nonsense and makes it impossible to make sense of the data. This is the problem that the edgeR readDGE function has correctly diagnosed.
Second, the data provide FPKM values rather than read counts, and FPKM values cannot be analysed using edgeR. Unfortunately, FPKM values are unsuitable for downstream differential expression analyses.
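A rough way to tell raw counts from normalized values is that read counts are non-negative whole numbers, whereas FPKM values usually are not. The sketch below is a simple heuristic, not an edgeR function, and the name `looks_like_counts` is hypothetical: a FALSE result strongly suggests a column is not raw counts, but TRUE does not prove that it is. The FPKM values are the ones listed earlier in this thread; the count vector is made up.

```r
# Hypothetical heuristic: raw read counts are non-negative whole numbers.
looks_like_counts <- function(v) {
  is.numeric(v) && !anyNA(v) && all(v >= 0) && all(v == round(v))
}

fpkm   <- c(7.75375, 12.15650, 0.00000, 15.15640, 54.53610)  # FPKM values from the thread
counts <- c(120L, 0L, 35L, 4L)                              # made-up read counts

looks_like_counts(fpkm)    # FALSE: the values have fractional parts
looks_like_counts(counts)  # TRUE
```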

I suggest that you write to the authors of the GEO study and ask them to provide the data in a more useful form, that is, as read counts with one row per feature.