Repeated tag sequence error
N • 0
@e49ee5f8
Last seen 2.6 years ago
India

I am using edgeR for RNA-seq analysis and frequently get a "Repeated tag sequences" error. Kindly help me in this regard.

>     x <- readDGE(files, columns=c(1,2))

Error in readDGE(files, columns = c(1, 2)) : 
  Repeated tag sequences inGSM4321761_WT_rep1.txt

sessionInfo( )
R version 4.1.1 (2021-08-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)
edgeR • 1.0k views

You apparently have duplicates in your input file. Did you see the post "edgeR import error"? It provides some ways to check this.
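As a rough illustration of that kind of check, here is a minimal sketch. It assumes a tab-delimited file with the feature IDs in the first column; the temporary file and the gene names `geneA` and `geneB` are made up for the example and are not taken from the real data.

```r
# Hypothetical stand-in for one sample file: tab-delimited, IDs in column 1.
tmp <- tempfile(fileext = ".txt")
writeLines(c("tracking_id\tFPKM",
             "geneA\t1.5",
             "geneB\t0.0",
             "geneA\t3.2"), tmp)

x <- read.delim(tmp, stringsAsFactors = FALSE)

# TRUE if any ID in the first column occurs more than once
has_dups <- any(duplicated(x[, 1]))

# The repeated IDs, and every row that carries one of them
# (duplicated() alone flags only the second and later occurrences)
dup_ids  <- unique(x[duplicated(x[, 1]), 1])
dup_rows <- x[x[, 1] %in% dup_ids, ]

print(has_dups)
print(dup_rows)
```

Listing all rows for each repeated ID, rather than only the later occurrences, makes it easier to see whether the repeats carry identical or conflicting values.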


Yes Sir, I have seen that post. I am using R for the first time, and I am not able to debug the error with this code:

x <- read.delim("tmp1.txt", stringsAsFactors = FALSE)
any(duplicated(x[,1]))

Kindly provide me with suggestions.

x <- readDGE(files, stringAsfactors=FALSE)

Error in read.table(file = file, header = header, sep = sep, quote = quote, : unused argument (stringAsfactors = FALSE)

> x <- read.delim("GSM4321761_WT_rep1.txt", stringsAsFactors = FALSE)
> any(duplicated(x[,1]))
[1] TRUE

> x[duplicated(x[,1]),]
         tracking_id     FPKM
11616 LOC_Os02g55670  7.75375
28033 LOC_Os06g07923 12.15650
45853 LOC_Os10g22310  0.00000
46591 LOC_Os10g31460 15.15640
46593 LOC_Os10g31460 54.53610

> x <- readDGE(files)
Error in readDGE(files) : Repeated tag sequences inGSM4321761_WT_rep1.txt

@gordon-smyth
Last seen 21 minutes ago
WEHI, Melbourne, Australia

The problem is not with the R code but with the data you are trying to read.

I am going to guess that you are trying to read files from the GEO series GSE145579: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE145579

The data provided by this GEO series contains two fatal errors:

  • First, several rows reuse a feature name (tracking_id) that already appeared in an earlier row, which is nonsense and prevents one from making sense of the data. This is the problem that the edgeR readDGE function has correctly diagnosed.
  • Second, the data provides FPKM values rather than read counts, and FPKM values cannot be analysed using edgeR. They are unfortunately unsuitable for downstream differential expression analyses.
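A quick way to see the second problem for yourself is to test whether a column holds whole numbers, as read counts would. The sketch below is only a heuristic under that assumption: the helper `looks_like_counts` is a name invented for this example, and passing the test does not prove that values are raw counts. The FPKM values are the ones printed earlier in this thread.

```r
# FPKM values copied from the duplicated rows shown earlier in the thread
fpkm <- c(7.75375, 12.15650, 0.00000, 15.15640, 54.53610)

# Hypothetical helper: TRUE only if all values are non-missing,
# non-negative whole numbers. Necessary for counts, not sufficient.
looks_like_counts <- function(v) {
  all(!is.na(v)) && all(v >= 0) && all(v == round(v))
}

fpkm_ok  <- looks_like_counts(fpkm)           # FALSE: fractional values
count_ok <- looks_like_counts(c(0, 12, 340))  # TRUE: whole numbers

print(fpkm_ok)
print(count_ok)
```

In this dataset the column header `FPKM` already gives the answer, so the check is mainly useful for files whose headers are less explicit.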

I suggest that you write to the authors of the GEO study and ask them to provide you with data in a more useful form.
