Question: Duplication of same sample as if two different samples
11 weeks ago by
A10 wrote:

Hi all, 

I really need help with a problem I have come across since I received another batch of RNA-seq data, which I have combined with the first batch. Both batches contain the same organs but different biological replicates: for example, replicates 1 and 2 of the lungs in batch 1, and the third and fourth replicates of the lungs in batch 2.

With the metadata, replicate info and the column order of the counts table aligning with the row order of the metadata sheet, all samples model well with DESeq2 with regard to replicates etc., apart from one organ: the small intestine. The small intestine is recognised as two separate samples and not as replicates (or the same organ) across different ages. The small intestine samples are separated according to batch, so that in a PCA plot, for example, there are separately coloured dots for one set of small intestine samples and another, as if they were two different organs.


This is not happening with the other organs and all organs are recognised correctly as one sample in terms of one organ within which there are different replicates for different ages. 


Is this a known issue/bug? Could this result from mistakes in the metadata sheet? I have checked this over and unless I am missing something really obvious, I cannot see any inconsistencies in the metadata table... Any help would be greatly appreciated.. 


I am also happy to provide code although I don't know what to add as the steps are a standard DESeq2 pipeline!


Many thanks!

ADD COMMENTlink modified 11 weeks ago by Michael Love20k • written 11 weeks ago by A10
11 weeks ago by
Michael Love20k wrote:

I don’t follow what the perceived problem is. Is the problem that the points are separated in the PCA plot?

ADD COMMENTlink written 11 weeks ago by Michael Love20k

Hi Michael, 


Thank you for the quick response. 


No, the points are not separated; they cluster together. The problem is that there are two colours assigned for organ, as if they were two separate samples even though they are the same. I have 24 samples in total for the small intestine, and they are split into two separate groups: about 20 named small intestine and another 5 samples also called small intestine but treated as a separate sample.


I am really sorry if this is unclear. I am not sure how to share the PCA plot through a link, as I don't know where to upload it.

ADD REPLYlink written 11 weeks ago by A10

Check table(dds$organ) and make sure that there isn’t a typo in the levels. Recent releases of DESeq2 check that there aren’t spaces or stray punctuation (typos) potentially affecting factor levels, but you may have an older version of DESeq2.

ADD REPLYlink written 11 weeks ago by Michael Love20k

To be more clear, R won’t tolerate any changes in the exact characters. It doesn’t do any kind of fuzzy clumping of characters into levels. “small intestine” is different than “small intestine ” is different than “small.intestine” etc
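
A minimal sketch of how such a mismatch can be made visible, using base R only. The `coldata` data frame, its column names, and its values are hypothetical stand-ins for the real metadata sheet; with a DESeq2 object the same checks would be run on `dds$organ`.

```r
# Hypothetical sample metadata; one label carries a trailing space.
coldata <- data.frame(
  sample = paste0("s", 1:4),
  organ  = c("small intestine", "small intestine ", "lung", "lung"),
  stringsAsFactors = FALSE
)

# table() counts each distinct string, so the padded label gets its own entry.
organ_counts <- table(coldata$organ)
print(organ_counts)

# Wrapping each label in quotes makes leading/trailing whitespace visible,
# and nchar() shows that the two labels differ in length.
print(encodeString(unique(coldata$organ), quote = '"'))
print(nchar(unique(coldata$organ)))

# Flag the rows whose label changes when surrounding whitespace is trimmed.
has_padding <- coldata$organ != trimws(coldata$organ)
print(coldata[has_padding, ])
```

Printed tables and plot legends show the two labels identically, which is why quoting the strings or comparing their lengths is more reliable than inspecting them by eye.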

ADD REPLYlink modified 11 weeks ago • written 11 weeks ago by Michael Love20k

Thank you Michael, 


I get the following result:





I can only imagine there may be an alteration in the apostrophe, although there is none entered in the metadata sheet... 

ADD REPLYlink modified 11 weeks ago • written 11 weeks ago by A10

There is a small difference somewhere, which you can’t see by eye... R doesn’t make mistakes in comparisons. Just recode from scratch.

Like I said earlier, not sure what version of DESeq2 you are using, but if you used the ones from the past few years, they check if extra spaces are present in the coding of variables and warn the user.
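
One possible way to recode, sketched under the assumption that the only difference is leading or trailing whitespace. The `coldata` data frame below is hypothetical; the cleaning step is plain base R and is not a DESeq2 feature.

```r
# Hypothetical metadata with stray spaces around some labels.
coldata <- data.frame(
  organ = c("small intestine", "small intestine ", " lung", "lung"),
  stringsAsFactors = FALSE
)

# Trim surrounding whitespace, then rebuild the factor so that the
# levels are recomputed from the cleaned strings.
coldata$organ <- factor(trimws(coldata$organ))
print(levels(coldata$organ))

# With an existing DESeqDataSet (assumed here to be called `dds`), the same
# recoding could be applied to the column directly:
# dds$organ <- factor(trimws(as.character(dds$organ)))
```

By default `trimws()` strips only spaces, tabs, and line breaks, so other invisible characters (a non-breaking space pasted in from a spreadsheet, for example) would survive it. Retyping the affected cells in the metadata sheet, as suggested above, covers those cases as well.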

ADD REPLYlink modified 11 weeks ago • written 11 weeks ago by Michael Love20k

Thanks Michael, I will edit from scratch as you suggest and update DESeq2!


Also, I just realised that 'small intestine' is different to 'small intestine ' with a space in the line! Will edit this all and report back if there is a fix for anyone else who might have this problem!


Many thanks

ADD REPLYlink written 11 weeks ago by A10

Solved! Indeed an invisible space in some of the samples! Thanks again!

ADD REPLYlink written 11 weeks ago by A10