Hi,
I have a data set that have expression data from 7 patients covering different cell populations. This cell populations are marked in a way that certain cells of similar origin based on the location in the organ have been differently marked where similar cells give rise to a secondary tumor while its adjacent region does not. So they are marked as cells grt and cells dgrt. Now the idea is to find the differences in this 2 cell populations inducing the factors of number of patients , batches, so the structure matrix of a dataset which resembles as below.
I want to use the multifactorial design to find the differences in cell populations of grt vs dgrt using the batch, patients (ind) and condition. But I encounter error as shown below.
colData batch condition ind u.132p1 u grt b u.141p1 u grt c u.183p2 u grt d u.183p3 u grt d i.127P1 i grt f i.168P3 i grt g i.168P4 i grt g u.132p2 u dgrt b u.132p3 u dgrt b u.141p2 u dgrt c u.141p3 u dgrt c u.141p4 u dgrt c u.183p1 u dgrt d i.127P2 i dgrt f i.127P3 i dgrt f i.127P4 i dgrt f i.168P1 i dgrt g i.168P2 i dgrt g
Now so I have actually patient samples coming from 7 patients where the specific regions have been marked to give different cell types . There are different batches of facility that contributes to the patient samples, now this data primary tries to find differences between cells of similar origin where one seats for a (grt) secondary tumor and the other (dgrt) not.
The 2 batches are u and i for the facilities. Now Since samples belong to same patient I also wanted to factor in the patients as well to also capture if patient specific effects can also be seen or not. So in the design matrix at first I just put the batch and conditions (refering to the cell type giving or not giving tumor) but that was not able to find any differences with DESeq2 the code I used is below where the colData does not include the patients.
dds <- DESeqDataSetFromMatrix(countData = data.grt.dgrt,colData = colData,design = ~ batch + condition) dds ddds <- DESeq(dds) res <- results(ddds) res resOrdered<-res[order(res$padj),] head(resOrdered) thresh<-subset(resOrdered,padj<0.05)
These did not give me any differential expression which is likely not possible. I understand that the design matrix is not proper. So now I was trying to put even the patient specificity in the design matrix and then try to find the differences between grt vs dgrt reducing the batch effects. How will I be able to do this as if I try to create the colData with the patients, I am unable to create the object dds
dds <- DESeqDataSetFromMatrix(countData = data.grt.dgrt,colData = colData,design = ~ ind + condition +batch)
converting counts to integer mode
Error in checkFullRank(modelMatrix) :
the model matrix is not full rank, so the model cannot be fit as specified.
One or more variables or interaction terms in the design formula are linear
combinations of the others and must be removed.
Please read the vignette section 'Model matrix not full rank':
vignette('DESeq2')
My ultimate aim is to find the differences level of the same cell type where from same patient cells from a region does not give tumor while adjacent cells give tumor, so to make the design I should take into account the number of patients, the cell type and the batches since it involves sequencing done at different facility. How will I be able to achieve this here.
So you say that the batch effect will not be a factor in this case? How is the system able to understand that the structure has some patients sequenced in a new facility if I do not put the batch into design? if I try to make correlation heatmap of the samples based on rlog I see the samples are seperated as clusters of 2 which is batch specific and not cell type and so I was trying to put the batch as well as one factor in inside. So how will the system differentiate between 2 patients that have cells coming from similar patient of 2 different type but sequenced in different batches. Sorry I did not understand if I have to channge the structure or not in this case when you say "finer-grained". Thanks a lot for the response
You might benefit from speaking to a statistician, who can explain how controlling for individual here supersedes controlling for batch, and why batch can not be included in the design because it is confounded with individual.
Thanks a lot for the reply. Just to say that I tried to put the design with ~patient + condition but still not DEGs, seems unlikely. Is there anyway that I should make a contrast now in res() . Below is the code, is there anything wrong? But if it is correct then how could I improve it to fish out the differences, am sure am missing out something, as it is unlikely impossible to see no differences.
No DEGs with padj here. I know I can reduce the threshold but still if in case I am missing out something in the calling the DE analysis.
Can you post a PCA plot.
I don't know whether it's likely or not to find any DEG in your experiment. This depends on the variability, effect sizes and sample size. A good way to assess things is to look at a PCA plot (see vignette).
Yes indeed the PCA plot showed that there is very few differences between the the cells near the tumor and cells far away from the tumor since they were just being separated broadly by the effect of batch.
PCA plot of grt vs dgrt and colored by cell type accounting for a very low variability and the PC1 accounts for 27% while PC2 10% of the variability.
https://www.scribd.com/document/325601560/Pca-grt-dgrt-by-cellType
and as you can see the larger separation on 2 oppositve axis is largely dominated by batch as the sample separation here is due to batch where the upper left is from facility u while lower left is i. Yes I was actually getting ambitious to find the differences but may be the samples we sequenced had low variability to figure out this differences between cells of defined locations. I have to try out a different methodology to understand the differences. Thanks a lot for your guidance.