
Using Sex to Identify Mislabeled Samples

Study Results

In 2015, Miriam Lohr and her group at Dortmund University in Germany set out to quantify how often samples are mislabeled in 45 publicly available transcriptomic datasets derived from cancer patients. They did this using sex-specific identifiers: genes expressed from either the X or the Y chromosome. By analyzing the expression patterns of these genes, they inferred whether each sample came from a male or a female patient, then compared that result against the patient's recorded sex. Of the 4913 patients evaluated, 1.1% were “misclassified” and 3.0% were “unconfident,” meaning that the sex could not be confirmed from the transcriptomic data. At least one “misclassified” sample was detected in 18 of the 45 datasets (40%). To demonstrate the effect mislabeled samples can have on study results, Lohr et al. assessed which genes had prognostic value in the cohorts. They found that after incorporating mislabeling errors, 12% to 53% of the genes significantly associated with patient survival were no longer significant, while another 9% to 39% of genes appeared as newly significant.¹
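
To make the idea concrete, here is a minimal sketch of a sex check based on one X-linked and one Y-linked marker. It is not the method used by Lohr et al.; the choice of a single marker pair, the `margin` threshold, the function names, and the sample values are all assumptions made for illustration.

```python
from typing import Optional

def infer_sex(x_expr: float, y_expr: float, margin: float = 1.0) -> Optional[str]:
    """Infer sex from log2 expression of two marker genes.

    x_expr: expression of a female-biased X-linked marker (assumed, e.g. XIST)
    y_expr: expression of a Y-linked marker (assumed, e.g. RPS4Y1)
    margin: hypothetical minimum difference needed to make a call
    Returns "M", "F", or None when the call is not confident.
    """
    diff = y_expr - x_expr
    if diff > margin:
        return "M"
    if diff < -margin:
        return "F"
    return None

def check_label(recorded: str, x_expr: float, y_expr: float, margin: float = 1.0) -> str:
    """Compare the inferred sex with the recorded sex of the patient."""
    inferred = infer_sex(x_expr, y_expr, margin)
    if inferred is None:
        return "unconfident"
    return "match" if inferred == recorded else "misclassified"

# Invented example data: (sample ID, recorded sex, X marker, Y marker)
samples = [
    ("S1", "F", 9.2, 3.1),
    ("S2", "M", 3.0, 8.7),
    ("S3", "M", 9.5, 2.8),  # recorded as male, female expression pattern
    ("S4", "F", 6.0, 6.4),  # markers too close to call
]

results = {sid: check_label(sex, x, y) for sid, sex, x, y in samples}
for sid, status in results.items():
    print(sid, status)
```

A real analysis would likely combine several marker genes and choose thresholds from the distribution of expression values in each dataset, rather than relying on a fixed margin.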

A similar study was performed in 2016 by Lilah Toker and colleagues. They applied the same approach, using sex-specific genes to identify mislabeled samples in 70 transcriptomic datasets, which included both cancer-related and non-cancer–related studies. Their results supported Lohr’s findings: they discovered mislabeled samples in 46% of the datasets analyzed, with an average mismatch rate of 2%. Although the source of error was usually difficult to determine, the most common cause appeared to be samples that had been physically mixed up, rather than the participants’ sex being recorded incorrectly.²
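
The two summary figures quoted in these studies (the share of datasets with at least one mislabeled sample, and the average mismatch rate) can be computed from per-sample results as in the sketch below. The function name and the toy data are hypothetical and are not taken from either paper.

```python
from typing import Dict, List, Tuple

def summarize(datasets: Dict[str, List[str]]) -> Tuple[float, float]:
    """Summarize per-sample statuses across datasets.

    datasets maps a dataset name to a list of statuses such as
    "match", "misclassified", or "unconfident".
    Returns (fraction of datasets with at least one misclassified sample,
             mean of the per-dataset misclassification rates).
    """
    rates = [
        statuses.count("misclassified") / len(statuses)
        for statuses in datasets.values()
    ]
    affected = sum(1 for r in rates if r > 0) / len(rates)
    mean_rate = sum(rates) / len(rates)
    return affected, mean_rate

# Invented toy data, for illustration only
toy = {
    "A": ["match"] * 9 + ["misclassified"],
    "B": ["match"] * 10,
    "C": ["match"] * 19 + ["misclassified"],
    "D": ["match"] * 9 + ["unconfident"],
}

affected, mean_rate = summarize(toy)
print(f"datasets affected: {affected:.0%}, mean mismatch rate: {mean_rate:.2%}")
```

Note that the mean of per-dataset rates weights every dataset equally regardless of size, so it can differ from the pooled rate across all samples.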

The main point of both studies was to show how pervasive mislabeling can be in transcriptomic datasets. Mislabeled samples are a serious concern, as they can lead any research group that uses the data to erroneous conclusions. The authors also note that while sex-specific identifiers can be used to flag and correct mismatches, samples swapped between patients of the same sex can’t be identified with these methods, so the true error rate is likely higher than what was reported. Altogether, the importance of appropriate labeling can’t be overstated. Every precaution should be taken to ensure that samples are labeled correctly, including making sure your labels are suited to the environment in which they will be used. Barcoded labels, radio-frequency identification (RFID) labels, and laboratory information management systems (LIMS) can also help reduce errors when processing the high-throughput data generated by transcriptomic analyses.


  1. Lohr M, Hellwig B, Edlund K, et al. Identification of sample annotation errors in gene expression datasets. Arch Toxicol. 2015;89:2265-2272.
  2. Toker L, Feng M, Pavlidis P. Whose sample is it anyway? Widespread misannotation of samples in transcriptomics studies. F1000Research. 2016;5:1-13.


Alexander Goldberg, Ph.D.
I'm a scientific writer and social media manager for GA International. I have a Ph.D. in biology and previously worked as a post-doc in toxicology and medicine, having studied chronological lifespan in yeast, anti-neoplastic small molecules, and the genetics of tuberous sclerosis complex.

