Using Sex to Identify Mislabeled Samples

Study Results

In 2015, Miriam Lohr and her group at Dortmund University in Germany set out to quantify how often samples are mislabeled in 45 publicly available transcriptomic datasets obtained from cancer patients. They did this using sex-specific marker genes, that is, genes on the X or Y chromosome whose expression differs between males and females. From the expression of these genes they predicted whether each sample came from a male or a female patient, then compared that prediction with the sex recorded in the dataset's annotation. Of the 4913 patients evaluated, 1.1% were "misclassified" (the predicted sex contradicted the annotation) and 3.0% were "unconfident," meaning that the sex could not be confirmed from the transcriptomic data. At least one misclassified sample was found in 18 of the 45 datasets (40%). To show the effect that mislabeled samples could have on actual study results, Lohr et al. assessed which genes had prognostic value in these cohorts. When mislabeling errors were incorporated into the data, 12% to 53% of the genes significantly associated with patient survival were no longer significant, while another 9% to 39% of genes became newly significant.1
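
The core check described above (predict sex from marker expression, then compare with the annotation) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method Lohr et al. actually used: the choice of XIST (expressed from the X chromosome in females) and RPS4Y1 (a Y-linked gene) as markers, the cutoff values, and the names `Sample`, `predict_sex`, and `check_sample` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical cutoffs on log2-scale expression values. Real analyses derive
# thresholds from the data (e.g. from the bimodal distribution of each marker).
XIST_CUTOFF = 6.0
Y_GENE_CUTOFF = 6.0


@dataclass
class Sample:
    sample_id: str
    annotated_sex: str  # "F" or "M", as recorded in the dataset annotation
    xist: float         # expression of XIST (expressed in female cells)
    rps4y1: float       # expression of RPS4Y1 (Y-linked, expressed in male cells)


def predict_sex(s: Sample) -> Optional[str]:
    """Predict sex from the two markers; None if they do not support one call."""
    female_signal = s.xist >= XIST_CUTOFF
    male_signal = s.rps4y1 >= Y_GENE_CUTOFF
    if female_signal and not male_signal:
        return "F"
    if male_signal and not female_signal:
        return "M"
    return None  # both markers high or both low: no confident call


def check_sample(s: Sample) -> str:
    """Compare the predicted sex with the annotated sex."""
    predicted = predict_sex(s)
    if predicted is None:
        return "unconfident"
    return "consistent" if predicted == s.annotated_sex else "misclassified"


# Toy samples with invented expression values.
samples = [
    Sample("S1", "F", xist=9.1, rps4y1=2.0),
    Sample("S2", "M", xist=1.5, rps4y1=8.7),
    Sample("S3", "M", xist=8.8, rps4y1=1.9),  # expression pattern looks female
    Sample("S4", "F", xist=7.0, rps4y1=7.5),  # ambiguous signal
]
for s in samples:
    print(s.sample_id, check_sample(s))
```

In this toy run, S3 is flagged as misclassified and S4 as unconfident. Requiring the two markers to agree before making a call is one simple way to produce an "unconfident" category; published pipelines may use more genes and data-driven thresholds.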

A similar study was published in 2016 by Lilah Toker and colleagues. Using a comparable methodology, they applied sex-specific genes to identify mislabeled samples in 70 transcriptomic datasets covering both cancer-related and non-cancer-related studies. Their results confirmed Lohr's initial findings: they detected mislabeled samples in 46% of the datasets analyzed, with an average mismatch rate of 2%. Although the source of each error was usually difficult to determine, the most common cause appeared to be samples that had been physically mixed up, rather than mistakes in recording the participants' sex.2
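
Dataset-level figures such as "mislabeled samples in 46% of datasets" and "an average mismatch rate of 2%" are summaries of per-dataset tallies. The sketch below shows one way to compute them; the tallies are invented toy numbers, not data from either study, and the function name `summarize` is hypothetical. Because the text does not say how the average was defined, the sketch reports both the mean of per-dataset rates and the pooled rate across all samples.

```python
# Toy per-dataset tallies: dataset -> (samples checked, sex mismatches).
# These numbers are invented for illustration and do not come from either study.
tallies = {
    "dataset_A": (120, 0),
    "dataset_B": (80, 2),
    "dataset_C": (200, 0),
    "dataset_D": (50, 3),
}


def summarize(tallies):
    """Return (share of datasets with >=1 mismatch, mean per-dataset rate, pooled rate)."""
    n_datasets = len(tallies)
    with_error = sum(1 for _, mismatches in tallies.values() if mismatches > 0)
    # Mean of per-dataset rates: every dataset counts equally, whatever its size.
    per_dataset_rates = [m / n for n, m in tallies.values()]
    mean_rate = sum(per_dataset_rates) / n_datasets
    # Pooled rate: every sample counts equally, so large datasets dominate.
    total_mismatches = sum(m for _, m in tallies.values())
    total_samples = sum(n for n, _ in tallies.values())
    pooled_rate = total_mismatches / total_samples
    return with_error / n_datasets, mean_rate, pooled_rate


share, mean_rate, pooled = summarize(tallies)
print(f"datasets with >=1 mismatch: {share:.0%}")
print(f"mean per-dataset mismatch rate: {mean_rate:.2%}")
print(f"pooled mismatch rate: {pooled:.2%}")
```

The two averages can differ noticeably: a small dataset with a few mismatches raises the per-dataset mean far more than it raises the pooled rate.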

The main point of both studies was to show how pervasive mislabeling can be in transcriptomic datasets. Mislabeled samples are a serious concern, because they can lead any research group that reuses these datasets to erroneous conclusions. The authors also note that sex-specific markers can only reveal mix-ups between patients of different sexes; a mix-up between two patients of the same sex cannot be detected this way, so the true error rate is likely higher than the one reported.

Altogether, the importance of appropriate labeling can't be overstated. Every precaution should be taken to ensure that samples are labeled correctly, including choosing labels suited to the environment in which they will be stored. Barcoded labels, radio-frequency identification (RFID) labels, and laboratory information management systems (LIMS) can also help reduce errors when processing the high-throughput data generated by transcriptomic analyses.
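
The point that same-sex mix-ups go undetected can be made roughly quantitative. Assuming that a mislabeled sample actually comes from a randomly chosen other patient, independently of sex, and that sex prediction itself is error-free, a mix-up is detectable only when the two patients differ in sex. If a fraction p of patients is female, that probability is 2p(1 - p), which is at most 0.5 for a balanced cohort. The function names below are hypothetical, and the observed rate in the example is a toy value, not a figure from either study.

```python
def detectable_fraction(p_female: float) -> float:
    """Probability that a random mix-up pairs two patients of different sex."""
    if not 0.0 <= p_female <= 1.0:
        raise ValueError("p_female must be between 0 and 1")
    return 2 * p_female * (1 - p_female)


def estimated_total_rate(observed_rate: float, p_female: float) -> float:
    """Scale an observed sex-mismatch rate up to an estimated total mix-up rate."""
    frac = detectable_fraction(p_female)
    if frac == 0:
        raise ValueError("sex checks cannot detect mix-ups in a single-sex cohort")
    return observed_rate / frac


print(detectable_fraction(0.5))         # 0.5
print(estimated_total_rate(0.02, 0.5))  # 0.04
```

Under these assumptions, even in a perfectly balanced cohort, sex checks catch only about half of random mix-ups, so the total error rate would be roughly double the observed mismatch rate. In cohorts dominated by one sex, the detectable share shrinks further.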

LabTAG by GA International is a leading manufacturer of high-performance specialty labels and a supplier of identification solutions used in research and medical labs as well as healthcare institutions.


  1. Lohr M, Hellwig B, Edlund K, et al. Identification of sample annotation errors in gene expression datasets. Arch Toxicol. 2015;89:2265-2272.
  2. Toker L, Feng M, Pavlidis P. Whose sample is it anyway? Widespread misannotation of samples in transcriptomics studies. F1000Research. 2016;5:1-13.

Alexander Goldberg, Ph.D.
Scientific writer and social media manager at GA International. Dr. Alex Goldberg earned his Ph.D. in biology and previously worked as a postdoc in toxicology and medicine, studying chronological lifespan in yeast, anti-neoplastic small molecules, and the genetics of tuberous sclerosis complex.