
Using Sex to Identify Mislabeled Samples

Study Results

In 2015, Miriam Lohr and her group at Dortmund University in Germany set out to quantify how often samples are mislabeled in 45 publicly available transcriptomic datasets from cancer patients. They did this using sex-specific identifiers: genes expressed from either the X or the Y chromosome. By analyzing the expression of these genes, they inferred whether each sample came from a male or a female patient, then compared that result with the sex recorded in the sample annotation. Of the 4913 patients evaluated, 1.1% were “misclassified” and 3.0% were “unconfident,” meaning that the sex could not be confirmed from the transcriptomic data. At least one “misclassified” sample was detected in 18 of the 45 datasets (40%). To show the effect mislabeled samples can have on study results, Lohr et al. assessed which genes had prognostic value in the cohorts. When mislabeling errors were introduced, 12% to 53% of the genes significantly associated with patient survival were no longer significant, while another 9% to 39% of genes appeared as newly significant.¹
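The sex check described above can be sketched in a few lines of Python. This is only an illustration of the general idea, not the pipeline used by Lohr et al.: the article does not name the marker genes or cutoffs the studies used, so the choice of XIST (expressed from the inactive X chromosome in female cells) and RPS4Y1 (a Y-linked gene), the `HIGH` and `LOW` thresholds, and the sample values are all assumptions made for the example.

```python
# Hypothetical cutoffs on log2-scale expression values. Real cutoffs depend on
# the platform and normalization and would need calibrating for each dataset.
HIGH = 8.0
LOW = 6.0


def infer_sex(xist: float, rps4y1: float) -> str:
    """Infer sex from one X-linked marker (XIST) and one Y-linked marker (RPS4Y1)."""
    if xist >= HIGH and rps4y1 <= LOW:
        return "female"
    if rps4y1 >= HIGH and xist <= LOW:
        return "male"
    # Both markers high, both low, or in between: do not force a call.
    return "unconfident"


def check_sample(recorded_sex: str, xist: float, rps4y1: float) -> str:
    """Compare the inferred sex with the sex recorded in the annotation."""
    inferred = infer_sex(xist, rps4y1)
    if inferred == "unconfident":
        return "unconfident"
    return "match" if inferred == recorded_sex else "misclassified"


# Made-up samples: (sample ID, recorded sex, XIST, RPS4Y1)
samples = [
    ("S1", "female", 10.2, 4.1),
    ("S2", "male", 4.5, 9.8),
    ("S3", "male", 10.0, 4.3),   # recorded as male, expression looks female
    ("S4", "female", 7.0, 7.1),  # ambiguous expression
]

for sample_id, recorded, xist, rps4y1 in samples:
    print(sample_id, check_sample(recorded, xist, rps4y1))
```

Keeping an explicit “unconfident” outcome mirrors the distinction the study draws between samples that clearly contradict their annotation and samples whose sex cannot be confirmed either way.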

A similar study was performed in 2016 by Lilah Toker and colleagues. They also used sex-specific genes to identify mislabeled samples, this time in 70 transcriptomic datasets covering both cancer-related and non-cancer-related studies. Their results were consistent with Lohr’s findings: they discovered mislabeled samples in 46% of the datasets analyzed, with an average mismatch rate of 2%. Though the source of error was usually difficult to determine, the most common source appeared to be samples that had been physically mixed up, rather than mistakes in recording the participants’ sex.²

The main point of both studies was to shed light on how pervasive mislabeling can be in transcriptomic datasets. Mislabeled samples are a serious concern, as they can lead any research group that reuses the data to erroneous conclusions. The authors also note that while sex-specific identifiers can be used to correct mismatches, a mix-up between two patients of the same sex can’t be detected with these methods, so the true error rate is likely higher than what was reported. Altogether, the importance of appropriate labeling can’t be overstated. Every precaution should be taken to ensure that samples are labeled correctly, including making sure your labels are suited to the environment they will be exposed to. Barcoded labels, radio-frequency identification (RFID) labels, and laboratory information management systems (LIMS) can also help reduce errors when processing the high-throughput data generated by transcriptomic analyses.
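The point about same-sex mix-ups going undetected can be made concrete with a rough back-of-the-envelope estimate. The sketch below is not taken from either study. It assumes that every error is a random swap between two samples and that a swap is detected only when the two patients differ in sex; the function names and the example numbers are hypothetical.

```python
def cross_sex_fraction(p_female: float) -> float:
    """Approximate chance that two randomly paired samples differ in sex.

    Assumes a large cohort in which a fraction p_female of patients is female.
    """
    return 2 * p_female * (1 - p_female)


def estimate_total_swap_rate(detected_rate: float, p_female: float) -> float:
    """Scale the detected (cross-sex) mismatch rate up to all swaps.

    Only valid under the random-swap assumption stated above.
    """
    fraction = cross_sex_fraction(p_female)
    if fraction == 0:
        raise ValueError("single-sex cohort: sex markers cannot detect any swap")
    return detected_rate / fraction


# Hypothetical numbers: a sex-balanced cohort with a 1% detected mismatch rate.
print(estimate_total_swap_rate(0.01, 0.5))
```

Under these assumptions, only about half of all swaps in a sex-balanced cohort involve patients of different sexes, so the underlying swap rate would be roughly twice the detected one, and the gap widens as a cohort becomes more skewed toward one sex.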

LabTAG by GA International is a leading manufacturer of high-performance specialty labels and a supplier of identification solutions used in research and medical labs as well as healthcare institutions.


  1. Lohr M, Hellwig B, Edlund K, et al. Identification of sample annotation errors in gene expression datasets. Arch Toxicol. 2015;89:2265-2272.
  2. Toker L, Feng M, Pavlidis P. Whose sample is it anyway? Widespread misannotation of samples in transcriptomics studies. F1000Research. 2016;5:1-13.
Alexander Goldberg, Ph.D.
The scientific writer and social media manager at GA International. Dr. Alex Goldberg earned his Ph.D. in biology and previously worked as a post-doc in toxicology and medicine, studying chronological lifespan in yeast, anti-neoplastic small molecules, and the genetics of tuberous sclerosis complex.




