Healthcare / Medical Imaging

Whole-Slide Pathology Annotation for a Histopathology AI Vendor

Board-certified pathologists annotated 8,400 whole-slide images for tumor region segmentation and nuclei instance labeling, narrowing the model's hospital-by-hospital performance gap from 18% to 4%.

Medical Imaging · Histopathology · Segmentation · HIPAA

Client: Healthcare AI Company
Volume: 8,400 whole-slide images (~120 GB)
Duration: 14 weeks
Team: 9 board-certified pathologists, 4 senior medical reviewers, 12 trained annotators
Languages: English (medical)

The challenge

A histopathology AI vendor was building a model for tumor region detection and nuclei segmentation in breast cancer biopsies. Their existing dataset combined public benchmarks (CAMELYON16, MoNuSeg, PanNuke) with limited internal annotation.

On production validation, the model showed an 18% performance gap across the five hospital partners the vendor served. Stain variation, scanner differences, and edge cases in clinical slides were eroding accuracy at three of the five sites.

They needed annotations that reflected real-world staining diversity and scanner variation, not just curated benchmark data. They also needed a cell-type schema that matched a HoVer-Net-style downstream architecture.

Our approach

Track 1: Tumor region segmentation

Board-certified pathologists outlined tumor regions at 20x magnification. Each region was labeled with a tissue class: invasive ductal carcinoma, invasive lobular carcinoma, ductal carcinoma in situ (DCIS), or benign reference tissue.

Stroma vs. tumor boundary precision target: ≤ 50 micrometers
Pathologist consensus on ambiguous boundary regions
Per-region staining quality flag for downstream filtering
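A physical tolerance such as 50 micrometers only becomes checkable once it is converted to pixels for a given scanner. The sketch below shows one way that conversion could look. The function names and the 0.5 µm/pixel resolution used in the usage note are illustrative assumptions, not values from the project; real resolution depends on the scanner and should be read from slide metadata.

```python
import math


def tolerance_in_pixels(tolerance_um: float, microns_per_pixel: float) -> int:
    """Convert a physical boundary tolerance to whole pixels, rounded down."""
    if microns_per_pixel <= 0:
        raise ValueError("microns_per_pixel must be positive")
    return math.floor(tolerance_um / microns_per_pixel)


def boundary_within_tolerance(deviations_px, tolerance_um, microns_per_pixel):
    """Return True if every sampled boundary deviation stays within tolerance.

    deviations_px: distances (in pixels) between an annotated boundary and a
    reference boundary, sampled along the contour (hypothetical input).
    """
    return all(d * microns_per_pixel <= tolerance_um for d in deviations_px)
```

Under the assumed 0.5 µm/pixel, `tolerance_in_pixels(50, 0.5)` gives 100, so the same 50 µm target would allow twice as many pixels of deviation on a scanner with 0.25 µm/pixel.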

Track 2: Nuclei instance segmentation

Trained annotators produced per-cell instance masks at high magnification with pathologist QA. The schema followed HoVer-Net (Graham et al., 2019) to match the client's downstream architecture.

Cell types: tumor nuclei, lymphocyte, fibroblast, epithelial, necrotic
Boundary precision at single-pixel level on 40x magnification crops
Per-cell confidence flag for fragmented or out-of-focus nuclei
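To illustrate what annotating directly in the downstream format can mean, here is a minimal sketch that packs per-cell masks into an instance map and a type map, the paired per-pixel layout that HoVer-Net-style pipelines typically consume. The class names, the integer IDs assigned to each cell type, and the `low_confidence` field are assumptions for illustration; the client's actual encoding is not described in this case study.

```python
from dataclasses import dataclass
from enum import IntEnum


class CellType(IntEnum):
    # ID numbering is an assumption; 0 is reserved for background.
    BACKGROUND = 0
    TUMOR = 1
    LYMPHOCYTE = 2
    FIBROBLAST = 3
    EPITHELIAL = 4
    NECROTIC = 5


@dataclass
class Nucleus:
    pixels: list[tuple[int, int]]  # (row, col) coordinates of the mask
    cell_type: CellType
    low_confidence: bool = False   # fragmented or out-of-focus nucleus


def build_maps(nuclei, height, width):
    """Return (inst_map, type_map) as 2D lists for one image crop."""
    inst_map = [[0] * width for _ in range(height)]
    type_map = [[0] * width for _ in range(height)]
    for inst_id, nucleus in enumerate(nuclei, start=1):
        for r, c in nucleus.pixels:
            if inst_map[r][c] != 0:
                raise ValueError(f"pixel ({r}, {c}) belongs to two instances")
            inst_map[r][c] = inst_id
            type_map[r][c] = int(nucleus.cell_type)
    return inst_map, type_map
```

Rejecting overlapping pixels at build time is one way to surface annotation conflicts early rather than during a later format conversion.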

Track 3: Artifact and QC annotation

Slide-level artifacts were annotated as first-class outputs rather than discarded. The artifact maps became a separate training signal that prevented the model from learning artifacts as features.

Air bubbles, tissue folds, out-of-focus regions
Pen marks, marker dots, scanner stripe artifacts
Hospital-by-hospital artifact frequency reported back to the client
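A per-hospital artifact report like the one in the last item above can be derived from slide-level artifact records with a simple aggregation. The following is a sketch under assumptions: the record layout `(hospital, slide_id, artifact_type)` and the function name are hypothetical, and frequency is defined here as the share of a hospital's slides that contain at least one instance of the artifact.

```python
from collections import defaultdict


def artifact_frequency(records, slides_per_hospital):
    """Share of each hospital's slides containing each artifact type.

    records: iterable of (hospital, slide_id, artifact_type) tuples.
    slides_per_hospital: dict mapping hospital -> total slides annotated.
    """
    slides_with = defaultdict(set)
    for hospital, slide_id, artifact in records:
        # A set counts each slide once, however many regions were marked.
        slides_with[(hospital, artifact)].add(slide_id)

    report = defaultdict(dict)
    for (hospital, artifact), slides in slides_with.items():
        report[hospital][artifact] = len(slides) / slides_per_hospital[hospital]
    return dict(report)
```

Counting slides rather than annotated regions keeps one heavily folded slide from dominating a hospital's figure.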

Workflow and compliance

All annotation work happened inside the client's controlled environment. Protected health information never left their VPN. Annotators signed NDAs and completed HIPAA awareness training before the project started. An audit trail of every annotation action was retained for 7 years per the client's retention policy.

Results

Slides annotated: 8,127
Tumor region Dice: 0.89
Nuclei instance F1: 0.84
Hospital performance gap: narrowed from 18% to 4%
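For readers unfamiliar with the metrics, the Dice coefficient for a predicted region A and a reference region B is 2|A ∩ B| / (|A| + |B|). The sketch below computes it over sets of pixel coordinates and shows one possible definition of a cross-site gap, taken here as the difference between the best and worst per-site scores. That gap definition is an assumption; the case study does not state how the 18% and 4% figures were calculated.

```python
def dice(pred_pixels, truth_pixels):
    """Dice coefficient between two masks given as (row, col) collections."""
    pred, truth = set(pred_pixels), set(truth_pixels)
    if not pred and not truth:
        return 1.0  # both empty: treated as perfect agreement
    return 2 * len(pred & truth) / (len(pred) + len(truth))


def site_gap(scores_by_site):
    """Difference between the best and worst per-site score (assumed definition)."""
    values = list(scores_by_site.values())
    return max(values) - min(values)


# Illustrative values only, not project results.
example_scores = {"site_a": 0.90, "site_b": 0.80}
example_gap = site_gap(example_scores)
```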

What made it work

1. Board-certified pathologists, not crowd workers, handled the cancer region track. Subtype boundary calls require training that crowd annotation cannot substitute for.
2. Hospital-by-hospital sampling from week one meant the model saw scanner diversity throughout training, not as a post-hoc fix.
3. Treating artifacts as a first-class annotation output, instead of filtering them out, gave the client a signal they could train against.
4. Designing the cell-type schema against the downstream architecture (HoVer-Net) avoided the common pattern of annotating in one format and reformatting later, which leaks errors.

References

Published research that informed the labeling schema and workflow.

Bejnordi, B. E. et al. (2017). Diagnostic Assessment of Deep Learning Algorithms for Detection of Lymph Node Metastases in Women With Breast Cancer. JAMA 318(22).
Kumar, N. et al. (2017). A Dataset and a Technique for Generalized Nuclear Segmentation for Computational Pathology. IEEE TMI.
Gamper, J. et al. (2019). PanNuke: An Open Pan-Cancer Histology Dataset for Nuclei Instance Segmentation and Classification. ECDP 2019.
Graham, S. et al. (2019). HoVer-Net: Simultaneous Segmentation and Classification of Nuclei in Multi-Tissue Histology Images. Medical Image Analysis.
Kirillov, A. et al. (2023). Segment Anything. ICCV 2023.
