Healthcare / Medical Imaging
Whole-Slide Pathology Annotation for a Histopathology AI Vendor
Board-certified pathologists annotated 8,400 whole-slide images for tumor region segmentation and nuclei instance labeling, narrowing the model's hospital-by-hospital performance gap from 18% to 4%.
Client: Healthcare AI Company
Volume: 8,400 whole-slide images (~120 GB)
Duration: 14 weeks
Team: 9 board-certified pathologists, 4 senior medical reviewers, 12 trained annotators
Languages: English (medical)
The challenge
A histopathology AI vendor was building a model for tumor region detection and nuclei segmentation in breast cancer biopsies. Their existing dataset combined public benchmarks (CAMELYON16, MoNuSeg, PanNuke) with limited internal annotation.
In production validation, the model showed an 18% performance gap across the five hospital partners they served. Stain variation, scanner differences, and edge cases in clinical slides were eroding accuracy at three of the five sites.
They needed annotations that reflected real-world staining diversity and scanner variation, not just curated benchmark data. They also needed a cell-type schema that matched a HoVer-Net-style downstream architecture.
Our approach
Track 1: Tumor region segmentation
Board-certified pathologists outlined tumor regions at 20x magnification. Each region was labeled as invasive ductal carcinoma, invasive lobular carcinoma, ductal carcinoma in situ (DCIS), or benign reference tissue.
- Stroma vs. tumor boundary precision target: ≤ 50 micrometers
- Pathologist consensus on ambiguous boundary regions
- Per-region staining quality flag for downstream filtering
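Because scanners differ in resolution, a fixed physical tolerance such as the 50-micrometer boundary target corresponds to a different pixel count on each scanner. The sketch below shows that conversion. The function name and the microns-per-pixel values are illustrative assumptions, not figures from the project; real values should come from each slide's metadata.

```python
def tolerance_in_pixels(tolerance_um: float, microns_per_pixel: float) -> int:
    """Convert a physical boundary tolerance to whole pixels, rounding down."""
    if microns_per_pixel <= 0:
        raise ValueError("microns_per_pixel must be positive")
    return int(tolerance_um // microns_per_pixel)


# Hypothetical resolutions: roughly 0.5 um/pixel is common for 20x scans,
# but the actual value varies by scanner and must be read per slide.
assumed_mpp = {"scanner_a": 0.5, "scanner_b": 0.25}

for scanner, mpp in assumed_mpp.items():
    print(scanner, tolerance_in_pixels(50.0, mpp))
```

Rounding down keeps the pixel tolerance on the conservative side, so a boundary accepted in pixels never exceeds the physical target.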
Track 2: Nuclei instance segmentation
Trained annotators produced per-cell instance masks at high magnification with pathologist QA. The schema followed HoVer-Net (Graham et al., 2019) to match the client's downstream architecture.
- Cell types: tumor nuclei, lymphocyte, fibroblast, epithelial, necrotic
- Boundary precision at single-pixel level on 40x magnification crops
- Per-cell confidence flag for fragmented or out-of-focus nuclei
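One way to picture the output of this track is as per-nucleus records that are rasterized into an instance map and a type map, the pair of arrays that HoVer-Net-style training pipelines typically consume. The following is a minimal sketch under that assumption. The class ids, the `Nucleus` record, and `build_maps` are hypothetical names for illustration, not the client's actual schema.

```python
from dataclasses import dataclass

# Hypothetical class ids; 0 is reserved for background.
CELL_TYPES = {1: "tumor", 2: "lymphocyte", 3: "fibroblast", 4: "epithelial", 5: "necrotic"}


@dataclass(frozen=True)
class Nucleus:
    instance_id: int               # unique positive id within the crop
    type_id: int                   # key into CELL_TYPES
    pixels: frozenset              # (row, col) coordinates covered by the mask
    low_confidence: bool = False   # fragmented or out-of-focus nucleus


def build_maps(nuclei, height, width):
    """Rasterize nucleus records into an instance map and a type map."""
    inst_map = [[0] * width for _ in range(height)]
    type_map = [[0] * width for _ in range(height)]
    for n in nuclei:
        if n.instance_id <= 0 or n.type_id not in CELL_TYPES:
            raise ValueError(f"invalid ids for nucleus {n.instance_id}")
        for r, c in n.pixels:
            if not (0 <= r < height and 0 <= c < width):
                raise ValueError(f"pixel {(r, c)} is outside the crop")
            if inst_map[r][c] != 0:
                raise ValueError(f"pixel {(r, c)} is claimed by two nuclei")
            inst_map[r][c] = n.instance_id
            type_map[r][c] = n.type_id
    return inst_map, type_map


nuclei = [
    Nucleus(1, 1, frozenset({(0, 0), (0, 1)})),
    Nucleus(2, 2, frozenset({(2, 2)}), low_confidence=True),
]
inst_map, type_map = build_maps(nuclei, height=3, width=3)
```

Rejecting overlapping pixels at this stage surfaces annotation conflicts early, and the `low_confidence` flag stays on the record so downstream code can decide whether to exclude those nuclei.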
Track 3: Artifact and QC annotation
Slide-level artifacts were annotated as first-class outputs rather than discarded. The artifact maps became a separate training signal that prevented the model from learning artifacts as features.
- Air bubbles, tissue folds, out-of-focus regions
- Pen marks, marker dots, scanner stripe artifacts
- Hospital-by-hospital artifact frequency reported back to the client
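A hospital-by-hospital artifact report can be as simple as the fraction of slides from each site that contain each artifact type. The sketch below assumes slide-level records of the form (hospital, set of artifact labels). The label names and sample records are invented for illustration and are not project data.

```python
from collections import defaultdict


def artifact_frequency(slides):
    """Return {hospital: {artifact: fraction of that hospital's slides}}."""
    totals = defaultdict(int)
    counts = defaultdict(lambda: defaultdict(int))
    for hospital, artifacts in slides:
        totals[hospital] += 1
        for label in set(artifacts):  # count each artifact once per slide
            counts[hospital][label] += 1
    return {
        h: {label: n / totals[h] for label, n in sorted(counts[h].items())}
        for h in totals
    }


# Invented example records, not project data.
example_slides = [
    ("hospital_a", {"tissue_fold"}),
    ("hospital_a", set()),
    ("hospital_b", {"tissue_fold", "pen_mark"}),
]
report = artifact_frequency(example_slides)
```

In this example, half of the slides from `hospital_a` contain a tissue fold, while the single slide from `hospital_b` contains both a fold and a pen mark.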
Workflow and compliance
All annotation work happened inside the client's controlled environment. Protected health information never left their VPN. Annotators signed NDAs and completed HIPAA awareness training before the project started. An audit trail of every annotation action was retained for 7 years per the client's retention policy.
Results
- Slides annotated: 8,127
- Tumor region Dice: 0.89
- Nuclei instance F1: 0.84
- Hospital performance gap: narrowed from 18% to 4%
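For readers unfamiliar with the metrics, the Dice score is twice the overlap between predicted and reference regions divided by their combined size. The source does not state how the hospital gap was measured, so the sketch below assumes it is the difference between the best and worst per-site scores. All inputs are invented for illustration.

```python
def dice(pred: set, truth: set) -> float:
    """Dice coefficient between two sets of pixel coordinates."""
    if not pred and not truth:
        return 1.0  # two empty masks agree perfectly
    return 2 * len(pred & truth) / (len(pred) + len(truth))


def site_gap(scores: dict) -> float:
    """Assumed definition: best per-site score minus worst per-site score."""
    return max(scores.values()) - min(scores.values())


# Invented masks: 2 shared pixels, 3 pixels in each mask.
pred = {(0, 0), (0, 1), (1, 1)}
truth = {(0, 1), (1, 1), (1, 2)}
score = dice(pred, truth)  # 2 * 2 / (3 + 3)

# Invented per-site scores, not project results.
gap = site_gap({"site_a": 0.90, "site_b": 0.72})
```

Other gap definitions, such as a relative difference or a difference from the mean, would give different numbers for the same per-site scores, so the definition should be fixed before comparing before-and-after results.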
What made it work
1. Board-certified pathologists, not crowd workers, on the tumor region track. Subtype boundary calls require training that crowd annotation cannot substitute for.
2. Hospital-by-hospital sampling from week one meant the model saw scanner diversity throughout training, not as a post-hoc fix.
3. Treating artifacts as a first-class annotation output, instead of filtering them out, gave the client a signal they could train against.
4. A cell-type schema designed against the downstream architecture (HoVer-Net) avoided the common pattern of annotating in one format and converting later, a step that tends to introduce errors.
References
Published research that informed the labeling schema and workflow.
- Bejnordi, B. E. et al. (2017). Diagnostic Assessment of Deep Learning Algorithms for Detection of Lymph Node Metastases in Women With Breast Cancer. JAMA 318(22).
- Kumar, N. et al. (2017). A Dataset and a Technique for Generalized Nuclear Segmentation for Computational Pathology. IEEE TMI.
- Gamper, J. et al. (2019). PanNuke: An Open Pan-Cancer Histology Dataset for Nuclei Instance Segmentation and Classification. ECDP 2019.
- Graham, S. et al. (2019). HoVer-Net: Simultaneous Segmentation and Classification of Nuclei in Multi-Tissue Histology Images. Medical Image Analysis.
- Kirillov, A. et al. (2023). Segment Anything. ICCV 2023.