Blog

Notes from the Annotation Floor

Research we read, workflows we run, and the parts that break at scale. Written by the team that ships annotation data for enterprise AI.

Data Quality

Getting Better Training Data for Your AI Model: What Actually Works

Model performance lives or dies on data quality. The teams that ship reliable models follow the same four habits: clean datasets, clear instructions, calibrated annotators, and consensus where it matters.

2025-06 · 7 min read

Document AI

What Document AI Research Tells Us About PDF Annotation Workflows

LayoutLMv3, Nougat, and TableFormer changed how researchers approach document understanding. We pull out the parts that matter when annotating real-world PDFs at production scale.

2025-05 · 8 min read

Robotics

Why Robotics Datasets Need More Than Bounding Boxes

Bounding boxes annotate what is in a frame. Robotics policies need to know what is happening between frames. RT-1, BC-Z, and Open X-Embodiment all hinge on a more demanding kind of label.

2025-04 · 7 min read

Operations

How to Run a Five-Day Annotation Pilot That Actually Saves Time

Pilots either calibrate the project for production or burn a week and produce labels you throw away. The difference is mostly in the first 48 hours.

2025-03 · 6 min read

Have a data problem on your roadmap?

Tell us about the dataset and what you need it to do. We will respond with a scoping note and a sensible next step.

Start a Pilot