Sam Stiller · Product Manager

How I think

Automation boundaries are a dial, not a switch.

Drag the confidence threshold. Watch which mistake you're choosing to make. The numbers are illustrative; the tradeoff is the job.

HIGH RISK

0

False negatives. Errors that slipped through unreviewed and reached someone.

WASTED EFFORT

0

False positives. Correct items flagged for review that never needed it.

These two costs aren't equal. A missed error can hit a customer and cost real money. A needless review costs a reviewer a few minutes. So you don't chase one accuracy number. You set the dial where the cheaper mistake absorbs the risk.
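A minimal sketch of that tradeoff in Python, not the widget's actual code. The items, the candidate thresholds, and both cost figures are made-up assumptions for illustration. It counts the two kinds of mistake at a given threshold, then picks the threshold with the lowest total cost.

```python
from dataclasses import dataclass


@dataclass
class Item:
    confidence: float  # model's confidence that the item is correct (0 to 1)
    is_correct: bool   # ground truth, known only in hindsight


# Illustrative data only; these values are assumptions, not real results.
ITEMS = [
    Item(0.95, True), Item(0.90, True), Item(0.88, True), Item(0.85, False),
    Item(0.70, True), Item(0.60, False), Item(0.40, True),
]


def count_mistakes(items, threshold):
    """Auto-approve items at or above the threshold; send the rest to review.

    Returns (false_negatives, false_positives):
      false negatives = wrong items that were auto-approved unreviewed
      false positives = correct items that were flagged for review anyway
    """
    false_negatives = sum(
        1 for i in items if i.confidence >= threshold and not i.is_correct
    )
    false_positives = sum(
        1 for i in items if i.confidence < threshold and i.is_correct
    )
    return false_negatives, false_positives


def total_cost(items, threshold, cost_missed_error, cost_needless_review):
    """Weight each kind of mistake by what it costs."""
    fn, fp = count_mistakes(items, threshold)
    return fn * cost_missed_error + fp * cost_needless_review


def cheapest_threshold(items, candidates, cost_missed_error, cost_needless_review):
    """Pick the candidate threshold with the lowest total cost."""
    return min(
        candidates,
        key=lambda t: total_cost(items, t, cost_missed_error, cost_needless_review),
    )


CANDIDATES = [0.5, 0.8, 0.9]

# A missed error is assumed to cost far more than a needless review,
# so the dial moves toward reviewing more.
strict = cheapest_threshold(ITEMS, CANDIDATES, cost_missed_error=500, cost_needless_review=5)

# Flip the assumed costs and the dial moves toward automating more.
loose = cheapest_threshold(ITEMS, CANDIDATES, cost_missed_error=1, cost_needless_review=10)
```

With these assumed costs, the first call settles on the strictest candidate and the second on the loosest. The data is the same in both calls. Only the price of each mistake changed.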

The evolution

Professional Experience

Three roles spanning Maritime Law and Finance, Renewable Energy Analytics and Finance, and Fixed Income Analysis

PM10x

FactSet · 2019-2022

Portfolio intelligence
Relevance ranking

Upvote / downvote

Live feedback re-scored each analyst's feed

Origin

Where the whole thesis started

Analysts weren't short on insights. They were short on knowing which of them mattered to one specific portfolio.

I worked on PM10x, a portfolio intelligence tool that ranked financial insights by relevance using a weighted scoring formula. The first version was an educated guess at the weights. From there I tuned them by hand, testing which signals mattered and in what proportion. On top of that, every upvote and downvote adjusted how a given insight scored for that analyst, so each person's feed sharpened the more they used it.

That feedback loop taught me the lesson I've been chasing ever since. A score is only as good as the signal you feed it, and the people using the product are the signal.
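As a rough illustration of the scoring loop described above, here is a minimal Python sketch. It is not PM10x's actual formula. The signal names, the weights, and the size of the feedback step are all hypothetical. It shows a hand-tuned weighted sum with a per-analyst adjustment layered on top from upvotes and downvotes.

```python
from collections import defaultdict

# Hypothetical signals and hand-tuned weights; the real ones are not public.
WEIGHTS = {"holding_overlap": 0.5, "recency": 0.3, "source_quality": 0.2}

# Hypothetical amount one vote moves an insight's score for that analyst.
FEEDBACK_STEP = 0.1


def base_score(signals, weights=WEIGHTS):
    """Weighted sum of an insight's signal values (each assumed to be 0 to 1)."""
    return sum(weights[name] * signals.get(name, 0.0) for name in weights)


class AnalystFeed:
    """Keeps one analyst's vote adjustments and ranks insights for them."""

    def __init__(self):
        # insight_id -> cumulative adjustment from this analyst's votes
        self.adjustments = defaultdict(float)

    def vote(self, insight_id, up):
        """Record an upvote (up=True) or a downvote (up=False)."""
        self.adjustments[insight_id] += FEEDBACK_STEP if up else -FEEDBACK_STEP

    def score(self, insight_id, signals):
        """Base relevance plus this analyst's own feedback."""
        return base_score(signals) + self.adjustments[insight_id]

    def rank(self, insights):
        """Return insight ids ordered from most to least relevant."""
        return sorted(
            insights,
            key=lambda insight_id: self.score(insight_id, insights[insight_id]),
            reverse=True,
        )


# Illustrative insights with made-up signal values.
INSIGHTS = {
    "a": {"holding_overlap": 0.8, "recency": 0.5, "source_quality": 0.5},
    "b": {"holding_overlap": 0.6, "recency": 0.6, "source_quality": 0.6},
}

feed = AnalystFeed()
before = feed.rank(INSIGHTS)   # "a" leads on base score alone
feed.vote("b", up=True)        # the analyst upvotes "b"
after = feed.rank(INSIGHTS)    # "b" now leads for this analyst only
```

Keeping the adjustment per analyst, rather than folding votes back into the shared weights, is one way to let each feed sharpen without one person's taste changing everyone else's ranking.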

Data platform & APIs

REsurety · 2022-2025

Customer data APIs
Structured data platform
Internal user management (0 to 1)

Usage over opinion

The signal behind every roadmap call

0 to 1

Internal user management app: decrease time to value (TTV) and increase commercial efficiency

What customers asked for and what they actually did rarely matched. Building for the first one kept burning roadmap.

I owned three surfaces at once. Customer-facing data APIs, a structured data platform, and a zero-to-one internal tool for user management. Keeping them coherent meant grounding decisions in observed usage instead of stated preferences. The datasets people pulled every day showed me where the value lived. The ones they praised on calls but never opened were noise.

That discipline changed where my time went. Features customers used every day kept earning roadmap space. Features that had shipped but sat unused stopped getting it.

AI claims workflow

Veson Nautical · 2025

Multi-agent orchestration
Confidence scoring · fallback logic
Eval frameworks · HITL audit

Production

Real money, real audit trails

Reversion

The signal I built the boundary around

Written by me: Quantify the AI Advantage in Claims Management ↗

In the press: Ship Technology Excellence Awards 2025: Veson Nautical ↗

An AI workflow that gets a claim wrong isn't a demo bug. It's a financial decision with consequences.

I owned the production AI claims workflow. I joined once the initiative was already in motion and carried it through to live customers. It ran on multi-agent orchestration with confidence scoring, fallback logic, and evaluation frameworks, all behind a human-in-the-loop audit interface. The signal that mattered most wasn't headline accuracy. It was reversion. When a reviewer overrides the system, that's the clearest tell that the automation boundary is set too aggressively for that kind of case. So I designed the boundaries to treat reversion as the calibration signal rather than leaning on a single eval score. It's the same instinct from PM10x, now built into a system with money on the line.
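One way to picture reversion as a calibration signal is the Python sketch below. It is an illustration under stated assumptions, not the production workflow. The case type names, the target reversion rate, and the step size are hypothetical. It measures how often reviewers override the system per case type, then tightens the auto-approve threshold only where overrides run high.

```python
from collections import defaultdict


def reversion_rates(decisions):
    """Share of automated decisions that a reviewer overrode, per case type.

    decisions: iterable of (case_type, was_overridden) pairs.
    """
    totals = defaultdict(int)
    overrides = defaultdict(int)
    for case_type, was_overridden in decisions:
        totals[case_type] += 1
        if was_overridden:
            overrides[case_type] += 1
    return {case_type: overrides[case_type] / totals[case_type] for case_type in totals}


def recalibrate(thresholds, rates, target_rate=0.05, step=0.05):
    """Raise the confidence threshold for case types reverted too often.

    target_rate and step are hypothetical tuning values. A higher threshold
    means fewer cases of that type are handled without review.
    """
    updated = dict(thresholds)
    for case_type, rate in rates.items():
        if case_type in updated and rate > target_rate:
            updated[case_type] = min(1.0, round(updated[case_type] + step, 2))
    return updated


# Illustrative audit log: reviewers override half of "type_a", none of "type_b".
DECISIONS = [
    ("type_a", True), ("type_a", False), ("type_a", True), ("type_a", False),
    ("type_b", False), ("type_b", False), ("type_b", False), ("type_b", False),
]

rates = reversion_rates(DECISIONS)
thresholds = recalibrate({"type_a": 0.80, "type_b": 0.80}, rates)
```

Here only `type_a` gets a stricter boundary. A single aggregate eval score would have averaged the two case types together and hidden that difference.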

How I build

I ship the prototype, then the spec.

Most PMs hand off a requirements doc and wait. I open Claude Code, v0, or Figma Make and build the thing first. A working prototype answers in an afternoon what a spec argues about for two weeks, and it puts a real artifact in front of engineering instead of a wishlist.

The interactive widget on this page is the same habit. It's faster to build the idea than to describe it, and the build is the proof that I can.

Claude Code · v0 · Figma

Akt · Shipped

A workout tracking app I built and actually use every day. Real data model, real daily use, not a screenshot.

Bopo · Experiment

A smaller side build for pressure-testing an idea end to end before it earns roadmap space.

Since 2019, I've tuned systems to surface what matters, reading the signal in what's actually working.
