Samuel Stiller (Sam) · Product Manager · Boston

Since 2019, I've tuned systems to surface what matters, reading the signal in what's actually working.

Product Manager. AI/LLM workflows and the data platforms underneath them.


How I think

Automation boundaries are a dial, not a switch.

Drag the confidence threshold. Watch which mistake you're choosing to make. The numbers are illustrative; the tradeoff is the job.

Confidence threshold: 62% (0 = auto-approve everything, 100 = review everything)
1,000 model decisions, each either auto-approved or sent to human review
High risk: false negatives. Errors that slipped through unreviewed and reached someone.
Wasted effort: false positives. Correct items flagged for review that never needed it.
These two costs aren't equal. A missed error can hit a customer and cost real money. A needless review costs a reviewer a few minutes. So you don't chase one accuracy number. You set the dial where the cheaper mistake absorbs the risk.
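
A rough Python sketch of that dial, for readers who prefer code to a slider. Everything in it is an assumption made for illustration: the `Decision` type, the six sample decisions, and the cost figures (200 for a slipped error, 5 for a needless review) are invented, not taken from any real system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    confidence: float  # model's confidence in its own output, 0.0 to 1.0
    is_correct: bool   # ground truth, known only after the fact


def count_mistakes(decisions, threshold):
    """Return (slipped_errors, needless_reviews) at a given threshold.

    Items at or above the threshold are auto-approved; the rest go to review.
    """
    slipped = sum(1 for d in decisions
                  if d.confidence >= threshold and not d.is_correct)
    needless = sum(1 for d in decisions
                   if d.confidence < threshold and d.is_correct)
    return slipped, needless


def total_cost(decisions, threshold, cost_slipped=200.0, cost_review=5.0):
    """Weigh the two mistakes by what each one costs (illustrative values)."""
    slipped, needless = count_mistakes(decisions, threshold)
    return slipped * cost_slipped + needless * cost_review


def cheapest_threshold(decisions, candidates, **costs):
    """Pick the candidate threshold with the lowest total cost."""
    return min(candidates, key=lambda t: total_cost(decisions, t, **costs))


# Invented sample data: two of the six decisions are wrong.
decisions = [
    Decision(0.95, True), Decision(0.90, True), Decision(0.85, False),
    Decision(0.70, True), Decision(0.60, False), Decision(0.40, True),
]
# 1.01 sits above any possible confidence, so it means "review everything".
candidates = [0.0, 0.5, 0.62, 0.8, 0.9, 1.01]
best = cheapest_threshold(decisions, candidates)
```

With these made-up costs the cheapest setting is the one that sends two correct items to review in order to catch both errors. Shrink the gap between the two costs and the best setting moves back toward auto-approval.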

The evolution

Professional Experience

Three roles spanning maritime law and finance, renewable energy analytics and finance, and fixed income analysis.

PM10x
FactSet · 2019-2022
Portfolio intelligence
Relevance ranking
Upvote / downvote
Live feedback re-scored each analyst's feed
Origin
Where the whole thesis started

Analysts weren't short on insights. They were short on knowing which of them mattered to one specific portfolio.

I worked on PM10x, a portfolio intelligence tool that ranked financial insights by relevance using a weighted scoring formula. The first version was an educated guess at the weights. From there I tuned them by hand, testing which signals mattered and in what proportion. On top of that, every upvote and downvote adjusted how a given insight scored for that analyst, so each person's feed sharpened the more they used it.
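
A minimal sketch of that kind of scoring loop, not the PM10x formula itself, which isn't described here. The signal names, the weights, the size of one vote (`VOTE_STEP`), and the `AnalystFeed` class are all hypothetical.

```python
from collections import defaultdict

# Hypothetical signals and hand-tuned weights, for illustration only.
WEIGHTS = {"holding_overlap": 0.5, "recency": 0.3, "source_quality": 0.2}
VOTE_STEP = 0.1  # assumed score change from one upvote or downvote


def base_score(signals):
    """Weighted sum of an insight's signal values (each assumed in 0..1)."""
    return sum(WEIGHTS[name] * signals.get(name, 0.0) for name in WEIGHTS)


class AnalystFeed:
    """Tracks one analyst's votes and re-ranks that analyst's feed."""

    def __init__(self):
        self.adjustment = defaultdict(float)  # insight id -> score offset

    def vote(self, insight_id, up):
        self.adjustment[insight_id] += VOTE_STEP if up else -VOTE_STEP

    def rank(self, insights):
        """insights: dict of insight id -> signals. Highest score first."""
        def score(insight_id):
            return base_score(insights[insight_id]) + self.adjustment[insight_id]
        return sorted(insights, key=score, reverse=True)


# Invented example insights.
insights = {
    "rate_cut": {"holding_overlap": 0.9, "recency": 0.5, "source_quality": 0.5},
    "earnings": {"holding_overlap": 0.4, "recency": 0.9, "source_quality": 0.8},
}
feed = AnalystFeed()
before = feed.rank(insights)    # ranked on the weights alone
feed.vote("earnings", up=True)  # one upvote from this analyst
after = feed.rank(insights)     # the same feed, re-scored for this analyst
```

In this toy data a single upvote is enough to move "earnings" above "rate_cut" for that one analyst, while every other analyst's ranking stays on the shared weights.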

That feedback loop taught me the lesson I've been chasing ever since. A score is only as good as the signal you feed it, and the people using the product are the signal.

Data platform & APIs
REsurety · 2022-2025
Customer data APIs
Structured data platform
Internal user management (0 to 1)
Usage over opinion
The signal behind every roadmap call
0 to 1
Internal user management app: decrease time to value (TTV) and increase commercial efficiency

What customers asked for and what they actually did rarely matched. Building for the first one kept burning roadmap.

I owned three surfaces at once: customer-facing data APIs, a structured data platform, and a zero-to-one internal tool for user management. Keeping them coherent meant grounding decisions in observed usage instead of stated preferences. The datasets people pulled every day showed me where the value lived. The ones they praised on calls but never opened were noise.

That discipline changed where my time went. Features customers used every day kept earning roadmap space. Features that had shipped but sat unused stopped getting it.

AI claims workflow
Veson Nautical · 2025
Multi-agent orchestration
Confidence scoring · fallback logic
Eval frameworks · HITL audit
Production
Real money, real audit trails
Reversion
The signal I built the boundary around

An AI workflow that gets a claim wrong isn't a demo bug. It's a financial decision with consequences.

I owned the production AI claims workflow. I joined once the initiative was already in motion and carried it through to live customers. It ran on multi-agent orchestration with confidence scoring, fallback logic, and evaluation frameworks, all behind a human-in-the-loop audit interface. The signal that mattered most wasn't headline accuracy. It was reversion. When a reviewer overrides the system, that's the clearest tell that the automation boundary sits too aggressively for that kind of case. So I designed the boundaries to treat reversion as the calibration signal rather than leaning on a single eval score. It's the same instinct from PM10x, now built into a system with money on the line.
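
One way to picture reversion as a calibration signal, sketched in Python under stated assumptions. This is not the production logic. The case type names, the 5% tolerance, and the 0.05 step are placeholders chosen for the example.

```python
from collections import Counter


def reversion_rates(reviews):
    """reviews: iterable of (case_type, was_overridden) pairs.

    Returns the share of reviewed cases a human overrode, per case type.
    """
    total, reverted = Counter(), Counter()
    for case_type, was_overridden in reviews:
        total[case_type] += 1
        reverted[case_type] += int(was_overridden)
    return {t: reverted[t] / total[t] for t in total}


def recalibrate(thresholds, rates, tolerance=0.05, step=0.05):
    """Raise the auto-approve threshold for case types overridden too often.

    tolerance and step are assumed values; a real system would tune them.
    """
    updated = dict(thresholds)
    for case_type, rate in rates.items():
        if case_type in updated and rate > tolerance:
            updated[case_type] = min(1.0, round(updated[case_type] + step, 2))
    return updated


# Invented audit log: reviewers overrode half of the "type_a" cases.
reviews = [
    ("type_a", True), ("type_a", False), ("type_a", True), ("type_a", False),
    ("type_b", False), ("type_b", False),
]
thresholds = {"type_a": 0.62, "type_b": 0.62}
rates = reversion_rates(reviews)
updated = recalibrate(thresholds, rates)
```

The point of the sketch is that the boundary moves per case type. A type that reviewers keep overriding gets a stricter threshold, while a type they leave alone keeps its current one.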


How I build

I ship the prototype, then the spec.

Most PMs hand off a requirements doc and wait. I open Claude Code, v0, or Figma Make and build the thing first. A working prototype answers in an afternoon what a spec argues about for two weeks, and it puts a real artifact in front of engineering instead of a wishlist.

The interactive widget on this page is the same habit. It's faster to build the idea than to describe it, and the build is the proof that I can.

Claude Code · v0 · Figma
Akt · Shipped
A workout tracking app I built and actually use every day. Real data model, real daily use, not a screenshot.
Bopo · Experiment
A smaller side build for pressure-testing an idea end to end before it earns roadmap space.