
Claude for Chrome

(www.anthropic.com)
795 points by davidbarker | 1 comment
1. shardullavekar No.45037163
I've been building a general browser agent myself, and I've found the biggest bottleneck in these systems isn't the capability you see in demos but long-running reliability.

Tools like Manus / GPT Agent Mode / BrowserUse / Claude’s Chrome control typically make an LLM call per action/decision. That piles up latency, cost, and fragility as the DOM shifts, sessions expire, and sites rate-limit. Eventually you hit prompt-injection landmines or lose context and the run stalls.
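
For contrast, the per-step pattern boils down to something like this. A rough sketch using Playwright, where llm_decide() is a stand-in for whatever model API a given agent calls, not any vendor's actual code:

    from playwright.sync_api import sync_playwright

    def llm_decide(goal: str, dom: str) -> dict:
        """Stub: a real agent would send goal + DOM to a model and parse an action."""
        return {"type": "done"}

    def run_per_step_agent(goal: str, start_url: str, max_steps: int = 200) -> None:
        # One model call per action: latency and cost accrue on every step.
        with sync_playwright() as p:
            page = p.chromium.launch().new_page()
            page.goto(start_url)
            for _ in range(max_steps):
                action = llm_decide(goal, page.content())  # ships the full DOM each step
                if action["type"] == "done":
                    break
                page.click(action["selector"])  # brittle when the DOM shifts mid-run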

I'm approaching browser agents differently: record once, replay fast. We capture HTML snapshots + click targets + short voice notes to build a deterministic plan, then use an LLM only for rare ambiguities or recovery. That makes multi-hour jobs feasible; there's a sketch of the replay loop after the list below. Concretely, users run things like:

Recruiter sourcing for hours at a stretch

SEO crawls: gather metadata → update internal dashboard → email a report

Bulk LinkedIn connection flows with lightweight personalization

Even long web-testing runs
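
To make "deterministic plan, LLM only for recovery" concrete, here's a minimal sketch of the replay loop. The plan format and recover_with_llm() are illustrative stand-ins, not 100x.bot's actual schema:

    import json
    from playwright.sync_api import sync_playwright, TimeoutError as PWTimeout

    def recover_with_llm(page, step: dict) -> None:
        """Stub: a real system would ask a model to re-locate the target and retry."""
        raise RuntimeError(f"recovery needed for step {step}")

    def replay(plan_path: str) -> None:
        with open(plan_path) as f:
            steps = json.load(f)  # e.g. [{"action": "click", "selector": "#next"}, ...]
        with sync_playwright() as p:
            page = p.chromium.launch().new_page()
            for step in steps:
                try:
                    if step["action"] == "goto":
                        page.goto(step["value"])
                    elif step["action"] == "click":
                        page.click(step["selector"], timeout=5000)
                    elif step["action"] == "fill":
                        page.fill(step["selector"], step["value"], timeout=5000)
                except PWTimeout:
                    recover_with_llm(page, step)  # rare path: only now pay for a model call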

A stress test I like (can share code/method): “Find 100+ GitHub profiles in Bangalore strong in Python + Java, extract links + metadata, and dedupe.” Most per-step-LLM agents drift or stall after a few minutes due to DOM churn, pagination loops, or rate limits. A record→replay plan with checkpoints + idempotent steps tends to survive.
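
The checkpoints + idempotent steps are what make that survivable. A minimal sketch of the idea, assuming SQLite and a URL hash as the dedupe key (my choices for illustration, not a specific product's):

    import hashlib
    import sqlite3

    db = sqlite3.connect("checkpoint.db")
    db.execute("CREATE TABLE IF NOT EXISTS profiles (key TEXT PRIMARY KEY, url TEXT, meta TEXT)")

    def save_profile(url: str, meta: str) -> bool:
        # Each profile is keyed by a content hash, so a crashed run can restart
        # from the checkpoint DB without writing duplicates.
        key = hashlib.sha256(url.encode()).hexdigest()  # stable dedupe key
        try:
            db.execute("INSERT INTO profiles VALUES (?, ?, ?)", (key, url, meta))
            db.commit()
            return True   # new profile recorded
        except sqlite3.IntegrityError:
            return False  # already seen: re-running the step is a no-op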

I’d benchmark on (harness sketch after the list):

Throughput over time (actions/min sustained for 30–60+ mins)

End-to-end success rate on multi-page flows with infinite scroll/pagination

Resume semantics (crash → restart without duplicates)

Selector robustness (resilient to minor DOM changes)

Cost per 1,000 actions
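
For the harness itself, something bare-bones covers the first two metrics. Here run_flow() and its (succeeded, action_count) return shape are my assumptions about whatever agent is under test:

    import time

    def benchmark(run_flow, flows, window_s: int = 1800):
        # Sustained throughput (actions/min) and end-to-end success rate
        # over a fixed wall-clock window.
        start = time.monotonic()
        actions = attempted = succeeded = 0
        for flow in flows:
            if time.monotonic() - start > window_s:
                break
            ok, n = run_flow(flow)
            attempted += 1
            actions += n
            succeeded += ok
        mins = max((time.monotonic() - start) / 60, 1e-9)
        print(f"{actions / mins:.1f} actions/min sustained, "
              f"{succeeded}/{attempted} flows succeeded")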

Disclosure: I am the founder of 100x.bot (record-to-agent, long-run reliability focus). I’m putting together a public benchmark with the scenario above + a few gnarlier ones (auth walls, rate-limit backoff, content hashing for dedupe). If there’s interest, I can post the methodology and harness here so results are apples-to-apples.