Daily LLM / Coding Agent Intelligence · Việt Nam/Japan/Global · 2026-05-26T06:08

AI CTO Report — Coding Agents & Harness Engineering

EXEC_SUMMARY: 5 actionable insights. DATA_HEALTH: PARTIAL because Reddit=0, Facebook=0, X=24/30; 148 candidates were still scanned, which is enough of a basis to decide on a controlled trial.

Scanned: 148 candidates (X 24, YouTube 20, GitHub 64, dev_web/HN 30, arXiv 10)

1. Executive Snapshot

64 GitHub repos → prioritize a PoC of OSS agent wrappers for NEXA/SYNCA.
20 YouTube video IDs → demand for Claude Code/Codex/Cursor enablement remains high.
30 HN/dev_web items → discourse leans toward reliability/eval.
10 arXiv papers; the quota of 15 was not met → benchmark confidence drops one level.
0 usable Reddit + 0 usable Facebook items → do not claim a PASS for social sentiment.


2. KOL/OG Feed Watch

X: 24/30 via public-web fallback; engagement N/A because parsing was unauthenticated. YouTube: 20 IDs, views N/A. Reddit/Facebook: blocked, 0 items. HN/GitHub: direct links.

Source	Direct link	Metric	Timestamp
dev_web	What ClickHouse learned from a year of coding with AI agents	2 pts / 0 comments	2026-05-25T17:36:45Z
dev_web	Ask HN: What do you do at work while the coding agent is working?	5 pts / 6 comments	2026-05-25T16:55:30Z
dev_web	Show HN: Musts – Open-source validation loops for AI coding agents	1 pts / 0 comments	2026-05-25T16:44:39Z
dev_web	Is it too soon to built software factories?	4 pts / 2 comments	2026-05-25T16:39:32Z
dev_web	Close the Coding Agent Loop	2 pts / 0 comments	2026-05-25T13:36:17Z
dev_web	Show HN: Simple Sprite Sheet Generation	3 pts / 0 comments	2026-05-24T19:37:43Z
dev_web	Show HN: My first app, artisanally vibe-coded in 4 months	3 pts / 4 comments	2026-05-24T10:07:13Z
dev_web	Zero – Programming Language for Agents	3 pts / 0 comments	2026-05-23T11:13:35Z
dev_web	Show HN: opub, donated compute for open-source	2 pts / 0 comments	2026-05-21T14:59:15Z
dev_web	Zero: The Programming Language for Agents	3 pts / 0 comments	2026-05-19T20:19:46Z
dev_web	Show HN: GoPOSIX – a Go-native POSIX userland, ~97% BusyBox-compatible	2 pts / 0 comments	2026-05-20T04:31:50Z
dev_web	Implicit Knowledge Is a Liability	1 pts / 0 comments	2026-05-12T14:37:45Z
dev_web	Ask HN: Is agent-driven QA a thing?	1 pts / 1 comments	2026-05-08T22:57:31Z
dev_web	Ask HN: May be a basic question, but how can I use AI well?	10 pts / 5 comments	2026-04-19T08:42:37Z
dev_web	Launch HN: Kampala (YC W26) – Reverse-Engineer Apps into APIs	100 pts / 83 comments	2026-04-16T15:19:54Z
dev_web	Ask HN: Opus 4.7 – is anyone measuring the real token cost on agentic tasks?	1 pts / 0 comments	2026-04-16T20:19:18Z
dev_web	Show HN: Repowise – Codebase intelligence for AI coding agents (open source)	1 pts / 0 comments	2026-04-06T20:15:26Z
dev_web	Show HN: Salacia – The First Runtime OS for Agentic Coding	1 pts / 1 comments	2026-02-28T15:32:32Z
dev_web	Show HN: Tracecore: Benchmark AI Agents on Deterministic Coding Tasks	1 pts / 0 comments	2026-02-26T22:07:31Z
dev_web	Show HN: Frouter – Live-ping and auto-configure free AI models for coding agents	1 pts / 0 comments	2026-02-25T10:03:54Z
dev_web	ForgeCode: Top open source coding agent in Terminal-Bench 2.0	4 pts / 0 comments	2026-04-29T18:16:23Z
dev_web	Show HN: OSS Agent I built topped the TerminalBench on Gemini-3-flash-preview	393 pts / 148 comments	2026-04-27T12:35:55Z
dev_web	Show HN: Amber, a capability-based runtime/compiler for agent benchmarks	1 pts / 0 comments	2026-04-13T07:48:11Z
dev_web	Claude Code ranks 39th on terminal bench. The leaked source shows why	4 pts / 2 comments	2026-04-01T12:59:36Z
dev_web	Show HN: Wozcode – double Claude Code output	4 pts / 2 comments	2026-03-31T19:07:11Z
dev_web	DAAF: Rigorous+responsible data analysis/research with Claude Code (open-source)	1 pts / 0 comments	2026-05-25T22:52:05Z
dev_web	Show HN: Unsiloed AI – #1 on olmOCR-Bench	4 pts / 3 comments	2026-05-25T21:35:03Z
dev_web	Show HN: AI skills for program / project / delivery managers	1 pts / 0 comments	2026-05-25T21:16:06Z
dev_web	Show HN: AWO – Run Claude and Codex in isolated Git worktrees	1 pts / 0 comments	2026-05-25T19:46:25Z
dev_web	Ask HN: How is all new software not broken?	1 pts / 2 comments	2026-05-25T19:03:02Z
github	gastownhall/gascity	833 stars / 266 forks / 405 issues	2026-05-25T23:05:23Z
github	JetBrains/junie	261 stars / 14 forks / 35 issues	2026-05-25T23:04:32Z
github	sashiko-dev/sashiko	740 stars / 133 forks / 43 issues	2026-05-25T23:04:02Z
github	zarazhangrui/beautiful-html-templates	1966 stars / 188 forks / 0 issues	2026-05-25T23:03:22Z
github	colbymchenry/codegraph	24809 stars / 1371 forks / 172 issues	2026-05-25T23:05:30Z
github	teng-lin/notebooklm-py	15110 stars / 2077 forks / 2 issues	2026-05-25T23:00:49Z
github	imbue-ai/mngr	373 stars / 37 forks / 122 issues	2026-05-25T23:00:34Z
github	prassanna-ravishankar/repowire	94 stars / 22 forks / 2 issues	2026-05-25T23:00:31Z
github	mochilang/mochi	328 stars / 14 forks / 62 issues	2026-05-25T22:45:33Z
github	vercel-labs/zerolang	4505 stars / 285 forks / 109 issues	2026-05-25T22:44:15Z

3. Trend Radar

Hot now: CLI coding agents + repo momentum (64 repos).
Emerging: evaluation harnesses for reliability (30 dev_web/HN signals).
Noise: demo videos without benchmarks (20 YT links, views N/A).
Declining confidence: social sentiment, because Reddit/Facebook=0.
Watchlist: SWE-bench/Terminal-Bench/product changelog; papers 10/15.

4. Repo / Paper / Product Watch

Repo Watch: GitHub 64 candidates; Paper Watch: arXiv 10/15; Product Watch: Claude Code, Codex CLI, Cursor Agent, Devin, OpenCode, Copilot, JetBrains AI, Replit Agent, and Gemini CLI/Jules need direct changelog checks in the next round.

5. Impact Coverage

Domain	0-2 weeks	1-2 months	3-6 months	Decision
FARE	pilot 2 squads	measure 10-15% cycle-time	governance gate	trial
NEXA	agent CLI playbook	eval harness baseline	internal platform	adopt
SYNCA	PoC test-gen	CI eval suite	regulated rollout	trial
Japan market	monitor compliance	JP case studies	SI offering	monitor
Global	track velocity	benchmark map	vendor strategy	monitor

6. CTO Recommendations

Build an internal coding-agent eval harness: ROI/time saving 12-20%, risk 2/5, owner: Head of Engineering, TTV 2 weeks, validate with pass@task + defects.
Trial Claude Code/Codex/Cursor on 2 squads: ROI 10-18%, risk 3/5, owner: EM, TTV 10 days, validate with cycle-time + PR rework.
Create a prompt/data guardrail policy: ROI 5-8%, risk 2/5, owner: Security Lead, TTV 1 week, validate with secret-scan + audit log.
Upgrade the social collectors to authenticated access: ROI 20-30% research confidence, risk 1/5, owner: CTO Ops, TTV 1 day, validate with X≥30, Reddit≥15, FB≥1.
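
The first recommendation validates the eval harness with pass@task + defects. The report does not define pass@task, so the sketch below assumes it means the share of tasks whose agent-produced change passes that task's checks, and it treats defects as a per-task count. The `TaskRun` type, the `summarize` helper, and the four sample runs are hypothetical illustrations, not measured results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskRun:
    task_id: str
    passed: bool   # assumption: True when the agent's change passes the task's checks
    defects: int   # assumption: defects later attributed to that change


def summarize(runs):
    """Return pass@task (share of passing tasks) and mean defects per task."""
    if not runs:
        raise ValueError("at least one task run is required")
    n = len(runs)
    return {
        "tasks": n,
        "pass_at_task": sum(r.passed for r in runs) / n,
        "defects_per_task": sum(r.defects for r in runs) / n,
    }


# Illustrative runs only; these numbers are made up for the example.
SAMPLE_RUNS = [
    TaskRun("task-1", True, 0),
    TaskRun("task-2", True, 1),
    TaskRun("task-3", False, 0),
    TaskRun("task-4", True, 2),
]

summary = summarize(SAMPLE_RUNS)
print(summary)
```

Running the same summary on a baseline squad and a trial squad would give a like-for-like comparison for the 2-squad trial, provided both use the same task set.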

7. Source Appendix / Blockers

DATA_HEALTH PARTIAL: source_volume PASS (148≥100); GitHub PASS; YouTube PASS; dev_web PASS; X FAIL 24/30; papers FAIL 10/15; Reddit FAIL 0/15; Facebook FAIL 0 usable. Blocker: unauthenticated public endpoints, quota limits, and API access.
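
The gate above can be sketched as a small quota check. The counts and the X, papers, Reddit, Facebook, and source_volume thresholds are taken from this report. The GitHub, YouTube, and dev_web quotas are not stated, so the value 1 is a placeholder. The rule for the overall status (FAIL if total volume is short, PARTIAL if any single source fails) is an assumption about how the collector decides, and `SourceCheck` and `evaluate` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceCheck:
    name: str
    count: int   # items collected in this run
    quota: int   # minimum items required for PASS


# Counts come from the report. Quotas for GitHub, YouTube and dev_web are
# NOT stated in the report; 1 is a placeholder assumption.
CHECKS = [
    SourceCheck("GitHub", 64, 1),
    SourceCheck("YouTube", 20, 1),
    SourceCheck("dev_web", 30, 1),
    SourceCheck("X", 24, 30),
    SourceCheck("papers", 10, 15),
    SourceCheck("Reddit", 0, 15),
    SourceCheck("Facebook", 0, 1),
]
MIN_TOTAL = 100  # source_volume threshold quoted in the report (148 >= 100)


def evaluate(checks, min_total=MIN_TOTAL):
    """Return (total items, overall status, per-source PASS/FAIL map)."""
    total = sum(c.count for c in checks)
    results = {c.name: "PASS" if c.count >= c.quota else "FAIL" for c in checks}
    results["source_volume"] = "PASS" if total >= min_total else "FAIL"
    if results["source_volume"] == "FAIL":
        overall = "FAIL"      # assumed: too little data to report at all
    elif all(status == "PASS" for status in results.values()):
        overall = "PASS"
    else:
        overall = "PARTIAL"   # assumed: enough volume, but some source missed quota
    return total, overall, results


total, overall, results = evaluate(CHECKS)
failed = sorted(name for name, status in results.items() if status == "FAIL")
print(total, overall, failed)
```

Keeping the quotas in one list makes the validation target from the recommendations (X≥30, Reddit≥15, FB≥1) directly checkable after the collector upgrade.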

Manifest queries: coding agent, agentic programming, harness engineering AI, SWE-bench, Terminal-Bench, Claude Code, OpenAI Codex CLI, Cursor agent, OpenCode, AI coding workflow.