AI CTO Report — Coding Agents & Harness Engineering
EXEC_SUMMARY: 5 actionable insights. DATA_HEALTH: PARTIAL because Reddit=0, Facebook=0, X=24/30; 148 candidates were still scanned, which is enough of a basis for a controlled trial decision.
148 scanned | X 24 | YT 20 | GH 64
1. Executive Snapshot
- 64 GitHub repos → prioritize PoCs of OSS agent wrappers for NEXA/SYNCA.
- 20 YouTube video IDs → demand for Claude Code/Codex/Cursor enablement remains high.
- 30 HN/dev_web items → discourse leans toward reliability/eval.
- 10 arXiv papers; quota of 15 not met → benchmark confidence lowered by one level.
- 0 usable Reddit + 0 usable Facebook items → do not claim a PASS on social sentiment.
2. KOL/OG Feed Watch
X: 24/30 via public-web fallback; engagement N/A due to unauthenticated parsing. YouTube: 20 IDs, views N/A. Reddit/Facebook: blocked, 0 items. HN/GitHub: direct links.
| Source | Direct link | Metric | Timestamp |
|---|---|---|---|
| dev_web | What ClickHouse learned from a year of coding with AI agents | 2 pts / 0 comments | 2026-05-25T17:36:45Z |
| dev_web | Ask HN: What do you do at work while the coding agent is working? | 5 pts / 6 comments | 2026-05-25T16:55:30Z |
| dev_web | Show HN: Musts – Open-source validation loops for AI coding agents | 1 pts / 0 comments | 2026-05-25T16:44:39Z |
| dev_web | Is it too soon to built software factories? | 4 pts / 2 comments | 2026-05-25T16:39:32Z |
| dev_web | Close the Coding Agent Loop | 2 pts / 0 comments | 2026-05-25T13:36:17Z |
| dev_web | Show HN: Simple Sprite Sheet Generation | 3 pts / 0 comments | 2026-05-24T19:37:43Z |
| dev_web | Show HN: My first app, artisanally vibe-coded in 4 months | 3 pts / 4 comments | 2026-05-24T10:07:13Z |
| dev_web | Zero – Programming Language for Agents | 3 pts / 0 comments | 2026-05-23T11:13:35Z |
| dev_web | Show HN: opub, donated compute for open-source | 2 pts / 0 comments | 2026-05-21T14:59:15Z |
| dev_web | Zero: The Programming Language for Agents | 3 pts / 0 comments | 2026-05-19T20:19:46Z |
| dev_web | Show HN: GoPOSIX – a Go-native POSIX userland, ~97% BusyBox-compatible | 2 pts / 0 comments | 2026-05-20T04:31:50Z |
| dev_web | Implicit Knowledge Is a Liability | 1 pts / 0 comments | 2026-05-12T14:37:45Z |
| dev_web | Ask HN: Is agent-driven QA a thing? | 1 pts / 1 comments | 2026-05-08T22:57:31Z |
| dev_web | Ask HN: May be a basic question, but how can I use AI well? | 10 pts / 5 comments | 2026-04-19T08:42:37Z |
| dev_web | Launch HN: Kampala (YC W26) – Reverse-Engineer Apps into APIs | 100 pts / 83 comments | 2026-04-16T15:19:54Z |
| dev_web | Ask HN: Opus 4.7 – is anyone measuring the real token cost on agentic tasks? | 1 pts / 0 comments | 2026-04-16T20:19:18Z |
| dev_web | Show HN: Repowise – Codebase intelligence for AI coding agents (open source) | 1 pts / 0 comments | 2026-04-06T20:15:26Z |
| dev_web | Show HN: Salacia – The First Runtime OS for Agentic Coding | 1 pts / 1 comments | 2026-02-28T15:32:32Z |
| dev_web | Show HN: Tracecore: Benchmark AI Agents on Deterministic Coding Tasks | 1 pts / 0 comments | 2026-02-26T22:07:31Z |
| dev_web | Show HN: Frouter – Live-ping and auto-configure free AI models for coding agents | 1 pts / 0 comments | 2026-02-25T10:03:54Z |
| dev_web | ForgeCode: Top open source coding agent in Terminal-Bench 2.0 | 4 pts / 0 comments | 2026-04-29T18:16:23Z |
| dev_web | Show HN: OSS Agent I built topped the TerminalBench on Gemini-3-flash-preview | 393 pts / 148 comments | 2026-04-27T12:35:55Z |
| dev_web | Show HN: Amber, a capability-based runtime/compiler for agent benchmarks | 1 pts / 0 comments | 2026-04-13T07:48:11Z |
| dev_web | Claude Code ranks 39th on terminal bench. The leaked source shows why | 4 pts / 2 comments | 2026-04-01T12:59:36Z |
| dev_web | Show HN: Wozcode – double Claude Code output | 4 pts / 2 comments | 2026-03-31T19:07:11Z |
| dev_web | DAAF: Rigorous+responsible data analysis/research with Claude Code (open-source) | 1 pts / 0 comments | 2026-05-25T22:52:05Z |
| dev_web | Show HN: Unsiloed AI – #1 on olmOCR-Bench | 4 pts / 3 comments | 2026-05-25T21:35:03Z |
| dev_web | Show HN: AI skills for program / project / delivery managers | 1 pts / 0 comments | 2026-05-25T21:16:06Z |
| dev_web | Show HN: AWO – Run Claude and Codex in isolated Git worktrees | 1 pts / 0 comments | 2026-05-25T19:46:25Z |
| dev_web | Ask HN: How is all new software not broken? | 1 pts / 2 comments | 2026-05-25T19:03:02Z |
| github | gastownhall/gascity | 833 stars / 266 forks / 405 issues | 2026-05-25T23:05:23Z |
| github | JetBrains/junie | 261 stars / 14 forks / 35 issues | 2026-05-25T23:04:32Z |
| github | sashiko-dev/sashiko | 740 stars / 133 forks / 43 issues | 2026-05-25T23:04:02Z |
| github | zarazhangrui/beautiful-html-templates | 1966 stars / 188 forks / 0 issues | 2026-05-25T23:03:22Z |
| github | colbymchenry/codegraph | 24809 stars / 1371 forks / 172 issues | 2026-05-25T23:05:30Z |
| github | teng-lin/notebooklm-py | 15110 stars / 2077 forks / 2 issues | 2026-05-25T23:00:49Z |
| github | imbue-ai/mngr | 373 stars / 37 forks / 122 issues | 2026-05-25T23:00:34Z |
| github | prassanna-ravishankar/repowire | 94 stars / 22 forks / 2 issues | 2026-05-25T23:00:31Z |
| github | mochilang/mochi | 328 stars / 14 forks / 62 issues | 2026-05-25T22:45:33Z |
| github | vercel-labs/zerolang | 4505 stars / 285 forks / 109 issues | 2026-05-25T22:44:15Z |
3. Trend Radar
- Hot now: CLI coding agents + repo momentum (64 repos).
- Emerging: evaluation harnesses for reliability (30 dev_web/HN signals).
- Noise: demo videos lacking benchmarks (20 YT links, views N/A).
- Declining confidence: social sentiment, because Reddit/Facebook=0.
- Watchlist: SWE-bench/Terminal-Bench/product changelog; papers 10/15.
4. Repo / Paper / Product Watch
Repo Watch: GitHub 64 candidates. Paper Watch: arXiv 10/15. Product Watch: Claude Code, Codex CLI, Cursor Agent, Devin, OpenCode, Copilot, JetBrains AI, Replit Agent, Gemini CLI/Jules; all need direct changelog checks in the next round.
5. Impact Coverage
| Domain | 0-2w | 1-2m | 3-6m | Decision |
|---|---|---|---|---|
| FARE | pilot 2 squads | measure 10-15% cycle-time | governance gate | trial |
| NEXA | agent CLI playbook | eval harness baseline | internal platform | adopt |
| SYNCA | PoC test-gen | CI eval suite | regulated rollout | trial |
| Japan market | monitor compliance | JP case studies | SI offering | monitor |
| Global | track velocity | benchmark map | vendor strategy | monitor |
6. CTO Recommendations
- Set up an internal coding-agent eval harness: ROI/time saving 12-20%, risk 2/5, owner: Head of Engineering, TTV 2 weeks, validate with pass@task + defects.
- Trial Claude Code/Codex/Cursor on 2 squads: ROI 10-18%, risk 3/5, owner: EM, TTV 10 days, validate with cycle-time + PR rework.
- Create a prompt/data guardrail policy: ROI 5-8%, risk 2/5, owner: Security Lead, TTV 1 week, validate with secret-scan + audit log.
- Upgrade the social collectors to authenticated access: ROI 20-30% research confidence, risk 1/5, owner: CTO Ops, TTV 1 day, validate with X≥30, Reddit≥15, FB≥1.
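The validation metrics named in the recommendations above could be computed roughly as follows. This is a minimal sketch, not the team's actual harness: it assumes "pass@task" means the fraction of tasks whose checks passed, and that cycle time is compared by median between a baseline and a pilot sample. The function names, task names, and sample numbers are all hypothetical; only the 10% lower bound comes from the report.

```python
from statistics import median


def pass_rate(task_results):
    """Fraction of tasks whose checks passed (assumed reading of pass@task)."""
    if not task_results:
        raise ValueError("no task results to score")
    passed = sum(1 for ok in task_results.values() if ok)
    return passed / len(task_results)


def cycle_time_reduction_pct(baseline_hours, pilot_hours):
    """Percent drop in median cycle time from baseline to pilot."""
    base = median(baseline_hours)
    pilot = median(pilot_hours)
    return (base - pilot) / base * 100


# Hypothetical sample data, for illustration only.
task_results = {
    "fix-flaky-test": True,
    "add-endpoint": True,
    "refactor-module": False,
    "bump-dependency": True,
}
baseline_hours = [10, 20, 30]  # PR cycle times before the trial
pilot_hours = [8, 17, 30]      # PR cycle times during the trial

rate = pass_rate(task_results)
reduction = cycle_time_reduction_pct(baseline_hours, pilot_hours)

# 10% is the lower bound of the ranges quoted in the report.
meets_target = reduction >= 10.0

print(f"pass rate: {rate:.0%}")
print(f"median cycle-time reduction: {reduction:.1f}%")
print(f"meets 10% target: {meets_target}")
```

Median is used here rather than mean so that a single long-running PR does not dominate the comparison; that choice is an assumption and may not match how the squads report cycle time.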
7. Source Appendix / Blockers
DATA_HEALTH PARTIAL: source_volume PASS (148≥100); GitHub PASS; YouTube PASS; dev_web PASS; X FAIL 24/30; papers FAIL 10/15; Reddit FAIL 0/15; Facebook FAIL 0 usable. Blocker: unauthenticated public endpoints/quota/API.
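The DATA_HEALTH line above can be reproduced with a small quota check. The sketch below is illustrative only: the per-source counts and the minimums (100, 30, 15, 15, and FB≥1) are taken from this report, while the `SourceCheck` class and the overall rule (PASS if every check passes, PARTIAL if some do, FAIL if none do) are assumptions. GitHub, YouTube, and dev_web are omitted from the checks because the report does not state their minimums.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceCheck:
    """One quota check: collected items versus the required minimum."""
    name: str
    collected: int
    minimum: int

    @property
    def passed(self) -> bool:
        return self.collected >= self.minimum

    @property
    def label(self) -> str:
        return "PASS" if self.passed else f"FAIL {self.collected}/{self.minimum}"


def overall_status(checks):
    """Assumed rule: PASS if all pass, PARTIAL if some pass, FAIL if none."""
    passed = [c.passed for c in checks]
    if all(passed):
        return "PASS"
    if any(passed):
        return "PARTIAL"
    return "FAIL"


# Per-source counts as reported in this document.
collected = {
    "X": 24, "YouTube": 20, "GitHub": 64, "dev_web": 30,
    "arXiv": 10, "Reddit": 0, "Facebook": 0,
}
total = sum(collected.values())

# Minimums stated in the report; other sources have no stated minimum.
checks = [
    SourceCheck("source_volume", total, 100),
    SourceCheck("X", collected["X"], 30),
    SourceCheck("papers", collected["arXiv"], 15),
    SourceCheck("Reddit", collected["Reddit"], 15),
    SourceCheck("Facebook", collected["Facebook"], 1),
]

report = {c.name: c.label for c in checks}
status = overall_status(checks)

print(f"DATA_HEALTH {status}: total={total}")
for name, label in report.items():
    print(f"  {name}: {label}")
```

Summing the per-source counts also serves as a consistency check on the 148 headline figure.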
Manifest queries: coding agent, agentic programming, harness engineering AI, SWE-bench, Terminal-Bench, Claude Code, OpenAI Codex CLI, Cursor agent, OpenCode, AI coding workflow.