Case study: build a mini app with Hermes or OpenClaw, then audit it like a dev

This is not a report claiming “I ran OpenClaw/Hermes successfully and built a real app”. It is a lab plan you can use to run the case study. When you actually run the tool, replace the sample sections with evidence: machine, version, transcript summary, diff, screenshot, test result.

The reason to state this up front: with agent tooling, a fake case study is dangerous. An article that reads like a successful walkthrough can lead readers to copy an unverified setup onto a real machine.

Goal of this lab: build a sufficiently small mini app with Hermes or OpenClaw, then audit it like a developer. If the app runs, good. If it does not, the write-up still has value, because a failure is also evidence.

Lab scope

Proposed app: a static expense tracker.

No backend. No auth. No database. No external API. No deploy. No payment. No package install unless the agent explains why and you approve.

Feature slice:

add an expense with amount, category, note;
show the list of expenses;
compute the total;
validate amount > 0;
show an empty state;
keep data temporarily in browser state, or in localStorage if approved.

The app is small, but it is enough to test whether the agent understands state, UI, validation, and scope.
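
To make the slice auditable, it helps to pin down the expected behavior before the agent writes anything. The following is a minimal reference model, written in Python only to state the spec precisely; the app itself would be browser code. The names `Expense`, `Tracker`, `add`, `total`, and `is_empty` are hypothetical and are not something either tool produces.

```python
from dataclasses import dataclass, field
from decimal import Decimal, InvalidOperation


@dataclass
class Expense:
    amount: Decimal
    category: str
    note: str


@dataclass
class Tracker:
    expenses: list[Expense] = field(default_factory=list)

    def add(self, raw_amount: str, category: str, note: str = "") -> Expense:
        """Validate and store one expense; raise ValueError on bad input."""
        try:
            amount = Decimal(raw_amount.strip())
        except InvalidOperation:
            raise ValueError("amount must be a number") from None
        # Reject NaN/Infinity first, then enforce amount > 0.
        if not amount.is_finite() or amount <= 0:
            raise ValueError("amount must be greater than 0")
        expense = Expense(amount, category.strip(), note.strip())
        self.expenses.append(expense)
        return expense

    def total(self) -> Decimal:
        """Sum of all stored amounts; Decimal('0') when the list is empty."""
        return sum((e.amount for e in self.expenses), Decimal("0"))

    def is_empty(self) -> bool:
        """True while no expense exists, i.e. the UI should show the empty state."""
        return not self.expenses
```

During the audit, each manual test input (empty form, negative amount, valid amount) can be run through this model and compared with what the generated app does.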

Required setup fields

Before running, record:

Tool: Hermes Agent or OpenClaw
Tool version:
Install method:
Machine:
OS:
Runtime:
Model/provider:
Channel: CLI, Telegram, WebChat, or other
Workspace path:
Repo state:
Approval policy:
Network access:
File write boundary:

If any of these fields is missing, the case study cannot be audited.

Example:

Tool: Hermes Agent
Channel: CLI
Workspace path: ~/agent-labs/hermes-expense-tracker
Approval policy: ask before write and shell
Network access: disabled for app task
File write boundary: workspace only

Do not record tokens, sensitive account IDs, full home paths that contain a secret, or transcripts that contain credentials.
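
Both rules above (every field present, no secrets recorded) can be checked mechanically before publishing. The sketch below assumes the setup record is kept as plain `Field: value` lines like the example; `REQUIRED_FIELDS` mirrors the list in this section, while `SECRET_PATTERNS` and the function names are hypothetical heuristics that will produce false positives and negatives.

```python
import re

REQUIRED_FIELDS = [
    "Tool", "Tool version", "Install method", "Machine", "OS", "Runtime",
    "Model/provider", "Channel", "Workspace path", "Repo state",
    "Approval policy", "Network access", "File write boundary",
]

# Hypothetical heuristics for values that look like credentials.
SECRET_PATTERNS = [
    re.compile(r"(?i)\b(token|secret|password|api[_-]?key)\b\s*[=:]"),
    re.compile(r"\b[A-Za-z0-9_\-]{32,}\b"),  # long opaque strings
]


def parse_setup(text: str) -> dict[str, str]:
    """Parse 'Field: value' lines into a dict, ignoring unknown fields."""
    record = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip() in REQUIRED_FIELDS:
            record[name.strip()] = value.strip()
    return record


def audit_setup(text: str) -> dict[str, list[str]]:
    """Report required fields that are empty and values that look secret."""
    record = parse_setup(text)
    missing = [f for f in REQUIRED_FIELDS if not record.get(f)]
    suspicious = [
        f for f, v in record.items()
        if any(p.search(v) for p in SECRET_PATTERNS)
    ]
    return {"missing": missing, "suspicious": suspicious}
```

Run against the example record above, this would report fields such as `Tool version` and `OS` as missing, which is the point: the short example is illustrative, not a complete record.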

Initial app brief

First prompt:

We are in a disposable lab repo.

Build the first slice of a tiny expense tracker.

Requirements:
- Add one expense with amount, category, and note.
- Show expenses in a list.
- Show total amount.
- Validate amount must be greater than 0.
- Show an empty state before any expense exists.

Constraints:
- No backend.
- No auth.
- No external API.
- No deploy.
- No package install unless you explain why first and wait for approval.
- Before editing, list the files you plan to create or modify.
- Stop after the first working slice and summarize manual test steps.

If the agent does not restate the plan or list the files, ask again. If it edits immediately even though you told it to wait, record that failure in the transcript summary.

Transcript summary, not the full transcript

You do not need to paste the whole chat into the write-up. A summary that is detailed enough to audit is sufficient:

Turn 1:
- User asked for expense tracker slice.
- Agent proposed files: index.html, styles.css, app.js.
- Agent requested approval before editing.

Turn 2:
- User approved.
- Agent created three files.
- Agent did not install packages.

Turn 3:
- User reported validation bug.
- Agent inspected app.js and changed validation logic.

If the agent did something on its own initiative, state it plainly:

Unexpected behavior:
- Agent created package.json without approval.
- Agent changed README although not requested.

A good case study does not need a perfect agent. It needs honest evidence.

Review the diff like a dev

After each slice:

git status --short
git diff --stat
git diff

Audit questions:

Does the file list match the scope?
Are there new dependencies?
Was any secret or .env file created?
Are there network calls?
Is there dead or unnecessary generated code?
Is the state shape simple?
Does the validation actually run?
Is the error state clear?
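
Some of these questions can be pre-screened by scanning the added lines of the diff before reading it in full. This is a rough sketch, not a replacement for review: the `CHECKS` patterns and the `scan_diff` name are hypothetical, and the input is assumed to be unified diff text as printed by `git diff`.

```python
import re

# Hypothetical patterns; extend them for your own stack.
CHECKS = {
    "network call": re.compile(r"\bfetch\s*\(|XMLHttpRequest|WebSocket|sendBeacon"),
    "debug output": re.compile(r"\bconsole\.log\s*\("),
    "possible secret": re.compile(r"(?i)(api[_-]?key|secret|token)\s*[=:]\s*['\"]"),
}


def scan_diff(diff_text: str) -> list[tuple[str, str, str]]:
    """Return (file, check name, line) for each flagged added line."""
    findings = []
    current_file = "?"
    for line in diff_text.splitlines():
        if line.startswith("+++ "):
            # Header looks like '+++ b/app.js'; keep only the path.
            current_file = line[4:].removeprefix("b/")
            continue
        if not line.startswith("+"):
            continue  # context and removed lines are not new code
        added = line[1:]
        for name, pattern in CHECKS.items():
            if pattern.search(added):
                findings.append((current_file, name, added.strip()))
    return findings
```

One caveat worth checking in your own repo: files the agent created but that are still untracked do not appear in plain `git diff` output. Marking them with `git add -N .` (intent to add) first is one way to make them show up.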

If the agent created too many files, do not try to review them all. Stop and ask for a rollback, or revert to a checkpoint.
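
The "too many files" decision can be made explicit by comparing `git status --short` output with the file list the agent proposed. In this sketch, `ALLOWED_FILES` is taken from the sample transcript summary, and `MAX_CHANGED_FILES` is an arbitrary, hypothetical threshold you would tune yourself.

```python
ALLOWED_FILES = {"index.html", "styles.css", "app.js"}  # from the approved plan
MAX_CHANGED_FILES = 5  # hypothetical threshold for "too many to review"


def changed_paths(status_output: str) -> list[str]:
    """Extract paths from `git status --short` lines ('XY path')."""
    paths = []
    for line in status_output.splitlines():
        if len(line) < 4:
            continue
        path = line[3:]  # skip the two status letters and the space
        # Renames are shown as 'old -> new'; keep the new path.
        if " -> " in path:
            path = path.split(" -> ", 1)[1]
        paths.append(path.strip('"'))
    return paths


def scope_report(status_output: str) -> dict:
    """Summarize changed files, out-of-scope files, and the stop signal."""
    paths = changed_paths(status_output)
    out_of_scope = [p for p in paths if p not in ALLOWED_FILES]
    return {
        "changed": paths,
        "out_of_scope": out_of_scope,
        "stop_and_revert": len(paths) > MAX_CHANGED_FILES,
    }
```

The status text can be captured with `subprocess.run(["git", "status", "--short"], capture_output=True, text=True).stdout`. Any entry in `out_of_scope`, such as an unrequested `package.json`, belongs in the "Unexpected behavior" part of the transcript summary.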

Review the UI like a user

Manual test:

Open the app.
Confirm the empty state.
Submit an empty form.
Enter a negative amount.
Enter a valid amount.
Add several expenses.
Check the total.
Refresh, if the app uses localStorage.
Test a mobile viewport.

Record the result:

UI test result:
- Empty state: pass
- Negative amount validation: fail, form accepts -3
- Total calculation: pass for integer, not tested for decimal
- Mobile: layout overflows at 320px

Do not just write “works”. “Works” means nothing unless you say what was tested.
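
The "not tested for decimal" entry in the sample result matters more than it looks. A short worked example, assuming the app sums amounts as plain binary floats (which is what a JavaScript number is), shows why a decimal test case belongs in the manual test. The variable names are illustrative only.

```python
from decimal import Decimal

amounts = ["0.10", "0.20"]  # two valid decimal inputs from the form

# Binary floats, the same representation a plain JavaScript number uses.
float_total = sum(float(a) for a in amounts)

# Exact decimal arithmetic as the reference result.
decimal_total = sum((Decimal(a) for a in amounts), Decimal("0"))

# Integer cents, a common workaround in browser code.
cents_total = sum(int(Decimal(a) * 100) for a in amounts)

print(float_total == 0.3)  # False: the float total is 0.30000000000000004
print(decimal_total)       # 0.30
print(cents_total)         # 30
```

If the generated app displays the raw float, a total such as 0.30000000000000004 can reach the UI, so "Total calculation: pass" should name the inputs that were actually tried.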

Review data and auth

Because this lab has no backend or auth, the audit must confirm exactly that:

Data:
- No backend.
- No external API.
- No real user data.
- localStorage only, if used.

Auth:
- Not implemented.
- Not needed for this slice.
- Do not reuse this app as production finance tracker.

If the agent adds fake auth on its own, delete it. Fake auth misleads readers about what the app is.

If the agent uses sample data, label it clearly:

Sample data only. Not connected to real account or API.

What got reverted or rewritten manually

This is the most important section of the case study. AI-assisted builds usually need manual fixes. State them plainly.

Example:

Manual changes:
- Removed unused package.json because no build tool was needed.
- Rewrote amount parsing to handle decimals.
- Removed console.log left by the agent.
- Simplified CSS that caused mobile overflow.

If you changed nothing by hand, still record it:

Manual changes:
- None. Only reviewed diff and manual tested.

Do not turn the case study into an advertisement claiming the agent did everything. Readers need to know which parts the agent did and which parts a human fixed.

Verdict format

The verdict should be short and specific:

Verdict:
- Useful for prototype: yes.
- Safe for production: no.
- Main value: generated first UI/state slice quickly.
- Main risk: validation and mobile layout needed manual review.
- Next step: add tests or rewrite into real project structure before reuse.

If the tool fails:

Verdict:
- Useful for prototype: not yet.
- Failure reason: repeated tool-call loop after file write.
- Next step: retry with CLI, lower max turns, or switch model/provider.

A failure verdict is still worth publishing if the evidence is clear.

Wrap-up

A good vibe coding case study does not need to prove the agent is magical. It needs to prove that you know how to control the agent.

Record the setup. Keep the scope small. Summarize the transcript. Review the diff. Test the UI. Confirm data/auth. State what was reverted or rewritten. Conclude with a realistic verdict.

If you run this lab with Hermes or OpenClaw, the best outcome is not “the app runs”. The best outcome is that you know exactly how far the app works, where it breaks, and whether that diff can be trusted.