Blog | nghia-pham.dev

How LLMs work: a mental model for devs

You type a question into ChatGPT and get an answer three seconds later. What happens in between? This post opens the black box: tokenize, embed, attention, sample. There are no math formulas, only a mental model for developers who are comfortable with code but are reading about LLMs in depth for the first time.

Apr 22, 2026 ~10 min read

llm ai machine-learning transformer tutorial
Calculus for LLMs: gradients, the chain rule, and backprop intuition

Derivatives sound scary, but at their core they just measure slope. A gradient collects the derivatives of a multivariable function (one per input) into a single vector. The chain rule is how gradients are passed backward through many layers. Backprop is the chain rule applied systematically. This post builds intuition for developers rather than working through math exercises.

Apr 22, 2026 ~10 min read

llm ai machine-learning math calculus
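To make the chain-rule idea in this teaser concrete, here is a minimal sketch in plain Python. It assumes a hypothetical one-neuron model `y = sigmoid(w*x + b)` with a squared-error loss; the function names (`forward`, `backward`, `numeric_grad`) are illustrative and not taken from the post. The hand-derived gradient is compared against a finite-difference estimate, a common sanity check for backprop code.

```python
import math


def forward(w, b, x, target):
    """One neuron: z = w*x + b, y = sigmoid(z), loss = (y - target)^2."""
    z = w * x + b
    y = 1.0 / (1.0 + math.exp(-z))
    loss = (y - target) ** 2
    return z, y, loss


def backward(w, b, x, target):
    """Gradients of the loss w.r.t. w and b, obtained with the chain rule."""
    _, y, _ = forward(w, b, x, target)
    dloss_dy = 2.0 * (y - target)   # d/dy of (y - target)^2
    dy_dz = y * (1.0 - y)           # derivative of the sigmoid
    dloss_dz = dloss_dy * dy_dz     # chain rule: pass the gradient back one step
    dloss_dw = dloss_dz * x         # dz/dw = x
    dloss_db = dloss_dz             # dz/db = 1
    return dloss_dw, dloss_db


def numeric_grad(w, b, x, target, eps=1e-6):
    """Central finite differences, used only to check backward()."""
    def loss(w_, b_):
        return forward(w_, b_, x, target)[2]
    dw = (loss(w + eps, b) - loss(w - eps, b)) / (2 * eps)
    db = (loss(w, b + eps) - loss(w, b - eps)) / (2 * eps)
    return dw, db


if __name__ == "__main__":
    w, b, x, target = 0.5, -0.2, 1.5, 1.0
    print("chain rule:", backward(w, b, x, target))
    print("numerical :", numeric_grad(w, b, x, target))
    # one small gradient-descent step should lower the loss
    gw, gb = backward(w, b, x, target)
    lr = 0.1
    print("loss before:", forward(w, b, x, target)[2])
    print("loss after :", forward(w - lr * gw, b - lr * gb, x, target)[2])
```

In a real network the same pattern repeats layer by layer: each layer multiplies the incoming gradient by its own local derivative and passes the result further back.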
LLMs from zero: Series Plan

A 30-lesson roadmap for learning LLMs, from foundational math to production deployment, for senior developers who want to pivot to AI: mental models, tokenization, attention, training, fine-tuning, inference, and advanced topics. Hybrid approach: 70% hands-on code, 30% blog posts.

Apr 22, 2026 ~6 min read

llm ai machine-learning series learning-path
Linear algebra for LLMs: vectors, matrices, dot products

Lesson 1 said that everything inside an LLM is vectors and matrices. What is a vector? What is a matrix? Why is the dot product the backbone of attention and RAG? This post breaks the ice on the math foundations for developers: just 4 concepts and no complicated formulas.

Apr 22, 2026 ~13 min read

llm ai machine-learning math linear-algebra
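A minimal sketch of the four ideas in this teaser, in plain Python rather than NumPy so it runs without dependencies. Attention scores are dot products between query and key vectors, and RAG retrieval typically ranks documents by (cosine) similarity between embedding vectors; the toy 3-dimensional "embeddings" below are made up for illustration, and the helper names are hypothetical.

```python
import math


def dot(a, b):
    """Dot product: multiply element-wise, then sum."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))


def matvec(matrix, vector):
    """Matrix-vector product: one dot product per row of the matrix."""
    return [dot(row, vector) for row in matrix]


def cosine_similarity(a, b):
    """Dot product of the two vectors after scaling each to length 1."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def rank_by_similarity(query, docs):
    """Return document names sorted from most to least similar to the query."""
    return sorted(docs, key=lambda name: cosine_similarity(query, docs[name]),
                  reverse=True)


if __name__ == "__main__":
    print(dot([1, 2, 3], [4, 5, 6]))             # 32
    print(matvec([[1, 2], [3, 4]], [1, 1]))      # [3, 7]
    # toy embeddings, invented for this example
    docs = {"cat": [0.9, 0.1, 0.8], "car": [0.0, 1.0, 0.1]}
    print(rank_by_similarity([1.0, 0.0, 1.0], docs))  # ['cat', 'car']
```

Cosine similarity is used here so that vector length does not dominate the ranking; with normalized embeddings it reduces to a plain dot product.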
A minimal neural network: perceptron and MLP from zero

Combine linear algebra, calculus, and probability into your first neural network. From the 1957 perceptron to multi-layer MLPs, with 60 lines of NumPy that learn XOR without PyTorch. After this post you will understand the basic building block of every modern LLM.

Apr 22, 2026 ~12 min read

llm ai machine-learning neural-network perceptron
Probability for LLMs: softmax, cross-entropy, perplexity

An LLM outputs probabilities, not hard choices. Softmax turns logits into a probability distribution. Cross-entropy is the standard loss function. Perplexity is the standard metric for evaluating a model. This post explains why these concepts are the heart of training and evaluation, with illustrative NumPy code.

Apr 22, 2026 ~11 min read

llm ai machine-learning math probability
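A minimal sketch of the three concepts in this teaser, in plain Python instead of NumPy so it has no dependencies; the function names are illustrative. Cross-entropy here is the negative log-probability of the correct next token (natural log), and perplexity is the exponential of the average cross-entropy, so a model that is uniform over V tokens has perplexity V.

```python
import math


def softmax(logits):
    """Turn raw scores (logits) into probabilities that sum to 1."""
    m = max(logits)                      # subtract the max so exp() cannot overflow
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target_index):
    """-log p(target), computed with log-sum-exp for numerical stability."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target_index]


def perplexity(losses):
    """exp of the mean per-token cross-entropy (natural log)."""
    return math.exp(sum(losses) / len(losses))


if __name__ == "__main__":
    logits = [2.0, 1.0, 0.1]
    print(softmax(logits))
    print(cross_entropy(logits, 0))      # small loss: token 0 has the highest score
    print(cross_entropy(logits, 2))      # larger loss: token 2 was unlikely
    # a model that is uniform over 4 tokens has perplexity 4
    uniform = [0.0, 0.0, 0.0, 0.0]
    print(perplexity([cross_entropy(uniform, i) for i in range(4)]))
```

Computing cross-entropy from logits with log-sum-exp, instead of taking `log(softmax(...))`, avoids `log(0)` when one probability underflows.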
Build a BPE tokenizer from scratch (following Karpathy's minbpe)

Lesson 6 introduced BPE. This post codes it from zero: 150 lines of pure Python with no dependencies. Train a tokenizer on Shakespeare, encode/decode, and visualize the merges. Afterward you will understand BPE completely instead of only knowing the abstract version from the paper.

Apr 22, 2026 ~12 min read

llm ai tokenization bpe python
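A compressed sketch of the BPE training loop this teaser describes, written from the general algorithm rather than copied from minbpe: start from UTF-8 bytes (ids 0..255), repeatedly merge the most frequent adjacent pair into a new id, and replay the learned merges in order to encode. The function names are hypothetical; breaking ties by first occurrence is an assumption, and the regex pre-splitting that production tokenizers use is omitted.

```python
from collections import Counter


def count_pairs(ids):
    """Count every adjacent pair of token ids."""
    return Counter(zip(ids, ids[1:]))


def merge_pair(ids, pair, new_id):
    """Replace non-overlapping occurrences of `pair` (left to right) with `new_id`."""
    out, i = [], 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out


def train_bpe(text, num_merges):
    """Learn up to `num_merges` merges; returns {(left, right): new_id} in learned order."""
    ids = list(text.encode("utf-8"))      # base vocabulary: the 256 byte values
    merges = {}
    for k in range(num_merges):
        pairs = count_pairs(ids)
        if not pairs:
            break
        pair = max(pairs, key=pairs.get)  # most frequent pair; ties go to the first seen
        new_id = 256 + k
        ids = merge_pair(ids, pair, new_id)
        merges[pair] = new_id
    return merges


def encode(text, merges):
    """Replay the merges in the order they were learned."""
    ids = list(text.encode("utf-8"))
    for pair, new_id in merges.items():
        ids = merge_pair(ids, pair, new_id)
    return ids


def decode(ids, merges):
    """Expand every id back to its bytes, then decode as UTF-8."""
    vocab = {i: bytes([i]) for i in range(256)}
    for (left, right), new_id in merges.items():
        vocab[new_id] = vocab[left] + vocab[right]
    return b"".join(vocab[i] for i in ids).decode("utf-8", errors="replace")


if __name__ == "__main__":
    text = "aaabdaaabac"
    merges = train_bpe(text, 3)
    ids = encode(text, merges)
    print(ids)                           # [258, 100, 258, 97, 99]
    print(decode(ids, merges) == text)   # True
```

Because merges operate on bytes, any UTF-8 text (including Vietnamese with diacritics) round-trips exactly, even if a multi-byte character never gets merged into a single token.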
Attention mechanism: Query, Key, Value intuition

The paper 'Attention Is All You Need' (2017) was the breakthrough moment for the Transformer. But where do Q, K, and V come from, what do they mean, and why three instead of one? This post explains them with a library analogy and no formulas, building intuition before the code in lesson 10.

Apr 22, 2026 ~11 min read

llm ai attention transformer qkv
Embeddings: word2vec, contextual embeddings, and positional encoding (RoPE)

A token ID becomes a vector: that is an embedding. But where does that vector come from? word2vec (2013) taught models to capture semantic meaning. How do contextual embeddings (BERT/GPT) differ from word2vec? Why do we also need positional encoding, and how does RoPE provide it?

Apr 22, 2026 ~11 min read

llm ai embeddings word2vec rope
Multi-head attention: why split into multiple heads

Lesson 10 coded single-head attention. Large GPT and Llama models use 32 to 128 heads. Why split? What does each head do differently? How much extra does it cost? This post covers the intuition plus multi-head code in NumPy, and visualizes head specialization (syntax, coreference, long-range dependencies).

Apr 22, 2026 ~13 min read

llm ai attention multi-head transformer
nanoGPT: 300 lines of PyTorch that recreate GPT from scratch

The capstone for Part 3. Karpathy's nanoGPT is a complete GPT-2 implementation in ~300 lines of PyTorch. This post walks through the code, trains a small GPT on Shakespeare in 15 minutes on a CPU, and generates text. After this post you can code a small GPT-2 without HuggingFace.

Apr 22, 2026 ~12 min read

llm ai gpt pytorch nanogpt
Self-attention: coding it from scratch in NumPy

Lesson 9 built the QKV intuition. This post codes a complete self-attention layer from zero in pure NumPy: 80 lines with batching, a causal mask, and scaling. We verify that the output matches the PyTorch implementation. After this post, attention is no longer a black box.

Apr 22, 2026 ~10 min read

llm ai attention self-attention numpy
Transformer block: attention + MLP + layer norm + residual

Multi-head attention is half of a Transformer. The other half: the MLP (feed-forward), layer normalization, and residual connections. This post combines the 4 components into one complete block, stacks 12 blocks into GPT-2 (small), and explains the ordering (pre-norm vs post-norm) and why residual connections matter.

Apr 22, 2026 ~13 min read

llm ai transformer mlp layer-norm
Tokenization: BPE, WordPiece, SentencePiece

Lesson 1 said input text is turned into tokens. But how exactly? How do BPE, WordPiece, and SentencePiece differ? Why does the tokenizer matter more than you think, from API cost to how well a model handles Vietnamese? A deep dive for developers.

Apr 22, 2026 ~14 min read

llm ai machine-learning tokenization bpe
AI Coding Providers Series: Choosing the right plan for your workload

A series that researches and compares AI coding plans (subscriptions and pay-per-token APIs) from Anthropic, Alibaba, GLM, Moonshot, and OpenAI, to help developers pick the right provider for their budget and real-world workflow.

Apr 21, 2026 ~1 min read

ai llm coding pricing comparison
Which AI Coding Plan should you buy? Research on 5 major providers (2026-04)

A detailed comparison of subscription plans and pay-per-token API pricing from Anthropic, Alibaba, GLM, Moonshot, and OpenAI as of April 2026, with a decision framework and warnings about billing pitfalls.

Apr 21, 2026 ~11 min read

ai llm coding pricing comparison
Does Vietnamese cost 2x more tokens? The data says otherwise

A benchmark on 5626 real prompts from 555 Claude Code sessions. The claim that "Vietnamese costs more than 2x the tokens" holds for only 2.9% of use cases. Most of the time, mixed Vietnamese-English prompts are even cheaper than pure English, and the data shows why.

Apr 21, 2026 ~14 min read

llm prompt-engineering token-optimization benchmark
Does Vietnamese really cost 2x+ tokens in LLM prompts? Data from 5626 real messages

Benchmark across 5626 real prompts from 555 Claude Code sessions shows the '>2x token' claim for Vietnamese only applies to 2.9% of actual usage. Mixed Vietnamese-English prompts are more token-efficient than pure English on longer messages, and the data shows why.

Apr 21, 2026 ~13 min read

llm prompt-engineering token-optimization benchmark
Canvas: building branded reports for stakeholders

Use Kibana Canvas to build pixel-precise infographics with your company's branding: how it differs from Dashboard, the expression language pipeline, the ESSQL data source, images and colors that change dynamically with values, and multi-page PDF export for the CEO/CFO. For backend developers and platform teams.

Apr 16, 2026 ~8 min read

kibana canvas reporting essql visualization
Advanced Discover: Runtime fields, complex filters, highlighting

Take Discover from the basics to power-user level: create Runtime fields without reindexing, filter nested objects and use regex, enable highlighting to scan logs quickly, tell Saved Query apart from Saved Search, and inspect requests to debug queries and tune performance.

Apr 16, 2026 ~8 min read

kibana discover runtime-fields painless elasticsearch
KQL and ES|QL: Comparing Kibana's two query languages

How KQL and ES|QL differ in Kibana 8.x: their design philosophies, side-by-side syntax, common pitfalls, and rules of thumb for which language to use for filters, aggregations, alerts, and dashboards. For backend developers and DevOps.

Apr 16, 2026 ~10 min read

kibana elasticsearch kql esql query-language
Lens: from drag-and-drop to complex formulas

Build visualizations in Kibana 8.x with Lens: basic drag-and-drop charts, Formula mode with functions and time shift, annotation layers for deploy markers, reference lines for SLOs, and pitfalls around cardinality and time intervals. For backend developers who want to build production-grade dashboards themselves.

Apr 16, 2026 ~8 min read

kibana lens visualization dashboard formula
Kibana for Developers: Filtering logs, Saved Searches, Dashboards, and the REST API

A comprehensive guide to Kibana for backend developers: filter error logs with KQL, avoid ES|QL pitfalls, create Saved Searches and Dashboards through the GUI, work with Kibana via the REST API, and manage API keys securely.

Apr 15, 2026 ~11 min read

kibana elasticsearch logging elk observability
Kibana from A to Z: Series Plan

A 28-lesson roadmap for Kibana, from the basics to production: Discover, KQL/ES|QL, Lens, Dashboards, Alerts, RBAC, ILM, automation, and troubleshooting for backend developers.

Apr 15, 2026 ~5 min read

kibana series learning-path elk observability
Backstage on Kubernetes: Practical Platform Engineering Guide

Implement a practical Internal Developer Platform with Backstage on Kubernetes, software templates, service catalog, and golden paths for engineering teams.

~2 min read

backstage kubernetes platform-engineering idp developer-experience
ArgoCD Advanced Patterns: App of Apps and Promotion Flows

Implement advanced ArgoCD patterns for scalable GitOps: App of Apps, environment promotion, sync waves, and safe progressive delivery workflows.

~2 min read

argocd gitops kubernetes progressive-delivery cicd
[24/24] E is for Etcd: Understanding the Brain of Kubernetes

A deep dive into etcd, the distributed key-value store that powers Kubernetes. Learn about consistency, high availability, and backup strategies.

~2 min read

kubernetes a-to-z-series etcd database distributed-systems
[23/24] B is for Best Practices: Building Secure and Reliable Apps

The twenty-third post in our Kubernetes A-to-Z series covering essential best practices for security, reliability, and resource management.

~3 min read

kubernetes a-to-z-series best-practices security reliability
[19/24] A is for Authentication and RBAC: Securing Your Cluster

The nineteenth post in our Kubernetes A-to-Z series covering authentication mechanisms, Role-Based Access Control, security contexts, and cluster security best practices.

~6 min read

kubernetes a-to-z-series authentication rbac security
[4/24] D is for Deployments: Managing Application Lifecycle

The fourth post in our Kubernetes A-to-Z series covering Deployments, rolling updates, rollbacks, and application lifecycle management strategies.

~7 min read

kubernetes a-to-z-series deployments rolling-updates rollbacks
[2/24] C is for Containers: Docker Fundamentals Before Kubernetes

The second post in our Kubernetes A-to-Z series covering container fundamentals, Docker basics, and essential concepts needed before learning Kubernetes.

~8 min read

docker containers a-to-z-series kubernetes fundamentals
[20/24] F is for Federation: Multi-Cluster Management

The twentieth post in our Kubernetes A-to-Z series covering multi-cluster architectures, federation patterns, service mesh, disaster recovery, and cross-cluster communication.

~6 min read

kubernetes a-to-z-series federation multi-cluster service-mesh
[22/24] G is for GitOps: Modern Deployment Workflows

A comprehensive guide to GitOps principles and practices, comparing ArgoCD and FluxCD with practical examples, deployment strategies, and production best practices.

~10 min read

gitops argocd fluxcd kubernetes ci-cd
Building Internal Developer Platforms on Kubernetes: A Comprehensive Guide

Learn how to build an Internal Developer Platform (IDP) on Kubernetes with Backstage, self-service capabilities, golden paths, and platform engineering best practices.

~12 min read

platform-engineering kubernetes backstage developer-experience devops
[11/24] I is for Ingress: Managing External Access

The eleventh post in our Kubernetes A-to-Z series covering Ingress controllers, routing rules, TLS termination, and advanced traffic management patterns.

~6 min read

kubernetes a-to-z-series ingress networking tls
[1/24] K is for Kubernetes: Understanding the Basics and Architecture

The first post in our Kubernetes A-to-Z series covering Kubernetes fundamentals, architecture, components, and basic cluster setup.

~7 min read

kubernetes a-to-z-series architecture basics tutorial
[7/24] J is for Jobs and CronJobs: Batch Processing in Kubernetes

Learn how to run one-off tasks and scheduled batch jobs in Kubernetes using Jobs and CronJobs resources.

~2 min read

kubernetes a-to-z-series jobs cronjobs batch-processing
Kafka Partition Design for IoT: Throughput and Ordering

Design Kafka topic and partition strategy for IoT workloads with practical guidance on throughput, ordering, consumer scaling, and operational limits.

~3 min read

kafka iot streaming partitions architecture
Kubernetes Backup and Disaster Recovery: Velero and etcd

Design a practical backup and disaster recovery strategy for Kubernetes with etcd snapshots, Velero, restore drills, and RTO/RPO planning.

~2 min read

kubernetes disaster-recovery backup velero etcd
[12/24] H is for Helm: Package Management for Kubernetes

The twelfth post in our Kubernetes A-to-Z series covering Helm charts, repositories, templating, values, and application lifecycle management.

~7 min read

kubernetes a-to-z-series helm package-management charts
Kubernetes Multi-Tenancy: Namespace, RBAC, and Quota Design

Design a practical multi-tenant Kubernetes model with namespace boundaries, RBAC, network isolation, quotas, and operational guardrails.

~2 min read

kubernetes multi-tenancy rbac namespace resourcequota
Kubernetes Cost Optimization in Production

A practical guide to reducing Kubernetes infrastructure spend with right-sizing, autoscaling, scheduling strategy, and workload-level optimization.

~3 min read

kubernetes finops cost-optimization autoscaling performance
Kubernetes Security Hardening Checklist for Production

A practical security hardening checklist for production Kubernetes clusters, covering identity, network, workloads, supply chain, and runtime controls.

~3 min read

kubernetes security hardening rbac networkpolicy
Kubernetes A-to-Z Series: Complete Learning Path

A comprehensive 24-part blog series covering Kubernetes from beginner to advanced level, with practical examples and real-world scenarios.

~5 min read

kubernetes series learning-path devops containers
Kubernetes vs Docker Swarm: Complete Comparison Guide with Command Cheatsheets

A comprehensive comparison of Kubernetes and Docker Swarm container orchestration platforms, including detailed command cheatsheets, architecture differences, and practical examples.

~8 min read

kubernetes docker-swarm container-orchestration devops comparison
[10/24] M is for ConfigMaps and Secrets: Managing Configuration

The tenth post in our Kubernetes A-to-Z series covering ConfigMaps, Secrets, configuration management patterns, and environment-specific deployments.

~7 min read

kubernetes a-to-z-series configmaps secrets configuration
[15/24] L is for Logging and Monitoring: Observability in Kubernetes

The fifteenth post in our Kubernetes A-to-Z series covering logging architectures, Prometheus metrics, distributed tracing, and observability best practices.

~7 min read

kubernetes a-to-z-series logging monitoring observability
[13/24] O is for Operators: Extending Kubernetes Functionality

The thirteenth post in our Kubernetes A-to-Z series covering Operators, Custom Resource Definitions (CRDs), controller patterns, and extending Kubernetes.

~6 min read

kubernetes a-to-z-series operators crd custom-resources
[8/24] N is for Namespaces: Organizing Your Cluster

The eighth post in our Kubernetes A-to-Z series covering Namespaces, multi-tenancy, resource quotas, and cluster organization strategies.

~8 min read

kubernetes a-to-z-series namespaces multi-tenancy resource-quotas
[3/24] P is for Pods: The Basic Building Blocks of Kubernetes

The third post in our Kubernetes A-to-Z series covering pods, their lifecycle, networking, storage, and multi-container patterns.

~10 min read

kubernetes a-to-z-series pods containers multi-container
Kubernetes Observability Stack: Prometheus, OpenTelemetry, and Loki

Build a practical Kubernetes observability stack using metrics, logs, and traces with Prometheus, OpenTelemetry, Loki, and actionable SLO-driven alerting.

~2 min read

kubernetes observability prometheus opentelemetry loki
PostgreSQL Index Size Deep Dive: Why Indexes Grow Fast

Understand why PostgreSQL indexes can grow quickly in production and how to control index bloat with better schema design, maintenance, and query patterns.

~2 min read

postgresql database index performance storage
[17/24] Q is for Quality Assurance: Testing in Kubernetes

The seventeenth post in our Kubernetes A-to-Z series covering testing strategies, chaos engineering, CI/CD integration, and quality assurance best practices.

~6 min read

kubernetes a-to-z-series testing quality-assurance chaos-engineering
[6/24] R is for ReplicaSets: Ensuring High Availability

The sixth post in our Kubernetes A-to-Z series covering ReplicaSets, scaling strategies, pod disruption budgets, and high availability patterns.

~7 min read

kubernetes a-to-z-series replicasets high-availability scaling
Stateful Workloads on Kubernetes: PostgreSQL and Kafka Operators

Run stateful workloads safely on Kubernetes with operator-based patterns for PostgreSQL and Kafka, including storage, scaling, backup, and failure recovery.

~2 min read

kubernetes stateful postgresql kafka operators
Service Mesh Deep Dive: Istio vs Linkerd vs Consul Connect

A comprehensive comparison of service mesh platforms including architecture, features, performance benchmarks, and practical implementation guides for Istio, Linkerd, and Consul Connect.

~11 min read

service-mesh istio linkerd consul kubernetes
[5/24] S is for Services: Networking and Service Discovery

The fifth post in our Kubernetes A-to-Z series covering Services, networking patterns, service discovery, and load balancing in Kubernetes.

~7 min read

kubernetes a-to-z-series services networking service-discovery
[16/24] T is for Troubleshooting: Common Issues and Solutions

The sixteenth post in our Kubernetes A-to-Z series covering debugging techniques, common issues, diagnostic commands, and systematic troubleshooting approaches.

~8 min read

kubernetes a-to-z-series troubleshooting debugging diagnostics
[18/24] U is for Upgrades: Managing Cluster Lifecycle

Master the art of Kubernetes upgrades. Learn about version skew policies, node draining, and strategies for zero-downtime cluster maintenance.

~2 min read

kubernetes a-to-z-series upgrades maintenance lifecycle
[9/24] V is for Volumes: Persistent Storage in Kubernetes

The ninth post in our Kubernetes A-to-Z series covering Volumes, PersistentVolumes, PersistentVolumeClaims, storage classes, and stateful application patterns.

~8 min read

kubernetes a-to-z-series volumes persistent-storage pv
[14/24] Y is for YAML: Mastering the Language of Kubernetes

Love it or hate it, YAML is the language of Kubernetes. Learn syntax tips, common pitfalls, and tools to validate your manifests.

~2 min read

kubernetes a-to-z-series yaml configuration tools
[21/24] Z is for Zero-Downtime Deployments: Advanced Deployment Strategies

The twenty-first post in our Kubernetes A-to-Z series covering advanced deployment strategies, GitOps, progressive delivery, canary deployments, and production-ready patterns.

~6 min read

kubernetes a-to-z-series zero-downtime deployment-strategies gitops
