fcung.com

Feng · software engineer & AI tinkerer · based in Tokyo

I write code, play with models, and occasionally tidy up some notes. This is where I keep track of things I'm learning and building.

Recent Projects

Ran several 7B to 13B models locally with ollama + llama.cpp and compared how Qwen2.5 and Llama 3 handle Chinese. Deployment notes are in progress.

Trying out Chroma + sentence-transformers for document retrieval. Roughly following the LangChain recipe but with minimal deps.

Fine-tuned a 1.5B Qwen model with LoRA for Chinese summarization. A single 4090 was just enough, and the loss curve looks decent.

Set up a CI pipeline that uses GPT-4o to automatically review the diffs in PRs. Still debugging prompt stability.

2026-06-18 · 6 min read

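How much do you lose by quantizing to 4-bit? How much VRAM does 8-bit inference actually save? This post documents a set of comparison tests I ran with AutoGPTQ and bitsandbytes.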

LLM quantization deep learning

2026-06-02 · 10 min read

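I didn't want to pull in a whole Python stack just to run a model. I tried using ort to load ONNX models directly in Rust for inference, and memory usage was noticeably lower than with the Python setup.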

Rust ONNX inference

2026-05-15 · 8 min read

Using BGE-small-zh for Chinese document retrieval. Surprisingly good for a 384-dim model. Here's the architecture and some failure cases I hit.

RAG embeddings NLP

2026-04-28 · 5 min read

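How painful is it to run PyTorch training under WSL2 on Windows? From driver versions to shared memory limits, this post records the problems I ran into along the way and how I solved them.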

WSL2 CUDA PyTorch

2026-04-10 · 7 min read

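Cobbled two old machines together into a small GPU cluster for experiments and batch inference. Dockerized deployment really is hassle-free, though the power draw is a bit painful.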

Docker GPU self-hosting

Projects I'm following / Reading:

About

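Software engineer by trade, AI enthusiast by curiosity. I previously spent several years working on cloud infrastructure and distributed systems; over the past year I've mainly been looking into LLM applications and model deployment. Contact: feng at fcung dot com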