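A Successful Migration from Hexo to Halo
Environment and Versions
Evaluating Existing Options
The plugin marketplace currently has no official "Hexo → Halo" migration plugin, mainly because Hexo's directory structure is so flexible.
I found the halo-plugin-export-md plugin, but it cannot parse Hexo front-matter and does not handle image paths correctly.
Halo's official VS Code extension, vscode-extension-halo, can publish a single post and upload its images automatically. However, it has no batch mode, its front-matter support is limited, and it rewrites image paths to local absolute paths, so images fail to display after upload.
Overall Approach
My guess is that Hexo is so flexible that there is no single convention a generic migrator could target. I therefore followed the approach of the Halo VS Code extension and wrote a Python batch script. The overall flow is: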
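- Consolidate images: collect all images and upload them to the Halo attachment library so that they share one URL prefix;
- Fix links in bulk: rewrite the image paths in the local Markdown files so they point to the new attachment URLs;
- Publish in bulk: call the Halo API to publish all Markdown files in one run.
A few details also need attention during implementation:
- Categories and tags must be de-duplicated to avoid creating them twice (Halo allows duplicate display names, but there is no reason to have them);
- The front-matter has to be adjusted: nested (second-level) categories are not supported, so the category hierarchy has to be rebuilt by hand in Halo afterwards (which is fairly easy);
- The front-matter cover field also has to be rewritten to an attachment-library URL;
- Besides the images referenced from Markdown, my Hexo blog has a dedicated cover image library (gallery), so it needs its own attachment location, and cover paths have to be handled separately when fixing the Markdown.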
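To make the category and tag handling above concrete, here is a minimal sketch of flattening nested values and dropping duplicates while keeping first-seen order. The helper name normalize_terms and the sample front-matter are illustrative assumptions, not part of the original script.

```python
from typing import Iterable, Iterator, List


def flatten(items: Iterable) -> Iterator[str]:
    """Recursively expand nested lists, keeping only strings."""
    for item in items:
        if isinstance(item, list):
            yield from flatten(item)
        elif isinstance(item, str):
            yield item


def normalize_terms(raw) -> List[str]:
    """Flatten nested categories/tags and drop duplicates (order preserved)."""
    if isinstance(raw, str):  # Hexo also accepts a single string value
        raw = [raw]
    # dict.fromkeys keeps the first occurrence of each key, in insertion order
    return list(dict.fromkeys(flatten(raw or [])))


# Hypothetical front-matter: a nested category path plus a repeated entry
front_matter = {"categories": [["Back-End", "Docker"], "Docker"], "tags": "c++"}
print(normalize_terms(front_matter["categories"]))
print(normalize_terms(front_matter["tags"]))
```

The hierarchy itself is lost on purpose here, which matches the note above that parent/child relations are restored manually in Halo.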
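My Hexo Setup
Theme: icarus
Asset folders: post_asset_folder: true is enabled, so every post gets an asset folder with the same name as the post
Image reference formats:
Markdown image syntax such as ![](image.png), or {% asset_img image.png %}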
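As an illustration of what these two reference styles look like after migration, the sketch below rewrites both to a single attachment prefix. The prefix value, the regular expressions, and the function name rewrite_images are assumptions made for this example; adjust the prefix to your own storage policy.

```python
import re
from pathlib import PurePosixPath

# Assumed attachment URL prefix (placeholder, not a Halo default)
PREFIX = "/upload/images"

MD_IMG = re.compile(r"!\[([^\]]*)\]\(([^)\s]+)\)")
ASSET_IMG = re.compile(r"{%\s*asset_img\s+(\S+)[^%]*%}")


def rewrite_images(text: str, prefix: str = PREFIX) -> str:
    """Point Markdown images and asset_img tags at prefix/<file name>."""

    def md_repl(m: re.Match) -> str:
        alt, target = m.group(1), m.group(2)
        if target.startswith(("http://", "https://", prefix)):
            return m.group(0)  # remote or already rewritten: leave untouched
        return f"![{alt}]({prefix}/{PurePosixPath(target).name})"

    text = MD_IMG.sub(md_repl, text)
    # asset_img tags carry no alt text, so the alt part stays empty
    return ASSET_IMG.sub(
        lambda m: f"![]({prefix}/{PurePosixPath(m.group(1)).name})", text)


sample = (
    "![demo](2021-12-13-20-36-12/a.png)\n"
    "{% asset_img b.png some title %}\n"
    "![x](https://example.com/c.png)"
)
print(rewrite_images(sample))
```

Only the file name survives the rewrite, which is why image names have to be unique across the whole blog before uploading.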
File layout:
source/
└── _posts/
    ├── 2021-H2/
    │   ├── 2021-12-13-20-36-12/
    │   │   ├── 1718889960570.png   (image)
    │   │   └── ...
    │   └── 2021-12-13-20-36-12.md  (post)
    ├── 2023-H1/
    │   └── ...
    ├── 2023-H2/
    │   └── ...
    ├── 2024-H1/
    │   ├── 2024-01-04-15-48-34/
    │   │   ├── 1718889620424.png   (image)
    │   │   ├── 1718889620599.png   (image)
    │   │   └── 1718889620719.png   (image)
    │   ├── 2024-01-04-15-48-34.md  (post)
    │   └── ...
    └── 2024-H2/
        └── ...
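With this layout, the images of a post can be found by dropping the .md suffix from the post path. The following sketch builds a throwaway copy of the tree and pairs a post with its images; the helper name post_assets is hypothetical and the extension list is an assumption.

```python
from __future__ import annotations

import tempfile
from pathlib import Path

IMG_EXT = {".png", ".jpg", ".jpeg", ".gif", ".webp"}


def post_assets(md_file: Path) -> list[Path]:
    """Return the images in the asset folder that shares the post's name."""
    asset_dir = md_file.with_suffix("")  # foo.md -> foo/
    if not asset_dir.is_dir():
        return []
    return sorted(p for p in asset_dir.iterdir() if p.suffix.lower() in IMG_EXT)


# Build a small temporary tree that mimics source/_posts/<half-year>/
with tempfile.TemporaryDirectory() as tmp:
    half = Path(tmp) / "source" / "_posts" / "2024-H1"
    (half / "2024-01-04-15-48-34").mkdir(parents=True)
    (half / "2024-01-04-15-48-34.md").write_text(
        "---\ntitle: demo\n---\n", encoding="utf-8")
    (half / "2024-01-04-15-48-34" / "1718889620424.png").touch()
    for md in sorted(Path(tmp).rglob("*.md")):
        print(md.name, [p.name for p in post_assets(md)])
```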
The Script
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
halo_batch.py (v1.8.1) – for Halo 2.21.x
-------------------------------------------------------------
find-dup-images <folder>
fix-md <src> --out <dst> [--img-prefix ..]
collect-images <src> --dst <dst>
list-meta <folder>
purge-meta <folder>
publish-md <folder> [--draft] [--no-auto-create] [--comment] [--strict]
"""
from __future__ import annotations
import os, sys, re, json, argparse, shutil, unicodedata, datetime, mimetypes
from pathlib import Path
from collections import defaultdict
from typing import Dict, List, Set
import requests, frontmatter as fm
from dotenv import load_dotenv; load_dotenv()
from tqdm import tqdm
import markdown
# ───────────────── Halo connection ─────────────────
HALO_URL = os.getenv("HALO_URL", "").rstrip("/")
HALO_TOKEN = os.getenv("HALO_TOKEN", "")
if not HALO_URL or not HALO_TOKEN:
    sys.exit("❌ 请先在 .env 设置 HALO_URL / HALO_TOKEN")
HEADERS = {"Authorization": f"Bearer {HALO_TOKEN}"}
TIMEOUT = 30
API_BASE = "content.halo.run/v1alpha1"
API_CAT = f"/apis/{API_BASE}/categories"
API_TAG = f"/apis/{API_BASE}/tags"
API_POST_PUBLIC = f"/apis/{API_BASE}/posts"  # read-only
API_POST_UC = "/apis/uc.api.content.halo.run/v1alpha1/posts"  # recommended
API_POST_CONSOLE = "/apis/api.console.halo.run/v1alpha1/posts"  # console draft flow
DEFAULT_PREFIX = "/upload/~/.halo2/attachments/upload/images"
PREFIX_GALLERY = "/upload/~/.halo2/attachments/upload/gallery"
IMG_EXT = {".png", ".jpg", ".jpeg", ".gif", ".webp"}
# ───────────────── General utilities ──────────────────
def make_slug(text: str) -> str:
    text = unicodedata.normalize("NFKD", str(text))
    slug = re.sub(r"[^\w\-]+", "-", text.lower()).strip("-")
    return slug or "untitled"

def gvk(kind: str, spec: dict, *, name: str | None = None,
        gen_prefix: str = "") -> dict:
    meta = {"name": name} if name else {"generateName": f"{gen_prefix}-"}
    return {"apiVersion": API_BASE, "kind": kind,
            "metadata": meta, "spec": spec}

def rfc3339(dt) -> str | None:
    if not dt:
        return None
    try:
        if isinstance(dt, str) and " " in dt:
            dt = dt.replace(" ", "T")
        # naive values are interpreted in the local time zone
        return datetime.datetime.fromisoformat(str(dt)).astimezone().isoformat()
    except Exception:
        return None

_md = markdown.Markdown(extensions=["extra", "toc", "tables"])

def _markdown_html(raw: str) -> str:
    # reset() clears per-document state (footnotes, TOC) left by the previous post
    _md.reset()
    return _md.convert(raw)
# ─────────── Category/tag cache & helpers ────────────
_cache_cat: Dict[str, str] = {}
_cache_tag: Dict[str, str] = {}

def _load_all(api: str) -> List[dict]:
    r = requests.get(HALO_URL + api,
                     headers=HEADERS,
                     params={"page": 1, "size": 1000},
                     timeout=TIMEOUT)
    body = r.json() if r.ok else {}
    return body.get("items") or body.get("data") or []

def preload_taxonomies():
    global _cache_cat, _cache_tag
    if _cache_cat and _cache_tag:
        return
    for item in _load_all(API_CAT):
        _cache_cat[item["spec"]["displayName"]] = item["metadata"]["name"]
    for item in _load_all(API_TAG):
        _cache_tag[item["spec"]["displayName"]] = item["metadata"]["name"]

def ensure_taxonomy(display: str, kind: str, *, auto_create=True) -> str:
    """Return the resource name for a display name, creating it if allowed."""
    cache = _cache_cat if kind == "cat" else _cache_tag
    if display in cache:
        return cache[display]
    if not auto_create:
        return ""
    spec = {"displayName": display, "slug": make_slug(display)}
    if kind == "cat":
        spec.update(description="", cover="", template="",
                    priority=len(_cache_cat), children=[])
    else:
        spec.update(color="#ffffff", cover="")
    obj = gvk("Category" if kind == "cat" else "Tag",
              spec, gen_prefix="category" if kind == "cat" else "tag")
    res = requests.post(HALO_URL + (API_CAT if kind == "cat" else API_TAG),
                        headers=HEADERS, json=obj, timeout=TIMEOUT)
    if res.status_code in (200, 201):
        name = res.json()["metadata"]["name"]
        cache[display] = name
        print(f"➕ 新建 {kind}: {display}")
        return name
    print(f"⚠️ 创建 {kind} '{display}' 失败 — {res.status_code} {res.text[:80]}",
          file=sys.stderr)
    return ""
# ─────────── Helper: flatten lists ───────────
def _flatten(lst):
    """Recursively expand nested lists, keeping only str items."""
    for item in lst:
        if isinstance(item, list):
            yield from _flatten(item)
        elif isinstance(item, str):
            yield item

# ───────────── 1) find-dup-images ─────────────
def find_dup_images(folder: Path):
    grp: Dict[str, List[Path]] = defaultdict(list)
    total = 0
    for p in folder.rglob("*"):
        if p.suffix.lower() in IMG_EXT:
            grp[p.name].append(p)
            total += 1
    print(f"🔍 共扫描 {total} 张图片")
    dup = {k: v for k, v in grp.items() if len(v) > 1}
    if not dup:
        print("✅ 未发现重名图片")
        return
    print("⚠️ 以下文件名重复:")
    for n, ps in dup.items():
        print(f"\n{n} ({len(ps)})")
        for p in ps:
            print(" └─", p)
# ───────────── 2) fix-md ─────────────
RE_MD_IMG = re.compile(r'!\[([^\]]*?)\]\(([^)]+)\)')
RE_ASSET = re.compile(r'{%\s*asset_img\s+([^\s]+)(?:\s+[^\}]+)?\s*%}')

def cover_path(path: str) -> str:
    pre = PREFIX_GALLERY if path.startswith("/gallery/") else DEFAULT_PREFIX
    return f"{pre}/{Path(path).name}"

def rewrite_md(txt: str, pre: str) -> str:
    # Markdown images: keep remote / already-rewritten links, repoint the rest
    txt = RE_MD_IMG.sub(
        lambda m: m.group(0) if m.group(2).startswith(("http", pre))
        else f"![{m.group(1)}]({pre}/{Path(m.group(2)).name})", txt)
    # {% asset_img %} tags become plain Markdown images
    return RE_ASSET.sub(
        lambda m: f"![]({pre}/{Path(m.group(1)).name})",
        txt)

def rewrite_meta(meta: dict) -> dict:
    # ---- (1) flatten categories ----
    cats_raw = meta.get("categories") or []
    if isinstance(cats_raw, str):  # Hexo also accepts a single string here
        cats_raw = [cats_raw]
    cats_new = list(_flatten(cats_raw))
    if cats_new:
        meta["categories"] = cats_new
    else:
        meta.pop("categories", None)
    # ---- (2) rewrite relative cover paths ----
    if (c := meta.get("cover")) and not c.startswith("http"):
        meta["cover"] = cover_path(c)
    # ---- (3) drop fields that are not needed ----
    meta.pop("thumbnail", None)
    meta.pop("author", None)
    return meta
def fix_md(src: Path, dst: Path, prefix: str):
    if dst.exists():
        shutil.rmtree(dst)
    dst.mkdir(parents=True, exist_ok=True)
    tot = chg = 0
    for md in tqdm(list(src.rglob("*.md")), ncols=80):
        tot += 1
        post = fm.loads(md.read_text(encoding="utf-8-sig"))
        c0, m0 = post.content, dict(post.metadata)
        post.content = rewrite_md(c0, prefix)
        post.metadata = rewrite_meta(m0.copy())
        out = dst / md.relative_to(src)
        out.parent.mkdir(parents=True, exist_ok=True)
        fm.dump(post, out, encoding="utf-8")
        chg += (post.content != c0 or post.metadata != m0)
    print(f"✅ 已处理 {tot} 篇,修改 {chg} 篇,输出 → {dst}")
# ───────────── 3) collect-images ─────
def collect_images(src: Path, dst: Path):
    if dst.exists():
        shutil.rmtree(dst)
    dst.mkdir(parents=True, exist_ok=True)
    rep: Dict[str, int] = defaultdict(int)
    cnt = 0
    for p in tqdm(list(src.rglob("*")), ncols=80):
        if p.suffix.lower() not in IMG_EXT:
            continue
        name = p.name
        while (dst / name).exists():
            rep[p.name] += 1
            name = f"{p.stem}_{rep[p.name]}{p.suffix}"
        shutil.copy2(p, dst / name)
        cnt += 1
    print(f"✅ 已复制 {cnt} 张图片 → {dst}")
# ───────────── 4) list-meta ───────────
def gather_meta(folder: Path) -> tuple[Set[str], Set[str]]:
    cat: Set[str] = set()
    tag: Set[str] = set()
    for md in folder.rglob("*.md"):
        post = fm.loads(md.read_text(encoding="utf-8-sig"))
        cat.update([c for c in post.metadata.get("categories", [])
                    if isinstance(c, str)])
        tag.update([t for t in post.metadata.get("tags", [])
                    if isinstance(t, str)])
    return cat, tag

def list_meta(folder: Path):
    cats, tags = gather_meta(folder)
    print("\nCategories:")
    for c in sorted(cats):
        print(" •", c)
    print("\nTags:")
    for t in sorted(tags):
        print(" •", t)
# ───────────── 5) purge-meta ──────────
def purge_meta(folder: Path):
    """Delete categories/tags on the server that no local post references."""
    cats_need, tags_need = gather_meta(folder)
    all_cat, all_tag = _load_all(API_CAT), _load_all(API_TAG)

    def _del(item: dict, api: str, need_set: Set[str]) -> bool:
        if item["spec"]["displayName"] in need_set:
            return False
        name = item["metadata"]["name"]
        res = requests.delete(f"{HALO_URL}{api}/{name}",
                              headers=HEADERS, timeout=TIMEOUT)
        if res.status_code in (200, 204):
            print(f"🗑️ 删除 {item['spec']['displayName']} ({name})")
            return True
        print(f"⚠️ 无法删除 {item['spec']['displayName']} — "
              f"{res.status_code} {res.text[:80]}", file=sys.stderr)
        return False

    cat_del = sum(_del(i, API_CAT, cats_need) for i in all_cat)
    tag_del = sum(_del(i, API_TAG, tags_need) for i in all_tag)
    print(f"\n✅ 完成:删除分类 {cat_del} 个,标签 {tag_del} 个")
# ───────────── 6) publish-md ──────────
def publish_md(folder: Path, *, draft=False, no_auto_create=False,
               comment=False, strict=False):
    preload_taxonomies()
    for md in tqdm(list(folder.rglob("*.md")), ncols=80):
        post = fm.loads(md.read_text(encoding="utf-8-sig"))
        meta, body = post.metadata, post.content
        cats_raw = [c for c in meta.get("categories", []) if isinstance(c, str)]
        tags_raw = [t for t in meta.get("tags", []) if isinstance(t, str)]
        cats = [n for n in (ensure_taxonomy(c, "cat",
                                            auto_create=not no_auto_create)
                            for c in cats_raw) if n]
        tags = [n for n in (ensure_taxonomy(t, "tag",
                                            auto_create=not no_auto_create)
                            for t in tags_raw) if n]
        desc = meta.get("description") or meta.get("excerpt") or ""
        excerpt = {"autoGenerate": not bool(desc), "raw": desc}
        spec = {
            "title": meta.get("title") or md.stem,
            "slug": meta.get("slug") or make_slug(md.stem),
            "content": body,
            "publish": not draft,
            "publishTime": rfc3339(meta.get("date")),
            "visible": "PUBLIC",
            "deleted": False,
            "allowComment": comment,
            "pinned": False,
            "priority": 0,
            "categories": cats,
            "tags": tags,
            "cover": meta.get("cover"),
            "excerpt": excerpt
        }
        annotations = {
            "content.halo.run/content-json": json.dumps({
                "rawType": "markdown",
                "raw": body,
                "content": _markdown_html(body)
            }, ensure_ascii=False)
        }
        payload = gvk("Post", spec, gen_prefix="post")
        payload["metadata"]["annotations"] = annotations
        if strict:
            # Console draft flow (requires site-admin permissions)
            res = requests.post(HALO_URL + API_POST_CONSOLE,
                                headers=HEADERS, json=payload, timeout=TIMEOUT)
            if res.status_code not in (200, 201):
                print(f"\n❌ {md} 创建失败 — {res.status_code} "
                      f"{res.text[:120]}")
                continue
            name = res.json()["metadata"]["name"]
            if not draft:
                pub = requests.put(f"{HALO_URL}{API_POST_CONSOLE}/{name}/publish",
                                   headers=HEADERS, timeout=TIMEOUT)
                if not pub.ok:
                    print(f"⚠️ publish 失败 {name} — {pub.status_code}")
            print(f"\n✅ 发布 {md.name}")
        else:
            # UC API group: a single POST is enough
            res = requests.post(HALO_URL + API_POST_UC,
                                headers=HEADERS, json=payload, timeout=TIMEOUT)
            if res.status_code in (200, 201):
                print(f"\n✅ 发布 {md.name}")
            else:
                print(f"\n❌ {md}: {res.status_code} {res.text[:120]}")
# ───────────── CLI ─────────────
def main():
    ap = argparse.ArgumentParser(description="Halo 批量工具 v1.8.1 (for 2.21)")
    sb = ap.add_subparsers(dest="cmd", required=True)
    sb.add_parser("find-dup-images").add_argument("folder")
    fx = sb.add_parser("fix-md")
    fx.add_argument("src")
    fx.add_argument("--out", required=True)
    fx.add_argument("--img-prefix", default=DEFAULT_PREFIX)
    cp = sb.add_parser("collect-images")
    cp.add_argument("src")
    cp.add_argument("--dst", required=True)
    lg = sb.add_parser("list-meta")
    lg.add_argument("folder")
    pg = sb.add_parser("purge-meta")
    pg.add_argument("folder")
    pb = sb.add_parser("publish-md")
    pb.add_argument("folder")
    pb.add_argument("--draft", action="store_true",
                    help="保留为草稿(spec.publish = false)")
    pb.add_argument("--no-auto-create", action="store_true",
                    help="只引用已有分类/标签,不自动创建")
    pb.add_argument("--comment", action="store_true",
                    help="允许评论")
    pb.add_argument("--strict", action="store_true",
                    help="使用后台 draft → publish 流(需 Console 权限)")
    args = ap.parse_args()
    cmd, f = args.cmd, lambda x: Path(x).expanduser()
    if cmd == "find-dup-images":
        find_dup_images(f(args.folder))
    elif cmd == "fix-md":
        fix_md(f(args.src), f(args.out), args.img_prefix.rstrip("/"))
    elif cmd == "collect-images":
        collect_images(f(args.src), f(args.dst))
    elif cmd == "list-meta":
        list_meta(f(args.folder))
    elif cmd == "purge-meta":
        purge_meta(f(args.folder))
    elif cmd == "publish-md":
        publish_md(f(args.folder),
                   draft=args.draft,
                   no_auto_create=args.no_auto_create,
                   comment=args.comment,
                   strict=args.strict)

if __name__ == "__main__":
    main()
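One point worth noting about publishTime: the script's rfc3339 helper interprets a date without an offset in the time zone of the machine running it. If the migration runs on a machine in a different zone than the one the posts were written in, the published times shift. The sketch below is one possible variant that takes the zone explicitly; the function name to_rfc3339 and the tz parameter are assumptions for illustration, not part of the script.

```python
from datetime import datetime, timedelta, timezone


def to_rfc3339(value, tz=None):
    """Convert a Hexo-style date (str, date or datetime) to an RFC 3339 string.

    tz is the zone assumed for values without an offset; when omitted,
    the local zone of the running machine is used.
    """
    if not value:
        return None
    if isinstance(value, datetime):
        dt = value
    else:
        try:
            dt = datetime.fromisoformat(str(value).strip().replace(" ", "T", 1))
        except ValueError:
            return None  # unparseable date: let the caller decide what to do
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=tz) if tz is not None else dt.astimezone()
    return dt.isoformat()


cst = timezone(timedelta(hours=8))  # example zone: UTC+08:00
print(to_rfc3339("2021-12-13 20:36:12", tz=cst))  # 2021-12-13T20:36:12+08:00
```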
1. Check for duplicate image names
(halo-py) xiamu@xiamudeMacBook-Air Hexo % python halo_batch.py find-dup-images ./source/_posts
🔍 共扫描 350 张图片
✅ 未发现重名图片
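If the script reports files with the same name, rename them by hand first. Otherwise Halo appends a suffix automatically on upload, and the resulting URL may no longer match the rewritten links.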
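When many names collide, it can help to know which collisions actually matter. The sketch below is an optional extra step, not something the script does: it groups images by file name and reports only the names whose copies differ in content (compared by SHA-256). The function name conflicting_names is hypothetical.

```python
from __future__ import annotations

import hashlib
import tempfile
from collections import defaultdict
from pathlib import Path

IMG_EXT = {".png", ".jpg", ".jpeg", ".gif", ".webp"}


def conflicting_names(root: Path) -> dict[str, list[Path]]:
    """Map file name -> paths, only for names whose files differ in content."""
    groups: dict[str, list[Path]] = defaultdict(list)
    for p in root.rglob("*"):
        if p.is_file() and p.suffix.lower() in IMG_EXT:
            groups[p.name].append(p)
    conflicts = {}
    for name, paths in groups.items():
        digests = {hashlib.sha256(p.read_bytes()).hexdigest() for p in paths}
        if len(digests) > 1:  # same name, different bytes
            conflicts[name] = sorted(paths)
    return conflicts


with tempfile.TemporaryDirectory() as tmp:
    a, b = Path(tmp) / "post-a", Path(tmp) / "post-b"
    a.mkdir()
    b.mkdir()
    (a / "logo.png").write_bytes(b"same")
    (b / "logo.png").write_bytes(b"same")   # identical copy: harmless
    (a / "chart.png").write_bytes(b"v1")
    (b / "chart.png").write_bytes(b"v2")    # different image: needs renaming
    print(sorted(conflicting_names(Path(tmp))))  # ['chart.png']
```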
2. Collect images
(halo-py) xiamu@xiamudeMacBook-Air Hexo % python halo_batch.py collect-images ./source/_posts --dst ./all_imgs
100%|███████████████████████████████████████| 569/569 [00:00<00:00, 3210.56it/s]
✅ 已复制 350 张图片 → all_imgs
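The script copies every image into the all_imgs directory so that they can be uploaded in one go. With my attachment storage policy, the files actually end up in ~/.halo2/attachments/upload/images. The cover gallery is handled the same way and goes to ~/.halo2/attachments/upload/gallery.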
3. Fix the Markdown
(halo-py) xiamu@xiamudeMacBook-Air Hexo % python halo_batch.py fix-md ./source/_posts --out ./fix_md
100%|███████████████████████████████████████| 120/120 [00:00<00:00, 1525.92it/s]
✅ 已处理 120 篇,修改 119 篇,输出 → fix_md
(halo-py) xiamu@xiamudeMacBook-Air Hexo %
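Besides the post body, fix-md also rewrites the cover field in the front-matter, sending gallery covers to a different location than ordinary post images. A minimal standalone sketch of that rule follows; the two prefix values are placeholders and rewrite_cover is a hypothetical name, so substitute the paths of your own attachment policy.

```python
from pathlib import PurePosixPath

# Placeholder prefixes (assumptions): replace with your attachment URLs
IMAGES_PREFIX = "/upload/images"
GALLERY_PREFIX = "/upload/gallery"


def rewrite_cover(cover):
    """Map a Hexo cover path to an attachment URL; leave remote URLs alone."""
    if not isinstance(cover, str) or not cover:
        return cover  # missing or non-string cover: nothing to do
    if cover.startswith(("http://", "https://")):
        return cover
    prefix = GALLERY_PREFIX if cover.startswith("/gallery/") else IMAGES_PREFIX
    return f"{prefix}/{PurePosixPath(cover).name}"


for value in ("/gallery/covers/sea.jpg", "1718889960570.png",
              "https://example.com/a.png", None):
    print(repr(value), "->", repr(rewrite_cover(value)))
```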
4. Preview categories and tags
(halo-py) xiamu@xiamudeMacBook-Air Hexo % python halo_batch.py list-meta ./fix_md
Categories:
• Docker
• Elasticsearch
• Front-End
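5. Publish in bulk
Create a .env file in the same directory as the script with the following entries:
HALO_URL=<address of your Halo site including the scheme, e.g. http://IP:port>
HALO_TOKEN=<your Halo token, ideally with full permissions>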
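Because the script builds request URLs by concatenating HALO_URL with an API path, a value without http:// or https:// fails only at the first request. A small pre-flight check like the one below can catch that earlier. It is a sketch: check_settings is a hypothetical helper and the sample values are made up.

```python
from urllib.parse import urlsplit


def check_settings(env: dict) -> list:
    """Return a list of human-readable problems; empty means the values look usable."""
    problems = []
    url = (env.get("HALO_URL") or "").strip()
    token = (env.get("HALO_TOKEN") or "").strip()
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        problems.append("HALO_URL must look like http(s)://host[:port]")
    if not token:
        problems.append("HALO_TOKEN is empty")
    return problems


# Made-up values for illustration only
print(check_settings({"HALO_URL": "192.168.1.10:8090", "HALO_TOKEN": "pat_example"}))
print(check_settings({"HALO_URL": "http://192.168.1.10:8090", "HALO_TOKEN": "pat_example"}))
```

In a real run the dictionary could simply be os.environ after load_dotenv() has been called.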
Then publish the Markdown files in a given directory:
(halo-py) xiamu@xiamudeMacBook-Air Hexo % python halo_batch.py publish-md ./fix_md/2021-H2
0%| | 0/1 [00:00<?, ?it/s]➕ 新建 cat: 其他
➕ 新建 tag: c++
➕ 新建 tag: 排序算法
✅ 发布 2021-12-13-20-36-12.md
100%|█████████████████████████████████████████████| 1/1 [00:00<00:00, 2.31it/s]
(halo-py) xiamu@xiamudeMacBook-Air Hexo %