AI & Open Source Daily, July 1, 2026: MiMo Code, DeepSpec, GLiGuard, Leanstral, Rocky

A speculative decoding framework from DeepSeek, a Xiaomi coding agent that beats Claude Code, a 300M guardrail that runs 16x faster, a formal proof model from Mistral, and a Rust SQL engine with type-checking.

Muhammad Amien·Jul 1, 2026·6 min read

A hot day today: two releases from big teams that make production AI pipelines cheaper, plus a 300M-parameter guardrail model that beats models 90x its size. And all of it is open source.

DeepSeek open-sourced its speculative decoding framework. Xiaomi built a coding agent that beats Claude Code on tasks of 200+ steps. GLiGuard shows that an encoder architecture can replace a decoder for safety moderation. And Rocky brings type-checking to SQL pipelines, like TypeScript for data.

MiMo Code: Xiaomi's Coding Agent with Cross-Session Memory

From XiaomiMiMo/MiMo-Code: a terminal-based AI coding agent built on top of OpenCode, MIT licensed, with 11,151 stars in 21 days. Its HN thread reached 557 points.

The main innovation isn't the model, it's the memory system. MiMo Code has a checkpoint-writer subagent that runs in parallel with the main agent and extracts structured state into SQLite FTS5 at 20%, 45%, and 70% context utilization. That means the agent doesn't get amnesia at step 150, 200, or wherever.
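To make the threshold idea concrete, here is a minimal sketch of how checkpoints could be triggered as the context fills up. Only the 20%/45%/70% thresholds come from the description above; the `CheckpointTracker` class, its fields, and the 100,000-token limit are hypothetical and are not MiMo Code's actual implementation.

```python
from dataclasses import dataclass, field

# Thresholds as described in the article; everything else is illustrative.
CHECKPOINT_THRESHOLDS = (0.20, 0.45, 0.70)


@dataclass
class CheckpointTracker:
    """Hypothetical helper: decides when a checkpoint write is due."""
    context_limit: int                       # max tokens in the context window
    fired: set = field(default_factory=set)  # thresholds already handled

    def due(self, tokens_used: int) -> list:
        """Return the thresholds newly crossed at this utilization level."""
        utilization = tokens_used / self.context_limit
        newly_crossed = [
            t for t in CHECKPOINT_THRESHOLDS
            if utilization >= t and t not in self.fired
        ]
        self.fired.update(newly_crossed)
        return newly_crossed


# Simulate a session whose context grows step by step.
tracker = CheckpointTracker(context_limit=100_000)
events = [tracker.due(n) for n in (10_000, 25_000, 50_000, 50_500, 90_000)]
```

Each threshold fires once, so a long session triggers at most three checkpoint writes per context window in this sketch, no matter how many steps the agent takes.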

Compare that with Claude Code or Codex CLI, which start forgetting earlier decisions as the context window fills up. MiMo Code has four memory layers: session checkpoint, project MEMORY.md, global memory, and full SQLite history. On top of that, a /dream command runs every 7 days to deduplicate, and /distill runs every 30 days to extract workflow patterns.

The benchmarks are impressive: 82% on SWE-bench Verified (Claude Code: 79%), 62% on SWE-bench Pro (GPT-5.5: only 58.6%), and 73% on Terminal Bench 2. In a double-blind A/B test with 576 developers, MiMo Code's win rate was above 65% after 200 steps.

It's model-agnostic too. You can use DeepSeek, Kimi, GLM, or any OpenAI-compatible API.

Status: Released, open source (MIT).

DeepSpec: DeepSeek's Open-Source Speculative Decoding Framework

From deepseek-ai/DeepSpec: a full-stack codebase for training and evaluating draft models for speculative decoding. Released June 26, 2026, it already has 5,450 stars after 5 days. The HN thread hit 793 points, one of the hottest threads this year.

Speculative decoding is a technique for cutting LLM inference latency without cutting quality. The gist: a small draft model predicts multiple tokens, and the target model verifies them in parallel. The result: throughput goes up 2-3x. DeepSpec unifies three algorithms in one framework: DSpark (new, DeepSeek's own), DFlash, and Eagle3.
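The draft-then-verify loop can be sketched with toy models. This is a simplified greedy variant and not DeepSpec's code or any of its three algorithms: `speculative_decode`, `target_model`, and `draft_model` are hypothetical names, the "models" are plain functions over integer tokens, and verification is done in a Python loop where a real system would score all drafted positions in one batched forward pass.

```python
from typing import Callable, List, Tuple

Token = int
NextToken = Callable[[List[Token]], Token]  # greedy "model": context -> next token


def speculative_decode(target: NextToken, draft: NextToken,
                       prompt: List[Token], max_new: int,
                       k: int = 4) -> Tuple[List[Token], float]:
    """Greedy speculative decoding; returns (tokens, acceptance rate)."""
    out = list(prompt)
    drafted = accepted = 0
    while len(out) - len(prompt) < max_new:
        # 1) The cheap draft model proposes k tokens autoregressively.
        proposal: List[Token] = []
        for _ in range(k):
            proposal.append(draft(out + proposal))
        drafted += k
        # 2) The target checks each drafted token against its own choice.
        for tok in proposal:
            expected = target(out)
            if tok != expected:
                out.append(expected)  # keep the target's token, drop the rest
                break
            out.append(tok)
            accepted += 1
        else:
            # Every draft token was accepted: the target adds one more token.
            out.append(target(out))
    return out[:len(prompt) + max_new], accepted / drafted


def target_model(ctx: List[Token]) -> Token:
    """Toy target: counts upward modulo 10."""
    return (ctx[-1] + 1) % 10


def draft_model(ctx: List[Token]) -> Token:
    """Toy draft: agrees with the target except right after a 2."""
    return 0 if ctx[-1] == 2 else (ctx[-1] + 1) % 10


tokens, rate = speculative_decode(target_model, draft_model, [0], max_new=12)
```

Because every emitted token is either confirmed or produced by the target, the output matches plain greedy decoding with the target alone. The acceptance rate returned here is the same kind of metric the evaluation stage reports: the share of drafted tokens the target accepted.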

The pipeline has three stages. Data preparation: download prompts, regenerate target answers, build the cache. Training: train the draft model against the cached outputs, distributed across 8 GPUs. Evaluation: measure acceptance rate on the gsm8k, math500, aime25, humaneval, and mbpp benchmarks.

What makes this serious: the cache data for Qwen3-4B at default settings needs ~38 TB of storage. This is not a toy project; it's production-scale research infrastructure. DSpark itself is a draft model architecture that predicts multiple future tokens, unlike Eagle3, which predicts token by token.

Compared with SpecForge, which only implements Eagle3, or DFlash, which lives in a standalone repo, DeepSpec unifies three algorithms with standardized evaluation and released checkpoints on HuggingFace. This is the first time a major lab has open-sourced the entire training pipeline for speculative decoding.

Status: Released, open source.

GLiGuard: A 300M-Parameter Guardrail, 16x Faster than a 27B Competitor

GLiGuard from Fastino Labs is a 300-million-parameter encoder-based model for content moderation and safety classification. Apache 2.0. The wild part: it beats models 90x its size on safety benchmarks.

What sets it apart from every competitor: GLiGuard uses an encoder architecture, not a decoder. LlamaGuard4 (12B), ShieldGemma (27B), Qwen3Guard-8B: all of them are decoder-based and generate the safety verdict autoregressively, one token at a time. GLiGuard processes the entire input and all task labels in a single forward pass.

The result: 26ms per request vs 426ms for ShieldGemma-27B, roughly 16x faster. And adding a new safety dimension adds no latency; you just add a label to the input. Four tasks in one pass: safety classification, jailbreak detection (11 strategies), harm category (14 categories), and refusal detection.

A guardrail model evaluates every input and output in production, so its latency compounds linearly with conversation length. At 26ms, real-time safety moderation becomes practical at scale. And 300M parameters means it runs on a single GPU.

vs LlamaGuard4 (12B): GLiGuard outperforms it on prompt classification despite being 40x smaller. vs ShieldGemma (27B): it outperforms while being 90x smaller. It's a paradigm shift: safety moderation treated as a classification problem rather than text generation.
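The headline ratios are easy to check, and the same arithmetic shows how guardrail latency adds up over a conversation. The latencies and parameter counts below are the ones quoted above. The `moderation_overhead_ms` helper and the 20-turn conversation are made up for illustration, and the helper assumes one check on the user input plus one on the model output per turn.

```python
# Figures quoted in the article.
GLIGUARD_MS = 26
SHIELDGEMMA_MS = 426
GLIGUARD_PARAMS = 300_000_000
SHIELDGEMMA_PARAMS = 27_000_000_000
LLAMAGUARD4_PARAMS = 12_000_000_000

speedup = SHIELDGEMMA_MS / GLIGUARD_MS                     # about 16.4
shieldgemma_ratio = SHIELDGEMMA_PARAMS // GLIGUARD_PARAMS  # 90
llamaguard_ratio = LLAMAGUARD4_PARAMS // GLIGUARD_PARAMS   # 40


def moderation_overhead_ms(turns: int, per_check_ms: int,
                           checks_per_turn: int = 2) -> int:
    """Total guardrail latency for a conversation (hypothetical helper).

    Assumes one check on the input and one on the output of every turn,
    so the cost grows linearly with the number of turns.
    """
    return turns * checks_per_turn * per_check_ms


# Illustrative 20-turn conversation.
gliguard_total = moderation_overhead_ms(20, GLIGUARD_MS)        # 1040 ms
shieldgemma_total = moderation_overhead_ms(20, SHIELDGEMMA_MS)  # 17040 ms
```

Under these assumptions, a 20-turn conversation spends about one second on moderation with GLiGuard versus about 17 seconds with the 27B model.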

Status: Released, open source (Apache 2.0).

Leanstral 1.5: Mistral's 119B MoE for Formal Proof Engineering

Mistral released Leanstral 1.5: 119B total parameters, 6.5B active per token, 256K context window. The model is purpose-built for automated theorem proving and autoformalization, that is, translating natural-language math into Lean 4 formal proofs. HN thread: 154 points.

Formal verification is a way of proving that software or math is correct with mathematical certainty. But you have to write it in a specialized language like Lean 4, a skill barrier that keeps this very niche. If AI can autoformalize natural-language proofs into Lean 4, it could democratize formal verification for critical systems: verified compilers, cryptographic protocols, consensus algorithms.
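For a feel of what autoformalization produces, here is a deliberately tiny hand-written Lean 4 example. It is not Leanstral output, and the theorem name `add_comm_example` is made up. It turns the sentence "adding two natural numbers in either order gives the same result" into a statement the Lean kernel can check.

```lean
-- Natural language: "Adding two natural numbers in either order
-- gives the same result."
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```

If the proof term were wrong, Lean would reject the file. That is the "mathematical certainty" part: the checker, not the author, decides whether the proof holds.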

The MoE architecture is clever. 119B total parameters but only 6.5B active per token, so per-token inference compute is on par with a 6.5B dense model while capacity is 119B. The 256K context window is enough to process an entire Lean 4 theory file plus its dependency chain. That's economical for iterative proof search: you can run thousands of proof attempts cheaply.
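A rough sketch of why only a fraction of the parameters runs per token: a router scores the experts and only the top few are evaluated. The routing details of Leanstral are not described here, so the `top_k_route` function, the 8-expert layout, and `k=2` are purely illustrative assumptions. Only the 6.5B and 119B figures come from the text.

```python
import math
from typing import List, Tuple


def top_k_route(router_logits: List[float], k: int) -> List[Tuple[int, float]]:
    """Pick the k highest-scoring experts and softmax-normalize their weights.

    Generic top-k gating sketch; not Leanstral's actual router.
    """
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    peak = max(router_logits[i] for i in top)  # subtract max for stability
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


# Hypothetical router scores for one token over 8 experts.
routes = top_k_route([0.1, 2.0, -1.0, 1.5, 0.3, 0.0, -0.5, 0.9], k=2)

# Share of parameters active per token, from the figures in the article.
active_fraction = 6.5 / 119  # about 5.5%
```

Only the selected experts do any work for that token, which is how the per-token compute stays close to a 6.5B model while the full set of experts holds 119B parameters.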

Unlike GPT-5.5 or Claude Sonnet 5, which can attempt Lean 4 as a side task, Leanstral is purpose-built. And where AlphaProof (DeepMind) uses reinforcement learning over Lean, Leanstral uses supervised fine-tuning on proof data. More predictable, and cheaper to iterate on.

Status: Beta, accessible via the labs-leanstral-1-5 endpoint on the Mistral API.

Rocky: A Rust SQL Engine with Type-Checking for Data Pipelines

From rocky-data/rocky: a SQL transformation engine in Rust that type-checks your entire pipeline before it runs. Apache 2.0, 268 stars, HN 122 points. Works with Databricks, Snowflake, BigQuery, and DuckDB.

Silent failures in data pipelines are expensive. A source column's type changes and a downstream model breaks. A column gets renamed and three models stop working. A query runs in dev but fails in prod. Rocky catches all of this at check time, before anything runs.
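The idea of catching these failures at check time can be illustrated with a toy checker that compares what each downstream model expects against the current source schema. This is a conceptual sketch in Python, not Rocky's API or its Rust implementation. `ColumnRef`, `check_pipeline`, and all table and column names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List

Schema = Dict[str, str]  # column name -> SQL type


@dataclass
class ColumnRef:
    """A column that a downstream model reads, with the type it expects."""
    model: str
    table: str
    column: str
    expected_type: str


def check_pipeline(schemas: Dict[str, Schema],
                   refs: List[ColumnRef]) -> List[str]:
    """Return one error per broken reference, without running any query."""
    errors = []
    for ref in refs:
        schema = schemas.get(ref.table)
        if schema is None:
            errors.append(f"{ref.model}: unknown table {ref.table}")
        elif ref.column not in schema:
            errors.append(f"{ref.model}: {ref.table}.{ref.column} does not exist")
        elif schema[ref.column] != ref.expected_type:
            errors.append(
                f"{ref.model}: {ref.table}.{ref.column} is "
                f"{schema[ref.column]}, expected {ref.expected_type}"
            )
    return errors


# Current source schema: amount changed type, customer_id was renamed.
schemas = {"orders": {"order_id": "BIGINT", "amount": "VARCHAR",
                      "customer_ref": "BIGINT"}}
refs = [
    ColumnRef("revenue", "orders", "amount", "DECIMAL"),
    ColumnRef("retention", "orders", "customer_id", "BIGINT"),
    ColumnRef("order_counts", "orders", "order_id", "BIGINT"),
]
errors = check_pipeline(schemas, refs)
```

Both the type change and the rename show up as errors before any SQL executes, while the untouched `order_id` reference passes.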

The interesting part: Rocky runs as a language server (LSP). You see type mismatches and broken references in real time as you write SQL, not after CI runs. Column-level lineage is tracked from source to final table. lineage-diff generates PR-ready output showing exactly which tables and columns are affected by each change.
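Column-level lineage is essentially a graph, and "which columns does this change affect" is a traversal over it. The sketch below shows that idea only. The `LINEAGE` graph, the column names, and `affected_columns` are invented for illustration and say nothing about how Rocky's lineage-diff is implemented or what its output looks like.

```python
from collections import deque
from typing import Dict, List

# Hypothetical lineage graph: upstream column -> columns derived from it.
LINEAGE: Dict[str, List[str]] = {
    "raw.orders.amount": ["staging.orders.amount_usd"],
    "staging.orders.amount_usd": ["marts.revenue.total", "marts.ltv.value"],
    "raw.orders.customer_id": ["marts.ltv.customer_id"],
}


def affected_columns(lineage: Dict[str, List[str]], changed: str) -> List[str]:
    """Breadth-first walk: every column downstream of the changed one."""
    seen = set()
    queue = deque([changed])
    while queue:
        column = queue.popleft()
        for downstream in lineage.get(column, []):
            if downstream not in seen:
                seen.add(downstream)
                queue.append(downstream)
    return sorted(seen)


impact = affected_columns(LINEAGE, "raw.orders.amount")
```

A change to `raw.orders.amount` reaches the staging column and both mart columns, while the `customer_id` branch is left out, which is the kind of scoped impact list a PR reviewer would want to see.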

vs dbt: dbt does basic compilation but no full type-checking across the pipeline. Rocky generates structured PR output, while dbt needs a separate tool. vs SQLMesh: Rocky is Rust-native (fast compilation), SQLMesh is Python-based. Rocky is newer (April 2026), but its LSP integration goes deeper.

Status: Released, open source (Apache 2.0). Databricks adapter GA, Snowflake/BigQuery in beta.