Kimi-Dev-72B

Highlights

1. Kimi-Dev-72B scores 60.4% on SWE-bench Verified, surpassing the runner-up and setting a new state-of-the-art result among open-source models.

2. Kimi-Dev-72B is optimized via large-scale reinforcement learning. It autonomously patches real repositories in Docker and earns a reward only when the entire test suite passes. This pushes the model toward correct and robust solutions, in line with real-world development standards.

3. Kimi-Dev-72B is available for download and deployment on Hugging Face and GitHub. We welcome developers and researchers to explore its capabilities and contribute to development.

Performance of Open-source Models on SWE-bench Verified.

Performance of Closed-source Models on SWE-bench Verified.

Kimi-Dev-72B

We introduce the design philosophy and technical details of Kimi-Dev-72B, including the duo of BugFixer and TestWriter, mid-training, reinforcement learning, and test-time self-play.

Duo of BugFixer and TestWriter

A successful patch that fixes a bug should pass the unit tests that accurately reflect the bug. Conversely, a successful test that reproduces the bug should raise an assertion error on the unpatched code and pass once the correct bug-fix patch is applied to the repository. This leads to the complementary roles of BugFixer and TestWriter, and a strong enough coding LLM should excel at both.
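The two success criteria above can be written down as simple predicates. This is only an illustrative sketch: the `RunOutcome` type and both function names are hypothetical and are not part of the released code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunOutcome:
    """Result of running one candidate test in two repository states (hypothetical type)."""
    passes_before_patch: bool  # test run against the buggy code
    passes_after_patch: bool   # test run after the correct fix is applied


def reproduces_bug(outcome: RunOutcome) -> bool:
    """TestWriter criterion: fail on the buggy code, pass once it is fixed."""
    return (not outcome.passes_before_patch) and outcome.passes_after_patch


def patch_is_successful(test_passed: list[bool]) -> bool:
    """BugFixer criterion: every bug-reflecting test passes.

    An empty list is treated as failure, since no test gives no evidence.
    """
    return len(test_passed) > 0 and all(test_passed)
```

A test that already passes on the buggy code does not reproduce the bug, and a test that still fails after the fix is likely wrong itself; only the fail-then-pass combination counts.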

BugFixer and TestWriter share a similar routine: each first finds the right file to edit, then makes the correct code changes, whether rectifying fragile implementations or inserting unit test functions. Therefore, for both roles, Kimi-Dev-72B adopts the same minimal framework that contains only two stages: (1) File Localization and (2) Code Edits. The duo design of BugFixer and TestWriter lays the foundation of Kimi-Dev-72B.
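As a rough illustration of the two-stage framework, the sketch below chains a localization step and an edit step. Everything here is an assumption made for the example: the `Edit` type, `run_two_stage`, and the toy `localize`/`edit` stand-ins take the place of model calls and do not reflect the actual prompts or interfaces.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Edit:
    """A proposed change to a single file (hypothetical type)."""
    path: str
    new_content: str


def run_two_stage(
    issue: str,
    repo: dict[str, str],                      # path -> file content
    localize: Callable[[str, list[str]], str],  # stage 1: pick the file
    edit: Callable[[str, str], str],            # stage 2: rewrite its content
) -> Edit:
    """Run File Localization, then Code Edits, for either role."""
    path = localize(issue, sorted(repo))
    if path not in repo:
        raise KeyError(f"localizer returned unknown path: {path}")
    return Edit(path=path, new_content=edit(issue, repo[path]))


def toy_localize(issue: str, paths: list[str]) -> str:
    """Toy stand-in: choose the first file whose stem appears in the issue text."""
    for path in paths:
        stem = path.rsplit("/", 1)[-1].split(".")[0]
        if stem in issue:
            return path
    return paths[0]


def toy_edit(issue: str, source: str) -> str:
    """Toy stand-in: apply one hard-coded fix."""
    return source.replace("a - b", "a + b")
```

The same driver serves both roles: a BugFixer would plug in an editor that rewrites the implementation, while a TestWriter would plug in one that inserts a unit test function.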

Mid-training

To strengthen Kimi-Dev-72B's prior as both a BugFixer and a TestWriter, we perform mid-training on ~150B tokens of high-quality, real-world data. Starting from the Qwen 2.5-72B base model, we gather millions of GitHub issues and PR commits as the mid-training dataset. The data recipe is carefully constructed so that Kimi-Dev-72B learns how human developers reason about GitHub issues, craft code fixes, and write unit tests. We also perform strict data decontamination to exclude any repository from SWE-bench Verified. Mid-training substantially improves the base model's knowledge of practical bug fixes and unit tests, making it a better starting point for later RL training.
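Repository-level decontamination of the kind described above could look like the following minimal sketch. The record schema (a `"repo"` key holding an `owner/name` string) and the function name are assumptions for illustration; the actual data pipeline is not described in this post.

```python
def decontaminate(
    records: list[dict[str, str]],
    held_out_repos: set[str],
) -> list[dict[str, str]]:
    """Drop every training record whose repository is in the held-out set.

    Comparison is case-insensitive because GitHub repository names are.
    """
    blocked = {name.lower() for name in held_out_repos}
    return [r for r in records if r["repo"].lower() not in blocked]
```

Filtering by repository rather than by individual issue is the stricter choice: it removes all issues and commits from a benchmark repository, not only the ones that appear in the benchmark.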

Reinforcement Learning

With proper mid-training and SFT, Kimi-Dev-72B achieves strong performance in File Localization. Therefore, our RL stage focuses on improving its Code Edits capability. We use the policy optimization method described in Kimi k1.5, which has demonstrated outstanding results on reasoning tasks. For SWE-bench Verified, we highlight the following three key designs:

Outcome-based Reward Only. We use only the final execution result from Docker (0 or 1) as the reward, without any format-based or process rewards during training.

Efficient Prompt Set. We filter out prompts where the model achieves a zero success rate under multi-sample evaluation, enabling more effective use of large batch sizes. We apply curriculum learning, where new prompts are introduced to gradually increase task difficulty.

Positive Example Reinforcement. In the final stage of training, we include recent successful samples from previous iterations in the current batch. This helps the model reinforce successful patterns and improve performance.
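The three designs above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the training code: all names are hypothetical, ordering the curriculum by measured success rate is an assumption (the post only says difficulty increases gradually), and the replay buffer size and mixing rule are placeholders.

```python
from collections import deque


def outcome_reward(test_results: list[bool]) -> int:
    """Outcome-based reward: 1 only if the whole suite passes, else 0."""
    return int(len(test_results) > 0 and all(test_results))


def filter_prompts(success_rates: dict[str, float]) -> list[str]:
    """Drop prompts the model never solves under multi-sample evaluation."""
    return [p for p, rate in success_rates.items() if rate > 0.0]


def curriculum_order(success_rates: dict[str, float]) -> list[str]:
    """Assumed curriculum: introduce easier prompts (higher success rate) first."""
    kept = filter_prompts(success_rates)
    return sorted(kept, key=lambda p: -success_rates[p])


class PositiveReplay:
    """Keeps the most recent successful samples for reuse in later batches."""

    def __init__(self, capacity: int) -> None:
        self._buffer: deque = deque(maxlen=capacity)

    def record(self, sample: str, reward: int) -> None:
        # Only samples that earned the full outcome reward are stored.
        if reward == 1:
            self._buffer.append(sample)

    def mix_into(self, fresh_batch: list[str], num_replayed: int) -> list[str]:
        """Append up to num_replayed of the newest successes to a fresh batch."""
        replayed = list(self._buffer)[-num_replayed:] if num_replayed > 0 else []
        return fresh_batch + replayed
```

Because the reward is binary and computed on the whole suite, a prompt with a zero success rate yields only zero-reward samples and therefore no learning signal, which is one way to read the motivation for filtering such prompts out.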

Kimi-Dev-72B greatly benefits from training over a scalable number of issue resolution tasks, using the highly parallel, robust, and efficient internal agent infrastructure.

RL-training Scaling on SWE-bench Verified.

Test-time Scaling on SWE-bench Verified.

Test-time Self-Play

After RL, Kimi-Dev-72B masters the roles of both a BugFixer and a TestWriter. During test time, it adopts a self-play mechanism to coordinate its bug-fixing and test-writing abilities.

Test-time Self-Play between BugFixer and TestWriter.

With up to 40 patch candidates and up to 40 test candidates generated per issue (following the standard Agentless setting), a scaling effect is observed for test-time self-play.
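This post does not spell out how patch and test candidates are combined. One plausible selection rule, assumed here purely for illustration, is to keep only the candidate tests that fail on the unpatched code and then pick the patch that passes the most of them. The function name and the input layout are hypothetical.

```python
def select_patch(
    fails_on_buggy: list[bool],
    passes_with_patch: list[list[bool]],
) -> int:
    """Return the index of the patch that passes the most bug-reproducing tests.

    fails_on_buggy[t]       -- True if candidate test t fails on the unpatched code
    passes_with_patch[p][t] -- True if test t passes once patch p is applied
    Ties are broken in favor of the earliest patch.
    """
    if not passes_with_patch:
        raise ValueError("at least one patch candidate is required")
    # Tests that already pass on the buggy code carry no signal about the fix.
    useful_tests = [t for t, fails in enumerate(fails_on_buggy) if fails]
    scores = [sum(row[t] for t in useful_tests) for row in passes_with_patch]
    return max(range(len(scores)), key=scores.__getitem__)
```

With 40 patches and 40 tests per issue this amounts to filling in a 40 × 40 pass/fail matrix, so the cost of selection is dominated by test execution rather than by the scoring itself.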

Open Access

Kimi-Dev-72B is released to the community for further research and development. Key resources include: Model Weights, Source Code, and Technical Report (coming soon).

As an open project, its advancement benefits from community involvement. We hope that developers and organizations will explore, integrate, and expand the model's applications.

What's Next

We are actively researching and developing ways to extend Kimi-Dev-72B's capabilities and to tackle more complex software engineering tasks. Future iterations will focus on deeper integration with popular Integrated Development Environments (IDEs), version control systems, and CI/CD pipelines, so that Kimi-Dev-72B fits more seamlessly into a developer's workflow. We will keep investing in improving Kimi-Dev-72B, in careful red-teaming, and in releasing stronger models to the community.

Introducing Kimi-Dev: A Strong and Open-source Coding LLM for Issue Resolution
