Makora — Automatically unlock peak GPU performance

Blog Posts

The latest findings, insights, and publications from the Makora team.

Makora's AI Performance Engineering Manifesto

Modern tools have permanently changed performance engineering for the better

Jun 29, 2026

One Data Type is Not All You Need for 4-bit Quantization

MixFP4 is an extension to NVFP4 that improves accuracy with no additional memory cost

Jun 22, 2026

Open-sourcing 600,000 Triton kernels via Hugging Face

triton-gpu-latency is a dataset of 600,000 Triton kernels with full evaluation results

Jun 5, 2026

Hierarchical SMC-SD: Composing Speculative Decoding Techniques

Achieving a 2x performance improvement by combining Eagle3 and SMC-SD

May 27, 2026

Maximizing Intelligence per Second: Fast Inference Endpoints for Agentic Systems

Makora inference endpoints are built for AI agents that need to think, respond, and act in real time.

May 12, 2026

Agentic Memory Management for GPU Code Generation

A post contributed to the AI-Driven Research for Systems (ADRS) blog series from the Berkeley Sky Computing Lab

Apr 20, 2026

SMC-SD: The Fastest GPU-based LLM Inference in the World

Sequential Monte Carlo Speculative Decoding enables new highs for tokens per second per user

Apr 16, 2026

AI-Generated FP8 GEMMs on AMD MI355X

Writing fast FP8 GEMMs just got faster

Mar 6, 2026

Introducing the MakoraGenerate CLI

Everyone's favourite kernel generation agent, now in your CLI!

Feb 18, 2026

Code generated by MakoraGenerate is wrong... or brilliant?

The code was correct. The problem wasn't.

Feb 12, 2026

We RL'd GPT-5 to Write Better Kernels

Pushing frontier model capabilities with reinforcement learning

Jan 15, 2026

Discovery & Mitigation of Reward Hacks in Automated Kernel Optimization

A systematic study of reward hacking, adversarial detection, and robust evaluation for LLM-optimized GPU kernels

Dec 16, 2025

Fast LLM-Generated Kimi Delta Attention Kernels

MakoraGenerate implements functional and fast KDA kernels with evolutionary search

Dec 3, 2025

Mako is now Makora

Same team. Same mission. Two new letters.

Sep 18, 2025

From Optimizing Kernels to Optimizing Benchmarks

Creating a representative subset of KernelBench to evaluate a long-running agent more efficiently

Aug 12, 2025

We Raised $8.5M to Make Peak GPU Performance Universally Accessible

Announcing Makora's seed round

Aug 6, 2025

MakoraGenerate Achieves 1.83x Performance over torch.compile on DeepSeek MOE Kernels

MakoraGenerate outperforms torch.compile when optimizing DeepSeek MOE Kernels

Jul 29, 2025

How MakoraGenerate Leverages PTX and Tensor Cores for Fast Matrix Multiplication

MakoraGenerate writes inline PTX to achieve near-optimal GEMM performance

Jul 22, 2025

15x Faster CUDA Kernel Compilation for MakoraGenerate

Optimizing the kernel generation pipeline through accelerated compilation

Jun 25, 2025

Introducing MakoraGenerate: AI-Powered GPU Kernel Generation in Under 60 Seconds

MakoraGenerate is an LLM-powered AI agent that writes GPU kernels

May 29, 2025

Unlocking AI Model Performance with Makora on Microsoft Azure

Makora improves the performance of vLLM and SGLang

Apr 2, 2025

Kernels Together Strong 🦧 Improving Performance using Multiple Kernel Providers

Achieve state-of-the-art latency on FLUX.1-schnell by leveraging multiple executor backends

Jan 29, 2025

1-Click deploy models on AMD MI300X

Easily deploy models on Makora

Oct 29, 2024

GPU go brrrrr, but at what cost?

Identifying the most price-efficient AI inference accelerators
