burmese-coding-eval

The premier evaluation suite and dataset collection for Burmese programming assistants.

The Evaluation Suite

Properly measuring the safety and accuracy of language models requires rigorous benchmarks. burmese-coding-eval is a specialized multi-track framework for testing the code correctness, linguistic coherence, and cultural appropriateness of AI-generated code produced from Burmese prompts.
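
The loop below is a minimal sketch of how such a multi-track harness could be wired together, assuming a per-task record with a Burmese prompt, reference tests, and the model's output. The names (EvalTask, run_sandboxed, score_task) are hypothetical placeholders, not the project's actual API.

```python
from dataclasses import dataclass

@dataclass
class EvalTask:
    prompt_my: str          # Burmese-language problem statement
    reference_tests: str    # executable assertions checking correctness
    model_output: str       # code generated by the assistant under test

def run_sandboxed(code: str, tests: str) -> bool:
    """Execute the generated code plus its reference tests in an isolated namespace."""
    namespace: dict = {}
    try:
        exec(code, namespace)    # NOTE: use a real sandbox in production
        exec(tests, namespace)
        return True
    except Exception:
        return False

def score_task(task: EvalTask) -> dict:
    return {
        "correctness": run_sandboxed(task.model_output, task.reference_tests),
        # The linguistic and cultural tracks would plug in their own scorers here;
        # None placeholders keep the sketch self-contained.
        "linguistic_coherence": None,
        "cultural_appropriateness": None,
    }

# Example: a trivially correct submission passes the correctness track.
task = EvalTask(
    prompt_my="...",  # Burmese problem statement goes here
    reference_tests="assert add(2, 3) == 5",
    model_output="def add(a, b):\n    return a + b",
)
print(score_task(task))
```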

Core Datasets

  • burmese-mbpp: A translated and culturally adapted variant of the Mostly Basic Python Problems (MBPP) dataset.
  • burmese-human-eval: An adaptation of the standard HumanEval code-generation benchmark with problem statements rewritten in Burmese (a sample record layout is sketched below).
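
For illustration, a record in these datasets might look like the JSONL sketch below. The field names (task_id, prompt_my, canonical_solution, test) and the sample Burmese prompt are assumptions modeled on the upstream MBPP and HumanEval schemas, not the project's actual files.

```python
import json

# Hypothetical record layout: one JSON object per line in a .jsonl file.
sample_record = {
    "task_id": "burmese-mbpp/001",
    "prompt_my": "ဂဏန်းနှစ်လုံး၏ ပေါင်းလဒ်ကို ပြန်ပေးသည့် function ရေးပါ။",  # "Write a function returning the sum of two numbers."
    "canonical_solution": "def add(a, b):\n    return a + b",
    "test": "assert add(2, 3) == 5",
}

def load_tasks(path: str):
    """Yield one task dict per non-empty line of a JSONL dataset file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                yield json.loads(line)

# Round-trip the sample record to show the expected on-disk format.
print(json.dumps(sample_record, ensure_ascii=False, indent=2))
```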

Impact

By standardizing how AI coding performance is measured in Burmese, burmese-coding-eval accelerates the development of reliable local coding assistants, letting researchers compare results objectively and refine model architectures against empirical linguistic criteria.
