burmese-coding-eval
An evaluation suite and dataset collection for Burmese programming assistants.
The Evaluation Suite
Properly measuring the safety and accuracy of language models requires rigorous benchmarks. burmese-coding-eval is a specialized multi-track framework for testing the code correctness, linguistic coherence, and cultural appropriateness of AI-generated code produced from Burmese prompts.
Core Datasets
- burmese-mbpp: A localized, translated, and culturally aligned variant of the Mostly Basic Python Problems dataset.
- burmese-human-eval: An adaptation of the standard HumanEval code-generation benchmark, with problem descriptions rendered in Burmese.
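Benchmarks in the MBPP/HumanEval family score a model by executing its completion against hidden unit tests. The sketch below illustrates that pattern; the sample item and the `passes_tests` helper are hypothetical, not part of the burmese-coding-eval API (a real Burmese item would carry a Burmese-language task description in its prompt).

```python
# Minimal sketch of a HumanEval/MBPP-style correctness check:
# a candidate completion passes only if every test assertion holds.
# SAMPLE_ITEM is a hypothetical stand-in for a dataset record.

SAMPLE_ITEM = {
    # In the real dataset the prompt would include a Burmese task description.
    "prompt": "def add(a, b):\n",
    "tests": [
        "assert add(1, 2) == 3",
        "assert add(-1, 1) == 0",
    ],
}

def passes_tests(completion: str, item: dict) -> bool:
    """Execute prompt + completion, then run each test assertion."""
    program = item["prompt"] + completion
    namespace: dict = {}
    try:
        exec(program, namespace)   # define the candidate function
        for test in item["tests"]:
            exec(test, namespace)  # raises AssertionError on failure
    except Exception:
        return False
    return True

print(passes_tests("    return a + b\n", SAMPLE_ITEM))  # True: correct solution
print(passes_tests("    return a - b\n", SAMPLE_ITEM))  # False: fails a test
```

Per-problem pass/fail results like these are typically aggregated into metrics such as pass@k; production harnesses also sandbox the `exec` calls, since they run untrusted model output.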
Impact
By standardizing how AI performance is measured for Myanmar-language prompts, burmese-coding-eval supports the development of reliable local AI coding assistants, letting researchers compare models objectively and refine architectures against empirical linguistic criteria.