A list of works on evaluation of visual generation models, including evaluation metrics, models, and systems
Updated Jun 12, 2024
DevQualityEval: An evaluation benchmark 📈 and framework to compare and evolve the quality of code generation of LLMs.
📰 Must-read papers and blogs on LLM-based Long Context Modeling 🔥
Python SDK for running evaluations on LLM-generated responses
Test your prompts, agents, and RAGs. Use LLM evals to improve your app's quality and catch problems. Compare performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with command line and CI/CD integration.
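The entry above describes a tool driven by declarative configs. As a rough illustration of that idea, here is a minimal sketch of such a config; the exact field names and provider identifiers are assumptions and may differ across tool versions:

```yaml
# Hypothetical declarative eval config: prompts, providers, and test cases
# with assertions. Field names are illustrative, not authoritative.
prompts:
  - "Summarize the following text in one sentence: {{text}}"

providers:
  - openai:gpt-4o-mini   # provider identifier is an assumption

tests:
  - vars:
      text: "The quick brown fox jumps over the lazy dog."
    assert:
      - type: contains
        value: "fox"
```

A config like this lets the same test suite run against multiple providers from the command line or in CI/CD, which is the comparison workflow the description refers to.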
Programming Language Selector based on language metadata and user-specified values.
The official evaluation suite and dynamic data release for MixEval.
A streamlined and customizable framework for efficient large model evaluation and performance benchmarking
Awesome-LLM-Eval: a curated list of tools, datasets/benchmarks, demos, leaderboards, papers, docs, and models, mainly for evaluation of LLMs, aiming to explore the technical boundaries of generative AI.
☁️ 🚀 📊 📈 Evaluating state of the art in AI
🪢 Open source LLM engineering platform: Observability, metrics, evals, prompt management, playground, datasets. Integrates with LlamaIndex, Langchain, OpenAI SDK, LiteLLM, and more. 🍊YC W23
LangSmith Client SDK Implementations
An automatic evaluator for instruction-following language models. Human-validated, high-quality, cheap, and fast.
Documentation for LangSmith
A fairly robust mathematics parsing engine for C++ projects.
A task generation and model evaluation system.
Python client for Kolena's machine learning testing platform
The RAG Experiment Accelerator is a versatile tool that streamlines experiments and evaluations using Azure Cognitive Search and the RAG pattern.
CyclOps for clinical ML evaluation & monitoring workshop