Add a --json mode to evals #55

Merged
sgoedecke merged 6 commits into main from sgoedecke/evals-json-mode
Jun 5, 2025

sgoedecke
Collaborator

@sgoedecke sgoedecke commented Jun 5, 2025

Adds a --json flag to gh models eval so it can output machine-readable results:

@sgoedecke ➜ /workspaces/gh-models (sgoedecke/evals-json-mode) $ ./gh-models eval --json fixtures/test_single_evaluator.yml
{
  "name": "Test Single Evaluator",
  "description": "Testing a single built-in evaluator",
  "model": "openai/gpt-4o",
  "testResults": [
    {
      "testCase": {
        "expected": "Machine learning is a subset of artificial intelligence that enables computers to learn and make decisions from data without being explicitly programmed.",
        "input": "What is machine learning?"
      },
      "modelResponse": "Machine learning is a subset of **artificial intelligence (AI)**  ...  making it a crucial technology in modern AI-driven applications.",
      "evaluationResults": [
        {
          "evaluatorName": "fluency-test",
          "score": 1,
          "passed": true,
          "details": "LLM evaluation matched choice: '5'"
        }
      ]
    }
  ],
  "summary": {
    "totalTests": 1,
    "passedTests": 1,
    "failedTests": 0,
    "passRate": 100
  }
}

Also adopts some refactors suggested in #54 (extracting some functions and adding a docs link)
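
For readers who want to consume the --json output shown above from a script or CI step, here is a minimal sketch in Go. The type names `EvalReport` and `EvalSummary` and the helper `parseReport` are hypothetical and not taken from this PR; only the JSON field names come from the sample output, and the sketch decodes just a subset of them.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// EvalSummary mirrors the "summary" object in the sample output.
// The Go type name is an assumption made for this sketch.
type EvalSummary struct {
	TotalTests  int     `json:"totalTests"`
	PassedTests int     `json:"passedTests"`
	FailedTests int     `json:"failedTests"`
	PassRate    float64 `json:"passRate"`
}

// EvalReport holds only the top-level fields this sketch reads.
// Fields such as "testResults" are ignored by the decoder.
type EvalReport struct {
	Name    string      `json:"name"`
	Model   string      `json:"model"`
	Summary EvalSummary `json:"summary"`
}

// parseReport decodes JSON text in the shape shown in the sample output.
func parseReport(data string) (EvalReport, error) {
	var r EvalReport
	err := json.Unmarshal([]byte(data), &r)
	return r, err
}

func main() {
	// Abbreviated copy of the sample output from the PR description.
	sample := `{
	  "name": "Test Single Evaluator",
	  "model": "openai/gpt-4o",
	  "summary": {"totalTests": 1, "passedTests": 1, "failedTests": 0, "passRate": 100}
	}`

	report, err := parseReport(sample)
	if err != nil {
		fmt.Fprintln(os.Stderr, "decode failed:", err)
		os.Exit(1)
	}
	fmt.Printf("%s (%s): %d of %d tests passed\n",
		report.Name, report.Model,
		report.Summary.PassedTests, report.Summary.TotalTests)

	// Exit non-zero when any test failed, so a CI job can gate on the result.
	if report.Summary.FailedTests > 0 {
		os.Exit(1)
	}
}
```

In practice the sample string would be replaced by the command's standard output, for example by reading from `os.Stdin` in a pipeline.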

@sgoedecke sgoedecke marked this pull request as ready for review June 5, 2025 01:12
@Copilot Copilot AI review requested due to automatic review settings June 5, 2025 01:12
@sgoedecke sgoedecke requested a review from a team as a code owner June 5, 2025 01:12
Contributor

@Copilot Copilot AI left a comment

Pull Request Overview

Adds a new machine-readable output mode (via --json) to the gh models eval command and refactors chat‐completion helpers into the prompt package.

  • Introduce --json flag to emit structured JSON summary instead of human output.
  • Extract GetAzureChatMessageRole and BuildChatCompletionOptions into pkg/prompt/prompt.go for reuse.
  • Update CLI handlers, tests, and documentation to support JSON mode.

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| pkg/prompt/prompt.go | Added role‐parsing and option‐building helpers; missing import needed. |
| cmd/run/run.go | Refactored message‐role switch to use new prompt helper. |
| cmd/eval/eval.go | Implemented --json flag, JSON serialization, and refactored model call. |
| cmd/eval/eval_test.go | Added tests covering JSON output format and CLI behavior. |
| README.md | Documented gh models eval usage, including JSON flag. |

Comments suppressed due to low confidence (1)

pkg/prompt/prompt.go:9

  • The new GetAzureChatMessageRole function uses fmt.Errorf but fmt is not imported; add import "fmt" to the import block.
github.com/github/gh-models/internal/azuremodels
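
To illustrate the kind of helper the suppressed comment is about, here is a rough, self-contained sketch of a role-parsing function that returns an error via fmt.Errorf, which is why the fmt import matters. It is not the code from this PR: `ChatMessageRole`, the role constants, and `parseRole` are hypothetical stand-ins for the types in internal/azuremodels and for GetAzureChatMessageRole, and the set of accepted roles is an assumption.

```go
package main

import "fmt" // required because parseRole builds its error with fmt.Errorf

// ChatMessageRole is a local stand-in for the role type in
// internal/azuremodels (an assumption for this sketch).
type ChatMessageRole string

// Assumed role values; the real package may define different ones.
const (
	RoleSystem    ChatMessageRole = "system"
	RoleUser      ChatMessageRole = "user"
	RoleAssistant ChatMessageRole = "assistant"
)

// parseRole is a hypothetical analogue of GetAzureChatMessageRole:
// it maps a role string to a typed role, or returns an error for
// anything it does not recognize.
func parseRole(role string) (ChatMessageRole, error) {
	switch role {
	case "system":
		return RoleSystem, nil
	case "user":
		return RoleUser, nil
	case "assistant":
		return RoleAssistant, nil
	default:
		return "", fmt.Errorf("unknown message role: %q", role)
	}
}

func main() {
	for _, raw := range []string{"user", "robot"} {
		role, err := parseRole(raw)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println("role:", role)
	}
}
```

Without the `import "fmt"` line, a file like this would fail to compile with an undefined-identifier error, which is the situation the comment describes.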


@johan-j johan-j left a comment

Nice job on the json support 🚢

@sgoedecke sgoedecke merged commit d7631e6 into main Jun 5, 2025
5 checks passed
@sgoedecke sgoedecke deleted the sgoedecke/evals-json-mode branch June 5, 2025 01:52
}

// BuildChatCompletionOptions creates a ChatCompletionOptions with the file's model and parameters
func (f *File) BuildChatCompletionOptions(messages []azuremodels.ChatMessage) azuremodels.ChatCompletionOptions {
Member

Thanks for the new functions!

3 participants