How to Create a Public Optimization Benchmark

Benchmarks are standardized test suites that enable fair comparison of optimization algorithms. This guide shows you how to create high-quality benchmarks that the community can use to evaluate and compare their algorithms.

What is an Optimization Benchmark?

A benchmark is a collection of:

  • Problem Instances: Standardized test problems
  • Evaluation Metrics: Consistent measurement criteria
  • Reference Solutions: Known optimal or best-known solutions
  • Evaluation Protocol: Rules for fair comparison

Prerequisites

Before creating a benchmark:

  1. Domain Expertise: Deep understanding of the optimization problem
  2. Problem Collection: Diverse set of problem instances
  3. Reference Solutions: Known optimal or high-quality solutions
  4. GitHub Login: A platform account, signed in with GitHub
  5. Uploaded Repositories: Problem definitions uploaded as repositories on the platform
  6. Platform Access: Permission to use the benchmark creation features

Planning Your Benchmark

1. Define Benchmark Scope

benchmark_specification = {
    "name": "TSP Benchmark Suite 2024",
    "domain": "Traveling Salesman Problem",
    "problem_types": ["symmetric", "asymmetric", "euclidean", "geographic"],
    "size_range": {"min": 10, "max": 1000},
    "instance_count": 50,
    "difficulty_levels": ["easy", "medium", "hard", "extreme"],
    "evaluation_metrics": ["solution_quality", "runtime", "convergence_rate"]
}

2. Collect Problem Instances

Instance Sources

  • Literature: Classic problems from research papers
  • Real-world: Practical applications and datasets
  • Generated: Systematically created test cases
  • Community: Contributed by other researchers

Instance Diversity

instance_categories = {
    "size_distribution": {
        "small": {"range": "10-50 nodes", "count": 15},
        "medium": {"range": "51-200 nodes", "count": 20}, 
        "large": {"range": "201-500 nodes", "count": 10},
        "xlarge": {"range": "501-1000 nodes", "count": 5}
    },
    "structure_types": {
        "random": "Randomly generated coordinates",
        "clustered": "Nodes grouped in clusters", 
        "grid": "Regular grid layout",
        "real_world": "Actual city coordinates"
    }
}
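
The structure types above can be generated programmatically. Below is a minimal sketch of one way to do it with NumPy; the function name, the 1000x1000 coordinate extent, and the cluster parameters are illustrative assumptions rather than part of the benchmark specification.

import numpy as np

def generate_instance(n, structure, seed=0, extent=1000.0):
    """Generate (x, y) coordinates for a TSP instance with the given structure.

    'structure' follows structure_types above: random, clustered, or grid.
    """
    rng = np.random.default_rng(seed)
    if structure == "random":
        coords = rng.uniform(0, extent, size=(n, 2))
    elif structure == "clustered":
        n_clusters = max(2, n // 20)  # roughly one cluster per 20 nodes
        centers = rng.uniform(0, extent, size=(n_clusters, 2))
        assignment = rng.integers(0, n_clusters, size=n)
        coords = centers[assignment] + rng.normal(0, extent * 0.02, size=(n, 2))
    elif structure == "grid":
        side = int(np.ceil(np.sqrt(n)))
        xs, ys = np.meshgrid(np.linspace(0, extent, side), np.linspace(0, extent, side))
        coords = np.column_stack([xs.ravel(), ys.ravel()])[:n]
    else:
        raise ValueError(f"unknown structure: {structure}")
    return [{"id": i + 1, "x": float(x), "y": float(y)} for i, (x, y) in enumerate(coords)]

For example, generate_instance(100, "clustered", seed=1) would produce coordinates for one instance in the medium size category.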

Creating Benchmark Components

1. Problem Repository Structure

tsp_benchmark_2024/
├── problems/
│   ├── small/
│   │   ├── tsp_10_random_01.json
│   │   ├── tsp_20_clustered_01.json
│   │   └── ...
│   ├── medium/
│   │   ├── tsp_100_grid_01.json
│   │   └── ...
│   └── large/
├── solutions/
│   ├── optimal/
│   │   ├── tsp_10_random_01_optimal.json
│   │   └── ...
│   └── best_known/
├── config.json
├── benchmark_spec.json
└── README.md

2. Problem Definition Format

{
  "name": "tsp_berlin52",
  "type": "tsp_symmetric",
  "description": "52 locations in Berlin (Groetschel)",
  "size": 52,
  "optimal_value": 7542,
  "source": "TSPLIB",
  "coordinates": [
    {"id": 1, "x": 565.0, "y": 575.0},
    {"id": 2, "x": 25.0, "y": 185.0},
    ...
  ],
  "metadata": {
    "difficulty": "medium",
    "structure": "real_world",
    "tags": ["classic", "tsplib", "geographic"]
  }
}
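
A short sketch of how a consumer might load and sanity-check a problem file in this format; the required-field set and the helper name are assumptions based on the example above, and the file path is illustrative.

import json

REQUIRED_FIELDS = {"name", "type", "size", "coordinates", "metadata"}

def load_problem(path):
    """Load a problem definition and check the fields shown in the format above."""
    with open(path) as f:
        problem = json.load(f)
    missing = REQUIRED_FIELDS - problem.keys()
    if missing:
        raise ValueError(f"{path}: missing fields {sorted(missing)}")
    if len(problem["coordinates"]) != problem["size"]:
        raise ValueError(f"{path}: 'size' does not match the number of coordinates")
    return problem

# Illustrative usage
# problem = load_problem("problems/medium/tsp_100_grid_01.json")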

3. Solution Format

{
  "problem_name": "tsp_berlin52",
  "solution_type": "optimal",
  "objective_value": 7542,
  "tour": [1, 49, 32, 45, 19, 41, 8, 9, 10, 43, 33, 51, 11, 52, 14, 13, 47, 26, 27, 28, 12, 25, 4, 6, 15, 5, 24, 48, 38, 37, 40, 39, 36, 35, 34, 44, 46, 16, 29, 50, 20, 23, 30, 2, 7, 42, 21, 17, 3, 18, 31, 22],
  "verification": {
    "verified": true,
    "method": "exact_solver",
    "solver": "Concorde",
    "computation_time": 1.23
  }
}
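
The stored objective value can be recomputed from the problem's coordinates. The sketch below assumes Euclidean distances rounded to the nearest integer (the TSPLIB EUC_2D convention, which is how berlin52's optimal value of 7542 is obtained); other problem types would need a different distance function.

import math

def tour_length(problem, tour):
    """Recompute a tour's length using rounded Euclidean (TSPLIB EUC_2D) distances."""
    coords = {c["id"]: (c["x"], c["y"]) for c in problem["coordinates"]}
    total = 0
    for a, b in zip(tour, tour[1:] + tour[:1]):  # wrap around to close the cycle
        (x1, y1), (x2, y2) = coords[a], coords[b]
        total += int(round(math.hypot(x1 - x2, y1 - y2)))
    return total

def matches_reference(problem, solution):
    """Check the recomputed length against the stored objective_value."""
    return tour_length(problem, solution["tour"]) == solution["objective_value"]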

Current Benchmark Access

1. Benchmark Page

Visit rastion.com/benchmark to access the features described below.

Protected Access

  • Authentication Required: Must be logged in to access benchmarks
  • User Permissions: Benchmark creation may require specific permissions
  • Community Features: View and interact with existing benchmarks

Benchmark Interface

  • Browse Benchmarks: View available benchmark suites
  • Performance Metrics: See algorithm performance on standard problems
  • Comparison Tools: Compare different algorithms side-by-side
  • Download Options: Access benchmark data and results

2. Integration with Leaderboards

Benchmark-Leaderboard Connection

Benchmarks often integrate with the leaderboard system:

  • Standardized Problems: Benchmarks provide problems for leaderboard competitions
  • Fair Evaluation: Consistent evaluation criteria across submissions
  • Performance Tracking: Long-term performance monitoring
  • Community Engagement: Encourage participation in benchmark challenges

Creating Leaderboard Problems

If you have admin access, you can create leaderboard problems:

  1. Access Admin Interface: Use admin-only problem creation modal
  2. Repository Selection: Choose from your uploaded repositories
  3. Problem Configuration: Set name and optimization direction (min/max)
  4. Simplified Setup: Minimal fields for quick problem creation
  5. Automatic Integration: Problems automatically appear in leaderboard

3. Repository-Based Benchmarks

Upload Benchmark Repositories

Create benchmarks by uploading comprehensive problem repositories:

import qubots.rastion as rastion

# Authenticate
rastion.authenticate("your_rastion_token")

# Upload benchmark repository
benchmark_url = rastion.upload_model_from_path(
    path="./tsp_benchmark_suite",
    repository_name="tsp_benchmark_2024",
    description="Comprehensive TSP benchmark with diverse instances",
    tags=["benchmark", "tsp", "evaluation"],
    private=False
)

Repository Structure for Benchmarks

tsp_benchmark_suite/
├── problems/
│   ├── small_instances/
│   ├── medium_instances/
│   └── large_instances/
├── solutions/
│   ├── optimal_solutions/
│   └── best_known_solutions/
├── config.json
├── benchmark_spec.json
└── README.md
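
Once the directory is assembled, the specification dictionary from the planning step can be written to benchmark_spec.json so the repository is self-describing. A minimal sketch; the schema simply mirrors benchmark_specification above and is an assumption, not a fixed platform format.

import json

# Abbreviated version of the specification defined in the planning step
benchmark_specification = {
    "name": "TSP Benchmark Suite 2024",
    "domain": "Traveling Salesman Problem",
    "instance_count": 50,
    "evaluation_metrics": ["solution_quality", "runtime", "convergence_rate"]
}

with open("tsp_benchmark_suite/benchmark_spec.json", "w") as f:
    json.dump(benchmark_specification, f, indent=2)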

Evaluation Protocol Design

1. Performance Metrics

evaluation_metrics = {
    "solution_quality": {
        "gap_to_optimal": "(solution_value - optimal_value) / optimal_value * 100",
        "success_rate": "percentage of runs finding optimal solution",
        "best_value": "best objective value found",
        "average_value": "average objective value across runs"
    },
    "efficiency": {
        "runtime": "total execution time",
        "iterations": "number of algorithm iterations",
        "evaluations": "number of solution evaluations"
    },
    "robustness": {
        "standard_deviation": "consistency across multiple runs",
        "convergence_rate": "speed of convergence to good solutions"
    }
}
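
For concreteness, the solution-quality metrics above could be computed from per-run objective values like this (a sketch for a minimization problem; the run values in the usage line are made up for illustration):

def quality_metrics(run_values, optimal_value):
    """Compute the solution_quality metrics defined above from a list of per-run values."""
    best = min(run_values)
    return {
        "best_value": best,
        "average_value": sum(run_values) / len(run_values),
        "gap_to_optimal": (best - optimal_value) / optimal_value * 100,
        "success_rate": 100 * sum(v == optimal_value for v in run_values) / len(run_values),
    }

# Example: five runs on berlin52 (optimal value 7542)
print(quality_metrics([7542, 7598, 7542, 7610, 7542], optimal_value=7542))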

2. Evaluation Rules

evaluation_protocol = {
    "execution_environment": {
        "cpu_limit": "4 cores",
        "memory_limit": "8GB",
        "time_limit": 300,  # seconds
        "platform": "Rastion Playground"
    },
    "statistical_requirements": {
        "min_runs": 10,
        "confidence_level": 0.95,
        "significance_test": "Mann-Whitney U"
    },
    "submission_rules": {
        "algorithm_description": "required",
        "parameter_settings": "must be documented",
        "reproducibility": "code must be available"
    }
}
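
The significance test named in the protocol can be applied directly to per-run objective values from two algorithms. A minimal sketch using scipy.stats; the run values are made up for illustration.

from scipy.stats import mannwhitneyu

def compare_algorithms(values_a, values_b, alpha=0.05):
    """Two-sided Mann-Whitney U test on per-run objective values."""
    statistic, p_value = mannwhitneyu(values_a, values_b, alternative="two-sided")
    return {"statistic": statistic, "p_value": p_value, "significant": p_value < alpha}

runs_a = [7542, 7598, 7542, 7610, 7542, 7555, 7542, 7542, 7580, 7542]
runs_b = [7700, 7655, 7712, 7689, 7642, 7702, 7668, 7731, 7690, 7675]
print(compare_algorithms(runs_a, runs_b))

The alpha of 0.05 corresponds to the 0.95 confidence level required by the protocol.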

Quality Assurance

1. Instance Validation

def validate_benchmark_instance(instance):
    checks = {
        "format_valid": validate_json_format(instance),
        "solution_exists": check_reference_solution(instance),
        "solution_verified": verify_optimal_solution(instance),
        "metadata_complete": check_required_metadata(instance),
        "difficulty_appropriate": assess_difficulty_level(instance)
    }
    return all(checks.values()), checks

# Validate all instances
validation_results = []
for instance in benchmark_instances:
    is_valid, details = validate_benchmark_instance(instance)
    validation_results.append({
        "instance": instance["name"],
        "valid": is_valid,
        "details": details
    })

2. Reference Solution Verification

def verify_reference_solutions():
    for problem, solution in problem_solution_pairs:
        # Verify solution feasibility
        assert is_feasible_solution(problem, solution["tour"])
        
        # Verify objective value calculation
        calculated_value = calculate_objective(problem, solution["tour"])
        assert abs(calculated_value - solution["objective_value"]) < 1e-6
        
        # Verify optimality claim (if applicable)
        if solution["solution_type"] == "optimal":
            assert verify_optimality(problem, solution)

Benchmark Management

1. Benchmark Maintenance

Version Control

  • Repository Updates: Update benchmark repositories with new instances
  • Version Tracking: Maintain compatibility across benchmark versions
  • Documentation: Keep comprehensive documentation updated
  • Community Feedback: Incorporate community suggestions and improvements

Quality Assurance

Re-run the instance validation routine from the Quality Assurance section above whenever instances are added or reference solutions change, so that every released version passes the same format, solution, and metadata checks.

2. Community Integration

Sharing and Collaboration

  • Public Repositories: Make benchmark repositories publicly accessible
  • Community Contributions: Accept community-contributed problem instances
  • Collaborative Improvement: Work with community to enhance benchmarks
  • Knowledge Sharing: Document best practices and lessons learned

Integration with Platform Features

  • Leaderboard Problems: Use benchmark instances for leaderboard competitions
  • Playground Testing: Enable easy testing of algorithms on benchmark problems
  • Experiment Templates: Provide experiment templates using benchmark problems
  • Performance Tracking: Monitor algorithm performance across benchmark instances

Maintenance and Updates

1. Version Control

# Update benchmark with new instances
benchmark.add_version("1.1.0", changes=[
    "Added 10 new large-scale instances",
    "Fixed coordinate precision in 3 instances", 
    "Updated reference solutions for 2 instances"
])

# Maintain backward compatibility
benchmark.set_compatibility({
    "1.0.0": "fully_compatible",
    "1.1.0": "current_version"
})

2. Community Feedback

  • Issue Tracking: Monitor reported problems
  • Solution Updates: Incorporate better reference solutions
  • Instance Additions: Add community-contributed instances
  • Protocol Refinements: Improve evaluation procedures

Best Practices

1. Benchmark Design

  • Diversity: Include varied problem characteristics
  • Scalability: Cover different problem sizes
  • Difficulty: Range from easy to extremely challenging
  • Relevance: Include real-world inspired instances

2. Documentation

# Benchmark Documentation Template

## Overview
- Purpose and scope
- Target algorithms
- Evaluation objectives

## Instance Description
- Problem types and characteristics
- Size distribution
- Difficulty levels
- Source and generation methods

## Evaluation Protocol
- Performance metrics
- Execution environment
- Statistical requirements
- Submission guidelines

## Reference Solutions
- Optimality verification
- Solution quality assessment
- Computational methods used

## Usage Instructions
- How to download instances
- Submission process
- Result interpretation

Getting Started with Benchmarks

1. Current Access

Explore Existing Benchmarks

  • Visit Benchmark Page: Go to rastion.com/benchmark
  • Study Examples: Learn from existing benchmark implementations
  • Understand Structure: See how successful benchmarks are organized
  • Performance Analysis: Review algorithm performance on standard benchmarks

Contribute to Leaderboards

  • Admin Access: Contact platform administrators for benchmark creation permissions
  • Problem Contribution: Contribute high-quality problem instances
  • Community Engagement: Participate in benchmark discussions and improvements

2. Future Development

Platform Evolution

The benchmark system continues to evolve with:

  • Enhanced Creation Tools: Improved interfaces for benchmark creation
  • Automated Validation: Better tools for verifying benchmark quality
  • Community Features: Enhanced collaboration and sharing capabilities
  • Integration Improvements: Better integration with leaderboards and experiments

Next Steps

Benchmarks are essential for advancing optimization research. By contributing high-quality problem instances and participating in benchmark development, you help create valuable resources for the entire optimization community!