Spaces: hage2000/code_eval_stdio


Kewen Zhao committed on Nov 22, 2024

Commit

54f1bd3

1 Parent(s): 7e44765

update README


Files changed (1)

README.md +26 -18


@@ -1,5 +1,5 @@
 ---
-title: Code Eval
 emoji: 🤗
 colorFrom: blue
 colorTo: red
@@ -20,6 +20,8 @@ description: >-
 ## Metric description
 The CodeEval metric estimates the pass@k metric for code synthesis.
 It implements the evaluation harness for the HumanEval problem solving dataset described in the paper ["Evaluating Large Language Models Trained on Code"](https://arxiv.org/abs/2107.03374).
@@ -31,7 +33,9 @@ The Code Eval metric calculates how good are predictions given a set of referenc
 `predictions`: a list of candidates to evaluate. Each candidate should be a list of strings with several code candidates to solve the problem.
-`references`: a list with a test for each prediction. Each test should evaluate the correctness of a code candidate.
 `k`: number of code candidates to consider in the evaluation. The default value is `[1, 10, 100]`.
@@ -41,10 +45,11 @@ The Code Eval metric calculates how good are predictions given a set of referenc
 ```python
 from evaluate import load
-code_eval = load("code_eval")
-test_cases = ["assert add(2,3)==5"]
-candidates = [["def add(a,b): return a*b", "def add(a, b): return a+b"]]
-pass_at_k, results = code_eval.compute(references=test_cases, predictions=candidates, k=[1, 2])
 ```
 N.B.
@@ -73,10 +78,11 @@ Full match at `k=1`:
 ```python
 from evaluate import load
-code_eval = load("code_eval")
-test_cases = ["assert add(2,3)==5"]
-candidates = [["def add(a, b): return a+b"]]
-pass_at_k, results = code_eval.compute(references=test_cases, predictions=candidates, k=[1])
 print(pass_at_k)
 {'pass@1': 1.0}
 ```
@@ -85,10 +91,11 @@ No match for k = 1:
 ```python
 from evaluate import load
-code_eval = load("code_eval")
-test_cases = ["assert add(2,3)==5"]
-candidates = [["def add(a,b): return a*b"]]
-pass_at_k, results = code_eval.compute(references=test_cases, predictions=candidates, k=[1])
 print(pass_at_k)
 {'pass@1': 0.0}
 ```
@@ -97,10 +104,11 @@ Partial match at k=1, full match at k=2:
 ```python
 from evaluate import load
-code_eval = load("code_eval")
-test_cases = ["assert add(2,3)==5"]
-candidates = [["def add(a, b): return a+b", "def add(a,b): return a*b"]]
-pass_at_k, results = code_eval.compute(references=test_cases, predictions=candidates, k=[1, 2])
 print(pass_at_k)
 {'pass@1': 0.5, 'pass@2': 1.0}
 ```

 ---
+title: Code Eval Stdio
 emoji: 🤗
 colorFrom: blue
 colorTo: red
 ## Metric description
+The stdio version of the ["code eval"](https://huggingface.co/spaces/evaluate-metric/code_eval) metric. It handles Python programs that read input from STDIN and print answers to STDOUT, a format common in competitive programming (e.g. Codeforces, USACO).
 The CodeEval metric estimates the pass@k metric for code synthesis.
 It implements the evaluation harness for the HumanEval problem solving dataset described in the paper ["Evaluating Large Language Models Trained on Code"](https://arxiv.org/abs/2107.03374).
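For reference, the paper linked above estimates pass@k per problem from `n` generated samples, of which `c` are correct, as `1 - C(n - c, k) / C(n, k)`, and then averages over problems. The snippet below is a minimal sketch of that formula only; the function name `pass_at_k` is hypothetical and this is not the metric's own code.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Per-problem pass@k estimate: 1 - C(n - c, k) / C(n, k).

    n: number of candidates generated for the problem
    c: number of those candidates that passed
    k: number of candidates considered
    (Hypothetical helper for illustration, not part of the metric's API.)
    """
    if k > n:
        raise ValueError("k must not exceed the number of candidates n")
    # math.comb returns 0 when k > n - c, which makes the estimate 1.0:
    # any k candidates drawn must then include at least one correct one.
    return 1.0 - comb(n - c, k) / comb(n, k)


# Two candidates, one correct: the same setting as the last README example.
print(pass_at_k(n=2, c=1, k=1))  # 0.5
print(pass_at_k(n=2, c=1, k=2))  # 1.0
```

With two candidates of which one is correct, this reproduces the `{'pass@1': 0.5, 'pass@2': 1.0}` result shown in the final example.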
 `predictions`: a list of candidates to evaluate. Each candidate should be a list of strings with several code candidates to solve the problem.
+`references`: a list with the expected STDOUT output for each prediction.
+`inputs`: a list with the STDIN input for each problem.
 `k`: number of code candidates to consider in the evaluation. The default value is `[1, 10, 100]`.
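To make the relationship between `inputs`, `predictions`, and `references` concrete, here is a rough sketch of what a single stdio check could look like: the candidate is run with the problem's input on STDIN, and its STDOUT is compared with the expected output. This is an illustration under stated assumptions, not the metric's actual implementation. The helper `passes_stdio`, the 3-second timeout, and the whitespace-insensitive comparison are all hypothetical, and unlike a real harness this sketch does no sandboxing.

```python
import subprocess
import sys


def passes_stdio(program: str, stdin_text: str, expected: str,
                 timeout: float = 3.0) -> bool:
    """Run `program` with `stdin_text` on STDIN and compare STDOUT to `expected`.

    Hypothetical helper for illustration; runs untrusted code WITHOUT sandboxing.
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-c", program],  # run the candidate in a fresh interpreter
            input=stdin_text,                 # fed to the candidate's STDIN
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False  # a candidate that hangs counts as a failure
    # Assumption: leading/trailing whitespace is ignored when comparing outputs.
    return proc.returncode == 0 and proc.stdout.strip() == expected.strip()


candidate = "nums = list(map(int, input().split()))\nprint(sum(nums))"
print(passes_stdio(candidate, "2 3", "5"))  # True
```

Because the candidate runs in a separate process, a crash or timeout is simply recorded as a failed check and does not bring down the evaluation itself.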
 ```python
 from evaluate import load
+code_eval_stdio = load("hage2000/code_eval_stdio")
+inputs = ["2 3"]
+references = ["5"]
+candidates = [["nums = list(map(int, input().split()))\nprint(sum(nums))"]]
+pass_at_k, results = code_eval_stdio.compute(references=references, predictions=candidates, inputs=inputs, k=[1, 2])
 ```
 N.B.
 ```python
 from evaluate import load
+code_eval_stdio = load("hage2000/code_eval_stdio")
+inputs = ["2 3"]
+references = ["5"]
+candidates = [["nums = list(map(int, input().split()))\nprint(sum(nums))"]]
+pass_at_k, results = code_eval_stdio.compute(references=references, predictions=candidates, inputs=inputs, k=[1, 2])
 print(pass_at_k)
 {'pass@1': 1.0}
 ```
 ```python
 from evaluate import load
+code_eval_stdio = load("hage2000/code_eval_stdio")
+inputs = ["2 3"]
+references = ["5"]
+candidates = [["nums = list(map(int, input().split()))\nprint(nums[0]*nums[1])"]]
+pass_at_k, results = code_eval_stdio.compute(references=references, predictions=candidates, inputs=inputs, k=[1, 2])
 print(pass_at_k)
 {'pass@1': 0.0}
 ```
 ```python
 from evaluate import load
+code_eval_stdio = load("hage2000/code_eval_stdio")
+inputs = ["2 3"]
+references = ["5"]
+candidates = [["nums = list(map(int, input().split()))\nprint(sum(nums))", "nums = list(map(int, input().split()))\nprint(nums[0]*nums[1])"]]
+pass_at_k, results = code_eval_stdio.compute(references=references, predictions=candidates, inputs=inputs, k=[1, 2])
 print(pass_at_k)
 {'pass@1': 0.5, 'pass@2': 1.0}
 ```