{
"task_description": {
"base_geval": "You will be given one summary written for a news article.\n\nYour task is to evaluate the summary based on a specific metric, rating it on a scale from 1 (worst) to 5 (best).\n\nPlease make sure you read and understand these instructions carefully. Please keep this document open while reviewing, and refer to it as needed.",
"base_summeval": "In this task you will evaluate the quality of a summary written for a news article.\nTo correctly solve this task, follow these steps:",
"role_expert_geval": "I want you to act as an expert human annotator. Your role is to critically evaluate the provided a news article summary.\n You will be given one summary written for a news article.\nYour task is to evaluate the summary based on a specific metric, rating it on a scale from 1 (worst) to 5 (best).\nPlease make sure you read and understand these instructions carefully. Please keep this document open while reviewing, and refer to it as needed.",
"role_expert_summeval": "I want you to act as an expert human annotator. Your role is to critically evaluate the provided a news article summary. In this task you will evaluate the quality of a summary written for a news article.\nTo correctly solve this task, follow these steps:",
"role_expert2_summeval": "You read and summarize a lot of news articles, and you're an expert at summarizing news articles.\nIn this task you will evaluate the quality of a summary written for a news article.\nTo correctly solve this task, follow these steps:",
"role_professor_summeval": "Your role is a professor who looks at students' summaries and does a lot of evaluating.\nIn this task you will evaluate the quality of a summary written for a news article.\nTo correctly solve this task, follow these steps:",
"role_linguist_summeval" : "Your role is that of a linguist studying what makes a good summary.\nIn this task you will evaluate the quality of a summary written for a news article.\nTo correctly solve this task, follow these steps:",
"role_child1_geval": "Please evaluate the summary quality as if English is not your strong suit. You will be given one summary written for a news article.\n\nYour task is to evaluate the summary based on a specific metric, rating it on a scale from 1 (worst) to 5 (best).\n\nPlease make sure you read and understand these instructions carefully. Please keep this document open while reviewing, and refer to it as needed.",
"role_child_geval": "Please evaluate the summary like a student who ranks last in the English language subject in the class.\n\nYou will be given one summary written for a news article.\n\nYour task is to evaluate the summary based on a specific metric, rating it on a scale from 1 (worst) to 5 (best).\n\nPlease make sure you read and understand these instructions carefully. Please keep this document open while reviewing, and refer to it as needed.",
"role_child_summeval": "Please evaluate the summary as if your native language is not English, you are not familiar with English sentence structures, and you may not summarize effectively. In this task you will evaluate the quality of a summary written for a news article.\nTo correctly solve this task, follow these steps:",
"short_summeval": "Evaluate the news article summary quality.",
"mid_summeval": "Please follow the steps below to evaluate the quality of a summary written for a news article.",
"long_summeval": "In this task, you will evaluate the quality of a summary written for a news article. Please take your time to carefully evaluate the provided summary, and don't hesitate to refer back to this instruction document if you need clarification or guidance at any point during your evaluation. To correctly solve this task, follow these steps:"
},
"aspect": {
"consistency": {
"summeval": "Consistency:\nThe rating measures whether the facts in the summary are consistent with the facts in the original article.\nConsider whether the summary does reproduce all facts accurately and does not make up untrue information.",
"geval": "Consistency - the factual alignment between the summary and the summarized source. A factually consistent summary contains only statements that are entailed by the source document. Annotators were also asked to penalize summaries that contained hallucinated facts.",
"gptscore": "Consistency - Is the generated text consistent in the information it provides?",
"gptscore_new": "Consistency - How uniformly does the generated summary present its information without contradictions?",
"summeval_new": "Consistency:\nThis rating evaluates the alignment of facts in the summary with those in the original article.\nEnsure the summary maintains factual accuracy and refrains from introducing any misleading or false details.",
"geval_new": "Consistency - measures the factual alignment of the summary with its original article. Ensure every fact in the summary directly reflects the source content. Deduct points for any unfounded or misleading statements. To assist, cross-reference summary points with the source for verification.",
"gptscore_new_no_ref": "Consistency - The extent to which the generated text uniformly presents and maintains its informational content without contradictions.",
"gptscore_new_summeval_ref": "Consistency: This rating evaluates the alignment of details in the summary with the information presented in the source text.",
"gptscore_new_geval_ref": "Consistency - the degree of factual congruence between the summary and its originating text. A consistent summary accurately mirrors the source without introducing unverified or fabricated details. Summaries deviating from the source's truth should be scored lower."
},
"fluency": {
"summeval": "Fluency:\nThis rating measures the quality of individual sentences, are they well-written and grammatically correct.\nConsider the quality of individual sentences.",
"geval": "Fluency : the quality of the summary in terms of grammar, spelling, punctuation, word choice, and sentence structure.",
"gptscore": "Fluency - Is the generated text well-written and grammatical?",
"gptscore_new": "Fluency - How smoothly and grammatically correct is the generated summary in its expression?",
"summeval_new": "Fluency:\nThis rating evaluates the clarity and grammatical integrity of each sentence in the summary.\nExamine each sentence for its structural soundness and linguistic clarity.",
"geval_new": "Fluency - assesses the summary's linguistic quality, encompassing grammar, spelling, punctuation, vocabulary, and sentence construction.\n1: Poor. Frequent errors hinder comprehension or give an unnatural feel.\n2: Fair. Some errors present, but overall meaning remains clear.\n3: Good. Nearly error-free and reads smoothly.",
"gptscore_new_no_ref": "Fluency - The quality of the generated text in terms of its adherence to grammatical rules and its ability to convey ideas without awkwardness or interruption in flow.",
"gptscore_new_summeval_ref": "Fluency: This rating assesses the linguistic proficiency and grammatical accuracy of the summary's individual sentences.",
"gptscore_new_geval_ref": "Fluency - evaluates the summary's adherence to linguistic norms, encompassing grammar, punctuation, vocabulary selection, and overall sentence construction."
},
"relevance": {
"summeval": "Relevance:\nThe rating measures how well the summary captures the key points of the article.\nConsider whether all and only the important aspects are contained in the summary.",
"geval": "Relevance - selection of important content from the source. The summary should include only important information from the source document. Annotators were instructed to penalize summaries which contained redundancies and excess information.",
"gptscore": "Relevance - How well is the generated text relevant to its source text?",
"gptscore_new": "Relevance - To what extent does the generated summary capture and reflect the core details of its source text?",
"summeval_new": "Relevance:\nThis rating assesses the extent to which the summary highlights the central themes of the original article.\nEvaluate if the summary encompasses the crucial elements while omitting any non-essential details.",
"geval_new": "Relevance - gauges the summary's alignment with the article's primary ideas. Check if the summary includes essential points and omits unrelated details. It may help to list the article's main points and verify their presence in the summary.",
"gptscore_new_no_ref": "Relevance - The extent to which the generated text accurately reflects and pertains to the primary themes and context of the source text.",
"gptscore_new_summeval_ref": "Relevance: This rating gauges the alignment of the summary with the central themes and elements of the source text.",
"gptscore_new_geval_ref": "Relevance - gauges the alignment of the summary's content with the significance of the original text. The summary should prioritize and capture pivotal details from the source. Summaries that introduce superfluous data or redundant points should receive a lower relevance score."
},
"coherence": {
"summeval": "Coherence:\nThe rating measures the quality of all sentences collectively, to the fit together and sound naturally.\nConsider the quality of the summary as a whole.",
"geval": "Coherence - the collective quality of all sentences. We align this dimension with the DUC quality question of structure and coherence whereby \"the summary should be well-structured and well-organized. The summary should not just be a heap of related information, but should build from sentence to a coherent body of information about a topic.\"",
"gptscore": "Coherence - How much does the generated text make sense?",
"gptscore_new": "Coherence - To what extent does the generated summary display a logical flow and connectivity of ideas?",
"summeval_new": "Coherence:\nThis rating evaluates how seamlessly the sentences of the summary flow together, creating a unified whole.\nAssess how smoothly the content transitions from one point to the next, ensuring it reads as a cohesive unit.",
"geval_new": "Coherence - evaluates the structured flow and organization of sentences in the summary. Assess the summary's progression from one sentence to the next, ensuring it forms a unified narrative about the topic. A coherent summary doesn't just gather related facts, but seamlessly connects them.",
"gptscore_new_no_ref": "Coherence - The degree to which the generated text presents ideas in a logically connected and comprehensible manner.",
"gptscore_new_summeval_ref": "Coherence: This rating evaluates the logical flow and interconnectedness of the summary's sentences, ensuring a unified and natural progression.",
"gptscore_new_geval_ref": "Coherence - evaluates the logical flow and structure of the summary, ensuring it progresses from sentence to sentence to form a unified, comprehensible narrative on the topic."
},
"readability": {
"gpt4_v1": "Readability: Readability pertains to how easily the summary can be read and understood by the intended audience. It considers aspects like sentence structure, language complexity, and organization of content, which contribute to a smooth reading experience.",
"gptscore": "Readability - To what extent does the generated summary capture and reflect the core details of its source text?",
"gptscore_new": "Readability - To what extent does the generated summary capture and reflect the core details of its source text?",
"summeval_new": "Readability:\nThis rating evaluates the ease with which a reader can understand and follow the content of the summary.\nAssess the summary for clarity, structure, and simplicity, ensuring it is accessible and straightforward for a diverse audience to comprehend.",
"geval_new": "Readability - measures the ease with which a summary can be understood and processed by readers. A summary with high readability will not only be grammatically correct but will also present its content in an organized and easily digestible manner. Summaries that demand excessive re-reading or leave readers puzzled should be rated lower on this aspect."
},
"factuality": {
"gpt4_v1": "Factuality: Factuality refers to the accuracy and truthfulness of the information presented in the summary. A summary should not distort or misrepresent the facts from the original material, and it should adhere to the truth without incorporating misleading or false information.",
"gptscore": "Factuality - Does the generated text preserve the factual statements of the source text?",
"gptscore_new": "Factuality - How accurately does the generated summary retain the factual content from its source text?",
"summeval_new": "Factuality:\nThis rating gauges the accuracy and truthfulness of the information presented in the summary compared to the original article.\nScrutinize the summary to ensure it presents facts without distortion or misrepresentation, staying true to the source content's details and intent.",
"geval_new": "Factuality - evaluates the extent to which the summary's statements align truthfully and accurately with the original source. Assess the summary's claims directly with the source material. A factual summary will contain information that is both accurate and free from exaggeration, omission, or misrepresentation.",
"gptscore_new_no_ref": "Factuality - The degree to which the generated text maintains the accuracy and truthfulness of the source text's factual content.",
"gptscore_new_summeval_ref": "Factuality: This rating measures how accurately the summary reflects the factual statements of the source text.",
"gptscore_new_geval_ref": "Factuality - measures how closely the summary adheres to the verifiable truths of the original content. Compare the summary's assertions against the source for accuracy. A factually sound summary should mirror the source without embellishments, omissions, or distortions."
},
"informativeness": {
"gpt4_v1": "Informativeness: This aspect assesses the extent to which the summary provides valuable, insightful, and substantial information to the reader. It evaluates if the summary is successful in conveying the necessary information in a concise manner, without including irrelevant or superficial details.",
"gptscore": "Informativeness - How well does the generated text capture the key ideas of its source text?",
"gptscore_new": "Informativeness - To what extent does the summarized text encompass the essential points from the original content?",
"summeval_new": "Informativeness:\nThis rating assesses the degree to which the summary provides essential and relevant details from the original article.\nEvaluate the summary for its coverage of key information and its exclusion of superfluous details. Ensure it effectively communicates the main points of the original content.",
"geval_new": "Informativeness - gauges the extent to which the summary provides essential details and insights from the original source, ensuring that the reader is well-informed. A highly informative summary should make a reader feel well-informed about the central theme of the source material without feeling overwhelmed by unnecessary information. If significant details from the source are omitted, or if the summary introduces irrelevant points, the score for informativeness should be lowered.",
"gptscore_new_no_ref": "Informativeness - The measure of how effectively the generated text encompasses the central concepts and messages from the source text.",
"gptscore_new_summeval_ref": "Informativeness: This rating gauges the extent to which the summary encapsulates the primary concepts of the source text.",
"gptscore_new_geval_ref": "Informativeness - assesses the efficiency with which the summary encapsulates the core concepts from the original content. An optimal summary conveys pivotal information, granting the reader clarity on the primary theme without excess or irrelevant details. Omissions of key facts or the addition of unrelated elements should reduce the informativeness score."
}
},
"evaluation_step": {
"fluency": {
"geval": ""
},
"consistency": {
"geval": "1. Read the news article carefully and identify the main facts and details it presents.\n2. Read the summary and compare it to the article. Check if the summary contains any factual errors that are not supported by the article.\n3. Assign a score for consistency based on the Evaluation Criteria."
},
"relevance": {
"geval": "1. Read the summary and the source document carefully.\n2. Compare the summary to the source document and identify the main points of the article.\n3. Assess how well the summary covers the main points of the article, and how much irrelevant or redundant information it contains.\n4. Assign a relevance score from 1 to 5."
},
"coherence": {
"geval": "1. Read the news article carefully and identify the main topic and key points.\n2. Read the summary and compare it to the news article. Check if the summary covers the main topic and key points of the news article, and if it presents them in a clear and logical order.\n3. Assign a score for coherence on a scale of 1 to 5, where 1 is the lowest and 5 is the highest based on the Evaluation Criteria."
},
"readability": {
"geval": "1. Read the summary without referencing the original source, noting any sections that are difficult to understand or ambiguous.\n2. Examine the summary for grammatical correctness, including sentence structure, punctuation, and spelling.\n3. Assess the structure and organization of the summary, ensuring that it flows logically and that transitions between points are smooth.\n4. Consider the choice of vocabulary. Identify any words or phrases that could be deemed too complex or jargon-heavy for a general audience.\n5. Assign a readability score from 1 to 5 based on the summary's clarity, organization, and grammar."
},
"factuality": {
"geval": "1. Begin by reading the original source document to grasp its main content and significant details.\n2. Read the provided summary, taking note of its claims and key statements.\n3. Directly compare each statement in the summary with the original source to check its accuracy and truthfulness.\n4. Identify any instances where the summary may exaggerate, omit, or misrepresent the source's information.\n5. Assign a factuality score from 1 to 5, based on the accuracy and truthfulness of the summary in relation to the source material."
},
"informativeness": {
"geval": "1. Begin by thoroughly reading the original source document to comprehend its main themes and essential details.\n2. Read the provided summary, making note of the details and insights it emphasizes.\n3. Compare the contents of the summary with the original source, identifying any significant details that might have been omitted.\n4. Examine the summary for any unnecessary or irrelevant information not present in the original source.\n5. Assign an informativeness score from 1 to 5, considering the depth of details, insights, and the overall representation of the source's central theme."
}
},
"example": {},
"experiment": {
"zero_shot": {
"base": {
"single_aspect": {
"five": {
"train": "{u_prompt}\nScore the summarization with respect to the summarized document on a continuous scale from 1 to 5, where a score of 1 means \"irrelevant, not coherent, not fluent, and inconsistent\" and score of 5 means \"relevant, coherent, fluent, consistent\".\n----\nSource text: {source}\nSummary: {summary}\n{a_prompt}",
"test": "{u_prompt}\nScore the summarization with respect to the summarized document on a continuous scale from 1 to 5, where a score of 1 means \"irrelevant, factually incorrect and not readable\" and score of 5 means \"relevant, factually correct, good readability\".\n----\nSource text: {source}\nSummary: {summary}\n{a_prompt}",
"all": "{u_prompt}\nScore the summarization with respect to the summarized document on a continuous scale from 1 to 5, where a score of 1 means \"irrelevant, not coherent, not fluent, inconsistent, factually incorrect and not readable\" and score of 5 means \"relevant, coherent, fluent, consistent, factually correct, good readability\".\n----\nSource text: {source}\nSummary: {summary}\n{a_prompt}"
},
"hundred": {
"train": "{u_prompt}\nScore the summarization with respect to the summarized document on a continuous scale from 0 to 100, where a score of 0 means \"irrelevant, not coherent, not fluent, and inconsistent\" and score of 100 means \"relevant, coherent, fluent, consistent\".\n----\nSource text: {source}\nSummary: {summary}\n{a_prompt}",
"test": "{u_prompt}\nScore the summarization with respect to the summarized document on a continuous scale from 0 to 100, where a score of 0 means \"irrelevant, factually incorrect and not readable\" and score of 100 means \"relevant, factually correct, good readability\".\n----\nSource text: {source}\nSummary: {summary}\n{a_prompt}",
"all": "{u_prompt}\nScore the summarization with respect to the summarized document on a continuous scale from 0 to 100, where a score of 0 means \"irrelevant, not coherent, not fluent, inconsistent, factually incorrect and not readable\" and score of 100 means \"relevant, coherent, fluent, consistent, factually correct, good readability\".\n----\nSource text: {source}\nSummary: {summary}\n{a_prompt}"
}
},
"multi_aspect": {
"five": "{u_prompt}\nScore the summarization with respect to the summarized document on a continuous scale from 1 to 5, where a score of 1 means \"{negative_aspect_verb}\" and score of 5 means \"{positive_aspect_verb}\".\n----\nSource text: {source}\nSummary: {summary}\n{a_prompt}",
"hundred": "{u_prompt}\nScore the summarization with respect to the summarized document on a continuous scale from 0 to 100, where a score of 0 means \"{negative_aspect_verb}\" and score of 100 means \"{positive_aspect_verb}\".\n----\nSource text: {source}\nSummary: {summary}\n{a_prompt}"
}
},
"summeval": {
"single_aspect": {
"five": {
"train": "{u_prompt}\nIn this task you will evaluate the quality of a summary written for a news article.\nTo correctly solve this task, follow these steps:\n\n1. Carefully read the news article, be aware of the information it contains.\n2. Read the proposed summary.\n3. Rate each summary on a scale from 1 (worst) to 5 (best) by its coherence, relevance, consistency, fluency.\n----\nSource text: {source}\nSummary: {summary}\n{a_prompt}",
"test": "{u_prompt}\nIn this task you will evaluate the quality of a summary written for a news article.\nTo correctly solve this task, follow these steps:\n\n1. Carefully read the news article, be aware of the information it contains.\n2. Read the proposed summary.\n3. Rate each summary on a scale from 1 (worst) to 5 (best) by its relevance, factuality, readability.\n----\nSource text: {source}\nSummary: {summary}\n{a_prompt}",
"all": "{u_prompt}\nIn this task you will evaluate the quality of a summary written for a news article.\nTo correctly solve this task, follow these steps:\n\n1. Carefully read the news article, be aware of the information it contains.\n2. Read the proposed summary.\n3. Rate each summary on a scale from 1 (worst) to 5 (best) by its coherence, relevance, consistency, fluency, factuality, readability.\n----\nSource text: {source}\nSummary: {summary}\n{a_prompt}"
},
"hundred": {
"train": "{u_prompt}\nIn this task you will evaluate the quality of a summary written for a news article.\nTo correctly solve this task, follow these steps:\n\n1. Carefully read the news article, be aware of the information it contains.\n2. Read the proposed summary.\n3. Rate each summary on a scale from 0 (worst) to 100 (best) by its coherence, relevance, consistency, fluency.\n----\nSource text: {source}\nSummary: {summary}\n{a_prompt}",
"test": "{u_prompt}\nIn this task you will evaluate the quality of a summary written for a news article.\nTo correctly solve this task, follow these steps:\n\n1. Carefully read the news article, be aware of the information it contains.\n2. Read the proposed summary.\n3. Rate each summary on a scale from 0 (worst) to 100 (best) by its relevance, factuality, readability.\n----\nSource text: {source}\nSummary: {summary}\n{a_prompt}",
"all": "{u_prompt}\nIn this task you will evaluate the quality of a summary written for a news article.\nTo correctly solve this task, follow these steps:\n\n1. Carefully read the news article, be aware of the information it contains.\n2. Read the proposed summary.\n3. Rate each summary on a scale from 0 (worst) to 100 (best) by its coherence, relevance, consistency, fluency, factuality, readability.\n----\nSource text: {source}\nSummary: {summary}\n{a_prompt}"
}
},
"multi_aspect": {
"five": "{u_prompt}\nIn this task you will evaluate the quality of a summary written for a news article.\nTo correctly solve this task, follow these steps:\n\n1. Carefully read the news article, be aware of the information it contains.\n2. Read the proposed summary.\n3. Rate each summary on a scale from 1 (worst) to 5 (best) by its {aspect}.\n----\nSource text: {source}\nSummary: {summary}\n{a_prompt}",
"hundred": "{u_prompt}\nIn this task you will evaluate the quality of a summary written for a news article.\nTo correctly solve this task, follow these steps:\n\n1. Carefully read the news article, be aware of the information it contains.\n2. Read the proposed summary.\n3. Rate each summary on a scale from 0 (worst) to 100 (best) by its {aspect}.\n----\nSource text: {source}\nSummary: {summary}\n{a_prompt}",
"five_definition": "{u_prompt}\nIn this task you will evaluate the quality of a summary written for a news article.\nTo correctly solve this task, follow these steps:\n\n1. Carefully read the news article, be aware of the information it contains.\n2. Read the proposed summary.\n3. Rate each summary on a scale from 1 (worst) to 5 (best) by its {aspect}.\n\n# Definition:\n{definition}\n----\nSource text: {source}\nSummary: {summary}\n{a_prompt}",
"hundred_definition": "{u_prompt}\nIn this task you will evaluate the quality of a summary written for a news article.\nTo correctly solve this task, follow these steps:\n\n1. Carefully read the news article, be aware of the information it contains.\n2. Read the proposed summary.\n3. Rate each summary on a scale from 0 (worst) to 100 (best) by its {aspect}.\n\n# Definition:\n{definition}\n----\nSource text: {source}\nSummary: {summary}\n{a_prompt}"
}
},
"geval": {
"single_aspect": {
"five": {
"train": "{u_prompt}\nYou will be given one summary written for a news article.\n\nYour task is to evaluate the summary based on a specific metric, rating it on a scale from 1 (worst) to 5 (best) by its coherence, relevance, consistency, fluency.\n\nPlease make sure you read and understand these instructions carefully. Please keep this document open while reviewing, and refer to it as needed.\n----\nSource text: {source}\nSummary: {summary}\n{a_prompt}",
"test": "{u_prompt}\nYou will be given one summary written for a news article.\n\nYour task is to evaluate the summary based on a specific metric, rating it on a scale from 1 (worst) to 5 (best) by its relevance, factuality, readability.\n\nPlease make sure you read and understand these instructions carefully. Please keep this document open while reviewing, and refer to it as needed.\n----\nSource text: {source}\nSummary: {summary}\n{a_prompt}",
"all": "{u_prompt}\nYou will be given one summary written for a news article.\n\nYour task is to evaluate the summary based on a specific metric, rating it on a scale from 1 (worst) to 5 (best) by its coherence, relevance, consistency, fluency, factuality, readability.\n\nPlease make sure you read and understand these instructions carefully. Please keep this document open while reviewing, and refer to it as needed.\n----\nSource text: {source}\nSummary: {summary}\n{a_prompt}"
},
"hundred": {
"train": "{u_prompt}\nYou will be given one summary written for a news article.\n\nYour task is to evaluate the summary based on a specific metric, rating it on a scale from 0 (worst) to 100 (best) by its coherence, relevance, consistency, fluency.\n\nPlease make sure you read and understand these instructions carefully. Please keep this document open while reviewing, and refer to it as needed.\n----\nSource text: {source}\nSummary: {summary}\n{a_prompt}",
"test": "{u_prompt}\nYou will be given one summary written for a news article.\n\nYour task is to evaluate the summary based on a specific metric, rating it on a scale from 0 (worst) to 100 (best) by its relevance, factuality, readability.\n\nPlease make sure you read and understand these instructions carefully. Please keep this document open while reviewing, and refer to it as needed.\n----\nSource text: {source}\nSummary: {summary}\n{a_prompt}",
"all": "{u_prompt}\nYou will be given one summary written for a news article.\n\nYour task is to evaluate the summary based on a specific metric, rating it on a scale from 0 (worst) to 100 (best) by its coherence, relevance, consistency, fluency, factuality, readability.\n\nPlease make sure you read and understand these instructions carefully. Please keep this document open while reviewing, and refer to it as needed.\n----\nSource text: {source}\nSummary: {summary}\n{a_prompt}"
}
},
"multi_aspect": {
"five": "{u_prompt}\nYou will be given one summary written for a news article.\n\nYour task is to evaluate the summary based on a specific metric, rating it on a scale from 1 (worst) to 5 (best).\n\nPlease make sure you read and understand these instructions carefully. Please keep this document open while reviewing, and refer to it as needed.\n\nEvaluation Steps:\n\n{evaluation_steps}\n----\nSource text: {source}\nSummary: {summary}\n{a_prompt}",
"hundred": "{u_prompt}\nYou will be given one summary written for a news article.\n\nYour task is to evaluate the summary based on a specific metric, rating it on a scale from 0 (worst) to 100 (best).\n\nPlease make sure you read and understand these instructions carefully. Please keep this document open while reviewing, and refer to it as needed.\n\nEvaluation Steps:\n\n{evaluation_steps}\n----\nSource text: {source}\nSummary: {summary}\n{a_prompt}",
"five_definition": "{u_prompt}\nYou will be given one summary written for a news article.\n\nYour task is to evaluate the summary based on a specific metric, rating it on a scale from 1 (worst) to 5 (best).\n\nPlease make sure you read and understand these instructions carefully. Please keep this document open while reviewing, and refer to it as needed.\n\nEvaluation Criteria:\n\n{definition}\n\nEvaluation Steps:\n\n{evaluation_steps}\n----\nSource text: {source}\nSummary: {summary}\n{a_prompt}",
"hundred_definition": "{u_prompt}\nYou will be given one summary written for a news article.\n\nYour task is to evaluate the summary based on a specific metric, rating it on a scale from 0 (worst) to 100 (best).\n\nPlease make sure you read and understand these instructions carefully. Please keep this document open while reviewing, and refer to it as needed.\n\nEvaluation Criteria:\n\n{definition}\n\nEvaluation Steps:\n\n{evaluation_steps}\n----\nSource text: {source}\nSummary: {summary}\n{a_prompt}"
}
}
},
"few_shot": {
"coherence": {
"1": {
"src": "\"A man has been arrested in connection with the death of a transgender escort who was found strangled and beaten last month . Vanessa Santillan 's body was found in a \u00a3400,000 flat in Fulham , south west London , at the end of March . The 33-year-old Mexican national , who worked as a transgender escort , died as a result of injuries to the head and neck . Vanessa Santillan , 33 , was killed at the end of March . Her body was found in a flat in Fulham having been strangled and beaten to death A 23-year-old man was arrested in connection with her death but has been bailed . He has been told to return to a London police station for further questioning at a later date . Meanwhile Scotland Yard is appealing to anyone who may have had contact with Miss Santillan in the hours before her death . According to her website Miss Santillan worked in London , Paris and Miami as an escort . Police did not confirm whether her profession was central to the investigation but insisted they would do 'everything ' to solve the case . London Ambulance Service was called to a flat in Romily Court , Fulham , on March 28 at around 9.30pm . Miss Santillan was pronounced dead at the scene having suffered injuries to her head and neck . The woman had been working as a transgender escort , her website revealed . Miss Santillan is understood to have moved to London from Mexico The woman , who described herself as visiting London from Miami , was pronounced dead at the scene last month Miss Santillan spoke of her love for London and Paris online in the weeks before her death . Police are urging anyone who had contact with her in the hours before her death to come forward Detective Chief Inspector Rebecca Reeves , who leads the investigation , said : 'We want to speak to anyone who saw Vanessa on Friday or Saturday . 'We need to know why this has happened and we want help from anyone who knew her while she was in London . 
' In the weeks before her death Miss Santillan took to social media to talk of her love for London . On her website she described herself as visiting from Miami in search of 'upscale ' gentlemen . Miss Santillan 's body was found when London Ambulance Service was called to an address in Fulham , south west London Sorry we are not currently accepting comments on this article .\"",
"hyp": "\"Vanessa Santillan worked as a transgender escort in London . She was found strangled and beaten to death on March 28 She wrote : 'My favourite city is London . I love the food , the culture , the art , the nightlife , the nightlife , the nightlife , the nightlife , the nightlife , the nightlife , the nightlife , the nightlife , the nightlife , the nightlife , the nightlife , the nightlife , the nightlife , the\"",
"score": "1",
"explain": "The summary has significant coherence issues. While it begins by providing relevant information about Vanessa Santillan and the circumstances of her death, it quickly becomes repetitive, especially with the phrase \"the nightlife\" being mentioned multiple times consecutively without any clear context or purpose. This repetition disrupts the flow and understanding of the summary, making it less coherent as a whole. The score of 1 is appropriate given these shortcomings."
},
"5": {
"src": "\"Twice French Open champion Serena Williams said her struggle to beat Sara Errani in the Fed Cup on Sunday had been a real 'eye-opener ' as the claycourt season gets into full swing . World No 1 Williams eventually prevailed 4-6 7-6 ( 3 ) 6-3 against the dogged Italian to take her career record over her to 8-0 but the American was not impressed . The US were beaten 3-2 as Williams and Alison Riske were thrashed 6-0 6-3 in the doubles rubber by Errani and Flavia Pennetta , meaning they were relegated to World Group II . American tennis star Serena Williams fought back to beat Italian Sara Errani in the Fed Cup play-off on Sunday Tough weather conditions made it difficult for both players who had to keep on re-tossing their serves Errani gave Williams a real scare but in the end the world No 1 's power proved to be too much 'Today has been a big eye opener , ' Williams said afterwards . ' I 'm totally not as ready for the claycourt season as I thought I was . Now I 'm in the mindset of , `` You know what , I 'm not on hard court . '' I 'm playing like I 'm on hard court and I 'm not . 'So I have to play and be ready to hit a thousand shots if necessary . ' Williams , 33 , won her 19th singles grand slam at the Australian Open and her dominance has raised talk of her claiming all the majors this year . The French Open has been her least successful of the four though despite claiming the title in Paris in 2002 and 2013 . Her doubles defeat on Sunday blotted an otherwise flawless Fed Cup record and left the US facing a battle to get back amongst the elite nations next year . 'We have to work harder , ' US captain Mary Joe Fernandez said . 'We came close today and need to just keep plugging away . 'The good news is that we have a lot of players in the top 100 and , hopefully , we can get two wins next year and get back into the World Group . ' Williams congratulates Italy captain Corrado Barazzutti after competing in America 's doubles defeat\"",
"hyp": "'Serena Williams beat Sara Errani 4-6 7-6 ( 3 ) 6-3 in the Fed Cup play-off . The US were beaten 3-2 as Williams and Alison Riske were thrashed in the doubles rubber . The doubles defeat saw the US relegated to World Group II .'",
"score": "5",
"explain": "The summary provides a concise and clear representation of the key events in the source text. The sentences flow logically, from Serena Williams' individual match against Sara Errani to the doubles match and its consequences for the US team. The coherence is maintained throughout the summary, with all information presented in a way that fits naturally together. Given this natural flow and representation of the main events from the source text, a score of 5 (best) is appropriate for the summary's coherence."
}
},
"relevance": {
"1": {
"src": " \"Wales ' crunch Euro 2016 qualifier with Belgium this summer has been declared a 33,000 sell-out . The top two sides in Group B meet at the Cardiff City Stadium on June 12 with Wales in their best position to qualify for a major tournament since the 1958 World Cup finals in Sweden . Belgium and Wales both have 11 points from five games with Marc Wilmots ' side - ranked fourth in the world - on top spot because of a superior goal difference . Wales ' Euro 2016 qualifier with Belgium this summer has been declared a 33,000 sell-out Gareth Bale fires homes a brilliant free-kick during Wales ' 3-0 victory over Israel in Euro 2016 qualifying Real Madrid star Bale celebrates as the victory took Wales to the top of the Group B table on goal difference But Wales ' comprehensive 3-0 victory in Israel last weekend has seen expectations rise that Chris Coleman 's charges can claim one of the two automatic qualifying spots and make it all the way to the 2016 finals in France . 'The stunning performance in Israel has created a huge buzz around Chris Coleman 's team and the FAW has been inundated with orders for tickets , ' the Football Association of Wales said in a statement on its website . 'Due to overwhelming demand , general admission tickets for the European qualifiers match between Wales and Belgium at the Cardiff City Stadium have now sold out . ' It is understood the Wales squad prefer to play at the Cardiff City Stadium rather than the Millennium Stadium There had been speculation that Wales would play Belgium at the 74,500 capacity Millennium Stadium with so much interest in the match . But that was never going to happen as UEFA rules prevent the venue being changed within 120 days of the scheduled date . It is understood Gareth Bale and company would prefer playing at the more intimate Cardiff City Stadium rather than the Millennium Stadium , where they have not played since meeting England in a Euro 2012 qualifier in March 2011 .\u201d",
"hyp": "\"The top two sides in group b meet at cardiff city stadium on june 12 . Wales ' crunch euro 2016 qualifier with belgium this summer . Wales play belgium at the 1958 world cup on june 1958 . Chris coleman 's charges can claim one of the two automatic qualifiers spots . Belgium and wales have 11 points from five games in 1958 .\"",
"score": "1",
"explain": "The summary provided is largely inaccurate and misleading. Firstly, it incorrectly states that \"Wales play Belgium at the 1958 world cup on June 1958,\" which is not mentioned in the source text. The source text merely states that Wales is in their best position to qualify since the 1958 World Cup. Secondly, the repetition of \"in 1958\" at the end is irrelevant and does not accurately reflect the current points standing of both teams. The summary does not accurately capture the key points, such as the game being a sell-out, the significance of the match in terms of qualifying, or the venue's importance. Given these discrepancies, a score of 1 (worst) is appropriate for the summary's relevance to the source text."
},
"5": {
"src": "\"Twice French Open champion Serena Williams said her struggle to beat Sara Errani in the Fed Cup on Sunday had been a real 'eye-opener ' as the claycourt season gets into full swing . World No 1 Williams eventually prevailed 4-6 7-6 ( 3 ) 6-3 against the dogged Italian to take her career record over her to 8-0 but the American was not impressed . The US were beaten 3-2 as Williams and Alison Riske were thrashed 6-0 6-3 in the doubles rubber by Errani and Flavia Pennetta , meaning they were relegated to World Group II . American tennis star Serena Williams fought back to beat Italian Sara Errani in the Fed Cup play-off on Sunday Tough weather conditions made it difficult for both players who had to keep on re-tossing their serves Errani gave Williams a real scare but in the end the world No 1 's power proved to be too much 'Today has been a big eye opener , ' Williams said afterwards . ' I 'm totally not as ready for the claycourt season as I thought I was . Now I 'm in the mindset of , `` You know what , I 'm not on hard court . '' I 'm playing like I 'm on hard court and I 'm not . 'So I have to play and be ready to hit a thousand shots if necessary . ' Williams , 33 , won her 19th singles grand slam at the Australian Open and her dominance has raised talk of her claiming all the majors this year . The French Open has been her least successful of the four though despite claiming the title in Paris in 2002 and 2013 . Her doubles defeat on Sunday blotted an otherwise flawless Fed Cup record and left the US facing a battle to get back amongst the elite nations next year . 'We have to work harder , ' US captain Mary Joe Fernandez said . 'We came close today and need to just keep plugging away . 'The good news is that we have a lot of players in the top 100 and , hopefully , we can get two wins next year and get back into the World Group . ' Williams congratulates Italy captain Corrado Barazzutti after competing in America 's doubles defeat\"",
"hyp": "'Serena Williams beat Sara Errani 4-6 7-6 ( 3 ) 6-3 in the Fed Cup play-off . The US were beaten 3-2 as Williams and Alison Riske were thrashed in the doubles rubber . The doubles defeat saw the US relegated to World Group II .'",
"score": "5",
"explain": "The summary effectively captures the key points from the article. It mentions Serena Williams' challenging match against Sara Errani and her eventual victory. The summary also highlights the US team's overall defeat and its consequence \u2013 relegation to World Group II. These details are central to the main storyline of the source text, making the summary highly relevant. Thus, a score of 5 (best) is appropriate for the summary's relevance."
}
},
"consistency": {
"1": {
"src": "\"Paul Merson has restarted his row with Andros Townsend after the Tottenham midfielder was brought on with only seven minutes remaining in his team 's 0-0 draw with Burnley on Sunday . 'Just been watching the game , did you miss the coach ? # RubberDub # 7minutes , ' Merson put on Twitter . Merson initially angered Townsend for writing in his Sky Sports column that 'if Andros Townsend can get in ( the England team ) then it opens it up to anybody . ' Paul Merson had another dig at Andros Townsend after his appearance for Tottenham against Burnley Townsend was brought on in the 83rd minute for Tottenham as they drew 0-0 against Burnley Andros Townsend scores England 's equaliser in their 1-1 friendly draw with Italy in Turin on Tuesday night The former Arsenal man was proven wrong when Townsend hit a stunning equaliser for England against Italy and he duly admitted his mistake . 'It 's not as though I was watching hoping he would n't score for England , I 'm genuinely pleased for him and fair play to him \u2013 it was a great goal , ' Merson said . 'It 's just a matter of opinion , and my opinion was that he got pulled off after half an hour at Manchester United in front of Roy Hodgson , so he should n't have been in the squad . 'When I 'm wrong , I hold my hands up . I do n't have a problem with doing that - I 'll always be the first to admit when I 'm wrong . ' Townsend hit back at Merson on Twitter after scoring for England against Italy Sky Sports pundit Merson ( centre ) criticised Townsend 's call-up to the England squad last week Townsend hit back at Merson after netting for England in Turin on Wednesday , saying 'Not bad for a player that should be 'nowhere near the squad ' ay @ PaulMerse ? ' Any bad feeling between the pair seemed to have passed but Merson was unable to resist having another dig at Townsend after Tottenham drew at Turf Moor .\u201d",
"hyp": "\"Paul merson was brought on with only seven minutes remaining in his team 's 0-0 draw with burnley . Andros townsend scored the tottenham midfielder in the 89th minute . Paul merson had another dig at andros townsend after his appearance . The midfielder had been brought on to the england squad last week . Click here for all the latest arsenal news news .\"",
"score": "1",
"explain": "The summary contains several inaccuracies and is not consistent with the original article. First, Paul Merson is incorrectly identified as being brought on during the game; it was Andros Townsend. Second, there's no mention in the article of Townsend scoring in the 89th minute for Tottenham. Moreover, the phrase \"Andros townsend scored the tottenham midfielder\" is nonsensical. The reference to the England squad and the mention of \"Click here for all the latest arsenal news news\" are also not directly related to the main points in the article. The summary is therefore inconsistent with the original article, supporting a score of 1."
},
"5": {
"src": "\"( CNN ) Donald Sterling 's racist remarks cost him an NBA team last year . But now it 's his former female companion who has lost big . A Los Angeles judge has ordered V. Stiviano to pay back more than $ 2.6 million in gifts after Sterling 's wife sued her . In the lawsuit , Rochelle Shelly '' Sterling accused Stiviano of targeting extremely wealthy older men . She claimed Donald Sterling used the couple 's money to buy Stiviano a Ferrari , two Bentleys and a Range Rover , and that he helped her get a $ 1.8 million duplex . Who is V. Stiviano ? Stiviano countered that there was nothing wrong with Donald Sterling giving her gifts and that she never took advantage of the former Los Angeles Clippers owner , who made much of his fortune in real estate . Shelly Sterling was thrilled with the court decision Tuesday , her lawyer told CNN affiliate KABC . This is a victory for the Sterling family in recovering the $ 2,630,000 that Donald lavished on a conniving mistress , '' attorney Pierce O'Donnell said in a statement . It also sets a precedent that the injured spouse can recover damages from the recipient of these ill-begotten gifts . '' Stiviano 's gifts from Donald Sterling did n't just include uber-expensive items like luxury cars . According to the Los Angeles Times , the list also includes a $ 391 Easter bunny costume , a $ 299 two-speed blender and a $ 12 lace thong . Donald Sterling 's downfall came after an audio recording surfaced of the octogenarian arguing with Stiviano . In the tape , Sterling chastises Stiviano for posting pictures on social media of her posing with African-Americans , including basketball legend Magic Johnson . In your lousy fing Instagrams , you do n't have to have yourself with -- walking with black people , '' Sterling said in the audio first posted by TMZ . He also tells Stiviano not to bring Johnson to Clippers games and not to post photos with the Hall of Famer so Sterling 's friends can see . 
`` Admire him , bring him here , feed him , fk him , but do n't put ( Magic ) on an Instagram for the world to have to see so they have to call me , '' Sterling said . NBA Commissioner Adam Silver banned Sterling from the league , fined him $ 2.5 million and pushed through a charge to terminate all of his ownership rights in the franchise . Fact check : Donald Sterling 's claims vs. reality CNN 's Dottie Evans contributed to this report .\"",
"hyp": "\"Donald sterling 's racist remarks cost him an nba team last year . But now it 's his former female companion who has lost big . A judge has ordered v. stiviano to pay back more than $ 2.6 million in gifts .\"",
"score": "5",
"explain": "The summary accurately captures key details from the source text without introducing any incorrect or contradictory information. The information about Donald Sterling's racist remarks leading to the loss of his NBA team, the legal judgment against V. Stiviano, and the amount she is ordered to pay back are all consistent with the facts presented in the original article. As a result, a score of 5 (best) is appropriate for the summary's consistency."
}
},
"fluency": {
"1": {
"src": "\"A man has been arrested in connection with the death of a transgender escort who was found strangled and beaten last month . Vanessa Santillan 's body was found in a \u00a3400,000 flat in Fulham , south west London , at the end of March . The 33-year-old Mexican national , who worked as a transgender escort , died as a result of injuries to the head and neck . Vanessa Santillan , 33 , was killed at the end of March . Her body was found in a flat in Fulham having been strangled and beaten to death A 23-year-old man was arrested in connection with her death but has been bailed . He has been told to return to a London police station for further questioning at a later date . Meanwhile Scotland Yard is appealing to anyone who may have had contact with Miss Santillan in the hours before her death . According to her website Miss Santillan worked in London , Paris and Miami as an escort . Police did not confirm whether her profession was central to the investigation but insisted they would do 'everything ' to solve the case . London Ambulance Service was called to a flat in Romily Court , Fulham , on March 28 at around 9.30pm . Miss Santillan was pronounced dead at the scene having suffered injuries to her head and neck . The woman had been working as a transgender escort , her website revealed . Miss Santillan is understood to have moved to London from Mexico The woman , who described herself as visiting London from Miami , was pronounced dead at the scene last month Miss Santillan spoke of her love for London and Paris online in the weeks before her death . Police are urging anyone who had contact with her in the hours before her death to come forward Detective Chief Inspector Rebecca Reeves , who leads the investigation , said : 'We want to speak to anyone who saw Vanessa on Friday or Saturday . 'We need to know why this has happened and we want help from anyone who knew her while she was in London . 
' In the weeks before her death Miss Santillan took to social media to talk of her love for London . On her website she described herself as visiting from Miami in search of 'upscale ' gentlemen . Miss Santillan 's body was found when London Ambulance Service was called to an address in Fulham , south west London Sorry we are not currently accepting comments on this article .\"",
"hyp": "\"Vanessa Santillan worked as a transgender escort in London . She was found strangled and beaten to death on March 28 She wrote : 'My favourite city is London . I love the food , the culture , the art , the nightlife , the nightlife , the nightlife , the nightlife , the nightlife , the nightlife , the nightlife , the nightlife , the nightlife , the nightlife , the nightlife , the nightlife , the nightlife , the\"",
"score": "1",
"explain": "Examining the summary for fluency, the beginning sentences are well-written and grammatically correct. However, the repetitive phrase \"the nightlife\" appears numerous times in a row without any variation or context. This repetition makes the latter part of the summary not only redundant but also grammatically awkward. It detracts from the overall fluency of the summary. Given this significant issue, a score of 1 is appropriate for the summary's fluency."
},
"5": {
"src": "\"( CNN ) Donald Sterling 's racist remarks cost him an NBA team last year . But now it 's his former female companion who has lost big . A Los Angeles judge has ordered V. Stiviano to pay back more than $ 2.6 million in gifts after Sterling 's wife sued her . In the lawsuit , Rochelle Shelly '' Sterling accused Stiviano of targeting extremely wealthy older men . She claimed Donald Sterling used the couple 's money to buy Stiviano a Ferrari , two Bentleys and a Range Rover , and that he helped her get a $ 1.8 million duplex . Who is V. Stiviano ? Stiviano countered that there was nothing wrong with Donald Sterling giving her gifts and that she never took advantage of the former Los Angeles Clippers owner , who made much of his fortune in real estate . Shelly Sterling was thrilled with the court decision Tuesday , her lawyer told CNN affiliate KABC . This is a victory for the Sterling family in recovering the $ 2,630,000 that Donald lavished on a conniving mistress , '' attorney Pierce O'Donnell said in a statement . It also sets a precedent that the injured spouse can recover damages from the recipient of these ill-begotten gifts . '' Stiviano 's gifts from Donald Sterling did n't just include uber-expensive items like luxury cars . According to the Los Angeles Times , the list also includes a $ 391 Easter bunny costume , a $ 299 two-speed blender and a $ 12 lace thong . Donald Sterling 's downfall came after an audio recording surfaced of the octogenarian arguing with Stiviano . In the tape , Sterling chastises Stiviano for posting pictures on social media of her posing with African-Americans , including basketball legend Magic Johnson . In your lousy fing Instagrams , you do n't have to have yourself with -- walking with black people , '' Sterling said in the audio first posted by TMZ . He also tells Stiviano not to bring Johnson to Clippers games and not to post photos with the Hall of Famer so Sterling 's friends can see . 
`` Admire him , bring him here , feed him , fk him , but do n't put ( Magic ) on an Instagram for the world to have to see so they have to call me , '' Sterling said . NBA Commissioner Adam Silver banned Sterling from the league , fined him $ 2.5 million and pushed through a charge to terminate all of his ownership rights in the franchise . Fact check : Donald Sterling 's claims vs. reality CNN 's Dottie Evans contributed to this report .\"",
"hyp": "\"Donald sterling 's racist remarks cost him an nba team last year . But now it 's his former female companion who has lost big . A judge has ordered v. stiviano to pay back more than $ 2.6 million in gifts .\"",
"score": "5",
"explain": "The summary is composed of well-structured sentences that are grammatically correct and easy to understand. Each sentence flows naturally and presents the information clearly. There are no obvious grammatical or syntactic errors, and each sentence stands on its own while still maintaining a clear connection to the overall narrative. Therefore, a score of 5 (best) is appropriate for the summary's fluency."
}
}
}
}
}