{"id":264162,"date":"2023-11-30T12:52:01","date_gmt":"2023-11-30T17:52:01","guid":{"rendered":"https:\/\/sftarticles.wpenginepowered.com\/en\/?p=264162"},"modified":"2025-07-01T17:30:42","modified_gmt":"2025-07-02T00:30:42","slug":"amazon-thinks-a-human-touch-to-ai-is-necessary","status":"publish","type":"post","link":"https:\/\/cms-articles.softonic.io\/en\/amazon-thinks-a-human-touch-to-ai-is-necessary\/","title":{"rendered":"Amazon thinks a human touch to AI is necessary"},"content":{"rendered":"\n<p class=\"wp-block-paragraph\">Amazon is introducing a groundbreaking tool, <a href=\"https:\/\/aws.amazon.com\/tr\/blogs\/aws\/evaluate-compare-and-select-the-best-foundation-models-for-your-use-case-in-amazon-bedrock-preview\/\" target=\"_blank\" rel=\"noopener\" title=\"\">Model Evaluation on Bedrock<\/a>, aiming to transform the evaluation process for AI models. Revealed during the recent <a href=\"https:\/\/en.softonic.com\/articles\/amazon-q-ai\" target=\"_blank\" rel=\"noopener\" title=\"\">AWS re:Invent<\/a> conference, the tool is meant to help developers choose the right model for a given project, so that they do not end up with one that misses their accuracy requirements or is larger than they need.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Inside Amazon&#8217;s Bedrock Model Evaluation Revolution<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">The tool consists of two components: automated evaluation and human evaluation. In the automated mode, developers can assess a model&#8217;s performance on metrics such as robustness and accuracy, covering tasks such as summarization, text classification, question answering, and text generation. 
Bedrock includes popular third-party AI models, enhancing the variety of choices available.<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"591\" src=\"https:\/\/articles-img.sftcdn.net\/sft\/articles\/auto-mapping-folder\/sites\/3\/2023\/11\/aws--1024x591.jpg\" alt=\"\" class=\"wp-image-264163\" srcset=\"https:\/\/articles-img.sftcdn.net\/auto-mapping-folder\/sites\/3\/2023\/11\/aws--1024x591.jpg 1024w, https:\/\/articles-img.sftcdn.net\/auto-mapping-folder\/sites\/3\/2023\/11\/aws--300x173.jpg 300w, https:\/\/articles-img.sftcdn.net\/auto-mapping-folder\/sites\/3\/2023\/11\/aws--768x443.jpg 768w, https:\/\/articles-img.sftcdn.net\/auto-mapping-folder\/sites\/3\/2023\/11\/aws--150x87.jpg 150w, https:\/\/articles-img.sftcdn.net\/auto-mapping-folder\/sites\/3\/2023\/11\/aws-.jpg 1200w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption class=\"wp-element-caption\">(<a href=\"https:\/\/aws.amazon.com\/tr\/blogs\/aws\/top-announcements-of-aws-reinvent-2023\/\" target=\"_blank\" rel=\"noopener\" title=\"\">Credit<\/a>)<\/figcaption><\/figure>\n<\/div>\n\n\n<p class=\"wp-block-paragraph\">AWS provides standard test datasets, but developers can also bring their own data into the benchmarking platform, offering a more realistic evaluation. The system generates a comprehensive report, shedding light on the model&#8217;s strengths and weaknesses.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Human benchmarking<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">For human evaluation, users can collaborate with AWS&#8217;s team or use their own resources, specifying task type, evaluation metrics, and preferred datasets. 
This human touch can capture qualities that automated metrics may miss, such as empathy or friendliness.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Importantly, Amazon recognizes that developers&#8217; needs differ and does not require every customer to benchmark models. This flexibility suits developers who already know Bedrock&#8217;s foundation models or have a clear idea of which model they want.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">During the preview phase, AWS will charge only for the model inference used during evaluation, which keeps the benchmarking service accessible. The move reflects Amazon&#8217;s commitment to responsible and effective AI practices and gives companies a tailored way to measure the impact of models on their projects.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">In essence, Bedrock Model Evaluation addresses the ongoing challenge of selecting the right AI model by offering both automated and human-driven evaluations, in keeping with Amazon&#8217;s aim of empowering developers in the rapidly evolving field of artificial intelligence.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Amazon is introducing a groundbreaking tool, Model Evaluation on Bedrock, aiming to transform the evaluation process for AI models. 
Revealed during the recent AWS re:Invent conference, the tool addresses the challenge of accurately selecting models for specific projects, preventing developers from using models that may not meet accuracy requirements or are too large for &hellip; <a href=\"https:\/\/cms-articles.softonic.io\/en\/amazon-thinks-a-human-touch-to-ai-is-necessary\/\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;Amazon thinks a human touch to AI is necessary&#8221;<\/span><\/a><\/p>\n","protected":false},"author":9288,"featured_media":264164,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":"","wpcf-pageviews":1},"categories":[1015],"tags":[],"usertag":[],"vertical":[],"content-category":[],"class_list":["post-264162","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-news"],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/cms-articles.softonic.io\/en\/wp-json\/wp\/v2\/posts\/264162","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/cms-articles.softonic.io\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/cms-articles.softonic.io\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/cms-articles.softonic.io\/en\/wp-json\/wp\/v2\/users\/9288"}],"replies":[{"embeddable":true,"href":"https:\/\/cms-articles.softonic.io\/en\/wp-json\/wp\/v2\/comments?post=264162"}],"version-history":[{"count":1,"href":"https:\/\/cms-articles.softonic.io\/en\/wp-json\/wp\/v2\/posts\/264162\/revisions"}],"predecessor-version":[{"id":316034,"href":"https:\/\/cms-articles.softonic.io\/en\/wp-json\/wp\/v2\/posts\/264162\/revisions\/316034"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/cms-articles.softonic.io\/en\/wp-json\/wp\/v2\/media\/264164"}],"wp:attachment":[{"href":"https:\/\/cms-articles.softonic.io\/en\/wp-json\/wp\/v2\/media?parent=264162"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/cms-articles.softonic.io\/en\/wp-json\/wp\/v2\/categories?post=264162"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/cms-articles.softonic.io\/en\/wp-json\/wp\/v2\/tags?post=264162"},{"taxonomy":"usertag","embeddable":true,"href":"https:\/\/cms-articles.softonic.io\/en\/wp-json\/wp\/v2\/usertag?post=264162"},{"taxonomy":"vertical","embeddable":true,"href":"https:\/\/cms-articles.softonic.io\/en\/wp-json\/wp\/v2\/vertical?post=264162"},{"taxonomy":"content-category","embeddable":true,"href":"https:\/\/cms-articles.softonic.io\/en\/wp-json\/wp\/v2\/content-category?post=264162"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}