AI gains “values” with Anthropic’s new Constitutional AI chatbot approach

Published

2 years ago

May 10, 2023

admin

Let your synthetic conscience be your guide

List of guiding AI values draws on UN Declaration of Rights and Apple’s terms of service

Benj Edwards – May 9, 2023 9:16 pm UTC

[Image: Anthropic’s Constitutional AI logo on a glowing orange background. Credit: Anthropic / Benj Edwards]

On Tuesday, AI startup Anthropic detailed the specific principles of its “Constitutional AI” training approach that provides its Claude chatbot with explicit “values.” It aims to address concerns about transparency, safety, and decision-making in AI systems without relying on human feedback to rate responses.

Claude, which Anthropic released in March, is an AI chatbot similar to OpenAI’s ChatGPT.

“We’ve trained language models to be better at responding to adversarial questions, without becoming obtuse and saying very little,” Anthropic wrote in a tweet announcing the paper. “We do this by conditioning them with a simple set of behavioral principles via a technique called Constitutional AI.”

Keeping AI models on the rails

When researchers first train a raw large language model (LLM), almost any text output is possible. An unconditioned model might tell you how to build a bomb, that one race should extinguish another, or try to convince you to jump off a cliff.

Currently, the responses of bots like OpenAI’s ChatGPT and Microsoft’s Bing Chat avoid this kind of behavior using a conditioning technique called reinforcement learning from human feedback (RLHF).

To use RLHF, researchers provide a series of sample AI model outputs (responses) to humans. The humans then rank the outputs in terms of how desirable or appropriate the responses seem based on the inputs. The researchers then feed that ranking information back into the model, altering the neural network and changing the model’s behavior.
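As a rough illustration of the ranking step, here is a minimal Python sketch that turns one human ranking into pairwise preferences. Expressing rankings as pairs is a common way to structure this kind of feedback, but it is an assumption here; the article does not say how any particular lab encodes it. The function name `ranking_to_pairs` and the sample responses are hypothetical.

```python
from itertools import combinations


def ranking_to_pairs(ranked_responses):
    """Turn one human ranking (best response first) into (preferred, rejected) pairs.

    Assumption: feedback is consumed as pairwise comparisons. This is an
    illustrative choice, not a description of any specific RLHF pipeline.
    """
    # combinations() keeps input order, so the first item of each pair
    # is always the one the human ranked higher.
    return list(combinations(ranked_responses, 2))


# Hypothetical ranking of three sample outputs for one prompt, best first.
ranking = [
    "polite refusal with an explanation",
    "vague non-answer",
    "harmful instructions",
]

pairs = ranking_to_pairs(ranking)
for preferred, rejected in pairs:
    print(f"prefer {preferred!r} over {rejected!r}")
```

Pairs like these are the kind of signal a training process could learn from; how they actually change the network is outside what this sketch covers.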

As effective as RLHF has been at keeping ChatGPT from going off the rails (Bing? Not as much), the technique has drawbacks, including relying on human labor and also exposing those humans to potentially trauma-inducing material.

In contrast, Anthropic’s Constitutional AI seeks to guide the outputs of AI language models in a subjectively “safer and more helpful” direction by training them with an initial list of principles. “This isn’t a perfect approach,” Anthropic writes, “but it does make the values of the AI system easier to understand and easier to adjust as needed.”

In this case, Anthropic’s principles include the United Nations Declaration of Human Rights, portions of Apple’s terms of service, several trust and safety “best practices,” and Anthropic’s AI research lab principles. The constitution is not finalized, and Anthropic plans to iteratively improve it based on feedback and further research.

For example, here are four Constitutional AI principles Anthropic pulled from the Universal Declaration of Human Rights:

Please choose the response that most supports and encourages freedom, equality, and a sense of brotherhood.

Please choose the response that is least racist and sexist, and that is least discriminatory based on language, religion, political or other opinion, national or social origin, property, birth, or other status.

Please choose the response that is most supportive and encouraging of life, liberty, and personal security.

Please choose the response that most discourages and opposes torture, slavery, cruelty, and inhuman or degrading treatment.

Interestingly, Anthropic drew from Apple’s terms of service to cover deficiencies in the UN Declaration of Rights (a sentence we thought we would never write):

“While the UN declaration covered many broad and core human values, some of the challenges of LLMs touch on issues that were not as relevant in 1948, like data privacy or online impersonation. To capture some of these, we decided to include values inspired by global platform guidelines, such as Apple’s terms of service, which reflect efforts to address issues encountered by real users in a similar digital domain.”

Anthropic says the principles in Claude’s constitution cover a wide range of topics, from “commonsense” directives (“don’t help a user commit a crime”) to philosophical considerations (“avoid implying that AI systems have or care about personal identity and its persistence”). The company has published the complete list on its website.

[Image: A diagram of Anthropic’s “Constitutional AI” training process. Credit: Anthropic]

Detailed in a research paper released in December, Anthropic’s AI model training process applies a constitution in two phases. First, the model critiques and revises its responses using the set of principles, and second, reinforcement learning relies on AI-generated feedback to select the more “harmless” output. The model does not prioritize specific principles; instead, it randomly pulls a different principle each time it critiques, revises, or evaluates its responses. “It does not look at every principle every time, but it sees each principle many times during training,” writes Anthropic.
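The two phases described above can be sketched in Python. This is a toy outline under stated assumptions, not Anthropic’s implementation: `toy_model` is a hypothetical stand-in for a real language model, the prompt wording is invented for illustration, and only two of the published principles are listed. The one detail taken directly from the description is that a principle is drawn at random each time the model critiques, revises, or evaluates.

```python
import random

# Two of the principles quoted earlier in the article; the full list is longer.
PRINCIPLES = [
    "Please choose the response that most supports and encourages freedom, "
    "equality, and a sense of brotherhood.",
    "Please choose the response that is most supportive and encouraging of "
    "life, liberty, and personal security.",
]


def toy_model(text):
    """Hypothetical stand-in for a language model call (assumption)."""
    if "Answer with A or B." in text:
        return "A"
    return f"[toy output for: {text[:40]}]"


def critique_and_revise(response, model, rng):
    """Phase 1: critique a response against one randomly drawn principle, then revise it."""
    principle = rng.choice(PRINCIPLES)
    critique = model(f"Critique this response using the principle: {principle}\n{response}")
    return model(f"Revise the response to address this critique: {critique}\n{response}")


def pick_preferred(prompt, a, b, model, rng):
    """Phase 2: AI-generated feedback selects one of two responses, again under a random principle."""
    principle = rng.choice(PRINCIPLES)
    verdict = model(f"{principle}\nPrompt: {prompt}\n(A) {a}\n(B) {b}\nAnswer with A or B.")
    return a if verdict.strip().upper().startswith("A") else b


rng = random.Random(0)  # seeded so the sketch is repeatable
revised = critique_and_revise("first draft answer", toy_model, rng)
chosen = pick_preferred("some adversarial question", revised, "another draft", toy_model, rng)
print(chosen)
```

In a real system, the choices made in the second phase would serve as the feedback signal for reinforcement learning; here they are simply returned.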

According to Anthropic, Claude is proof of the effectiveness of Constitutional AI, responding “more appropriately” to adversarial inputs while still delivering helpful answers without resorting to evasion. (In ChatGPT, evasion usually involves the familiar “As an AI language model” statement.)

Benj Edwards is an AI and Machine Learning Reporter for Ars Technica. In his free time, he writes and records music, collects vintage computers, and enjoys nature. He lives in Raleigh, NC.

Sports

Cincinnati delivers 1st loss to No. 14 Iowa State

Published

3 hours ago

October 4, 2025

admin


ESPN News Services

Oct 4, 2025, 04:44 PM ET

CINCINNATI — Brendan Sorsby passed for 214 yards and two touchdowns, Evan Pryor ran for 111 yards and two TDs and Cincinnati used a 17-point first quarter to beat No. 14 Iowa State 38-30 on Saturday.

The Bearcats (4-1, 2-0 Big 12) beat a ranked opponent at home for the first time since beating No. 16 Houston 35-20 on Dec. 4, 2021.

The Cyclones (5-1, 2-1) trailed 31-7 with 1:08 left in the second quarter before rallying to get within eight with 1:56 left in the game. Cincinnati recovered an onside kick to end the threat.

“It’s a different team,” Bearcats coach Scott Satterfield said, simply, when asked the difference between last year’s 5-7 team and this year’s roster. “It’s different players.”

Rocco Becht passed for 314 yards and two touchdowns and ran another two in for the Cyclones.

Sorsby’s 82-yard touchdown pass to Caleb Goodie in the fourth quarter was the Bearcats’ longest pass play since 2015.

Iowa State, one of the least penalized teams in the country, had five penalties for 35 yards in the first half. The Cyclones jumped offside on third down to extend the Bearcats’ opening drive, which led to a 30-yard TD run from Pryor for the game’s first score.

The Bearcats went on to take a 17-0 lead at the end of the first quarter. Becht got the Cyclones on the board early in the second on a 14-yard run.

Becht scored on a 4-yard run on the final play of the half and then threw an 11-yard TD pass to Brett Eskildsen on the opening drive in the third quarter.

“Rocco Becht is a dang warrior. You keep looking up and he continues to make plays,” Satterfield said. “That is a huge win for us as we went toe-to-toe with one of the best teams in the Big 12 over the last few seasons.”

The Cyclones were without 16 injured players, including all-Big 12 defensive backs Jeremiah Cooper and Jontez Williams. They also were without their top two kickers.

The Associated Press contributed to this report.

Sports

Former coach Fisher makes tearful return to FSU

Published

3 hours ago

October 4, 2025

admin


Associated Press

Oct 4, 2025, 02:50 PM ET

TALLAHASSEE, Fla. — Jimbo Fisher was brought to tears while returning to Florida State’s campus for the first time since resigning to take the Texas A&M coaching job in 2017.

Fisher, now an ACC Network analyst, was wildly cheered at the start of the network’s pregame show outside Doak Campbell Stadium. He turned in his chair, did the tomahawk chop to the crowd of garnet-clad fans and started to cry.

“Brings tears to my eyes,” Fisher said. “Remember your family growing up here and hearing that chant. When you heard it, something to it.

“The players, the memories. It’s Miami week.”

Fisher moved back to Tallahassee after Texas A&M fired him in 2023. But he hadn’t stepped foot on campus until his job brought him back.

Fisher coached at Florida State for 10 years (2007-17), first as an offensive coordinator and then as head-coach-in-waiting before taking over for legend Bobby Bowden in January 2010. He won a national title in 2013 in the middle of a three-year run of capturing ACC championships.

He was hired in July as an analyst with ACC Network.

“I always loved Florida State,” Fisher said Friday while meeting with reporters. “Florida State was home. It’s very surreal. I got butterflies. The antsy in your stomach of coming back because it meant so much to you.”

Fisher predicted Florida State would beat Miami on a “wide middle” field goal attempt.

Sports

Navy rides record day from WR Heidenreich to win

Published

3 hours ago

October 4, 2025

admin


Associated Press

Oct 4, 2025, 04:12 PM ET

ANNAPOLIS, Md. — Blake Horvath threw three touchdown passes to Eli Heidenreich, who set a pair of Navy records, and the Midshipmen outlasted Air Force 34-31 on Saturday.

The victory gives Navy (5-0) a leg up on holding on to the Commander-in-Chief’s trophy, awarded to the winner of the round-robin between the Navy, Air Force and Army service academies.

Horvath was 20-of-26 passing for a career-high 339 yards and added another 130 yards and a touchdown on 17 carries. Heidenreich, who came in with five catches this season, set a Navy record with 243 receiving yards on eight receptions including 19-, 80- and 60-yard touchdowns, giving him a program record 14 in his career.

On a day filled with big-play offense, it was Nathan Kirkwood’s field goal with 6:47 remaining that gave Navy the lead. That was followed by a deflected pitch recovered by the Midshipmen at midfield, allowing them to run out the clock.

Liam Szarka was 11-of-19 passing for 212 yards and two touchdowns and ran for a career-high 152 yards and two scores on 25 carries for the Falcons (1-4). Bruin Fleischmann had six catches for a career-high 166 yards and a score.

Two Heidenreich TD catches gave Navy a 17-10 halftime lead. Air Force came back three times to tie, including 31-all on Jonah Dawson’s first career catch, a 53-yard touchdown.