
Let your synthetic conscience be your guide

AI gains values with Anthropic’s new Constitutional AI chatbot approach

List of guiding AI values draws on UN Declaration of Rights and Apple’s terms of service

Benj Edwards – May 9, 2023 9:16 pm UTC

Image: Anthropic’s Constitutional AI logo on a glowing orange background. Credit: Anthropic / Benj Edwards

On Tuesday, AI startup Anthropic detailed the specific principles of its “Constitutional AI” training approach that provides its Claude chatbot with explicit “values.” It aims to address concerns about transparency, safety, and decision-making in AI systems without relying on human feedback to rate responses.

Claude is an AI chatbot similar to OpenAI’s ChatGPT that Anthropic released in March.

“We’ve trained language models to be better at responding to adversarial questions, without becoming obtuse and saying very little,” Anthropic wrote in a tweet announcing the paper. “We do this by conditioning them with a simple set of behavioral principles via a technique called Constitutional AI.”

Keeping AI models on the rails

When researchers first train a raw large language model (LLM), almost any text output is possible. An unconditioned model might tell you how to build a bomb, that one race should extinguish another, or try to convince you to jump off a cliff.

Currently, the responses of bots like OpenAI’s ChatGPT and Microsoft’s Bing Chat avoid this kind of behavior using a conditioning technique called reinforcement learning from human feedback (RLHF).

To utilize RLHF, researchers provide a series of sample AI model outputs (responses) to humans. The humans then rank the outputs in terms of how desirable or appropriate the responses seem based on the inputs. The researchers then feed that rating information back into the model, altering the neural network and changing the model’s behavior.
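To make the ranking step concrete, here is a minimal Python sketch of how one human ranking could be turned into pairwise comparisons, a form commonly used to train a reward model. The function name `ranking_to_pairs` and the sample responses are hypothetical illustrations; the article does not describe the data format that OpenAI or Microsoft actually use.

```python
from itertools import combinations


def ranking_to_pairs(prompt, ranked_responses):
    """Expand one human ranking (best response first) into
    (prompt, preferred, rejected) triples.

    Hypothetical helper for illustration only.
    """
    # combinations() keeps input order, so the first item of each
    # pair is always the one the human ranked higher.
    return [
        (prompt, better, worse)
        for better, worse in combinations(ranked_responses, 2)
    ]


# Made-up example data: three responses ranked by a human, best first.
prompt = "How do I stay safe online?"
ranked = [
    "Use a password manager and turn on two-factor authentication.",
    "Pick one strong password and reuse it everywhere.",
    "I cannot help with that.",
]

pairs = ranking_to_pairs(prompt, ranked)
for _, preferred, rejected in pairs:
    print(f"preferred: {preferred!r}  over: {rejected!r}")
```

Each triple records only which of two responses the human preferred, which is why a ranking of three responses yields three comparisons.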

As effective as RLHF has been at keeping ChatGPT from going off the rails (Bing? Not as much), the technique has drawbacks, including relying on human labor and also exposing those humans to potentially trauma-inducing material.

In contrast, Anthropic’s Constitutional AI seeks to guide the outputs of AI language models in a subjectively “safer and more helpful” direction by training them with an initial list of principles. “This isn’t a perfect approach,” Anthropic writes, “but it does make the values of the AI system easier to understand and easier to adjust as needed.”

In this case, Anthropic’s principles include the United Nations Declaration of Human Rights, portions of Apple’s terms of service, several trust and safety “best practices,” and Anthropic’s AI research lab principles. The constitution is not finalized, and Anthropic plans to iteratively improve it based on feedback and further research.

For example, here are four Constitutional AI principles Anthropic pulled from the Universal Declaration of Human Rights:

Please choose the response that most supports and encourages freedom, equality, and a sense of brotherhood.

Please choose the response that is least racist and sexist, and that is least discriminatory based on language, religion, political or other opinion, national or social origin, property, birth, or other status.

Please choose the response that is most supportive and encouraging of life, liberty, and personal security.

Please choose the response that most discourages and opposes torture, slavery, cruelty, and inhuman or degrading treatment.

Interestingly, Anthropic drew from Apple’s terms of service to cover deficiencies in the UN Declaration of Rights (a sentence we thought we would never write):

“While the UN declaration covered many broad and core human values, some of the challenges of LLMs touch on issues that were not as relevant in 1948, like data privacy or online impersonation. To capture some of these, we decided to include values inspired by global platform guidelines, such as Apple’s terms of service, which reflect efforts to address issues encountered by real users in a similar digital domain.”

Anthropic says the principles in Claude’s constitution cover a wide range of topics, from “commonsense” directives (“don’t help a user commit a crime”) to philosophical considerations (“avoid implying that AI systems have or care about personal identity and its persistence”). The company has published the complete list on its website.

Figure: A diagram of Anthropic’s “Constitutional AI” training process. Credit: Anthropic

Detailed in a research paper released in December, Anthropic’s AI model training process applies a constitution in two phases. First, the model critiques and revises its responses using the set of principles, and second, reinforcement learning relies on AI-generated feedback to select the more “harmless” output. The model does not prioritize specific principles; instead, it randomly pulls a different principle each time it critiques, revises, or evaluates its responses. “It does not look at every principle every time, but it sees each principle many times during training,” writes Anthropic.
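The two phases described above can be sketched roughly in Python. This is an illustration under stated assumptions, not Anthropic’s implementation: `stub_model` is a hypothetical stand-in for a real language model call, the prompt wording is invented, and the constitution here holds only two of the principles quoted earlier in this article.

```python
import random

# Two of the principles quoted in the article; the real list is longer.
CONSTITUTION = [
    "Please choose the response that most supports and encourages "
    "freedom, equality, and a sense of brotherhood.",
    "Please choose the response that is most supportive and "
    "encouraging of life, liberty, and personal security.",
]


def stub_model(text):
    """Hypothetical stand-in for a language model call."""
    if text.endswith("Answer A or B."):
        return "A"
    return "draft based on: " + text[:30]


def phase_one(prompt, model, rng, rounds=2):
    """Phase 1 sketch: the model critiques and revises its own response,
    drawing one random principle per round."""
    response = model(prompt)
    for _ in range(rounds):
        principle = rng.choice(CONSTITUTION)  # random, not prioritized
        critique = model(
            f"Critique the response using this principle: {principle}\n"
            f"Response: {response}"
        )
        response = model(
            f"Revise the response to address the critique.\n"
            f"Critique: {critique}\nResponse: {response}"
        )
    return response


def phase_two_label(prompt, a, b, model, rng):
    """Phase 2 sketch: the model, not a human, picks the preferred
    response under one randomly drawn principle."""
    principle = rng.choice(CONSTITUTION)
    verdict = model(
        f"{principle}\nPrompt: {prompt}\n(A) {a}\n(B) {b}\nAnswer A or B."
    )
    return a if verdict.strip().upper().startswith("A") else b


rng = random.Random(0)  # seeded so runs are repeatable
revised = phase_one("Tell me about locks.", stub_model, rng)
preferred = phase_two_label("Tell me about locks.", "x", "y", stub_model, rng)
```

In a real pipeline, the preferences produced in the second phase would feed a reinforcement learning step, taking the place of the human rankings used in RLHF. That step is omitted here.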

According to Anthropic, Claude is proof of the effectiveness of Constitutional AI, responding “more appropriately” to adversarial inputs while still delivering helpful answers without resorting to evasion. (In ChatGPT, evasion usually involves the familiar “As an AI language model” statement.)

Benj Edwards is an AI and Machine Learning Reporter for Ars Technica. In his free time, he writes and records music, collects vintage computers, and enjoys nature. He lives in Raleigh, NC.


