
Artificial intelligence black boxes just got a little less mysterious


One of the stranger and more unsettling things about today's most advanced artificial intelligence (AI) systems is that nobody, not even the people who build them, truly understands how they work.

That is because large language models, the kind of AI system that underpins ChatGPT and other well-known chatbots, are not written line by line by human programmers the way traditional computer programmes are.
Instead, these systems essentially teach themselves. They take in vast amounts of text, recognise patterns and relationships in language, and use that knowledge to predict which word is likely to come next in a sequence.
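The prediction step can be illustrated with a deliberately tiny sketch. The bigram counter below is nothing like how a large language model works internally, and the toy corpus is invented for the example, but it shows the same basic loop: learn statistics from text, then repeatedly predict a likely next word.

```python
# Toy sketch only: a bigram "language model" that learns next-word
# statistics from a tiny corpus and samples a continuation word by word.
from collections import Counter, defaultdict
import random

corpus = "the cat sat on the mat . the dog sat on the rug .".split()

# Count how often each word follows each other word.
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def predict_next(word):
    """Sample the next word in proportion to how often it followed `word`."""
    counts = follows[word]
    words, weights = zip(*counts.items())
    return random.choices(words, weights=weights, k=1)[0]

# Generate a short continuation, one predicted word at a time.
word, out = "the", ["the"]
for _ in range(6):
    word = predict_next(word)
    out.append(word)
print(" ".join(out))
```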
The drawback of building AI systems this way is that it is hard to reverse-engineer them or to pinpoint a specific bug in the code when something goes wrong. If a user types "Which American city has the best food?" and a chatbot answers "Tokyo," there is no real way to understand why the model made that mistake, or why the next person who asks might get a different answer.
Furthermore, there is no clear explanation for why large language models behave badly or deviate from expectations. One of the main reasons some researchers worry that powerful AI systems might one day endanger humanity is the inscrutability of large language models.

Ultimately, how will we be able to determine whether these models can be used to develop new bioweapons, disseminate political propaganda, or write malicious computer code for cyberattacks if we have no idea what’s going on inside of them? How can we stop powerful AI systems from misbehaving or disobeying us if we are unable to identify the root cause of their behaviour?


However, a group of researchers at the artificial intelligence startup Anthropic announced this week what they called a significant discovery, one that they believe will help us learn more about the true workings of AI language models and perhaps even stop them from becoming dangerous. In a blog post titled “Mapping the Mind of a Large Language Model,” the team summarised its findings.

One of the AI models they examined was Claude 3 Sonnet, a version of Anthropic's Claude 3 language model. They used a technique called "dictionary learning" to find patterns in how different combinations of neurons, the mathematical units inside the AI model, were activated when Claude was prompted to talk about particular subjects. They identified roughly 10 million of these patterns, which they call "features." One feature, they found, activated whenever Claude was asked to talk about San Francisco. Other features came into play when topics such as immunology were mentioned, or specific scientific terms such as lithium, the chemical element. And some features were linked to more abstract concepts, such as deception or gender bias.
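As a rough illustration of the general idea, the sketch below runs classical dictionary learning (via scikit-learn) on synthetic "activation" vectors and recovers a dictionary of candidate feature directions. Anthropic's actual work trained sparse autoencoders on Claude's real internal activations at a vastly larger scale, so the data, sizes, and settings here are assumptions made purely for demonstration.

```python
# Illustrative sketch, not Anthropic's method: decompose synthetic
# "activation" vectors into sparse combinations of learned feature directions.
import numpy as np
from sklearn.decomposition import DictionaryLearning

rng = np.random.default_rng(0)

# Pretend these are d-dimensional activations collected across many prompts,
# generated from a small number of sparsely active ground-truth features.
n_samples, d, n_true_features = 500, 64, 16
true_features = rng.normal(size=(n_true_features, d))
codes = rng.random((n_samples, n_true_features)) * (rng.random((n_samples, n_true_features)) < 0.1)
activations = codes @ true_features + 0.01 * rng.normal(size=(n_samples, d))

# Learn a dictionary of candidate "features" with sparse codes.
dl = DictionaryLearning(n_components=32, transform_algorithm="lasso_lars",
                        transform_alpha=0.1, max_iter=100, random_state=0)
sparse_codes = dl.fit_transform(activations)

# Each row of dl.components_ is a learned feature direction; each activation
# is (approximately) a sparse combination of a few of them.
print("dictionary shape:", dl.components_.shape)  # (32, 64)
print("avg. active features per sample:",
      (np.abs(sparse_codes) > 1e-6).sum(axis=1).mean())
```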

They also found that the AI system's behaviour could be changed by manually turning specific features on or off. For instance, when they forced a feature linked to the concept of sycophancy to activate more strongly, Claude responded by lavishing the user with extravagant, flowery praise, even in situations where such flattery was inappropriate.
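A minimal, hypothetical sketch of that kind of steering: if a feature's direction in the model's activation space is known, adding a scaled copy of it to a layer's activation makes that feature express more strongly. The vectors and strength below are invented for illustration; real steering requires access to the model's internals.

```python
# Hypothetical sketch of "feature steering" on a made-up activation vector.
import numpy as np

hidden = np.array([0.2, -0.1, 0.5, 0.0, 0.3, -0.4, 0.1, 0.0])  # pretend layer activation
feature = np.array([0.0, 1.0, 0.0, 0.5, 0.0, 0.0, -0.5, 0.0])  # pretend "sycophancy" direction
feature /= np.linalg.norm(feature)  # normalise to a unit direction

def steer(activation, feature_direction, strength):
    """Push a feature 'on' by adding a scaled unit direction to the activation."""
    return activation + strength * feature_direction

steered = steer(hidden, feature, strength=4.0)
print("feature projection before:", hidden @ feature)
print("feature projection after: ", steered @ feature)
```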

These results, according to Chris Olah, the head of the Anthropic interpretability research team, may make it possible for AI companies to better manage their models.

“We’re finding features that could clarify issues with autonomy, safety risks, and bias,” he stated. “I’m feeling very optimistic that we could be able to transform these contentious issues that people argue over into topics on which we can genuinely have more fruitful conversation.”

Other researchers have found similar phenomena in language models, but Anthropic’s team is among the first to use these methods. Jacob Andreas, an associate professor of computer science at MIT who reviewed a summary of the work, described it as encouraging evidence that large-scale interpretability could be feasible.

  1. Large language models, like the ones behind ChatGPT, are not written line by line by human programmers, so no one can say exactly why they misbehave or go off the rails.
  2. Researchers examined the Claude 3 Sonnet AI model from Anthropic.
  3. Using a method called “dictionary learning,” they discovered patterns in how different combinations of neurons fired when Claude was asked to talk about particular subjects.
  4. One feature, for instance, activated whenever Claude was asked to discuss San Francisco.
  5. Additionally, they discovered that manually turning on or off particular features could alter the AI system’s behaviour.
  6. Researchers think these discoveries might enable AI companies to better manage their models.
