OpenAI Archives - Chgogs News

Apple Engineers Show How Flimsy AI ‘Reasoning’ Can Be

For a while now, companies like OpenAI and Google have been touting advanced “reasoning” capabilities as the next big step in their latest artificial intelligence models. Now, though, a new study from six Apple engineers shows that the mathematical “reasoning” displayed by advanced large language models can be extremely brittle and unreliable in the face of seemingly trivial changes to common benchmark problems.

The fragility highlighted in these new results helps support previous research suggesting that LLMs’ use of probabilistic pattern matching is missing the formal understanding of underlying concepts needed for truly reliable mathematical reasoning capabilities. “Current LLMs are not capable of genuine logical reasoning,” the researchers hypothesize based on these results. “Instead, they attempt to replicate the reasoning steps observed in their training data.”

Mix It Up

In “GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models”—currently available as a preprint—the six Apple researchers start with GSM8K, a standardized set of more than 8,000 grade-school-level mathematical word problems that is often used as a benchmark for modern LLMs’ complex reasoning capabilities. They then take the novel approach of modifying a portion of that testing set to dynamically replace certain names and numbers with new values—so a question about Sophie getting 31 building blocks for her nephew in GSM8K could become a question about Bill getting 19 building blocks for his brother in the new GSM-Symbolic evaluation.
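
The paper’s actual templates aren’t reproduced here, but the mechanism is easy to sketch. Below is a minimal, hypothetical Python illustration of the GSM-Symbolic idea: a GSM8K-style problem becomes a template whose names and numbers are re-sampled on every run, with the ground-truth answer recomputed to match. The template text, name list, and value ranges are invented for illustration.

```python
import random

# A GSM8K-style problem rewritten as a template: the names and numbers
# become placeholders that are re-sampled for every evaluation run.
TEMPLATE = (
    "{name} gets {n} building blocks for their {relative}. "
    "{name} already has {m} blocks at home. "
    "How many blocks are there in total?"
)

def sample_instance(rng: random.Random) -> tuple[str, int]:
    """Return one concrete problem and its ground-truth answer."""
    name = rng.choice(["Sophie", "Bill", "Mara", "Deshawn"])
    relative = rng.choice(["nephew", "brother", "cousin"])
    n, m = rng.randint(5, 40), rng.randint(5, 40)
    question = TEMPLATE.format(name=name, relative=relative, n=n, m=m)
    return question, n + m  # the answer changes with the sampled numbers

rng = random.Random(0)
question, answer = sample_instance(rng)
print(question, "->", answer)
```

Because only surface details change, a model that genuinely reasons should score the same on every sampled variant.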

This approach helps avoid any potential “data contamination” that can result from the static GSM8K questions being fed directly into an AI model’s training data. At the same time, these incidental changes don’t alter the actual difficulty of the inherent mathematical reasoning at all, meaning models should theoretically perform just as well when tested on GSM-Symbolic as GSM8K.

Instead, when the researchers tested more than 20 state-of-the-art LLMs on GSM-Symbolic, they found that average accuracy dropped across the board compared to GSM8K, with declines of between 0.3 percent and 9.2 percent depending on the model. The results also showed high variance across 50 separate runs of GSM-Symbolic with different names and values. Gaps of up to 15 percent accuracy between the best and worst runs were common within a single model, and, for some reason, changing the numbers tended to hurt accuracy more than changing the names.

This kind of variance—both within different GSM-Symbolic runs and compared to GSM8K results—is more than a little surprising since, as the researchers point out, “the overall reasoning steps needed to solve a question remain the same.” The fact that such small changes lead to such variable results suggests to the researchers that these models are not doing any “formal” reasoning but are instead “attempt[ing] to perform a kind of in-distribution pattern-matching, aligning given questions and solution steps with similar ones seen in the training data.”

Don’t Get Distracted

Still, the overall variance shown in the GSM-Symbolic tests was often relatively small in the grand scheme of things. OpenAI’s GPT-4o, for instance, dropped from 95.2 percent accuracy on GSM8K to a still-impressive 94.9 percent on GSM-Symbolic. That’s a pretty high success rate on either benchmark, regardless of whether or not the model itself is using “formal” reasoning behind the scenes (though total accuracy for many models dropped precipitously when the researchers added just one or two additional logical steps to the problems).

The tested LLMs fared much worse, though, when the Apple researchers modified the GSM-Symbolic benchmark by adding “seemingly relevant but ultimately inconsequential statements” to the questions. For this “GSM-NoOp” benchmark set (short for “no operation”), a question about how many kiwis someone picks across multiple days might be modified to include the incidental detail that “five of them [the kiwis] were a bit smaller than average.”
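
The GSM-NoOp construction can be sketched the same way: append a clause that sounds relevant but changes nothing about the arithmetic. In this hypothetical Python sketch, only the kiwi-sizing clause comes from the paper’s description; the other distractors and the helper function are invented for illustration.

```python
import random

DISTRACTORS = [
    "five of them were a bit smaller than average",  # the paper's kiwi example
    "some of them were picked before noon",          # invented for illustration
    "a few had slightly duller skin than the rest",  # invented for illustration
]

def add_noop(question: str, rng: random.Random) -> str:
    """Insert a seemingly relevant but inconsequential statement.

    The correct answer is unchanged; only the surface text grows.
    """
    clause = rng.choice(DISTRACTORS)
    # Splice the distractor in just before the final question sentence.
    body, _, final_question = question.rpartition(". ")
    return f"{body}. Note that {clause}. {final_question}"

rng = random.Random(0)
original = ("Oliver picks 44 kiwis on Friday and 24 kiwis on Saturday. "
            "How many kiwis does Oliver have?")
print(add_noop(original, rng))
```

A model doing formal reasoning would simply ignore the extra clause; a pattern-matcher may instead try to “use” it, for example by subtracting the five smaller kiwis from the total.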

Adding in these red herrings led to what the researchers termed “catastrophic performance drops” in accuracy compared to GSM8K, ranging from 17.5 percent to a whopping 65.7 percent, depending on the model tested. These massive drops in accuracy highlight the inherent limits in using simple “pattern matching” to “convert statements to operations without truly understanding their meaning,” the researchers write.



NYT sends AI startup Perplexity ‘cease and desist’ notice over content use: Report


The New York Times has sent generative AI startup Perplexity a “cease and desist” notice demanding the company stop using its content. | Photo Credit: Reuters

The New York Times has sent generative AI startup Perplexity a “cease and desist” notice demanding the company stop using its content, the Wall Street Journal reported on Tuesday.

The letter from the news publisher said that the way Perplexity was using its content, including to create summaries and other types of output, violates its rights under copyright law, according to the report.

Since the introduction of ChatGPT, publishers have been raising the alarm about chatbots that can comb the internet for information and create paragraph summaries for the user.

Perplexity and the New York Times did not immediately respond to Reuters’ requests for comment.

NYT is also tussling with OpenAI, which it sued late last year, accusing the firm of using millions of its newspaper articles without permission to train its AI chatbot.

Other media firms, such as The Atlantic and Vox Media, have signed content licensing deals with OpenAI that give the ChatGPT maker access to their content.

In the letter to Perplexity, NYT asked the company to provide information on how it is accessing the publisher’s website despite its prevention efforts, according to the WSJ report.

Perplexity had previously assured the publisher that it would stop using “crawling” technology, the report said, citing the letter.

Earlier this year, Reuters reported multiple AI companies were bypassing a web standard used by publishers to block the scraping of their data used in generative AI systems.

Perplexity has faced accusations from media organizations such as Forbes and Wired of plagiarizing their content, but it has since launched a revenue-sharing program to address some of the concerns put forward by publishers.



How to Stop Your Data From Being Used to Train AI


Adobe

If you’re using a personal Adobe account, it’s easy to opt out of the content analysis. Open Adobe’s privacy page, scroll down to the Content analysis for product improvement section, and switch the toggle off. If you have a business or school account, you are automatically opted out.

Amazon: AWS

AI services from Amazon Web Services, like Amazon Rekognition or Amazon CodeWhisperer, may use customer data to improve the company’s tools, but it’s possible to opt out of the AI training. This used to be one of the most complicated processes on the list, though it has been streamlined in recent months. Amazon’s support page outlines the full process for opting out your organization.
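
For organizations managed through AWS Organizations, the opt-out can also be applied programmatically. Below is a minimal, hypothetical boto3 sketch run from the management account; it assumes the AI services opt-out policy type has already been enabled on the organization root, and the policy name and description are invented for illustration.

```python
import json

import boto3  # assumes credentials for the AWS Organizations management account

# AI services opt-out policy content: opt every account out for all AI services.
policy_content = {
    "services": {
        "default": {
            "opt_out_policy": {"@@assign": "optOut"}
        }
    }
}

org = boto3.client("organizations")

# Create the policy (the AISERVICES_OPT_OUT_POLICY type must already be
# enabled on the organization root, e.g. via org.enable_policy_type).
created = org.create_policy(
    Name="ai-services-opt-out",  # illustrative name
    Description="Opt the organization out of AI service data usage",
    Type="AISERVICES_OPT_OUT_POLICY",
    Content=json.dumps(policy_content),
)

# Attach it to the organization root so it applies to all member accounts.
root_id = org.list_roots()["Roots"][0]["Id"]
org.attach_policy(
    PolicyId=created["Policy"]["PolicySummary"]["Id"],
    TargetId=root_id,
)
```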

Figma

Figma, the popular design software, may use your data for model training. If your account is licensed through an Organization or Enterprise plan, you are automatically opted out; Starter and Professional accounts, on the other hand, are opted in by default. The setting can be changed at the team level: open the team settings, go to the AI tab, and switch off Content training.

Google Gemini

For users of Google’s chatbot, Gemini, conversations may sometimes be selected for human review to improve the AI model. Opting out is simple, though. Open Gemini in your browser, click Activity, and select the Turn Off drop-down menu. Here you can turn off Gemini Apps Activity alone, or opt out and delete your conversation data as well. While this means that in most cases future chats won’t be selected for human review, data that has already been selected is not erased by this process. According to Google’s privacy hub for Gemini, those chats may stick around for three years.

Grammarly

Grammarly updated its policies, so personal accounts can now opt out of AI training. To do so, go to Account, then Settings, and turn the Product Improvement and Training toggle off. Accounts licensed through an enterprise or education plan are automatically opted out.

Grok AI (X)

Kate O’Flaherty wrote a great piece for WIRED about Grok AI and protecting your privacy on X, the platform where the chatbot operates. It’s another situation where millions of users of a website woke up one day and were automatically opted in to AI training with minimal notice. If you still have an X account, it’s possible to opt out of your data being used to train Grok by going to the Settings and privacy section, then Privacy and safety. Open the Grok tab, then deselect your data sharing option.

HubSpot

HubSpot, a popular marketing and sales software platform, automatically uses customer data to improve its machine-learning model. Unfortunately, there is no button to turn this off. Instead, you have to send an email to privacy@hubspot.com requesting that the data associated with your account be opted out.

LinkedIn

Users of the career networking website were surprised to learn in September that their data was potentially being used to train AI models. “At the end of the day, people want that edge in their careers, and what our gen-AI services do is help give them that assist,” says Eleanor Crum, a spokesperson for LinkedIn.

You can opt out of new LinkedIn posts being used for AI training by opening the Settings from your profile. Tap Data Privacy and switch off the toggle labeled Use my data for training content creation AI models.

OpenAI: ChatGPT and DALL-E


People reveal all sorts of personal information while using a chatbot. OpenAI provides some options for what happens to what you say to ChatGPT—including allowing its future AI models not to be trained on the content. “We give users a number of easily accessible ways to control their data, including self-service tools to access, export, and delete personal information through ChatGPT. That includes easily accessible options to opt out from the use of their content to train models,” says Taya Christianson, an OpenAI spokesperson. (The options vary slightly depending on your account type, and data from enterprise customers is not used to train models).



Amazon’s AI for delivery, Microsoft’s healthcare agents, and Writer’s model: This week in new AI launches


EvenUp co-founders (L-R) Raymond Mieszaniec, Rami Karabibar, and Saam Mashhad | Photo: EvenUp

EvenUp, an AI startup focused on personal injury claims and document generation, announced this week that it raised a $135 million Series D funding round, valuing the company at over $1 billion, per a press release. The round was led by Bain Capital Ventures and brings EvenUp’s total funding to $235 million.

The startup’s Claims Intelligence Platform is powered by its AI model called Piai. The model was “trained on hundreds of thousands of injury cases, millions of medical records and visits, and internal legal expertise,” according to the startup.

“At EvenUp, we’re committed to revolutionizing the personal injury sector in the U.S.,” Rami Karabibar, EvenUp co-founder and chief executive, told Quartz. “With our Series D, we’re dedicated to driving further innovation by bringing new products and features to market to strengthen our leadership position in legal-focused generative AI.”

EvenUp is “fully dedicated to supporting our customers by freeing up their time in routine tasks, allowing them to focus more on what truly matters—their clients,” Karabibar said.

The company says over 1,000 law firms have used its platform to claim over $1.5 billion in damages.


