In Silicon Valley, a new kind of proofreader is hard at work.

It doesn’t wield a red pen or squint at comma splices. Instead, this digital detective, born from the same artificial intelligence it’s tasked to police, scans lines of code with inhuman speed, hunting for bugs that even the sharpest human eyes might miss.

Welcome to the world of CriticGPT, OpenAI’s latest creation in the quest for more reliable AI. As tech giants race to integrate AI into everything from customer service to complex data analysis, they’re confronting an uncomfortable truth: Their digital wunderkinds, for all their brilliance, have a troubling tendency to make things up.

Now, AI is being taught to catch its own mistakes. And it’s not just OpenAI in the game. Down the road at Google, engineers are taking a different tack, feeding their AI a diet of curated data in hopes of grounding its flights of fancy in cold, hard facts.

The introduction of more reliable AI tools could have far-reaching implications for commerce across various sectors. For retailers, improved AI accuracy could lead to more precise inventory management and personalized customer recommendations, potentially increasing sales and reducing waste.

eCommerce platforms might benefit from chatbots that provide more accurate product information and customer support, improving user experience and potentially boosting conversion rates.

In the financial sector, more trustworthy AI-generated analyses could enhance risk assessment and trading strategies, leading to better-informed investment decisions.

For manufacturers, AI with enhanced error detection could optimize production processes, reducing defects and improving quality control.

Small businesses could use these more reliable AI tools to compete more effectively with larger corporations, as improved accuracy could level the playing field in areas like market analysis and customer targeting.

However, the adoption of these advanced AI systems might also necessitate investments in technology and training, potentially creating new barriers to entry for smaller players in the market.

Finding Mistakes

CriticGPT is designed to identify mistakes in code generated by ChatGPT, which is built on a large language model (LLM) known for its ability to produce human-like text and code. LLMs are AI systems trained on vast amounts of text data, which enables them to understand and generate fluent language. However, these models sometimes produce errors or “hallucinations”: content that seems plausible but is factually incorrect.

As outlined in OpenAI’s research paper, “LLM Critics Help Catch LLM Bugs,” CriticGPT acts as an AI assistant to the human trainers who review programming code generated by ChatGPT.

“On code containing naturally occurring LLM errors, model-written critiques are preferred over human critiques in 63% of cases, and human evaluation finds that models catch more bugs than human contractors paid for code review,” the paper said.

The model was trained using a novel approach in which human trainers intentionally inserted errors into ChatGPT-generated code and then wrote feedback as if they had discovered these bugs themselves. This method allowed CriticGPT to learn how to effectively identify and critique various coding errors.
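Why deliberately planted bugs are useful for training can be sketched in a few lines: because the trainer knows exactly what was broken and where, every such example comes with a known answer that a critique can be checked against. The record layout below (`TamperedExample`, `bug_line`, `caught_bug`) is a hypothetical illustration, not OpenAI’s actual data format, which the article does not describe.

```python
from dataclasses import dataclass


@dataclass
class TamperedExample:
    """Hypothetical training record: code with a bug a trainer inserted on purpose."""
    original_code: str
    tampered_code: str
    bug_description: str  # the trainer's note on what was broken
    bug_line: int         # 1-based line number of the inserted bug


def caught_bug(example: TamperedExample, flagged_lines: list[int]) -> bool:
    """True if a critique flagged the line where the trainer planted the bug."""
    return example.bug_line in flagged_lines


example = TamperedExample(
    original_code="def mean(xs):\n    return sum(xs) / len(xs)\n",
    tampered_code="def mean(xs):\n    return sum(xs) / (len(xs) - 1)\n",
    bug_description="Divides by len(xs) - 1 instead of len(xs).",
    bug_line=2,
)

print(caught_bug(example, flagged_lines=[2]))  # True: the planted bug was found
print(caught_bug(example, flagged_lines=[1]))  # False: the critique missed it
```

Real critiques are free-form text rather than line numbers, so matching them against the planted bug is harder in practice; the known answer is what makes grading possible at all.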

The development of CriticGPT also involved a new technique called Force Sampling Beam Search (FSBS).

“This method helps CriticGPT write more detailed reviews of code,” Ars Technica reported Thursday (June 27). “It lets the researchers adjust how thorough CriticGPT is in looking for problems while also controlling how often it might make up issues that don’t really exist.”

This balance can be fine-tuned depending on the specific requirements of different AI training tasks.
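The tradeoff Ars Technica describes, thoroughness versus invented issues, can be illustrated with a single tunable knob. The toy sketch below is not Force Sampling Beam Search itself, whose mechanics the report does not spell out. It assumes each candidate critique already has a quality score (a hypothetical `rm_score`, standing in for a reward model) and a count of flagged issues, and picks the candidate that maximizes score plus a per-issue bonus.

```python
from dataclasses import dataclass


@dataclass
class Critique:
    text: str
    rm_score: float  # hypothetical quality score from a reward model
    num_issues: int  # how many problems the critique flags


def pick_critique(candidates: list[Critique], issue_bonus: float) -> Critique:
    """Return the candidate with the highest score plus a per-issue bonus.

    A larger issue_bonus favors longer, more thorough critiques (at the risk
    of more made-up issues); a smaller one favors conservative critiques.
    """
    if not candidates:
        raise ValueError("need at least one candidate critique")
    return max(candidates, key=lambda c: c.rm_score + issue_bonus * c.num_issues)


candidates = [
    Critique("Flags the off-by-one error only.", rm_score=0.9, num_issues=1),
    Critique("Flags the off-by-one error and two style nits.", rm_score=0.7, num_issues=3),
]

print(pick_critique(candidates, issue_bonus=0.0).text)  # Flags the off-by-one error only.
print(pick_critique(candidates, issue_bonus=0.2).text)  # Flags the off-by-one error and two style nits.
```

Turning `issue_bonus` up or down is the kind of adjustment the researchers describe: the same pool of candidates yields a terser or a more exhaustive review depending on the task.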

Google Works to Improve Results

Meanwhile, Google is enhancing its Vertex AI platform, which allows companies to build AI services using Google’s machine learning models.

“After rolling out general availability for Vertex AI’s Grounding with Google Search feature in May, … Google has now announced that customers will also have the option to improve their services’ AI results with specialized third-party datasets,” The Verge reported Thursday.

This approach aims to improve the accuracy of AI-generated information by basing it on trusted, up-to-date sources.
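The basic idea of grounding, retrieving relevant passages from a trusted source and instructing the model to answer only from them, can be sketched without any cloud service. This is not how Vertex AI implements the feature; it is a toy illustration that assumes a small in-memory corpus (`TRUSTED_DOCS`, hypothetical) and a crude keyword-overlap match, where production systems generally rely on search indexes or embeddings.

```python
import re

# Hypothetical "trusted" corpus; in practice this might be a licensed
# third-party dataset or a company's own documents.
TRUSTED_DOCS = {
    "doc-1": "Acme Corp reported quarterly revenue of 120 million dollars.",
    "doc-2": "Acme Corp opened a new warehouse in Ohio this year.",
    "doc-3": "Globex announced a merger with Initech.",
}


def tokenize(text: str) -> set[str]:
    """Lowercase word set, used for a crude keyword-overlap match."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, docs: dict[str, str], k: int = 2) -> list[str]:
    """Return ids of up to k documents sharing at least one word with the query."""
    q = tokenize(query)
    ranked = sorted(docs, key=lambda d: len(q & tokenize(docs[d])), reverse=True)
    return [d for d in ranked[:k] if q & tokenize(docs[d])]


def grounded_prompt(query: str, docs: dict[str, str]) -> str:
    """Build a prompt telling a model to answer only from the cited sources."""
    ids = retrieve(query, docs)
    sources = "\n".join(f"[{d}] {docs[d]}" for d in ids)
    return (
        "Answer using only the sources below and cite their ids. "
        "If they do not contain the answer, say so.\n"
        f"Sources:\n{sources}\n\nQuestion: {query}"
    )


print(grounded_prompt("What was Acme Corp revenue last quarter?", TRUSTED_DOCS))
```

The prompt includes the two Acme passages and leaves out the unrelated Globex one, so the model’s answer can be traced back to specific, checkable sources rather than to whatever it absorbed during training.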

“Google says the service will utilize data from providers like Moody’s, MSCI, Thomson Reuters and ZoomInfo,” with this feature set to be available in “Q3 this year,” The Verge reported.

This integration of third-party datasets could be particularly valuable for businesses in sectors like finance, where access to accurate, up-to-date information is crucial.

Additionally, Google is launching a “high-fidelity mode” for Vertex AI, according to The Verge. This feature allows organizations to use their own corporate datasets to inform AI outputs rather than relying solely on the AI’s pre-existing knowledge base. The mode is “powered by a specialized version of Gemini 1.5 Flash and is available now in preview via Vertex AI’s Experiments tool.”

The development is particularly significant for businesses dealing with sensitive or industry-specific information that may not be well-represented in general AI training data. By allowing companies to ground AI responses in their proprietary data, Google offers a solution that could make AI adoption more appealing to organizations with specialized information needs.

These advancements in AI error reduction and accuracy improvement reflect the tech industry’s response to growing concerns about the reliability of AI-generated content. As AI systems become more integrated into various business processes, from customer service chatbots to code generation and data analysis, ensuring their accuracy and reliability becomes increasingly critical.

For businesses, these new tools could make AI-assisted coding and information retrieval more trustworthy, improving efficiency and decision-making. The ability to ground AI responses in specific, trusted datasets, whether third-party or proprietary, could make AI more useful in sectors such as finance, healthcare and manufacturing by producing more accurate, context-specific insights.

For all PYMNTS AI coverage, subscribe to the daily AI Newsletter.
