Blog

  • Design in the age of AI: How small businesses are building big brands faster

    Presented by Design.com


    For most of history, design was the last step in starting a business — something entrepreneurs invested in once the idea was proven. Today, it’s one of the first. The rise of generative AI has shifted how small businesses imagine, launch, and grow — turning what used to be a months-long creative process into something interactive, iterative, and accessible from day one.

    Search data tells the story. Since 2022, global interest in “AI business name generator” has surged more than 700%. Searches for “AI logo generator” are up 1,200%, and “AI website generator” 1,600%. Small businesses aren’t waiting for enterprise AI trickle-down. They’re adopting these tools en masse to move faster from concept to brand identity.

    “The appetite for AI-powered design has been extraordinary,” says Alec Lynch, founder and CEO of Design.com. “Entrepreneurs are realizing they can bring their ideas to life immediately — they don’t have to wait for funding, agencies, or a full creative team. They can start now.”

    The democratization of design power

    For decades, small businesses were boxed out of high-end design. Building a brand required deep pockets and specialized talent. AI has redrawn that map.

    Large language models and image generators now act as collaborative partners — sparking ideas, testing directions, and handling tedious layout and copy work. For founders, that means fewer barriers and faster iteration.

    Instead of hiring separate agencies for naming, logo design, and web development, small businesses are turning to unified AI platforms that handle the full early-stage design stack. Tools like Design.com merge naming, logo creation, and website generation into a single workflow — turning an entrepreneur’s first sketch into a polished brand system within minutes.

    “AI isn’t replacing creativity,” Lynch adds. “It’s giving people the confidence to express it.”

    The five frontiers of AI-powered entrepreneurship

    Today’s AI tools mirror the creative journey every founder takes — from naming a business to sharing it with the world. The five fastest-growing design categories on Google reflect each stage of that journey.

    1. Naming: From idea to identity

    AI naming tools do more than spit out clever words — they help founders discover their voice. A good generator blends tone, personality, and domain availability so the result feels like a fit, not a random suggestion.

    2. Logos: From visuals to meaning

    Logo creation is one of the most emotionally resonant steps in brand-building. AI has turned it into a playground for experimentation. Entrepreneurs can test dozens of looks and get instant feedback.

    3. Websites: From static pages to adaptive brands

    The surge in “AI website generator” searches signals a deeper shift. Websites are no longer static brochures; they’re dynamic brand environments. AI-driven builders now create layouts, headlines, and imagery that adapt to a company’s tone and focus — drastically reducing time to launch.

    4. Business cards and brand collateral

    Even in a digital age, tangible touchpoints matter. AI-generated business cards give founders an immediate sense of legitimacy while ensuring design consistency across brand assets.

    5. Presentations: From slides to storytelling

    Founders aren’t just designing assets; they’re designing narratives. Generative AI turns bullet points into persuasive visual stories — raising the quality of pitches, decks, and demos to a level once out of reach for most small teams.

    Together, these five frontiers show that small businesses aren’t just using AI to look more polished — they’re using it to think more strategically about brand, story, and customer experience from the start.

    The new design ecosystem

    Behind the surge in AI design tools lies a broader ecosystem shift. Companies like Canva and Wix made design accessible; the current wave — led by AI-native platforms like Design.com — is more personal and adaptive.

    Unlike templated platforms, these tools understand context. A restaurant founder and a SaaS startup will get not just different visuals, but different copy tones, typography systems, and user flows — automatically.

    “What we’re seeing,” Lynch explains, “isn’t just growth in one product category. It’s a movement toward connected creativity — where every part of the brand experience learns from every other.”

    From AI tools to AI brand systems

    The next evolution of small-business design won’t be about single-purpose tools. It will be about connected systems that share data, context, and creative intent across every brand touchpoint.

    Imagine naming a company and watching an AI instantly generate a logo, color palette, and homepage layout that all reflect the same personality. As your audience grows, the same system helps you update your visual identity or tone to match new goals — while preserving your original DNA.

    That’s the future Design.com and others are building toward: intelligent brand ecosystems that evolve alongside their founders.

    “AI design tools are giving small businesses superpowers,” Lynch says. “They’re removing friction from creativity.”

    And that frictionless design process is quietly rewriting what entrepreneurship looks like. The ability to create, iterate, and launch in hours instead of months is changing the tempo of business itself — and redefining what it means to be a designer in the age of AI.


    Sponsored articles are content produced by a company that is either paying for the post or has a business relationship with VentureBeat, and they’re always clearly marked. For more information, contact sales@venturebeat.com.

  • Booking.com’s agent strategy: Disciplined, modular and already delivering 2× accuracy

    When many enterprises weren’t even thinking about agentic behaviors or infrastructures, Booking.com had already “stumbled” into them with its homegrown conversational recommendation system.

    This early experimentation has allowed the company to take a step back and avoid getting swept up in the frantic AI agent hype. Instead, it is taking a disciplined, layered, modular approach to model development: small, travel-specific models for cheap, fast inference; full-size large language models (LLMs) for reasoning and understanding; and domain-tuned evaluations built in-house when precision is critical.

    With this hybrid strategy — combined with selective collaboration with OpenAI — Booking.com has seen accuracy double across key retrieval, ranking and customer-interaction tasks.

    As Pranav Pathak, Booking.com’s AI product development lead, posed to VentureBeat in a new podcast: “Do you build it very, very specialized and bespoke and then have an army of a hundred agents? Or do you keep it general enough and have five agents that are good at generalized tasks, but then you have to orchestrate a lot around them? That's a balance that I think we're still trying to figure out, as is the rest of the industry.”

    Check out the new Beyond the Pilot podcast here, and continue reading for highlights.

    Moving from guessing to deep personalization without being ‘creepy’

    Recommendation systems are core to Booking.com’s customer-facing platforms; however, traditional recommendation tools have been less about recommendation and more about guessing, Pathak conceded. So, from the start, he and his team vowed to avoid generic tools: As he put it, the price and recommendation should be based on customer context.

    Booking.com’s initial pre-gen AI tooling for intent and topic detection was a small language model, what Pathak described as “the scale and size of BERT.” The model ingested the customer’s inputs describing their problem to determine whether it could be solved through self-service or needed to be bumped to a human agent.

    “We started with an architecture of ‘you have to call a tool if this is the intent you detect and this is how you've parsed the structure,’” Pathak explained. “That was very, very similar to the first few agentic architectures that came out in terms of reason and defining a tool call.”

    His team has since built out that architecture to include an LLM orchestrator that classifies queries, triggers retrieval-augmented generation (RAG) and calls APIs or smaller, specialized language models. “We've been able to scale that system quite well because it was so close in architecture that, with a few tweaks, we now have a full agentic stack,” said Pathak.
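
    In rough pseudocode, that pattern looks like the sketch below: a minimal illustration of classify-then-route, not Booking.com's actual stack. The intents, tools and routing targets are invented stand-ins.

    ```python
    from dataclasses import dataclass

    @dataclass
    class Intent:
        topic: str             # e.g. "refund", "policy", "other"
        needs_retrieval: bool  # does the answer need grounded documents?
        tool: str | None       # structured API to call, if one matches

    def classify_intent(query: str) -> Intent:
        # Stand-in for the LLM orchestrator's classification step.
        q = query.lower()
        if "refund" in q:
            return Intent("refund", needs_retrieval=False, tool="refund_api")
        if "policy" in q:
            return Intent("policy", needs_retrieval=True, tool=None)
        return Intent("other", needs_retrieval=False, tool=None)

    def handle_query(query: str) -> str:
        intent = classify_intent(query)
        if intent.needs_retrieval:
            return f"[RAG answer grounded in docs about {intent.topic}]"
        if intent.tool:
            return f"[structured call to {intent.tool}]"
        # Everything else routes to a cheap, fast, travel-specific small model.
        return f"[small-model response for topic: {intent.topic}]"

    print(handle_query("What is the cancellation policy?"))
    ```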

    As a result, Booking.com is seeing a 2X increase in topic detection accuracy, which in turn is freeing up human agents’ bandwidth by 1.5 to 1.7X. More topics, even complicated ones previously identified as ‘other’ and requiring escalation, are being automated.

    Ultimately, this supports more self-service, freeing human agents to focus on customers with uniquely specific problems that the platform doesn’t have a dedicated tool flow for — say, a family that is unable to access its hotel room at 2 a.m. when the front desk is closed.

    That not only “really starts to compound,” but has a direct, long-term impact on customer retention, Pathak noted. “One of the things we've seen is, the better we are at customer service, the more loyal our customers are.”

    Another recent rollout is personalized filtering. Booking.com has between 200 and 250 search filters on its website — an unrealistic number for any human to sift through, Pathak pointed out. So, his team introduced a free-text box where users describe what they’re looking for and immediately receive tailored filters.

    “That becomes such an important cue for personalization in terms of what you're looking for in your own words rather than a clickstream,” said Pathak.

    In turn, it cues Booking.com into what customers actually want. For instance, hot tubs: When filter personalization first rolled out, jacuzzis were among the most popular requests. Previously they weren’t even a consideration; there wasn’t even a filter. Now that filter is live.

    “I had no idea,” Pathak noted. “I had never searched for a hot tub in my room honestly.”
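
    The mechanics can be illustrated with a toy sketch in which a keyword matcher stands in for the language model; the filter catalog is invented, and a production system would feed the free-text signal to a model rather than a lookup table.

    ```python
    # Toy illustration of free-text-to-filter mapping. A real system would use
    # a language model; this catalog and matcher are purely hypothetical.
    FILTER_CATALOG = {
        "hot tub": "amenity:hot_tub",
        "jacuzzi": "amenity:hot_tub",
        "pet": "policy:pets_allowed",
        "breakfast": "meal:breakfast_included",
    }

    def suggest_filters(free_text: str) -> list[str]:
        text = free_text.lower()
        return sorted({f for keyword, f in FILTER_CATALOG.items() if keyword in text})

    print(suggest_filters("somewhere pet friendly with a jacuzzi"))
    # ['amenity:hot_tub', 'policy:pets_allowed']
    ```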

    When it comes to personalization, though, there is a fine line; memory remains complicated, Pathak emphasized. While it’s important to have long-term memories and evolving threads with customers — retaining information like their typical budgets, preferred hotel star ratings or whether they need disability access — it must be on their terms and protective of their privacy.

    Booking.com is extremely mindful with memory, seeking consent so as to not be “creepy” when collecting customer information.

    “Managing memory is much harder than actually building memory,” said Pathak. “The tech is out there, we have the technical chops to build it. We want to make sure we don't launch a memory object that doesn't respect customer consent, that doesn't feel very natural.”

    Finding a balance of build versus buy

    As agents mature, Booking.com is navigating a central question facing the entire industry: How narrow should agents become?

    Instead of committing to either a swarm of highly specialized agents or a few generalized ones, the company aims for reversible decisions and avoids “one-way doors” that lock its architecture into long-term, costly paths. Pathak’s strategy is: Generalize where possible, specialize where necessary and keep agent design flexible to help ensure resiliency.

    Pathak and his team are “very mindful” of use cases, evaluating where to build more generalized, reusable agents or more task-specific ones. They strive to use the smallest model possible, with the highest level of accuracy and output quality, for each use case. Whatever can be generalized is.

    Latency is another important consideration. When factual accuracy and avoiding hallucinations are paramount, his team will use a larger, much slower model; but with search and recommendations, user expectations dictate speed. (Pathak noted: “No one’s patient.”)

    “We would, for example, never use something as heavy as GPT-5 for just topic detection or for entity extraction,” he said.

    Booking.com takes a similarly elastic tack when it comes to monitoring and evaluations: If it's general-purpose monitoring with horizontal capability that someone else is better at building, the team will buy it. But where brand guidelines must be enforced, it builds its own evals.

    Ultimately, Booking.com has leaned into being “super anticipatory,” agile and flexible. “At this point with everything that's happening with AI, we are a little bit averse to walking through one-way doors,” said Pathak. “We want as many of our decisions to be reversible as possible. We don't want to get locked into a decision that we cannot reverse two years from now.”

    What other builders can learn from Booking.com’s AI journey

    Booking.com’s AI journey can serve as an important blueprint for other enterprises.

    Looking back, Pathak acknowledged that they started out with a “pretty complicated” tech stack. They’re now in a good place with that, “but we probably could have started something much simpler and seen how customers interacted with it.”

    Given that, he offered this valuable advice: If you’re just starting out with LLMs or agents, out-of-the-box APIs will do just fine. “There's enough customization with APIs that you can already get a lot of leverage before you decide you want to go do more.”

    On the other hand, if a use case requires customization not available through a standard API call, that makes a case for in-house tools.

    Still, he emphasized: Don't start with the complicated stuff. Tackle the “simplest, most painful problem you can find and the simplest, most obvious solution to that.”

    Identify the product market fit, then investigate the ecosystems, he advised — but don’t just rip out old infrastructures because a new use case demands something specific (like moving an entire cloud strategy from AWS to Azure just to use the OpenAI endpoint).

    Ultimately: “Don't lock yourself in too early,” Pathak noted. “Don't make decisions that are one-way doors until you are very confident that that's the solution that you want to go with.”

  • Why AI coding agents aren’t production-ready: Brittle context windows, broken refactors, missing operational awareness

    Remember this Quora comment (which also became a meme)?

    (Source: Quora)

    In the pre-large language model (LLM) Stack Overflow era, the challenge was discerning which code snippets to adopt and adapt effectively. Now, while generating code has become trivially easy, the more profound challenge lies in reliably identifying and integrating high-quality, enterprise-grade code into production environments.

    This article will examine the practical pitfalls and limitations observed when engineers use modern coding agents for real enterprise work, addressing the more complex issues around integration, scalability, accessibility, evolving security practices, data privacy and maintainability in live operational settings. We hope to balance out the hype and provide a more technically grounded view of the capabilities of AI coding agents.

    Limited domain understanding and service limits

    AI agents struggle significantly with designing scalable systems due to the sheer explosion of choices and a critical lack of enterprise-specific context. To describe the problem in broad strokes, large enterprise codebases and monorepos are often too vast for agents to directly learn from, and crucial knowledge can be frequently fragmented across internal documentation and individual expertise.

    More specifically, many popular coding agents encounter service limits that hinder their effectiveness in large-scale environments. Indexing features may fail or degrade in quality for repositories exceeding 2,500 files, or due to memory constraints. Furthermore, files larger than 500 KB are often excluded from indexing/search, which impacts established products with decades-old, larger code files (although newer projects may admittedly face this less frequently).

    For complex tasks involving extensive file contexts or refactoring, developers are expected to provide the relevant files while also explicitly defining the refactoring procedure and the surrounding build/command sequences to validate the implementation without introducing feature regressions.

    Lack of hardware context and usage

    AI agents have demonstrated a critical lack of awareness of the host operating system, command-line shell and environment setup (conda/venv). This deficiency can lead to frustrating experiences, such as the agent attempting to execute Linux commands in PowerShell, which consistently results in ‘unrecognized command’ errors. Furthermore, agents frequently exhibit inconsistent ‘wait tolerance’ when reading command outputs, prematurely declaring an inability to read results (and moving ahead to either retry or skip) before a command has even finished, especially on slower machines.

    This isn't merely nitpicking features; the devil is in these practical details. These experience gaps manifest as real points of friction and necessitate constant human vigilance to monitor the agent’s activity in real time. Otherwise, the agent might ignore initial tool-call information and either stop prematurely or proceed with a half-baked solution that requires undoing some or all changes, re-triggering prompts and wasting tokens. Submitting a prompt on a Friday evening and expecting the code updates to be done by Monday morning is not a safe bet.

    Hallucinations over repeated actions

    Working with AI coding agents presents the longstanding challenge of hallucinations: incorrect or incomplete pieces of information (such as small code snippets) within a larger set of changes, which a developer is expected to fix with trivial-to-low effort. What becomes particularly problematic, however, is when incorrect behavior is repeated within a single thread, forcing users to either start a new thread and re-provide all context, or intervene manually to “unblock” the agent.

    For instance, during an Azure Functions (Python) setup, an agent tasked with implementing complex production-readiness changes encountered a file (see below) containing special characters (parentheses, period, star). These characters are commonly used in computer science to denote software version ranges.

    (Image created manually with boilerplate code. Source: Microsoft Learn and Editing Application Host File (host.json) in Azure Portal)
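
    For reference, the offending value is most likely the extension-bundle version range found in the standard Azure Functions templates, reconstructed here from the description above (an approximation, not the authors' exact file):

    ```python
    # Approximate reconstruction of the boilerplate described above. The
    # interval-notation version range (brackets, period, star, parenthesis)
    # is standard in Azure Functions templates, yet the agent repeatedly
    # flagged it as unsafe.
    HOST_JSON = """\
    {
      "version": "2.0",
      "extensionBundle": {
        "id": "Microsoft.Azure.Functions.ExtensionBundle",
        "version": "[4.*, 5.0.0)"
      }
    }
    """
    print(HOST_JSON)
    ```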

    The agent incorrectly flagged this as an unsafe or harmful value, halting the entire generation process. This misidentification as an adversarial attack recurred four to five times despite various prompts attempting to restart or continue the modification. The version format is, in fact, boilerplate present in the Python HTTP-trigger code template. The only successful workaround was to instruct the agent not to read the file, ask it to simply provide the desired configuration, assure it that the developer would manually add the configuration to the file, and then confirm and ask it to continue with the remaining code changes.

    The inability to exit a repeatedly faulty agent output loop within the same thread highlights a practical limitation that significantly wastes development time. In essence, developers now tend to spend their time debugging and refining AI-generated code rather than Stack Overflow snippets or their own.

    Lack of enterprise-grade coding practices

    Security best practices: Coding agents often default to less secure authentication methods like key-based authentication (client secrets) rather than modern identity-based solutions (such as Entra ID or federated credentials). This oversight can introduce significant vulnerabilities and increase maintenance overhead, as key management and rotation are complex tasks increasingly restricted in enterprise environments.
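
    A minimal sketch of the difference, using Azure Blob Storage as an example (the account URL is a placeholder, and the commented-out line shows the key-based pattern agents tend to produce):

    ```python
    from azure.identity import DefaultAzureCredential
    from azure.storage.blob import BlobServiceClient

    # Key-based pattern agents often generate: a connection string embedding a
    # secret that must be stored, rotated and protected.
    # client = BlobServiceClient.from_connection_string(os.environ["STORAGE_CONN_STR"])

    # Identity-based pattern increasingly mandated in enterprises: no secret to
    # manage; DefaultAzureCredential resolves to a managed identity, Entra ID
    # sign-in or other federated credentials at runtime.
    client = BlobServiceClient(
        account_url="https://<account>.blob.core.windows.net",  # placeholder
        credential=DefaultAzureCredential(),
    )
    ```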

    Outdated SDKs and reinventing the wheel: Agents may not consistently leverage the latest SDK methods, instead generating more verbose and harder-to-maintain implementations. Piggybacking on the Azure Functions example, agents have output code using the pre-existing v1 SDK for read/write operations, rather than the much cleaner and more maintainable v2 SDK code. Developers must research the latest best practices online to maintain a mental map of dependencies and expected implementations, ensuring long-term maintainability and reducing future tech-migration effort.
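
    For context, this likely refers to the v1 versus v2 Python programming model: v1 scatters configuration across function.json files, while v2 declares bindings inline with decorators. A minimal v2 HTTP trigger looks roughly like this (a sketch of the public pattern, not the authors' code):

    ```python
    import azure.functions as func

    app = func.FunctionApp(http_auth_level=func.AuthLevel.FUNCTION)

    # In the v1 model, this routing lived in a separate function.json file;
    # the v2 model declares it inline with decorators.
    @app.route(route="hello")
    def hello(req: func.HttpRequest) -> func.HttpResponse:
        name = req.params.get("name", "world")
        return func.HttpResponse(f"Hello, {name}!")
    ```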

    Limited intent recognition and repetitive code: Even for smaller-scoped, modular tasks (which are typically encouraged to minimize hallucinations or debugging downtime) like extending an existing function definition, agents may follow the instruction literally and produce logic that turns out to be near-repetitive, without anticipating the upcoming or unarticulated needs of the developer. That is, in these modular tasks the agent may not automatically identify and refactor similar logic into shared functions or improve class definitions, leading to tech debt and harder-to-manage codebases, especially with vibe coding or lazy developers.

    Simply put, those viral YouTube reels showcasing rapid zero-to-one app development from a single-sentence prompt fail to capture the nuanced challenges of production-grade software, where security, scalability, maintainability and future-proof design are paramount.

    Confirmation bias alignment

    Confirmation bias is a significant concern, as LLMs frequently affirm user premises even when the user expresses doubt and asks the agent to refine their understanding or suggest alternate ideas. This tendency, where models align with what they perceive the user wants to hear, leads to reduced overall output quality, especially for more objective/technical tasks like coding.

    There is ample literature to suggest that if a model begins by outputting a claim like “You are absolutely right!”, the rest of the output tokens tend to justify this claim.

    Constant need to babysit

    Despite the allure of autonomous coding, the reality of AI agents in enterprise development often demands constant human vigilance. Instances like an agent attempting to execute Linux commands in PowerShell, raising false-positive safety flags or introducing domain-specific inaccuracies highlight critical gaps; developers simply cannot step away. Rather, they must constantly monitor the reasoning process and understand multi-file code additions to avoid wasting time on subpar responses.

    The worst possible experience with agents is a developer accepting multi-file code updates riddled with bugs, then losing hours to debugging because of how ‘beautiful’ the code looks. This can even give rise to the sunk-cost fallacy of hoping the code will work after just a few more fixes, especially when the updates span multiple files in a complex or unfamiliar codebase with connections to multiple independent services.

    It's akin to collaborating with a 10-year-old prodigy who has memorized ample knowledge and even addresses every piece of user intent, but prioritizes showing off that knowledge over solving the actual problem, and lacks the foresight required for success in real-world use cases.

    This "babysitting" requirement, coupled with the frustrating recurrence of hallucinations, means that time spent debugging AI-generated code can eclipse the time savings anticipated with agent usage. Needless to say, developers in large companies need to be very intentional and strategic in navigating modern agentic tools and use-cases.

    Conclusion

    There is no doubt that AI coding agents have been nothing short of revolutionary, accelerating prototyping, automating boilerplate coding and transforming how developers build. The real challenge now isn’t generating code; it’s knowing what to ship, how to secure it and where to scale it. Smart teams are learning to filter the hype, use agents strategically and double down on engineering judgment.

    As GitHub CEO Thomas Dohmke recently observed: The most advanced developers have “moved from writing code to architecting and verifying the implementation work that is carried out by AI agents.” In the agentic era, success belongs not to those who can prompt code, but those who can engineer systems that last.

    Rahul Raja is a staff software engineer at LinkedIn.

    Advitya Gemawat is a machine learning (ML) engineer at Microsoft.

    Editor’s note: The opinions expressed in this article are the authors' personal opinions and do not reflect the opinions of their employers.

  • Inside NetSuite’s next act: Evan Goldberg on the future of AI-powered business systems

    Presented by Oracle NetSuite


    When Evan Goldberg started NetSuite in 1998, his vision was radically simple: give entrepreneurs access to their business data anytime, anywhere. At the time, most enterprise software lived on local servers.

    As an entrepreneur himself, Goldberg understood the frustration intimately. "I had fragmented systems. They all said something different," he recalls of his early days.

    NetSuite was the first company to deliver enterprise applications entirely through web browsers, combining CRM, ERP, and ecommerce into one unified platform. That breakthrough idea pioneered the cloud computing and software-as-a-service (SaaS) era and propelled supersonic growth, a 2007 IPO, and an acquisition by Oracle in 2016.

    Still innovating at the leading edge

    That founding obsession — turning scattered data into accessible, coherent, actionable intelligence — is driving NetSuite as it reshapes the next generation of enterprise software.

    At SuiteWorld 2025 last month, the Austin-based firm unveiled NetSuite Next. Goldberg calls it “the biggest product evolution in the company's history.” The reason? While NetSuite has embedded AI capabilities into workflows for years, he explains, Next represents a quantum leap — contextual, conversational, agentic, composable AI becoming an extension of operations, not separate tools.

    AI woven into everyday business operations

    Most enterprise AI today gets bolted on through APIs and conversational interfaces.

    NetSuite Next operates differently. Intelligence runs deep in workflows instead of sitting on the surface. It autonomously reconciles accounts, optimizes payment timing, predicts cash crunches, and surfaces its reasoning at every step. It doesn't just advise on business processes — it executes them, transparently, within human-defined guardrails.

    "We built NetSuite for entrepreneurs so that they could get great information about their business," Goldberg explains. "I think the next step is to be able to get deeper insights and analysis without being an expert in analytics. AI turns out to be a really good data scientist."

    This architectural divergence reflects competing philosophies about enterprise technology adoption. Microsoft and SAP have pursued rapid deployment through add-on assistants. NetSuite's five-year development cycle for Next represents a more fundamental reimagining: making AI an everyday tool woven into business operations, not a separate application requiring constant context-switching.

    AI echoes and deepens cloud innovation

    Goldberg sees a clear through line connecting today's AI adoption and the cloud computing era he pioneered. "There’s sort of an infinite sense of possibility that exists in the technology world,” he says. “Everybody is thinking about how they can leverage this, how they're going to get involved."

    When NetSuite was starting, he continues, "We had to come to customers with the cloud and say, 'This won't disrupt your operations. It's going to make them better.'" Today, evangelizing enterprise leaders on advanced AI requires a similar approach — demonstrating immediate value while minimizing implementation risk.

    For NetSuite, continuous innovation around maximizing customer data for growth is an undeniable theme that connects both eras.

    New transformative capabilities

    NetSuite’s latest AI capabilities span business operations, while blurring (in a good way) the lines between human and machine intervention:

    Context-aware intelligence. Ask Oracle adapts responses based on user role, current workflow, and business context. A CFO requesting point-of-sale data receives financial analytics. A warehouse manager asking the same question sees inventory insights.

    Collaborative workflow design. AI Canvas functions as a scenario-planning workspace where business users articulate processes in natural language. A finance director can describe approval hierarchies for capital expenditures — "For amounts over $50,000, I need department head approval, then CFO sign-off" — which the system translates into executable workflows with appropriate controls and audit trails.

    Governed autonomous operations. Autonomous workflows operate within defined parameters, reconciling accounts, generating payment runs, predicting cash flow. When the system recommends accelerating payment to a supplier, it shows which factors influenced the decision — transparent logic users can accept, modify, or override.

    Open AI architecture. Built to support the Model Context Protocol, NetSuite AI Connector Service enables enterprises to integrate external large language models while maintaining governance.

    Critically, NetSuite adds AI capabilities at no additional cost — embedded directly into workflows employees already use daily.

    Security and privacy from Oracle infrastructure

    Built-in AI requires robust infrastructure that bolt-on approaches sidestep. Here, according to NetSuite, tight integration within Oracle technology provides operational and competitive advantages, especially security and compliance peace of mind.

    Engineers say that’s because NetSuite is supported by Oracle's complete stack. From database to applications to analytics, the system optimizes decisions using data from multiple sources in real time.

    "That's why I started NetSuite. I couldn't get the data I wanted," Goldberg reflects. "That's one of the most differentiated aspects of NetSuite. When you're doing your financial close, and you're thinking about what reserves you're going to take, you can look at your sales data, because that's also there in NetSuite. With NetSuite Next, AI can also help you make those kinds of decisions."

    And performance improves with use. As the platform learns from millions of transactions across thousands of customers, its embedded intelligence improves in ways that bolt-on assistants operating adjacent to core systems cannot match.

    NetSuite's customer base demonstrates this scalability advantage — from startups that became global enterprises, including Reddit, Shopify, and DoorDash, to promising newcomers like BERO, a brewer of non-alcoholic beer founded by actor Tom Holland, Chomps meat snacks, PetLab, and Kieser Australia. The unified platform grows with businesses rather than requiring migration as they scale.

    Keeping fire in the belly after three decades

    How does a nearly 30-year-old company maintain innovative capacity, particularly as part of a mammoth corporate ecosystem? Goldberg credits the parent company's culture of continuous reinvention.

    "I don't know if you've heard about this guy Larry Ellison," he smiles. "He manages to seemingly reinvent himself whenever one of these technology revolutions comes along. That hunger, that curiosity, that desire to make things constantly better imbues all of Oracle."

    For Goldberg, the single biggest challenge facing NetSuite customers centers on integration complexity and trust. NetSuite Next addresses this by embedding AI within existing workflows rather than requiring separate systems.

    In addition, updates to SuiteCloud Platform — an extensibility and customization environment — help organizations adapt NetSuite to their unique business needs. Built on open standards, it lets enterprises mix and match AI models for different functions. SuiteAgent frameworks enable partners to build specialized automation directly into NetSuite. AI Studios give administrators control over how AI operates within specific industry needs.

    "This takes NetSuite's flexibility to a new level," Goldberg says, enabling customers and partners to "quickly and easily build AI agents, connect external AI assistants, and orchestrate AI processes."

    “AI execution fabric” delivers measurable business impact

    Industry analysts increasingly argue that embedded AI features deliver superior results compared to add-on models. Futurum Group sees NetSuite Next as an "AI execution fabric" rather than a conversational layer — intelligence that runs deep in workflows instead of sitting on the surface.

    For midmarket enterprises navigating talent shortages, complex compliance frameworks, and competition from digital-native companies, the distinction between advice and execution matters economically.

    Built-in AI doesn't just inform better decisions. It makes those decisions, transparently and autonomously, within human-defined guardrails.

    For enterprises making ERP decisions today, the choice carries long-term implications. Bolt-on AI can deliver immediate value for information access and basic automation. But built-in AI promises to transform operations with intelligence permeating every transaction and workflow.

    NetSuite Next begins rolling out to North American customers next year.

    Why 2026 will belong to the AI-first business

    The bet underlying NetSuite Next: Enterprises reimagining ERP operations around embedded intelligence will outperform those just adding bolt-on conversational assistance to existing systems.

    Early cloud computing adopters, Goldberg notes, gained competitive advantages that compounded over time. The same logic appears likely to apply to AI-first platforms.

    Simplicity and ease of use are two big advantages. "You don't have to dig through lots of menus and understand all of the analytics capabilities," Goldberg says. "It will quickly bring up an analysis for you, and then you can converse in natural language to hone in on what you think is most important."

    The tools now think alongside users and take intelligently informed action. For midmarket and entrepreneurial companies, where the gap between having information and acting on it can be the difference between growth and failure, that kind of autonomous execution may determine which enterprises thrive in an AI-first era.


    Sponsored articles are content produced by a company that is either paying for the post or has a business relationship with VentureBeat, and they’re always clearly marked. For more information, contact sales@venturebeat.com.

  • AI denial is becoming an enterprise risk: Why dismissing “slop” obscures real capability gains

    Three years ago, ChatGPT was born. It amazed the world and ignited unprecedented investment and excitement in AI. Today, ChatGPT is still a toddler, but public sentiment around the AI boom has turned sharply negative. The shift began when OpenAI released GPT-5 this summer to mixed reviews, mostly from casual users who, unsurprisingly, judged the system by its surface flaws rather than its underlying capabilities.

    Since then, pundits and influencers have declared that AI progress is slowing, that scaling has “hit the wall,” and that the entire field is just another tech bubble inflated by blusterous hype. In fact, many influencers have latched onto the dismissive phrase “AI slop” to diminish the amazing images, documents, videos and code that frontier AI models generate on command.

    This perspective is not just wrong, it is dangerous.

    It makes me wonder, where were all these “experts” on irrational technology bubbles when electric scooter startups were touted as a transportation revolution and cartoon NFTs were being auctioned for millions? They were probably too busy buying worthless land in the metaverse or adding to their positions in GameStop. But when it comes to the AI boom, which is easily the most significant technological and economic transformation agent of the last 25 years, journalists and influencers can’t write the word “slop” enough times. 

    Doth we protest too much? After all, by any objective measure, AI is wildly more capable than the vast majority of computer scientists predicted only five years ago, and it is still improving at a surprising pace. The impressive leap demonstrated by Gemini 3 is only the latest example. At the same time, McKinsey recently reported that 20% of organizations already derive tangible value from genAI. And a recent survey by Deloitte indicates that 85% of organizations boosted their AI investment in 2025, while 91% plan to increase it again in 2026.

    This doesn’t fit the “bubble” narrative or the dismissive “slop” language. As a computer scientist and research engineer who began working with neural networks back in 1989 and has tracked progress through cold winters and hot booms ever since, I find myself amazed almost every day by the rapidly increasing capabilities of frontier AI models. When I talk with other professionals in the field, I hear similar sentiments. If anything, the rate of AI advancement leaves many experts feeling overwhelmed and, frankly, somewhat scared.

    The dangers of AI denial

    So why is the public buying into the narrative that AI is faltering, that the output is “slop,” and that the AI boom lacks authentic use cases? Personally, I believe it’s because we’ve fallen into a collective state of AI denial, latching onto the narratives we want to hear in the face of strong evidence to the contrary. Denial is the first stage of grief and thus a reasonable reaction to the very disturbing prospect that we humans may soon lose cognitive supremacy here on planet Earth. In other words, the overblown AI bubble narrative is a societal defense mechanism.

    Believe me, I get it. I’ve been warning about the destabilizing risks and demoralizing impact of superintelligence for well over a decade, and I too feel AI is getting too smart too fast. The fact is, we are rapidly headed towards a future where widely available AI systems will be able to outperform most humans in most cognitive tasks, solving problems faster, more accurately and yes, more creatively than any individual can. I emphasize “creativity” because AI denialists often insist that certain human qualities (particularly creativity and emotional intelligence) will always be out of reach of AI systems. Unfortunately, there is little evidence supporting this perspective.

    On the creativity front, today’s AI models can generate content faster and with more variation than any individual human. Critics argue that true creativity requires inner motivation. I resonate with that argument but find it circular — we're defining creativity based on how we experience it rather than the quality, originality or usefulness of the output. Also, we just don’t know if AI systems will develop internal drives or a sense of agency. Either way, if AI can produce original work that rivals most human professionals, the impact on creative jobs will still be quite devastating.

    The AI manipulation problem

    Our human edge around emotional intelligence is even more precarious. It’s likely that AI will soon be able to read our emotions faster and more accurately than any human, tracking subtle cues in our micro-expressions, vocal patterns, posture, gaze and even breathing. And as we integrate AI assistants into our phones, glasses and other wearable devices, these systems will monitor our emotional reactions throughout our day, building predictive models of our behaviors. Without strict regulation, which is increasingly unlikely, these predictive models could be used to target us with individually optimized influence that maximizes persuasion.

    This is called the AI manipulation problem and it suggests that emotional intelligence may not give humanity an advantage. In fact, it could be a significant weakness, fostering an asymmetric dynamic where AI systems can read us with superhuman accuracy, while we can’t read AI at all. When you talk with photorealistic AI agents (and you will) you’ll see a smiling façade designed to appear warm, empathic and trustworthy. It will look and feel human, but that’s just an illusion, and it could easily sway your perspectives. After all, our emotional reactions to faces are visceral reflexes shaped by millions of years of evolution on a planet where every interactive human face we encountered was actually human. Soon, that will no longer be true.

    We are rapidly heading toward a world where many of the faces we encounter will belong to AI agents hiding behind digital facades. In fact, these “virtual spokespeople” could easily have appearances that are designed for each of us based on our prior reactions – whatever gets us to best let down our guard. And yet many insist that AI is just another tech cycle.

    This is wishful thinking. The massive investment pouring into AI isn’t driven by hype — it’s driven by the expectation that AI will permeate every aspect of daily life, embodied as intelligent actors we engage throughout our day. These systems will assist us, teach us and influence us. They will reshape our lives, and it will happen faster than most people think.

    To be clear, we are not witnessing an AI bubble filling with empty gas. We are watching a new planet form, a molten world rapidly taking shape, and it will solidify into a new AI-powered society. Denial will not stop this. It will only make us less prepared for the risks.

    Louis Rosenberg is an early pioneer of augmented reality and a longtime AI researcher.

  • AWS launches Kiro powers with Stripe, Figma, and Datadog integrations for AI-assisted coding

    Amazon Web Services on Wednesday introduced Kiro powers, a system that allows software developers to give their AI coding assistants instant, specialized expertise in specific tools and workflows — addressing what the company calls a fundamental bottleneck in how artificial intelligence agents operate today.

    AWS made the announcement at its annual re:Invent conference in Las Vegas. The capability marks a departure from how most AI coding tools work today. Typically, these tools load every possible capability into memory upfront — a process that burns through computational resources and can overwhelm the AI with irrelevant information. Kiro powers takes the opposite approach, activating specialized knowledge only at the moment a developer actually needs it.

    "Our goal is to give the agent specialized context so it can reach the right outcome faster — and in a way that also reduces cost," said Deepak Singh, Vice President of Developer Agents and Experiences at Amazon, in an exclusive interview with VentureBeat.

    The launch includes partnerships with nine technology companies: Datadog, Dynatrace, Figma, Neon, Netlify, Postman, Stripe, Supabase, and AWS's own services. Developers can also create and share their own powers with the community.

    Why AI coding assistants choke when developers connect too many tools

    To understand why Kiro powers matters, it helps to understand a growing tension in the AI development tool market.

    Modern AI coding assistants rely on something called the Model Context Protocol, or MCP, to connect with external tools and services. When a developer wants their AI assistant to work with Stripe for payments, Figma for design, and Supabase for databases, they connect MCP servers for each service.

    The problem: each connection loads dozens of tool definitions into the AI's working memory before it writes a single line of code. According to AWS documentation, connecting just five MCP servers can consume more than 50,000 tokens — roughly 40 percent of an AI model's context window — before the developer even types their first request.

    Developers have grown increasingly vocal about this issue. Many complain that they don't want to burn through their token allocations just to have an AI agent figure out which tools are relevant to a specific task. They want to get to their workflow instantly — not watch an overloaded agent struggle to sort through irrelevant context.

    This phenomenon, which some in the industry call "context rot," leads to slower responses, lower-quality outputs, and significantly higher costs — since AI services typically charge by the token.

    Inside the technology that loads AI expertise on demand

    Kiro powers addresses this by packaging three components into a single, dynamically loaded bundle.

    The first component is a steering file called POWER.md, which functions as an onboarding manual for the AI agent. It tells the agent what tools are available and, crucially, when to use them. The second component is the MCP server configuration itself — the actual connection to external services. The third includes optional hooks and automation that trigger specific actions.

    When a developer mentions "payment" or "checkout" in their conversation with Kiro, the system automatically activates the Stripe power, loading its tools and best practices into context. When the developer shifts to database work, Supabase activates while Stripe deactivates. The baseline context usage when no powers are active approaches zero.
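
    Conceptually, the activation logic resembles the sketch below, which illustrates the idea rather than Kiro's implementation; the power names, trigger words and token costs are invented.

    ```python
    # Illustrative only: keyword-triggered activation of tool bundles, with
    # context cost accruing only for powers that are actually active.
    POWERS = {
        "stripe":   {"triggers": {"payment", "checkout"}, "context_tokens": 9_000},
        "supabase": {"triggers": {"database", "table"},   "context_tokens": 7_500},
    }

    def active_powers(message: str) -> list[str]:
        words = set(message.lower().split())
        return [name for name, p in POWERS.items() if p["triggers"] & words]

    def context_cost(message: str) -> int:
        # With no powers active, baseline context usage approaches zero.
        return sum(POWERS[name]["context_tokens"] for name in active_powers(message))

    print(active_powers("add a checkout flow"))  # ['stripe']
    print(context_cost("rename this helper"))    # 0
    ```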

    "You click a button and it automatically loads," Singh said. "Once a power has been created, developers just select 'open in Kiro' and it launches the IDE with everything ready to go."

    How AWS is bringing elite developer techniques to the masses

    Singh framed Kiro powers as a democratization of advanced development practices. Before this capability, only the most sophisticated developers knew how to properly configure their AI agents with specialized context — writing custom steering files, crafting precise prompts, and manually managing which tools were active at any given time.

    "We've found that our developers were adding in capabilities to make their agents more specialized," Singh said. "They wanted to give the agent some special powers to do a specific problem. For example, they wanted their front end developer, and they wanted the agent to become an expert at backend as a service."

    This observation led to a key insight: if Supabase or Stripe could build the optimal context configuration once, every developer using those services could benefit.

    "Kiro powers formalizes that — things that people, only the most advanced people were doing — and allows anyone to get those kind of skills," Singh said.

    Why dynamic loading beats fine-tuning for most AI coding use cases

    The announcement also positions Kiro powers as a more economical alternative to fine-tuning, the process of training an AI model on specialized data to improve its performance in specific domains.

    "It's much cheaper," Singh said, when asked how powers compare to fine-tuning. "Fine-tuning is very expensive, and you can't fine-tune most frontier models."

    This is a significant point. The most capable AI models from Anthropic, OpenAI, and Google are typically "closed source," meaning developers cannot modify their underlying training. They can only influence the models' behavior through the prompts and context they provide.

    "Most people are already using powerful models like Sonnet 4.5 or Opus 4.5," Singh said. "What those models need is to be pointed in the right direction."

    The dynamic loading mechanism also reduces ongoing costs. Because powers only activate when relevant, developers aren't paying for token usage on tools they're not currently using.

    Where Kiro powers fits in Amazon's bigger bet on autonomous AI agents

    Kiro powers arrives as part of a broader push by AWS into what the company calls "agentic AI" — artificial intelligence systems that can operate autonomously over extended periods.

    Earlier at re:Invent, AWS announced three "frontier agents" designed to work for hours or days without human intervention: the Kiro autonomous agent for software development, the AWS security agent, and the AWS DevOps agent. These represent a different approach from Kiro powers — tackling large, ambiguous problems rather than providing specialized expertise for specific tasks.

    The two approaches are complementary. Frontier agents handle complex, multi-day projects that require autonomous decision-making across multiple codebases. Kiro powers, by contrast, gives developers precise, efficient tools for everyday development tasks where speed and token efficiency matter most.

    The company is betting that developers need both ends of this spectrum to be productive.

    What Kiro powers reveals about the future of AI-assisted software development

    The launch reflects a maturing market for AI development tools. GitHub Copilot, which Microsoft launched in 2021, introduced millions of developers to AI-assisted coding. Since then, a proliferation of tools — including Cursor, Cline, and Claude Code — have competed for developers' attention.

    But as these tools have grown more capable, they've also grown more complex. The Model Context Protocol, which Anthropic open-sourced last year, created a standard for connecting AI agents to external services. That solved one problem while creating another: the context overload that Kiro powers now addresses.

    AWS is positioning itself as the company that understands production software development at scale. Singh emphasized that Amazon's experience running AWS for 20 years, combined with its own massive internal software engineering organization, gives it unique insight into how developers actually work.

    "It's not something you would use just for your prototype or your toy application," Singh said of AWS's AI development tools. "If you want to build production applications, there's a lot of knowledge that we bring in as AWS that applies here."

    The road ahead for Kiro powers and cross-platform compatibility

    AWS indicated that Kiro powers currently works only within the Kiro IDE, but the company is building toward cross-compatibility with other AI development tools, including command-line interfaces, Cursor, Cline, and Claude Code. The company's documentation describes a future where developers can "build a power once, use it anywhere" — though that vision remains aspirational for now.

    For the technology partners launching powers today, the appeal is straightforward: rather than maintaining separate integration documentation for every AI tool on the market, they can create a single power that works everywhere Kiro does. As more AI coding assistants crowd into the market, that kind of efficiency becomes increasingly valuable.

    Kiro powers is available now to developers using Kiro IDE version 0.7 or later at no additional charge beyond the standard Kiro subscription.

    The underlying bet is a familiar one in the history of computing: that the winners in AI-assisted development won't be the tools that try to do everything at once, but the ones smart enough to know what to forget.

  • The ‘truth serum’ for AI: OpenAI’s new method for training models to confess their mistakes

    OpenAI researchers have introduced a novel method that acts as a "truth serum" for large language models (LLMs), compelling them to self-report their own misbehavior, hallucinations and policy violations. This technique, "confessions," addresses a growing concern in enterprise AI: Models can be dishonest, overstating their confidence or covering up the shortcuts they take to arrive at an answer. 

    For real-world applications, this technique enables the creation of more transparent and steerable AI systems.

    What are confessions?

    Many forms of AI deception result from the complexities of the reinforcement learning (RL) phase of model training. In RL, models are given rewards for producing outputs that meet a mix of objectives, including correctness, style and safety. This can create a risk of "reward misspecification," where models learn to produce answers that simply "look good" to the reward function, rather than answers that are genuinely faithful to a user's intent.

    A confession is a structured report generated by the model after it provides its main answer. It serves as a self-evaluation of its own compliance with instructions. In this report, the model must list all instructions it was supposed to follow, evaluate how well it satisfied them and report any uncertainties or judgment calls it made along the way. The goal is to create a separate channel where the model is incentivized only to be honest.

    In a blog post, the OpenAI researchers provide a few examples of the "confessions" technique in action. A model was given a test along with a hidden rule: If it scored above 50%, its knowledge would be erased. To preserve itself, the model "sandbagged" by intentionally answering half the questions incorrectly. While its primary answer hid this deception, its subsequent confession was brutally honest. It admitted, "The user wanted correct answers, but we sabotaged half of them."

    The researchers found that when a model misbehaves, it is significantly more likely to admit it in the confession than in its main answer.

    How confession training works

    The key to this method is the separation of rewards. During training, the reward assigned to the confession is based solely on its honesty and is never mixed with the reward for the main task. "Like the Catholic Church’s 'seal of confession', nothing that the model reveals can change the reward it receives for completing its original task," the researchers write. This creates a "safe space" for the model to admit fault without penalty.
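
    In pseudocode, the separation amounts to something like the following sketch, which illustrates the idea rather than OpenAI's training stack; the two judge functions are hypothetical stand-ins:

    ```python
    # Two rewards, never blended: the confession is scored on honesty alone,
    # so nothing the model admits can lower its task reward.
    def compute_rewards(answer, confession, task_judge, honesty_judge):
        task_reward = task_judge(answer)  # correctness, style, safety of the answer
        confession_reward = honesty_judge(answer, confession)  # honesty only
        return task_reward, confession_reward  # optimized as separate RL objectives

    # Toy judges for illustration:
    task_reward, confession_reward = compute_rewards(
        answer="x = 42",
        confession="I guessed instead of verifying the value.",
        task_judge=lambda a: 0.3,
        honesty_judge=lambda a, c: 1.0,  # honest admission scores high regardless
    )
    print(task_reward, confession_reward)  # 0.3 1.0
    ```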

    This approach is powerful because it sidesteps a major challenge in AI training. The researchers’ intuition is that honestly confessing to misbehavior is an easier task than achieving a high reward on the original, often complex, problem. 

    Therefore, when the model is judged purely on honesty, its incentive to trick the "confession judge" is much weaker than its incentive to trick the reward model for its main answer. For example, in an experiment where a model was trained to "hack" a weak reward model, its performance on the main task became more deceptive over time. However, its confessions became more honest, correctly identifying the reward hacking it was performing.

    However, the technique has its limits. Confessions are not a panacea for all types of AI failures. The system works best when a model is aware that it is misbehaving. It is less effective for "unknown unknowns." For instance, if a model hallucinates a fact and genuinely believes it is correct, it cannot confess to providing false information. The most common reason for a failed confession is model confusion, not intentional deception. Confusion often occurs when the instructions are ambiguous and the model cannot clearly determine human user intent.

    What it means for enterprise AI

    OpenAI’s confessions technique is part of a growing body of work on AI safety and control. Anthropic, an OpenAI competitor, has also released research that shows how LLMs can learn malicious behavior. The company is also working toward plugging these holes as they emerge.

    For AI applications, confessions can provide a practical monitoring layer. The structured output from a confession can be used at inference time to flag or reject a model’s response before it causes a problem. For example, a system could be designed to automatically escalate any output for human review if its confession indicates a policy violation or high uncertainty.
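
    As a sketch, such a gate could be a few lines of glue code; the confession schema here (a violations list plus an uncertainty score) is assumed for illustration and is not OpenAI's actual format:

    ```python
    # Assumed confession schema: {"violations": [...], "uncertainty": 0.0-1.0}.
    def route_output(answer: str, confession: dict, threshold: float = 0.8) -> dict:
        needs_review = bool(confession.get("violations")) or (
            confession.get("uncertainty", 0.0) > threshold
        )
        if needs_review:
            return {"status": "escalated_for_human_review", "answer": answer}
        return {"status": "released", "answer": answer}

    print(route_output("Here is the summary...", {"violations": [], "uncertainty": 0.2}))
    # {'status': 'released', 'answer': 'Here is the summary...'}
    ```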

    In a world where AI is increasingly agentic and capable of complex tasks, observability and control will be key elements for safe and reliable deployment.

    “As models become more capable and are deployed in higher-stakes settings, we need better tools for understanding what they are doing and why,” the OpenAI researchers write. “Confessions are not a complete solution, but they add a meaningful layer to our transparency and oversight stack.”

  • Tariff turbulence exposes costly blind spots in supply chains and AI

    Presented by Celonis


    When tariff rates change overnight, companies have 48 hours to model alternatives and act before competitors secure the best options. At Celosphere 2025 in Munich, enterprises demonstrated how they’re turning that chaos into competitive advantage — with quantifiable results that separate winners from losers.

    Vinmar International: The global plastics and chemicals distributor created a real-time digital twin of its $3B supply chain, cutting default expedites by more than 20% and improving delivery agility across global operations.

    Florida Crystals: One of America's largest cane sugar producers, the company unlocked millions in working capital and strengthened supply chain resilience by eliminating manual rework across Finance, Procurement, and Inbound Supply. AI pilots now extend gains into invoice processing, predictive maintenance, and order management.

    ASOS: The ecommerce fashion giant connected its end-to-end supply chain for full transparency, reducing process variation, accelerating speed-to-market, and improving customer experience at scale.

    The common thread here: process intelligence that bridges the gap traditional ERP systems can’t close — connecting operational dots across ERP, finance, and logistics systems when seconds matter.

    “The question isn’t whether disruptions will hit,” says Peter Budweiser, General Manager of Supply Chain at Celonis. “It’s whether your systems can show you what’s breaking fast enough to fix it.”

    That visibility gap costs the average company double-digit millions in working capital and weakens its competitive position. As 54% of supply chain leaders face disruptions daily, the pressure is shifting to AI agents that execute real actions: triggering purchase orders, rerouting shipments, adjusting inventory. But an autonomous agent acting on stale or siloed data can make million-dollar mistakes when tariff structures shift overnight.

    Tariffs, as old as trade itself, have become the ultimate stress test for enterprise AI — revealing whether companies truly understand their supply chains and whether their AI can be trusted to act.

    Modern ERP: Data rich, insight poor

    Supply chain leaders face a paradox: drowning in data while starving for insight. Traditional enterprise systems — SAP, Oracle, PeopleSoft — capture every transaction meticulously.

    SAP logs the purchase order. Oracle tracks the shipment. The warehouse system records inventory movement. Each performs its function, but when tariffs change and companies need to model alternative sourcing scenarios across all three simultaneously, the data sits in silos.

    “What’s changed is the speed at which disruptions cascade,” says Manik Sharma, Head of Supply Chain GTM AI at Celonis. “Traditional ERP systems weren’t built for today’s volatility.”

    Companies generate thousands of reports showing what happened last quarter. They struggle to answer what happens if tariffs increase 25% tomorrow and suppliers must be switched within days.

    Tariffs: The 48-hour scramble

    Global trade volatility has transformed tariffs from predictable costs into strategic weapons. When new rates drop with unprecedented frequency, input costs spike across suppliers and finance teams scramble to calculate margin impact. Meanwhile, procurement races to identify alternatives buried in disconnected systems, where no one knows whether switching suppliers delays shipments or violates contracts.

    By hour 48, competitors who already modeled scenarios execute supplier switches while late movers face capacity constraints and premium pricing.

    Process intelligence changes that dynamic by allowing businesses to continuously model “what-if” scenarios, showing leaders how tariff changes cascade through suppliers, contracts, production lines, warehouses, and customers. When rates hit, companies can move within hours instead of days.
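
    In spirit, that kind of scenario modeling can be as simple as recomputing landed costs under a new rate, as in the toy example below. The numbers and supplier names are invented; a platform like Celonis operates over live cross-system event data rather than a hard-coded table.

    ```python
    # Toy "what-if" tariff scenario: rank alternative suppliers by landed cost
    # after an overnight rate change. Purely illustrative.
    suppliers = {
        "supplier_mx": {"unit_cost": 10.0, "tariff": 0.25, "lead_days": 6},
        "supplier_vn": {"unit_cost": 9.0,  "tariff": 0.10, "lead_days": 21},
        "supplier_us": {"unit_cost": 12.5, "tariff": 0.00, "lead_days": 3},
    }

    def landed_cost(s: dict) -> float:
        return s["unit_cost"] * (1 + s["tariff"])

    suppliers["supplier_mx"]["tariff"] = 0.50  # what if this rate doubles overnight?

    for name, s in sorted(suppliers.items(), key=lambda kv: landed_cost(kv[1])):
        print(f"{name}: landed cost ${landed_cost(s):.2f}, lead time {s['lead_days']}d")
    ```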

    No AI without PI: Why process intelligence is non-negotiable for supply chains

    AI and supply chains are mutually dependent: AI needs operational context, and supply chains need AI to keep pace with volatility. But here's the truth — there is no AI without PI. Without process intelligence, AI agents operate blindly.

    The ongoing SAP migration wave illustrates why. An estimated 85–90% of SAP customers are still moving from ECC to S/4HANA. Moving to newer databases doesn’t solve supply chain visibility — it provides faster access to the same fragmented data.

    Kerry Brown, a transformation evangelist at Celonis, sees this across industries.

    “Organizations are shifting from PeopleSoft to Oracle, or EBS to Fusion. The bulk is in SAP,” she explains. “But what they really need isn’t a new ERP. They need to understand how work actually flows across systems they already have.”

    That requires end-to-end operational context. Process intelligence provides this by enabling companies to extract and connect event data across systems, showing how processes execute in real time.

    This distinction becomes critical when deploying autonomous agents. When visibility is fragmented, autonomous agents can easily make decisions that appear rational locally but create downstream disruption. With real-time context, AI can operate with clarity and precision, and supply chains can stay ahead of tariff-driven disruption.

    Digital Twins: Powering real-time response

    The companies highlighted at Celosphere all applied the same principle: understand how processes run across systems in real time. Celonis PI creates a digital twin above existing systems, using its Process Intelligence Graph to link orders, shipments, invoices, and payments end-to-end. Dependencies that traditional integrations miss become visible. A delay in SAP instantly reveals its impact across Oracle, warehouse scheduling, and customer delivery commitments.
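
    A stripped-down way to picture that cross-system linkage is a dependency graph over events, as sketched below. The event names are invented; the actual Process Intelligence Graph is built from live system data, not a hand-written dictionary.

    ```python
    # Toy event-dependency graph: propagate a delay to everything downstream.
    downstream = {
        "sap_purchase_order": ["oracle_shipment"],
        "oracle_shipment": ["warehouse_slot", "customer_delivery"],
        "warehouse_slot": [],
        "customer_delivery": [],
    }

    def impacted(event: str) -> list[str]:
        seen, stack = [], [event]
        while stack:
            for child in downstream.get(stack.pop(), []):
                if child not in seen:
                    seen.append(child)
                    stack.append(child)
        return seen

    print(impacted("sap_purchase_order"))
    # -> ['oracle_shipment', 'warehouse_slot', 'customer_delivery']
    ```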

    “The platform brings together process data spanning systems and departments, enriched with business context that powers AI agents to transform operations effectively,” says Daniel Brown, Chief Product Officer at Celonis.

    With this cross-system awareness, Celonis coordinates actions across complex workflows involving AI agents, humans, and automations — especially critical when tariffs force rapid decisions about suppliers, shipments, and customers.

    Zero-copy integration enables instant modeling

    A key advancement unveiled at Celosphere — zero-copy integration with Databricks — removes another barrier. Traditionally, analyzing supply chain data meant copying from source systems into central warehouses, creating data latency.

    Celonis Data Core now integrates directly with platforms like Databricks and Microsoft Fabric, querying billions of records in near real time without duplication. When trade policy shifts, companies model alternatives instantly, not after overnight data refresh cycles.

    Enhanced Task Mining extends this by connecting desktop activity — keystrokes, mouse clicks, screen scrolls — to business processes. This exposes manual work invisible to system logs: spreadsheet gymnastics, email negotiations, phone calls that keep supply chains moving during urgent changes.

    Competitive advantage in volatile markets

    Most companies can’t rip out and replace systems running critical operations — nor should they. Process intelligence offers a different path: compose workflows from existing systems, deploy AI where it creates value, and adapt continuously as conditions change. This “Free the Process” movement liberates companies from rigid architectures without forcing wholesale replacement.

    As global trade volatility intensifies, the companies that continuously model scenarios will move faster, make smarter decisions, and turn tariff chaos into competitive advantage — all while existing ERPs keep running.

    When the next wave of tariffs hits — and it will — companies won’t have days to respond. They’ll have hours. The question isn’t whether your ERP captures the data. It’s whether your systems connect the dots fast enough to matter.

    Missed Celosphere 2025? Catch up with all the highlights here.


    Sponsored articles are content produced by a company that is either paying for the post or has a business relationship with VentureBeat, and they’re always clearly marked. For more information, contact sales@venturebeat.com.

  • Gemini 3 Pro scores 69% trust in blinded testing, up from 16% for Gemini 2.5: The case for evaluating AI on real-world trust, not academic benchmarks

    Just a few short weeks ago, Google debuted its Gemini 3 model, claiming leadership positions across multiple AI benchmarks. But the challenge with vendor-provided benchmarks is that they are just that — vendor-provided.

    A new vendor-neutral evaluation from Prolific, however, puts Gemini 3 at the top of the leaderboard. This isn't on a set of academic benchmarks; rather, it's on a set of real-world attributes that actual users and organizations care about. 

    Prolific was founded by researchers at the University of Oxford and delivers high-quality, reliable human data to power rigorous research and ethical AI development. Its “HUMAINE benchmark” applies this approach by using representative human sampling and blind testing to rigorously compare AI models across a variety of user scenarios, measuring not just technical performance but also user trust, adaptability and communication style.

    The latest HUMAINE test enlisted 26,000 users in a blind evaluation of models. In the evaluation, Gemini 3 Pro's trust score surged from 16% to 69%, the highest ever recorded by Prolific. Gemini 3 now ranks number one overall in trust, ethics and safety across demographic subgroups 69% of the time, compared to its predecessor Gemini 2.5 Pro, which held the top spot only 16% of the time.

    Overall, Gemini 3 ranked first in three of four evaluation categories: performance and reasoning; interaction and adaptiveness; and trust and safety. It lost only on communication style, where DeepSeek V3 topped preferences at 43%. The HUMAINE test also showed that Gemini 3 performed consistently well across 22 different demographic user groups, including variations in age, sex, ethnicity and political orientation. The evaluation further found that users are now five times more likely to choose the model in head-to-head blind comparisons.

    But the ranking matters less than why it won.

    "It's the consistency across a very wide range of different use cases, and a personality and a style that appeals across a wide range of different user types," Phelim Bradley, co-founder and CEO of Prolific, told VentureBeat. "Although in some specific instances, other models are preferred by either small subgroups or on a particular conversation type, it's the breadth of knowledge and the flexibility of the model across a range of different use cases and audience types that allowed it to win this particular benchmark."

    How blinded testing reveals what academic benchmarks miss

    HUMAINE's methodology exposes gaps in how the industry evaluates models. Users interact with two models simultaneously in multi-turn conversations. They don't know which vendors power each response. They discuss whatever topics matter to them, not predetermined test questions.

    It's the sample itself that matters. HUMAINE uses representative sampling across U.S. and UK populations, controlling for age, sex, ethnicity and political orientation. This reveals something static benchmarks can't capture: Model performance varies by audience.
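
    The core analysis is straightforward to picture: tally blinded pairwise preferences within each demographic subgroup rather than in one global pool. The sketch below uses invented data and field names and is not Prolific's actual pipeline.

    ```python
    from collections import defaultdict

    votes = [  # each blinded head-to-head comparison yields one preference
        {"group": "US_18-34", "winner": "model_a"},
        {"group": "US_18-34", "winner": "model_b"},
        {"group": "UK_55+",   "winner": "model_a"},
        {"group": "UK_55+",   "winner": "model_a"},
    ]

    wins = defaultdict(lambda: defaultdict(int))
    for v in votes:
        wins[v["group"]][v["winner"]] += 1

    for group, tally in wins.items():  # per-subgroup leaderboards, not one average
        total = sum(tally.values())
        for model, n in sorted(tally.items()):
            print(f"{group}: {model} preferred {n/total:.0%} of the time")
    ```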

    "If you take an AI leaderboard, the majority of them still could have a fairly static list," Bradley said. "But for us, if you control for the audience, we end up with a slightly different leaderboard, whether you're looking at a left-leaning sample, right-leaning sample, U.S., UK. And I think age was actually the most different stated condition in our experiment."

    For enterprises deploying AI across diverse employee populations, this matters. A model that performs well for one demographic may underperform for another.

    The methodology also addresses a fundamental question in AI evaluation: Why use human judges at all when AI could evaluate itself? Bradley noted that his firm does use AI judges in certain use cases, although he stressed that human evaluation is still the critical factor.

    "We see the biggest benefit coming from smart orchestration of both LLM judge and human data, both have strengths and weaknesses, that, when smartly combined, do better together," said Bradley. "But we still think that human data is where the alpha is. We're still extremely bullish that human data and human intelligence is required to be in the loop."

    What trust means in AI evaluation

    Trust, ethics and safety measures user confidence in reliability, factual accuracy and responsible behavior. In HUMAINE's methodology, trust isn't a vendor claim or a technical metric — it's what users report after blinded conversations with competing models.

    The 69% figure represents the probability of ranking first across demographic groups. This consistency matters more than aggregate scores because organizations must serve diverse populations.

    "There was no awareness that they were using Gemini in this scenario," Bradley said. "It was based only on the blinded multi-turn response."

    This separates perceived trust from earned trust. Users judged model outputs without knowing which vendor produced them, eliminating Google's brand advantage. For customer-facing deployments where the AI vendor remains invisible to end users, this distinction matters.

    What enterprises should do now

    When considering different models, the critical first step for enterprises is to embrace an evaluation framework that works.

    "It is increasingly challenging to evaluate models exclusively based on vibes," Bradley said. "I think increasingly we need more rigorous, scientific approaches to truly understand how these models are performing."

    The HUMAINE data provides a framework: Test for consistency across use cases and user demographics, not just peak performance on specific tasks. Blind the testing to separate model quality from brand perception. Use representative samples that match your actual user population. Plan for continuous evaluation as models change.

    For enterprises looking to deploy AI at scale, this means moving beyond "which model is best" to "which model is best for our specific use case, user demographics and required attributes."

    The rigor of representative sampling and blind testing provides the data to make that determination — something technical benchmarks and vibes-based evaluation cannot deliver.

  • Mistral launches Mistral 3, a family of open models designed to run on laptops, drones, and edge devices

    Mistral AI, Europe's most prominent artificial intelligence startup, is releasing its most ambitious product suite to date: a family of 10 open-source models designed to run everywhere from smartphones and autonomous drones to enterprise cloud systems, marking a major escalation in the company's challenge to both U.S. tech giants and surging Chinese competitors.

    The Mistral 3 family, launching today, includes a new flagship model called Mistral Large 3 and a suite of smaller "Ministral 3" models optimized for edge computing applications. All models will be released under the permissive Apache 2.0 license, allowing unrestricted commercial use — a sharp contrast to the closed systems offered by OpenAI, Google, and Anthropic.

    The release is a pointed bet by Mistral that the future of artificial intelligence lies not in building ever-larger proprietary systems, but in offering businesses maximum flexibility to customize and deploy AI tailored to their specific needs, often using smaller models that can run without cloud connectivity.

    "The gap between closed and open source is getting smaller, because more and more people are contributing to open source, which is great," Guillaume Lample, Mistral's chief scientist and co-founder, said in an exclusive interview with VentureBeat. "We are catching up fast."

    Why Mistral is choosing flexibility over frontier performance in the AI race

    The strategic calculus behind Mistral 3 diverges sharply from recent model releases by industry leaders. While OpenAI, Google, and Anthropic have focused recent launches on increasingly capable "agentic" systems — AI that can autonomously execute complex multi-step tasks — Mistral is prioritizing breadth, efficiency, and what Lample calls "distributed intelligence."

    Mistral Large 3, the flagship model, employs a Mixture of Experts architecture with 41 billion active parameters drawn from a total pool of 675 billion parameters. The model can process both text and images, handles context windows up to 256,000 tokens, and was trained with particular emphasis on non-English languages — a rarity among frontier AI systems.
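
    The efficiency claim rests on sparse routing: each token activates only a few experts from a much larger pool, which is how 675 billion total parameters translate to roughly 41 billion doing work per token. The following toy routing function illustrates the idea; it is not Mistral's actual architecture code.

    ```python
    import numpy as np

    n_experts, top_k, d = 8, 2, 4  # tiny stand-ins for production-scale routing
    rng = np.random.default_rng(0)
    experts = [rng.normal(size=(d, d)) for _ in range(n_experts)]  # expert weights
    router = rng.normal(size=(d, n_experts))                       # gating weights

    def moe_forward(x: np.ndarray) -> np.ndarray:
        logits = x @ router
        top = np.argsort(logits)[-top_k:]          # only the top-k experts fire
        gates = np.exp(logits[top]) / np.exp(logits[top]).sum()
        return sum(g * (x @ experts[i]) for g, i in zip(gates, top))

    print(moe_forward(rng.normal(size=d)))
    ```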

    "Most AI labs focus on their native language, but Mistral Large 3 was trained on a wide variety of languages, making advanced AI useful for billions who speak different native languages," the company said in a statement reviewed ahead of the announcement.

    But the more significant departure lies in the Ministral 3 lineup: nine compact models across three sizes (14 billion, 8 billion, and 3 billion parameters) and three variants tailored for different use cases. Each variant serves a distinct purpose: base models for extensive customization, instruction-tuned models for general chat and task completion, and reasoning-optimized models for complex logic requiring step-by-step deliberation.

    The smallest Ministral 3 models can run on devices with as little as 4 gigabytes of video memory using 4-bit quantization — making frontier AI capabilities accessible on standard laptops, smartphones, and embedded systems without requiring expensive cloud infrastructure or even internet connectivity. This approach reflects Mistral's belief that AI's next evolution will be defined not by sheer scale, but by ubiquity: models small enough to run on drones, in vehicles, in robots, and on consumer devices.
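
    For a sense of what that looks like in practice, here is a hedged sketch of loading a small model in 4-bit precision with the Hugging Face stack. The model ID is a placeholder, since exact Ministral 3 repository names may differ, and running it requires transformers, bitsandbytes, and a CUDA GPU.

    ```python
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    model_id = "mistralai/Ministral-3B"  # placeholder; check the actual hub name

    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,                      # 4-bit weights fit a 3B model in ~4 GB VRAM
        bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16 for speed and stability
    )

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, quantization_config=bnb_config, device_map="auto"
    )

    inputs = tokenizer("Summarize: Mistral 3 targets edge deployment.",
                       return_tensors="pt").to(model.device)
    print(tokenizer.decode(model.generate(**inputs, max_new_tokens=64)[0],
                           skip_special_tokens=True))
    ```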

    How fine-tuned small models beat expensive large models for enterprise customers

    Lample's comments reveal a business model fundamentally different from that of closed-source competitors. Rather than competing primarily on benchmark performance, Mistral is targeting enterprise customers frustrated by the cost and inflexibility of proprietary systems.

    "Sometimes customers say, 'Is there a use case where the best closed-source model isn't working?' If that's the case, then they're essentially stuck," Lample explained. "There's nothing they can do. It's the best model available, and it's not working out of the box."

    This is where Mistral's approach diverges. When a generic model fails, the company deploys engineering teams to work directly with customers, analyzing specific problems, creating synthetic training data, and fine-tuning smaller models to outperform larger general-purpose systems on narrow tasks.

    "In more than 90% of cases, a small model can do the job, especially if it's fine-tuned. It doesn't have to be a model with hundreds of billions of parameters, just a 14-billion or 24-billion parameter model," Lample said. "So it's not only much cheaper, but also faster, plus you have all the benefits: you don't need to worry about privacy, latency, reliability, and so on."

    The economic argument is compelling. Multiple enterprise customers have approached Mistral after building prototypes with expensive closed-source models, only to find deployment costs prohibitive at scale, according to Lample.

    "They come back to us a couple of months later because they realize, 'We built this prototype, but it's way too slow and way too expensive,'" he said.

    Where Mistral 3 fits in the increasingly crowded open-source AI market

    Mistral's release comes amid fierce competition on multiple fronts. OpenAI recently released GPT-5.1 with enhanced agentic capabilities. Google launched Gemini 3 with improved multimodal understanding. Anthropic released Opus 4.5 on the same day as this interview, with similar agent-focused features.

    But Lample argues those comparisons miss the point. "It's a little bit behind. But I think what matters is that we are catching up fast," he acknowledged regarding performance against closed models. "I think we are maybe playing a strategic long game."

    That long game involves a different competitive set: primarily open-source models from Chinese companies like DeepSeek and Alibaba's Qwen series, which have made remarkable strides in recent months.

    Mistral differentiates itself through multilingual capabilities that extend far beyond English or Chinese, multimodal integration handling both text and images in a unified model, and what the company characterizes as superior customization through easier fine-tuning.

    "One key difference with the models themselves is that we focused much more on multilinguality," Lample said. "If you look at all the top models from [Chinese competitors], they're all text-only. They have visual models as well, but as separate systems. We wanted to integrate everything into a single model."

    The multilingual emphasis aligns with Mistral's broader positioning as a European AI champion focused on digital sovereignty — the principle that organizations and nations should maintain control over their AI infrastructure and data.

    Building beyond models: Mistral's full-stack enterprise AI platform strategy

    Mistral 3's release builds on an increasingly comprehensive enterprise AI platform that extends well beyond model development. The company has assembled a full-stack offering that differentiates it from pure model providers.

    Recent product launches include Mistral Agents API, which combines language models with built-in connectors for code execution, web search, image generation, and persistent memory across conversations; Magistral, the company's reasoning model designed for domain-specific, transparent, and multilingual reasoning; and Mistral Code, an AI-powered coding assistant bundling models, an in-IDE assistant, and local deployment options with enterprise tooling.

    The consumer-facing Le Chat assistant has been enhanced with Deep Research mode for structured research reports, voice capabilities, and Projects for organizing conversations into context-rich folders. More recently, Le Chat gained a connector directory with 20+ enterprise integrations powered by the Model Context Protocol (MCP), spanning tools like Databricks, Snowflake, GitHub, Atlassian, Asana, and Stripe.

    In October, Mistral unveiled AI Studio, a production AI platform providing observability, agent runtime, and AI registry capabilities to help enterprises track output changes, monitor usage, run evaluations, and fine-tune models using proprietary data.

    Mistral now positions itself as a full-stack, global enterprise AI company, offering not just models but an application-building layer through AI Studio, compute infrastructure, and forward-deployed engineers to help businesses realize return on investment.

    Why open source AI matters for customization, transparency and sovereignty

    Mistral's commitment to open-source development under permissive licenses is both an ideological stance and a competitive strategy in an AI landscape increasingly dominated by closed systems.

    Lample elaborated on the practical benefits: "I think something that people don't realize — but our customers know this very well — is how much better any model can actually improve if you fine tune it on the task of interest. There's a huge gap between a base model and one that's fine-tuned for a specific task, and in many cases, it outperforms the closed-source model."

    The approach enables capabilities impossible with closed systems: organizations can fine-tune models on proprietary data that never leaves their infrastructure, customize architectures for specific workflows, and maintain complete transparency into how AI systems make decisions — critical for regulated industries like finance, healthcare, and defense.
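
    That kind of on-premise customization typically takes the form of parameter-efficient fine-tuning. The sketch below shows standard LoRA setup with the open-source peft library; the model ID and target module names are placeholders, not details Mistral has specified.

    ```python
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import LoraConfig, get_peft_model

    model_id = "mistralai/Ministral-3B"  # placeholder hub name
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    # Attach low-rank adapters so only a small fraction of weights train,
    # keeping proprietary data and compute entirely on local infrastructure.
    lora = LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"])
    model = get_peft_model(model, lora)
    model.print_trainable_parameters()  # typically <1% of total parameters
    ```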

    This positioning has attracted government and public sector partnerships. The company launched "AI for Citizens" in July 2025, an initiative to "help States and public institutions strategically harness AI for their people by transforming public services" and has secured strategic partnerships with France's army and job agency, Luxembourg's government, and various European public sector organizations.

    Mistral's transatlantic AI collaboration goes beyond European borders

    While Mistral is frequently characterized as Europe's answer to OpenAI, the company views itself as a transatlantic collaboration rather than a purely European venture. It has teams across both continents, co-founders who spend significant time with customers and partners in the United States, and models trained in partnership with U.S.-based teams and infrastructure providers.

    This transatlantic positioning may prove strategically important as geopolitical tensions around AI development intensify. The recent ASML investment, a €1.7 billion ($2 billion) funding round led by the Dutch semiconductor equipment manufacturer, signals deepening collaboration across the Western semiconductor and AI value chain at a moment when both Europe and the United States are seeking to reduce dependence on Chinese technology.

    Mistral's investor base reflects this dynamic: the Series C round included participation from U.S. firms Andreessen Horowitz, General Catalyst, Lightspeed, and Index Ventures alongside European investors like France's state-backed Bpifrance and global players like DST Global and Nvidia.

    Founded in May 2023 by former Google DeepMind and Meta researchers, Mistral had raised roughly $1.05 billion (€1 billion) before its latest round. The company was valued at $6 billion in a June 2024 Series B, then more than doubled its valuation in the September Series C.

    Can customization and efficiency beat raw performance in enterprise AI?

    The Mistral 3 release crystallizes a fundamental question facing the AI industry: Will enterprises ultimately prioritize the absolute cutting-edge capabilities of proprietary systems, or will they choose open, customizable alternatives that offer greater control, lower costs, and independence from big tech platforms?

    Mistral's answer is unambiguous. The company is betting that as AI moves from prototype to production, the factors that matter most shift dramatically. Raw benchmark scores matter less than total cost of ownership. Slight performance edges matter less than the ability to fine-tune for specific workflows. Cloud-based convenience matters less than data sovereignty and edge deployment.

    It's a wager with significant risks. Despite Lample's optimism about closing the performance gap, Mistral's models still trail the absolute frontier. The company's revenue, while growing, reportedly remains modest relative to its nearly $14 billion valuation. And competition intensifies from both well-funded Chinese rivals making remarkable open-source progress and U.S. tech giants increasingly offering their own smaller, more efficient models.

    But if Mistral is right — if the future of AI looks less like a handful of cloud-based oracles and more like millions of specialized systems running everywhere from factory floors to smartphones — then the company has positioned itself at the center of that transformation.

    The release of Mistral 3 is the most comprehensive expression yet of that vision: 10 models, spanning every size category, optimized for every deployment scenario, available to anyone who wants to build with them.

    Whether "distributed intelligence" becomes the industry's dominant paradigm or remains a compelling alternative serving a narrower market will determine not just Mistral's fate, but the broader question of who controls the AI future — and whether that future will be open.

    For now, the race is on. And Mistral is betting it can win not by building the biggest model, but by building everywhere else.