Kategori: Uncategorized

  • Simplifying the AI stack: The key to scalable, portable intelligence from cloud to edge

    Presented by Arm


    A simpler software stack is the key to portable, scalable AI across cloud and edge.

    AI is now powering real-world applications, yet fragmented software stacks are holding it back. Developers routinely rebuild the same models for different hardware targets, losing time to glue code instead of shipping features. The good news is that a shift is underway. Unified toolchains and optimized libraries are making it possible to deploy models across platforms without compromising performance.

    Yet one critical hurdle remains: software complexity. Disparate tools, hardware-specific optimizations, and layered tech stacks continue to bottleneck progress. To unlock the next wave of AI innovation, the industry must pivot decisively away from siloed development and toward streamlined, end-to-end platforms.

    This transformation is already taking shape. Major cloud providers, edge platform vendors, and open-source communities are converging on unified toolchains that simplify development and accelerate deployment, from cloud to edge. In this article, we’ll explore why simplification is the key to scalable AI, what’s driving this momentum, and how next-gen platforms are turning that vision into real-world results.

    The bottleneck: fragmentation, complexity, and inefficiency

    The issue isn’t just hardware variety; it’s duplicated effort across frameworks and targets that slows time-to-value.

    Diverse hardware targets: GPUs, NPUs, CPU-only devices, mobile SoCs, and custom accelerators.

    Tooling and framework fragmentation: TensorFlow, PyTorch, ONNX, MediaPipe, and others.

    Edge constraints: Devices require real-time, energy-efficient performance with minimal overhead.

    According to Gartner Research, these mismatches create a key hurdle: over 60% of AI initiatives stall before production, driven by integration complexity and performance variability.

    What software simplification looks like

    Simplification is coalescing around five moves that cut re-engineering cost and risk:

    Cross-platform abstraction layers that minimize re-engineering when porting models.

    Performance-tuned libraries integrated into major ML frameworks.

    Unified architectural designs that scale from datacenter to mobile.

    Open standards and runtimes (e.g., ONNX, MLIR) reducing lock-in and improving compatibility.

    Developer-first ecosystems emphasizing speed, reproducibility, and scalability.

    These shifts are making AI more accessible, especially for startups and academic teams that previously lacked the resources for bespoke optimization. Projects like Hugging Face’s Optimum and MLPerf benchmarks are also helping standardize and validate cross-hardware performance.

    Ecosystem momentum and real-world signals Simplification is no longer aspirational; it’s happening now. Across the industry, software considerations are influencing decisions at the IP and silicon design level, resulting in solutions that are production-ready from day one. Major ecosystem players are driving this shift by aligning hardware and software development efforts, delivering tighter integration across the stack.

    A key catalyst is the rapid rise of edge inference, where AI models are deployed directly on devices rather than in the cloud. This has intensified demand for streamlined software stacks that support end-to-end optimization, from silicon to system to application. Companies like Arm are responding by enabling tighter coupling between their compute platforms and software toolchains, helping developers accelerate time-to-deployment without sacrificing performance or portability. The emergence of multi-modal and general-purpose foundation models (e.g., LLaMA, Gemini, Claude) has also added urgency. These models require flexible runtimes that can scale across cloud and edge environments. AI agents, which interact, adapt, and perform tasks autonomously, further drive the need for high-efficiency, cross-platform software.

    MLPerf Inference v3.1 included over 13,500 performance results from 26 submitters, validating multi-platform benchmarking of AI workloads. Results spanned both data center and edge devices, demonstrating the diversity of optimized deployments now being tested and shared.

    Taken together, these signals make clear that the market’s demand and incentives are aligning around a common set of priorities, including maximizing performance-per-watt, ensuring portability, minimizing latency, and delivering security and consistency at scale.

    What must happen for successful simplification

    To realize the promise of simplified AI platforms, several things must occur:

    Strong hardware/software co-design: hardware features that are exposed in software frameworks (e.g., matrix multipliers, accelerator instructions), and conversely, software that is designed to take advantage of underlying hardware.

    Consistent, robust toolchains and libraries: developers need reliable, well-documented libraries that work across devices. Performance portability is only useful if the tools are stable and well supported.

    Open ecosystem: hardware vendors, software framework maintainers, and model developers need to cooperate. Standards and shared projects help avoid re-inventing the wheel for every new device or use case.

    Abstractions that don’t obscure performance: while high-level abstraction helps developers, they must still allow tuning or visibility where needed. The right balance between abstraction and control is key.

    Security, privacy, and trust built in: especially as more compute shifts to devices (edge/mobile), issues like data protection, safe execution, model integrity, and privacy matter.

    Arm as one example of ecosystem-led simplification

    Simplifying AI at scale now hinges on system-wide design, where silicon, software, and developer tools evolve in lockstep. This approach enables AI workloads to run efficiently across diverse environments, from cloud inference clusters to battery-constrained edge devices. It also reduces the overhead of bespoke optimization, making it easier to bring new products to market faster. Arm (Nasdaq:Arm) is advancing this model with a platform-centric focus that pushes hardware-software optimizations up through the software stack. At COMPUTEX 2025, Arm demonstrated how its latest Arm9 CPUs, combined with AI-specific ISA extensions and the Kleidi libraries, enable tighter integration with widely used frameworks like PyTorch, ExecuTorch, ONNX Runtime, and MediaPipe. This alignment reduces the need for custom kernels or hand-tuned operators, allowing developers to unlock hardware performance without abandoning familiar toolchains.

    The real-world implications are significant. In the data center, Arm-based platforms are delivering improved performance-per-watt, critical for scaling AI workloads sustainably. On consumer devices, these optimizations enable ultra-responsive user experiences and background intelligence that’s always on, yet power efficient.

    More broadly, the industry is coalescing around simplification as a design imperative, embedding AI support directly into hardware roadmaps, optimizing for software portability, and standardizing support for mainstream AI runtimes. Arm’s approach illustrates how deep integration across the compute stack can make scalable AI a practical reality.

    Market validation and momentum

    In 2025, nearly half of the compute shipped to major hyperscalers will run on Arm-based architectures, a milestone that underscores a significant shift in cloud infrastructure. As AI workloads become more resource-intensive, cloud providers are prioritizing architectures that deliver superior performance-per-watt and support seamless software portability. This evolution marks a strategic pivot toward energy-efficient, scalable infrastructure optimized for the performance and demands of modern AI.

    At the edge, Arm-compatible inference engines are enabling real-time experiences, such as live translation and always-on voice assistants, on battery-powered devices. These advancements bring powerful AI capabilities directly to users, without sacrificing energy efficiency.

    Developer momentum is accelerating as well. In a recent collaboration, GitHub and Arm introduced native Arm Linux and Windows runners for GitHub Actions, streamlining CI workflows for Arm-based platforms. These tools lower the barrier to entry for developers and enable more efficient, cross-platform development at scale.

    What comes next

    Simplification doesn’t mean removing complexity entirely; it means managing it in ways that empower innovation. As the AI stack stabilizes, winners will be those who deliver seamless performance across a fragmented landscape.

    From a future-facing perspective, expect:

    Benchmarks as guardrails: MLPerf + OSS suites guide where to optimize next.

    More upstream, fewer forks: Hardware features land in mainstream tools, not custom branches.

    Convergence of research + production: Faster handoff from papers to product via shared runtimes.

    Conclusion

    AI’s next phase isn’t about exotic hardware; it’s also about software that travels well. When the same model lands efficiently on cloud, client, and edge, teams ship faster and spend less time rebuilding the stack.

    Ecosystem-wide simplification, not brand-led slogans, will separate the winners. The practical playbook is clear: unify platforms, upstream optimizations, and measure with open benchmarks. Explore how Arm AI software platforms are enabling this future — efficiently, securely, and at scale.


    Sponsored articles are content produced by a company that is either paying for the post or has a business relationship with VentureBeat, and they’re always clearly marked. For more information, contact sales@venturebeat.com.

  • Qwen’s new Deep Research update lets you turn its reports into webpages, podcasts in seconds

    Chinese e-commerce giant Alibaba’s famously prolific Qwen Team of AI model researchers and engineers has introduced a major expansion to its Qwen Deep Research tool, which is available as an optional modality the user can activate on the web-based Qwen Chat (a competitor to ChatGPT).

    The update lets users generate not only comprehensive research reports with well-organized citations, but also interactive web pages and multi-speaker podcasts — all within 1-2 clicks.

    This functionality is part of a proprietary release, distinct from many of Qwen’s previous open-source model offerings.

    While the feature relies on the open-source models Qwen3-Coder, Qwen-Image, and Qwen3-TTS to power its core capabilities, the end-to-end experience — including research execution, web deployment, and audio generation — is hosted and operated by Qwen.

    This means users benefit from a managed, integrated workflow without needing to configure infrastructure. That said, developers with access to the open-source models could theoretically replicate similar functionality on private or commercial systems.

    The update was announced via the team’s official X account (@Alibaba_Qwen) today, October 21, 2025, stating:

    “Qwen Deep Research just got a major upgrade. It now creates not only the report, but also a live webpage and a podcast — powered by Qwen3-Coder, Qwen-Image, and Qwen3-TTS. Your insights, now visual and audible.”

    Multi-Format Research Output

    The core workflow begins with a user request inside the Qwen Chat interface. From there, Qwen collaborates by asking clarifying questions to shape the research scope, pulls data from the web and official sources, and analyzes or resolves any inconsistencies it finds — even generating custom code when needed.

    A demo video posted by Qwen on X walks through this process on Qwen Chat using the U.S. SaaS market as an example.

    In it, Qwen retrieves data from multiple industry sources, identifies discrepancies in market size estimates (e.g., $206 billion vs. $253 billion), and highlights ambiguities in the U.S. share of global figures. The assistant comments on differences in scope between sources and calculates a compound annual growth rate (CAGR) of 19.8% from 2020 to 2023, providing contextual analysis to back up the raw numbers.

    Once the research is complete, users can click on the "eyeball" icon below the output result (see screenshot), which will bring up a PDF-style report in the right hand pane.

    Then, when viewing the report in the right-hand pane, the user can click the "Create" button in the upper-right hand corner and select from the following two options:

    1. "Web Dev" which produces a live, professional-grade web page, automatically deployed and hosted by Qwen, using Qwen3-Coder for structure and Qwen-Image for visuals.

    2. "Podcast," which, as it states, produces an audio podcast, featuring dynamic, multi-speaker narration generated by Qwen3-TTS, also hosted by Qwen for easy sharing and playback.

    This enables users to quickly convert a single research project into multiple forms of content — written, visual, and audible — with minimal extra input.

    The website includes inline graphics generated by Qwen Image, making it suitable for use in public presentations, classrooms, or publishing.

    The podcast feature allows users to select between 17 different speaker names as the host and 7 as the co-host, though I wasn't able to find a way to preview the voice outputs before selecting them. It appears designed for deep listening on the go.

    There was no way to change the language output that I could see, so mine came out in English, like my reports and initial prompts, though the Qwen LLMs are multi-modal. The voices were slightly more robotic than other AI tools I've used.

    Here's an example of a web page I generated on commonalities in authoritarian regimes throughout history, another one on UFO or UAP sightings, and below this paragraph, a podcast on UFO or UAP sightings.

    While the website is hosted via a public link, the podcast must be downloaded by the user and can't be linked to publicly, from what I could tell in my brief usage so far.

    Note the podcast is much different than the actual report — not just a straight read-through audio version of it, rather, a new format of two hosts discussing and bantering about the subject using the report as the jumping off point.

    The web page versions of the report also include new graphics not found in the PDF report.

    Comparisons to Google's NotebookLM

    While the new capabilities have been well received by many early users, comparisons to other research assistants have surfaced — particularly Google’s NotebookLM, which recently exited beta.

    AI commentator and newsletter writer Chubby (@kimmonismus) noted on X:

    “I am really grateful that Qwen provides regular updates. That’s great.

    But the attempt to build a NotebookLM clone inside Qwen-3-max doesn’t sound very promising compared to Google’s version.”

    While NotebookLM is built around organizing and querying existing documents and web pages, Qwen Deep Research focuses more on generating new research content from scratch, aggregating sources from the open web, and presenting it across multiple modalities.

    The comparison suggests that while the two tools overlap in general concept — AI-assisted research — they diverge in approach and target user experience.

    Availability

    Qwen Deep Research is now live and available through the Qwen Chat app. The feature can be accessed with the following URL.

    No pricing details have been provided for Qwen3-Max or the specific Deep Research capabilities as of this writing.

    What's Next For Qwen Deep Research?

    By combining research guidance, data analysis, and multi-format content creation into a single tool, Qwen Deep Research aims to streamline the path from idea to publishable output.

    The integration of code, visuals, and voice makes it especially attractive to content creators, educators, and independent analysts who want to scale their research into web- or podcast-friendly forms without switching platforms.

    Still, comparisons to more specialized offerings like NotebookLM raise questions about how Qwen’s generalized approach stacks up on depth, precision, and refinement. Whether the strength of its multi-format execution outweighs those concerns may come down to user priorities — and whether they value single-click publishing over tight integration with existing notes and materials.

    For now, Qwen is signaling that research doesn’t end with a document — it begins with one.

    Let me know if you want this repackaged into something shorter or tailored to a particular audience — newsletter, press-style blog, internal team explainer, etc.

  • DeepSeek drops open-source model that compresses text 10x through images, defying conventions

    DeepSeek, the Chinese artificial intelligence research company that has repeatedly challenged assumptions about AI development costs, has released a new model that fundamentally reimagines how large language models process information—and the implications extend far beyond its modest branding as an optical character recognition tool.

    The company's DeepSeek-OCR model, released Monday with full open-source code and weights, achieves what researchers describe as a paradigm inversion: compressing text through visual representation up to 10 times more efficiently than traditional text tokens. The finding challenges a core assumption in AI development and could pave the way for language models with dramatically expanded context windows, potentially reaching tens of millions of tokens.

    "We present DeepSeek-OCR as an initial investigation into the feasibility of compressing long contexts via optical 2D mapping," the research team wrote in their technical paper. "Experiments show that when the number of text tokens is within 10 times that of vision tokens (i.e., a compression ratio < 10×), the model can achieve decoding (OCR) precision of 97%."

    The implications have resonated across the AI research community. Andrej Karpathy, co-founder of OpenAI and former director of AI at Tesla, said in a post that the work raises fundamental questions about how AI systems should process information. "Maybe it makes more sense that all inputs to LLMs should only ever be images," Karpathy wrote. "Even if you happen to have pure text input, maybe you'd prefer to render it and then feed that in."

    How DeepSeek achieved 10x compression by treating text as images

    While DeepSeek marketed the release as an OCR model — a technology for converting images of text into digital characters — the research paper reveals more ambitious goals. The model demonstrates that visual representations can serve as a superior compression medium for textual information, inverting the conventional hierarchy where text tokens were considered more efficient than vision tokens.

    "Traditionally, vision LLM tokens almost seemed like an afterthought or 'bolt on' to the LLM paradigm," wrote Jeffrey Emanuel, an AI researcher, in a detailed analysis of the paper. "And 10k words of English would take up far more space in a multimodal LLM when expressed as intelligible pixels than when expressed as tokens…But that gets inverted now from the ideas in this paper."

    The model's architecture consists of two primary components: DeepEncoder, a novel 380-million-parameter vision encoder, and a 3-billion-parameter mixture-of-experts language decoder with 570 million activated parameters. DeepEncoder combines Meta's Segment Anything Model (SAM) for local visual perception with OpenAI's CLIP model for global visual understanding, connected through a 16x compression module.

    To validate their compression claims, DeepSeek researchers tested the model on the Fox benchmark, a dataset of diverse document layouts. The results were striking: using just 100 vision tokens, the model achieved 97.3% accuracy on documents containing 700-800 text tokens — representing an effective compression ratio of 7.5x. Even at compression ratios approaching 20x, accuracy remained around 60%.

    The practical impact: Processing 200,000 pages per day on a single GPU

    The efficiency gains translate directly to production capabilities. According to the company, a single Nvidia A100-40G GPU can process more than 200,000 pages per day using DeepSeek-OCR. Scaling to a cluster of 20 servers with eight GPUs each, throughput reaches 33 million pages daily — sufficient to rapidly construct training datasets for other AI models.

    On OmniDocBench, a comprehensive document parsing benchmark, DeepSeek-OCR outperformed GOT-OCR2.0 (which uses 256 tokens per page) while using only 100 vision tokens. More dramatically, it surpassed MinerU2.0 — which requires more than 6,000 tokens per page on average — while using fewer than 800 vision tokens.

    DeepSeek designed the model to support five distinct resolution modes, each optimized for different compression ratios and use cases. The "Tiny" mode operates at 512×512 resolution with just 64 vision tokens, while "Gundam" mode combines multiple resolutions dynamically for complex documents. "Gundam mode consists of n×640×640 tiles (local views) and a 1024×1024 global view," the researchers wrote.

    Why this breakthrough could unlock 10 million token context windows

    The compression breakthrough has immediate implications for one of the most pressing challenges in AI development: expanding the context windows that determine how much information language models can actively consider. Current state-of-the-art models typically handle context windows measured in hundreds of thousands of tokens. DeepSeek's approach suggests a path to windows ten times larger.

    "The potential of getting a frontier LLM with a 10 or 20 million token context window is pretty exciting," Emanuel wrote. "You could basically cram all of a company's key internal documents into a prompt preamble and cache this with OpenAI and then just add your specific query or prompt on top of that and not have to deal with search tools and still have it be fast and cost-effective."

    The researchers explicitly frame their work in terms of context compression for language models. "Through DeepSeek-OCR, we demonstrate that vision-text compression can achieve significant token reduction (7-20×) for different historical context stages, offering a promising direction for addressing long-context challenges in large language models," they wrote.

    The paper includes a speculative but intriguing diagram illustrating how the approach could implement memory decay mechanisms similar to human cognition. Older conversation rounds could be progressively downsampled to lower resolutions, consuming fewer tokens while maintaining key information — a form of computational forgetting that mirrors biological memory.

    How visual processing could eliminate the 'ugly' tokenizer problem

    Beyond compression, Karpathy highlighted how the approach challenges fundamental assumptions about how language models should process text. Traditional tokenizers—the systems that break text into units for processing—have long been criticized for their complexity and limitations.

    "I already ranted about how much I dislike the tokenizer," Karpathy wrote. "Tokenizers are ugly, separate, not end-to-end stage. It 'imports' all the ugliness of Unicode, byte encodings, it inherits a lot of historical baggage, security/jailbreak risk (e.g. continuation bytes). It makes two characters that look identical to the eye look as two completely different tokens internally in the network."

    Visual processing of text could eliminate these issues while enabling new capabilities. The approach naturally handles formatting information lost in pure text representations: bold text, colors, layout, embedded images. "Input can now be processed with bidirectional attention easily and as default, not autoregressive attention – a lot more powerful," Karpathy noted.

    The implications resonate with human cognitive science. Emanuel drew a parallel to Hans Bethe, the renowned physicist who memorized vast amounts of reference data: "Having vast amounts of task-specific knowledge in your working memory is extremely useful. This seems like a very clever and additive approach to potentially expanding that memory bank by 10x or more."

    The model's training: 30 million PDF pages across 100 languages

    The model's capabilities rest on an extensive training regimen using diverse data sources. DeepSeek collected 30 million PDF pages covering approximately 100 languages, with Chinese and English accounting for 25 million pages. The training data spans nine document types — academic papers, financial reports, textbooks, newspapers, handwritten notes, and others.

    Beyond document OCR, the training incorporated what the researchers call "OCR 2.0" data: 10 million synthetic charts, 5 million chemical formulas, and 1 million geometric figures. The model also received 20% general vision data for tasks like image captioning and object detection, plus 10% text-only data to maintain language capabilities.

    The training process employed pipeline parallelism across 160 Nvidia A100-40G GPUs (20 nodes with 8 GPUs each), with the vision encoder divided between two pipeline stages and the language model split across two others. "For multimodal data, the training speed is 70B tokens/day," the researchers reported.

    Open source release accelerates research and raises competitive questions

    True to DeepSeek's pattern of open development, the company released the complete model weights, training code, and inference scripts on GitHub and Hugging Face. The GitHub repository gained over 4,000 stars within 24 hours of release, according to Dataconomy.

    The breakthrough raises questions about whether other AI labs have developed similar techniques but kept them proprietary. Emanuel speculated that Google's Gemini models, which feature large context windows and strong OCR performance, might employ comparable approaches. "For all we know, Google could have already figured out something like this, which could explain why Gemini has such a huge context size and is so good and fast at OCR tasks," Emanuel wrote.

    Google's Gemini 2.5 Pro offers a 1-million-token context window, with plans to expand to 2 million, though the company has not publicly detailed the technical approaches enabling this capability. OpenAI's GPT-5 supports 400,000 tokens, while Anthropic's Claude 4.5 offers 200,000 tokens, with a 1-million-token window available in beta for eligible organizations.

    The unanswered question: Can AI reason over compressed visual tokens?

    While the compression results are impressive, researchers acknowledge important open questions. "It's not clear how exactly this interacts with the other downstream cognitive functioning of an LLM," Emanuel noted. "Can the model reason as intelligently over those compressed visual tokens as it can using regular text tokens? Does it make the model less articulate by forcing it into a more vision-oriented modality?"

    The DeepSeek paper focuses primarily on the compression-decompression capability, measured through OCR accuracy, rather than downstream reasoning performance. This leaves open whether language models could reason effectively over large contexts represented primarily as compressed visual tokens.

    The researchers acknowledge their work represents "an initial exploration into the boundaries of vision-text compression." They note that "OCR alone is insufficient to fully validate true context optical compression" and plan future work including "digital-optical text interleaved pretraining, needle-in-a-haystack testing, and other evaluations."

    DeepSeek has established a pattern of achieving competitive results with dramatically lower computational resources than Western AI labs. The company's earlier DeepSeek-V3 model reportedly cost just $5.6 million to train—though this figure represents only the final training run and excludes R&D and infrastructure costs—compared to hundreds of millions for comparable models from OpenAI and Anthropic.

    Industry analysts have questioned the $5.6 million figure, with some estimates placing the company's total infrastructure and operational costs closer to $1.3 billion, though still lower than American competitors' spending.

    The bigger picture: Should language models process text as images?

    DeepSeek-OCR poses a fundamental question for AI development: should language models process text as text, or as images of text? The research demonstrates that, at least for compression purposes, visual representation offers significant advantages. Whether this translates to effective reasoning over vast contexts remains to be determined.

    "From another perspective, optical contexts compression still offers substantial room for research and improvement, representing a promising new direction," the researchers concluded in their paper.

    For the AI industry, the work adds another dimension to the race for longer context windows — a competition that has intensified as language models are applied to increasingly complex tasks requiring vast amounts of information. The open-source release ensures the technique will be widely explored, tested, and potentially integrated into future AI systems.

    As Karpathy framed the deeper implication: "OCR is just one of many useful vision -> text tasks. And text -> text tasks can be made to be vision ->text tasks. Not vice versa." In other words, the path forward for AI might not run through better tokenizers — it might bypass text tokens altogether.

  • The unexpected benefits of AI PCs: why creativity could be the new productivity

    Presented by HP


    Creativity is quickly becoming the new measure of productivity. While AI is often framed as a tool for efficiency and automation, new research from MIT Sloan School of Management shows that generative AI enhances human creativity — when employees have the right tools and skills to use it effectively.

    That’s where AI PCs come in. These next-generation laptops combine local AI processing with powerful Neural Processing Units (NPUs), delivering the speed and security that knowledge workers expect while also unlocking new creative possibilities. By handling AI tasks directly on the device, AI PCs minimize latency, protect sensitive data, and lower energy consumption.

    Teams are already proving the impact. Marketing teams are using AI PCs to generate campaign assets in hours instead of weeks. Engineers are shortening design and prototyping cycles. Sales reps are creating personalized proposals onsite, even without cloud access. In each case, AI PCs are not just accelerating workflows — they’re sparking fresh ideas, faster iteration, and more engaged teams.

    The payoff is clear: creativity that translates into measurable business outcomes, from faster time-to-market and stronger compliance to deeper customer engagement. Still, adoption is uneven, and the benefits aren’t yet reaching the wider workforce.

    Early creative benefits, but a divide remains

    New Morning Consult and HP research shows nearly half of IT decision makers (45%) already use AI PCs for creative assistance, with almost a third (29%) using them for tasks like image generation and editing. That’s not just about efficiency — it’s about bringing imagination into everyday workflows.

    According to HP’s 2025 Work Relationship Index, fulfillment is the single biggest driver of a healthy work relationship, outranking even leadership. Give employees tools that let them create, not just execute tasks, and you unlock productivity, satisfaction, retention, and optimism. The same instinct that drives workers to build outside the office is the one companies can harness inside it.

    The challenge is that among broader knowledge workers, adoption is still low, just 29% for creative assistance and just 19% for image generation. This creative divide means the full potential of AI PCs hasn’t reached the wider workforce. For CIOs, the opportunity isn’t just deploying faster machines — it’s fostering a workplace culture where creativity drives measurable business value.

    Creative benefits of AI PCs

    So when you put AI PCs in front of the employees who embrace the possibilities, what does that look like in practice? Early adopters are already seeing AI PCs reshape how creative work gets done.

    Teams dream up fresh ideas, faster. AI PCs can spark new perspectives and out-of-the-box solutions, enhancing human creativity rather than replacing it. With dedicated NPUs handling AI workloads, employees stay in flow without interruptions. Battery life is extended, latency drops, and performance improves — allowing teams to focus on ideas, not wait times.

    On-device AI is opening new creative mediums, from visual design to video production to music editing, and videos, photos, and presentations that can be generated, edited, and refined in real time.

    Plus, AI workloads like summarization, transcription, and code generation run instantly without relying on cloud APIs. That means employees can work productively in low-bandwidth or disconnected environments, removing downtime risks, especially for mobile workforces and global deployments.

    And across the organization, AI PCs mean real-world, measurable business outcomes.

    Marketing: AI PCs enable creative teams to generate ad variations, social content, and campaign assets in minutes instead of days, reducing dependence on external agencies. And that leads to faster campaign launches, reduced external vendor spend, and increased pipeline velocity.

    Product and engineering: Designers/engineers can prototype in CAD, generate 3D mockups, or run simulations locally with on-device AI accelerators, shortening feedback loops. That means reduced iteration cycles, faster prototyping, and faster time-to-market.

    Sales/customer engagement: Reps can use AI PCs to generate real-time proposals, personalized presentations, or analyze contracts offline at client sites, even without cloud connection. This generates faster deal cycles, higher client engagement, and a shorter sales turnaround.

    From efficiency to fulfillment

    AI PCs are more than just a performance upgrade. They’re reshaping how people approach and experience work. By giving employees tools that spark creativity as well as productivity, organizations can unlock faster innovation, deeper engagement, and stronger retention.

    For CIOs, the opportunity goes beyond efficiency gains. The true value of AI PCs won’t be measured in speed or specs, but in how they open new possibilities for creation, collaboration, and competition — helping teams not just work faster, but work more creatively and productively.


    Sponsored articles are content produced by a company that is either paying for the post or has a business relationship with VentureBeat, and they’re always clearly marked. For more information, contact sales@venturebeat.com.

  • AI’s financial blind spot: Why long-term success depends on cost transparency

    Presented by Apptio, an IBM company


    When a technology with revolutionary potential comes on the scene, it’s easy for companies to let enthusiasm outpace fiscal discipline. Bean counting can seem short-sighted in the face of exciting opportunities for business transformation and competitive dominance. But money is always an object. And when the tech is AI, those beans can add up fast.

    AI’s value is becoming evident in areas like operational efficiency, worker productivity, and customer satisfaction. However, this comes at a cost. The key to long-term success is understanding the relationship between the two — so you can ensure that the potential of AI translates into real, positive impact for your business.

    The AI acceleration paradox

    While AI is helping to transform business operations, its own financial footprint often remains obscure. If you can’t connect costs to impact, how can you be sure your AI investments will drive meaningful ROI? This uncertainty makes it no surprise that in the 2025 Gartner® Hype Cycle™ for Artificial Intelligence, GenAI has moved into the “Trough of Disillusionment” .

    Effective strategic planning depends on clarity. In its absence, decision-making falls back on guesswork and gut instinct. And there’s a lot riding on these decisions. According to Apptio research, 68% of technology leaders surveyed expect to increase their AI budgets, and 39% believe AI will be their departments’ biggest driver of future budget growth.

    But bigger budgets don’t guarantee better outcomes. Gartner® also reveals that “despite an average spend of $1.9 million on GenAI initiatives in 2024, fewer than 30% of AI leaders say their CEOs are satisfied with the return on investment.” If there’s no clear link between cost and outcome, organizations risk scaling investments without scaling the value they’re meant to create.

    To move forward with well-founded confidence, business leaders in finance, IT, and tech must collaborate to gain visibility into AI’s financial blind spot.

    The hidden financial risks of AI

    The runaway costs of AI can give IT leaders flashbacks to the early days of public cloud. When it’s easy for DevOps teams and business units to procure their own resources on an OpEx basis, costs and inefficiencies can quickly spiral. In fact, AI projects are avid consumers of cloud infrastructure — while incurring additional costs for data platforms and engineering resources. And that’s on top of the tokens used for each query. The decentralized nature of these costs makes them particularly difficult to attribute to business outcomes.

    As with the cloud, the ease of AI procurement quickly leads to AI sprawl. And finite budgets mean that every dollar spent represents an unconscious tradeoff with other needs. People worry that AI will take their job. But it’s just as likely that AI will take their department’s budget.

    Meanwhile, according to Gartner®, “Over 40% of agentic AI projects will be canceled by end of 2027, due to escalating costs, unclear business value or inadequate rish controls”. But are those the right projects to cancel? Lacking a way to connect investment to impact, how can business leaders know whether those rising costs are justified by proportionally greater ROI? ?

    Without transparency into AI costs, companies risk overspending, under-delivering, and missing out on better opportunities to drive value.

    Why traditional financial planning can't handle AI

    As we learned with cloud, we see that traditional static budget models are poorly suited for dynamic workloads and rapidly scaling resources. The key to cloud cost management has been tagging and telemetry, which help companies attribute each dollar of cloud spend to specific business outcomes. AI cost management will require similar practices. But the scope of the challenge goes much further. On top of costs for storage, compute, and data transfer, each AI project brings its own set of requirements — from prompt optimization and model routing to data preparation, regulatory compliance, security, and personnel.

    This complex mix of ever-shifting factors makes it understandable that finance and business teams lack granular visibility into AI-related spend — and IT teams struggle to reconcile usage with business outcomes. But it’s impossible to precisely and accurately track ROI without these connections.

    The strategic value of cost transparency

    Cost transparency empowers smarter decisions — from resource allocation to talent deployment.

    Connecting specific AI resources with the projects that they support helps technology decision-makers ensure that the most high-value projects are given what they need to succeed. Setting the right priorities is especially critical when top talent is in short supply. If your highly compensated engineers and data scientists are spread across too many interesting but unessential pilots, it’ll be hard to staff the next strategic — and perhaps pressing — pivot.

    FinOps best practices apply equally to AI. Cost insights can surface opportunities to optimize infrastructure and address waste whether by right-sizing performance and latency to match workload requirements, or by selecting a smaller, more cost-effective model instead of defaulting to the latest large language model (LLM). As work proceeds, tracking can flag rising costs so leaders can pivot quickly in more-promising directions as needed. A project that makes sense at X cost might not be worthwhile at 2X cost.

    Companies that adopt a structured, transparent, and well-governed approach to AI costs are more likely to spend the right money in the right ways and see optimal ROI from their investment.

    TBM: An enterprise framework for AI cost management

    Transparency and control over AI costs depend on three practices:

    IT financial management (ITFM): Managing IT costs and investments in alignment with business priorities

    FinOps: Optimizing cloud costs and ROI through financial accountability and operational efficiency

    Strategic portfolio management (SPM): Prioritizing and managing projects to better ensure they deliver maximum value for the business

    Collectively, these three disciplines make up Technology Business Management (TBM) — a structured framework that helps technology, business, and finance leaders connect technology investments to business outcomes for better financial transparency and decision-making.

    Most companies are already on the road to TBM, whether they realize it or not. They may have adopted some form of FinOps or cloud cost management. Or they might be developing strong financial expertise for IT. Or they may rely on Enterprise Agile Planning or Strategic Portfolio Management project management to deliver initiatives more successfully. AI can draw on — and impact — all of these areas. By unifying them under one umbrella with a common model and vocabulary, TBM brings essential clarity to AI costs and the business impact they enable.

    AI success depends on value — not just velocity. The cost transparency that TBM provides offers a road map that can help business and IT leaders make the right investments, deliver them cost-effectively, scale them responsibly, and turn AI from a costly mistake into a measurable business asset and strategic driver.

    Sources : Gartner® Press Release, Gartner® Predicts Over 40% of Agentic AI Projects Will Be Canceled by End of 2027, June 25, 2025 https://www.Gartner®.com/en/newsroom/press-releases/2025-06-25-Gartner®-predicts-over-40-percent-of-agentic-ai-projects-will-be-canceled-by-end-of-2027

    GARTNER® is a registered trademark and service mark of Gartner®, Inc. and/or its affiliates in the U.S. and internationally and is used herein with permission. All rights reserved.


    Ajay Patel is General Manager, Apptio and IT Automation at IBM.


    Sponsored articles are content produced by a company that is either paying for the post or has a business relationship with VentureBeat, and they’re always clearly marked. For more information, contact sales@venturebeat.com.

  • Claude Code comes to web and mobile, letting devs launch parallel jobs on Anthropic’s managed infra

    Vibe coding is evolving and with it are the leading AI-powered coding services and tools, including Anthropic’s Claude Code.

    As of today, the service will be available via the web and, in preview, on the Claude iOS app, giving developers access to additional asynchronous capabilities. Previously, it was available through the terminal on developers' PCs with support for Git, Docker, Kubernetes, npm, pip, AWS CLI, etc., and as an extension for Microsoft's open source VS Code editor and other JetBrains-powered integrated development environments (IDEs) via Claude Agent.

    “Claude Code on the web lets you kick off coding sessions without opening your terminal,” Anthropic said in a blog post. “Connect your GitHub repositories, describe what you need, and Claude handles the implementation. Each session runs in its own isolated environment with real-time progress tracking, and you can actively steer Claude to adjust course as it’s working through tasks.”

    This allows users to run coding projects asynchronously, a trend that many enterprises are looking for. 

    The web version of Claude Code, currently in research preview, will be available to Pro and Max users. However, web Claude Code will be subject to the same rate limits as other versions. Anthropic throttled rate limits to Claude and Claude Code after the unexpected popularity of the coding tool in July, which enabled some users to run Claude Code overnight. 

    Anthropic is now ensuring Claude Code comes closer to matching the availability of rival OpenAI's Codex AI coding platform, powered by a variant of GPT-5, which launches on mobile and the web back in mid September 2025.

    Parallel usage

    Anthropic said running Claude Code in the cloud means teams can “now run multiple tasks in parallel across different repositories from a single interface and ship faster with automatic PR creation and clear change summaries.”

    One of the big draws of coding agents is giving developers the ability to run multiple coding projects, such as bugfixes, at the same time. Google’s two coding agents, Jules and Code Assist, both offer asynchronous code generation and checks. Codex from OpenAI also lets people work in parallel.

    Anthropic said bringing Claude Code to the web won’t disrupt workflows, but noted running tasks in the cloud work best for tasks such as answering questions around projects and how repositories are mapped, bugfixes and for routine, well-defined tasks, and backend changes to verify any adjustments. 

    While most developers will likely prefer to use Claude Code on a desktop, Anthropic said the mobile version could encourage more users to “explore coding with Claude on the go.”

    Isolated environments 

    Anthropic insisted that Claude Code tasks on the cloud will have the same level of security as the earlier version. It runs on an “isolated sandbox environment with network and filesystem restrictions.” 

    Interactions go through a secure proxy service, which the company said ensures the model only accesses authorized repositories.

    Enterprise users can customize which domains Claude Code can connect to. 

    Claude Code is powered by Claude Sonnet 4.5, which Anthropic claims is the best coding model around. The company recently made Claude Haiku 4.5, a smaller version of Claude that also has strong coding capabilities, available to all Claude subscribers, including free users. 

  • Adobe Foundry wants to rebuild Firefly for your brand — not just tweak it

    Hoping to attract more enterprise teams to its ecosystem, Adobe launched a new model customization service called Adobe AI Foundry, which would create bespoke versions of its flagship AI model, Firefly.

    Adobe AI Foundry will work with enterprise customers to rearchitect and retrain Firefly models specific to the client. AI Foundry version models are different from custom Firefly models in that Foundry models understand multiple concepts compared to custom models with only a single concept. These models will also be multimodal, offering a wider use case than custom Firefly models, which can only ingest and respond with images. 

    Adobe AI Foundry models, with Firefly at its base, will know a company’s brand tone, image and video style, products and services and all its IP. The models will generate content based on this information for any use case the company wants. 

    Hannah Elsakr, vice president, GenAI New Business Ventures at Adobe, told VentureBeat that the idea to set up AI Foundry came because enterprise customers wanted more sophisticated custom versions of Firefly. But with how complex the needs of enterprises are, Adobe will be doing the rearchitecting rather than handing the reins over to customers. 

    “We will retrain our own Firefly commercially safe models with the enterprise IP. We keep that IP separate. We never take that back into the base model, and the enterprise itself owns that output,” Elsakr said. 

    Adobe will deploy the Foundry version of Firefly through its API solution, Firefly Services. 

    Elsakr likened AI Foundry to an advisory service, since Adobe will have teams working directly with enterprise customers to retrain the model. 

    Deep tuning

    Elsakr refers to Foundry as a deep tuning method because it goes further than simply fine-tuning a model.

    “The way we think about it, maybe more layman's terms, is that we're surgically reopening the Firefly-based models,” Elsakr said. “So you get the benefit of all the world's knowledge from our image model or a video model. We're going back in time and are bringing in the IP from the enterprise, like a brand. It could be footage from a shot style, whatever they have a license to contribute. We then retrain. We call this continuous pre-training, where we overweigh the model to dial some things differently. So we're literally retraining our base model, and that's why we call it deep tuning instead of fine-tuning.”

    Part of the training pipeline involves Adobe’s embedded teams working with the company to identify the data they would need. Then the data is securely transferred and ingested before being tagged. It is fed to the base model, and then Adobe begins a pre-training model run. 

    Elsakr maintains the Foundry versions of Firefly will not be small or distilled models. Often, the additional data from companies expands the parameters of Firefly.

    Two early customers of Adobe AI Foundry are Home Depot and Walt Disney Imagineering, the research and development arm of Disney for its theme parks. 

    “We are always exploring innovative ways to enhance our customer experience and streamline our creative workflows. Adobe’s AI Foundry represents an exciting step forward in embracing cutting-edge technologies to deepen customer engagement and deliver impactful content across our digital channels,” said Molly Battin, senior vice president and chief marketing officer at The Home Depot.

    More customization

    Enterprises often turn to fine-tuning and model customization to bring large language models with their vast external knowledge closer to their company’s needs. Fine-tuning also enables enterprise users to utilize models only in the context of their organization’s data, so the model doesn’t respond with text wholly unrelated to the business.

    Most organizations, however, do the fine-tuning themselves. They connect to the model’s API and begin retraining it to answer based on their ground truth or their preferences. Several methods for fine-tuning exist, including some that can be done with just a prompt. Other model providers also try to make it easier for their customers to fine-tune models, such as OpenAI with its o4-mini reasoning model

    Elsakr said she expects some companies will have three versions of Firefly: the Foundry version for most projects, a custom Firefly for specific single-concept use cases, and the base Firefly because some teams want a model less encumbered by corporate knowledge. 

  • The teacher is the new engineer: Inside the rise of AI enablement and PromptOps

    As more companies quickly begin using gen AI, it’s important to avoid a big mistake that could impact its effectiveness: Proper onboarding. Companies spend time and money training new human workers to succeed, but when they use large language model (LLM) helpers, many treat them like simple tools that need no explanation.

    This isn't just a waste of resources; it's risky. Research shows that AI has advanced quickly from testing to actual use in 2024 to 2025, with almost a third of companies reporting a sharp increase in usage and acceptance from the previous year.

    Probabilistic systems need governance, not wishful thinking

    Unlike traditional software, gen AI is probabilistic and adaptive. It learns from interaction, can drift as data or usage changes and operates in the gray zone between automation and agency. Treating it like static software ignores reality: Without monitoring and updates, models degrade and produce faulty outputs: A phenomenon widely known as model drift. Gen AI also lacks built-in organizational intelligence. A model trained on internet data may write a Shakespearean sonnet, but it won’t know your escalation paths and compliance constraints unless you teach it. Regulators and standards bodies have begun pushing guidance precisely because these systems behave dynamically and can hallucinate, mislead or leak data if left unchecked.

    The real-world costs of skipping onboarding

    When LLMs hallucinate, misinterpret tone, leak sensitive information or amplify bias, the costs are tangible.

    • Misinformation and liability: A Canadian tribunal held Air Canada liable after its website chatbot gave a passenger incorrect policy information. The ruling made it clear that companies remain responsible for their AI agents’ statements.

    • Embarrassing hallucinations: In 2025, a syndicated “summer reading list” carried by the Chicago Sun-Times and Philadelphia Inquirer recommended books that didn’t exist; the writer had used AI without adequate verification, prompting retractions and firings.

    • Bias at scale: The Equal Employment Opportunity Commission (EEOCs) first AI-discrimination settlement involved a recruiting algorithm that auto-rejected older applicants, underscoring how unmonitored systems can amplify bias and create legal risk.

    • Data leakage: After employees pasted sensitive code into ChatGPT, Samsung temporarily banned public gen AI tools on corporate devices — an avoidable misstep with better policy and training.

    The message is simple: Un-onboarded AI and un-governed usage create legal, security and reputational exposure.

    Treat AI agents like new hires

    Enterprises should onboard AI agents as deliberately as they onboard people — with job descriptions, training curricula, feedback loops and performance reviews. This is a cross-functional effort across data science, security, compliance, design, HR and the end users who will work with the system daily.

    1. Role definition. Spell out scope, inputs/outputs, escalation paths and acceptable failure modes. A legal copilot, for instance, can summarize contracts and surface risky clauses, but should avoid final legal judgments and must escalate edge cases.

    2. Contextual training. Fine-tuning has its place, but for many teams, retrieval-augmented generation (RAG) and tool adapters are safer, cheaper and more auditable. RAG keeps models grounded in your latest, vetted knowledge (docs, policies, knowledge bases), reducing hallucinations and improving traceability. Emerging Model Context Protocol (MCP) integrations make it easier to connect copilots to enterprise systems in a controlled way — bridging models with tools and data while preserving separation of concerns. Salesforce’s Einstein Trust Layer illustrates how vendors are formalizing secure grounding, masking, and audit controls for enterprise AI.

    3. Simulation before production. Don’t let your AI’s first “training” be with real customers. Build high-fidelity sandboxes and stress-test tone, reasoning and edge cases — then evaluate with human graders. Morgan Stanley built an evaluation regimen for its GPT-4 assistant, having advisors and prompt engineers grade answers and refine prompts before broad rollout. The result: >98% adoption among advisor teams once quality thresholds were met. Vendors are also moving to simulation: Salesforce recently highlighted digital-twin testing to rehearse agents safely against realistic scenarios.

    4. 4) Cross-functional mentorship. Treat early usage as a two-way learning loop: Domain experts and front-line users give feedback on tone, correctness and usefulness; security and compliance teams enforce boundaries and red lines; designers shape frictionless UIs that encourage proper use.

    Feedback loops and performance reviews—forever

    Onboarding doesn’t end at go-live. The most meaningful learning begins after deployment.

    • Monitoring and observability: Log outputs, track KPIs (accuracy, satisfaction, escalation rates) and watch for degradation. Cloud providers now ship observability/evaluation tooling to help teams detect drift and regressions in production, especially for RAG systems whose knowledge changes over time.

    • User feedback channels. Provide in-product flagging and structured review queues so humans can coach the model — then close the loop by feeding these signals into prompts, RAG sources or fine-tuning sets.

    • Regular audits. Schedule alignment checks, factual audits and safety evaluations. Microsoft’s enterprise responsible-AI playbooks, for instance, emphasize governance and staged rollouts with executive visibility and clear guardrails.

    • Succession planning for models. As laws, products and models evolve, plan upgrades and retirement the way you would plan people transitions — run overlap tests and port institutional knowledge (prompts, eval sets, retrieval sources).

    Why this is urgent now

    Gen AI is no longer an “innovation shelf” project — it’s embedded in CRMs, support desks, analytics pipelines and executive workflows. Banks like Morgan Stanley and Bank of America are focusing AI on internal copilot use cases to boost employee efficiency while constraining customer-facing risk, an approach that hinges on structured onboarding and careful scoping. Meanwhile, security leaders say gen AI is everywhere, yet one-third of adopters haven’t implemented basic risk mitigations, a gap that invites shadow AI and data exposure.

    The AI-native workforce also expects better: Transparency, traceability, and the ability to shape the tools they use. Organizations that provide this — through training, clear UX affordances and responsive product teams — see faster adoption and fewer workarounds. When users trust a copilot, they use it; when they don’t, they bypass it.

    As onboarding matures, expect to see AI enablement managers and PromptOps specialists in more org charts, curating prompts, managing retrieval sources, running eval suites and coordinating cross-functional updates. Microsoft’s internal Copilot rollout points to this operational discipline: Centers of excellence, governance templates and executive-ready deployment playbooks. These practitioners are the “teachers” who keep AI aligned with fast-moving business goals.

    A practical onboarding checklist

    If you’re introducing (or rescuing) an enterprise copilot, start here:

    1. Write the job description. Scope, inputs/outputs, tone, red lines, escalation rules.

    2. Ground the model. Implement RAG (and/or MCP-style adapters) to connect to authoritative, access-controlled sources; prefer dynamic grounding over broad fine-tuning where possible.

    3. Build the simulator. Create scripted and seeded scenarios; measure accuracy, coverage, tone, safety; require human sign-offs to graduate stages.

    4. Ship with guardrails. DLP, data masking, content filters and audit trails (see vendor trust layers and responsible-AI standards).

    5. Instrument feedback. In-product flagging, analytics and dashboards; schedule weekly triage.

    6. Review and retrain. Monthly alignment checks, quarterly factual audits and planned model upgrades — with side-by-side A/Bs to prevent regressions.

    In a future where every employee has an AI teammate, the organizations that take onboarding seriously will move faster, safer and with greater purpose. Gen AI doesn’t just need data or compute; it needs guidance, goals, and growth plans. Treating AI systems as teachable, improvable and accountable team members turns hype into habitual value.

    Dhyey Mavani is accelerating generative AI at LinkedIn.

  • Abstract or die: Why AI enterprises can’t afford rigid vector stacks

    Vector databases (DBs), once specialist research instruments, have become widely used infrastructure in just a few years. They power today's semantic search, recommendation engines, anti-fraud measures and gen AI applications across industries. There are a deluge of options: PostgreSQL with pgvector, MySQL HeatWave, DuckDB VSS, SQLite VSS, Pinecone, Weaviate, Milvus and several others.

    The riches of choices sound like a boon to companies. But just beneath, a growing problem looms: Stack instability. New vector DBs appear each quarter, with disparate APIs, indexing schemes and performance trade-offs. Today's ideal choice may look dated or limiting tomorrow.

    To business AI teams, volatility translates into lock-in risks and migration hell. Most projects begin life with lightweight engines like DuckDB or SQLite for prototyping, then move to Postgres, MySQL or a cloud-native service in production. Each switch involves rewriting queries, reshaping pipelines, and slowing down deployments.

    This re-engineering merry-go-round undermines the very speed and agility that AI adoption is supposed to bring.

    Why portability matters now

    Companies have a tricky balancing act:

    • Experiment quickly with minimal overhead, in hopes of trying and getting early value;

    • Scale safely on stable, production-quality infrastructure without months of refactoring;

    • Be nimble in a world where new and better backends arrive nearly every month.

    Without portability, organizations stagnate. They have technical debt from recursive code paths, are hesitant to adopt new technology and cannot move prototypes to production at pace. In effect, the database is a bottleneck rather than an accelerator.

    Portability, or the ability to move underlying infrastructure without re-encoding the application, is ever more a strategic requirement for enterprises rolling out AI at scale.

    Abstraction as infrastructure

    The solution is not to pick the "perfect" vector database (there isn't one), but to change how enterprises think about the problem.

    In software engineering, the adapter pattern provides a stable interface while hiding underlying complexity. Historically, we've seen how this principle reshaped entire industries:

    • ODBC/JDBC gave enterprises a single way to query relational databases, reducing the risk of being tied to Oracle, MySQL or SQL Server;

    • Apache Arrow standardized columnar data formats, so data systems could play nice together;

    • ONNX created a vendor-agnostic format for machine learning (ML) models, bringing TensorFlow, PyTorch, etc. together;

    • Kubernetes abstracted infrastructure details, so workloads could run the same everywhere on clouds;

    • any-llm (Mozilla AI) now makes it possible to have one API across lots of large language model (LLM) vendors, so playing with AI is safer.

    All these abstractions led to adoption by lowering switching costs. They turned broken ecosystems into solid, enterprise-level infrastructure.

    Vector databases are also at the same tipping point.

    The adapter approach to vectors

    Instead of having application code directly bound to some specific vector backend, companies can compile against an abstraction layer that normalizes operations like inserts, queries and filtering.

    This doesn't necessarily eliminate the need to choose a backend; it makes that choice less rigid. Development teams can start with DuckDB or SQLite in the lab, then scale up to Postgres or MySQL for production and ultimately adopt a special-purpose cloud vector DB without having to re-architect the application.

    Open source efforts like Vectorwrap are early examples of this approach, presenting a single Python API to Postgres, MySQL, DuckDB and SQLite. They demonstrate the power of abstraction to accelerate prototyping, reduce lock-in risk and support hybrid architectures employing numerous backends.

    Why businesses should care

    For leaders of data infrastructure and decision-makers for AI, abstraction offers three benefits:

    Speed from prototype to production

    Teams are able to prototype on lightweight local environments and scale without expensive rewrites.

    Reduced vendor risk

    Organizations can adopt new backends as they emerge without long migration projects by decoupling app code from specific databases.

    Hybrid flexibility

    Companies can mix transactional, analytical and specialized vector DBs under one architecture, all behind an aggregated interface.

    The result is data layer agility, and that's more and more the difference between fast and slow companies.

    A broader movement in open source

    What's happening in the vector space is one example of a bigger trend: Open-source abstractions as critical infrastructure.

    • In data formats: Apache Arrow

    • In ML models: ONNX

    • In orchestration: Kubernetes

    • In AI APIs: Any-LLM and other such frameworks

    These projects succeed, not by adding new capability, but by removing friction. They enable enterprises to move more quickly, hedge bets and evolve along with the ecosystem.

    Vector DB adapters continue this legacy, transforming a high-speed, fragmented space into infrastructure that enterprises can truly depend on.

    The future of vector DB portability

    The landscape of vector DBs will not converge anytime soon. Instead, the number of options will grow, and every vendor will tune for different use cases, scale, latency, hybrid search, compliance or cloud platform integration.

    Abstraction becomes strategy in this case. Companies adopting portable approaches will be capable of:

    • Prototyping boldly

    • Deploying in a flexible manner

    • Scaling rapidly to new tech

    It's possible we'll eventually see a "JDBC for vectors," a universal standard that codifies queries and operations across backends. Until then, open-source abstractions are laying the groundwork.

    Conclusion

    Enterprises adopting AI cannot afford to be slowed by database lock-in. As the vector ecosystem evolves, the winners will be those who treat abstraction as infrastructure, building against portable interfaces rather than binding themselves to any single backend.

    The decades-long lesson of software engineering is simple: Standards and abstractions lead to adoption. For vector DBs, that revolution has already begun.

    Mihir Ahuja is an AI/ML engineer and open-source contributor based in San Francisco.

  • Codev lets enterprises avoid vibe coding hangovers with a team of agents that generate and document code

    For many software developers using generative AI, vibe coding is a double-edged sword.

    The process delivers rapid prototypes but often leaves a trail of brittle, undocumented code that creates significant technical debt.

    A new open-source platform, Codev, addresses this by proposing a fundamental shift: treating the natural language conversation with an AI as part of the actual source code.

    Codev is based on SP(IDE)R, a framework designed to turn vibe-coding conversations into structured, versioned, and auditable assets that become part of the code repository.

    What is Codev?

    At its core, Codev is a methodology that treats natural language context as an integral part of the development lifecycle as opposed to a disposable artifact as is the case with vanilla vibe coding.

    According to co-founder Waleed Kadous, the goal is to invert the typical engineering workflow.

    "A key principle of Codev is that documents like the specification are the actual code of the system," he told VentureBeat. "It's almost like natural language is compiled down into Typescript by our agents."

    This approach avoids the common pitfall where documentation is created after the fact, if at all.

    Its flagship protocol, SP(IDE)R, provides a lightweight but formal structure for building software. The process begins with Specify, where a human and multiple AI agents collaborate to turn a high-level request into concrete acceptance criteria. Next, in the Plan stage, an AI proposes a phased implementation, which is again reviewed.

    For each phase, the AI enters an IDE loop: it Implements the code, Defends it against bugs and regression with comprehensive tests, and Evaluates the result against the specification. The final step is Review, where the team documents lessons learned to update and improve the SP(IDE)R protocol itself for future projects.

    The framework’s key differentiator is its use of multiple agents and explicit human review at different stages. Kadous notes that each agent brings unique strengths to the review process.

    "Gemini is extremely good at catching security issues," he said, citing a critical cross-site scripting (XSS) flaw and another bug that "would have shared an OpenAI API key with the client, which could cost thousands of dollars."

    Meanwhile, "GPT-5 is very good at understanding how to simplify a design." This structured review, with a human providing final approval at each stage, prevents the kind of runaway automation that leads to flawed code.

    The platform’s AI-native philosophy extends to its installation. There is no complex installer; instead, a user instructs their AI agent to apply the Codev GitHub repository to set up the project. The developers "dogfooded" their framework, using Codev to build Codev.

    “The key point here is that natural language is executable now, with the agent being the interpreter,” Kadous said. “This is great because it means it's not a ‘blind’ integration of Codev, the agent gets to choose the best way to integrate it and can intelligently make decisions.”

    Codev case study

    To test the framework's effectiveness, its creators ran a direct comparison between vanilla vibe-coding and Codev. They gave Claude Opus 4.1 a request to build a modern web-based todo manager. The first attempt used a conversational, vibe-coding approach. The result was a plausible-looking demo. However, an automated analysis conducted by three independent AI agents found that it had implemented 0% of the required functionality, contained no tests, and lacked a database or API.

    The second attempt used the same AI model and prompt but applied the SP(IDE)R protocol. This time, the AI produced a production-ready application with 32 source files, 100% of the specified functionality, five test suites, a SQLite database, and a complete RESTful API.

    Throughout this process, the human developers reported they never directly edited a single line of source code. While this was a single experiment, Kadous estimates the impact is substantial.

    "Subjectively, it feels like I'm about three times as productive with Codev as without," he says. The quality also speaks for itself. "I used LLMs as a judge, and one of them described the output like what a well-oiled engineering team would produce. That was exactly what I was aiming for."

    While the process is powerful, it redefines the developer's role from a hands-on coder to a system architect and reviewer. According to Kadous, the initial spec and plan stages can each take between 45 minutes to two hours of focused collaboration.

    This is in contrast to the impression given by many vibe-coding platforms, where a single prompt and a few minutes of processing gives you a fully functional and scalable application.

    "All of the value I add is in the background knowledge I apply to the specs and plans," he explains. He emphasizes that the framework is designed to augment, not replace, experienced talent. "The people who will do the best… are senior engineers and above because they know the pitfalls… It just takes the senior engineer you already have and makes them much more productive."

    A future of human and AI collaboration

    Frameworks like Codev signal a shift where the primary creative act of software development moves from writing code to crafting precise, machine-readable specifications and plans. For enterprise teams, this means AI-generated code can become auditable, maintainable, and reliable. By capturing the entire development conversation in version control and enforcing it with CI, the process turns ephemeral chats into durable engineering assets.

    Codev proposes a future where the AI acts not as a chaotic assistant, but as a disciplined collaborator in a structured, human-led workflow.

    However, Kadous acknowledges this shift creates new challenges for the workforce. "Senior engineers that reject AI outright will be outpaced by senior engineers who embrace it," he predicts. He also expresses concern for junior developers who may not get the chance "to build their architectural chops," a skill that becomes even more critical when guiding AI.

    This highlights a central challenge for the industry: ensuring that as AI elevates top performers, it also creates pathways to develop the next generation of talent.