Microsoft-backed chip startup d-Matrix has raised $275 million to tackle the AI inference bottleneck with its digital in-memory computing architecture, which the company says lowers the cost of running generative AI and reduces data center energy consumption across enterprise deployments.

A single AI chatbot can burn through seven figures daily just to run. That’s the hidden crisis facing enterprises deploying large language models at scale. While artificial intelligence training grabs headlines, the real economic killer is inference: the moment trained models generate responses for millions of simultaneous users.

d-Matrix, a Santa Clara-based AI chip startup, just raised $275 million to solve this trillion-dollar infrastructure nightmare that nobody’s talking about. The Series C funding round, announced on November 12, 2025, values the Microsoft-backed company at $2 billion and brings total capital to $450 million.

Co-leads Bullhound Capital, Triatomic Capital, and Singapore’s sovereign wealth fund Temasek backed the oversubscribed round. Yet the real story isn’t the funding. It’s the problem d-Matrix identified six years ago, one the entire AI industry is now wrestling with.

The Memory Wall, AI’s Dirty Secret

Today’s graphics processors, led by Nvidia’s dominance, separate compute from memory. That means data travels constantly between processing cores and memory banks, consuming enormous energy and creating bottlenecks that make AI inference inefficient at scale.

It’s like asking a construction crew to haul materials back and forth between distant warehouses rather than keeping supplies on-site. The result? Wasted power, inflated costs, and an infrastructure crisis nobody’s ready for.

Market analysts expect data center power demand to grow by 165 percent by 2030, driven largely by AI. That’s a problem for hyperscalers drowning in electricity bills, and it’s the reason d-Matrix’s technology matters. The startup’s co-founders, chief executive officer Sid Sheth and Sudeep Bhoja, predicted inference would become the dominant expense long before ChatGPT proved them right.

The d-Matrix Solution: Moving Compute Into Memory

Here’s where d-Matrix’s innovation changes the game. The company developed Digital In-Memory Compute, or DIMC, a breakthrough architecture that embeds processing directly into memory itself instead of shuttling data between separate components.

It’s like building the warehouse near the construction site. Suddenly, work accelerates and waste disappears.

The technology powers Corsair, d-Matrix’s flagship AI inference accelerator designed for data centers. According to the company, Corsair delivers up to 10 times faster performance, 3 times lower costs, and 5 times better energy efficiency compared to GPU systems.

The card features two custom chips with 1 gigabyte of ultra-fast SRAM each, a type of memory typically reserved for processor caches. d-Matrix engineered the chips to perform vector-matrix multiplications, the mathematical backbone of AI inference. The card delivers up to 9,600 trillion calculations per second when using compressed data formats, which also reduce power consumption.
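To make the idea concrete, here is a minimal Python sketch of a vector-matrix multiplication carried out on compressed integer values. It is an illustration only: the 4-bit width, the single shared scale per block, and the function names `quantize_block`, `vecmat`, and `quantized_vecmat` are assumptions chosen for this example, not a description of d-Matrix’s actual number formats or hardware.

```python
def quantize_block(values, bits=4):
    """Map floats to signed integers that share one scale factor.

    Hypothetical scheme for illustration: the largest magnitude in the
    block is mapped to the largest representable integer.
    """
    qmax = 2 ** (bits - 1) - 1          # 7 for signed 4-bit values
    peak = max(abs(v) for v in values)
    scale = peak / qmax if peak else 1.0
    return [round(v / scale) for v in values], scale


def vecmat(vector, matrix):
    """Full-precision reference: row vector (length n) times an n x m matrix."""
    return [sum(v * row[j] for v, row in zip(vector, matrix))
            for j in range(len(matrix[0]))]


def quantized_vecmat(vector, matrix, bits=4):
    """Same product, but the multiply-accumulate runs on small integers."""
    q_vec, s_vec = quantize_block(vector, bits)
    result = []
    for column in zip(*matrix):         # one output element per matrix column
        q_col, s_col = quantize_block(column, bits)
        acc = sum(a * b for a, b in zip(q_vec, q_col))   # integer arithmetic only
        result.append(acc * s_vec * s_col)               # rescale once at the end
    return result


if __name__ == "__main__":
    x = [7.0, -3.0, 1.0]
    w = [[1.0, 2.0], [3.0, -7.0], [-7.0, 0.0]]
    print("reference:", vecmat(x, w))
    print("quantized:", quantized_vecmat(x, w))
```

The point of the sketch is that the inner loop touches only small integers, so each weight occupies fewer bits and less data has to be stored and moved. The price is rounding error whenever values do not land exactly on the integer grid.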

Corsair’s architecture uses eight chiplets built on a 6-nanometer manufacturing process, connected via ultra-low-latency pathways that deliver eight terabytes per second of die-to-die bandwidth.

The card’s on-chip memory provides roughly 150 terabytes per second of memory bandwidth, which d-Matrix describes as an order of magnitude higher than the HBM memory used on Nvidia GPUs. When you multiply those gains across data center deployments, the economics become transformative. d-Matrix claims enterprises save millions annually on power and hardware through its platform.
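Why memory bandwidth matters so much can be shown with a back-of-the-envelope estimate. When a model generates one token at a time, every weight has to be read once per token, so bandwidth divided by the size of the weights gives a ceiling on tokens per second. The sketch below applies that rule of thumb. Only the 150 terabytes per second figure comes from the article; the 1 gigabyte model size, the 3 terabytes per second comparison value, and the function name `max_tokens_per_second` are hypothetical placeholders, and the estimate ignores compute limits, batching, and whether the model fits in memory at all.

```python
def max_tokens_per_second(weight_bytes, bandwidth_bytes_per_s):
    """Upper bound on single-stream decode speed when memory-bound.

    Assumes every weight byte is read exactly once per generated token.
    """
    return bandwidth_bytes_per_s / weight_bytes


if __name__ == "__main__":
    WEIGHT_BYTES = 1e9        # hypothetical model: 1 GB of weights
    SRAM_BANDWIDTH = 150e12   # 150 TB/s, the figure quoted in the article
    OTHER_BANDWIDTH = 3e12    # hypothetical placeholder for an off-chip memory system

    fast = max_tokens_per_second(WEIGHT_BYTES, SRAM_BANDWIDTH)
    slow = max_tokens_per_second(WEIGHT_BYTES, OTHER_BANDWIDTH)
    print(f"in-memory ceiling: {fast:,.0f} tokens/s")
    print(f"comparison ceiling: {slow:,.0f} tokens/s")
    print(f"ratio: {fast / slow:.0f}x")
```

Real systems land well below these ceilings, but the ratio between the two follows bandwidth directly, which is the argument d-Matrix is making.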

Why Timing Is Everything for AI Inference

The funding round signals a profound market shift. From 2022 to October 2024, inference costs fell roughly 280-fold, a stunning price compression driven by competition and innovation. But here’s the paradox: while end-user API prices collapse, infrastructure costs explode.

Running models at production scale on GPU-based systems consumes massive amounts of electricity and requires significant capital investment. By 2026, analysts project inference budgets will surpass training budgets across the entire industry. That’s the inflection point d-Matrix is designed for.

Jeff Huber, a general partner at Triatomic Capital, said, “AI inference is becoming the dominant cost in production AI systems, and d-Matrix has cracked the code on delivering both performance and sustainable economics at scale.”

The investor consortium behind this round, spanning Europe, North America, Asia, and the Middle East, isn’t betting on hype. These are deep-tech specialists recognizing that efficiency and scalability win in the next era of AI competition.

The Microsoft Endorsement That Matters

Microsoft’s M12 venture fund participated in this round, signaling that even tech giants are hedging their infrastructure bets.

Michael Stewart, managing partner at M12, said, “d-Matrix is the first AI chip startup to address contemporary unit economics in LLM inference with differentiated elements in the in-memory product architecture that will sustain the TCO benefits with leading latency and throughput.”

Microsoft sees d-Matrix as essential infrastructure for Azure’s future competitiveness.

Additionally, d-Matrix already ships through partners like Super Micro Computer, integrating Corsair cards into enterprise servers. The company announced a reference architecture called SquadRack alongside Arista, Broadcom, and Supermicro, establishing an ecosystem that challenges Nvidia’s grip on AI infrastructure.

With 250 employees spread across Santa Clara, Toronto, Sydney, Bangalore, and Belgrade, d-Matrix operates at scale for a startup.

The 3D Memory and Global Expansion

This capital accelerates d-Matrix’s product roadmap. The company plans to launch Raptor, a next-generation accelerator that stacks RAM directly onto compute modules in 3D configurations, pushing efficiency even further by reducing data travel distance.

The Aviator software stack optimizes AI workloads from edge to cloud, while JetStream networking accelerators link Corsair-equipped servers into seamless inference clusters.

The funding enables global expansion and large-scale customer pilots. Early adopters remain undisclosed, but industry observers expect Azure, major cloud providers, and enterprise AI factories to become primary users.

Goldman Sachs estimates the global AI chip market will reach $200 billion annually by 2027. In that landscape, companies solving inference efficiency don’t just capture market share; they become foundational infrastructure.

The Inference Economy Is Upon Us

The Age of AI Inference has arrived. Companies currently sinking millions into GPU clusters to run models will face a reckoning, as they’re using 1990s-era computer architectures to solve 2020s-era problems.

d-Matrix’s $2 billion valuation reflects investors’ conviction that the future of AI infrastructure belongs to startups solving inference, not training. With $450 million in funding and backing from Microsoft, sovereign wealth funds, and silicon-savvy venture firms, d-Matrix now has the resources to scale globally.

The question isn’t whether enterprise AI will thrive; it’s who controls the infrastructure powering it. d-Matrix bet everything on inference six years ago. Now the market is proving the company right, and this funding round is just the beginning.
