James Ding
Apr 09, 2026 16:48
Notion migrated from Spark on EMR to Ray, cutting embedding costs 80% and improving query latency 10x. Uber and Salesforce shared similar AI infrastructure wins.

Notion has slashed its AI embedding pipeline costs by more than 80% after migrating from Apache Spark to Ray, the distributed computing framework backed by Anyscale. The productivity software company also achieved 10x improvements in query latency while consolidating three separate jobs per region into one.
The migration details emerged at Ray Day Seattle on April 9, 2026, where ML engineers from Notion, Uber, Salesforce, and Apple shared hard-won lessons about scaling AI infrastructure.
What Notion Actually Changed
Mickey Liu, a software engineer on Notion’s search platform team, walked through the overhaul. Their original setup used a three-step Spark pipeline running on Amazon EMR: data chunking, third-party API calls for embedding generation, and writes to a vector store.
The pain points were predictable but severe. Double compute costs. Third-party API rate limits throttling throughput. Debugging nightmares when failures occurred across tools, since driver and executor logs weren’t even persisted in YARN.
The new architecture streams Kafka data directly into a Ray cluster that handles CPU chunking, GPU embedding generation, and vector store writes in a single pipeline. No intermediate S3 handoffs. What started as the backend for a Q&A feature in 2023 now powers all of Notion AI and custom agents.
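The talk described the fused pipeline at the architecture level rather than in code. As a minimal single-process sketch of the pattern, with stub functions standing in for the real stages (the chunker, embedding call, and vector-store write below are all hypothetical; in Notion's system Ray distributes these stages across CPU and GPU workers rather than running them in one loop):

```python
# Single-process sketch of a fused chunk -> embed -> write pipeline.
# All three stage functions are stand-ins for the real distributed stages.

def chunk(doc: str, size: int = 20) -> list[str]:
    """CPU stage: split a document into fixed-size chunks."""
    return [doc[i:i + size] for i in range(0, len(doc), size)]

def embed(chunks: list[str]) -> list[list[float]]:
    """GPU stage (stubbed): map each chunk to a toy 2-d vector."""
    return [[float(len(c)), float(sum(map(ord, c)) % 100)] for c in chunks]

def write_vectors(doc_id: str, vectors: list[list[float]], store: dict) -> None:
    """Sink stage: write embeddings to the vector store (a dict here)."""
    store[doc_id] = vectors

def run_pipeline(docs: dict[str, str]) -> dict:
    store: dict = {}
    for doc_id, text in docs.items():
        # The stages are fused: chunks flow straight into embedding and then
        # into the store, with no intermediate files or S3 handoffs.
        write_vectors(doc_id, embed(chunk(text)), store)
    return store
```

The point of the sketch is the data flow, not the stubs: each document's chunks pass directly between stages in memory, which is the property the Ray pipeline preserves at cluster scale.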
Uber and Salesforce Report Similar Gains
Uber’s Peng Zhang detailed how their Michelangelo ML platform evolved from TensorFlow/Horovod to Ray with PyTorch. The standout move: separating CPU data-loading nodes from GPU training nodes in a heterogeneous cluster design. The result? GPU utilization jumped 20%, and training time dropped roughly 50% in select pipelines.
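The core idea of that split is keeping the expensive resource fed. As a rough single-machine analogy (stdlib only, with stub loader and trainer; real Michelangelo uses separate Ray nodes, not threads), it is a producer/consumer queue where loader workers prefetch batches ahead of the trainer:

```python
import queue
import threading

def run_training(num_batches: int, num_loaders: int = 2) -> list[int]:
    """Toy analogy of the CPU-loader / GPU-trainer split: loader threads
    stand in for CPU data-loading nodes, the consumer loop for the GPU."""
    batches: queue.Queue = queue.Queue(maxsize=4)  # bounded prefetch buffer
    done = object()  # sentinel marking one loader finished

    def loader(start: int, step: int) -> None:
        # "CPU node": load/preprocess batches ahead of the trainer.
        for i in range(start, num_batches, step):
            batches.put([i] * 3)  # fake preprocessed batch
        batches.put(done)

    threads = [threading.Thread(target=loader, args=(k, num_loaders))
               for k in range(num_loaders)]
    for t in threads:
        t.start()

    results, finished = [], 0
    while finished < num_loaders:
        item = batches.get()
        if item is done:
            finished += 1
            continue
        # "GPU node": consume a ready batch; it never stalls on preprocessing
        # as long as the loaders keep the queue non-empty.
        results.append(sum(item))
    for t in threads:
        t.join()
    return results
```

The utilization gain Uber reported comes from exactly this decoupling: the GPU side never idles waiting on data preparation, because loading scales independently on cheaper CPU nodes.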
Salesforce tackled a different beast: summarizing documents up to 200,000 tokens long (roughly a short novel) with P95 latency below 15 seconds. Their team used Ray to chunk documents and run parallel inference across a distributed actor pool with vLLM, then merge the results. They landed on 1-2 GPU data parallelism as the sweet spot after running scaling experiments directly on Ray.
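That chunk, fan out, merge flow is a map-reduce over the document. A hedged sketch of the shape (the chunk summarizer is a trivial stub, and `ThreadPoolExecutor` stands in for the distributed Ray actor pool running vLLM; the merge is a plain join where a real pipeline would likely run a final LLM pass):

```python
from concurrent.futures import ThreadPoolExecutor

def summarize_long_doc(doc: str, chunk_size: int = 100, workers: int = 4) -> str:
    """Chunk -> parallel per-chunk summaries -> merge."""
    chunks = [doc[i:i + chunk_size] for i in range(0, len(doc), chunk_size)]

    def summarize_chunk(chunk: str) -> str:
        # Placeholder for an LLM call: keep the chunk's first sentence.
        return chunk.split(".")[0].strip()

    # Fan out chunk summaries in parallel (order-preserving map), standing in
    # for dispatching chunks to a pool of vLLM-backed Ray actors.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(summarize_chunk, chunks))

    # Merge step: here a simple join of the non-empty partial summaries.
    return " | ".join(p for p in partials if p)
```

Because the per-chunk calls dominate latency, P95 for the whole document tracks the slowest chunk rather than the sum of all chunks, which is what makes the sub-15-second target reachable on long inputs.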
Why This Matters Beyond These Companies
Robert Nishihara, Ray’s co-creator and Anyscale co-founder, opened the event by framing the core problem: AI infrastructure keeps getting harder. Multimodal data processing, reinforcement learning workloads, and multi-node LLM inference are pushing existing tools past their limits.
Every speaker landed on the same conclusion from different angles: their previous tooling ran out of road.
Apple engineers Charlie Chen and Haocheng Bian highlighted foundation model training challenges: massive unstructured data, billion-plus parameters, and sparse architectures like Mixture of Experts. Traditional engines fail because data pipelines and training frameworks run in separate environments with no shared context.
What’s Next
Ray Day Seattle kicked off Anyscale’s 2026 “Ray on the Road” tour: eight cities across three countries. The company is also running invite-only customer roundtables at each stop to preview their product roadmap.
For teams hitting similar walls with Spark or other distributed frameworks, Notion’s full technical writeup is available on their engineering blog under “Two Years of Vector Search at Notion.” The 80% cost reduction and 10x latency improvement offer a concrete benchmark for anyone evaluating similar migrations.
Image source: Shutterstock
