    FlashAttention-4 Hits 1,605 TFLOPS on NVIDIA Blackwell GPUs

    By Crypto Editor, January 23, 2026


    Alvin Lang
    Jan 22, 2026 23:03

    NVIDIA’s FlashAttention-4 achieves 71% hardware efficiency on Blackwell chips, delivering a 3.6x speedup over FA2 for AI training workloads.


    NVIDIA has released FlashAttention-4, the latest version of the FlashAttention kernel for the attention layers of transformer neural networks. It squeezes 1,605 TFLOPS out of the Blackwell architecture, capturing 71% of the hardware’s theoretical maximum performance.
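    The 71% figure is a ratio of achieved to peak throughput, so the two numbers together imply the peak the comparison was made against. Below is a minimal sketch of that arithmetic. The function name is illustrative. Because "71%" is itself rounded, the implied peak is only approximate, so the sketch also brackets it.

```python
# Back-of-the-envelope check of the headline efficiency figure.
# Inputs come from the article; the implied peak is derived here, not a vendor spec.

def implied_peak_tflops(achieved_tflops: float, utilization: float) -> float:
    """Peak throughput implied by an achieved rate and a utilization fraction."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    return achieved_tflops / utilization

achieved = 1605.0   # TFLOPS reported for FA4 on Blackwell
utilization = 0.71  # "71% of theoretical maximum" (already rounded)

peak = implied_peak_tflops(achieved, utilization)
# Any value that rounds to 71% lies in [0.705, 0.715), so bracket the estimate.
low = implied_peak_tflops(achieved, 0.715)
high = implied_peak_tflops(achieved, 0.705)

print(f"implied peak ~ {peak:,.0f} TFLOPS")         # implied peak ~ 2,261 TFLOPS
print(f"range: {low:,.0f} to {high:,.0f} TFLOPS")   # range: 2,245 to 2,277 TFLOPS
```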

    The announcement matters for anyone watching AI infrastructure investments. As large language models push toward longer context windows, the attention mechanism’s quadratic memory complexity becomes a brutal bottleneck. FlashAttention-4 attacks this problem directly, and the benchmark numbers suggest meaningful gains for production AI workloads.

    What the Numbers Show

    On the B200 GPU, FA4 delivers a 3.6x speedup over FlashAttention-2 in forward passes at a sequence length of 32,768. In the backward pass, it runs 3.15x faster than FA2 under the same conditions. Against existing frameworks, FA4 posts a 1.3x improvement over cuDNN and a 2.4x improvement over Triton Inference Server implementations.

    The memory-efficiency gains are equally significant. Standard attention materializes the full N×N matrix of attention scores, so its memory scales as O(N²) with sequence length N. Doubling your context window therefore quadruples memory requirements. FA4 brings this down to O(N) through tiling and incremental (online) softmax normalization. NVIDIA claims 20x lower memory usage compared with PyTorch baselines.
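    The tiling-plus-incremental-softmax idea can be sketched in a few lines. This is the general FlashAttention technique, written in pure Python for a single query row. It is not FA4's Blackwell kernel, and the function names and block size are illustrative.

    The naive version builds the whole score vector before normalizing. The tiled version visits keys and values block by block and keeps only three things: a running maximum, a running normalizer, and a running weighted sum. Each time a new block raises the maximum, it rescales those running statistics. Across all N queries, this replaces the N×N score matrix with O(N) per-query state.

```python
import math
import random

def naive_attention_row(q, keys, values):
    """Reference: builds all N scores for one query, then normalizes."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    d_v = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) / total for j in range(d_v)]

def tiled_attention_row(q, keys, values, block=4):
    """Online softmax: process keys/values in blocks without storing all scores."""
    scale = 1.0 / math.sqrt(len(q))
    d_v = len(values[0])
    running_max = -math.inf  # largest score seen so far
    denom = 0.0              # sum of exp(score - running_max)
    acc = [0.0] * d_v        # sum of exp(score - running_max) * value
    for start in range(0, len(keys), block):
        k_blk = keys[start:start + block]
        v_blk = values[start:start + block]
        s_blk = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in k_blk]
        new_max = max(running_max, max(s_blk))
        # Rescale earlier statistics to the new maximum (exp(-inf) == 0.0 on the first block).
        correction = math.exp(running_max - new_max)
        denom *= correction
        acc = [a * correction for a in acc]
        for s, v in zip(s_blk, v_blk):
            p = math.exp(s - new_max)
            denom += p
            acc = [a + p * vj for a, vj in zip(acc, v)]
        running_max = new_max
    return [a / denom for a in acc]

# Usage: both versions agree up to floating-point rounding.
random.seed(0)
d, n = 8, 37  # n is not a multiple of the block size, so the last block is partial
q = [random.gauss(0, 1) for _ in range(d)]
keys = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
values = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
ref = naive_attention_row(q, keys, values)
out = tiled_attention_row(q, keys, values, block=5)
print(max(abs(a - b) for a, b in zip(ref, out)))
```

    Real kernels apply the same recurrence to whole blocks of queries at once and keep the running statistics in fast on-chip memory.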

    Hardware-Software Co-Design

    FA4 was built specifically for Blackwell’s quirks. The architecture presents an asymmetric scaling problem: compute power roughly doubles, while memory bandwidth doesn't keep pace. Traditional approaches leave tensor cores sitting idle while they wait for data.
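    The asymmetry can be illustrated with the roofline model. It caps a kernel's throughput at whichever is lower: peak compute, or arithmetic intensity (FLOPs per byte moved) times memory bandwidth. Below is a minimal sketch. The two GPU generations and all their numbers are hypothetical, chosen only to show the effect; they are not B200 or any other product's specifications.

```python
def attainable_tflops(intensity_flop_per_byte: float, peak_tflops: float,
                      bandwidth_tb_s: float) -> float:
    """Roofline model: throughput is capped by peak compute or by memory feed rate.
    Units: FLOP/byte x TB/s = TFLOP/s."""
    return min(peak_tflops, intensity_flop_per_byte * bandwidth_tb_s)

def ridge_point(peak_tflops: float, bandwidth_tb_s: float) -> float:
    """Arithmetic intensity (FLOP/byte) a kernel needs to become compute-bound."""
    return peak_tflops / bandwidth_tb_s

# Hypothetical generations: compute doubles, bandwidth grows only 25%.
old_gen = {"peak_tflops": 1000.0, "bandwidth_tb_s": 4.0}
new_gen = {"peak_tflops": 2000.0, "bandwidth_tb_s": 5.0}

kernel_intensity = 300.0  # FLOP per byte for a hypothetical kernel

for name, gpu in (("old", old_gen), ("new", new_gen)):
    got = attainable_tflops(kernel_intensity, **gpu)
    print(name, ridge_point(**gpu), got, f"{got / gpu['peak_tflops']:.0%} of peak")
```

    With these made-up numbers, the kernel is compute-bound on the old part but memory-bound on the new one. Doubling peak compute buys only 1.5x, and a quarter of the new compute sits idle. Keeping intermediates on chip, which raises the effective intensity, is one way to close that gap.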

    The solution leverages Blackwell’s dedicated Tensor Memory (TMEM), 256 KB of on-chip memory per streaming multiprocessor. By storing intermediate results directly in TMEM instead of shared memory, FA4 sidesteps the bandwidth bottleneck that would otherwise throttle the faster compute units.

    Larger tile sizes (up to 128×128) and deeper pipelines keep the hardware busy. The backward pass, typically the slower half of training, benefits from bypassing register accumulation entirely.
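    A quick size calculation shows why 128×128 tiles fit comfortably on chip while the full score matrix never could. This is illustrative budget arithmetic, not FA4's actual TMEM layout. It makes three assumptions:
    - "256 KB" means 256 KiB.
    - Intermediates are stored as 4-byte fp32 values.
    - The head dimension is a hypothetical 128.

```python
# Illustrative on-chip budget arithmetic (assumptions: 256 KB = 256 KiB, fp32 storage).
TMEM_BYTES_PER_SM = 256 * 1024
FP32 = 4  # bytes per element

def tile_bytes(rows: int, cols: int, bytes_per_elem: int) -> int:
    """Storage needed for a rows x cols tile of elements of the given width."""
    return rows * cols * bytes_per_elem

score_tile = tile_bytes(128, 128, FP32)  # one 128x128 block of attention scores
out_tile = tile_bytes(128, 128, FP32)    # output accumulator, hypothetical head_dim = 128
on_chip = score_tile + out_tile

# For contrast: the whole score matrix for one head at N = 32,768.
full_scores = tile_bytes(32_768, 32_768, FP32)

print(score_tile // 1024, "KiB per tile")                # 64 KiB per tile
print(on_chip / TMEM_BYTES_PER_SM, "of TMEM")            # 0.5 of TMEM
print(full_scores // 2**30, "GiB for the full matrix")   # 4 GiB for the full matrix
```

    Under these assumptions, a score tile plus an output accumulator uses half of one SM's TMEM. The untiled score matrix for a single head would need 4 GiB.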

    Production Integration

    Major inference frameworks, including SGLang and vLLM, already support FA4 for prefill operations. NVIDIA has incorporated these techniques into cuDNN 9.14, making the optimizations available to developers without custom kernel work.

    For AI companies burning through compute budgets, the efficiency gains translate directly into cost savings. A 3x+ speedup on attention's forward and backward passes means either faster iteration cycles or the ability to train larger models within existing infrastructure constraints. The end-to-end gain depends on how much of each training step attention accounts for.
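    Because the 3x+ figures apply to the attention kernel rather than to a whole training step, Amdahl's law determines how they carry over end to end. Below is a minimal sketch. The 3.6x kernel speedup is the article's forward-pass figure. The attention fractions in the loop are hypothetical, since the real share varies by model, sequence length, and hardware.

```python
def end_to_end_speedup(attention_fraction: float, kernel_speedup: float) -> float:
    """Amdahl's law: only the attention share of step time gets faster."""
    if not 0.0 <= attention_fraction <= 1.0:
        raise ValueError("attention_fraction must be between 0 and 1")
    if kernel_speedup <= 0:
        raise ValueError("kernel_speedup must be positive")
    return 1.0 / ((1.0 - attention_fraction) + attention_fraction / kernel_speedup)

# Hypothetical shares of step time spent in attention.
for frac in (0.2, 0.5, 0.8):
    print(f"attention = {frac:.0%} of step -> {end_to_end_speedup(frac, 3.6):.2f}x overall")
```

    Long-context workloads, where attention dominates, are the ones that get closest to the kernel-level number.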

    The broader trend here: as transformer models grow, algorithmic efficiency at the kernel level becomes as important as raw hardware capability. FlashAttention-4 represents the current frontier of that optimization work.

    Image source: Shutterstock



