AMD’s beastly ‘Strix Halo’ Ryzen AI Max+ debuts with radical new memory tech to feed RDNA 3.5 graphics and Zen 5 CPU cores

AMD Strix Halo Ryzen AI Max
(Image credit: AMD)

AMD announced its ‘Strix Halo’ Ryzen AI Max series of laptop processors here at CES 2025 in Las Vegas, and by any definition, the new APUs look to be absolute monsters for the enthusiast thin-and-light gaming and AI workstation laptop markets. AMD bills them as having the fastest integrated graphics available in the Windows ecosystem, courtesy of a disruptive new integrated memory architecture that unlocks new capabilities and a beastly 40-core RDNA 3.5 integrated GPU.

In fact, AMD says the AI Max chips deliver up to 1.4X faster gaming performance than Intel’s flagship ‘Lunar Lake’ Core Ultra 9 288V, and up to 84% faster rendering performance than the Apple MacBook M4 Pro. AMD also says the chip can deliver up to an incredible 2.2X more performance in AI workloads than the discrete desktop Nvidia RTX 4090 GPU, but at an 87% lower TDP.

If AMD keeps with tradition, which we fully expect, we will see these monstrous APU chips come to desktop PCs in the future.

Ryzen AI Max Specifications

The fire-breathing 120W Zen 5-powered flagship Ryzen AI Max+ 395 comes packing 16 CPU cores and 32 threads paired with 40 RDNA 3.5 (Radeon 8060S) integrated graphics cores (CUs), but perhaps more importantly, it supports up to 128GB of memory that is shared among the CPU, GPU, and XDNA 2 NPU AI engines. A portion of the memory can also be carved off into a distinct pool dedicated to the GPU alone, and the interface delivers an astounding 256 GB/s of memory throughput that unlocks incredible performance in memory capacity-constrained AI workloads (details below). AMD says this delivers groundbreaking capabilities for thin-and-light laptops and mini workstations, particularly in AI workloads. The company also shared plenty of gaming and content creation benchmarks.

| Processor | Cores / Threads | Boost Clock | Cache (total) | GPU Cores | cTDP | Peak TOPS |
|---|---|---|---|---|---|---|
| Ryzen AI Max+ 395 / PRO | 16 / 32 | 5.1 GHz | 80 MB | 40 CU | 45-120W | 50 |
| Ryzen AI Max 390 / PRO | 12 / 24 | 5.0 GHz | 76 MB | 32 CU | 45-120W | 50 |
| Ryzen AI Max 385 / PRO | 8 / 16 | 5.0 GHz | 40 MB | 32 CU | 45-120W | 50 |
| Ryzen AI Max 380 PRO | 6 / 12 | 4.9 GHz | 22 MB | 16 CU | 45-120W | 50 |

As designated by the ‘+,’ the Ryzen AI Max+ 395 serves as the flagship of the new series, but it’s accompanied by three other processors in varying configurations ranging from a 6-core 12-thread / 16 GPU core Ryzen AI Max Pro 380 to the 12-core 24-thread / 32 GPU core Ryzen AI Max 390. The Ryzen AI Max 390 and 385 are armed with the Radeon 8050S graphics engine. AMD also has an unlisted 16-core 16-CU model for OEMs.

All of the AI Max chips have a 55W base TDP, but also a configurable TDP that ranges from 45 to 120W to unleash more horsepower in designs that can handle the thermal output. The top three models will come in both standard and Pro series variants, with the latter including a RAS feature set for professional users. AMD plans to release these processors across Q1 and Q2 of this year.

(Image credit: AMD)

As you can see above, the chip has a large central I/O die that houses the GPU and NPU, while the two smaller dies contain the CPU cores. The chip connects to standard memory, but you can use the Radeon Adrenalin software to carve out a custom memory allocation dedicated to the GPU only.

For instance, if you have 128GB of total system memory, up to 96GB can be allocated to the GPU alone, with the remaining 32GB dedicated to the CPU. The GPU can still read from the entire 128GB of memory, thus eliminating costly memory copies via the unified coherent memory architecture, but it can only write to its directly allocated 96GB pool. The combination of a shared memory pool and a dedicated memory pool, which acts as a slower VRAM of sorts, enables faster performance in a range of workloads, including disruptive performance in the AI workloads that we’ll outline below. But first, gaming and content creation performance.
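As a back-of-the-envelope illustration of how the memory split and the 256 GB/s figure fit together, the arithmetic below assumes a 256-bit LPDDR5X-8000 interface. That bus width and speed are speculation consistent with AMD's stated bandwidth, not confirmed specifications:

```python
# Sanity-check AMD's 256 GB/s bandwidth figure, assuming a 256-bit
# LPDDR5X-8000 interface (an unconfirmed assumption, not an AMD spec).
bus_width_bits = 256
transfer_rate_mts = 8000  # mega-transfers per second

# bytes per transfer * transfers per second, expressed in GB/s
bandwidth_gbs = (bus_width_bits / 8) * transfer_rate_mts / 1000
print(f"Peak bandwidth: {bandwidth_gbs:.0f} GB/s")  # 256 GB/s

# The 128GB unified pool can be partitioned, e.g. 96GB GPU-only.
total_gb = 128
gpu_alloc_gb = 96
cpu_gb = total_gb - gpu_alloc_gb
print(f"GPU-dedicated: {gpu_alloc_gb} GB, CPU: {cpu_gb} GB")
```

If the bus were instead paired with LPDDR5X-8533, the same math would yield roughly 273 GB/s, which is why the 256 GB/s figure points toward 8000 MT/s parts.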

Ryzen AI Max+ gaming and content creation benchmarks

The Strix Halo chips address both the enthusiast gaming and workstation markets, so performance in gaming is key. AMD claims the Ryzen AI Max+ is 1.4X faster than Intel’s flagship Lunar Lake Core Ultra 9 288V, but the company didn’t share real-world gaming benchmarks. Instead, AMD shared synthetic 3DMark tests that often don’t correlate well with real-world gaming performance. We won’t know more about real-world gaming performance until Ryzen AI Max hits the market, and as always, you should approach vendor-provided benchmarks with the appropriate skepticism (we’ve included AMD’s test notes at the end of the article).

AMD also included numerous rendering benchmarks of its 16-core flagship against the 12-core Apple MacBook M4 Pro, claiming an up to 86% advantage in a V-Ray workload. Naturally, the 14-core M4 Pro, also included in the benchmarks, is more competitive, but AMD still holds a stout lead in the Blender, Corona, and V-Ray selection of benchmarks. However, the Ryzen AI Max+ isn’t as performant in the multi-threaded Cinebench 2024 test, beating the 12-core M4 Pro by a scant 2% and trailing the 14-core M4 Pro by 3%.

AMD also took another swipe at Intel’s Lunar Lake Core Ultra 9 288V flagship, showing massive 3D rendering performance advantages of 2.6X faster on average, with advantages ranging from 340% to 402% in this selection of benchmarks. The 288V is geared for lower-power laptops, so comparisons against Intel’s newly launched Core Ultra 200H and 200HX series, the latter typically paired with a dGPU, will be interesting.

AMD used a reference motherboard for its tests (a test board that typically has nearly unlimited cooling capacity), so these results won’t reflect the performance you can expect in a thermally constrained laptop. In contrast, the Apple and Intel benchmarks were generated with shipping laptops. AMD lists the tested TDP at a base of 55W, but power consumption can stretch much higher during heavy loads, especially with beefy cooling on a reference motherboard. This can lead to lopsided comparisons – for instance, the 288V has a 37W max TDP. All of the Ryzen AI Max configs were tested with only 32GB of RAM for these benchmarks.

Ryzen AI performance and systems

AMD also shared some rather impressive results showing a Llama 70B Nemotron LLM AI model running on both the Ryzen AI Max+ 395 with 128GB of total system RAM (32GB for the CPU, 96GB allocated to the GPU) and a desktop Nvidia GeForce RTX 4090 with 24GB of VRAM (details of the setups in the slide below). AMD says the AI Max+ 395 delivers up to 2.2X the tokens/second performance of the desktop RTX 4090 card, but the company didn’t share time-to-first-token benchmarks.

Perhaps more importantly, AMD claims to do this at an 87% lower TDP than the 450W RTX 4090, with the AI Max+ running at a mere 55W. That implies that systems built on this platform will have exceptional power efficiency metrics in AI workloads. The Ryzen AI Max models all have an integrated XDNA 2 NPU AI engine capable of delivering up to 50 TOPS of performance, but AMD didn’t share benchmarks for the NPU.
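To put those efficiency claims in perspective, the quick arithmetic below combines AMD's claimed 2.2X tokens/second advantage with the stated TDPs. These are vendor-supplied figures, not measured results:

```python
# Illustrative arithmetic on AMD's claimed figures (vendor numbers,
# not independently verified): 2.2X the tokens/s of an RTX 4090
# while running at 55W versus the 4090's 450W.
speedup = 2.2
rtx4090_tdp_w = 450
ai_max_tdp_w = 55

tdp_reduction = 1 - ai_max_tdp_w / rtx4090_tdp_w
perf_per_watt_gain = speedup * rtx4090_tdp_w / ai_max_tdp_w

print(f"TDP reduction: {tdp_reduction:.0%}")  # 88% (AMD rounds to 87%)
print(f"Tokens/s-per-watt advantage: {perf_per_watt_gain:.0f}X")  # 18X
```

In other words, if AMD's numbers hold up, the platform would deliver on the order of 18X the tokens per watt of the desktop card, which is the basis for the "exceptional power efficiency" framing.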

Speaking of systems, AMD’s partners will have systems built around the AI Max chips coming to market this year. HP has a ZBook Ultra G1 compact workstation and an HP Z2 mini G1a workstation, but the release date for those systems is undefined. Additionally, Asus has its gaming tablet with a detachable keyboard, the ROG Flow Z13, coming to market in Q1 of this year. That implies a hefty amount of gaming performance packed into a very slim form factor. Here’s hoping we can find these at retail, as AMD’s flagship gaming chips have often suffered from poor availability.

AMD Ryzen AI Max

(Image credit: AMD)
Paul Alcorn
Managing Editor: News and Emerging Tech

Paul Alcorn is the Managing Editor: News and Emerging Tech for Tom's Hardware US. He also writes news and reviews on CPUs, storage, and enterprise hardware.

  • bit_user
    Uh, so what memory does it actually use? I'm guessing LPDDR5X @ 256-bit data width. Unless my math is wrong, a nominal bandwidth of 256 GB/s suggests LPDDR5X-8000?

    The total cache suggests they're using the same amount of L3 per core as their desktop/server CCDs. That makes me wonder if they're also using full 512-bit vector pipelines.

    The last big question I have (for now, at least) is what process node?
    Reply
  • bit_user
    The article said:
    if you have 128GB of total system memory, up to 96GB can be allocated to the GPU alone, with the remaining 32GB dedicated to the CPU. However, the GPU can still read from the entire 128 GB memory, thus eliminating costly memory copies via its unified coherent memory architecture.
    You'd think so, but this actually requires software to be written accordingly. For games, Microsoft only somewhat recently introduced a feature in Direct3D for the app/game to share memory with the GPU. Unless the programmer specifically uses this API feature, games will still be copying assets into the GPU's memory segment, in spite of the fact that they're merely logical partitions within the same physical memory.
    https://www.tomshardware.com/news/dx12-optimization-cpu-gpu-access-vram-simultaneously
    @JarredWaltonGPU , perhaps reach out to AMD to see if they know which games do this, since it might make for some interesting benchmarks.
    Reply
  • SocDriver
It's great to see that this will be on mini PCs from various OEMs, but I would love to see this on an ITX motherboard with 3 or 4 M.2 slots and a 2.5Gb Ethernet port. I can think of a few small sub-4L cases I would love to put this in and use as a portable computer.
    Reply
  • Jame5
    CAMM2 support? 128GB(+?) support of user replaceable memory is ideal.

    Weird that they dropped from 192GB max support to 128GB on this though. I guess it has to do with the new IMC.
    Reply
  • thestryker
    bit_user said:
    The total cache suggests they're using the same amount of L3 per core as their desktop/server CCDs.
    All but the 6-core seem to share the desktop configuration. I'm curious if there's any cache dedicated to the GPU, as they don't really say one way or the other.

    Product pages (other than 6 core) are up:
    https://www.amd.com/en/products/processors/laptop/ryzen/300-series/amd-ryzen-ai-max-plus-395.html
    Reply
  • bit_user
    Jame5 said:
    CAMM2 support? 128GB(+?) support of user replaceable memory is ideal.

    Weird that they dropped from 192GB max support to 128GB on this though. I guess it has to do with the new IMC.
    It really depends on whether they support dual rank memory. If so, then 32 Gb chips should enable 256 GB. However, if that were the case, then they should already support 192 GB. So, maybe the 128 GB figure is already anticipating the 32 Gb chips.
    Reply
  • JamesJones44
    I find it interesting AMD didn't compare apples to apples when it comes to Apple Silicon. It would have made more sense to compare the 16-core/40-GPU-core M4 Max to the AI Max+ 395 since they have the same basic config (16 compute/40 GPU). I'll be curious to see the independent reviews to see how close it comes to Apple's upper-end offering.
    Reply
  • JamesJones44
    AMD's marketing department went all out with Max, Pro, and +! All they needed to do was throw in the words Ultra and Extreme to cover all the upper-end product naming clichés.
    Reply
  • Notton
    1.4x gaming performance compared to the C9U 288V seems like a conservative estimate?
    It's >228% faster in the 3DMark benchmarks.

    The 256GB/s memory bandwidth is better than expected for a 256-bit bus, and at least they didn't go for the measly 6400.
    I hope OEMs decide to pack in LPDDR5X-10700, although 8533 would be fine too.
    Reply
  • usertests
    bit_user said:
    Unless my math is wrong, a nominal bandwidth of 256 GB/s suggests LPDDR5X-8000?
    If so they pulled back (or never had) LPDDR5X-8533 (273 GB/s).
    Notton said:
    1.4x gaming performance compared to the C9U 288V seems like a conservative estimate?
    It's >228% faster in the 3DMark benchmarks.

    The 256GB/s memory bandwidth is better than expected for a 256-bit bus, and at least they didn't go for the measly 6400.
    I hope OEMs decide to pack in LPDDR5X-10700, although 8533 would be fine too.
    128% faster, 2.28x, not 228%. But yes that is strange. They don't want to talk about Strix Halo gaming, maybe because it will be irrelevant compared to the AI use case.

    Some leaks said LPDDR5X-8000 which appears to be correct, some said 8533. Could they even use higher than that?
    bit_user said:
    The total cache suggests they're using the same amount of L3 per core as their desktop/server CCDs.

    The last big question I have (for now, at least) is what process node?
    Strix Halo should be using desktop Zen 5 chiplets. It could even get X3D in the future.
    thestryker said:
    I'm curious if there's any cache dedicated to the GPU as they don't really say one way or the other.
    Leaks said 32 MiB Infinity Cache. Now we wait for more details.
    Reply