I have been frustrated with the limited number of PCIe lanes in mainstream desktop platforms for quite some time, so I decided to do something about it. Having a limited number of PCIe lanes is not a big consideration for many workloads, but it is a big bottleneck in some scenarios.
This post will recount my experience putting together a workstation (WS) machine using an AMD Ryzen Threadripper PRO 5955WX processor. This is about building the StorageBeast.
Here is a video of StorageBeast in action:
What Am I Trying to Do?
My very narrow use case is simply trying to see how fast I can get SQL Server 2022 to do a full database backup of a 2TB database, using different backup methods and settings. This relates to real-world problems backing up large databases, but what I am doing with this research is not what I would ever do with a Production database.
For example, I am using consumer-grade TLC NAND storage in an HEDT/WS class platform with no RAID protection, non-ECC RAM and no hardware redundancy. I am also willing to configure the database and the system as needed to optimize storage and backup performance as the first priority.
BTW, I spent some time trying to push the storage limits of a mainstream desktop platform late last year.
A modern desktop system needs eight main components. This is true whether you are building a gaming rig or something more like what I am building. These are the main components:
- CPU cooler
- Power supply
Your CPU choice drives many of your other component choices, so you need to choose an appropriate CPU first. In my case, I decided to use an AMD Ryzen Threadripper PRO 5955WX processor. This is a lower-end SKU with only 16C/32T, but it still has the full feature set of the AMD Ryzen Threadripper PRO 5000 series. This includes support for 128 PCIe 4.0 lanes, which was crucial for my project.
The 7nm Zen 3 Threadripper PRO 5955WX has a base clock of 4.0 GHz and a max boost clock of up to 4.5 GHz, along with a 64MB L3 cache. It has eight memory channels that officially support up to 256GB of DDR4-3200 RAM, and it does NOT have integrated graphics. This means you must have a discrete GPU.
The processor also does not come with a boxed CPU cooler, so you will need an aftermarket CPU cooler that works with a Threadripper processor.
The Ryzen Threadripper PRO 5000 series was launched in March 2022, but it was OEM exclusive until August 2022. This processor requires an sWRX8 socket motherboard, which dictates your motherboard choice.
Threadripper PRO processors are huge compared to modern mainstream desktop processors. They also have a default TDP rating of 280W, so you will need a pretty decent CPU cooler that covers both of those requirements.
I decided to use a Noctua NH-U9 TR4-SP3 CPU cooler. This is the smallest TR4-SP3 CPU cooler that Noctua makes, with two 92mm PWM fans. I chose this model to make sure I could use the top PCIe slot and also get good RAM clearance. It looks tiny mounted in the system, but it has performed quite well so far. At idle the CPU typically runs at about 35°C, which is quite good.
Using a Threadripper PRO processor drastically limits your available motherboard choices. I ended up choosing an ASUS Pro WS WRX80E SAGE SE WiFi motherboard. This is an eATX motherboard that is quite physically large (which will drive your case selection). ServeTheHome has a good review of this motherboard here:
This motherboard has a lot of useful features for my use case. These include three M.2 PCIe 4.0 slots, seven PCIe 4.0 x16 slots, eight DDR4-3200 memory slots and two Intel X550 10GbE ports. There is also one ASUS Hyper M.2 x16 Gen 4 PCIe card bundled with the motherboard.
All of the PCIe 4.0 x16 slots support 4×4 bifurcation, which will let me run multiple quad M.2 to PCIe 4.0 cards. I will need one PCIe slot for a discrete GPU, but the other six PCIe slots will be available for storage. This configuration will let me have a total of 27 M.2 PCIe 4.0 NVMe SSDs in the system!
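The drive count works out with some quick arithmetic. Here is a minimal sketch, assuming every M.2 SSD uses a x4 link, each quad M.2 card occupies one x16 slot split 4×4, and the GPU takes a full x16 slot. The constant and function names are mine, purely for illustration.

```python
# Lane and drive budget for the configuration described in this post.
# Assumptions (not measured): each SSD uses four lanes, the GPU uses sixteen.
CPU_PCIE_LANES = 128       # PCIe 4.0 lanes supported by the CPU
X16_SLOTS = 7              # PCIe 4.0 x16 slots on the motherboard
ONBOARD_M2_SLOTS = 3       # M.2 slots on the motherboard itself
SSDS_PER_QUAD_CARD = 4     # 4x4 bifurcation: four x4 SSDs per x16 slot
LANES_PER_SSD = 4
LANES_PER_GPU = 16


def storage_budget(gpu_slots: int = 1) -> dict:
    """Return slot, SSD, and lane counts when gpu_slots slots hold GPUs."""
    storage_slots = X16_SLOTS - gpu_slots
    total_ssds = storage_slots * SSDS_PER_QUAD_CARD + ONBOARD_M2_SLOTS
    lanes_used = total_ssds * LANES_PER_SSD + gpu_slots * LANES_PER_GPU
    return {
        "storage_slots": storage_slots,
        "total_ssds": total_ssds,
        "lanes_used": lanes_used,
        "lanes_spare": CPU_PCIE_LANES - lanes_used,
    }


if __name__ == "__main__":
    for key, value in storage_budget().items():
        print(f"{key}: {value}")
```

With one GPU, this gives six storage slots and 27 SSDs, and the lanes those devices would use still fit within the 128 that the CPU supports.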
This platform officially supports DDR4-3200 RAM in an eight-channel configuration. WRX80 motherboards have eight memory slots that can support a 32GB DIMM in each slot, giving you a total memory capacity of 256GB.
With server-class ECC RAM, you can have up to 2TB of RAM in this motherboard. In order to get your full memory bandwidth, you need to populate all eight memory slots.
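To see why filling every slot matters, here is a rough sketch of theoretical peak memory bandwidth. It assumes one DIMM per channel and a standard 64-bit (8-byte) DDR4 channel. These are peak numbers, not what a real workload will measure, and the function name is just for illustration.

```python
def peak_memory_bandwidth_gbs(
    channels_populated: int,
    transfer_rate_mts: int = 3200,   # DDR4-3200 = 3200 megatransfers/sec
    bus_width_bytes: int = 8,        # 64-bit DDR4 channel
) -> float:
    """Theoretical peak bandwidth in GB/sec for the populated channels."""
    return channels_populated * transfer_rate_mts * bus_width_bytes / 1000


if __name__ == "__main__":
    for channels in (4, 8):
        gbs = peak_memory_bandwidth_gbs(channels)
        print(f"{channels} channels: {gbs:.1f} GB/sec peak")
```

Under these assumptions, populating only four of the eight slots leaves half of the theoretical bandwidth unused.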
Luckily, I had a G.SKILL Trident Z RGB 128GB DDR4-3200 CL16-18-18-38 memory kit already on hand, which I started out with. Later, I bought a second identical kit to let me fill every memory slot to maximize my total memory capacity and bandwidth. This memory runs rock solid with DOCP (XMP) enabled, even with all eight slots populated.
Depending on what CPU and GPU(s) you use with this system, you could need a pretty large capacity power supply. I had an ASUS ROG Thor 850P power supply on hand, which has more than enough capacity for my configuration. If you have multiple high-end discrete GPUs, you might need a larger capacity power supply.
This motherboard requires two eight-pin EPS power connections plus one six-pin PCIe power connection from the power supply. You will need a power supply with at least two eight-pin EPS connections.
The ASUS Pro WS WRX80E SAGE SE WiFi motherboard is a very large eATX motherboard that is 304.8mm x 330.2mm (12 x 13 inches). This means that you will need a relatively large tower case to make the motherboard fit inside. Right now, the StorageBeast is still mounted on an Open Benchtable as I experiment with it. I have not decided what permanent case I will use just yet. These two are the leading candidates:
Storage is the main use case for this system. Fortunately, there are a LOT of options with this system. The processor and motherboard support PCIe 4.0 storage, but not the latest PCIe 5.0 storage.
This is fine, since consumer PCIe 5.0 storage is still pretty immature and expensive. The initial M.2 PCIe 5.0 NVMe TLC SSDs that are available right now (with Phison E26 controllers) are limited to about 10,000 MB/sec for reads and writes because they are using 1600 MT/sec NAND cells.
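For context on those numbers, here is a quick sketch of the theoretical link bandwidth for an x4 M.2 slot. It assumes the standard 16 GT/sec (PCIe 4.0) and 32 GT/sec (PCIe 5.0) per-lane rates with 128b/130b encoding, and it ignores packet and protocol overhead, so real drives will land somewhat lower.

```python
# Per-lane raw transfer rates in gigatransfers/sec for each PCIe generation.
TRANSFER_RATE_GTS = {4: 16.0, 5: 32.0}
# PCIe 3.0 and later use 128b/130b line encoding.
ENCODING_EFFICIENCY = 128 / 130


def link_bandwidth_mbs(gen: int, lanes: int = 4) -> float:
    """Theoretical one-direction link bandwidth in MB/sec."""
    gbits_per_lane = TRANSFER_RATE_GTS[gen] * ENCODING_EFFICIENCY
    return gbits_per_lane / 8 * lanes * 1000


if __name__ == "__main__":
    for gen in (4, 5):
        print(f"PCIe {gen}.0 x4: {link_bandwidth_mbs(gen):,.0f} MB/sec")
```

By this estimate a PCIe 4.0 x4 link tops out just under 7,900 MB/sec and a PCIe 5.0 x4 link at roughly twice that, so the early 10,000 MB/sec drives are faster than any PCIe 4.0 drive can be but still well short of what their own interface allows.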
The ASUS motherboard I am using has three M.2 PCIe 4.0 slots connected to the CPU, so I filled them with 2TB M.2 PCIe 4.0 NVMe TLC SSDs from three different vendors.
Just in case you haven’t noticed, NAND SSD prices have gone down a LOT in the past few months. This means that high-end 2TB M.2 PCIe 4.0 TLC SSDs are in the $150-$175 range, while 1TB models are in the $75-$100 range. It is absolutely a mistake to buy an SSD smaller than 1TB, since the price/GB is much higher and the performance is also significantly lower.
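Those price ranges are easier to compare per GB. Here is a small sketch using only the ranges quoted above, and assuming decimal capacities (1TB = 1000GB) as drive vendors use. The dictionary and function names are illustrative.

```python
# (capacity in GB, low price in USD, high price in USD) from the ranges above.
PRICE_RANGES_USD = {
    "1TB": (1000, 75, 100),
    "2TB": (2000, 150, 175),
}


def price_per_gb(capacity_gb: int, low: float, high: float) -> tuple:
    """Return the (low, high) price per GB for one capacity point."""
    return (low / capacity_gb, high / capacity_gb)


if __name__ == "__main__":
    for model, (capacity_gb, low, high) in PRICE_RANGES_USD.items():
        lo_gb, hi_gb = price_per_gb(capacity_gb, low, high)
        print(f"{model}: ${lo_gb:.4f} to ${hi_gb:.4f} per GB")
```

The low ends of the two ranges work out the same per GB, but the top of the 2TB range is cheaper per GB than the top of the 1TB range.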
I also think it is a mistake to buy a QLC NAND SSD for this type of workload, since they have much lower sequential performance and lower write endurance, but their prices are not significantly lower.
This motherboard has six available PCIe 4.0 slots (after using one for a discrete GPU) that all support bifurcation. You have to go into the BIOS and enable that for each PCIe slot.
The motherboard is bundled with one ASUS Hyper M.2 X16 Gen 4 Card that lets you have four M.2 PCIe 4.0 NVMe SSDs in each available PCIe 4.0 slot on the motherboard.
Depending on how many of these cards you decide to buy, you will need a number of M.2 PCIe 4.0 NVMe SSDs to fill them. Even with the current very low NAND SSD prices, this can get expensive! These are some great 1TB choices:
- 1TB Samsung 990 PRO (Amazon)
- 1TB Samsung 980 PRO (Amazon)
- 1TB SK hynix Platinum P41 (Amazon)
- 1TB Solidigm P44 Pro (Amazon)
Right now, 2TB SSDs are less expensive per GB (but they are more expensive per SSD). The 2TB models also have larger SLC write caches, so they can deal with larger sustained writes before write performance decreases dramatically.
I’m sure I will have people wondering why I did not use 4TB or 8TB M.2 SSDs, and the reason is the cost per GB. My preferred brands and models also don’t have 4TB or 8TB sizes.
There will also probably be people pointing out that random I/O performance is more important than sequential I/O performance for most workloads. That is true, but this specific workload relies on sustained sequential performance.
If you can afford it, a 2TB model is the best choice for this sort of sequential workload. Otherwise, I really like the 1TB Solidigm P44 Pro because of its low price and large (for a 1TB drive) 198GB SLC write cache.
Here are the reported SLC cache sizes for most of the drives that I have mentioned:
- 2TB SK hynix Platinum P41: 296GB
- 2TB Solidigm P44 Pro: 289GB
- 2TB Samsung 990 PRO: 240GB
- 1TB SK hynix Platinum P41: 213GB
- 1TB Solidigm P44 Pro: 198GB
- 1TB Samsung 980 PRO: 113GB
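One way to put these cache sizes in context is to ask how many drives a backup would need to be spread across so that each drive's share fits inside its SLC cache. This is only a sketch with some loud assumptions: an uncompressed 2TB (2000GB) backup, striped evenly across identical drives, with the full reported cache available on every drive (which may not hold on a drive that is partly full). The function name is hypothetical.

```python
import math

# Reported SLC cache sizes in GB, taken from the list above.
REPORTED_SLC_CACHE_GB = {
    "2TB SK hynix Platinum P41": 296,
    "2TB Solidigm P44 Pro": 289,
    "2TB Samsung 990 PRO": 240,
    "1TB SK hynix Platinum P41": 213,
    "1TB Solidigm P44 Pro": 198,
    "1TB Samsung 980 PRO": 113,
}


def min_drives_to_stay_in_cache(backup_gb: float, cache_gb: float) -> int:
    """Smallest number of drives so each drive's even share fits in cache."""
    return math.ceil(backup_gb / cache_gb)


if __name__ == "__main__":
    backup_gb = 2000  # assumption: uncompressed backup of a 2TB database
    for model, cache_gb in REPORTED_SLC_CACHE_GB.items():
        drives = min_drives_to_stay_in_cache(backup_gb, cache_gb)
        print(f"{model}: at least {drives} drives")
```

A compressed backup would be smaller than the database itself, so treat these counts as a conservative upper bound under the stated assumptions.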
I will have another post soon with details about how this system is working and performing so far. This is how the system looks in HWiNFO64.
This is what the system looks like on the Open Benchtable right now.
If you have any questions about this post, please ask me here in the comments or on Twitter. I am pretty active on Twitter as GlennAlanBerry. Thanks for reading!