Being part of DRN for almost two decades has given me an incredible front-row seat to the ever-changing consumer tech space. Fads come and go, and right now the biggest buzzword has to be AI.
With some very generous assistance from Kingston Technology, I am setting out to dip my toes into the world of local AI.
Why Local AI Is Becoming More Important
From ChatGPT to Gemini, Claude and Grok, there is an abundance of commercial, subscription-based AI agents out in the wild. Not to mention Microsoft, which has been trying to shoehorn Copilot into absolutely everything like a supercharged Clippy. Let’s not forget the uncharted wilderness of OpenClaw (née Clawdbot).
But there is more to it – the frontier of local LLMs: AI agents running in your private environment. Nothing is sent out to the internet, where it would be under the control of corporations who will take your data and use it to train their own models.
Jo has covered this in quite a bit of detail recently. Some of her review is very technical, but the bit about the pros and cons of local LLMs is important.
Can You Run Local AI Without Expensive Hardware?
I had a plan. And “a plan never survives the first thirty seconds of combat” – Ambler Furry (ok, it came from Firebreak by Richard Herman Junior.)
Late last year I was lucky enough to inherit a not too old NUC. It has been sitting behind my desk wrapped up and not doing much until I started to plan where the good DRN boat should be heading this year.
With Jo spearheading our latest-gen hardware running local AI instances, I went the other way and thought I would see what it would take to cobble together a rig that will run a local LLM without spending thousands of dollars on a GPU alone.
RAM and SSDs have been in immense demand, with AI data centers essentially taking everything produced. This scarcity has now been compounded by the Middle East situation. So when I say without spending thousands … look, RAM isn’t getting cheaper any time soon.
Kingston KC3000 SSD and FURY DDR5 RAM
I reached out to Kingston with a proposal – let’s see if I can sustainably build and run a local LLM, with a view to using it to make my home automation smarter.
DRN has worked with Kingston in the past, and the reason I approached them for this project was the assurance that I can get pseudo-enterprise grade hardware without the price tag.
They kindly provided me with a KC3000 M.2 PCIe 4.0 NVMe SSD. It offers speeds of up to 7,000MB/s and comes with a graphene aluminium heat spreader. In the confined space of a NUC, I need to keep the drive cool to maintain stable performance without running into thermal throttling.
I was also provided with the FURY Impact DDR5 32GB (KF556S40IB-32), 32GB of RAM in a single module. Yes, I have two channels on my motherboard, and having the RAM in two modules would have provided better throughput overall. But I wanted the flexibility to bump the system to 64GB if things pan out.
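To get a feel for what the single-module trade-off costs, here is a rough back-of-the-envelope sketch. It assumes the module runs at its rated 5,600MT/s (the speed the platform actually negotiates may be lower) and uses the common rule of thumb that generating one token reads every model weight once from memory. The 4.7GB model size is a hypothetical figure for illustration, not something measured on this build.

```python
def peak_bandwidth_gb_s(transfer_rate_mt_s: float, channels: int = 1,
                        bus_width_bits: int = 64) -> float:
    """Theoretical peak memory bandwidth in GB/s (decimal gigabytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return transfer_rate_mt_s * 1e6 * bytes_per_transfer * channels / 1e9


def max_tokens_per_second(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Rough ceiling on generation speed, assuming every token reads all weights once."""
    return bandwidth_gb_s / model_size_gb


# Assumed values: rated module speed and a hypothetical 4.7GB quantised model.
RATED_SPEED_MT_S = 5600
HYPOTHETICAL_MODEL_GB = 4.7

for channels in (1, 2):
    bandwidth = peak_bandwidth_gb_s(RATED_SPEED_MT_S, channels=channels)
    ceiling = max_tokens_per_second(bandwidth, HYPOTHETICAL_MODEL_GB)
    print(f"{channels} channel(s): {bandwidth:.1f} GB/s peak, "
          f"at most ~{ceiling:.1f} tokens/s")
```

These are theoretical ceilings only. Real-world figures will land well below them, but the ratio between one and two channels is the useful takeaway.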
Running a local LLM, I need the hardware to deliver consistent throughput under load.
The ASUS ExpertCenter Hardware Specifications
- Model: ASUS ExpertCenter PN53
- Processor (CPU): AMD Ryzen 7 7735HS (8 Cores, 16 Threads)
- Architecture: Zen 3+ (Rembrandt Refresh)
- Base Clock: 3.2 GHz (Boost up to 4.75 GHz)
- Power: 35W default TDP (configurable up to 54W)
- Graphics (iGPU): Integrated AMD Radeon 680M (RDNA 2 architecture)
Memory & Storage
- RAM: 32GB Kingston KF556S40IB
- Storage: Kingston KC3000 M.2 PCIe 4.0 NVMe SSD
Connectivity & Ports
- Wireless: Wi-Fi 6E & Bluetooth 5.2.
- Networking: Onboard 2.5GbE LAN (RJ45).
- Front I/O: 1x USB4 Type-C, 2x USB 3.2 Gen1 Type-A, 1x Audio Jack.
- Rear I/O: 1x USB4 Type-C (supports Power Delivery input), 3x USB 3.2 Gen1 Type-A, 2x HDMI 2.1, and a configurable port (DisplayPort or HDMI).
I do not have an RTX GPU magically hidden in that NUC.
My power consumption is limited to a max of 54W.
What is important to note here is that I am not running any AI-optimised hardware, nothing beefed up with hundreds of watts of power supply. Just some (kind of) supported hardware that could get a local LLM up and running.
What about the caveat of “kind of”? When you don’t run an Nvidia GPU with full CUDA support, it turns out getting a local LLM running is definitely not plug and play.
Installing Ollama and Open WebUI on Proxmox VE
On the bare metal, I run a hypervisor so I can virtualise my machines.
Whilst I have plenty of experience with VMware ESX, ESXi, Hyper-V and VirtualBox, for my own home environment I like to run Proxmox VE. It has a ridiculously low footprint, and it is full-featured and free, something that can’t be said for most of the alternatives. In particular, Broadcom killed off the free ESXi license back in 2024, which was a key driver in me moving to Proxmox.
However with the move to Proxmox VE, hardware efficiency becomes paramount. I’m no longer just running an OS; I’m managing a dense ecosystem where every MB of RAM throughput counts.
In the past I had a 6th gen Intel NUC running my Home Assistant instance and a couple of Linux and Windows virtual machines that I turned on and off as required for various things I do. For that purpose a ten-year-old platform was more than adequate.
But for the local LLM I needed something that is not Jurassic Park, and that’s where the Asustek PN53 came into play.
I consider myself a pretty technical person, and this one took a bit more than just elbow grease. According to plenty of research done by me prior to embarking on this journey, everything points to a supported configuration … I am going to talk generally about what I did, but not so much the actual nitty gritty of the setup.
Firing up Ubuntu in an LXC (Linux Container) was simple. Proxmox makes life easy for this.
Next up was installing Ollama. Think of it as an app store for AI models. It is a free, open-source tool that allows you to pull and run LLMs locally on your own computer instead of relying on the cloud.
On top of that, I also installed Open WebUI, which gives you a nice ChatGPT-esque front end that most users are familiar with. After all, who wants to be facing just a black command line window, without history or the capability to pull in attachments?
The tricky part was optimising the setup for a graphics chip that is not an Nvidia card with CUDA. The Radeon 680M in the PN53 relies on ROCm (Radeon Open Compute), which is different to CUDA. Many hours of effort went into configuring the ROCm environment and overriding hardware IDs to ensure the AMD Radeon 680M iGPU was properly leveraged, rather than falling back to the much slower CPU-only execution.
I got there in the end by specifically using “HSA_OVERRIDE_GFX_VERSION=10.3.0” to trick the software into treating the Radeon 680M iGPU as a supported discrete architecture.
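As a minimal sketch of how that override can be applied, the snippet below builds an environment containing the variable and hands it to `ollama serve`. The helper names are hypothetical and only for illustration. It assumes the `ollama` binary is on the PATH inside the container and that ROCm is already installed. If Ollama runs as a system service instead, the same variable would need to be set in that service's environment.

```python
import os
import subprocess

# The override value quoted above; other iGPUs may need a different value.
GFX_OVERRIDE = "10.3.0"


def build_ollama_env(base_env=None):
    """Return a copy of the environment with the ROCm override added."""
    env = dict(os.environ if base_env is None else base_env)
    env["HSA_OVERRIDE_GFX_VERSION"] = GFX_OVERRIDE
    return env


def start_ollama_server():
    """Launch `ollama serve` with the override applied (not called in this sketch)."""
    return subprocess.Popen(["ollama", "serve"], env=build_ollama_env())
```

Calling `start_ollama_server()` starts the server as a child process. Whether the iGPU is then actually picked up should be confirmed in Ollama's startup log rather than taken on faith.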
Is it blisteringly fast at generating tokens? No. This configuration will never be capable of replicating the “speed and feel” of a cloud AI, but I am not using it to take over the world either. It was a proof of concept to show that I don’t need a state of the art rig to run a limited local AI for specific purposes.
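To put a number on "not blisteringly fast", Ollama's HTTP API reports how many tokens it generated and how long that took. The sketch below turns those two fields into tokens per second. It assumes Ollama is listening on its default port on the same machine. The model name is whatever you have pulled, so it is left as a parameter.

```python
import json
import urllib.request

# Assumed address: Ollama's default port on the local machine.
OLLAMA_URL = "http://localhost:11434/api/generate"


def tokens_per_second(response: dict) -> float:
    """Compute generation speed from the eval_count and eval_duration fields."""
    eval_count = response["eval_count"]            # tokens generated
    eval_duration_ns = response["eval_duration"]   # generation time in nanoseconds
    if eval_duration_ns <= 0:
        raise ValueError("eval_duration must be positive")
    return eval_count / (eval_duration_ns / 1e9)


def generate(prompt: str, model: str, url: str = OLLAMA_URL) -> dict:
    """Send one non-streaming prompt to Ollama and return the parsed JSON reply."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False})
    request = urllib.request.Request(
        url,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as reply:
        return json.loads(reply.read())
```

A typical use would be `tokens_per_second(generate("Why is the sky blue?", model="<your model>"))`, run a few times to average out the first-load delay.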
More importantly, the core components inside the NUC, the Kingston KC3000 and FURY Impact RAM, are not the limiters. My Home Assistant dashboard now actively monitors the physics at play in real time, tracking the thermal curve of the KC3000 as it handles the IOPS-heavy task of swapping AI model layers.
The laws of physics dictate a number of factors:
- the more computational power you are using, the higher the energy draw
- the higher the energy draw, the higher thermals
- the higher the thermals, the more surface area is needed for thermal management
- if there is insufficient cooling, your components will go into thermal throttling, or encounter instability
A reality check moment here: I am absolutely pushing this small form factor setup to its physical limits. The endurance of the components is not a theoretical discussion, but a real necessity. Kingston offers transparent endurance ratings – the 4TB KC3000 is rated for up to 3.2PBW, providing defined write-lifespan expectations for sustained workloads. The FURY Impact DDR5 memory also incorporates on-die ECC at the chip level to enhance internal data integrity.
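For anyone wanting to keep an eye on that last point without a dashboard, here is a standalone sketch (not the Home Assistant setup mentioned earlier) that reads temperatures from the Linux hwmon interface, where sensors report in thousandths of a degree Celsius. Which sensors appear, and whether they are visible from inside a container, depends on the system. The 70°C limit in the usage note is an arbitrary illustration, not a Kingston or AMD figure.

```python
from pathlib import Path


def read_hwmon_temps(root: Path = Path("/sys/class/hwmon")) -> dict:
    """Return {"<device name>/<sensor file>": degrees Celsius} for every temp input."""
    readings = {}
    for device in sorted(Path(root).glob("hwmon*")):
        name_file = device / "name"
        if not name_file.is_file():
            continue
        name = name_file.read_text().strip()
        for sensor in sorted(device.glob("temp*_input")):
            try:
                millidegrees = int(sensor.read_text().strip())
            except (OSError, ValueError):
                continue  # skip sensors that cannot be read or parsed
            readings[f"{name}/{sensor.name}"] = millidegrees / 1000.0
    return readings


def over_threshold(readings: dict, limit_c: float) -> list:
    """List the sensors at or above the given temperature limit."""
    return [label for label, temp in readings.items() if temp >= limit_c]
```

Something like `over_threshold(read_hwmon_temps(), 70.0)` run on a timer would flag a sensor creeping towards throttling territory.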
And as one would expect from an enterprise grade vendor, they maintain a long-standing commitment to customer service and global partnership networks.
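To put the 3.2PBW endurance figure in perspective, the arithmetic is simple enough to sketch. The daily write volume below is a made-up number for illustration, not something measured on this build, and smaller KC3000 capacities carry lower ratings.

```python
def endurance_years(rated_tbw: float, daily_writes_gb: float) -> float:
    """Years until the rated write endurance is used up at a steady daily write rate."""
    if daily_writes_gb <= 0:
        raise ValueError("daily_writes_gb must be positive")
    total_gb = rated_tbw * 1000  # terabytes written -> gigabytes written
    return total_gb / daily_writes_gb / 365


# 3.2PBW is 3,200TBW; 200GB per day is a hypothetical heavy workload.
RATED_TBW = 3200
HYPOTHETICAL_DAILY_GB = 200

years = endurance_years(RATED_TBW, HYPOTHETICAL_DAILY_GB)
print(f"Rated endurance lasts about {years:.1f} years at {HYPOTHETICAL_DAILY_GB}GB/day")
```

Even with a generous guess at daily writes, the rating comfortably outlasts the useful life of the rest of the box.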
Real-World Performance and Final Thoughts
I have a NUC running Proxmox VE. Under that hypervisor I have:
- Ollama with Open WebUI
- Home Assistant
- BunsenLabs Linux
- Tiny10 (heavily stripped down Windows 10)
Does it run in harmony? Kind of. When I ask my Ollama instance an open-ended question, it does hammer the hardware pretty hard. The thermals have so far stayed in check, and it seems the capped power draw is helping to keep things under control.
Given that I set out to prove that one can run a local AI instance without having to expend significant amounts of money on the latest and greatest hardware, I would absolutely declare this mission accomplished.
In the next stage, I will be exploring just how useful (or not) this is to manage my Home Assistant instance, and if it can do things smarter than the in-built automation logic.
DRN would like to thank Kingston for providing the KC3000 SSD and FURY Impact DDR5 KF556S40IB RAM for this project.




