
This is not a post telling you AI is the greatest thing in the world. I believe that it is a tool and should be used like one. Like any tool, it should be reliable, fast, and accurate. However, to get all three of these traits, your AI has to be hosted in a datacenter. The datacenter provides enough compute for these huge foundation models, more than your laptop or desktop could ever dream of. Yet AI's new obsession should be a race to the bottom, with "bottom" meaning the least powerful devices.
For reference, this AI essay is written from the perspective of using AI as a coding tool. I use it at work for developing software, where I work with quite large repositories of code. The code assistant I use constantly crashes or takes an inordinate amount of time. The other code assistant some of my coworkers use hits a token limit frequently. It all adds up to a very slow and sometimes limited experience. But when it works, it's great! And yet the solution to our problems is sitting right on the device I am working on.
Jeff Geerling made an incredible new video showing four Mac Studios running as a cluster to host massive AI models. The performance shown in the video is quite impressive. But here is the best part about running your AI models locally: you are only limited by the hardware you have. There is no usage cap like with cloud AI services, where after so many prompts you either have to wait for your token allotment to refresh or get throttled by the system. There are also fewer security concerns, because instead of sending proprietary code to an AI company that can use that information, everything stays on your own hardware.
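To make "only limited by the hardware you have" concrete, here is a minimal sketch of what talking to a locally hosted model can look like. It assumes an Ollama server running on its default port and a model such as llama3 already pulled; those are illustrative assumptions on my part, not details from the video.

```python
# Minimal sketch: send a prompt to a locally hosted model via Ollama's HTTP API.
# Assumes an Ollama server on the default localhost:11434 and a model such as
# "llama3" already pulled; both are illustrative assumptions.
import json
import urllib.request

def ask_local_model(prompt: str, model: str = "llama3") -> str:
    payload = json.dumps({
        "model": model,
        "prompt": prompt,
        "stream": False,  # ask for one complete response instead of a stream
    }).encode("utf-8")
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

if __name__ == "__main__":
    # No per-prompt billing, no remote quota: the only limit is the local hardware.
    print(ask_local_model("Summarize what this repository's build script does."))
```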
I don't know the economics of how much companies spend on AI, but I would guess it's somewhere from $50,000 to $100,000 per year per 100 employees, depending on which tools they use and how they are deployed. This is not chump change, especially as I can only imagine AI companies will continue to charge more, since they currently operate at thin or negative margins. OpenAI is still operating at a loss, for perspective.
Let's look at some napkin math: this HP ZBook Power is $7,500. It comes with an Nvidia RTX 3000 (8 GB) GPU. No slouch, but that VRAM capacity is quite low for running today's large local models. Apple has an edge in memory capacity, though usually lower raw GPU performance; a MacBook Pro 16 with an M4 Max and 128 GB of unified memory is $5,399 (with the same amount of storage as the HP). So, if a company bought 100 HP dev workstations, that is $750,000. If a company bought 100 Apple MacBooks, it would be $539,900.
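To make the napkin math explicit, here is the same arithmetic as a tiny script; the prices are just the quotes above.

```python
# Napkin math from above: hardware cost for a 100-person dev fleet.
FLEET_SIZE = 100
HP_ZBOOK_PRICE = 7_500       # HP ZBook Power, RTX 3000 (8 GB VRAM)
MACBOOK_PRO_PRICE = 5_399    # MacBook Pro 16, M4 Max, 128 GB unified memory

print(f"HP fleet:    ${FLEET_SIZE * HP_ZBOOK_PRICE:,}")     # $750,000
print(f"Apple fleet: ${FLEET_SIZE * MACBOOK_PRO_PRICE:,}")  # $539,900
```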
Now, that is roughly 5–7x the yearly cost of just renting large language models from an AI company. But let me explain why it's worth it. First of all, this isn't a yearly subscription; it's a one-time cost. If you bought 100 MacBooks and used them for 5 years, which is honestly close to how often new hardware is rotated in and out, the spend would be about on par with the money spent on renting AI. You might argue that the AI companies are always producing better models; well, you can always upgrade your local models too. It's not like you are locked into the platform.
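And here is the five-year comparison as a quick sketch, using the rough $50,000 to $100,000 per year guess from earlier; these are my own ballpark inputs, not real procurement data.

```python
# Rough amortization: one-time hardware spend vs. renting cloud AI,
# using the earlier guess of $50k-$100k per 100 employees per year.
HARDWARE_COST = {"HP fleet": 750_000, "Apple fleet": 539_900}
ANNUAL_AI_SPEND = (50_000, 100_000)  # low and high yearly estimates
YEARS = 5                            # typical hardware rotation window

for name, cost in HARDWARE_COST.items():
    for annual in ANNUAL_AI_SPEND:
        rent = annual * YEARS
        print(f"{name}: ${cost:,} one-time vs ${rent:,} rented over "
              f"{YEARS} years ({cost / rent:.1f}x)")
```

At the high end of the rental estimate, the Apple fleet comes out at about 1.1x the five-year rental spend, which is where the "about on par" claim comes from.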
The other factor for local AI is the time wasted. When you send a large prompt to a cloud-based AI, it takes a while to process, and that is assuming it doesn't fail or hit a token limit. With local AI, if a request fails you simply try again, and you never run out of tokens (not context-window tokens, but the credit-like usage tokens) because you aren't paying for the compute. Also, since nothing leaves the machine, your proprietary code is de facto safer from prying eyes.
I am no fool, though. It is currently cheaper to just pay someone else to run these AI models and not worry about them. But I can dream of a local, on-device AI that helps me get my work done quickly and without fear of throttling. The next step for AI should be size reduction. Just look at how datacenters are sprouting up like weeds, consuming enormous amounts of electricity. AI should focus on efficiency now, just like ARM processors became popular by delivering equal (or better) performance in a far more efficient package. Quantized models are working to make this a reality, but until they match the performance of the big multi-billion-parameter reasoning models, we haven't won. We should also think about these things from a conserving-the-earth perspective. And... if AI were more efficient, maybe more hardware would be cheap and available for PC gamers.
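For a sense of why quantization matters in this race to the bottom, here is a back-of-the-envelope estimate of how much memory a model's weights need at different precisions. It counts weights only and ignores the KV cache and runtime overhead, so real requirements are somewhat higher.

```python
# Back-of-the-envelope weight memory for a model at different precisions.
# Weights only; KV cache and runtime overhead are ignored.
def weight_gb(params_billion: float, bits_per_weight: int) -> float:
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9  # decimal gigabytes

for params in (8, 70):
    for bits in (16, 8, 4):
        print(f"{params}B params @ {bits}-bit: ~{weight_gb(params, bits):.0f} GB")

# An 8 GB GPU struggles even with an 8B model at 16-bit (~16 GB of weights),
# while 128 GB of unified memory comfortably holds a 70B model at 4-bit (~35 GB).
```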