Teaching AI to Machine Parts

November 2024 - May 2025

If you handed me the controls of a CNC milling machine and asked me to manufacture a precision part, I'd have no idea where to start. There are dozens of parameters to set: how fast should the cutting tool spin? How quickly should it move through the material? How deep should each cut be? Get these wrong, and at best you'll produce a useless chunk of metal. At worst, you'll snap a tool worth hundreds of dollars or damage a machine worth hundreds of thousands.

Yet experienced machinists make these decisions intuitively. After decades at the controls, they know that aluminum likes high speeds and deep cuts, while titanium demands patience and finesse. They can hear when a tool is struggling, feel when parameters need adjustment. This knowledge, accumulated over entire careers, typically retires with them.

For my bachelor thesis at TUM, I tried to build a reinforcement learning system that could learn machining parameters through trial and error. Not by studying manuals or copying examples, but by actually trying different parameters and learning from the results.

[Image: CAM software interface showing parameter selections for a milling operation]

The main challenge was integrating reinforcement learning with industrial software that wasn't designed for it. HyperMILL, the CAM software I used, is built for human operators clicking through menus, not for automated systems making thousands of experimental decisions. Every parameter change required the software to recalculate the entire toolpath, simulate the cutting process, and generate the machine code. Each training iteration took about 12 seconds.
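To make that loop concrete, here is a minimal sketch of how such a CAM-in-the-loop environment might look. This is not the thesis code: the class name, parameter ranges, and cost model are all illustrative, and the expensive HyperMILL round trip (toolpath recalculation, cutting simulation, NC code generation) is replaced by a stub so the sketch runs instantly.

```python
import numpy as np

def run_cam_simulation(tool_id, params):
    """Stand-in for the HyperMILL round trip. In the real setup this call
    (recalculate toolpath, simulate cutting, generate machine code) took
    about 12 seconds per iteration; here we return dummy figures from a
    toy cost model."""
    spindle_speed, feed_rate, depth_of_cut = params
    machining_time = 100.0 / (feed_rate * depth_of_cut + 1e-6)
    energy = spindle_speed * machining_time * 1e-3
    return machining_time, energy

class MillingEnv:
    """Gym-style wrapper (hypothetical): the agent picks a tool and
    cutting parameters for each of six operations; the CAM software
    scores the result."""
    N_TOOLS = 10

    def reset(self):
        self.operation = 0
        return self.operation

    def step(self, action):
        # action[0] selects the tool; action[1:] are normalized parameters.
        tool_id = int(action[0] * (self.N_TOOLS - 1))
        params = self._denormalize(action[1:])
        time_s, energy = run_cam_simulation(tool_id, params)
        reward = -(time_s + 0.1 * energy)  # minimize time and energy
        self.operation += 1
        done = self.operation >= 6  # all six operations finished
        return min(self.operation, 5), reward, done, {}

    def _denormalize(self, p):
        # Map [0, 1] actions onto plausible cutting-parameter ranges.
        lo = np.array([2000.0, 100.0, 0.5])   # rpm, mm/min, mm
        hi = np.array([12000.0, 2000.0, 5.0])
        return lo + p * (hi - lo)
```

At twelve real seconds per `step()`, a few thousand training episodes already amount to many hours, which is where the 20-hour training runs came from.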

I implemented two different reinforcement learning algorithms: PPO (Proximal Policy Optimization) and SAC (Soft Actor-Critic). They behaved quite differently during training. PPO was cautious and kept exploring throughout, while SAC tended to commit to strategies once it found something that worked.
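That behavioral difference falls out of their objectives. The following sketch shows the textbook forms of the two terms, not the thesis implementation: PPO clips the policy-update ratio so a single update can never move far from the data-collecting policy, while SAC adds an entropy bonus to its value target, which encourages exploration until the bonus stops paying off.

```python
import numpy as np

def ppo_clipped_objective(ratio, advantage, eps=0.2):
    """PPO surrogate objective: the probability ratio pi_new/pi_old is
    clipped to [1 - eps, 1 + eps], so pushing the policy further than
    the clip range earns no extra reward. Hence the cautious,
    keep-exploring behavior."""
    clipped = np.clip(ratio, 1 - eps, 1 + eps)
    return np.minimum(ratio * advantage, clipped * advantage)

def sac_value_target(reward, next_q, next_log_prob, alpha=0.2, gamma=0.99):
    """SAC soft value target: reward plus discounted next-state value
    with an entropy bonus (-alpha * log pi). While alpha is large the
    agent is paid to stay random; as the bonus shrinks relative to the
    reward, the policy commits to whatever works."""
    return reward + gamma * (next_q - alpha * next_log_prob)
```

For a positive advantage, a ratio of 1.5 is worth no more than one of 1.2: `ppo_clipped_objective(1.5, 1.0)` and `ppo_clipped_objective(1.2, 1.0)` both return 1.2, which is exactly the mechanism that keeps PPO from committing too hard to any one strategy.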

The training process was painfully slow. Each algorithm needed about 20 hours to try different combinations of tools and parameters for six different machining operations. I'd start a training run in the evening, and by the next afternoon I'd have results. Sometimes the system would get stuck selecting obviously wrong tools, like trying to use a tiny 2mm endmill for a job that needed a 100mm face mill. Other times it found parameter combinations I hadn't anticipated.

The most interesting part was watching the learning happen. SAC, the more decisive algorithm, went through distinct phases. For the first thousand attempts, it was all over the place, trying random combinations. Then suddenly, around episode 1500, something clicked. The performance metrics shot up and stayed there. It had discovered a strategy that worked and committed to it. Meanwhile, PPO kept experimenting throughout the entire training, never quite settling on a single approach, constantly second-guessing itself.

Looking at what each algorithm learned revealed their personalities even more clearly. PPO developed a strange preference for extreme values, often choosing either the minimum or maximum for parameters, rarely anything in between. It also fell in love with one particular tool, an 8mm endmill, and tried to use it for almost everything. SAC was more nuanced, selecting different tools for different operations and finding parameter values in the middle ranges that seemed more reasonable.

When I compared their choices against industry-standard parameters from machining handbooks, the limitations became obvious. For finishing operations, where surface quality is paramount, the handbook recommended conservative parameters: stepovers of around 10% of the tool diameter to ensure a smooth finish. The system, focused solely on minimizing time and energy, chose stepovers of 60-100%. It found the fastest way to complete the job while completely ignoring that the surface would look terrible. A perfect example of optimizing exactly what you told it to optimize, nothing more.
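The effect is easy to reproduce with a toy model (the numbers here are illustrative, not from the thesis): machining time falls as stepover grows because fewer passes are needed, while scallop height, a rough proxy for surface roughness, grows with it. A time-only reward drives the stepover to its maximum; adding even a crude finish penalty pulls the optimum back toward conservative values.

```python
import numpy as np

# Candidate stepovers as a fraction of tool diameter.
stepovers = np.linspace(0.05, 1.0, 20)

# Toy cost model: fewer passes at large stepover (time falls),
# but a rougher surface (penalty grows quadratically).
time_cost = 1.0 / stepovers
finish_penalty = 50.0 * stepovers**2

reward_time_only = -time_cost
reward_with_finish = -(time_cost + finish_penalty)

best_time_only = stepovers[np.argmax(reward_time_only)]
best_with_finish = stepovers[np.argmax(reward_with_finish)]

print(best_time_only)    # -> 1.0: maxing out stepover, like the trained agents
print(best_with_finish)  # -> 0.2: the penalty pulls it toward conservative values
```

The agent wasn't wrong, in other words; the reward was. Without a surface-quality term, a 100% stepover genuinely is the optimum.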

Within its scope, the system worked. SAC converged 24% faster than PPO and reliably completed all six operations according to the optimization criteria. The results also revealed just how much nuance exists in real machining expertise beyond what the framework could capture.

The biggest limitation was that the system could only learn one part at a time. Train it on a different part, and it had to start from scratch with another 20 hours of training. A human machinist, after learning to machine one aluminum bracket, would immediately apply that knowledge to similar parts. The system had no such ability to generalize.

There's also the question of trust. Would I use these parameter selections on an actual CNC machine with an expensive workpiece? Probably not. The system had no understanding of context beyond its narrow optimization goals. It didn't know that certain materials are expensive, that some features are cosmetic while others are critical, or that machine time might be less valuable than tool life in certain situations.

Working on this project gave me enormous respect for the machinists whose knowledge I was trying to capture. What seems like simple parameter selection actually involves juggling dozens of considerations, many of them learned through expensive mistakes. Every crashed tool, every scrapped part, every successfully completed job adds to an internal model that's incredibly difficult to formalize.

The thesis demonstrated that the basic integration is feasible and identified the key challenges that would need to be solved for real-world use, but there's still a long path from research prototype to practical system. As experienced workers retire, we're losing decades of accumulated knowledge, and whether reinforcement learning can effectively capture that intuition remains an open question.

Looking back, I spent six months building a system to do something a human could learn in a few weeks of apprenticeship. The irony wasn't lost on me. But the project taught me something valuable about the gap between academic machine learning and real-world manufacturing. What looks like straightforward parameter selection turns out to involve layers of context, judgment, and accumulated knowledge that are difficult to formalize. Understanding those limitations is as important as the technical implementation.