Experimenting with AI Code Generation

Lately, I’ve been experimenting with Cursor, one of the new GenAI coding tools.

I wanted to see how it would do with the simulation I created. The simulation is a Rails app powered by an engine implemented as a Ruby gem. The code base isn’t huge by enterprise standards, but it’s large enough and complex enough that I thought it could put code generation through its paces. (Also, I’m the sole developer, so if I really messed things up I am the only person affected.)

First Steps

I asked Cursor to examine the simulation engine. Talking through the design decisions I’d made and opportunities for improvement felt like a conversation with an enthusiastic but slightly naive developer with an encyclopedic knowledge of design patterns. One of Cursor’s early suggestions was that I make a stateful part of the code event-driven. But when I pushed on why, and re-explained the intention behind the code, Cursor immediately backed off, admitting: “You’re right. An event-driven architecture is not appropriate here.”

But other ideas it had were much more insightful and valuable. I was honestly surprised by the depth of understanding it developed about the code. I ended up using Cursor to generate a much better README to describe the code structure and the future directions of the engine. Win!

Can It Fix My GUI?

I’m not a UI designer. While I love fonts and I know my way around HTML and CSS (and Rails partials and ViewComponents), I’m very much an amateur. So I’d gotten the GUI for the simulation to a good enough state, but left it there.

Recently I had feedback that the GUI was cluttered and confusing. Plus, I knew that it did not work well on mobile. So I decided to see how well Cursor would do fixing the GUI. It’s worth noting that the Rails app is a bit quirky because of the nature of the simulation. So even though I knew Cursor could figure out a boilerplate Rails model-view-controller app, how would it do with this code base?

I set an initial goal for the UI work: tweak the UI to clean up the most egregious issues and make it work on a large mobile screen (my iPad). But as I started experimenting, I realized that Cursor was doing a fantastic job of understanding all the quirks of the code base. I decided to be much more ambitious. What if I used Cursor to clean up all my shoved-in-the-wrong-place styles, artisanally hand-crafted padding and margin settings, and hard-coded div dimensions? Forget small tweaks: Cursor could revamp the UI entirely to make it more consistent and easier to maintain.

Then I thought even bigger: what if I got Cursor to add dark mode?

The problem was that I had to be careful about biting off more than I could chew. The simulation has to be stable next week for the workshop Joel Tosi, Chris Pipito, and I are running. Normally, I wouldn’t even attempt changes this big so close to a deadline. But I could timebox my Cursor experiments, allowing me to throw the work away without regret if things went badly.

Spoiler: the new version of the GUI, now with dark mode, is in production. It turns out Cursor really can do magical things. But it can also do things that make you question reality. Let’s look at some of my biggest lessons learned.

The Good

Before tackling the UI issues, I decided to fix something more foundational. My JavaScript hot reload configuration was broken, and had been for an embarrassingly long time. That meant every time I made a change to the JavaScript I had to recompile, and the JS build cycle took several minutes. Ugh. So I avoided making changes to the JavaScript unless I absolutely had to. I had a similar issue with changes to my application’s CSS file. If Cursor and I were going to fix the UI, we needed a fast feedback cycle.

Why had I let that long feedback cycle persist for so long? It’s not that I didn’t want to fix the hot reload problems. I’d tried. Over the last few years I’d taken multiple runs at it, burning hours and becoming increasingly frustrated each time. I just couldn’t figure it out. It wasn’t just a simple error in my configuration; it was multiple issues.

So I asked Cursor what the problem might be. It had ideas. The first few tries didn’t work (like I said, it wasn’t just one misconfiguration but a combination of factors). But each time, Cursor had another idea. Ultimately we fixed both hot reload configuration problems in less than an hour.

Cursor had just paid for itself on the first day of my paid subscription. (Also just to be clear: I paid for Cursor myself. This is not a paid post.)

The Less Good

Now that my local test server was picking up JavaScript and CSS changes immediately, I put Cursor to work on fixing my layouts.

I quickly figured out that the key to making Cursor work for me was committing small, frequent changes. Every time something started working, even if it wasn’t “done,” I did a git commit. This habit of making small local commits is something I normally do when TDD’ing, but it was even more important with Cursor and I dialed the frequency of my commits up to 11. More than once, I had to git reset --hard because Cursor took things in a wildly unexpected direction.
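The checkpoint habit described above can be sketched in a few lines of shell. This is only an illustration under stated assumptions: it runs in a throwaway repository, and the checkpoint helper, the app.css file, and the commit messages are all hypothetical names invented for the example, not anything from the actual simulation code base.

```shell
#!/bin/sh
set -e

# Throwaway repository so the sketch is safe to run anywhere.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "dev@example.com"   # placeholder identity for the demo
git config user.name "Demo Dev"

# Hypothetical helper: stage everything and record a small checkpoint commit.
checkpoint() {
  git add -A
  git commit -q -m "checkpoint: $1"
}

# Something starts working, so commit it even though it isn't "done".
echo ".card { padding: 8px; }" > app.css
checkpoint "card padding works"

# Simulate a generated edit that goes in an unwanted direction.
echo ".card { display: none; }" > app.css

# Discard every change to tracked files since the last checkpoint.
# (Untracked files are left alone by git reset --hard.)
git reset -q --hard HEAD

cat app.css
```

The smaller the gap between checkpoints, the less work a hard reset throws away, which is the whole point of committing so often.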

Sometimes, Cursor went off the rails (so to speak), touching files it had no business messing with. I learned to be specific about what I wanted changed, and just as importantly, what I didn’t want it to touch. I discovered that adding “without modifying any JavaScript” to prompts ensured Cursor stuck to simpler CSS solutions.

Then there was the weird stuff. At one point Cursor randomly decided to insert an “Architecture” heading on a dashboard. It was an almost-accurate heading and in the right general spot, but still… why? It was trying to be helpful, but I asked it to change the structure of the page, not the contents. That was my cue to tell it to stop making creative decisions on my behalf.

The WTF

When I save a change to a file using a traditional IDE, that change stays put.

Much to my surprise, that’s not quite how Cursor works. I saw instances where I thought Cursor and I had made changes, but then those changes were undone. I wasn’t sure what to make of that. Was it my imagination? Was it a side effect of a later prompt? But then Cursor did something I couldn’t ignore or explain away.

Backing up a step: I was keeping RubyMine open side-by-side with Cursor. Whenever I felt the itch to just fix something myself I used RubyMine because it’s the IDE I’m most comfortable in. At one point I manually added borders to debug a CSS layout issue that Cursor hadn’t been able to fix. Once I was done, I removed the borders, committed my changes on the command line with git, and returned to Cursor to start a new small change. I asked Cursor to do something completely unrelated to the previous CSS changes, and suddenly those diagnostic borders were back in all their bright red and green glory.

One of the benefits of generative AI is that you can ask the model what happened. So I asked Cursor why it reverted my changes. Cursor explained that it uses the version of the file it previously read, not the version saved to disk, as the starting point for future changes. Oh! So much for my quaint notions of file management.
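To make that explanation concrete, here is a toy reproduction of a stale read clobbering a newer file on disk. It is not how Cursor is implemented; it only mimics the behavior Cursor described, using a hypothetical layout.css and a shell variable standing in for the tool’s in-memory copy.

```shell
#!/bin/sh
set -e

dir=$(mktemp -d)
file="$dir/layout.css"

# The file as the tool first read it: the debug border is still present.
printf '.panel { border: 1px solid red; }\n' > "$file"
snapshot=$(cat "$file")        # stands in for the tool's in-memory copy

# A human removes the debug border in another editor and saves to disk.
printf '.panel { }\n' > "$file"

# The tool applies an unrelated change to its stale snapshot and writes
# the result back, silently undoing the human's edit.
printf '%s\n.footer { margin: 0; }\n' "$snapshot" > "$file"

cat "$file"
```

After the final write, the red border is back in the file even though nobody asked for it, which matches the symptom above.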

YOLO Is Good, Actually

I knew about YOLO mode, but thought to myself “LOL no.” Why would I let an AI run rampant over my code?

But I also wanted to continue using git reset --hard to revert Cursor changes I didn’t want to pursue. So it turns out, YOLO mode is ironically the safest way for me to avoid the problem of Cursor reverting its own changes. With YOLO turned on, Cursor auto-approves the changes it makes, so its internal version of the file and the file on disk are more likely to match.

Of course there’s a catch. YOLO is marked as dangerous for a reason. So I wanted to be specific about what Cursor was allowed to do on my behalf. I used the YOLO prompt “allow file writes and web searches, do not allow terminal commands” and that seemed to do the trick.

Even then, there’s still one more step: if I made changes outside of Cursor, either with a git reset or by manually editing a file, Cursor could still have an older version of the file read into memory. Cursor helpfully suggested that I should tell it whenever I make manual edits to avoid having it work with stale data. I’ll do that, and I’ll also run git status more often to verify that I’m only seeing the changes I expect.
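One way to turn that git status check into a habit is a small guard that lists anything differing from the last commit before and after a manual edit. The sketch below assumes a throwaway repository; the pending_changes helper and the layout.css file are hypothetical names used only for illustration.

```shell
#!/bin/sh
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "dev@example.com"   # placeholder identity for the demo
git config user.name "Demo Dev"

echo ".panel { }" > layout.css
git add -A
git commit -q -m "baseline"

# Hypothetical helper: print one line per file that differs from HEAD.
# Empty output means the working tree matches the last commit.
pending_changes() {
  git status --porcelain
}

before=$(pending_changes)

# A manual edit made outside the AI tool.
echo ".panel { border: 1px solid red; }" >> layout.css

after=$(pending_changes)

echo "before manual edit: [$before]"
echo "after manual edit:  [$after]"
```

If the second listing shows files you didn’t expect, that is the moment to stop and tell the tool what changed before issuing the next prompt.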

Next Steps

Cursor is now my daily driver and I am absolutely blown away by its capabilities. With it I’m not just 2x or 10x more productive: I’m infinitely more productive because I am willing to tackle projects I otherwise wouldn’t dare try on my own. It’s not going to replace me; rather, it feels like wearing a mech suit. Without Cursor, the only way I would have added dark mode to the simulation was if I magically fell into a pile of money that enabled me to hire a team. But with Cursor? I can do a whole lot of heavy lifting all on my own.

That said, Cursor is still clearly in its infancy. It crashes. It does weird things. More generally, GenAI IDEs have a learning curve. If you’re not the kind of person who works in small steps naturally, it’s a habit you’re going to want to learn. It takes effort to craft prompts that yield the desired results.

But if you learn how to set clear boundaries, watch for surprises, and keep a tight feedback loop, GenAI IDEs are an absolutely incredible force multiplier.