Phoenix.new Feedback

I want to start by saying that I love the idea of phoenix.new, and I think it works very well at creating working mock-ups quickly. It obviously understands LiveView, Presence, and other Phoenix-framework-specific concepts better than other models/tools. But what I’m getting for $20/mo feels severely underwhelming compared to what most other AI tools offer at similar price points.

For context, I’ve been developing software across a variety of industries and programming languages for almost 20 years. I’ve worked at AI companies, and at companies that were very AI-positive (i.e. encouraging and supporting developers in using AI tooling). I’ve also been using Claude Code for a while now, with surprisingly good results even within Elixir/Phoenix projects.

The following are some pain points that, taken as a whole, really stop me from considering phoenix.new as a long-term AI coding tool. I’ll typically use Claude Code as the comparison, since that’s the best recent personal experience I’ve had with this kind of tooling.

Rewrites vs Edits

If I make changes to a file, phoenix.new seems to always rewrite any file it needs to modify from scratch rather than editing it in place. This means my changes are frequently erased and overwritten.

If someone is using phoenix.new solely to vibe code from start to finish, that’s not a problem.

If someone is using phoenix.new just to set up a new project and get some working bits and bobs before taking it over manually, that’s probably not a big deal.

If someone is trying to work with phoenix.new to shape the direction of the code and pick up the slack when it fails to materialize a proper solution, then having it just throw out those manual changes is a huge issue.

Bad at Validation

I’ve seen some other complaints about this and I want to reiterate it: when phoenix.new validates web components, it frequently fails to do so correctly. In what feels like half of the cases, a validation task will just hang.

Whatever the cause may be, it appears as though phoenix.new prefers to use command-line tools (like a headless browser) to fetch a web page, inspect the contents, and then interact with it. That’s nice for some things, but why isn’t it generating test cases in test/? Or why isn’t it using Elixir scripts (.exs files) to validate things?
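
To be concrete, the kind of throwaway check I have in mind is just a tiny script run with mix run, something along these lines (the module and function names here are hypothetical, standing in for whatever context function was just touched):

```elixir
# scratch/check_products.exs — throwaway validation script
# run with: mix run scratch/check_products.exs
# MyApp.Catalog.list_products/0 is a placeholder for whatever
# context function the agent just changed.
products = MyApp.Catalog.list_products()

if products == [] do
  raise "expected products in the database, got an empty list"
end

IO.puts("list_products/0 returned #{length(products)} records")
```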

Without any kind of prompting or guidance, even Claude Code defaulted to creating temporary .exs scripts to validate that functions and features were behaving as expected. It even created test cases in test/ to ensure that web components were showing the appropriate elements and that interactive elements were triggering the appropriate events. How is phoenix.new not doing that by default?
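
The test/ version of that is just as lightweight. Here’s a sketch of what I mean, using the ConnCase that mix phx.new generates plus Phoenix.LiveViewTest (the module name, route, and element labels are made up for illustration):

```elixir
defmodule MyAppWeb.ProductLiveTest do
  # MyAppWeb.ConnCase is generated by mix phx.new; the MyApp names,
  # the /products route, and the "Only in stock" button are hypothetical.
  use MyAppWeb.ConnCase

  import Phoenix.LiveViewTest

  test "index shows products and reacts to clicks", %{conn: conn} do
    {:ok, view, html} = live(conn, "/products")

    # the expected element is actually on the page
    assert html =~ "Listing Products"

    # clicking an element with a phx-click binding triggers its event
    # and returns the re-rendered HTML
    assert view
           |> element("button", "Only in stock")
           |> render_click() =~ "In stock"
  end
end
```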

I’ve also had some frustrating cases where phoenix.new runs what it believes is a good validation of something I’m telling it is incorrect, then tells me everything is now working as expected, when it isn’t. The validation it wrote had a flaw, usually in the premise of what I said the issue was, and that just wastes credits and time. Claude Code very rarely gets the premise of an issue wrong; the problems it usually has are that it will either (a) fail to identify why there’s an issue or (b) get into a fix-debug loop where it literally can’t solve the issue and will stay stuck indefinitely if I don’t step in.

But I prefer that to what I frequently experience with validations made by phoenix.new.

Lower Quality, Less Output

In my experience, phoenix.new starts off way better than any competitor. With most AI tools, it’s best to start with at least an “empty” project: for any greenfield project I might use Claude Code on, I’ll always do some initial setup first (e.g. mix phx.new) and then build context around that before engaging with the agent. But phoenix.new does all of that for you. It’ll run other mix commands too, like mix phx.gen.auth or mix phx.gen.live.

I also love that it starts by generating a plan that it uses to direct itself on what actions it needs to take to get an MVP (minimum viable product) completed. And by the end of the very first prompt response you’ll have a working Phoenix app. That’s incredible. You can put in very little effort overall: just describe the kind of application it should be and, boom, here’s something you can immediately start working with.

But after maybe 30 minutes, you can start to see where the quality drops off. And as the quality drops off, you start realizing how little output you are actually receiving for the cost.

In my case, it usually fills the initial app with a lot of static data. That’s great for a mock-up, but phoenix.new is a software entity that can do way more than I can in far less time. It can easily fill up a database with records to allow displaying live data. There’s no need to resort to static data that breaks functionality the moment you try to have meaningful interactions with the application.

So eventually I end up needing to direct phoenix.new to remove the static data and replace it with real data in the database. And while I could maybe get around this by explicitly requesting that it not use static data in my initial prompt, that’s extra work on my end that I don’t need to worry about with Claude Code or the other tools I’ve tried so far. Claude Code frequently creates data in the database, sometimes just temporarily, when it wants to validate something or prepare a particular view for further testing and development.
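
What I’d like it to reach for by default is something in the spirit of the standard priv/repo/seeds.exs: put real rows in the database instead of hard-coding them in templates (the schema and fields below are hypothetical):

```elixir
# priv/repo/seeds.exs — run with: mix run priv/repo/seeds.exs
# MyApp.Repo and MyApp.Catalog.Product (and its fields) are placeholders
# for whatever the app actually defines.
alias MyApp.Repo
alias MyApp.Catalog.Product

for n <- 1..20 do
  Repo.insert!(%Product{
    name: "Sample product #{n}",
    price_cents: 500 * n
  })
end
```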

In short, the longer I spend with phoenix.new, the lower the quality of the output seems to be, and therefore the less output I feel I’m getting for the cost. Add in frustrations like the Rewrites vs Edits issue above, where phoenix.new will typically nuke my own changes and ignore the direction I wanted for the application, and it’s not hard to see how one could spend a large amount of time and money just trying to get the little details right.

Significantly Higher Cost

Although $20/mo is a relatively low expense for a tool of this kind, it’s also about what you might pay Cursor for access to their custom LLM, or Anthropic for a Claude Pro plan, which gets you access to Claude Code in your terminal. And I get way more output, usually of higher quality as well, from those tools.

With Claude Code specifically, you have a rate limit that resets every 5 hours. Even just on the Pro plan ($20/mo), I’ve never hit the limit once. But if I did, I wouldn’t have to wait days or weeks to get back into my workflow; I’d have to wait a few hours at most.

With phoenix.new, I’m hyper-aware that I’m running out of credits. Not only because the credits reset monthly and running out has a larger impact on my workflow, but because the quality of the output forces me to engage more often in ways I wish I didn’t have to. I’m spending more money than I feel I should just trying to course-correct phoenix.new onto a path that other tools don’t seem to have nearly as much trouble with.

TL;DR

Overall I loved the initial experience with phoenix.new, but it really started to fall off a cliff after about an hour or so. I think there’s a ton of potential here, but I’m also disappointed by things like the lack of .exs scripts for testing or data generation, or the lack of tests in test/ unless it’s explicitly told to write them. I really don’t like how files get overwritten and my personal work is thrown out every time it happens, effectively taking one step forward and two steps back. And when all the various issues are taken together, it’s really hard to see a good value proposition over something like Claude Pro + Claude Code.

But I do think the potential is there, and I’m hoping it’ll see a lot of love in the coming days as it improves its capabilities.
