Phoenix.new Feedback

I want to start by saying that I love the idea of phoenix.new and I think it works very well at creating working mock-ups quickly. It obviously understands LiveView and Presence and other Phoenix-framework-specific concepts better than other models/tools. But what I’m getting for $20/mo feels severely underwhelming compared to what most other AI tools offer for similar price points.

For context, I’ve been developing software across a variety of industries and programming languages for almost 20 years. I’ve worked at AI companies, and I’ve worked at companies that were very AI-positive (i.e. encouraging and supporting developers in using AI tooling). I’ve also been using Claude Code for a while now, with surprisingly good results even within Elixir/Phoenix projects.

The following are some pain points that, as a whole, really stop me from considering phoenix.new as a long-term AI coding tool. I’ll typically use Claude Code as a comparison as that’s probably the best personal experience I’ve recently had for this kind of tooling.

Rewrites vs Edits

If I make changes to a file, phoenix.new seems to always rewrite a file it needs to modify rather than editing it in place. This means I frequently have my changes erased and overwritten.

If someone is using phoenix.new solely to vibe code from start to finish, that’s not a problem.

If someone is using phoenix.new just to set up a new project and get some working bits and bobs before taking it over manually, that’s probably not a big deal.

If someone is trying to work with phoenix.new to shape the direction of the code and pick up the slack when it fails to materialize a proper solution, then that is a huge issue for it to just throw out those manual changes.

Bad at Validation

I’ve seen some other complaints about this and I want to reiterate it: when phoenix.new validates web components, it frequently fails to do so correctly. In what feels like half of all cases, a validation task will simply hang.

Whatever the cause may be, it appears as though phoenix.new prefers to use command-line tools (like a headless browser) to fetch a web page, inspect the contents, and then interact with it. That’s nice for some things, but why isn’t it generating test cases in test/? Or why isn’t it using Elixir scripts (.exs files) to validate things?

Without any kind of prompting or guidance, even Claude Code defaulted to creating temporary .exs scripts to validate that functions and features were behaving as expected. It even created test cases in test/ to ensure that web components were showing the appropriate elements and that interactive elements were triggering the appropriate events. How is phoenix.new not doing that by default?
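To illustrate, here’s roughly the kind of LiveView test I’d expect it to generate on its own (a minimal sketch; the module, route, and content names are hypothetical):

```elixir
# test/my_app_web/live/product_live_test.exs — hypothetical module and route names
defmodule MyAppWeb.ProductLiveTest do
  use MyAppWeb.ConnCase, async: true

  import Phoenix.LiveViewTest

  test "renders the listing and responds to clicks", %{conn: conn} do
    {:ok, view, html} = live(conn, ~p"/products")

    # The expected elements are present in the initial render
    assert html =~ "Listing Products"

    # Interactive elements trigger the expected events and re-render
    assert view
           |> element("button", "New Product")
           |> render_click() =~ "New Product"
  end
end
```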

I’ve also had some frustrating cases where phoenix.new will run what it believes is a good validation against something I’m telling it is incorrect, then tell me everything is working as expected now, and be wrong. The validation it wrote had a flaw, usually in its understanding of the premise of the issue I described, and that just wastes credits and time. Claude Code very rarely gets the premise of an issue wrong; the problem it usually has is that it will either (a) fail to identify why there’s an issue or (b) get into a fix-debug loop where it literally can’t solve the issue and will keep trying indefinitely if I don’t step in.

But I prefer that to what I frequently experience with validations made by phoenix.new.

Lower Quality, Less Output

In my experience, phoenix.new starts off way better than any competitor. With most AI tools, it’s best to start with at least an “empty” project. With any greenfield project I might use Claude Code on, I’ll always do some initial setup first (e.g. mix phx.new) and then build context around that before engaging with the agent. But with phoenix.new, it’ll do all of that for you. It’ll run other mix commands too, like mix phx.gen.auth or mix phx.gen.live or whatever.

I also love that it starts by generating a plan that it uses to direct itself on what actions it needs to take to get an MVP (minimal viable prototype) completed. And by the end of the very first prompt response you will have a working Phoenix app. That’s incredible. You can put in very little effort overall; just describe the kind of application it should be and, boom, here’s something you can immediately start working with.

But after maybe 30 minutes, you can start to see where the quality drops off. And as the quality drops off, you start realizing how little output you are actually receiving for the cost.

In my case, it usually fills the initial app with a lot of static data. That’s great for a mock-up, but phoenix.new is a software entity that can do way more than I can in a shorter amount of time. It can easily fill up a database with records so the app can display live data. There’s no need to resort to static data that breaks functionality when you try to have meaningful interactions with the application.

So eventually I end up needing to direct phoenix.new to remove the static data and replace it with real data in the database. And while I could maybe get around this by explicitly requesting that it not use static data in my initial prompt, that’s extra work on my end that I don’t need to worry about with Claude Code or other tools I’ve tried so far. Claude Code frequently creates data in the database, sometimes even just temporarily, when it wants to validate something or prepare a particular view for further testing and development.
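Even a simple seeds script would cover most of this (a sketch; MyApp.Repo and MyApp.Catalog.Product are hypothetical names):

```elixir
# priv/repo/seeds.exs — run with: mix run priv/repo/seeds.exs
alias MyApp.Repo
alias MyApp.Catalog.Product

# Insert enough real records that list views, pagination, and interactions
# have live data to work against instead of hard-coded placeholders.
for n <- 1..50 do
  Repo.insert!(%Product{
    name: "Sample product #{n}",
    price_cents: Enum.random(100..10_000)
  })
end
```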

In short, the longer I spend with phoenix.new, the lower the quality of the output seems to be, and therefore the less output I feel as though I’m getting for the cost. Add in frustrations like the above Rewrites vs Edits, where phoenix.new will typically nuke my own changes and ignore the direction I wanted for the application, and it’s not hard to see how one could spend a large amount of time and money just trying to get the little details correct.

Significantly Higher Cost

Although $20/mo is a relatively low expense for a tool of this kind, it’s also about how much you might pay to Cursor for access to their custom LLM, or to Anthropic for a Claude Pro plan which gets you access to Claude Code in your terminal. And I get waaaaaaay more output, usually of higher quality as well, from those tools.

With Claude Code specifically, you have a rate limit that resets every 5 hours. Even just on the Pro plan (for $20/mo), I’ve never hit the limit once. But if I did, I wouldn’t have to wait days or weeks to get back into my workflow, I might have to wait a few hours at most.

With phoenix.new, I’m hyper-aware that I’m running out of credits. Not only because the credits are reset monthly and running out of credits has a larger impact on my workflow, but because the quality of the output forces me to engage more often in ways I wish I didn’t have to. I’m spending more money than I feel I should be just trying to course-correct phoenix.new onto a path that other tools don’t seem to have nearly as much trouble with.

TL;DR

Overall I loved the initial experience with phoenix.new, but it really started to fall off a cliff after about an hour or so. I think there’s a ton of potential here, but I’m also sad to see things like the lack of .exs scripts for testing or data generation, or the lack of tests created in test/ without being explicitly told to. I really don’t like how files get overwritten and my personal work is thrown out every time it happens; it’s one step forward and two steps back. And when all the various issues are taken together, it’s really hard to see a good value proposition over something like Claude Pro + Claude Code.

But I do think the potential is there, and I’m hoping it’ll see a lot of love in the coming days as it improves its capabilities.


I also liked the idea a lot, but I ended up just losing time and money.
First, the agent got stuck writing a session_live.ex file. No matter what I did, it kept rewriting the module by appending its definition at the end of the file. This resulted in a 5,000-line file for a simple LiveView file. That alone cost $10, and it kept repeating the same mistake no matter what I prompted or which changes I made to the file itself in the editor.

Then, with $0 in credits left, I was unable to clone the repository because of a remote error:

Cloning into 'agent_observer'...
fatal: the remote end hung up unexpectedly

Not only is it a waste of $20, it’s also a huge waste of time given that I can’t even get the code onto my machine.
Really disappointed

Sorry to hear about your less-than-stellar experience. We’ve seen some corrupted file system issues that can cause beamfile errors, which the agent will repeatedly attempt to fix because every compile that should work gives a beamfile error. I’m not sure if this is what happened in your case, but I’ve thrown some credits your way if you want to give things another go.

Hey @the-eon, thanks for the candid feedback. I also threw some credits your way if you want to try out a feature that just shipped, which will help with some of the behavior you want:

Testing/validation:
We now support a root /workspace/AGENTS.md, where you can specify guidelines you want the agent to follow. For example, we don’t instruct the agent to write test files unless the user asks for it. So if you prefer less web “testing” in favor of test files, you can either ask explicitly, or now with AGENTS.md, just specify that you always want test files and to avoid calling the web browser to interact with and try out the app.

Rewrites vs Edits:
The agent has a surgical edit tool that it can choose for minor edits, but it’s currently a balance between a successful surgical modification and a bad edit that requires a rewrite anyway. We’re tuned to a happy balance at the moment, but if you’d like to have the agent lean toward surgical modification, your AGENTS.md can say something like “Prefer surgical modify for any file edits that can be reasonably accomplished with standard sed operations or similar.” It may take some trial and error.
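To make that concrete, a root AGENTS.md along these lines could capture both preferences discussed above (a hypothetical sketch; the file is free-form guidance, so the exact wording is up to you):

```markdown
# /workspace/AGENTS.md — hypothetical example

- Always write ExUnit tests under test/ for new features; avoid calling the
  web browser to interact with and try out the app.
- Prefer surgical modify for any file edits that can be reasonably accomplished
  with standard sed operations or similar.
- Do not fill pages with hard-coded static data; create real records in the
  database (e.g. via priv/repo/seeds.exs) instead.
```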

Strong start, tapered off quality:

In my experience, phoenix.new starts off way better than any competitor.

Glad to hear we are giving a strong start! At the moment we’ve only focused on and optimized for the starting point. Once you’re ready to iterate on an established (or existing) app, the agent takes a lot more hand-holding and direction to get the more measured behavior that you want. AGENTS.md can help here, but really we need to focus our internal guidelines for these different modes. Currently it’s full vibe mode, but we are working on a pair mode to make this continued dev story much better. So stay tuned for updates in this regard. Thanks!

–Chris


Thanks a lot for the quick reply and the credits. It was really frustrating to see it stuck appending the same module over and over again while the credits were sinking :sweat_smile:.

I think there’s a lot of potential there, but so far it’s too expensive compared to Claude Code, and I ended up having to use an SSH key to push the repository manually to my GitHub, which worked well. Maybe an integration with GitHub would prevent this sort of issue.

Oh wow, thanks for jumping into the conversation so quickly @chrismccord. Yeah, I think the main takeaway, if you’re contemplating what would bring the most value to phoenix.new, is the ability to support more “pair programming” workflows. I haven’t tried using AGENTS.md to ask it to use surgical modifications instead of the current “hammer solves everything” approach, so I’ll try that today and see what happens.

I think if you distilled my feedback into a single primary issue, it comes in two parts:

  1. For a purely vibe coding tool, it consumes a lot of credits and progressively gets “more expensive” the further along you get, i.e. you get less bang for your buck as it stops making these huge leaps in progress for relatively few credits.
  2. When moving beyond pure vibe-coding, it has to be a lot better at working with the user to create solutions. Even if it’s going to completely rewrite a file, it should at least read in any recent edits made to understand what the user has done and try to use that as additional context when rewriting the file.

Thanks again for the quick feedback, I’m looking forward to seeing how surgical edits work out and I’ll be keeping an eye on future updates to the tool!

I’m back! Been a minute, but I hadn’t had a chance to play around with the additional credits until today.

So, I had two unpleasant experiences that once again made me feel like the value proposition still isn’t quite there yet, and in fact the second experience I’ll mention here made me wonder if phoenix.new is actually better at Phoenix projects than other AI tools.

First, git commits. I had thought that the agent was aggressive with committing things to the git history upon completing tasks, given how small each commit diff typically is and how many commits I end up with in a short period of time. But today I had everything related to user auth inexplicably ignored (by the agent, not via .gitignore). All the auth stuff was on the phoenix.new machine, but it wasn’t there when I pulled the project down locally.

That wasn’t upsetting so much as it was annoying, but it did make me wonder how the agent decides what goes into the git history and whether the individual commits it is actually making are any good if it doesn’t just add all changes from a given task to a commit.

Second, a lack of understanding around how form components work. And this one was genuinely disheartening. Now, I’m not sure if this was intended to work, but the agent modified a live component for a form to use phx-value-* alongside phx-change. I don’t think I’ve ever seen phx-value-* used alongside phx-change, and nothing in the documentation suggests that phx-value-* works with anything other than phx-click (which is weird), so my belief is that phx-value-* doesn’t do anything when paired with phx-change.

The error seemed to reinforce that belief, because the output showed that only one parameter was being sent to the event handler: _target. Nothing from phx-value-* was coming through, but the handler expected that data, which caused a “no matching function clause” error. On its own that might not be the end of the world, but the agent believed it was a successful change and told me so. And while I don’t blindly believe AI is right about everything, I did expect it to better understand the very thing that is supposed to be its bread and butter: Phoenix web applications.
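For reference, this matches how LiveView change events behave as far as I can tell: the payload of a phx-change event comes from the form’s inputs (plus “_target”), and phx-value-* attributes are not included. A minimal sketch with hypothetical names:

```elixir
defmodule MyAppWeb.SignupLive do
  use Phoenix.LiveView

  def mount(_params, _session, socket) do
    {:ok, assign(socket, email: "", plan_id: 1)}
  end

  def render(assigns) do
    ~H"""
    <%!-- phx-value-plan_id is effectively ignored for the change event --%>
    <form phx-change="validate" phx-value-plan_id={@plan_id}>
      <input type="text" name="email" value={@email} />
    </form>
    """
  end

  # The change event carries only the serialized form fields and "_target";
  # anything else the handler needs belongs in a hidden input or in assigns.
  def handle_event("validate", %{"_target" => _target, "email" => email}, socket) do
    {:noreply, assign(socket, :email, email)}
  end
end
```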

There were some other very minor disappointments, like when it didn’t generate a “home page” initially, so everything went straight to the login page. But when I then asked it to generate a home page so people don’t go straight to the login page, it placed the home page behind authentication, so everyone still went straight to the login page. Or when I told it to specifically use an embeds_many in a model and it was like “yeah definitely” and then completely ignored me, created a new migration for new tables, and used has_many associations instead. Or how it originally started with a consistent theme and then for some reason changed the root layout to have an entirely different theme from the app layout, and I had to tell it to make the theme consistent throughout. I also tried to use AGENTS.md in the workspace root (not the project directory) and it seemed to just outright ignore it, or otherwise didn’t know how to interpret my instructions from that file.
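On the embeds_many point, what I was asking for is roughly this (a sketch with hypothetical schema names), which stores the embedded data in a single column rather than spinning up new tables and has_many associations:

```elixir
defmodule MyApp.Orders.Order do
  use Ecto.Schema

  schema "orders" do
    field :status, :string

    # Stored inline on the orders table (e.g. a jsonb/array-of-maps column
    # added with `add :line_items, {:array, :map}, default: []`), no join table
    embeds_many :line_items, MyApp.Orders.LineItem, on_replace: :delete

    timestamps()
  end
end

defmodule MyApp.Orders.LineItem do
  use Ecto.Schema

  embedded_schema do
    field :sku, :string
    field :quantity, :integer
  end
end
```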

But again, I found it incredibly useful for getting started (sans the git commit issue). It seems to excel at generating plans and building out a skeleton Phoenix application in short order. But at this point I would probably prefer to pay for credits one-off rather than on a subscription so that I could top up just whenever I need to kickstart a new project. Assuming my current credits don’t expire for a while though, I’ll be coming back periodically to test updates. Is there a good way to stay up to date on changes made to the agent?

I’ve been having a good time using phoenix.new over the past few weeks.
I find that it accelerates parts of development quite a bit.

I’ve also taken a few wrong turns with it and had a few frustrating loops kill the vibe.

The first project I imported was a Phoenix 1.7.20 project.

Phoenix.new is tuned to keep 1.8 on task, so the agent kept struggling with @inner_block and @inner_content errors after implementing custom layouts everywhere. It does much better on 1.8 and knows it shouldn’t do that.

I also got into quite a bad loop when using Commanded. Working through event definition files and syntax, the editor would consistently start redefining the module 80-100 lines in, and get stuck over and over. I refreshed a couple of times. No luck. It really seemed to struggle with
```elixir
{...
  current_data: current_data,
  proposed_data: proposed_data
...
}
```

elements, so much so that I wondered if they’re keywords in the stack somewhere. There were other definitions, but it only seemed to get caught on those key/values, across three different files.

Anyhow, it’s helping me reach and explore things that would be much more strenuous otherwise. Ups and downs, and bigger ups!

I’d like to see what others are doing with their `AGENTS.md` files to stay in the groove.

See one of my posts, and if you have admin access you can see the loop it went through to burn my money. I have no idea how it managed to do it. I made another app and it cost me $20 to get something working. This one cost me $120 to get a login page and a landing page. It got stuck in a loop, I guess. I only noticed when I came back to look.

Not totally sure what happened, but I have to end my subscription until some kinks are worked out.

Here is why phoenix.new burned through all my credits…
This is ridiculous. There are more screenshots like this; I feel robbed.

Oof, so sorry for this! We added preview pane recovery, and the LLM can assist with failed server starts, but I’m not sure what happened here with the recursive failed starts. It doesn’t appear any of those were actually hitting the LLM (i.e. invoking API/tool calls), but in any case I’ve comped $150 for the trouble, and I’m about to push out a few bugfixes around pane recovery, which explicitly does not hit the LLM unless the user clicks an inactive preview themselves and the previously run command (i.e. mix phx.server) fails to come up.


@thedangler I can confirm that the recursive failed server restart did not trigger the LLM loop. I’ve deployed a few bug fixes around preview pane starts that should resolve this as well as allow easy server restarts without having to bug the agent and burn credits to restore the preview pane.

Our system prompt is solely tailored for Phoenix 1.8+, so it makes sense that it will struggle with prior interfaces and layout handling. With Phoenix 1.8 shipping AGENTS.md (largely extracted from phoenix.new), I will revisit our own system prompt to avoid duplicating guidelines, and this would allow you to bring your own 1.7 rules, but TBD.

There are a few failure modes with our git proxying that I need to hunt down. Every file tool the agent invokes performs a commit. It’s actually not possible for it to touch a file without committing, but we have a git hook that auto-pushes to a local bare git repo, which your clones/pulls fetch from. Sometimes this bare git repo gets out of sync, and I still need to track that down.

We’ll do newsletter blasts that aggregate bigger sets of updates. I also need to blog about some of the neater things we’ve shipped as of late, like better surgical edits via lua scripts.

A pair mode is on our list for more measured planning by the agent, but note that we watch all file changes, and the moment you save a file the agent “sees” it in its system index, so it will always have the current view of the world, project-file-wise. I do this in part exactly to avoid having to re-read files over and over, which consumes unnecessary tokens. Also, the agent is always up to date, so there’s no guesswork or having to tell it to go look at files or check mtimes.

Focused chats on the same codebase and squashing chats work very well for me, but so far we’ve only optimized for the “new” part of building Phoenix apps. There’s a lot to explore on continued progress after the starting point, i.e. pair mode.

Thanks for the feedback!


Hi Chris,
That is beyond generous and I am grateful.

Thank you for the detailed explanation.

I’ll be sure to take another look when I get back from my vacation.

Keep up the good work.

Is there a guide or documentation coming?

Thanks again.

A little more feedback: I signed up for what was described as a $20/mo subscription, but was really buying $20 of credits (that will expire in a month if not used? Am I going to get billed another $20 at the end of the month automatically? Not sure.).

I used all but $2.86 of that initial $20 in less than an hour. Mind you, it was an incredible experience starting an app that way, but I think I burned most of the credit trying to fix a single front-end bug in the initial app that the system kept thinking it had fixed. I’ve cloned the project and will try to fix the bug locally myself - if I can, I might pony up $20 more to see what adding features is like.

Hello

I am using Phoenix to develop a larger application. I am not a software engineer and I don’t write code; however, I understand the software development process.

I created a very detailed PRD, ordered to first build the DB, then the Admin backend, business rules, and UI. The PRD is written to facilitate development by an AI agent and contains User Stories, the “what” that needs to be developed, Tests, Software Architecture, Security, Atomic DB transitions, etc.

Phoenix.new did a great job planning out the work and quickly coded the application. It spent a lot of time correcting itself and went through $220 fairly quickly. It claimed, as the app was being built, that the app was tested and the functionality was working properly.

When I tested the app, a lot of what it claimed was working was broken. I provided feedback, and it performed fixes and said everything was working. It took several iterations to fix reported issues, and some it did not fix because I stopped. I estimate $40-50 was spent correcting issues even though it had detailed and clear instructions. I expect to spend another $100 testing and getting Phoenix.new to fix issues.

Phoenix.new has great potential, but testing and understanding feedback need to be appreciably improved. One area of improvement would be to allow files to be attached rather than pasting everything.

Is there anything I can do to improve the results I get from Phoenix.new?

Regards

Hi Chris.
I logged back in to start working with Phoenix.new and I had to subscribe again, and the $150 credit was gone. I guess when I unsubscribed it erased my credit?

I’m not sure why it deleted the credit just because I didn’t want to keep paying $20/mo while I still had $150 in there.

Might be a bug