As you say, shared CPUs are never going to be that impressive. As you work up through their 1/2/4 ladder, the RPS improves accordingly, which suggests the CPU is the bottleneck. This post says you can be sharing with up to 16 other users, so you could expect some throttling.
However, that wouldn’t explain your performance-cpu test, which you’d expect to be much better. Maybe SSH into the machine, run htop (or equivalent) to see the processes, and then run a slightly longer test to check whether you get the expected number of child workers. Not sure.
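If htop isn't installed, or you'd rather script the worker count during the test, here is a rough sketch. It assumes a Linux machine and an app server that forks worker processes from one master process (gunicorn-style; that's an assumption about your setup). It reads /proc directly and counts processes whose parent is the PID you pass in. The script name count_children.py is just a placeholder. If your server uses threads rather than processes for workers, this will undercount.

```python
import os
import sys


def child_pids(parent_pid: int) -> list[int]:
    """Return the PIDs whose parent is parent_pid, by scanning /proc (Linux only)."""
    children = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(f"/proc/{entry}/stat") as f:
                stat = f.read()
        except OSError:
            # The process exited between listdir() and open(), or is unreadable.
            continue
        # Format is "pid (comm) state ppid ...". comm may contain spaces or ')',
        # so split on the LAST ')' and read the fields after it: state, then ppid.
        fields_after_comm = stat.rsplit(")", 1)[1].split()
        ppid = int(fields_after_comm[1])
        if ppid == parent_pid:
            children.append(int(entry))
    return sorted(children)


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python count_children.py <master_pid>
    master = int(sys.argv[1])
    kids = child_pids(master)
    print(f"{len(kids)} child process(es) of {master}: {kids}")
```

Take the master PID from htop (or `ps`) and run it a few times while the load test is going to see whether the worker count matches what you configured.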