External calls timing out on node

Hi,

A couple of days ago, my server suddenly stopped being able to call external URLs from within my Node app (using NestJS). Everything had been working fine and there were no deployments at the time; it just stopped.

My app runs inside a Docker container and was using Alpine as its base image. Having looked around on here, I saw a suggestion to switch to Slim, so I did that and redeployed, but no good.

I then took the nuclear option, as someone else had tried here: I deleted my app and recreated it. I’ve since redeployed, but still no good.

What’s strange is that sometimes one or two requests to my app work, and then it just stops. The external URLs themselves are not timing out or returning errors: if I run my app locally, the calls work just fine.

If I make a request to an endpoint on my app that doesn’t require an external url, then it gets returned, so I know the app is running ok.

I’m unsure what could have caused this, but I could do with some help in getting this back up and running and I’m open to all suggestions.

In case it’s useful, this is my Dockerfile:

FROM node:18-slim AS BUILD_IMAGE

# https://medium.com/trendyol-tech/how-we-reduce-node-docker-image-size-in-3-steps-ff2762b51d5a

WORKDIR /usr/src/app

COPY . .

RUN npm config set cache /tmp --global

# The rimraf node_modules is run before npm ci as otherwise NPM throws a WARN about it

RUN npm install rimraf -g && \
    npm run bootstrap-server:ci && \
    npm run build-common && \
    npm run build-server && \
    rimraf node_modules apiclient/node_modules cinema-api/node_modules

# Install dependencies for production
RUN npm ci --omit=dev && \
    cd apiclient && \
    npm ci --omit=dev && \
    npm link && \
    cd ../cinema-api && \
    npm ci --omit=dev && \
    npm link @cinemaplanner/api-client

FROM node:18-slim AS RUNTIME

LABEL fly_launch_runtime="nodejs"

WORKDIR /usr/src/app

COPY --from=BUILD_IMAGE /usr/src/app/node_modules ./node_modules
COPY --from=BUILD_IMAGE /usr/src/app/apiclient ./apiclient
COPY --from=BUILD_IMAGE /usr/src/app/cinema-api ./cinema-api

ENV NODE_ENV=production

EXPOSE 3000
EXPOSE 8080

WORKDIR /usr/src/app/cinema-api
CMD ["npm", "run", "start:prod"]

From my logs in the fly monitor, this is the error I end up with

2023-12-21T11:13:52.924 app[e286033c927738] lhr [info] Error: connect ETIMEDOUT 2606:4700::6811:6081:443

2023-12-21T11:13:52.924 app[e286033c927738] lhr [info] at AxiosError.from (/usr/src/app/cinema-api/node_modules/axios/dist/node/axios.cjs:837:14)

2023-12-21T11:13:52.924 app[e286033c927738] lhr [info] at RedirectableRequest.handleRequestError (/usr/src/app/cinema-api/node_modules/axios/dist/node/axios.cjs:3029:25)

2023-12-21T11:13:52.924 app[e286033c927738] lhr [info] at RedirectableRequest.emit (node:events:517:28)

2023-12-21T11:13:52.924 app[e286033c927738] lhr [info] at RedirectableRequest.emit (node:domain:489:12)

2023-12-21T11:13:52.924 app[e286033c927738] lhr [info] at eventHandlers.<computed> (/usr/src/app/cinema-api/node_modules/follow-redirects/index.js:14:24)

2023-12-21T11:13:52.924 app[e286033c927738] lhr [info] at ClientRequest.emit (node:events:529:35)

2023-12-21T11:13:52.924 app[e286033c927738] lhr [info] at ClientRequest.emit (node:domain:489:12)

2023-12-21T11:13:52.924 app[e286033c927738] lhr [info] at TLSSocket.socketErrorListener (node:_http_client:501:9)

2023-12-21T11:13:52.924 app[e286033c927738] lhr [info] at TLSSocket.emit (node:events:517:28)

2023-12-21T11:13:52.924 app[e286033c927738] lhr [info] at TLSSocket.emit (node:domain:489:12)

Hi,

Ah this sounds familiar. Yep, I found the same.

In my case (and the other guy’s), changing to a different base image fixed it, e.g.

… however in our case we were getting ENOTFOUND errors which suggested the DNS wasn’t resolving to an IP.

It can’t hurt to try a full image (not alpine or slim), purely out of interest :thinking:.

But in your case you have ETIMEDOUT and do have an IP (an IPv6) next to that, which suggests to me that your app is able to resolve DNS (else it wouldn’t be able to get that IP). So it looks like a different issue, which may have a different solution …
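To make that reasoning concrete, here is a minimal sketch that sorts a failure into "DNS did not resolve" versus "DNS resolved but the connection failed". The function name `diagnoseConnectError` is made up for illustration, and it assumes the error carries the `code` and `address` fields that Node sets on socket errors; whether a wrapping library such as axios passes those through unchanged is worth checking in your own logs.

```javascript
const net = require('node:net');

// Hypothetical helper: turn a Node.js socket error (or any object with the
// same `code` / `address` fields) into a rough diagnosis.
function diagnoseConnectError(err) {
  // net.isIP returns 4, 6, or 0 when the value is not an IP address.
  const family = err.address ? net.isIP(err.address) : 0;
  const familyLabel = family ? `IPv${family}` : 'unknown family';

  switch (err.code) {
    case 'ENOTFOUND':
      // getaddrinfo failed, so there is no address to connect to.
      return { stage: 'dns', family, hint: 'hostname did not resolve to an address' };
    case 'ETIMEDOUT':
      // An address is present, so DNS worked; the connection attempt stalled.
      return { stage: 'connect', family, hint: `resolved, but the ${familyLabel} connection timed out` };
    case 'ECONNREFUSED':
      return { stage: 'connect', family, hint: `resolved, but the ${familyLabel} connection was refused` };
    default:
      return { stage: 'unknown', family, hint: `unrecognised error code: ${err.code}` };
  }
}
```

Fed the address from the log above (`2606:4700::6811:6081`, port 443), this reports the `connect` stage with family 6, which is why the DNS-related fixes looked unlikely to apply.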

What about if you call other URLs? Like … https://jsonplaceholder.typicode.com/ ? There is an example using Node calling a demo endpoint https://jsonplaceholder.typicode.com/todos/1 if you scroll down that page to “Try it”.

What about if you call other URLs? Like … https://jsonplaceholder.typicode.com/ ? There is an example using Node calling a demo endpoint https://jsonplaceholder.typicode.com/todos/1

I get the same problem, it just times out.

I get a similar error, with a v6 IP address.

It can’t hurt to try a full image (not alpine or slim), purely out of interest :thinking:.

I did try this too and, as probably expected, it made no difference.

Strange :thinking:

Since the common factor between the domains is that they both resolve to IPv6 (which then times out) … how about making a fetch request to a domain that only has an IPv4 address, to see if that’s the issue? :thinking:

e.g https://ipv4.google.com

… which (as its name suggests) only has an IPv4:

Does that work?
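If you want to compare the two address families directly from inside the container, something like the following might help. It is only a sketch: `listRecords`, `probe` and `compareFamilies` are illustrative names, and the 5 second timeout is an arbitrary choice.

```javascript
const dns = require('node:dns').promises;
const net = require('node:net');

// List the A (IPv4) and AAAA (IPv6) records for a hostname.
// A missing record type is reported as an empty list instead of an error.
async function listRecords(hostname) {
  const [a, aaaa] = await Promise.all([
    dns.resolve4(hostname).catch(() => []),
    dns.resolve6(hostname).catch(() => []),
  ]);
  return { a, aaaa };
}

// Try a TCP connection restricted to one address family (4 or 6).
// Always resolves, with { ok: true } or { ok: false, error: <code> }.
function probe(host, port, family, timeoutMs = 5000) {
  return new Promise((resolve) => {
    const socket = net.connect({ host, port, family, timeout: timeoutMs });
    const finish = (result) => {
      socket.destroy();
      resolve({ family, ...result });
    };
    socket.once('connect', () => finish({ ok: true, address: socket.remoteAddress }));
    // 'timeout' only signals inactivity; the socket has to be closed by hand.
    socket.once('timeout', () => finish({ ok: false, error: 'ETIMEDOUT' }));
    socket.once('error', (err) => finish({ ok: false, error: err.code }));
  });
}

// Probe the same host over IPv4 and IPv6 side by side.
function compareFamilies(host, port = 443) {
  return Promise.all([probe(host, port, 4), probe(host, port, 6)]);
}
```

Calling `compareFamilies('jsonplaceholder.typicode.com')` returns one result per family. If only the family 6 entry comes back with `ok: false`, that points at IPv6 connectivity rather than at DNS or the remote service.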

It does, yes, every time. So it looks like something strange is happening when trying to connect via IPv6 then :confused: I wonder what could have triggered this when I made no changes on my end.

Weird! No idea why that would spontaneously stop working.

I guess for now you’d have to tell Node to use IPv4 (since it will prefer IPv6 if both are available, which … times out). Most domains should have both A and AAAA records, so that would get you going again.

e.g. the family option, from the Node docs:

family: IP address family to use when resolving host or hostname. Valid values are 4 or 6. When unspecified, both IP v4 and v6 will be used.

https://nodejs.org/api/http.html#httprequesturl-options-callback
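As a rough sketch of how that option can be applied, the snippet below builds agents pinned to IPv4 and uses one for a plain `https.get`. The names `ipv4HttpAgent`, `ipv4HttpsAgent`, `getOverIpv4` and `preferIpv4ProcessWide` are illustrative, not anything from your app.

```javascript
const http = require('node:http');
const https = require('node:https');
const dns = require('node:dns');

// Agents pass their options through to the socket, so `family: 4` limits
// hostname lookups to IPv4 (A records) for every request using the agent.
const ipv4HttpAgent = new http.Agent({ family: 4 });
const ipv4HttpsAgent = new https.Agent({ family: 4 });

// Make a GET request over IPv4 only and resolve with the HTTP status code.
function getOverIpv4(url) {
  return new Promise((resolve, reject) => {
    https
      .get(url, { agent: ipv4HttpsAgent }, (res) => {
        res.resume(); // discard the body; only the status matters here
        resolve(res.statusCode);
      })
      .on('error', reject);
  });
}

// Alternative: keep both families but ask Node to try IPv4 results first.
// This changes lookup ordering for the whole process, so call it once at startup.
function preferIpv4ProcessWide() {
  dns.setDefaultResultOrder('ipv4first');
}
```

If you're using axios, an agent like this can be passed through its `httpsAgent` request option, e.g. `axios.create({ httpsAgent: ipv4HttpsAgent })`. Either approach is a workaround: it sidesteps IPv6 rather than fixing whatever is wrong with it.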

Thanks for this suggestion. I’ve found the way to do this with the axios package I’m using and it now looks to be connecting ok, thanks for the troubleshooting help @greg.

Hey @ScottLovegrove

Sorry about that. We had some IPv6 routing problems in the lhr region. Should be fixed now.
