Help with Gossip Glomers: Fly.io Distributed Systems Challenges

I made it to 3b before heading over here! :slight_smile:

In my handler for broadcast I added:

    for _, node := range n.NodeIDs() {
      if node == n.ID() {
        continue
      }
      n.Send(node, body)
    }

Thinking this would send the message to all the other nodes but me… But I get:

Assert failed: Invalid dest for message
 #maelstrom.net.message.Message{:id 3755969, 
:src "n1", :dest "c11", 
:body {:in_reply_to 5, :type "broadcast_ok"}}

Congrats! :slight_smile:

Looks like that is sending back to the client (c11) instead of another node (which would start with n). And it's also a "broadcast_ok" message rather than the original "broadcast" type. Have you tried printing out what's in n.NodeIDs() and seeing what it contains?
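For example, something like the line below. Go's log package writes to STDERR by default, and Maelstrom keeps STDERR separate from the STDOUT protocol stream, so the output shows up in the node logs instead of getting mixed into the protocol messages:

log.Printf("node IDs: %v", n.NodeIDs())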

Wow, hard to see any output from fmt.Println so I wrote it out to a file:

"n0,n1,n2,n3,n4"

I must be missing something because I'm only sending the messages I get in the handler for "broadcast". And then after that I reply with "broadcast_ok" like I did in 3a.

If you can post the code to a gist then I can see if there's anything that jumps out.

I wasn't able to reproduce the "c11" destination issue, although I noticed the list variable needs a mutex since handlers can be called in concurrent goroutines. That could be causing a race condition when printing it to STDOUT, maybe?

Debugging with a log.Printf() will be much easier too if you fix two issues. First, you have a lot of message amplification for each message received from a client. And second, your message list doesn't deduplicate messages, so it gets quite long! :slight_smile:
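As a rough sketch of what I mean (the mu/seen/addIfNew names are just placeholders, not from your code, and it assumes "sync" is imported): keep a mutex-guarded set of values and only re-broadcast the ones that weren't already in it.

var (
    mu   sync.Mutex
    seen = map[int]struct{}{}
)

// addIfNew records a value and reports whether it was new. If the broadcast
// handler only forwards values for which this returns true, the stored list
// stays deduplicated and nodes stop re-broadcasting the same value forever.
func addIfNew(v int) bool {
    mu.Lock()
    defer mu.Unlock()
    if _, ok := seen[v]; ok {
        return false
    }
    seen[v] = struct{}{}
    return true
}

The read handler can take the same lock when it copies the values out.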

I'm also having some trouble with this one. No issues up until now.

With --node-count=5, it usually starts failing straight away with :net-timeout after an initial :ok :broadcast.

jepsen worker 0 - jepsen.util 0	:invoke	:broadcast	0
jepsen worker 0 - jepsen.util 0	:ok   	:broadcast	0
jepsen worker 1 - jepsen.util 1	:invoke	:broadcast	1
jepsen worker 2 - jepsen.util 2	:invoke	:read   	nil
...
jepsen worker 1 - jepsen.util 1	:info	:broadcast	1	:net-timeout
...
jepsen worker 4 - jepsen.util 4	:fail	:read   	nil	:net-timeout

Although while writing the above, a 5 node run made a lot more progress, failing only intermittently with :net-timeout for :read and :broadcast messages.

I would assume it was deadlocking somewhere, but I can't see where any deadlocks could happen, and I have also had one instance of a run with --node-count=1 which failed with :net-timeouts towards the end… The timeouts also occur with no mutexes.

The other thing I've noticed is that in the stderr output, the only message that is ever listed as 'Received' is {"type":"broadcast","message":0}, where the message value is always 0, never any other values. This is the case even for runs with successful responses to lots of broadcast and read messages.

Low-key hoping that the test command is broken on M1 Macs or something and that I'm not crazy.

Code below. Any ideas?

I am also having the same issue with 3b (3a was ok) where it seems like the messages are getting mixed up in transit. Seeing almost everything @james1 mentioned as well.

I've created my own message type just so that I don't accidentally mix it up with the incoming message:

type broadcastMsg struct {
	Message int    `json:"message,omitempty"`
	Type    string `json:"type"`
}

and I explicitly set msg.type="broadcast", yet somehow it arrives at the other node as broadcast_ok:

for _, neighbor := range neighbors {
    err := n.Send(neighbor, broadcastMsg{Type: "broadcast", Message: value})
    ...
}

At the end, the log sequence to STDERR ends like this:

2023/02/23 19:05:31 Received {c2 n3 {"type":"init","node_id":"n3","node_ids":["n0","n1","n2","n3","n4"],"msg_id":1}}
2023/02/23 19:05:31 Node n3 initialized
2023/02/23 19:05:31 Sent {"src":"n3","dest":"c2","body":{"in_reply_to":1,"type":"init_ok"}}
2023/02/23 19:05:31 Received {c8 n3 {"type":"topology","topology":{"n0":["n3","n1"],"n1":["n4","n2","n0"],"n2":["n1"],"n3":["n0","n4"],"n4":["n1","n3"]},"msg_id":1}}
2023/02/23 19:05:31 Sent {"src":"n3","dest":"c8","body":{"in_reply_to":1,"type":"topology_ok"}}
2023/02/23 19:05:31 Received {n0 n3 {"type":"broadcast"}}
2023/02/23 19:05:31 Sent {"src":"n3","dest":"n0","body":{"in_reply_to":0,"type":"broadcast_ok"}}
2023/02/23 19:05:31 Sent {"src":"n3","dest":"n4","body":{"type":"broadcast"}}
2023/02/23 19:05:31 Sent {"src":"n3","dest":"n0","body":{"type":"broadcast"}}
2023/02/23 19:05:31 Received {n4 n3 {"in_reply_to":0,"type":"broadcast_ok"}}
2023/02/23 19:05:31 No handler for {"id":31,"src":"n4","dest":"n3","body":{"in_reply_to":0,"type":"broadcast_ok"}}

Note that it ends with not finding a handler for "body":{"in_reply_to":0,"type":"broadcast_ok"}.

I'm inclined to think something's wrong with the underlying maelstrom wrapper library or the round-trip logic.

I managed to find a way around 3b. I wasn't checking for already broadcasted messages initially, and it seemed to have created an infinite loop.

The broadcast doc was a big help

Code below

@ahmet @james1 There are two methods in the Go library for sending messages. One is Send(), which just sends the message body and doesn't expect a response. The other is RPC(), which adds a message ID to the outgoing message and lets you add a handler for the response.

There are two ways of getting around this issue:

  1. Use Send() and register a handler on the node for "broadcast_ok" (see the sketch below).
  2. Use RPC() and pass the response handler for that specific call.
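For option 1, a minimal sketch would be registering a no-op handler when you set up the node, so the "broadcast_ok" replies coming back from Send() have somewhere to go:

n.Handle("broadcast_ok", func(msg maelstrom.Message) error {
    // Nothing to do with the ack; this just stops the "No handler for ..." errors.
    return nil
})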

@james1 I switched the Send() call to RPC() and it works:

n.RPC(node, msg_body, func(_ maelstrom.Message) error { return nil })

I'll clarify those two methods on the site. Thanks for the feedback!


Oh, but the wording on the challenge page says:

The value is always an integer and it is unique for each message from Maelstrom.

So are they not unique? I'm not understanding the above sentence, then.

Edit: I think I understand the duplication issue. But now I am wondering why I didn't run into the infinite-loop problem where I keep passing the same message around.

Hey, I got 3b working, so I can confirm that I did not run into any issues.

Whoa… could you elaborate on how you ran into an infinite loop? I'm not checking for already broadcasted messages either, but I didn't run into such an issue.

Edit: I now realise how one runs into this issue. My code is vulnerable to it, but I am not running into this issue at all.

While just replacing my Send with RPC worked, I still don't understand why I had to do that.

This was my code with Send:


if newAdded := messages.Add(body.Message); newAdded {
    for _, id := range neighbours {
        // I don't understand why n.Send() doesn't work.
        n.Send(id, map[string]any{
            "type":    "broadcast",
            "message": body.Message,
        })
    }
}

if body.MsgID == nil {
    return nil
}

return n.Reply(msg, map[string]any{"type": "broadcast_ok"})

Replacing the Send with the following fixes the issue:

n.RPC(id, map[string]any{
    "type":    "broadcast",
    "message": body.Message,
}, func(msg maelstrom.Message) error {
    return nil
})

I don't understand the difference in behaviour. Moreover, how is the following case possible, where a node sends another node a broadcast_ok, when I've explicitly set a return statement before the n.Reply()?

{n4 n3 {"in_reply_to":0,"type":"broadcast_ok"}}

Appreciate help and pointers. Thanks.

The issue with using Send() is that the node you're sending to returns a "broadcast_ok" message and the sending node has no handler for that message type. Maelstrom really just deals with individual messages, and the "RPC" is just a way for the client code to easily associate a request and response message together.

It looks like your example message isn't grabbing the message ID from msg when you call Reply(). Maybe try logging the msg value (e.g. log.Printf("%#v", msg)) to see what it contains.

This clears things up, thanks!