Challenge #3c: Fault Tolerant Broadcast

chillgum · March 7, 2023, 4:04pm

Hi folks, I am stuck on 3c. I know that after a network partition, some nodes will be unreachable. What I can’t figure out is whether RPC() or SyncRPC() calls to unreachable nodes will be stuck forever until the partition heals (so I need to force a timeout), or whether the simulation will return an RPC error with a timeout. As far as I can tell, none of my RPC calls return an error; I checked by adding log lines to each error path. My code can be seen here (3c.go · GitHub).

benbjohnson · March 8, 2023, 5:22pm

There aren’t many (if any) guarantees about the partitioning. You should assume that messages can get delayed or dropped, and there’s nothing in Maelstrom to guarantee a timeout error will be returned.
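
To make the "force a timeout" option concrete, here is a minimal sketch using only the standard library. `sendFunc` and `blackhole` are hypothetical stand-ins, not part of the Maelstrom API: `sendFunc` represents any blocking send that honours a context (I believe the Go library's `SyncRPC` takes one, but check its signature), and `blackhole` simulates a partitioned peer that never replies.

```go
package main

import (
	"context"
	"time"
)

// sendFunc is a hypothetical stand-in for a blocking, context-aware send.
type sendFunc func(ctx context.Context, dest string, body any) error

// sendWithTimeout gives each attempt its own deadline, so a dropped
// message cannot block the caller until the partition heals.
func sendWithTimeout(send sendFunc, dest string, body any, timeout time.Duration) error {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	return send(ctx, dest, body)
}

// blackhole simulates an unreachable peer: no reply ever arrives, so it
// returns only once the context is cancelled or its deadline passes.
func blackhole(ctx context.Context, dest string, body any) error {
	<-ctx.Done()
	return ctx.Err()
}
```

With something like this, a call such as `sendWithTimeout(blackhole, "n1", body, time.Second)` comes back with a deadline error instead of hanging, and the caller can decide whether to retry.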

johnkoepi · March 14, 2023, 5:24pm

100% to what @benbjohnson said.

It’s just that the API function names are confusing. Those RPC functions are not actually Remote Procedure Calls; they just send messages. If the message was dropped or lost somewhere in the middle, then sorry, don’t wait for anything in return. Think of them as sending and receiving a UDP packet. In theory, messages can even be duplicated.
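
Since messages can be retried or duplicated, one common way to cope is to make the broadcast handler idempotent. The following is only a sketch under that assumption; `seenSet` and its methods are made-up names, not anything from the Maelstrom library.

```go
package main

import "sync"

// seenSet records which broadcast values this node has already accepted.
type seenSet struct {
	mu   sync.Mutex
	seen map[int]struct{}
}

func newSeenSet() *seenSet {
	return &seenSet{seen: make(map[int]struct{})}
}

// add reports true only the first time v is offered. A duplicated or
// retried delivery returns false and leaves the set unchanged, so the
// caller knows not to forward the value again.
func (s *seenSet) add(v int) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if _, ok := s.seen[v]; ok {
		return false
	}
	s.seen[v] = struct{}{}
	return true
}

// values returns an unordered snapshot, e.g. for answering a read.
func (s *seenSet) values() []int {
	s.mu.Lock()
	defer s.mu.Unlock()
	out := make([]int, 0, len(s.seen))
	for v := range s.seen {
		out = append(out, v)
	}
	return out
}
```

A handler would forward a value to its neighbours only when `add` returns true, which also stops messages from bouncing around the cluster forever.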

arhyth · March 15, 2023, 11:34am

hi! are we allowed to change the message structure for broadcast in 3c? i’m not sure how to guarantee all nodes receive the message without tracking who has already received it and sent it to someone else

sorry to piggyback here, but i figure it’s much better than creating another topic since the challenges have been out for a while now and i assume most folks have already finished(?)

johnkoepi · March 15, 2023, 4:22pm

@arhyth if the maelstrom workload passes (valid: true), then it’s fine

gororuns · March 18, 2023, 5:17pm

I’m struggling with this a bit. I feel like this is tricky without more hints, and I didn’t find anything else in the forums. I tried adding the failed messages to a list and resending them on each broadcast.

edit: managed to do it by resending all the failed messages at a regular interval (or just resending all of them on a read, which works but isn’t recommended if there are a lot of reads).
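
For anyone landing here later, the resend-at-an-interval idea might look roughly like the sketch below. It is not the code from this thread: `pending`, `delivery`, and the `send` callback are hypothetical, and `send` is assumed to return true only when the peer acknowledged the message.

```go
package main

import "sync"

// delivery identifies one value that still has to reach one peer.
type delivery struct {
	dest  string
	value int
}

// pending tracks deliveries that have not been acknowledged yet.
type pending struct {
	mu      sync.Mutex
	unacked map[delivery]struct{}
}

func newPending() *pending {
	return &pending{unacked: make(map[delivery]struct{})}
}

// track remembers that value must still be delivered to dest.
func (p *pending) track(dest string, value int) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.unacked[delivery{dest, value}] = struct{}{}
}

// ack forgets a delivery once the peer has confirmed it.
func (p *pending) ack(dest string, value int) {
	p.mu.Lock()
	defer p.mu.Unlock()
	delete(p.unacked, delivery{dest, value})
}

// snapshot copies the outstanding deliveries so the lock is not held
// while sending.
func (p *pending) snapshot() []delivery {
	p.mu.Lock()
	defer p.mu.Unlock()
	out := make([]delivery, 0, len(p.unacked))
	for d := range p.unacked {
		out = append(out, d)
	}
	return out
}

// retryOnce resends every outstanding delivery and returns how many are
// still unacknowledged afterwards.
func (p *pending) retryOnce(send func(dest string, value int) bool) int {
	for _, d := range p.snapshot() {
		if send(d.dest, d.value) {
			p.ack(d.dest, d.value)
		}
	}
	return len(p.snapshot())
}
```

Calling `retryOnce` from a background goroutine driven by a `time.Ticker` gives the regular-interval behaviour and keeps the retry work out of the read handler.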

benbjohnson · March 21, 2023, 4:34pm

Can you provide some info on where you’re getting stuck? Or can you post some code? The challenges are open-ended so there can be multiple approaches that can work.

gororuns · March 21, 2023, 8:46pm

I managed to come up with a solution by storing the failed messages using a map. I think it was just quite a steep curve from the previous step to this one. What made it difficult was that I didn’t see the log messages initially. Also the API functions are a bit confusing especially when using the async sender. In the end, I just used the sync sender, and once I found where to view the logs on each node, then it was easier to debug.

