That's right: with the fan-out approach you would need a flip-flop or similar at each end point to latch the state.
That's going to be a *really* big and expensive fan-out board (more likely a great big stack of boards)! It is probably better to put a cheap micro-controller PCB at each relay or relay cluster, with a cheap RS-485 transceiver (MAX485/MAX487 or similar) on it, and add an RS-485 interface to the Arduino. Now you can assign as much address space as you want (32 bits for 4 billion addresses, if you want to go that crazy) while using only 3 lines on the Arduino (TX/RX/DIR). Of course, every hundred nodes or so you would want to insert an RS-485 signal regenerator (two RS-485 transceivers back-to-back). The added advantage is that the nodes can be up to several km apart, as opposed to the PCB idea, where everything needs to be in the same area.
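To make the addressing idea concrete, here is a rough sketch of what a command frame on such a bus could look like. The 7-byte layout, the 0xAA start marker, the XOR checksum and the names `buildFrame` and `parseFrame` are all my own assumptions for illustration, not a standard protocol; in practice you might prefer a proper CRC or an existing protocol such as Modbus RTU.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical frame layout (an assumption, not a standard protocol):
//   byte 0     start marker 0xAA
//   bytes 1-4  32-bit node address, most significant byte first
//   byte 5     command (0 = relay off, 1 = relay on)
//   byte 6     XOR of bytes 0-5 as a simple integrity check
constexpr std::size_t kFrameSize = 7;
constexpr std::uint8_t kStartByte = 0xAA;

// XOR all bytes together; weak but cheap on a small micro-controller.
std::uint8_t xorChecksum(const std::uint8_t* data, std::size_t len) {
    std::uint8_t sum = 0;
    for (std::size_t i = 0; i < len; ++i) {
        sum ^= data[i];
    }
    return sum;
}

// Master side: fill `out` with a frame addressed to one node.
void buildFrame(std::uint32_t address, std::uint8_t command,
                std::uint8_t out[kFrameSize]) {
    out[0] = kStartByte;
    out[1] = static_cast<std::uint8_t>(address >> 24);
    out[2] = static_cast<std::uint8_t>(address >> 16);
    out[3] = static_cast<std::uint8_t>(address >> 8);
    out[4] = static_cast<std::uint8_t>(address);
    out[5] = command;
    out[6] = xorChecksum(out, kFrameSize - 1);
}

// Node side: returns true and sets `command` only if the frame is intact
// and addressed to this node; otherwise `command` is left untouched.
bool parseFrame(const std::uint8_t frame[kFrameSize],
                std::uint32_t myAddress, std::uint8_t& command) {
    if (frame[0] != kStartByte) {
        return false;
    }
    if (xorChecksum(frame, kFrameSize - 1) != frame[6]) {
        return false;
    }
    const std::uint32_t address =
        (static_cast<std::uint32_t>(frame[1]) << 24) |
        (static_cast<std::uint32_t>(frame[2]) << 16) |
        (static_cast<std::uint32_t>(frame[3]) << 8) |
        static_cast<std::uint32_t>(frame[4]);
    if (address != myAddress) {
        return false;
    }
    command = frame[5];
    return true;
}
```

On the Arduino side you would drive DIR high, send the 7 bytes with `Serial.write(frame, kFrameSize)`, call `Serial.flush()` to wait until the last byte has left the UART, and then drop DIR low again so the nodes can reply. Every node sees every frame and simply ignores the ones that fail `parseFrame`.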