Skip to content
This repository has been archived by the owner on Aug 26, 2022. It is now read-only.

How to provide Entity activation/passivation #6

Open
sleipnir opened this issue Jul 27, 2020 · 15 comments
Open

How to provide Entity activation/passivation #6

sleipnir opened this issue Jul 27, 2020 · 15 comments
Labels
help wanted Extra attention is needed platform question Further information is requested

Comments

@sleipnir
Copy link
Member

What is the best way to implement passivation and activation of entities?

@sleipnir sleipnir added help wanted Extra attention is needed question Further information is requested platform labels Jul 27, 2020
@marcellanz
Copy link
Member

activation and passivation are all about the economics of the proxy regarding resources allocated or not, right?

This topic might lead to an even broader one. If the proxy balances load or lets the user follow an entity by frontend client or just pre-warms entities on cluster nodes, entity activation and passivation seems to be the tool to do that. Since we have no Akka-cluster functionality available (and I don't know which part of Akka-cluster is doing that), the response to this issue leads us to how and a which level we get this implemented. I think this too brings us to (persistent) state management?

@sleipnir
Copy link
Member Author

yes i think that all these questions you raised are valid and we should reflect on what level we want to reach.
One thing about Akka is that the proxy today only uses very little of the capabilities that a cluster offers. Basically just for Akka persistence, this is a design that I would not like to repeat. And not repeating this behavior we can create things like effects and routes to any node in the cluster (for example)

@sleipnir
Copy link
Member Author

But perhaps it is only about persistence the theme activation and passivation

@marcellanz
Copy link
Member

I agree, I'm too would expect the proxy would be able to route messages to any node. I think entity activation and passivation will be part of both topics of state management and message routing and then also remoting (with which whatever that might be).

@sleipnir
Copy link
Member Author

I agree.
Do you have any thoughts on how we can implement the activation / passivation mechanism? Do I say when we know we must passivate an entity? Or rather, what does it mean to passivate an entity?

@marcellanz
Copy link
Member

I think there is nothing in the cs-spec that defines activation and passivation but defines certain messages to re-establish state on a user functions entity.

So far, the reference cs-proxy does close the stream to passivate an entity. It might use a cloudstate.StreamCancelled message for CRDTs to announce the passivation but nothing special for event-sourcing and therefore just close the HTTP/2 gRPC stream. For activation the reference-proxy just re-establishes by issuing snapshot- and event-message for event-sourcing and state- and delta-messages for CRDTs.

As the user entities cannot emit events and changes without an incoming command, cloudstate can "pre-warm" entities and therefore activate entities whenever the proxy likes to do it.

@sleipnir
Copy link
Member Author

The fact appears to be a warm-up step contemplated in the proxy of the reference implementation, where a special EventSourcedStream Init message is sent and handled by the proxy, but it does not appear to be heating up the user's functions, but rather the proxy itself.

@marcellanz
Copy link
Member

I agree, pre-warming was an invention by my imagination; I'd consider doing that if an entity, if crashed, would benefit from something like that.

@sleipnir
Copy link
Member Author

I believe that if you were able to preemptively activate an entity function, perhaps sending an Init message to a particular entity key in order to make the user function benefit in some way, it would be interesting to support it.

@marcellanz
Copy link
Member

I agree. I don't know Akka cluster, but I'd imagine activation, and that is (pre-)initialize an entity, or even a stream even without state, could help in certain scenarios. So this is all what activation would be:

  • an entity ready to be sent a command
  • an entity ready with local state already initialised and ready to be sent a command

@sleipnir
Copy link
Member Author

I think it's valid but I don't know if we would be able to generate a stream without an init message, I think not following the CloudState protocol like the one that currently exists.

@sleipnir
Copy link
Member Author

sleipnir commented Apr 7, 2021

Hello guys, I took a moment to think about the BEAM Processes structure needed to build support for EventSourced and CRDT entity types. I know that the scala proxy uses Akka Cluster Sharding to spread each Entity identified with a persistenceId by the Akka cluster, so I reread the Akka Cluster Sharding documentation to try to understand how we could reproduce this behavior in our Proxy.
From the beginning I thought that the support of the Horde library would be enough to solve this problem, but I want to make
sure that this is the right way to go so I am sharing my conclusions with you.
Akka's documentation right at the beginning says the following:

"Cluster sharding is useful when you need to distribute actors across several nodes in the cluster and want to be able to 
interact with them using their logical identifier, but without having to care about their physical location in the cluster, 
which might also change over time.

It could for example be actors representing Aggregate Roots in Domain-Driven Design terminology. Here we call 
these actors “entities”. These actors typically have persistent (durable) state, but this feature is not limited to actors 
with persistent state."

The Introduction to Akka Cluster Sharding video is a good starting point for learning Cluster Sharding.

Cluster sharding is typically used when you have many stateful actors that together consume more resources (e.g. memory)
 than fit on one machine. If you only have a few stateful actors it might be easier to run them on a Cluster Singleton node.

In this context sharding means that actors with an identifier, so called entities, can be automatically distributed across multiple nodes in the cluster. Each entity actor runs only at one place, and messages can be sent to the entity without requiring the 
sender to know the location of the destination actor. This is achieved by sending the messages via a ShardRegion actor 
provided by this extension, which knows how to route the message with the entity id to the final destination."

It also says the following about persistent actors (in this case these are used in the Cloudstate reference implementation):

"When using sharding, entities can be moved to different nodes in the cluster. Persistence can be used to recover 
the state of an actor after it has moved.

Akka Persistence is based on the single-writer principle, for a particular PersistenceId only one persistent actor instance 
should be active. If multiple instances were to persist events at the same time, the events would be interleaved and 
might not be interpreted correctly on replay. Cluster Sharding is typically used together with persistence to ensure that
there is only one active entity for each PersistenceId (entityId)."

Now that we know how Akka Cluster Sharding works, let's take a look at Horde.
The Horde library exposes an API for clustered use of the OTP mechanisms called Registry and DynamicSupervisor
(this in turn is an Elixir extension for the standard OTP Supervisor), which in turn has the responsibility of serving as
a database of processes being able to locate process by a qualified name instead of a PID and the other is responsible for starting child processes under a Supervisor at run time.
Processes created under Horde.DynamicSupervisor are unique, that is, there can only be one process with the
same name active in the Cluster (we can use PersistenceId here), and when a process is migrated or restarted it is
possible to have its internal state migrated for the new Process (I created a module with this type of logic in the MassaProxy.Cluster.StateHandoff module in the massa_proxy project).

With these Horde resources I think we can use the following sequence:

eventsourced-activation

In this diagram I tried to explain the responsibilities of each process, for example Dispatcher is the mechanism that communicates with the gRPC server while the Entity speaks only with the user role.
In the lookup step, the dispatcher cannot try to create a process if it already exists for the same key (EntityId + PersistenceId).
We must also take into account which is the best strategy for rebalancing Entity processes, by default Horde will only try to migrate processes that have been killed, or its node is removed from the cluster, to another node, but this can be changed causing it rebalance in active mode, that is, whenever a node is added or removed from the cluster. I think this second strategy would be the closest to that of Akka Cluster Sharding and it makes sense to me. This implies that the migration of the entity's state must be very well implemented (if this is necessary because, in the worst case scenario, we can go to persistent storage and request this data at the start of the process, but I prefer not to have to do this when possible, or that is, migrating hot data would be the best scenario despite being more complex).

ping @marcellanz @ralphlaude
What do you think of what I described? Does it make any sense?

@marcellanz
Copy link
Member

@sleipnir makes all sense. Lets discuss shortly on discord and then update here.

@ralphlaude
Copy link
Collaborator

I understand the issue and it makes sense to me @sleipnir. I don't really understand the last sentence. In case of a migration from one to another we have to access the database on restart. It is not the best approach but it is the robust on. I don't really see how we can prevent access to the database during migration.

@sleipnir
Copy link
Member Author

sleipnir commented Apr 7, 2021

In case of a migration from one to another we have to access the database on restart. It is not the best approach but it is the robust on. I don't really see how we can prevent access to the database during migration.

Not really @ralphlaude , we don't access the physical storage database, we access the state of the previous Actor that was migrated from node to the new process, all of this through the passing of messages between actors, we access the real physical base if there is no state (state is empty) and just to make sure that the data is consistent

This is done through a process called StateHandoff that uses Horde to ensure that the process is only completed after the state has been migrated to a new process

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
help wanted Extra attention is needed platform question Further information is requested
Projects
None yet
Development

No branches or pull requests

3 participants