release-queue #2

Merged
def merged 1 commit from release-batch into master 2 years ago
def commented 2 years ago
Owner

Nodes are released when they are no longer being observed. That happens in two situations:

  • When release is called on a root. The graph is traversed and all nodes are marked as no longer being observed.
  • In a join or bind node, when the inner graph changes, the previous one is released.

Releasing a sub-graph is a significant operation, as it traverses the whole sub-graph and can invoke arbitrary user functions (the release callback of primitives).
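The cost can be made concrete with a toy reference-counting model. This is not Lwd's actual internals: the record type, make, acquire and release below are hypothetical. When the last observer of a node goes away, its entire sub-graph is walked, and a user callback runs at every node that becomes unobserved.

```ocaml
(* Toy model, not Lwd's actual internals: each node counts its observers
   and carries a user-supplied release callback (standing in for the
   release function of a primitive). *)
type node = {
  children : node list;
  mutable observers : int;
  on_release : unit -> unit;
}

let make ?(on_release = fun () -> ()) children =
  { children; observers = 0; on_release }

(* The first observer acquires the whole sub-graph. *)
let rec acquire n =
  n.observers <- n.observers + 1;
  if n.observers = 1 then List.iter acquire n.children

(* When the last observer goes away, the whole sub-graph is traversed and
   arbitrary user code runs at every node that becomes unobserved. *)
let rec release n =
  n.observers <- n.observers - 1;
  if n.observers = 0 then begin
    n.on_release ();
    List.iter release n.children
  end

(* A chain of 1000 nodes, each counting its own release. *)
let released = ref 0

let rec chain k =
  let on_release () = incr released in
  if k = 0 then make ~on_release []
  else make ~on_release [ chain (k - 1) ]

let root = chain 999

let () =
  acquire root;
  release root
(* !released = 1000: releasing the root visited every node *)
```

Shared sub-graphs are only torn down when their last parent lets go, which is what makes the release order matter in the next section.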

Release behavior should commute

Implemented naively, releasing leads to code whose behavior depends on evaluation order:

let computation_1 = Lwd.var (Lwd.pure 0)
let computation_2 = Lwd.var (Lwd.pure 0)

let expensive_int_computation : int Lwd.t =
  ...

let diff : int Lwd.t =
  Lwd.map2 (-)
    (Lwd.join (Lwd.get computation_1))
    (Lwd.join (Lwd.get computation_2))
    
let root = Lwd.observe diff

Now assume we are observing the diff node and that we are switching back-and-forth between these two configurations:

let setup_1 () =
  Lwd.set computation_1 (Lwd.pure 0);
  Lwd.set computation_2 expensive_int_computation

let setup_2 () =
  Lwd.set computation_1 expensive_int_computation;
  Lwd.set computation_2 (Lwd.pure 0)

Because of the left-to-right evaluation order, the call to map2 evaluates computation_1 first and then computation_2.

Let's trace the following sequence of calls:

let r0 = Lwd.sample root
let () = setup_1 ()
let r1 = Lwd.sample root
let () = setup_2 ()
let r2 = Lwd.sample root
let () = setup_1 ()
let r3 = Lwd.sample root

Evaluating after the first call to setup_1:

  • computation_1 does not do anything special
  • computation_2 causes expensive_int_computation to be observed for the first time: its graph is traversed and each node is updated.

Evaluating after the first call to setup_2:

  • computation_1 also makes use of expensive_int_computation. Because it is already in use (computation_2 has not been updated yet), this requires no work: evaluation happens in $O(1)$.
  • computation_2 releases expensive_int_computation, but it is now used by computation_1, so evaluation happens in $O(1)$.

Evaluating after the second call to setup_1:

  • computation_1 releases expensive_int_computation. Because it was the last observer, the sub-graph is released and previous evaluation results are dropped: this step costs $O(n)$.
  • computation_2 acquires expensive_int_computation, which was just released. The sub-graph is traversed again to acquire it ($O(n)$), and it then has to be evaluated from scratch, which costs $\Omega(n)$.

An apparently innocuous change made a computation go from $O(1)$ to $\Omega(n)$: switching from setup_1 to setup_2 is free, while switching back traverses the whole sub-graph twice. This lack of commutativity is worrying not just for performance reasons, but also because programs typically perform side effects during the acquire/release phases, which can lead to different behaviors.
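The trace can be replayed with a small cost model. It rests on assumptions that do not come from Lwd's code. First, each join acquires its new inner graph and immediately releases the old one (eager release). Second, traversing expensive_int_computation's sub-graph costs a hypothetical n = 1000 units. Only the first acquisition and the last release pay that cost.

```ocaml
(* Toy cost model of the trace above. Assumptions (not from Lwd's code): a
   join acquires its new inner graph and eagerly releases the old one, and
   traversing expensive_int_computation's sub-graph costs [n] units. *)
type inner = Pure | Expensive

let n = 1000
let observers = ref 0 (* observers of expensive_int_computation *)
let cost = ref 0 (* total nodes traversed by acquire/release *)

let acquire = function
  | Pure -> ()
  | Expensive ->
    incr observers;
    if !observers = 1 then cost := !cost + n (* first observer: traverse *)

let release = function
  | Pure -> ()
  | Expensive ->
    decr observers;
    if !observers = 0 then cost := !cost + n (* last observer: traverse *)

let slot1 = ref Pure (* inner graph of Lwd.join (Lwd.get computation_1) *)
let slot2 = ref Pure (* inner graph of Lwd.join (Lwd.get computation_2) *)

(* Eager release: the old inner graph is released right away. *)
let switch slot v =
  let old = !slot in
  slot := v;
  acquire v;
  release old

(* map2 visits computation_1 first, then computation_2. *)
let evaluate (v1, v2) =
  let before = !cost in
  switch slot1 v1;
  switch slot2 v2;
  !cost - before

let setup_1 = (Pure, Expensive)
let setup_2 = (Expensive, Pure)

let c1 = evaluate setup_1 (* 1000: first acquisition *)
let c2 = evaluate setup_2 (* 0 *)
let c3 = evaluate setup_1 (* 2000: released, then acquired again *)
```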

The solution to this problem is to delay release operations so that they never happen during an evaluation cycle: sub-graphs to release are put in a queue that is flushed after evaluation.
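The same toy cost model can show the effect of deferring. Here the join only queues its old inner graph, and the queue is flushed after both joins have been evaluated. The queue, the slots and n are assumptions for illustration. Going back to setup_1 now costs nothing, and the result no longer depends on which join is visited first.

```ocaml
(* Same toy model as the trace, but releases are pushed to a queue during
   evaluation and performed only once evaluation is over. [n], [pending] and
   the slot representation are assumptions for illustration. *)
type inner = Pure | Expensive

let n = 1000
let observers = ref 0
let cost = ref 0
let pending : inner Queue.t = Queue.create () (* sub-graphs to release *)

let acquire = function
  | Pure -> ()
  | Expensive ->
    incr observers;
    if !observers = 1 then cost := !cost + n

let release = function
  | Pure -> ()
  | Expensive ->
    decr observers;
    if !observers = 0 then cost := !cost + n

let slot1 = ref Pure
let slot2 = ref Pure

(* Deferred release: the old inner graph is only queued. *)
let switch slot v =
  let old = !slot in
  slot := v;
  acquire v;
  Queue.push old pending

let flush () =
  while not (Queue.is_empty pending) do
    release (Queue.pop pending)
  done

(* Evaluate both joins in either order, then flush the queue. *)
let evaluate ?(second_first = false) (v1, v2) =
  let before = !cost in
  if second_first then begin switch slot2 v2; switch slot1 v1 end
  else begin switch slot1 v1; switch slot2 v2 end;
  flush ();
  !cost - before

let setup_1 = (Pure, Expensive)
let setup_2 = (Expensive, Pure)

let c1 = evaluate setup_1 (* 1000: first acquisition *)
let c2 = evaluate setup_2 (* 0 *)
let c3 = evaluate setup_1 (* 0: no longer 2000 *)
let c4 = evaluate ~second_first:true setup_2 (* 0, whatever the order *)
```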

The release_queue

To this end, we introduce a new release_queue object. It accumulates nodes to release and, when deemed appropriate, actually releases them. The "when" is ill-defined, as it can depend a lot on the application.

If a single document is observed, releasing can happen at the end of the current evaluation cycle. But without stretching the imagination too much, we can find other scenarios with a different "when":

  • Nested document evaluation. For instance, in a visual application, a frame can be made of several phases: layout, render, event propagation. Each phase can be done by evaluating a document, but the right granularity for a cycle is a full frame rather than a single phase.
    This is beyond the scope of Lwd, but Lwd should be flexible enough to integrate this use case well.
  • If a document is damaged during evaluation (as in the previous section), should we extend the evaluation cycle to the next fixed point? (Certain iterative layout algorithms might rely on that behavior.) Release would thus be delayed a bit, but if the layout does not converge, this will turn into a memory leak.
  • If an exception is raised during evaluation of the graph, should the release queue be flushed before returning the exception to the caller?
    If no, we might introduce memory leaks: as long as the computation fails, the release queue will grow.
    If yes, another can of worms opens:
    • Arbitrary code is executed during release. This can clobber the exception backtrace, and this code can itself raise. We now have multiple exceptions to report to the user! (This is the same problem as a try-finally construct whose finally clause raises.)
    • Commutativity is lost again: we might release a sub-tree that would not have been released had the computation finished. Maybe the caller was expecting the exception, and will fix the problem and resume the computation.
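The try-finally comparison can be observed with OCaml's own Fun.protect (standard library, OCaml 4.08 or later). When the finally function raises, Fun.protect reports Fun.Finally_raised, and the exception raised by the protected code is lost. evaluate and flush_queue below are hypothetical stand-ins for a failing evaluation and a failing release callback.

```ocaml
(* The try-finally analogy with OCaml's standard Fun.protect.
   [evaluate] and [flush_queue] are hypothetical stand-ins: evaluation
   fails, and so does a release callback run while flushing. *)
exception Evaluation_failed
exception Release_callback_failed

let evaluate () : int = raise Evaluation_failed
let flush_queue () = raise Release_callback_failed

type outcome = Returned of int | Original_reported | Original_lost

let outcome =
  match Fun.protect ~finally:flush_queue evaluate with
  | v -> Returned v
  | exception Evaluation_failed -> Original_reported
  | exception Fun.Finally_raised Release_callback_failed -> Original_lost
(* outcome = Original_lost: only the failure of the finally clause is
   reported, and the evaluation exception (with its backtrace) is gone *)
```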

Forbidding exceptions in Lwd is not acceptable either: there are valid use cases for them. Sometimes code can legitimately fail, and interrupting the computation is the right thing to do. Lwd should do its best to handle these situations gracefully.

For all these reasons, Lwd comes with a default behavior that does not require fiddling with release_queues, and that is well-behaved and commutative as long as no exception is raised.

When exceptions are raised, the default is to release before returning control to the caller (the "If yes" case above). The exception is wrapped with a decoration that:

  • captures the backtrace of the first exception
  • collects the other exceptions that might have happened while flushing the queue.

All of this can be overridden by providing a custom release_queue and catching exceptions in the caller.

New implementation

release_queue object

type release_failure = exn * Printexc.raw_backtrace            
                                                               
type release_queue                                             
val make_release_queue : unit -> release_queue                 
val flush_release_queue : release_queue -> release_failure list

A release_queue accumulates nodes to release and releases all of them when flush_release_queue is called.
If releasing raises an exception (that is, if the user-provided release function of a primitive raises), the exception and its backtrace are captured and returned.

These failures are collected in a list. Normal execution will return the empty list.
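Below is a minimal sketch of that contract under stated assumptions. It is not Lwd's implementation. Queued nodes are modelled as their release callbacks, and push is a hypothetical helper standing in for the library-internal enqueueing. Flushing runs every callback even after one fails, and returns the failures in order.

```ocaml
(* A minimal sketch of a release queue, not Lwd's actual implementation.
   Nodes are modelled as their release callbacks; [push] is a hypothetical
   helper standing in for the library-internal enqueueing. *)
type release_failure = exn * Printexc.raw_backtrace

type release_queue = { mutable pending : (unit -> unit) list }

let make_release_queue () = { pending = [] }

(* Hypothetical: record a node to release later. *)
let push q release = q.pending <- release :: q.pending

(* Release everything, in insertion order; a failing callback does not stop
   the others, its exception and backtrace are collected instead. *)
let flush_release_queue q : release_failure list =
  let todo = List.rev q.pending in
  q.pending <- [];
  let failures =
    List.fold_left
      (fun failures release ->
        match release () with
        | () -> failures
        | exception exn -> (exn, Printexc.get_raw_backtrace ()) :: failures)
      [] todo
  in
  List.rev failures
```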

Sampling and releasing with a custom queue

val sample : release_queue -> 'a root -> 'a   
val release : release_queue -> 'a root -> unit

Node observability can change only during calls to sample or release.
The release_queue is filled with the nodes to release and nothing is released during evaluation.

If an exception is raised while sampling, it is not intercepted, and evaluation will resume on the next call to sample.
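Passing the same queue to several calls to sample is what allows the frame granularity of the nested-document scenario. The toy model below is hypothetical throughout: the phases, the widget counters and run_frame do not come from Lwd, and each phase stands for one sample call on a shared queue. It compares flushing after every phase with flushing once per frame.

```ocaml
(* Toy model of one release queue shared by the phases of a frame. The
   phases, the widget counters and [run_frame] are hypothetical; each phase
   stands for a call to sample with the same queue. *)
let widget_observers = ref 0
let teardowns = ref 0 (* times the widget's sub-graph was released *)
let frame_queue : (unit -> unit) Queue.t = Queue.create ()

let release_widget () =
  decr widget_observers;
  if !widget_observers = 0 then incr teardowns

(* Layout stops using the widget: its release is queued, not performed. *)
let layout_phase () = Queue.push release_widget frame_queue

(* Render starts using the same widget. *)
let render_phase () = incr widget_observers

let flush () =
  while not (Queue.is_empty frame_queue) do
    (Queue.pop frame_queue) ()
  done

let run_frame ~flush_each_phase =
  widget_observers := 1; (* observed by layout at the start of the frame *)
  teardowns := 0;
  layout_phase ();
  if flush_each_phase then flush ();
  render_phase ();
  flush ();
  !teardowns

let per_phase = run_frame ~flush_each_phase:true (* 1: released, then re-acquired *)
let per_frame = run_frame ~flush_each_phase:false (* 0: kept alive *)
```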

exception Release_failure of exn option * release_failure list

val quick_sample : 'a root -> 'a   
val quick_release : 'a root -> unit

The simpler quick_ functions are provided for when you don't want to be bothered with release management. However, their behavior is subtler in the presence of exceptions.

quick_sample releases nodes immediately after evaluation.

Exceptions raised during release are caught, and a Release_failure exception carrying all of them is raised at the end.

If an exception is raised during evaluation, it is intercepted and the queue is still flushed. The exception is then re-raised, unless another exception happens during release, in which case Release_failure is raised with the original exception stored in its first parameter.

In quick_release, if exceptions happen during release they are caught and wrapped in the Release_failure exception.
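The quick_sample rules above can be summarised in a short sketch. It is written against a toy queue of release callbacks instead of Lwd's real types, and evaluate is a hypothetical stand-in for evaluating the root. As in the default behavior described earlier, the original exception keeps its backtrace, and release failures are collected rather than allowed to clobber it.

```ocaml
(* A sketch of the quick_sample behaviour described above, written against a
   toy queue of release callbacks instead of Lwd's real types; [evaluate]
   stands in for evaluating the root. Not the library's implementation. *)
type release_failure = exn * Printexc.raw_backtrace

exception Release_failure of exn option * release_failure list

(* Run every queued release, collecting failures instead of stopping. *)
let flush (q : (unit -> unit) Queue.t) : release_failure list =
  let failures = ref [] in
  while not (Queue.is_empty q) do
    (match (Queue.pop q) () with
     | () -> ()
     | exception exn ->
       failures := (exn, Printexc.get_raw_backtrace ()) :: !failures)
  done;
  List.rev !failures

let quick_sample q evaluate =
  match evaluate () with
  | value ->
    (* Success: flush, and only fail if some release callback failed. *)
    (match flush q with
     | [] -> value
     | failures -> raise (Release_failure (None, failures)))
  | exception exn ->
    (* Evaluation failed: keep its backtrace, flush anyway, then re-raise it,
       or wrap it if releasing failed too. *)
    let bt = Printexc.get_raw_backtrace () in
    (match flush q with
     | [] -> Printexc.raise_with_backtrace exn bt
     | failures ->
       Printexc.raise_with_backtrace (Release_failure (Some exn, failures)) bt)
```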

def changed title from WIP: release-batch to release-queue 2 years ago
def closed this pull request 2 years ago
The pull request has been merged as 454562301a.