I have a more-or-less complex data structure (a list of dictionaries of sets) on which I perform a bunch of operations in a loop until the data structure reaches a steady state, i.e., it doesn't change anymore. The number of iterations the calculation takes varies wildly depending on the input.
I'd like to know if there's an established way of forming a halting condition in this case. The best I could come up with is pickling the data structure, storing its md5 hash, and checking whether it has changed since the previous iteration. Since this is more expensive than my operations themselves, I only do it every 20 iterations, but it still feels wrong.
Is there a nicer or cheaper way to check for deep equality so that I know when to halt?
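For concreteness, here is a minimal sketch of the pickle/md5 check I'm doing now (perform_operations and get_initial_data stand in for my actual code):

    import hashlib
    import pickle

    data = get_initial_data()              # stand-in for the real structure
    last_digest = hashlib.md5(pickle.dumps(data)).digest()
    iterations = 0
    while True:
        perform_operations(data)           # mutates the structure in place
        iterations += 1
        if iterations % 20 == 0:           # hashing is expensive, so test rarely
            digest = hashlib.md5(pickle.dumps(data)).digest()
            if digest == last_digest:      # unchanged over the last 20 iterations
                break
            last_digest = digest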
asked Nov 8 '11 at 14:11
Take a look at python-deep. It should do what you want, and if it's not fast enough you can modify it yourself.
It also very much depends on how expensive the compare operation is and how expensive one calculation iteration is. Say one calculation iteration takes c time, one test takes t time, and the chance of termination per iteration is p; then the optimal testing frequency is:

    (t * p) / c

That's assuming c < t; if that's not true, then you should obviously check on every iteration.

So, since you can dynamically track t and estimate p (with possible adaptations in the code if it suspects the calculation is about to end), you can set your test frequency to an optimal value.
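A rough sketch of that adaptive idea, measuring c and t at runtime (digest and step are stand-ins for your own test and calculation; estimating p is omitted here, and the test interval is derived simply from the measured t/c ratio):

    import hashlib
    import pickle
    import time

    def digest(data):
        # stand-in for whatever change test you use; here pickle+md5
        return hashlib.md5(pickle.dumps(data)).digest()

    def run_until_steady(data, step):
        interval = 1                     # start by testing every iteration
        last = digest(data)
        while True:
            t0 = time.perf_counter()
            for _ in range(interval):
                step(data)               # one calculation iteration, cost ~c
            c = (time.perf_counter() - t0) / interval

            t0 = time.perf_counter()
            current = digest(data)       # one test, cost ~t
            t = time.perf_counter() - t0

            if current == last:          # nothing changed across the whole interval
                return data
            last = current
            # test less often when a test costs much more than an iteration
            interval = max(1, round(t / c)) if c > 0 else interval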
I think your only choices are:
1. Have every update set a "dirty" flag when it alters a value from its starting state.
2. Do a whole-structure analysis (like the pickle/md5 combination you suggested).
3. Just run a fixed number of iterations known to reach a steady state (possibly running too many times, but avoiding the overhead of checking the termination condition).
Option 1 is analogous to what Python itself does with ref-counting. Option 2 is analogous to what Python does with its garbage collector. Option 3 is common in numerical analysis (e.g., run divide-and-average 20 times to compute a square root).
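For illustration, option 3 in that square-root case might look like this (a sketch; the fixed iteration count would be tuned to your own problem's convergence):

    def sqrt_divide_and_average(x, iterations=20):
        # option 3: a fixed number of iterations known to converge,
        # with no termination check at all (assumes x > 0)
        guess = x if x > 1 else 1.0
        for _ in range(iterations):
            guess = (guess + x / guess) / 2.0   # divide and average
        return guess

    print(sqrt_divide_and_average(2.0))   # ~1.4142135623730951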
Checking for equality doesn't seem like the right way to go to me. Provided that you have full control over the operations you perform, I would introduce a "modified" flag (a boolean variable) that is set to false at the beginning of each iteration. Whenever one of your operations modifies (part of) your data structure, it is set to true, and you repeat until "modified" stays false throughout a complete iteration, as in the sketch below.
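A minimal sketch of that pattern, assuming each operation is (or can be wrapped as) a function that mutates the structure in place and returns True when it changed something:

    def run_to_steady_state(data, operations):
        while True:
            modified = False
            for op in operations:
                # each op mutates data in place and returns True if it changed it
                if op(data):
                    modified = True
            if not modified:
                break          # a full pass made no changes: steady state reached
        return data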
I would trust Python's equality operator to be reasonably efficient for comparing compositions of built-in objects. I expect it would be faster than pickling and hashing, provided Python tests for list equality something like this:
    def __eq__(a, b):
        if type(a) == list and type(b) == list:
            if len(a) != len(b):
                return False
            for i in range(len(a)):
                if a[i] != b[i]:
                    return False
            return True
        # testing for other types goes here
Since the function returns as soon as it finds two elements that don't match, in the average case it won't need to iterate through the whole thing. Compare to hashing, which does need to iterate through the whole data structure, even in the best case.
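For what it's worth, the built-in == already recurses through nested containers, so a list of dicts of sets can be compared directly:

    old = [{"a": {1, 2}}, {"b": set()}]
    new = [{"a": {1, 2}}, {"b": set()}]
    print(old == new)      # True: lists, dicts, and sets are compared by value
    new[0]["a"].add(3)
    print(old == new)      # False: the nested change is detected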
Here's how I would do it:
    import copy

    def perform_a_bunch_of_operations(data):
        # take care to not modify the original data, as we will be using it later
        my_shiny_new_data = copy.deepcopy(data)
        # do lots of math here...
        return my_shiny_new_data

    data = get_initial_data()
    while True:
        nextData = perform_a_bunch_of_operations(data)
        if data == nextData:
            # steady state reached
            break
        data = nextData
This has the disadvantage of having to make a deep copy of your data each iteration, but it may still be faster than hashing; you can only know for sure by profiling your particular case, as in the sketch below.
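A quick way to do that profiling with timeit (the structures below are placeholders; substitute a representative snapshot of your real data):

    import hashlib
    import pickle
    import timeit

    # placeholder data; substitute a representative snapshot of the real structure
    data = [{"a": set(range(100))} for _ in range(100)]
    nextData = [{"a": set(range(100))} for _ in range(100)]

    eq_time = timeit.timeit(lambda: data == nextData, number=1000)
    hash_time = timeit.timeit(lambda: hashlib.md5(pickle.dumps(data)).digest(),
                              number=1000)
    print("equality:", eq_time, "pickle+md5:", hash_time)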