-
Notifications
You must be signed in to change notification settings - Fork 526
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[PEP] Redesign Pyomo Set component #326
Comments
Note that the rewrite is currently underway on set-rewrite on the jsiirola/pyomo fork. |
My thoughts:
In other words, I agree with what the PEP is proposing for these 7 items. Questions:
|
The only functionality I use from the domain objects in Kernel is checking if a domain is discrete and extracting its bounds, then I throw the domain away and store those two pieces of information directly onto a variable when it is created. I could easily implement some Kernel-only domain objects that capture this, if you wanted move those original domain objects back into base under this new design. |
I have updated the PEP to include the following:
|
After discussion on the PEP, the PMC agreed that we should change the answer to #3 to NO. This will keep implicit sets using the Set object instead of the SetOf object. The major concern was potential performance issues regarding the use of SetOf (checking containment of a list scales poorly compared with the default behavior of Set today). |
Recommendations after reviewing the PEP:
|
@whart222: The majority of your comments (indicated by checked boxes) have been implemented / addressed. As for the others:
|
Proposal
This PEP proposes a partial redesign of the Set component. The goal is to be API-compatible with the current Set components, while resolving a large number of bug reports. The draft redesign is currently available on jsiirola/set-rewrite
Motivation
We have started accumulating a number of documented issues with the current implementation of
Set
components. KnownSet
issues:Discussion
The current Set design has a number of bugs and limitations. Some particularly problematic issues include:
Range objects
This redesign first proposes adding full support for discrete and continuous ranges. This is done through four fundamental "range" objects:
NumericRange
for representing a contiguous continuous (closed or open) or discrete (closed) rangeNonNumericRange
for representing a single discrete non-numeric valueAnyRange
for representing the special "Any" rangeRangeProduct
for representing cross products of rangesNote that range objects are NOT
Set
objects, although they implement a subset of aSet
-like API:__contains__
: return True if a value is within the range__eq__
: return True if two ranges are equalisdiscrete()
: return True if the range is over discrete valuesisfinite()
: return True if the range is over a finite set of discrete values (isfinite() == True
impliesisdiscrete() == True
)isdisjoint(other)
: return True if the range is completely disjoint from theother
range.issubset(other)
: return True if the range is contained within (equal to or strict subset of) theother
range.range_difference(other_ranges)
: return a list of ranges resulting from subtracting all of theother_ranges
from this rangerange_intersection(other_ranges)
: return a list of ranges resulting from intersecting this range with all of theother_ranges
.Numeric ranges are defined with a
start
,end
, andstep
values, along withclosed
(a 2-tuple of True/False). Continuous ranges are indicated bystep
== 0.start
andend
can both contain either a floating point value orNone
.start
must be less than or equal toend
.closed[0]
andclosed[1]
indicate ifstart
andend
(respectively) are included in the range. Discrete ranges are indicated bystep
taking a non-zero signed integer. Discrete ranges must be closed, thusclosed
==(True, True)
. The discrete range is "grounded" bystart
(which must be an integer and cannot beNone
) and proceeding up to and includingend
(which may beNone
).(end - start) % step
is guaranteed to be 0. Ifstart
==end
, thenstep
== 0 (by convention).NonNumericRange
andAnyRange
are included to support Set operations between discrete or Any Sets and range-based Sets.NonNumericRange
is a bit of a misnomer, as it does not contain a "range", but rather a single non-numeric value. Sets of non-numeric values are represented by lists ofNonNumericRange
objects.RangeProduct
represents the cross product of two ranges and is needed to provide complete support for Set operations (Set operations other than cross product are simple ranges and can be represented by lists ofNumericRange
andNonNumericRange
objects).Ranges must support "correct" set arithmetic (difference and intersection), or raise an exception. That is, operations like
should yield
The current draft implementation supports everything except unbounded continuous range minus unbounded discrete range (which raises an exception). The Any range is handled a bit like "infinity", in that
and
It is possible that we may desire the latter case to raise an exception.
RangeSet objects
RangeSet
has been significantly expanded over the previousRangeSet
implementation.RangeSet
objects are now based aroundNumericRange
objects, which includes support for non-finite ranges (both continuous and unbounded). Similarly, boutique ranges (like semi-continuous domains) can be represented, e.g.:The
RangeSet
object continues to support the old notation for specifying discrete ranges:By implementing RangeSet using NumericRanges, the global Sets (like
Reals
,Integers
,PositiveReals
, etc.) are now trivial instances of a RangeSet and support all Set operations. A consequence of this is that it may impact the implementation of Kernel. Kernel currently relies on class inheritance to determine domains (i.e.,isinstance
ofRealSet
,IntegerSet
,BooleanSet
, etc). As this new approach to the global sets makes use of theRangeSet
object for domains, this would pose two problems for Kernel:Set
providesisdiscrete()
to determine if the set is completely discrete. Continuous real ranges can be verified be extracting theranges()
from theRangeSet
and verifying that they are allnot isdiscrete()
and that they arenot isdisjoint
.Reals - domain
and seeing if the resulting Set only contained unbounded ranges (all(None in r.bounds() for r in (Reals - domain).ranges())
)Given @ghackebeil's comment below, it is possible that Kernel could operate completely on Set APIs (
isdiscrete()
andbounds()
) and not have to explicitly import anything from the AML. We could also mitigate compatibility concerns by extending the definition ofDeclareGlobalSet
to also support adding additional base classes (like an abstractRealSet
class). The abstract base classes could be defined in Kernel, thereby avoiding a circular import between Kernel and AML.SetData objects
Pyomo
Set
objects are designed to be "API-compatible" with Pythonset
objects. However, not all Set objects implement the fullset
API (e.g., only finite discrete Sets supportadd()
).All Sets implement one of the following APIs:
class _SetDataBase(ComponentData)
(pure virtual interface)class _SetData(_SetDataBase)
(base class for all AML Sets)__contains__
: [abstract] tests for set membership__eq__
: tests for set equivalence__ne__
: tests for set equivalence__str__
: [abstract] returns a string representation of the set (name or values)dimen
: [abstract] read-only property that returns the dimentionality of all set members (orNone
if the members are not required to have a common dimentionality.isfinite()
: [NEW] True if the set contains a countable number of membersisordered()
: [NEW] True if the set storage has a (deterministic) orderingranges()
: [NEW][abstract] generator yielding range objects that define the setisdisjoint(other)
: True if self and other are disjoint setsissubset(other)
: True if self is a subset of (<=) other. From python'sset
issuperset(other)
: True if self is a superset of (>=) other. From python'sset
virtual
[DEPRECATED] False for ass Set objects except SetOperator derivativesconcrete
[DEPRECATED] returnsisfinite()
union(*others)
: Returns a SetOperator that represents the union of self and other setsintersection(*others)
: Returns a SetOperator that represents the intersection of self and other setsdifference(*others)
: Returns a SetOperator that represents the difference of self and other setssymmetric_difference(*others)
: Returns a SetOperator that represents the symmetric difference of self and other setscross(*others)
: Returns a SetOperator that represents the cross product of self and other sets__le__
=issubset
__ge__
=issuperset
__or__
=union
__and__
=intersection
__sub__
=difference
__xor__
=symmetric_difference
__mul__
=cross
__ror__
,__rand__
,__rsub__
,__rxor__
,__rmul__
__lt__
: Returns True if self is a strict subset of other__gt__
: Returns True if self is a strict superset of otherclass _FiniteSetMixin(object)
(pure virtual interface, adds support for discrete/iterable sets)__len__
: [abstract] Returns the number of elements in the Set__iter__
: [abstract] Returns an iterator over the Set members__reversed__
: Returns a reversed iterator over the Set membersdata()
: [MODIFIED] Return all Set members as a tuple. NOTE: previous implementation returned a set() or list. This implementation proposes always returning a tuple in the same order as__iter__
ordered_data()
[NEW] return the set members in a deterministic order (the internal order if isordered(), otherwise sorted using sorted_robust)sorted_data()
[NEW] return the set members sorted (using the internal sorting for sorted sets, otherwise the results of sorted_robust())bounds()
: returns a tuple representing the current bounds of the Set (based on it's current members/definition). Note that incomparable or mixed-type sets will return(None,None)
class _FiniteSetData(_FiniteSetMixin, _SetData)
(data class for finite discrete sets; equivalent to a pythonset
)add()
: Add an element to this finite setremove()
: Remove an element from the set, raise an exception if the element is not founddiscard()
: Remove an element from the set, silently return if the item is not foundclear()
: Remove all elements from this setset_value()
: Set the members of the set, discarding any previously-stored elementsupdate()
: [NEW] Update the set by adding members from the passed iterablepop()
: [NEW] Remove one (random) element from the set_OrderedSetMixin(object)
(pure virtual interface, adds support for ordered Sets)first()
: The first element of the Setlast()
: The last element of the Setnext(x)
: The element after x, raising an exception afterlast()
nextw(x)
: The element after x, returningfirst()
afterlast()
prev(x)
: The element before x, raising an exception afterfirst()
prevw(x)
: The element before x, returninglast()
afterfirst()
__getitem__
: [abstract] Retrieve the nth element (1-based)ord(x)
: [abstract] Return the (1-based) index of xclass _OrderedSetData(_OrderedSetMixin, _FiniteSetData)
(data class for ordered finite discrete sets)This is a bit of a change from the current Set objects. First, the lowest-level (non-abstract) Data object supports infinite sets; that is, sets that contain an infinite number of values (this includes both bounded continuous ranges as well as unbounded discrete ranges). As there are an infinite number of values, iteration is not supported. The base class also implements all set operations. Note that
_SetData
does not implementlen()
, as Python requireslen()
to return a positive integer.Finite sets add iteration and support for
len()
. In addition, they support access to members through three methods:data()
returns the members as a tuple (in the internal storage order), and may not be deterministic.ordered_data()
returns the members, and is guaranteed to be in a deterministic order (in the case of insertion order sets, up to the determinism of the script that populated the set). Finally,sorted_data()
returns the members in a sorted order (guaranteed deterministic, up to the implementation of < and ==). TODO: should these three members all return generators? This would further change the implementation ofdata()
, but would allow consumers to potentially access the members in a more efficient manner.Ordered sets add support for
ord()
and__getitem__
, as well as thefirst
,last
,next
andprev
methods for stepping over set members. (No changes from the previous API)Note that the base APIs are all declared (and to the extent possible, implemented) through Mixin classes.
Behavioral changes / implementation notes
Sets preserve the same determinism properties of the current Set implementation: iterating over a Set with
isordered() == False
will not produce a deterministic ordering. However, if the client wants to ensure deterministic ordering, the Set API now providesordered_data()
andsorted_data()
.ordered_data()
is guaranteed to produce a deterministic ordering for all sets andsorted_data()
will produce a sorted ordering (either using the Set's internal sorting order or the results ofsorted_robust()
as necessary). Using these accessors is preferable to relying on the client to perform the sorting as the Set methods will only perform additional sorting if it is necessary. That is,tuple(__iter__) == ordered_data()
for Sets whereisordered() == True
, without performing any additional sorting.Casting a Set to a string has been extended. Standard components (and old Set objects) return their fully-qualified names when cast to a string (through
__str__()
). If the object had not yet been attached to a Block (and didn't have aname
specified when constructed), then the name of the Component class is returned. This PEP proposes changing the behavior for Set-like objects: when the object is not assigned to a Block, then__str__()
returns the contents of the set (for set-like objects), the range representation (for range objects) or a string representation of the operation (for _SetOperation objects). The intent of this change is that Sets (and Set-like things) that the user explicitly attached to a Block will return the name that the user gave them. However, implicitly-created objects (like _SetOperator objects and implicit Sets) will be "expanded". For example, note the VarIndex=
field:TODO Should the behavior of implicit indices be changed? Currently, implicitly declared indices (e.g.,
Var([1,2])
) are created as first-class Set objects withSet(initialize=[1,2])
. This is consistent with the previous implementation. However, this approach has several drawbacks:print(m.I | [3,4])
will giveI | {3, 4}
.The advantage of the current approach is that users are less likely to be "bitten" by reference counted objects, e.g.:
The current reference implementation for the PEP preserves previous behavior and creates implicit Sets using
Set()
. The leading alternative is to useSetOf()
. We could mitigate the concerns for multiple implicit Sets referring to the same underlying object through a global (weakref) registry of implicitly-declares SetOf() objects, and issuing a warning any time a duplicate is encountered.Set operators now preserve the "finiteness" and "orderedness" of the underlying Sets (as appropriate). This allows all Set operations to be performed on infinite Sets. In particular,
Var(Any, m.I)
is now valid, as isVar(Integers, m.I)
. IndexedComponent constructors should testisfinite()
before iterating over the component'sindex_set()
during construction.The Set()
__init__()
andconstruct()
make use of a new general system for processing arguments, based around the general functionsInitializer()
andSetInitializer()
. This centralizes much of the argument processing (e.g., handling of scalars, dictionaries, lists, functions/rules, generators, etc). My intent is to eventually propagate this to all IndexedComponents (in a later PEP). A preliminary implementation for Var is promising, with little/no impact on construction time.While the new implementation is far more robust and consistent compared to the current implementation, it comes at a significant increases in the number of classes. For example, there are now 3 "Simple" Set classes (
FiniteSimpleSet
,OrderedSimpleSet
, andSortedSimpleSet
). Further, SetOf requires 2 implementations (UnorderedSetOf
andOrderedSetOf
), and each Set operator requires 4 classes (e.g.,SetUnion
,SetUnion_InfiniteSet
,SetUnion_FiniteSet
, andSetUnion_OrderedSet
)This implementation provides a new approach for declaring "global" Sets (like Reals and Integers) that will correctly survive pickling and deepcopying (resolving open issues around using
is
for domain checks on cloned models).TODO we should determine if implicit sets (e.g., created by set operations or implicit casting of iterable objects into sets) should be added to the Model. The current behavior of
Block.add_component()
is to add these implicit sets to the model with hard-coded names. This has caused issues for users, especially when deleting and re-adding components (see del_component issue when deleting constraints #45). Not adding these Sets to the model would resolve that issue, although we would need to be careful how/when the implicit sets get constructed (part would have to be handled by the Setconstruct()
method.How should we handle type casting in Sets?
1 in Reals == True
(Python thinks so:1 in {1.0} == True
)1. in Integers == True
? (Python thinks so:1.0 in {1} == True
)RangeSet(10) | RangeSet(5,20,0)
?How should we handle 1-tuples? Conceptually, we could think of all indices as tuples (and the primary role for Set objects is as index domains). That said, for efficiency reasons, we store 1-dimensional indices as bare values (that is, we unpack 1-tuples to be proper scalars). The original argument was that the scalars are more memory efficient and are faster to generate and manipulate. However, they also cause us to do strange things, like trap for non-tuples when firing rules, e.g.:
It gets more confusing when we start doing set operations like SetProduct with non-dimensional sets (i.e., with dimen=None, so we have to calculate / determine / guess how to subdivide longer tuples into component indices). In addition, there is overlap in the meaning of dimen:
The draft implementation performs index normalization on set members, including unpacking all 1-tuples to scalars. This makes the internal storage more consistent and simplifies member lookup, guaranting that a Set will never contain a 1-tuple. However, this index normalization can be turned off (enabling both the storage of 1-tuples and nested tuples) by setting:
Change: The previous Set implementation only applies the filter function to the data provided at construction time (from initialize or the external data source). This PEP proposes a more consistent behavior and applies the filter any time data is added to the Set (even after construction).
Specific topics requesting discussion / comment
virtual
andconcrete
?data()
,ordered_data()
, andsorted_data()
return generators?The text was updated successfully, but these errors were encountered: