
Can dataclass ordering avoid tuple wrappers for single field comparisons? #144191

@rhettinger

Proposal and Rationale

When order=True, the generated rich comparison methods wrap the target field in a tuple. This makes sense when multiple fields are being compared. However, for a single field it adds unnecessary work: two extra tuple allocations and deallocations, a tuple compare, and a superfluous equality check on the field.

For example, the heapq docs suggest making the following class:

from dataclasses import dataclass, field
from typing import Any

@dataclass(order=True)
class PrioritizedItem:
    priority: int
    item: Any=field(compare=False)

The generated code for less-than operation is:

 def __lt__(self,other):
  if other.__class__ is self.__class__:
   return (self.priority,)<(other.priority,)
  return NotImplemented

Instead, it would be nicer to generate:

 def __lt__(self,other):
  if other.__class__ is self.__class__:
   return self.priority<other.priority
  return NotImplemented
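
As a rough illustration of the proposal, the sketch below builds the source text for __lt__ and skips the tuple wrapper when there is exactly one compared field. The helper names ordering_operand and make_lt_source are hypothetical and are not the functions used inside the dataclasses module; this only shows the shape of the idea.

```python
def ordering_operand(obj_name, field_names):
    """Hypothetical helper: build the expression compared for one operand."""
    if not field_names:
        return "()"
    if len(field_names) == 1:
        # Single field: compare the attribute directly, no tuple wrapper.
        return f"{obj_name}.{field_names[0]}"
    joined = ",".join(f"{obj_name}.{name}" for name in field_names)
    return f"({joined},)"


def make_lt_source(field_names):
    """Hypothetical helper: return source text for a generated __lt__."""
    left = ordering_operand("self", field_names)
    right = ordering_operand("other", field_names)
    return (
        "def __lt__(self, other):\n"
        "    if other.__class__ is self.__class__:\n"
        f"        return {left} < {right}\n"
        "    return NotImplemented\n"
    )


single = make_lt_source(["priority"])
multiple = make_lt_source(["priority", "name"])

# Compile the single-field version to confirm the generated text is valid.
namespace = {}
exec(single, namespace)
```

With one field the body compares self.priority and other.priority directly; with two or more fields the tuple form is kept, so multi-field ordering is unchanged.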

Behavior change

When the tuple wrapper was removed from __eq__ in favor of individual field comparisons linked by and, it resulted in faster code that matched what people would write by hand. However, it also caused issues where users had been relying on the identity checks inside the tuple comparison. That affects us here as well:

>>> from math import nan

>>> nan <= nan
False
>>> (nan,) <= (nan,)
True
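
The difference comes from the identity shortcut: tuple comparison treats two elements as equal when they are the same object, before ever calling the element's own comparison. A minimal sketch, assuming two separately created NaN objects (the variable names are illustrative only):

```python
nan = float("nan")
other_nan = float("nan")  # a distinct NaN object

# Same object on both sides: the identity shortcut makes the elements
# count as equal, so the tuples compare as equal and <= succeeds.
same_object_le = (nan,) <= (nan,)

# Distinct objects: no shortcut applies, so the elements are compared
# with the float rules, under which NaN is never equal or ordered.
distinct_objects_eq = (nan,) == (other_nan,)
distinct_objects_le = (nan,) <= (other_nan,)

# Bare floats never use the identity shortcut.
bare_le = nan <= nan
```

So the tuple result depends on whether both sides hold the same NaN object, not just on the values.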

Unlike the impact of the __eq__ change, users of ordering operations are far less likely to be relying on identity-implies-equality.

One reason is that __eq__ is used with hashing but ordering operations are not. So the prior change affected dict() and set() while the proposed change does not.

Another reason is that it only affects __le__ and __ge__ but not __eq__, __ne__, __lt__, and __gt__, so it doesn't show up 100% of the time.

Also, Python's ordering tools (sorted, min, max, nsmallest, nlargest, merge, bisect, and heapq) do not use __le__ and __ge__, so they are entirely unaffected.
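
One way to check this claim is a probe class whose __le__ and __ge__ raise if they are ever called. The Probe class below is a hypothetical test helper, not part of any library; the sketch only exercises the tools named above.

```python
import bisect
import heapq


class Probe:
    """Hypothetical helper: orders by value, fails loudly on <= or >=."""

    def __init__(self, value):
        self.value = value

    def __lt__(self, other):
        return self.value < other.value

    def __le__(self, other):
        raise AssertionError("__le__ was called")

    def __ge__(self, other):
        raise AssertionError("__ge__ was called")


data = [Probe(v) for v in (3, 1, 2)]

ordered = [p.value for p in sorted(data)]
smallest = min(data).value
largest = max(data).value  # a > b falls back to the reflected __lt__
two_smallest = [p.value for p in heapq.nsmallest(2, data)]
two_largest = [p.value for p in heapq.nlargest(2, data)]
merged = [p.value for p in heapq.merge(sorted(data), sorted(data))]
position = bisect.bisect_left(sorted(data), Probe(2))

heap = []
for p in data:
    heapq.heappush(heap, p)
popped = [heapq.heappop(heap).value for _ in range(len(data))]
```

If any of these tools called __le__ or __ge__, the run would stop with an AssertionError instead of producing results.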

The upside of the behavior change is that it fixes an existing inconsistency between '<', '==', and '<=':

>>> PrioritizedItem(nan, 'x') < PrioritizedItem(nan, 'y')
False
>>> PrioritizedItem(nan, 'x') == PrioritizedItem(nan, 'y')
False

# Inconsistent with the previous two calls
>>> PrioritizedItem(nan, 'x') <= PrioritizedItem(nan, 'y')
True

This inconsistency is unexpected because it does not show up in hand-written code:

>>> nan < nan
False
>>> nan == nan
False
>>> nan <= nan
False
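
For comparison, here is a sketch of what a hand-written counterpart of PrioritizedItem might look like, comparing the priority field directly. The class name HandWrittenItem is hypothetical, and only the three operators discussed above are defined.

```python
from math import nan


class HandWrittenItem:
    """Hypothetical hand-written counterpart of PrioritizedItem."""

    def __init__(self, priority, item):
        self.priority = priority
        self.item = item  # not part of any comparison

    def __eq__(self, other):
        if other.__class__ is self.__class__:
            return self.priority == other.priority
        return NotImplemented

    def __lt__(self, other):
        if other.__class__ is self.__class__:
            return self.priority < other.priority
        return NotImplemented

    def __le__(self, other):
        if other.__class__ is self.__class__:
            return self.priority <= other.priority
        return NotImplemented


x = HandWrittenItem(nan, 'x')
y = HandWrittenItem(nan, 'y')

lt_result = x < y
eq_result = x == y
le_result = x <= y
```

All three results agree here, which is the consistency the proposed change would give the generated methods.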

Labels: stdlib, topic-dataclasses, type-feature