Skip to content

SequenceMatcher.find_longest_match match expansion doesn't capture maximum range #144167

@dg-pb

Description

@dg-pb

Feature or enhancement

Proposal:

After match is found it is expanded both ways.
First for popular (autojunk), then for isjunk.

# NOTE `NOT isbjunk`
while besti > alo and bestj > blo and \
      not isbjunk(b[bestj-1]) and \
      a[besti-1] == b[bestj-1]:
    besti -= 1
    bestj -= 1
    bestsize += 1

while besti+bestsize < ahi and bestj+bestsize < bhi and \
      not isbjunk(b[bestj+bestsize]) and \
      a[besti+bestsize] == b[bestj+bestsize]:
    bestsize += 1

# NOTE: isbjunk
while besti > alo and bestj > blo and \
      isbjunk(b[bestj-1]) and \
      a[besti-1] == b[bestj-1]:
    besti -= 1
    bestj -= 1
    bestsize += 1

while besti+bestsize < ahi and bestj+bestsize < bhi and \
      isbjunk(b[bestj+bestsize]) and \
      a[besti+bestsize] == b[bestj+bestsize]:
    bestsize = bestsize + 1

However, the order of thee results in the situation where it does not capture what is possible if both were done in a single loop. I.e. maybe after isjunk, we can expand for popular again.

I think this should be done for both in one go. Simply:

while besti > alo and bestj > blo and a[besti-1] == b[bestj-1]:
    besti -= 1
    bestj -= 1
    bestsize += 1
lasti = besti + bestsize
lastj = bestj + bestsize
while lasti < ahi and lastj < bhi and a[lasti] == b[lastj]:
    lasti += 1
    lastj += 1
    bestsize += 1

This results in slightly wider match ranges for 2 test cases.

Has this already been discussed elsewhere?

No response given

Links to previous discussion of this feature:

No response

Metadata

Metadata

Assignees

No one assigned

    Labels

    type-featureA feature request or enhancement

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions