44 changes: 44 additions & 0 deletions Doc/library/importlib.rst
@@ -275,6 +275,28 @@ ABC hierarchy::
.. versionchanged:: 3.4
Returns ``None`` when called instead of :data:`NotImplemented`.

   .. method:: discover(parent=None)

      An optional method which searches for possible specs with the given
      *parent* module spec. If *parent* is ``None``,
      :meth:`MetaPathFinder.discover` will search for top-level modules.

      Returns an iterable of possible specs.
Member

Would this be the first public importlib API that yields its results? Do we actually want to do that, or should this return a list?

callers consuming the results might be writing code that makes changes that could impact future results... yielding could get messy.

what are the intended use cases for the API? if it's something we expect callers to short circuit and stop iterating on after the first match maybe yield makes sense, but then we should probably just have an explicit direct discover_first API for that instead.

Member Author

@FFY00 FFY00 Dec 17, 2025

This is something I considered.

The main use case is finding a similarly named module to show as a hint on ModuleNotFoundError (e.g. "Did you mean numpy?" when trying to import numby), so I think it would make sense to make this new API a generator, or at least some kind of lazy container.

For cases such as the one you describe, the user could just consume the full generator into a list to avoid any issue. That still leaves room for code that can benefit from this being a lazy API: scanning directories with a lot of files can take a while, not to mention the more exotic finders out there that may operate over the network or something like that.

I am not fundamentally opposed to making this method return a list, but I can't see the value in the trade-off if we document it properly.

callers consuming the results might be writing code that makes changes that could impact future results... yielding could get messy.

While this is technically possible, I would expect it to be extremely uncommon. And I think it is reasonable to assume that people knowledgeable enough to do that would be aware of the downsides of changing the import machinery while consuming the API.

but then we should probably just have an explicit direct discover_first API for that instead

And what would that look like? Would it take a predicate function and return the first entry that matches?


So, would having a warning in the documentation about your concern be a good enough compromise?
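
For reference, the predicate-based `discover_first` idea raised in this thread could look roughly like the sketch below. Everything here is hypothetical: `discover_first` and `fake_discover` are illustrative names and are not part of importlib or of this PR. The stand-in generator counts how many specs it produced, to show that a lazy iterable lets the caller stop early.

```python
from importlib.machinery import ModuleSpec


def discover_first(specs, predicate):
    """Return the first spec for which predicate(spec) is true, else None.

    Hypothetical helper: not part of importlib or of this PR.
    """
    return next((spec for spec in specs if predicate(spec)), None)


def fake_discover():
    # Stand-in for a finder's discover(); counts how many specs were produced.
    for name in ("alpha", "numpy", "zeta"):
        fake_discover.produced += 1
        yield ModuleSpec(name, loader=None)


fake_discover.produced = 0

match = discover_first(fake_discover(), lambda spec: spec.name.startswith("n"))
print(match.name, fake_discover.produced)
```

Because the helper is only a thin wrapper around `next()`, callers can get the same behavior from the iterable directly, which is one argument for not adding a separate API.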

Contributor

@ncoghlan ncoghlan Jan 1, 2026

True, for the "nearest matching module name" error message case, you just want to keep the closest match, so even though all the names need to be checked, you only need to keep a reference to one of them (the closest so far).

Since the potential number of results may be absurdly high in some situations, and the iterable form provides more options for consumers to handle that appropriately, I do think it makes sense to make it an iterable. The warning in the docs should explain why it's an iterable, since consumers that unconditionally convert the result to a list can run into problems regardless.

As far as prior examples of iterable import APIs go, importlib.resources.contents was the first one I found in a quick look (covering a very similar situation, just for non-module resources rather than submodules).
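
A minimal sketch of the "keep only the closest match so far" pattern described in this thread. It assumes `difflib.SequenceMatcher` as the similarity measure, which is a choice made for this illustration and not necessarily what the error-hint code would use. `closest_name` and `candidate_names` are hypothetical names; the generator stands in for something like `(spec.name for spec in finder.discover())`.

```python
from difflib import SequenceMatcher


def closest_name(wanted, candidates):
    """Return the candidate most similar to wanted, or None if there are none.

    Only the best name seen so far is kept, so the candidates can be
    consumed lazily without building a list.
    """
    best_name, best_ratio = None, 0.0
    for name in candidates:
        ratio = SequenceMatcher(None, wanted, name).ratio()
        if ratio > best_ratio:
            best_name, best_ratio = name, ratio
    return best_name


def candidate_names():
    # Stand-in for the names of lazily discovered specs.
    yield from ("os", "numpy", "json", "numbers")


suggestion = closest_name("numby", candidate_names())
print(suggestion)
```

Every name still has to be compared, but memory use stays constant regardless of how many specs the finders yield.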


      Raises :exc:`ValueError` if *parent* is not a package module.

      .. warning::
         This method can potentially yield a very large number of objects, and
         it may carry out IO operations when computing these values.

         Because of this, it will generally be desirable to compute the result
         values on-the-fly, as they are needed. As such, the returned object is
         only guaranteed to be an :class:`iterable <collections.abc.Iterable>`,
         instead of a :class:`list` or other
         :class:`collection <collections.abc.Collection>` type.

      .. versionadded:: next


.. class:: PathEntryFinder

@@ -307,6 +329,28 @@ ABC hierarchy::
:meth:`importlib.machinery.PathFinder.invalidate_caches`
when invalidating the caches of all cached finders.

   .. method:: discover(parent=None)

      An optional method which searches for possible specs with the given
      *parent* module spec. If *parent* is ``None``,
      :meth:`PathEntryFinder.discover` will search for top-level modules.

      Returns an iterable of possible specs.

      Raises :exc:`ValueError` if *parent* is not a package module.

      .. warning::
         This method can potentially yield a very large number of objects, and
         it may carry out IO operations when computing these values.

         Because of this, it will generally be desirable to compute the result
         values on-the-fly, as they are needed. As such, the returned object is
         only guaranteed to be an :class:`iterable <collections.abc.Iterable>`,
         instead of a :class:`list` or other
         :class:`collection <collections.abc.Collection>` type.

      .. versionadded:: next


.. class:: Loader

48 changes: 48 additions & 0 deletions Lib/importlib/_bootstrap_external.py
@@ -1323,6 +1323,23 @@ def find_spec(cls, fullname, path=None, target=None):
else:
return spec

    @classmethod
    def discover(cls, parent=None):
        if parent is None:
            path = sys.path
        elif parent.submodule_search_locations is None:
            raise ValueError(f'{parent} is not a package module')
        else:
            path = parent.submodule_search_locations
Member

I think this can be None when parent is a non-package module. Should we raise a nicer error message, or should this situation fall back to sys.path?

Member Author

How can it be a non-package module if it has a child? It could be a namespace package, but that's still a package, and parent.submodule_search_locations should be an iterable object when the spec is fully initialized, which it should always be at this point.

Unless I am missing something? Do we support package-like module extensions?

Contributor

I think @gpshead is referring to the error case where parent.submodule_search_locations is None because the caller supplied a "parent" spec from a non-package module. Instead of the default "'NoneType' object is not iterable" message from the failed iteration attempt, we should raise something more specific.
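
The difference between the two failure modes discussed here can be reproduced with a plain `ModuleSpec`, whose `submodule_search_locations` stays `None` unless `is_package=True` is passed. This is only an illustrative sketch: `search_path_unchecked` and `search_path_checked` are hypothetical helpers, and the checked variant merely mirrors the explicit test that the diff performs.

```python
from importlib.machinery import ModuleSpec

# A spec for a plain module: submodule_search_locations is left as None.
plain = ModuleSpec("example", loader=None)


def search_path_unchecked(parent):
    # Iterating None fails with a generic TypeError.
    return [entry for entry in parent.submodule_search_locations]


def search_path_checked(parent):
    # Reject non-package specs explicitly, as the diff does.
    if parent.submodule_search_locations is None:
        raise ValueError(f"{parent.name!r} is not a package module")
    return list(parent.submodule_search_locations)


errors = []
for func in (search_path_unchecked, search_path_checked):
    try:
        func(plain)
    except (TypeError, ValueError) as exc:
        errors.append(type(exc).__name__)
        print(f"{func.__name__}: {type(exc).__name__}: {exc}")
```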


        for entry in set(path):
            if not isinstance(entry, str):
                continue
            if (finder := cls._path_importer_cache(entry)) is None:
                continue
            if discover := getattr(finder, 'discover', None):
                yield from discover(parent)

@staticmethod
def find_distributions(*args, **kwargs):
"""
@@ -1472,6 +1489,37 @@ def path_hook_for_FileFinder(path):

return path_hook_for_FileFinder

    def _find_children(self):
        with _os.scandir(self.path) as scan_iterator:
            while True:
                try:
                    entry = next(scan_iterator)
                    if entry.name == _PYCACHE:
                        continue
                    # packages
                    if entry.is_dir() and '.' not in entry.name:
                        yield entry.name
                    # files
                    if entry.is_file():
                        yield from {
                            entry.name.removesuffix(suffix)
                            for suffix, _ in self._loaders
                            if entry.name.endswith(suffix)
                        }
                except OSError:
                    pass  # ignore exceptions from next(scan_iterator) and os.DirEntry
                except StopIteration:
                    break

    def discover(self, parent=None):
        if parent and parent.submodule_search_locations is None:
            raise ValueError(f'{parent} is not a package module')

        module_prefix = f'{parent.name}.' if parent else ''
        for child_name in self._find_children():
            if spec := self.find_spec(module_prefix + child_name):
                yield spec

def __repr__(self):
return f'FileFinder({self.path!r})'

19 changes: 19 additions & 0 deletions Lib/importlib/abc.py
@@ -45,6 +45,16 @@ def invalidate_caches(self):
This method is used by importlib.invalidate_caches().
"""

    def discover(self, parent=None):
        """An optional method which searches for possible specs with given *parent*
        module spec. If *parent* is *None*, MetaPathFinder.discover will search
        for top-level modules.

        Returns an iterable of possible specs.
        """
        return ()


_register(MetaPathFinder, machinery.BuiltinImporter, machinery.FrozenImporter,
machinery.PathFinder, machinery.WindowsRegistryFinder)

@@ -58,6 +68,15 @@ def invalidate_caches(self):
This method is used by PathFinder.invalidate_caches().
"""

    def discover(self, parent=None):
        """An optional method which searches for possible specs with given
        *parent* module spec. If *parent* is *None*, PathEntryFinder.discover
        will search for top-level modules.

        Returns an iterable of possible specs.
        """
        return ()

_register(PathEntryFinder, machinery.FileFinder)


121 changes: 121 additions & 0 deletions Lib/test/test_importlib/test_discover.py
@@ -0,0 +1,121 @@
from unittest.mock import Mock

from test.test_importlib import util

importlib = util.import_importlib('importlib')
machinery = util.import_importlib('importlib.machinery')


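class DiscoverableFinder:
    def __init__(self, discover=[]):
        self._discovered_values = discover

    def find_spec(self, fullname, path=None, target=None):
        # NotImplemented is a constant, not an exception; raising it would
        # fail with TypeError instead of signalling "not implemented".
        raise NotImplementedError

    def discover(self, parent=None):
        yield from self._discovered_values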


class TestPathFinder:
    """PathFinder implements MetaPathFinder, which uses the PathEntryFinder(s)
    registered in sys.path_hooks (and sys.path_importer_cache) to search
    sys.path or the parent's __path__.

    PathFinder.discover() should redirect to the .discover() method of the
    PathEntryFinder for each path entry.
    """

    def test_search_path_hooks_top_level(self):
        modules = [
            self.machinery.ModuleSpec(name='example1', loader=None),
            self.machinery.ModuleSpec(name='example2', loader=None),
            self.machinery.ModuleSpec(name='example3', loader=None),
        ]

        with util.import_state(
            path_importer_cache={
                'discoverable': DiscoverableFinder(discover=modules),
            },
            path=['discoverable'],
        ):
            discovered = list(self.machinery.PathFinder.discover())

        self.assertEqual(discovered, modules)

    def test_search_path_hooks_parent(self):
        parent = self.machinery.ModuleSpec(name='example', loader=None, is_package=True)
        parent.submodule_search_locations.append('discoverable')

        children = [
            self.machinery.ModuleSpec(name='example.child1', loader=None),
            self.machinery.ModuleSpec(name='example.child2', loader=None),
            self.machinery.ModuleSpec(name='example.child3', loader=None),
        ]

        with util.import_state(
            path_importer_cache={
                'discoverable': DiscoverableFinder(discover=children)
            },
            path=[],
        ):
            discovered = list(self.machinery.PathFinder.discover(parent))

        self.assertEqual(discovered, children)

    def test_invalid_parent(self):
        parent = self.machinery.ModuleSpec(name='example', loader=None)
        with self.assertRaises(ValueError):
            list(self.machinery.PathFinder.discover(parent))


(
    Frozen_TestPathFinder,
    Source_TestPathFinder,
) = util.test_both(TestPathFinder, importlib=importlib, machinery=machinery)


class TestFileFinder:
    """FileFinder implements PathEntryFinder and provides the base finder
    implementation to search the file system.
    """

    def get_finder(self, path):
        loader_details = [
            (self.machinery.SourceFileLoader, self.machinery.SOURCE_SUFFIXES),
            (self.machinery.SourcelessFileLoader, self.machinery.BYTECODE_SUFFIXES),
        ]
        return self.machinery.FileFinder(path, *loader_details)

    def test_discover_top_level(self):
        modules = {'example1', 'example2', 'example3'}
        with util.create_modules(*modules) as mapping:
            finder = self.get_finder(mapping['.root'])
            discovered = list(finder.discover())
        self.assertEqual({spec.name for spec in discovered}, modules)

    def test_discover_parent(self):
        modules = {
            'example.child1',
            'example.child2',
            'example.child3',
        }
        with util.create_modules(*modules) as mapping:
            example = self.get_finder(mapping['.root']).find_spec('example')
            finder = self.get_finder(example.submodule_search_locations[0])
            discovered = list(finder.discover(example))
        self.assertEqual({spec.name for spec in discovered}, modules)

    def test_invalid_parent(self):
        with util.create_modules('example') as mapping:
            finder = self.get_finder(mapping['.root'])
            example = finder.find_spec('example')
            with self.assertRaises(ValueError):
                list(finder.discover(example))


(
    Frozen_TestFileFinder,
    Source_TestFileFinder,
) = util.test_both(TestFileFinder, importlib=importlib, machinery=machinery)
@@ -0,0 +1,3 @@
Introduced :meth:`importlib.abc.MetaPathFinder.discover`
and :meth:`importlib.abc.PathEntryFinder.discover` to allow module and submodule
name discovery without assuming the use of traditional filesystem-based imports.
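
As a rough usage sketch, a consumer of the proposed API might walk `sys.meta_path` and call `discover()` only on finders that provide it, since the method is optional. `iter_discovered_names` is a hypothetical helper written for this illustration. On interpreters without this change no finder is expected to define `discover()`, so the result may simply be empty.

```python
import sys
from itertools import islice


def iter_discovered_names(parent=None):
    """Lazily yield spec names from every meta path finder with discover()."""
    for finder in sys.meta_path:
        discover = getattr(finder, "discover", None)
        if discover is None:
            continue  # the finder does not implement the optional method
        for spec in discover(parent):
            yield spec.name


# Bound the work instead of converting everything to a list:
# look at no more than ten names.
first_names = list(islice(iter_discovered_names(), 10))
print(first_names)
```

Using `islice` instead of `list(...)` on the full iterable keeps the scan bounded, which matters because finders may perform IO for every yielded spec.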