Fix TSS key exhaustion in implicitly_convertible() (gh-5975) (#6020)

Replace `static thread_specific_storage<int>` with `thread_local bool`
in the implicit conversion reentrancy guard. Because
implicitly_convertible() is a function template, each unique
<InputType, OutputType> instantiation had its own static guard and so
created its own TSS key via PyThread_tss_create(). Projects with
hundreds of modules and many implicit conversions could exhaust
PTHREAD_KEYS_MAX (1024 on Linux, 512 on macOS), especially on Python
3.12+ where CPython itself consumes more TSS keys for subinterpreter
support.
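
A minimal sketch of why the key count grows with the number of
instantiations. The names `fake_tss_key`, `register_conversion` and
`keys_after_three_registrations` are hypothetical stand-ins, not pybind11
code; the counter only simulates what a real PyThread_tss_create() call
would consume:

```cpp
#include <cassert>

// Counts simulated key allocations (assumption: one per wrapper object).
static int g_keys_created = 0;

// Hypothetical stand-in for a TSS-backed wrapper. Real code would call
// PyThread_tss_create() here and consume one process-wide key.
struct fake_tss_key {
    fake_tss_key() { ++g_keys_created; }
};

template <typename InputType, typename OutputType>
void register_conversion() {
    // A function-local static in a function template exists once per
    // instantiation, so every distinct <InputType, OutputType> pair
    // constructs its own key.
    static fake_tss_key key;
    (void) key;
}

int keys_after_three_registrations() {
    register_conversion<int, double>();
    register_conversion<int, float>();
    register_conversion<int, double>(); // same instantiation: no new key
    return g_keys_created;
}

void check_key_count() {
    // Two distinct instantiations, therefore two simulated keys.
    assert(keys_after_three_registrations() == 2);
}
```

With hundreds of distinct pairs across many modules, the same pattern
would approach the per-process key limit.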

thread_local bool is safe here because:
- bool is trivially destructible, so it works on all C++11 platforms
  including older macOS (the concern that motivated the TSS approach in
  PR #5777 applied only to types with non-trivial destructors needing
  __cxa_thread_atexit runtime support)
- Each thread gets its own copy, so it is thread-safe under free-threading
- Subinterpreter sharing is benign: the guard prevents recursive implicit
  conversions on the same thread regardless of which interpreter is active
- The v3.0.0 code already used thread_local bool under Py_GIL_DISABLED

This effectively reverts the core change from PR #5777 while keeping
the non-copyable/non-movable set_flag guard.
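
The resulting pattern can be sketched roughly as follows. This is a
simplified, self-contained illustration rather than the actual pybind11
source: `convert` is a hypothetical stand-in for the implicit caster, and
the recursion only simulates a conversion that triggers itself:

```cpp
#include <cassert>

// Sketch of the guard helper: sets the flag on entry, clears it on exit.
struct set_flag {
    bool &flag;
    explicit set_flag(bool &flag_) : flag(flag_) { flag = true; }
    ~set_flag() { flag = false; }
    // Non-copyable and non-movable, so the flag is cleared exactly once.
    set_flag(const set_flag &) = delete;
    set_flag(set_flag &&) = delete;
    set_flag &operator=(const set_flag &) = delete;
    set_flag &operator=(set_flag &&) = delete;
};

// Hypothetical stand-in for the implicit caster. Returns the depth at
// which the guard refused to recurse.
int convert(int depth = 0) {
    // One flag per thread; no TSS key is allocated for it.
    static thread_local bool currently_used = false;
    if (currently_used) {
        return depth; // reentrant call on this thread: bail out
    }
    set_flag guard(currently_used);
    return convert(depth + 1); // simulate a conversion that re-enters
}

void check_guard() {
    assert(convert() == 1); // the nested attempt is rejected
    assert(convert() == 1); // flag was reset, so a later call works again
}
```

Because `bool` is trivially destructible, the `thread_local` variable in
this sketch needs no thread-exit destructor registration.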

Made-with: Cursor
1 file changed