Fix TSS key exhaustion in implicitly_convertible() (gh-5975) (#6020)

Replace `static thread_specific_storage<int>` with `thread_local bool`
in the implicit conversion reentrancy guard. Since `implicitly_convertible`
is a template function, each unique `<InputType, OutputType>` pair created
its own TSS key via `PyThread_tss_create()`. Projects with hundreds of
modules and many implicit conversions could exhaust `PTHREAD_KEYS_MAX`
(1024 on Linux, 512 on macOS), especially on Python 3.12+ where CPython
itself consumes more TSS keys for subinterpreter support.

thread_local bool is safe here because:
- bool is trivially destructible, so it works on all C++11 platforms
  including older macOS (the concern that motivated the TSS approach in
  PR #5777 applied only to types with non-trivial destructors needing
  __cxa_thread_atexit runtime support)
- Each thread gets its own copy, so it is thread-safe for free-threading
- Subinterpreter sharing is benign: the guard prevents recursive implicit
  conversions on the same thread regardless of which interpreter is active
- The v3.0.0 code already used thread_local bool under Py_GIL_DISABLED

This effectively reverts the core change from PR #5777 while keeping
the non-copyable/non-movable set_flag guard.

Made-with: Cursor
diff --git a/include/pybind11/pybind11.h b/include/pybind11/pybind11.h
index 6c269ad..0d03f4f 100644
--- a/include/pybind11/pybind11.h
+++ b/include/pybind11/pybind11.h
@@ -3557,13 +3557,10 @@
 
 template <typename InputType, typename OutputType>
 void implicitly_convertible() {
-    static int tss_sentinel_pointee = 1; // arbitrary value
     struct set_flag {
-        thread_specific_storage<int> &flag;
-        explicit set_flag(thread_specific_storage<int> &flag_) : flag(flag_) {
-            flag = &tss_sentinel_pointee; // trick: the pointer itself is the sentinel
-        }
-        ~set_flag() { flag.reset(nullptr); }
+        bool &flag;
+        explicit set_flag(bool &flag_) : flag(flag_) { flag_ = true; }
+        ~set_flag() { flag = false; }
 
         // Prevent copying/moving to ensure RAII guard is used safely
         set_flag(const set_flag &) = delete;
@@ -3572,7 +3569,7 @@
         set_flag &operator=(set_flag &&) = delete;
     };
     auto implicit_caster = [](PyObject *obj, PyTypeObject *type) -> PyObject * {
-        static thread_specific_storage<int> currently_used;
+        thread_local bool currently_used = false;
         if (currently_used) { // implicit conversions are non-reentrant
             return nullptr;
         }