fix: use a thread pool on macOS (#1861)
The Python parser uses ProcessPoolExecutor, which fails on macOS when
the parser is distributed as a zip file, leading to errors like:
```
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/var/folders/fw/vythc6112ygfsvky8mdb5p580000gn/T/Bazel.runfiles_esxfeg_v/runfiles/python3_aarch64-apple-darwin/lib/python3.9/multiprocessing/resource_tracker.py", line 24, in <module>
from . import spawn
File "/var/folders/fw/vythc6112ygfsvky8mdb5p580000gn/T/Bazel.runfiles_esxfeg_v/runfiles/python3_aarch64-apple-darwin/lib/python3.9/multiprocessing/spawn.py", line 13, in <module>
import runpy
File "/var/folders/fw/vythc6112ygfsvky8mdb5p580000gn/T/Bazel.runfiles_esxfeg_v/runfiles/python3_aarch64-apple-darwin/lib/python3.9/runpy.py", line 19, in <module>
from pkgutil import read_code, get_importer
ModuleNotFoundError: No module named 'pkgutil'
```
According to the ["Contexts and start methods"
section](https://docs.python.org/3/library/multiprocessing.html#contexts-and-start-methods)
of the Python documentation:
> On macOS, the spawn start method is now the default. The fork start
> method should be considered unsafe as it can lead to crashes of the
> subprocess as macOS system libraries may start threads.

Meanwhile:

> The 'spawn' and 'forkserver' start methods generally cannot be used
> with “frozen” executables (i.e., binaries produced by packages like
> PyInstaller and cx_Freeze) on POSIX systems.

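
As a rough illustration of the two points quoted above, the sketch below reports the default start method on the current platform and shows that a thread pool can stand in for a process pool through the shared `concurrent.futures.Executor` interface. The helper names `default_start_method` and `run_in_pool` are hypothetical and are not part of this change.

```python
import concurrent.futures
import multiprocessing
import platform


def default_start_method():
    """Return the start method multiprocessing uses by default here.

    Note: get_start_method() fixes the default for this process if no
    start method has been set yet.
    """
    return multiprocessing.get_start_method()


def run_in_pool(executor_cls, fn, items):
    """Map fn over items with the given Executor subclass (hypothetical helper).

    ThreadPoolExecutor and ProcessPoolExecutor both implement the Executor
    interface, so the body is the same for either class.
    """
    with executor_cls() as executor:
        # Executor.map yields results in the order of the inputs.
        return list(executor.map(fn, items))


if __name__ == "__main__":
    print(platform.system(), default_start_method())
    print(run_in_pool(concurrent.futures.ThreadPoolExecutor, len, ["a", "bb", "ccc"]))
```

Because both pools expose the same interface, swapping one for the other leaves the calling code unchanged. The trade-off is that threads share the GIL, so CPU-bound parsing may run with less parallelism than with separate processes.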
This means there is no way to start a ProcessPoolExecutor when the
Python zip file is running on macOS. This PR switches it to
ThreadPoolExecutor instead.

diff --git a/gazelle/python/parse.py b/gazelle/python/parse.py
index daa6d2b..ae9ce87 100644
--- a/gazelle/python/parse.py
+++ b/gazelle/python/parse.py
@@ -20,6 +20,7 @@
 import concurrent.futures
 import json
 import os
+import platform
 import sys
 from io import BytesIO
 from tokenize import COMMENT, NAME, OP, STRING, tokenize
@@ -108,8 +109,19 @@
     return output
 
 
+def create_main_executor():
+    # We cannot use ProcessPoolExecutor on macOS, because the fork start method should be considered unsafe as it can
+    # lead to crashes of the subprocess as macOS system libraries may start threads. Meanwhile, the 'spawn' and
+    # 'forkserver' start methods generally cannot be used with “frozen” executables (i.e., Python zip file) on POSIX
+    # systems. Therefore, there is no good way to use ProcessPoolExecutor on macOS when we distribute this program with
+    # a zip file.
+    # Ref: https://docs.python.org/3/library/multiprocessing.html#contexts-and-start-methods
+    if platform.system() == "Darwin":
+        return concurrent.futures.ThreadPoolExecutor()
+    return concurrent.futures.ProcessPoolExecutor()
+
 def main(stdin, stdout):
-    with concurrent.futures.ProcessPoolExecutor() as executor:
+    with create_main_executor() as executor:
         for parse_request in stdin:
             parse_request = json.loads(parse_request)
             repo_root = parse_request["repo_root"]