pw_tokenizer: Decode token databases as UTF-8

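Pass encoding='utf-8' explicitly when opening CSV token databases.
Without it, Path.open() falls back to the locale's preferred encoding,
which on Windows is typically a legacy code page rather than UTF-8, so
databases containing non-ASCII strings may be decoded incorrectly or
fail to load.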
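
A minimal sketch of the behavior this change relies on, not code from
pw_tokenizer itself. The helpers write_rows and read_rows and the sample
row are hypothetical; only the open() arguments mirror the diff below.

```python
import csv
import tempfile
from pathlib import Path
from typing import List


def write_rows(path: Path, rows: List[List[str]]) -> None:
    """Hypothetical helper: writes CSV rows as UTF-8 on any platform."""
    with path.open('w', newline='', encoding='utf-8') as file:
        csv.writer(file).writerows(rows)


def read_rows(path: Path) -> List[List[str]]:
    """Hypothetical helper: reads CSV rows, always decoding as UTF-8.

    Omitting encoding= would make the result depend on the locale's
    preferred encoding instead of the file's actual encoding.
    """
    with path.open('r', newline='', encoding='utf-8') as file:
        return list(csv.reader(file))


if __name__ == '__main__':
    # Illustrative row only; the column layout is an assumption.
    sample = [['d4a1b2c3', '', 'Température: %d °C']]
    with tempfile.TemporaryDirectory() as tmp:
        db = Path(tmp) / 'tokens.csv'
        write_rows(db, sample)
        print(read_rows(db) == sample)
```

With the encoding fixed on both the write and read side, the round trip
does not depend on the platform's default.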

Change-Id: I9a3d5e6034044f7085dd3fe056f33aa12e58e105
Reviewed-on: https://pigweed-review.googlesource.com/c/pigweed/pigweed/+/37960
Pigweed-Auto-Submit: Wyatt Hepler <hepler@google.com>
Commit-Queue: Auto-Submit <auto-submit@pigweed.google.com.iam.gserviceaccount.com>
Reviewed-by: Ewout van Bekkum <ewout@google.com>
diff --git a/pw_tokenizer/py/pw_tokenizer/tokens.py b/pw_tokenizer/py/pw_tokenizer/tokens.py
index 825aec9..663935e 100644
--- a/pw_tokenizer/py/pw_tokenizer/tokens.py
+++ b/pw_tokenizer/py/pw_tokenizer/tokens.py
@@ -457,7 +457,7 @@
 
         # Read the path as a CSV file.
         _check_that_file_is_csv_database(self.path)
-        with self.path.open('r', newline='') as file:
+        with self.path.open('r', newline='', encoding='utf-8') as file:
             super().__init__(parse_csv(file))
             self._export = write_csv