recreate word table when refreshing counts

The counting touches a large part of the word table, leaving
bloated tables and indexes. Thus recreate the table instead and
swap it in.
This commit is contained in:
Sarah Hoffmann
2024-02-04 16:43:33 +01:00
parent 33c0f249b1
commit 81eed0680c
10 changed files with 130 additions and 82 deletions

View File

@@ -201,7 +201,7 @@ class AbstractTokenizer(ABC):
@abstractmethod
def update_statistics(self) -> None:
def update_statistics(self, config: Configuration) -> None:
""" Recompute any tokenizer statistics necessary for efficient lookup.
This function is meant to be called from time to time by the user
to improve performance. However, the tokenizer must not depend on