Feature/9 improved acquisition process of Japanese tokens in contentbasedrecommenderts (#11)

kemsakurai · kemsakurai · web-flow · commit 442821e39c02 · 2025-07-01T08:24:35.000+09:00
* feat: Implement English and Japanese tokenizers with filtering capabilities

- Added EnglishTokenizer and JapaneseTokenizer classes for text tokenization.
- Integrated HTML tag stripping and case normalization in EnglishTokenizer.
- Implemented morphological analysis using kuromoji in JapaneseTokenizer.
- Created EnglishTokenFilter and JapaneseTokenFilter for token filtering.
- Developed a ProcessingPipelineFactory to create tokenization and filtering pipelines.
- Enhanced ContentBasedRecommender with improved training and recommendation logic.
- Added comprehensive unit tests for tokenizers, filters, and recommender functionality.
- Implemented integration tests for end-to-end functionality of the recommender system.

* Update package.json

---------

Co-authored-by: kemsakurai <sakurai.kem@mail.com>
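
The commit message describes a tokenizer, a token filter, and a factory that composes them into a pipeline. Since the diff shown here only touches package.json, the following is a minimal sketch of how such a pipeline could fit together, not the code from this commit. The `Tokenizer` and `TokenFilter` interfaces, the `Sketch*` class names, the `createPipeline` helper, the stop-word list, and the minimum token length are all assumptions made for illustration.

```typescript
// Hypothetical interfaces: the real signatures in this commit are not shown in the diff.
interface Tokenizer {
  tokenize(text: string): string[];
}

interface TokenFilter {
  filter(tokens: string[]): string[];
}

// Assumed behaviour: strip HTML tags, lowercase, split on non-alphanumeric characters.
class SketchEnglishTokenizer implements Tokenizer {
  tokenize(text: string): string[] {
    const withoutTags = text.replace(/<[^>]*>/g, " ");
    return withoutTags
      .toLowerCase()
      .split(/[^a-z0-9]+/)
      .filter((token) => token.length > 0);
  }
}

// Assumed behaviour: drop stop words and tokens shorter than a minimum length.
class SketchEnglishTokenFilter implements TokenFilter {
  private readonly stopWords: Set<string>;
  private readonly minLength: number;

  constructor(stopWords: string[] = ["a", "an", "the", "is", "of", "and"], minLength = 2) {
    this.stopWords = new Set(stopWords);
    this.minLength = minLength;
  }

  filter(tokens: string[]): string[] {
    return tokens.filter(
      (token) => token.length >= this.minLength && !this.stopWords.has(token),
    );
  }
}

// Hypothetical factory: composes one tokenizer and any number of filters
// into a single text -> tokens function, applying the filters in order.
function createPipeline(
  tokenizer: Tokenizer,
  filters: TokenFilter[],
): (text: string) => string[] {
  return (text) =>
    filters.reduce((tokens, f) => f.filter(tokens), tokenizer.tokenize(text));
}

const englishPipeline = createPipeline(new SketchEnglishTokenizer(), [
  new SketchEnglishTokenFilter(),
]);

const tokens = englishPipeline("<p>The Quick brown fox is a FAST runner</p>");
console.log(tokens); // ["quick", "brown", "fox", "fast", "runner"]
```

A Japanese tokenizer backed by kuromoji could plug into the same `Tokenizer` interface under this design, which is why the sketch keeps tokenization and filtering behind separate interfaces.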
diff --git a/package.json b/package.json
@@ -1,6 +1,6 @@
 {
   "name": "ts-content-based-recommender",
-  "version": "1.6.0",
+  "version": "1.6.1",
   "description": "A TypeScript-based content-based recommender with multilingual support (Japanese & English). Forked from content-based-recommender.",
   "homepage": "https://github.com/kensakurai/ts-content-based-recommender",
   "repository": {