masroore/stopwords

A PHP package that removes common stopwords from input text. It supports 24 languages.

Version 1.0.2 (2022-06-12 12:33 UTC)

Last update: 2024-05-12 19:38:04 UTC


Overview

Stopword lists for multiple languages that you can easily use in your PHP applications.

Supported languages

The package currently provides stopwords for the following languages:

  • Arabic
  • Azerbaijani
  • Bengali
  • Danish
  • Dutch
  • English
  • Finnish
  • French
  • German
  • Greek
  • Hungarian
  • Indonesian
  • Italian
  • Kazakh
  • Nepali
  • Norwegian
  • Portuguese
  • Romanian
  • Russian
  • Slovene
  • Spanish
  • Swedish
  • Tajik
  • Turkish

Installation

Requires PHP 8.0+

You can install the package via Composer:

composer require masroore/stopwords

Usage

// load Composer's autoloader
require __DIR__ . '/vendor/autoload.php';

$stopwords = new Kaiju\Stopwords\Stopwords();

// get the list of available languages
print_r($stopwords->getLanguages());

// load stopwords for a language
$stopwords->load('english');

// load stopwords for multiple languages
$stopwords->load(['english', 'french']);

// load stopwords for all available languages
$stopwords->load('*');

// check if the given word is a stop-word
$stopwords->isStopword('the'); // TRUE
$stopwords->isStopword('America'); // FALSE

// return a tokenized copy of the text, with stop-words and punctuation marks removed
$text = "Good muffins cost $3.88\nin New York.  Please buy me two of them.\n\nThanks!\n";
print_r($stopwords->strip($text));
// ["Good","muffins","cost","$3.88","New","York","Please","buy","two","Thanks"]

echo $stopwords->clean($text);
// "Good muffins cost $3.88 New York Please buy two Thanks"
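This README does not document the package's internals. The block below is therefore only a minimal, self-contained sketch of the general technique that `strip()` and `clean()` illustrate, not the package's actual implementation. The technique has three steps: split the text on whitespace, trim punctuation from the edges of each token, and drop any token found in a stopword lookup table. The functions `buildStopwordSet()` and `stripStopwords()` are hypothetical. The five-word list is a made-up sample, chosen so that the sample sentence above yields the same tokens. Case-insensitive matching is an assumption of this sketch.

```php
<?php

declare(strict_types=1);

// Hypothetical sketch of stopword filtering; NOT the masroore/stopwords code.

/**
 * Build a lookup table (lower-cased word => true) so membership checks
 * are a constant-time isset() instead of a linear in_array() scan.
 *
 * @param list<string> $words
 * @return array<string, true>
 */
function buildStopwordSet(array $words): array
{
    $set = [];
    foreach ($words as $word) {
        $set[mb_strtolower($word)] = true;
    }
    return $set;
}

/**
 * Split text on whitespace, trim punctuation from each token's edges,
 * then drop empty tokens and stopwords (compared case-insensitively).
 *
 * @param array<string, true> $set
 * @return list<string>
 */
function stripStopwords(string $text, array $set): array
{
    $tokens = preg_split('/\s+/u', $text, -1, PREG_SPLIT_NO_EMPTY);
    $result = [];
    foreach ($tokens as $token) {
        // Only edge punctuation is trimmed, so an inner "." as in "$3.88" survives.
        $token = trim($token, ".,;:!?\"'()[]{}");
        if ($token === '' || isset($set[mb_strtolower($token)])) {
            continue;
        }
        $result[] = $token;
    }
    return $result;
}

// Tiny hypothetical stopword sample; the real package ships full lists.
$set = buildStopwordSet(['in', 'me', 'of', 'them', 'the']);
$text = "Good muffins cost $3.88\nin New York.  Please buy me two of them.\n\nThanks!\n";

$tokens = stripStopwords($text, $set);
echo implode(' ', $tokens), PHP_EOL;
// Good muffins cost $3.88 New York Please buy two Thanks
```

Keeping the words as array keys rather than values lets `isset()` answer "is this a stopword?" without scanning the whole list. This matters once thousands of words from several languages are loaded at once.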

Testing

composer test

Changelog

Please see CHANGELOG for more information on what has changed recently.

Contributing

Thank you for considering contributing to this package. Please follow the contribution guidelines.

Security Vulnerabilities

Please review our security policy on how to report security vulnerabilities.

Credits

License

This package is open-source software licensed under the MIT license.