This function stems Turkish tokens using a look-up table (derived from Nuve) as a fast substitute for slower but more accurate morphological analysis. If a token contains an apostrophe, only the characters before the apostrophe are stemmed and the remainder is discarded.

wordStem(x, ...)

Arguments

x

A token or a vector of tokens

...

Extra arguments, currently ignored

Value

A stemmed token or vector of stemmed tokens, or the originals if no stems could be found

Details

This code should work the same way as the original Java implementation. The interface, on the other hand, is designed to feel like that of the SnowballC package.
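
The look-up and apostrophe handling described above can be illustrated with a minimal sketch. This is not the package's implementation: `toy_stems` and `toy_stem()` are hypothetical names, the two table entries are made up for illustration rather than taken from the Nuve-derived table, and returning the apostrophe-stripped token when no stem is found is an assumption.

```r
# Hypothetical toy table: names are inflected tokens, values are their stems.
# The real table is derived from Nuve and is far larger.
toy_stems <- c(
  "kitapçığında" = "kitapçık",
  "kitapçıdaki"  = "kitapçı"
)

toy_stem <- function(x, stems = toy_stems) {
  # Keep only the characters before the first apostrophe
  # (ASCII ' or typographic \u2019); the rest is discarded.
  head <- sub("['\u2019].*$", "", x)
  # Look each token up by name; tokens not in the table give NA.
  hit <- unname(stems[head])
  # Assumption: fall back to the apostrophe-stripped token when no stem is found.
  ifelse(is.na(hit), head, hit)
}

toy_stem(c("kitapçığında", "kitapçıdaki", "İstanbul'da", "xyz"))
#> [1] "kitapçık" "kitapçı"  "İstanbul" "xyz"
```

A named character vector keeps the sketch short; it says nothing about how the package stores its table.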

References

Resha: https://github.com/hrzafer/resha-turkish-stemmer

Nuve: https://github.com/hrzafer/nuve

Examples

toks <- c("kitapçığında", "kitapçıdaki", "İstanbul'da")
wordStem(toks)
#> [1] "kitapçık" "kitapçı" "İstanbul"