wordStem.Rd
This function attempts to stem Turkish tokens using a look-up table (derived from Nuve) as a fast substitute for slower but more accurate morphological analysis. If a token contains an apostrophe, only the characters before the apostrophe are stemmed and the remainder is discarded.
wordStem(x, ...)
| Argument | Description |
|---|---|
| x | A token or a vector of tokens |
| ... | Extra arguments, currently ignored |
A stemmed token or vector of stemmed tokens, or the originals if no stems could be found
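The behaviour described above (look-up, apostrophe trimming, and falling back when no stem is found) can be illustrated with a minimal sketch. This is not the package's implementation: `sketch_stem`, `toy_keys`, and `toy_stems` are hypothetical names, the two-entry table stands in for the real Nuve-derived table, and returning the apostrophe-trimmed token when no stem is found is an assumption.

```r
# Hypothetical two-entry look-up table; the real table is derived from Nuve.
toy_keys  <- c("kitapçığında", "kitapçıdaki")
toy_stems <- c("kitapçık", "kitapçı")

sketch_stem <- function(x, keys = toy_keys, stems = toy_stems) {
  # Keep only the characters before the first apostrophe, if there is one.
  base <- sub("'.*$", "", x)
  # Find each trimmed token in the table; NA means no entry exists.
  idx <- match(base, keys)
  # Use the stem where one exists; otherwise fall back to the trimmed
  # token (assumed fallback, see the note above).
  ifelse(is.na(idx), base, stems[idx])
}

sketch_stem(c("kitapçığında", "kitapçıdaki", "İstanbul'da"))
```

A table look-up like this is vectorised through `match()`, which is what makes the approach fast compared with running a morphological analyser on each token.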
This code should work the same way as the original Java implementation. The interface, on the other hand, is designed to feel like that of the SnowballC package.
Resha: https://github.com/hrzafer/resha-turkish-stemmer
Nuve: https://github.com/hrzafer/nuve
toks <- c("kitapçığında", "kitapçıdaki", "İstanbul'da")
wordStem(toks)
#> [1] "kitapçık" "kitapçı" "İstanbul"