

Simple cross-platform multilingual content analysis

The Yoshikoder is a cross-platform multilingual content analysis program developed as part of the Identity Project at Harvard’s Weatherhead Center for International Affairs.

You can load documents, construct and apply content analysis dictionaries, examine keywords-in-context, and perform basic content analyses, in any language. About two laptops ago it looked like this:

Screenshot of Yoshikoder

The Yoshikoder works with text documents, whether in plain ASCII, Unicode (e.g. UTF-8), or national encodings (e.g. Big5 Chinese). You can construct, view, and save keywords-in-context. You can write content analysis dictionaries. Yoshikoder provides summaries of documents, either as word frequency tables or according to a content analysis dictionary. You can also apply a dictionary analysis to the results of a concordance, which provides a flexible way to study local word contexts. Yoshikoder’s native file format is XML, so dictionaries and keyword-in-context files are non-proprietary and human-readable.
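For readers who have not met these ideas before, here is a minimal Python sketch of the three operations described above: a word frequency table, a keywords-in-context concordance, and a dictionary analysis applied to the concordance results. It is not Yoshikoder's code (Yoshikoder is Java software) and says nothing about its internals or its XML format. The function names `tokenize`, `kwic`, and `apply_dictionary`, the sample text, and the two-category dictionary are all made up for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    # Lowercased word tokens; \w is Unicode-aware in Python 3.
    # (Hypothetical tokenizer, not Yoshikoder's.)
    return re.findall(r"\w+", text.lower())


def kwic(tokens, keyword, window=3):
    """Return a (left, keyword, right) tuple for every match of keyword."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            hits.append((left, tok, right))
    return hits


def apply_dictionary(tokens, dictionary):
    """Count tokens per category; dictionary maps category -> set of words."""
    counts = Counter()
    for tok in tokens:
        for category, words in dictionary.items():
            if tok in words:
                counts[category] += 1
    return counts


# Illustrative document and dictionary (assumptions, not shipped resources).
text = ("The economy grew while taxes fell. "
        "Critics said the economy was fragile and taxes unfair.")
dictionary = {
    "economic": {"economy", "taxes"},
    "negative": {"fragile", "unfair", "critics"},
}

tokens = tokenize(text)

# Word frequency table for the whole document.
frequencies = Counter(tokens)

# Dictionary analysis of the whole document.
doc_counts = apply_dictionary(tokens, dictionary)

# Concordance for one keyword, then the same dictionary applied
# only to the words surrounding that keyword.
concordance = kwic(tokens, "economy", window=3)
context_tokens = [t for left, _, right in concordance for t in left + right]
context_counts = apply_dictionary(context_tokens, dictionary)

print(frequencies.most_common(3))
print(dict(doc_counts))
print(dict(context_counts))
```

Applying the dictionary to `context_tokens` rather than to `tokens` is what restricts the analysis to local word contexts: only words within the window around each keyword match are counted.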


You can find some dictionaries and other resources at the old homepage.


To install, choose a version from the releases page appropriate for your operating system. Mac users who do not have Java 1.8+ installed, or who just want to be sure, should choose the ‘bundled’ version.


If you run into a bug you can tell me about it here.


There’s not much new lately, as I’m working on R packages for text analysis instead, mostly Quanteda and Austin.


If you’d like to refer to the package in written work (and you should) you can use this:

Will Lowe (2015) ‘Yoshikoder: Cross-platform multilingual content analysis’. Java software, version 0.6.5, URL