To use Joshua as a standalone decoder (with language packs), you only need to download and install the runtime version of the decoder. If you also wish to build translation models from your own data, you will want to install the full version. See the instructions below.
First, set up some basic environment variables.
You need to define $JAVA_HOME:
export JAVA_HOME=/path/to/java
# The location of JAVA_HOME varies from system to system. Here are some places to look:
# OS X: export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.7.0_71.jdk/Contents/Home
# Linux: export JAVA_HOME=/usr/java/default
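Before going further, you can confirm that $JAVA_HOME points somewhere usable by checking that it contains an executable bin/java, which is where JDK installations normally place the java launcher. The function below is a minimal sketch to paste into your shell or source from a file; the name check_java_home is hypothetical and not part of Joshua. On OS X, the system utility /usr/libexec/java_home prints the path of the default JDK, so export JAVA_HOME=$(/usr/libexec/java_home) is often easier than typing the path by hand.

```shell
# Hypothetical helper (not part of Joshua): verify that $JAVA_HOME
# looks like a Java installation by checking for an executable bin/java.
check_java_home() {
  if [ -z "${JAVA_HOME:-}" ]; then
    echo "JAVA_HOME is not set" >&2
    return 1
  fi
  if [ ! -x "$JAVA_HOME/bin/java" ]; then
    echo "no executable java found in $JAVA_HOME/bin" >&2
    return 1
  fi
  # Print the launcher that will be used.
  echo "using $JAVA_HOME/bin/java"
}
```

For example, check_java_home && "$JAVA_HOME/bin/java" -version confirms the path and prints the Java version in one step.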
If you are installing the full version of Joshua, you also need to define $HADOOP
to point to your Hadoop installation.
(Joshua looks for the Hadoop executable in $HADOOP/bin/hadoop.)
export HADOOP=/usr
If you don’t have a Hadoop installation, Joshua’s pipeline can install a standalone version for you.
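Because Joshua looks for the Hadoop executable at $HADOOP/bin/hadoop, you can check your setting before running the pipeline. The sketch below uses a hypothetical function name, check_hadoop, that is not part of Joshua. It prints the path that would be used, or reports that nothing was found, in which case the pipeline's standalone Hadoop is the fallback.

```shell
# Hypothetical helper (not part of Joshua): check that $HADOOP/bin/hadoop
# exists and is executable, since that is where Joshua looks for Hadoop.
check_hadoop() {
  if [ -z "${HADOOP:-}" ]; then
    echo "HADOOP is not set; Joshua's pipeline can install a standalone Hadoop" >&2
    return 1
  fi
  local hadoop_bin="$HADOOP/bin/hadoop"
  if [ ! -x "$hadoop_bin" ]; then
    echo "no executable hadoop found at $hadoop_bin" >&2
    return 1
  fi
  # Print the executable Joshua would use.
  echo "$hadoop_bin"
}
```

With the setting shown above (HADOOP=/usr), a successful check prints /usr/bin/hadoop.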
To install just the runtime version of Joshua, type:
wget -q http://cs.jhu.edu/~post/files/joshua-runtime-6.0.5.tgz
Then unpack and build it:
tar xzf joshua-runtime-6.0.5.tgz
cd joshua-runtime-6.0.5
# Add this to your init files
export JOSHUA=$(pwd)
# build everything
ant
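"Add this to your init files" means that JOSHUA should be set in every new shell, not just the current one. One way to do that is to append the export line to your shell's startup file once. This sketch assumes bash and ~/.bashrc (pass a different file, such as ~/.zshrc, for other shells), and the function name persist_joshua is hypothetical. It records the current directory, so run it from inside the unpacked Joshua directory; it skips the write if the identical line is already present.

```shell
# Hypothetical helper (not part of Joshua): append 'export JOSHUA="<dir>"'
# to a shell startup file unless that exact line is already there.
# Usage: persist_joshua [rc-file]   (defaults to ~/.bashrc)
persist_joshua() {
  local rc_file="${1:-$HOME/.bashrc}"
  local line="export JOSHUA=\"$(pwd)\""
  # grep -x matches whole lines; -F treats the pattern as a fixed string.
  if [ -f "$rc_file" ] && grep -qxF "$line" "$rc_file"; then
    echo "already present in $rc_file"
    return 0
  fi
  printf '%s\n' "$line" >> "$rc_file"
  echo "added to $rc_file"
}
```

Running it twice from the same directory adds the line only once, so it is safe to include in a setup script.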
To install the full version instead, type:
wget -q http://cs.jhu.edu/~post/files/joshua-6.0.5.tgz
tar xzf joshua-6.0.5.tgz
cd joshua-6.0.5
# Add this to your init files
export JOSHUA=$(pwd)
# build everything
ant
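The runtime and full recipes differ only in the release name, so if you install Joshua on several machines you may want to wrap them in one helper. This is a minimal sketch, not a script shipped with Joshua: the function names joshua_release_name and install_joshua are hypothetical, and it assumes the download URL pattern shown above also holds for any other version you pass in (only 6.0.5 is given here). It runs the same wget, tar, and ant commands as the manual steps and stops at the first one that fails.

```shell
# Hypothetical helpers (not part of Joshua) wrapping the manual
# download, unpack, and build steps for either release variant.

# Print the release name: joshua-runtime-<version> or joshua-<version>.
joshua_release_name() {
  local variant="$1" version="${2:-6.0.5}"
  case "$variant" in
    runtime) echo "joshua-runtime-$version" ;;
    full)    echo "joshua-$version" ;;
    *)       echo "usage: joshua_release_name runtime|full [version]" >&2
             return 1 ;;
  esac
}

# Download and unpack the tarball, cd into it, export JOSHUA, then run ant.
# Like the manual steps, this leaves the current shell in the new directory.
install_joshua() {
  local name
  name=$(joshua_release_name "$1" "${2:-6.0.5}") || return 1
  wget -q "http://cs.jhu.edu/~post/files/$name.tgz" &&
    tar xzf "$name.tgz" &&
    cd "$name" &&
    export JOSHUA="$(pwd)" &&
    ant
}
```

For example, install_joshua runtime reproduces the runtime steps and install_joshua full the full ones. JOSHUA is only exported in the current shell, so you still need to add it to your init files.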
If you wish to build models for new language pairs from existing data (such as the WMT data), you need to install some additional dependencies.
For learning hierarchical models, Joshua includes a tool called Thrax, which
is built on Hadoop. If you have a Hadoop installation, make sure that the environment variable
$HADOOP is set and points to it. If you don't, Joshua will install one for you in standalone
mode. Hadoop is only needed if you plan to build new models with Joshua.
You will need to install Moses if either of the following applies to you:
You wish to build phrase-based models (Joshua 6 includes a phrase-based decoder, but not the tools for building such a model).
You are building your own models (phrase- or syntax-based) and wish to use Cherry & Foster’s batch MIRA tuner instead of the included MERT implementation, Z-MERT.
Follow the instructions for installing Moses here, and then define the $MOSES
environment variable to point to the root of the Moses installation.
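Once everything is installed, it can help to see in one place which of the variables on this page are set: JAVA_HOME and JOSHUA for every installation, and HADOOP and MOSES for building models. The sketch below is a hypothetical helper (report_joshua_env is not part of Joshua). It only checks that each variable is set and names an existing directory; it does not validate what is inside. It returns a failure status only when JAVA_HOME or JOSHUA is missing, since Hadoop and Moses are optional for runtime-only use.

```shell
# Hypothetical helper (not part of Joshua): report whether each variable
# is set and points to an existing directory. Returns 1 if a required
# variable (JAVA_HOME, JOSHUA) is missing or invalid; HADOOP and MOSES
# are optional and only reported.
report_joshua_env() {
  local status=0 var value kind
  for var in JAVA_HOME JOSHUA HADOOP MOSES; do
    # Portable indirect lookup of the variable named by $var.
    eval "value=\${$var:-}"
    case "$var" in
      JAVA_HOME|JOSHUA) kind=required ;;
      *)                kind=optional ;;
    esac
    if [ -n "$value" ] && [ -d "$value" ]; then
      echo "$var=$value (ok)"
    else
      echo "$var is unset or not a directory ($kind)"
      if [ "$kind" = required ]; then status=1; fi
    fi
  done
  return "$status"
}
```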
For more detail on the decoder itself, including its command-line options, see the Joshua decoder page. You can also learn more about other steps of the Joshua MT pipeline, including grammar extraction with Thrax and Joshua’s efficient grammar representation.
If you run into problems, you may find help on our answers page or in the mailing list archives.
A bundled configuration, which is a minimal set of configuration, resource, and script files, can be created and easily transferred and shared.