Accessing and using whale

Whale cluster

The cluster consists of a login node and a number of compute nodes. It shares home directories with crill, but the two clusters are otherwise separate. The only way to access whale from the outside world is ssh. If you would like to get an account, please contact gabriel [at]

Login Node Usage

The login node is to be used for editing, compiling, and submitting jobs. It is not to be used for running jobs such as parallel programs. On this cluster, program runs are submitted through Hadoop.

What is Hadoop?

Hadoop is a framework that supports the distributed execution of large-scale data processing and is best known for the MapReduce parallel programming pattern. Please refer to the Hadoop webpages for details on the framework; this page only provides a quick summary of the most relevant commands. The cluster has an HDFS file system that can be accessed from all nodes and should be used for MapReduce jobs.
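
To make the MapReduce pattern concrete, the following minimal sketch counts words in plain Python, entirely in memory. It does not use Hadoop or HDFS; the function names (map_phase, shuffle_phase, reduce_phase, word_count) are made up for this illustration and are not part of any Hadoop API.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by their key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values of one key into a single result."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence on an in-memory input."""
    pairs = (pair for line in lines for pair in map_phase(line))
    groups = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in groups.items())


if __name__ == "__main__":
    print(word_count(["whale crill whale", "crill whale"]))
```

In a real Hadoop job the map and reduce steps run in parallel on the compute nodes and the framework performs the shuffle; here all three steps run sequentially in one process.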

HDFS commands
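
As an illustration, the sketch below assembles a few common calls of the standard `hdfs dfs` file system shell as argument lists and prints them. It assumes the `hdfs` client is available on the login node; the helper hdfs_cmd, the user name alice and all paths are hypothetical. Nothing is executed, the commands are only printed.

```python
import shlex
from typing import List


def hdfs_cmd(subcommand: str, *args: str) -> List[str]:
    """Build the argument list for one `hdfs dfs` call (nothing is executed)."""
    return ["hdfs", "dfs", f"-{subcommand}", *args]


# Hypothetical user name and paths, for illustration only.
EXAMPLES = [
    hdfs_cmd("ls", "/user/alice"),                       # list a directory
    hdfs_cmd("mkdir", "-p", "/user/alice/input"),        # create a directory
    hdfs_cmd("put", "data.txt", "/user/alice/input"),    # copy a local file into HDFS
    hdfs_cmd("cat", "/user/alice/output/part-r-00000"),  # print a file stored in HDFS
    hdfs_cmd("get", "/user/alice/output", "results"),    # copy from HDFS to local disk
    hdfs_cmd("rm", "-r", "/user/alice/output"),          # remove a directory recursively
]

if __name__ == "__main__":
    for cmd in EXAMPLES:
        print(shlex.join(cmd))
```

Each printed line can be typed as-is on the login node, or the list can be passed to subprocess.run to execute it from a script.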

Job Submission

Jobs are submitted from the login node and run on one or more compute nodes. A job runs until it terminates in some way, e.g. by normal completion, timeout, or abort.
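
The three ways a job can end can be sketched with a small Python wrapper around a submission command. This is an assumption-laden illustration, not the cluster's own tooling: run_job and hadoop_jar_argv are hypothetical helpers, the time limit is enforced locally by Python rather than by the cluster, and the only Hadoop-specific part is the standard `hadoop jar <jar> <main class> <args>` form of the command line.

```python
import subprocess
import sys
from typing import List, Optional


def hadoop_jar_argv(jar: str, main_class: str, *job_args: str) -> List[str]:
    """Build the argument list for `hadoop jar <jar> <main class> <args...>`."""
    return ["hadoop", "jar", jar, main_class, *job_args]


def run_job(argv: List[str], timeout_s: Optional[float] = None) -> str:
    """Run a command and report how it ended.

    Returns "completed" for exit code 0, "timeout" if the local time limit
    was hit, and "aborted" for any non-zero exit code (including a signal).
    """
    try:
        result = subprocess.run(argv, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # Only the local client process is killed here; a job already
        # running on the cluster may not be stopped by this.
        return "timeout"
    return "completed" if result.returncode == 0 else "aborted"


if __name__ == "__main__":
    # Stand-in command so the sketch runs without a Hadoop installation.
    print(run_job([sys.executable, "-c", "print('hello')"], timeout_s=10))
```

On the login node the stand-in command would be replaced by something like run_job(hadoop_jar_argv("wordcount.jar", "WordCount", "input", "output")), where the jar name, class name and paths are placeholders.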