EPIC - Sun Grid Engine Integration with Globus Toolkit 3

Introduction

This page describes how to configure a GT3 server to be able to submit jobs to a local Sun Grid Engine installation.

Prerequisites

This installation guide assumes that you have the following already configured and working:

  • A server containing:
    • A properly configured Globus Toolkit 3 installation. Specifically, you should already be able to run jobs on your server with the managed-job-globusrun command, using the Master Managed Job Factory Service (MMJFS) and the 'Fork' Jobmanager backend.
    • A properly configured Sun Grid Engine installation. Specifically, you should be able to run tasks on your SGE cluster using the qsub command.
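
Before going further, it can help to confirm that the client commands named above are actually on your PATH. The following is a minimal sketch, not part of the SGE integration packages; the function name missing_prereqs is hypothetical, and finding a command on the PATH does not prove that it is correctly configured.

```shell
#!/bin/sh
# Hypothetical helper: print the name of every command given as an
# argument that cannot be found on the current PATH, one per line.
# Prints nothing if all of them are found.
missing_prereqs() {
    for tool in "$@"; do
        # 'command -v' is the POSIX way to test whether a command exists.
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
        fi
    done
}
```

For example, running missing_prereqs managed-job-globusrun qsub on the GT3 server should print nothing if both the GT3 client tool and the SGE submission command can be found.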
Installation

Overview

The installation of the SGE integration software can be broken down into several steps: downloading the software, building it on your GT3 server, updating the configuration of the server and user hosting environment, and finally testing to ensure that it operates correctly.

Download

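You will need to download the three software packages that make up the glue between the Globus Toolkit and the Sun Grid Engine. These are the packages named in the build commands later on this page:

  • globus_gram_job_manager_setup_sge-0.11.tar.gz
  • mmjfs_sge_setup-0.0.tar.gz
  • mjs_sge_setup-0.0.tar.gz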

The first of these contains the Perl glue code that generates the SGE job script from the RSL specification passed to the managed-job-globusrun command on the client, submits it using qsub, and monitors its state after submission.

The second package contains configuration information which will add the MasterSGEManagedJobFactoryService to your GT3 installation. It is this service that the managed-job-globusrun script will connect to when it wishes to execute a job using SGE.

Finally, the last package adds the SGEManagedJobFactoryService to your GT3 installation. This service provides the SGE job execution services that are used by the MMJFS.

Upgrading from a previous release

Before installing an updated version of the SGE job manager packages, you should first uninstall any existing version using the following command:

% gpt-uninstall globus_gram_job_manager_setup_sge

Build and install

To build and install the SGE jobmanager components, run the following commands in the directory into which you downloaded the packages (for example, a temporary directory), as the user who owns the Globus installation:

% gpt-build globus_gram_job_manager_setup_sge-0.11.tar.gz
% gpt-build mmjfs_sge_setup-0.0.tar.gz
% gpt-build mjs_sge_setup-0.0.tar.gz
This will build the three source components and prepare them for installation into the GT3 deployment. Once they have finished, you should run:
% gpt-postinstall
to update the live configuration of the GT3 installation.
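
If you prefer to script these steps, the sequence above can be sketched as a small sh function that stops at the first failed build. This is an illustration only: the function names run and build_sge_packages and the DRY_RUN variable are hypothetical, and the package file names are the ones shown in the commands above.

```shell
#!/bin/sh
# Hypothetical helper: execute a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Build the three SGE packages in order, then update the live
# configuration. Run this in the directory holding the downloaded files.
build_sge_packages() {
    for pkg in \
        globus_gram_job_manager_setup_sge-0.11.tar.gz \
        mmjfs_sge_setup-0.0.tar.gz \
        mjs_sge_setup-0.0.tar.gz
    do
        # Stop at the first failure so that gpt-postinstall never runs
        # against a partially built set of packages.
        run gpt-build "$pkg" || return 1
    done
    run gpt-postinstall
}
```

Setting DRY_RUN=1 before calling build_sge_packages prints the four commands without executing them, which is a cheap way to check the order before touching the GT3 installation.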

Custom configuration

As the gpt-build tool does not allow you to customise the configuration of the SGE package from the commandline, if you wish to override any of the default settings for the jobmanager you will need to run the configuration program again after installation. Run:

% ${GLOBUS_LOCATION}/setup/globus/setup-globus-job-manager-sge --help
for a list of configuration options.

Propagate configuration changes

Although the gpt tools have updated server-config.wsdd in the ${GLOBUS_LOCATION} directory, the User Hosting Environment (UHE) that the MMJFS spawns for each user will still have an out-of-date copy in that user's ${HOME}/.globus/uhe-'hostname'/ directory. To discard the old configuration, remove the uhe directory. Because the command below uses $HOME, it must be run as each affected user:

% rm -Rf $HOME/.globus/uhe-*/
You will also need to restart the GT3 server and UHEs so that the changes will take effect; they only read their configuration files on startup.
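
Since rm -Rf is unforgiving, you may want to see which UHE directories exist for a user before deleting them. The following is a minimal sketch; the function name list_uhe_dirs is hypothetical, and it assumes the ${HOME}/.globus/uhe-'hostname'/ layout described above.

```shell
#!/bin/sh
# Hypothetical helper: list the UHE directories under a given home
# directory, one per line. Prints nothing if there are none.
# Usage: list_uhe_dirs /path/to/home
list_uhe_dirs() {
    for dir in "$1"/.globus/uhe-*/; do
        # If nothing matches, the shell leaves the pattern unexpanded,
        # so check that the entry really is a directory.
        if [ -d "$dir" ]; then
            echo "$dir"
        fi
    done
    return 0
}
```

Running list_uhe_dirs "$HOME" shows exactly what the rm command above would remove for the current user.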

Testing

Once you have restarted your GT3 server with the new configuration and services, you can test it using the managed-job-globusrun tool and test job script provided by GT3. After setting ${GLOBUS_LOCATION} in your environment, source the GT3 environment configuration script:

% source ${GLOBUS_LOCATION}/etc/globus-user-env.csh
(for csh users)
$ . ${GLOBUS_LOCATION}/etc/globus-user-env.sh
(for bash users)

With that done, you can now run the test job:
% managed-job-globusrun -factory HOSTNAME:PORT -type SGE -file ${GLOBUS_LOCATION}/etc/test.xml -output
(Please substitute the hostname and port of your GT3 service for HOSTNAME and PORT above. Run managed-job-globusrun -help if you need to see the full list of commandline options.)
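
A forgotten ${GLOBUS_LOCATION} is an easy mistake to make at this step. As a sketch, the command line above can be assembled by a small sh function that refuses to continue when the variable is unset. The function name sge_test_command is hypothetical; the options are exactly those shown in the command above.

```shell
#!/bin/sh
# Hypothetical helper: print the test-job command line for a given GT3
# host and port. It prints the command rather than running it.
# Usage: sge_test_command HOSTNAME PORT
sge_test_command() {
    if [ -z "${GLOBUS_LOCATION:-}" ]; then
        echo "GLOBUS_LOCATION is not set" >&2
        return 1
    fi
    echo "managed-job-globusrun -factory $1:$2 -type SGE" \
         "-file ${GLOBUS_LOCATION}/etc/test.xml -output"
}
```

The printed line can be inspected and then pasted into the shell, or executed directly once you are satisfied that it is correct.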


This command will read in the XML-encoded RSL job specification in the file test.xml and will submit it to the MasterSGEManagedJobFactoryService on your GT3 server. This will create a new SGEManagedJobService and feed the job specification to the SGE script generator which will generate and submit a new job for execution on your SGE cluster.

The SGE MJS will poll the state of your job using qstat. Once it determines that the job has completed, it returns the standard output of the test script to the managed-job-globusrun program, which prints that output on the user's terminal.
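
The polling idea can be illustrated with a short sh loop. This is not the Perl code shipped in the job manager package, only a sketch of the general technique; the function name wait_until_gone is hypothetical, and the status command is passed in as arguments so that no particular qstat behaviour is built in.

```shell
#!/bin/sh
# Hypothetical helper: run a status command repeatedly until it fails,
# sleeping between attempts.
# Usage: wait_until_gone INTERVAL_SECONDS COMMAND [ARGS...]
wait_until_gone() {
    interval=$1
    shift
    # Keep polling for as long as the status command still succeeds.
    while "$@" >/dev/null 2>&1; do
        sleep "$interval"
    done
}
```

For example, wait_until_gone 10 qstat -j JOBID would return once the job is no longer known to SGE, on the assumption that qstat -j exits with a non-zero status for a job that has left the queue; check this on your own installation before relying on it.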

Troubleshooting

If the above test did not succeed, here are a few common error conditions that may occur, along with steps to resolve each problem.

Problem: The managed-job-globusrun command aborts with a "Read timeout" exception.

This can occur the first time that a user tries to execute a job after the GT3 server has been started; it can simply take a significant length of time -- longer than the timeout normally tolerated -- to start the MJS which will execute a user's job. Simply try running the job again.
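
If you run the test from a script, the "try again" advice can be sketched as a small retry wrapper. The function name retry and its arguments are hypothetical, and suitable attempt counts and delays are assumptions that depend on how long your server takes to start an MJS.

```shell
#!/bin/sh
# Hypothetical helper: run a command up to ATTEMPTS times, waiting DELAY
# seconds between tries. Returns 0 on the first success and 1 if every
# attempt fails.
# Usage: retry ATTEMPTS DELAY COMMAND [ARGS...]
retry() {
    retry_max=$1
    retry_delay=$2
    shift 2
    retry_n=1
    while :; do
        "$@" && return 0
        [ "$retry_n" -ge "$retry_max" ] && return 1
        retry_n=$((retry_n + 1))
        sleep "$retry_delay"
    done
}
```

For example, retry 3 30 followed by the full managed-job-globusrun command from the Testing section would make up to three attempts, 30 seconds apart.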

Problem: The managed-job-globusrun command aborts with an AXIS exception indicating it could not find the SGE MMJFS service!

This can occur either when the SGE packages have not been fully installed or when the configuration changes that they made to the GT3 installation have not been propagated to a user's hosting environment. Make sure that you have successfully run gpt-postinstall, that your ~/.globus/uhe-'hostname' directory has been removed, and that the UHE has been restarted.


For further information please contact david.mcbride@imperial.ac.uk



Comments to lesc@imperial.ac.uk. © The London e-Science Centre.
This page was last modified on Thu Apr 15 13:12:47 BST 2010