Job Submission Tutorial
If you are unfamiliar with HTCondor, this tutorial gives an overview of how to prepare a task and submit jobs. Below, replace {user} with your user name.
-
Copy /usr/local/share/tutorial into your work area:
cp -r /usr/local/share/tutorial /work/{user}/
-
Submit a job (we will look into the details later):
cd /work/{user}/tutorial/first_test
./submit.sh
-
Check your job status:
condor_q {user}
You will see output like this:
-- Schedd: SUBMIT.MIT.EDU : <18.77.2.251:9615?...
 ID        OWNER    SUBMITTED    RUN_TIME    ST  PRI  SIZE  CMD
 298263.0  yiiyama  11/22 12:54  0+00:00:00  I   0    0.0   first_test.sh 0
1 jobs; 0 completed, 0 removed, 1 idle, 0 running, 0 held, 0 suspended
There will be one line for each of your jobs currently in the queue. Status "I" (idle) means that the job has not yet been picked up by an execute node. When execution starts, the status changes to "R" (running). When the job completes or fails, depending on the settings used at submission time, it will either disappear from the queue or be kept with status "C" (completed) or "H" (held).
To remove a job from the queue at any time, issue the command
condor_rm {jobid}
where {jobid} can be the "job cluster" number (298263 in the example above) or the full job id (298263.0). Use the cluster number to remove all jobs that were submitted together and therefore share a cluster number. It is also possible to remove all jobs of a user: condor_rm {user}.
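To illustrate how the two forms of {jobid} relate, here is a minimal sketch that splits a full job id into its cluster and process parts using plain shell parameter expansion. The job id is the one from the example output above; the variable names are only illustrative, and the condor_rm lines are left commented out so that nothing is removed by accident.

```shell
#!/bin/sh
# Full job id as shown by condor_q: <cluster>.<process>
jobid="298263.0"

cluster="${jobid%%.*}"   # everything before the first dot
proc="${jobid#*.}"       # everything after the first dot

echo "cluster=$cluster process=$proc"

# condor_rm "$jobid"     # would remove only this one job
# condor_rm "$cluster"   # would remove every job sharing this cluster number
```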
-
Once your job completes and disappears from the queue, check the output:
cat /work/{user}/tutorial/output/first_test_output_0.txt
You should see the following:
Hello! This is file 0 opened at:
submit.mit.edu
This means that the job ran successfully on the local condor testbed.
-
Now let us take a look at the scripts. First, open submit.sh with your favorite text editor. You will find several lines of setup options, followed by some logistical lines, and, at the end of the script, the condor_submit command. This command parses the job description generated in the preceding lines and submits a job to the "pool" of execute nodes.
In the job description, you must specify an executable (the main program of the job), its command-line arguments, and the job's input and output files, if there are any. Check the script to see how they are all set up.
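For orientation, the sketch below writes a minimal job description to a file and prints it. It is not a copy of what submit.sh generates: the description file name, the input tarball name, and the output, error, and log file names are all hypothetical placeholders, and the output file name is only assumed from the path shown earlier. The submit commands used (executable, arguments, transfer_input_files, transfer_output_files, should_transfer_files, when_to_transfer_output, output, error, log, queue) are standard HTCondor ones.

```shell
#!/bin/sh
# Sketch only: write a minimal HTCondor job description to a scratch file.
workdir="$(mktemp -d)"
jobdesc="$workdir/first_test.sub"   # hypothetical file name

cat > "$jobdesc" <<'EOF'
# Main program of the job
executable = first_test.sh
# Command-line arguments passed to the executable
arguments = 0
# Input files shipped to the execute node (hypothetical tarball name)
transfer_input_files = input.tar.gz
# Output file returned to the submit host (assumed name)
transfer_output_files = first_test_output_0.txt
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
# Where stdout, stderr and the HTCondor event log go (hypothetical names)
output = first_test.out
error = first_test.err
log = first_test.log
# Submit one job
queue 1
EOF

cat "$jobdesc"
# To submit for real, you would run: condor_submit "$jobdesc"
```

Compare this with the description built in submit.sh to see which settings the tutorial actually uses.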
-
The executable script used by submit.sh is first_test.sh. Open the file with an editor. The script appends the host name of the execute node to one of the input files and renames the file so that it will be picked up by condor and returned to the submit host.
Packing together many input files into a single tarball is a common technique in HTCondor usage. Typically, the executable script acts as a "wrapper script" that unpacks the inputs, sets up the environment, executes the actual program you want to run, and takes care of the output file at the end.
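The wrapper pattern can be sketched as follows. This is not the tutorial's first_test.sh: all file and variable names are hypothetical, and the first part of the script only fakes the input tarball so that the sketch can be run on its own in a scratch directory. The "actual program" is reduced to appending the host name to one input file.

```shell
#!/bin/sh
# Sketch of a wrapper script. All file names here are hypothetical.
set -e

workdir="$(mktemp -d)"
cd "$workdir"

# Stand-in for what the submit side would prepare: a tarball of inputs
mkdir inputs
echo "Hello! This is file 0 opened at:" > inputs/file_0.txt
tar -czf input.tar.gz inputs
rm -r inputs

# From here on: the wrapper proper, as it would run on the execute node
index=0                       # a real wrapper would typically read this from "$1"

# 1. Unpack the inputs
tar -xzf input.tar.gz

# 2. Set up the environment (placeholder variable)
export MY_JOB_INDEX="$index"

# 3. Run the actual program; here, append the node name to one input file
uname -n >> "inputs/file_${index}.txt"

# 4. Take care of the output: move it to the name expected on the submit host
outfile="output_${index}.txt"
mv "inputs/file_${index}.txt" "$outfile"

cat "$outfile"
```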
-
Now go back to submit.sh and set LOCALTEST=false and NJOBS=10. Run the script again. You will see 10 jobs submitted with a shared cluster number. Wait a few minutes and check the output directory. You should see 10 output files, each possibly reporting a different execute node.
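If you prefer not to inspect the files by hand, a small helper along these lines can report which outputs have arrived. It assumes that the output files are named first_test_output_0.txt through first_test_output_9.txt in the output directory used earlier and that the last line of each file is the host name; check_outputs and OUTDIR are hypothetical names introduced only for this sketch.

```shell
#!/bin/sh
# check_outputs DIR NJOBS: list which expected output files exist in DIR.
# Sets the global variable "missing" to the number of files not found.
check_outputs() {
    dir="$1"
    njobs="$2"
    missing=0
    i=0
    while [ "$i" -lt "$njobs" ]; do
        f="$dir/first_test_output_${i}.txt"   # assumed naming pattern
        if [ -f "$f" ]; then
            # Assumed layout: the last line holds the execute node's host name
            echo "job $i ran on: $(tail -n 1 "$f")"
        else
            echo "job $i: no output yet"
            missing=$((missing + 1))
        fi
        i=$((i + 1))
    done
    echo "$missing of $njobs output files missing"
}

# Default location follows the path used earlier in this tutorial
check_outputs "${OUTDIR:-/work/${USER:-nobody}/tutorial/output}" 10
```

Set OUTDIR if your submit.sh writes its output somewhere else.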
-
That's it! The flexibility of HTCondor makes it impossible to write a one-size-fits-all job submission package, so you are encouraged to set up your own, following the example of this tutorial.