
Tips and guidelines

Below are some tips and guidelines for submitting Condor jobs from subMIT.

General notes

The requirements line

The "requirements" line of the job description can be used to restrict the execute nodes to be used. There is a script to generate the requirements string. Use reqgen.py {user} to print out a user-customized requirements string that can be pasted into a condor job description. There are many options to the script to fine-tune the execution targets. See the help menu (-h option) of the script for details.

Examples for writing the requirements by hand

See the notes on the individual clusters for a full list of configurations to use for each of them.
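As a purely illustrative sketch (the attribute values below are hypothetical; take the real ones from the cluster notes), a hand-written requirements line combines machine ClassAd attributes with boolean operators:

    requirements = OpSysAndVer == "CentOS7" && HAS_CVMFS_cms_cern_ch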

Migrating from CMS MIT T2

Restricting your jobs to MIT T2

While you have access to a larger computing pool than MIT T2, some of your jobs may be optimized for, or designed to run at, MIT T2. To restrict your jobs to run only at MIT, add the following to your requirements:

    requirements = GLIDEIN_Site == "MIT_CampusFactory" && BOSCOCluster == "ce03.cmsaf.mit.edu" && BOSCOGroup == "@group@" && HAS_CVMFS_cms_cern_ch

where @group@ is bosco_cms for MIT CMS HEP collaborators, bosco_cmshi for MIT CMS HIG collaborators, and bosco_lns for other LNS members.
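For context, here is a minimal sketch of a complete job description using this restriction (the executable and file names are placeholders, and @group@ is filled in with bosco_cms as an example):

    universe     = vanilla
    executable   = run_job.sh
    arguments    = $(ClusterId) $(ProcId)
    output       = job_$(ClusterId)_$(ProcId).out
    error        = job_$(ClusterId)_$(ProcId).err
    log          = job.log
    requirements = GLIDEIN_Site == "MIT_CampusFactory" && BOSCOCluster == "ce03.cmsaf.mit.edu" && BOSCOGroup == "bosco_cms" && HAS_CVMFS_cms_cern_ch
    queue 10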

Important differences between subMIT and T2

Operations no longer possible at subMIT

Copy tools and file permissions

The recommended method for writing to T2 storage is lcg-cp or gfal-copy. By reading inputs via CVMFS or xrootd and writing the output with these grid tools, your jobs are fully unbound from any particular site and can run on any resource.

To copy the job output back to T2, use the following lines at the end of your executable script (bash):

    if which gfal-copy > /dev/null 2>&1
    then
        gfal-copy file://$PWD/output_file 'srm://se01.cmsaf.mit.edu:8443/srm/v2/server?SFN=destination_full_path'
    else
        lcg-cp -v -D srmv2 -b file://$PWD/output_file 'srm://se01.cmsaf.mit.edu:8443/srm/v2/server?SFN=destination_full_path'
    fi

In a highly heterogeneous pool, it is possible that the worker node has neither of these commands installed. In that case the output is not retrievable; you may want to check the command availability at the very beginning of your script and abort the job immediately if there is no way to get the output back.
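A minimal sketch of such an early check, to be placed at the top of the executable script before any real work begins:

    # Abort immediately if neither grid copy tool exists on this worker node,
    # so no CPU time is wasted producing output that cannot be retrieved.
    if ! which gfal-copy > /dev/null 2>&1 && ! which lcg-cp > /dev/null 2>&1
    then
        echo "No grid copy tool available on this node; aborting." >&2
        exit 1
    fi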

Files written by gfal-copy or lcg-cp will be owned by a "grid user". To delete them, use gfal-rm from subMIT. This also means that the output directory must have the proper permissions. The grid copy tools will create any directories that do not exist, as long as the permission settings allow it.
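For example, to delete a previously written file from subMIT (the path after SFN= is a placeholder):

    gfal-rm 'srm://se01.cmsaf.mit.edu:8443/srm/v2/server?SFN=/full/path/to/output_file'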

Running a CMSSW job

Condor tricks

Making jobs identifiable

A standard technique to make your jobs identifiable is to pass the ClusterId and ProcId of the job to the job as command-line arguments. This is done with the arguments line of the job description:

    arguments = {other arguments} $(ClusterId) $(ProcId) {other arguments}
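Inside the executable script the two IDs then arrive as positional arguments. A minimal sketch, assuming they are the first two arguments:

    #!/bin/bash
    # ClusterId and ProcId passed in via the arguments line
    CLUSTER=$1
    PROCESS=$2
    # Use them to build a unique name for this job's output
    OUTPUT=output_${CLUSTER}_${PROCESS}.txt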

Using condor_chirp for semi-realtime logging

This is for testing and debugging only.

Condor comes with a command condor_chirp to allow communication (file read/write and more) between a running job and the submitter. You can use this mechanism to ship the stdout and stderr logs as they are being written by your executable. To do this, add a line in your job description: +WantIOProxy = true and add lines like below in your executable script whenever you want the job to report back. $(condor_config_val LIBEXEC)/condor_chirp put _condor_stdout _condor_stdout.{some_identifier_string} Replace stdout with stderr for error output. You can pass the ClusterId and ProcessId to the job as discussed above and use them as the identifier. The files are copied to the directory where condor_submit command was issued, or to initialdir if it is specified in the job description.