
Tips and guidelines

Below are some tips and guidelines for submitting Condor jobs from subMIT.

General notes

The requirements line

The "requirements" line of the job description can be used to restrict the execute nodes to be used. There is a script to generate the requirements string. Use reqgen.py {user} to print out a user-customized requirements string that can be pasted into a condor job description. There are many options to the script to fine-tune the execution targets. See the help menu (-h option) of the script for details.

Examples for writing the requirements by hand

See the notes on the individual clusters for a full list of configurations to use for each of them.
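As a purely illustrative sketch (the attribute values below are hypothetical; take the real ones from the cluster notes), a hand-written requirements line combines machine ClassAd attributes with boolean operators:

    requirements = OpSysAndVer == "CentOS7" && HAS_CVMFS_cms_cern_ch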

Migrating from CMS MIT T2

Restricting your jobs to MIT T2

While you have access to a larger computing pool than MIT T2, some of your jobs may be optimized for, or designed to run at, MIT T2. To restrict your jobs to run only at MIT, add the following to your requirements:

    requirements = GLIDEIN_Site == "MIT_CampusFactory" && BOSCOCluster == "ce03.cmsaf.mit.edu" && BOSCOGroup == "@group@" && HAS_CVMFS_cms_cern_ch

where @group@ is bosco_cms for MIT CMS HEP collaborators, bosco_cmshi for MIT CMS HIG collaborators, and bosco_lns for other LNS members.
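For context, here is a minimal sketch of a complete job description using this restriction (the executable and file names are placeholders, and @group@ is filled in with bosco_cms as an example):

    universe     = vanilla
    executable   = run_job.sh
    arguments    = $(ClusterId) $(ProcId)
    output       = job_$(ClusterId)_$(ProcId).out
    error        = job_$(ClusterId)_$(ProcId).err
    log          = job.log
    requirements = GLIDEIN_Site == "MIT_CampusFactory" && BOSCOCluster == "ce03.cmsaf.mit.edu" && BOSCOGroup == "bosco_cms" && HAS_CVMFS_cms_cern_ch
    queue 10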

Important differences between subMIT and T2

Operations no longer possible at subMIT

Copy tools and file permissions

The recommended method for writing to T2 storage is lcg-cp or gfal-copy. By reading inputs via CVMFS or xrootd and writing the output with these grid tools, your jobs are fully unbound from any particular site and can run on any resource.

To copy the job output back to T2, use the following lines at the end of your executable script (bash):

    if which gfal-copy > /dev/null 2>&1
    then
        gfal-copy file://$PWD/output_file 'srm://se01.cmsaf.mit.edu:8443/srm/v2/server?SFN=destination_full_path'
    else
        lcg-cp -v -D srmv2 -b file://$PWD/output_file 'srm://se01.cmsaf.mit.edu:8443/srm/v2/server?SFN=destination_full_path'
    fi

In a highly heterogeneous pool, it is possible that the worker node has neither of these commands installed. In that case the output is not retrievable; you may want to check the command availability at the very beginning of your script and abort the job immediately if there is no way to get the output back.
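A minimal sketch of such an early check, to be placed at the top of the executable script before any real work begins:

    # Abort immediately if neither grid copy tool exists on this worker node,
    # so no CPU time is wasted producing output that cannot be retrieved.
    if ! which gfal-copy > /dev/null 2>&1 && ! which lcg-cp > /dev/null 2>&1
    then
        echo "No grid copy tool available on this node; aborting." >&2
        exit 1
    fi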

Files written by gfal-copy or lcg-cp will be owned by a "grid user". To delete them, use gfal-rm from subMIT. This also means that the output directory must have the proper permissions. The grid copy tools will create any directories that do not exist, as long as the permission settings allow it.
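For example, to delete a previously written file from subMIT (the path after SFN= is a placeholder):

    gfal-rm 'srm://se01.cmsaf.mit.edu:8443/srm/v2/server?SFN=/full/path/to/output_file'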

Running a CMSSW job

Condor tricks

Making jobs identifiable

A standard technique to make your jobs identifiable is to pass the ClusterId and ProcId of the job to the job as command-line arguments. This is done with the arguments line of the job description:

    arguments = {other arguments} $(ClusterId) $(ProcId) {other arguments}
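Inside the executable script the two IDs then arrive as positional arguments. A minimal sketch, assuming they are the first two arguments:

    #!/bin/bash
    # ClusterId and ProcId passed in via the arguments line
    CLUSTER=$1
    PROCESS=$2
    # Use them to build a unique name for this job's output
    OUTPUT=output_${CLUSTER}_${PROCESS}.txt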

Using condor_chirp for semi-realtime logging

This is for testing and debugging only.

Condor comes with a command condor_chirp to allow communication (file read/write and more) between a running job and the submitter. You can use this mechanism to ship the stdout and stderr logs as they are being written by your executable. To do this, add a line in your job description: +WantIOProxy = true and add lines like below in your executable script whenever you want the job to report back. $(condor_config_val LIBEXEC)/condor_chirp put _condor_stdout _condor_stdout.{some_identifier_string} Replace stdout with stderr for error output. You can pass the ClusterId and ProcessId to the job as discussed above and use them as the identifier. The files are copied to the directory where condor_submit command was issued, or to initialdir if it is specified in the job description.