Configure dhcpd to listen on the cluster interface (eth0) in /etc/sysconfig/dhcpd:

    DHCPDARGS="eth0"
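After changing DHCPDARGS, restart the DHCP server so it rebinds to the specified interface (assuming a systemd-based system where the service is named dhcpd):

    systemctl restart dhcpd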

yum -y install libtirpc

SLURM
yum install perl-Switch.noarch (head node only)


cat /proc/sys/fs/inotify/max_user_watches

echo 100000|sudo tee /proc/sys/fs/inotify/max_user_watches

To make the increase persist across reboots, edit /etc/sysctl.conf and add (or update) the line fs.inotify.max_user_watches=100000.
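For example (the value 100000 matches the runtime setting above; it can be applied without a reboot using sysctl -p):

    # /etc/sysctl.conf
    fs.inotify.max_user_watches=100000

    sysctl -p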


* optional next steps:                                                         *
*                                                                              *
* - install additional cmu packages (ex: cmu-windows-moonshot-addon)           *
*                                                                              *
* - restore a cluster configuration with /opt/cmu/tools/restoreConfig          *
* - complete the cmu management node setup: /opt/cmu/bin/cmu_mgt_config -c     *
* - setup CMU HA (more than one mgt node): /opt/cmu/tools/cmu_ha_postinstall   *
*                                                                              *
* after setup is finished, unset audit mode and start cmu :                    *
*                                                                              *
* /etc/init.d/cmu unset_audit                                                  *
*                                                                              *
* /etc/init.d/cmu start                                                        *


To create a new backup image after customizing your golden node, you can use this command:

    /opt/cmu/bin/cmu_backup -l tesla -n tesla1

To clone the image on all the nodes, you can use this command:

    /opt/cmu/bin/cmu_clone -i tesla -n tesla[1-16]


configNodeGroup.sh

isCmuConfigured.sh

2017-01-24 18:06:12,200 [DEBUG]: [OK] configILOCredentials - iLO username and password successfully configured and tested

postSwStackComputeNode.sh completed successfully on tesla-int01
Job for slurmctld.service failed because a configured resource limit was exceeded. See "systemctl status slurmctld.service" and "journalctl -xe" for details.
[ERROR] (239) schedulerPostConfiguration.sh - error starting slurm on mcp.sci.utah.edu
[ERROR] (239) schedulerPostConfiguration.sh - The "slurm" scheduler has not been properly configured.
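If the exceeded limit turns out to be a systemd ulimit on the slurmctld service (an assumption; verify with the systemctl/journalctl output referenced above), one possible workaround is a drop-in override that raises the limits, for example:

    # /etc/systemd/system/slurmctld.service.d/limits.conf  (hypothetical drop-in)
    [Service]
    LimitNOFILE=65536
    LimitMEMLOCK=infinity

    systemctl daemon-reload
    systemctl restart slurmctld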

Increase NETBOOT_TIMEOUT in /opt/cmu/etc/cmuserver.conf.
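For example, assuming the file uses simple key=value syntax and that 600 seconds is long enough for the netboot to complete (both assumptions, not verified against the CMU documentation):

    # /opt/cmu/etc/cmuserver.conf
    NETBOOT_TIMEOUT=600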


root      6980  3454  0 10:31 ?        00:00:00 /usr/bin/tclsh /opt/cmu/tools/cmu_do_capture_image -i rack1 -d sda -n tesla-int01 -a x86_64 -b iLO
root      7102  6980  0 10:31 ?        00:00:00 /bin/bash /opt/cmu/bin/cmu_boot -n tesla-int01 -d CMU_NETBOOT
root     11998  6980  0 10:39 ?        00:00:00 /usr/bin/expect -f /opt/cmu/tools/cmu_xsh.exp -- /usr/bin/ssh tesla-int01 stdout hostname

 


/opt/hpe/hpc/slurm/default/etc/slurm.conf --> /opt/hpe/hpc/shared/slurm
(shared to all nodes as an NFS mount)

Added to slurm.conf:

SchedulerType=sched/backfill
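A minimal sketch of one way to wire this up, assuming the shared directory is already mounted on every node at /opt/hpe/hpc/shared/slurm (the move-and-symlink approach is an assumption, not taken from these notes):

    # on the head node: move the config into the NFS-shared directory
    mv /opt/hpe/hpc/slurm/default/etc/slurm.conf /opt/hpe/hpc/shared/slurm/slurm.conf

    # on every node: point the expected path at the shared copy
    ln -sf /opt/hpe/hpc/shared/slurm/slurm.conf /opt/hpe/hpc/slurm/default/etc/slurm.conf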


CST config file /opt/cluster_admin/tesla/installation_files/cst-setup-info

 


To create a new backup image after customizing your golden node, you can use this command:

    /opt/cmu/bin/cmu_backup -l cerebro -n cerebro1

To clone the image on all the nodes, you can use this command:

    /opt/cmu/bin/cmu_clone -i cerebro -n cerebro[1-10]

Head node PAM authentication

/etc/pam.d/sshd

edit the following line:

sshd:# pam_slurm.so for authorize access to compute node
sshd:account  required  pam_slurm.so

If this line is NOT commented out, users cannot ssh into the head node unless they have an active SLURM job allocated on it (pam_slurm denies access otherwise).

On the compute nodes, edit

/etc/pam.d/password-auth
account     required      pam_slurm.so

Comment this line out to allow ssh access to the node without an active SLURM job.
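For example, with the line disabled (the rest of /etc/pam.d/password-auth is system-specific and omitted here):

    # account     required      pam_slurm.so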