Bind dhcpd to the cluster interface (eth0) in /etc/sysconfig/dhcpd:
DHCPDARGS="eth0"
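Then restart the DHCP service so it picks up the new arguments (a minimal step, assuming a systemd-based head node; adjust to your init system):
systemctl restart dhcpd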
yum -y install libtirpc
SLURM
yum install perl-Switch.noarch (headnode only)
cat /proc/sys/fs/inotify/max_user_watches
echo 100000|sudo tee /proc/sys/fs/inotify/max_user_watches
To make max_user_watches persistent across reboots, edit /etc/sysctl.conf and add fs.inotify.max_user_watches=100000 (or change an existing fs.inotify.max_user_watches= entry to 100000).
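For example, append the setting and re-read /etc/sysctl.conf without rebooting:
echo "fs.inotify.max_user_watches=100000" | sudo tee -a /etc/sysctl.conf
sudo sysctl -p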
optional next steps:
 - install additional cmu packages (ex: cmu-windows-moonshot-addon)
 - restore a cluster configuration with /opt/cmu/tools/restoreConfig
 - complete the cmu management node setup: /opt/cmu/bin/cmu_mgt_config -c
 - setup CMU HA (more than one mgt node): /opt/cmu/tools/cmu_ha_postinstall

after setup is finished, unset audit mode and start cmu:
/etc/init.d/cmu unset_audit
/etc/init.d/cmu start
To create a new backup image after customizing your golden node, you can use this command:
/opt/cmu/bin/cmu_backup -l tesla -n tesla1
To clone the image on all the nodes, you can use this command:
/opt/cmu/bin/cmu_clone -i tesla -n tesla[1-16]
configNodeGroup.sh
isCmuConfigured.sh
2017-01-24 18:06:12,200 [DEBUG]: [OK] configILOCredentials - iLO username and password successfully configured and tested
postSwStackComputeNode.sh completed successfully on tesla-int01
Job for slurmctld.service failed because a configured resource limit was exceeded. See "systemctl status slurmctld.service" and "journalctl -xe" for details.
[ERROR] (239) schedulerPostConfiguration.sh - error starting slurm on mcp.sci.utah.edu
[ERROR] (239) schedulerPostConfiguration.sh - The "slurm" scheduler has not been properly configured.
Increase NETBOOT_TIMEOUT in /opt/cmu/etc/cmuserver.conf.
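A hedged sketch of that edit (the value is only illustrative, and the KEY=value form is an assumption about how cmuserver.conf is written):
NETBOOT_TIMEOUT=600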
root 6980 3454 0 10:31 ? 00:00:00 /usr/bin/tclsh /opt/cmu/tools/cmu_do_capture_image -i rack1 -d sda -n tesla-int01 -a x86_64 -b iLO
root 7102 6980 0 10:31 ? 00:00:00 /bin/bash /opt/cmu/bin/cmu_boot -n tesla-int01 -d CMU_NETBOOT
root 11998 6980 0 10:39 ? 00:00:00 /usr/bin/expect -f /opt/cmu/tools/cmu_xsh.exp -- /usr/bin/ssh tesla-int01 stdout hostname
/opt/hpe/hpc/slurm/default/etc/slurm.conf --> /opt/hpe/hpc/shared/slurm
slurm.conf is kept in the shared directory, which is exported to all nodes as an NFS mount.
Added to slurm.conf:
SchedulerType=sched/backfill
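A sketch of how the relocation could be done, assuming the arrow above means the file was moved into the NFS-shared directory and symlinked back to its default path (the slurm.conf filename inside the shared directory is an assumption):
mv /opt/hpe/hpc/slurm/default/etc/slurm.conf /opt/hpe/hpc/shared/slurm/slurm.conf
ln -s /opt/hpe/hpc/shared/slurm/slurm.conf /opt/hpe/hpc/slurm/default/etc/slurm.conf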
CST config file: /opt/cluster_admin/tesla/installation_files/cst-setup-info
To create a new backup image after customizing your golden node, you can use this command:
/opt/cmu/bin/cmu_backup -l cerebro -n cerebro1
To clone the image on all the nodes, you can use this command:
/opt/cmu/bin/cmu_clone -i cerebro -n cerebro[1-10]
Head node PAM authentication
Edit /etc/pam.d/sshd. The relevant lines are:
# pam_slurm.so for authorizing access to compute nodes
account required pam_slurm.so
On the head node this line must be commented out; if it is NOT commented out, users can not ssh in to the head node, because pam_slurm only admits users with a job running on that node.
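To check which PAM service files reference pam_slurm on a given node:
grep pam_slurm /etc/pam.d/*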
On the compute nodes, edit /etc/pam.d/password-auth:
account required pam_slurm.so
Leave this line active to restrict ssh to users with a running job on the node; comment it out to allow open ssh access to the node.
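A quick way to verify the restriction from the head node (the node name cerebro1 is only an example):
ssh cerebro1 hostname # denied while you have no job running on cerebro1
srun -w cerebro1 hostname # goes through slurm, so it is allowed
While an interactive job started with something like 'srun -w cerebro1 --pty bash' is running, direct ssh to cerebro1 works again.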