Deploying Ray Cluster for AI/ML workloads on a Kubernetes Cluster

Rangaswamy P V
13 min read · Apr 7, 2023

In this article we are going to install the Ray operator in a Kubernetes cluster. We looked at setting up a Kubernetes cluster in our previous article, “Terraform scripts to create a K8s Cluster using “kubeadm” in AWS from scratch”, and we can use the same cluster to deploy the Ray operator. Ray is an open-source unified framework for scaling AI and Python applications such as machine learning. It is the same framework that is deployed at scale at OpenAI for ChatGPT.

Note: If you followed my earlier article mentioned above to create the AWS infrastructure, the “variables.tf” file uses “t2.large” as the instance type:

variable "ins_type" {
default = "t2.large"
}

Installing Ray needs a larger instance because of the Ray cluster requirements, so I am going to use “t2.xlarge” as the default value of the “ins_type” variable. Kindly change it and then apply the Terraform scripts.

There are two files that need to be changed. The first is the modify.secret file. Rename it to secrets.tfvars and make your account-specific changes: set “accesskey” and “secretkey” to the values for your AWS account, set “ami” to the Ubuntu 22.04 AMI in your AWS account, and set “keypath” to your AWS private key in .pem format (mine is YCStartup2018.pem). The particular OpenSSL .pem format is described in Addendum1 of my earlier article. Use “sftp” to copy the private key that was downloaded when the key pair was created in the AWS account to the folder where Terraform is running. The second file is “variables.tf”, where you set “region”, “ins_type”, “key_name” (the name of the public key that was created in the AWS account) and “server_names” to suit your requirements.

If you are bringing your own cluster, you can follow along from here.

Use the following script to install the Ray Operator and the Autoscaler:

$ git clone https://github.com/rangapv/Rayapp.git rayapp
$ cd rayapp
$ ls
README.md rayinstall.sh script.py rayauto.yaml


$ ./rayinstall.sh
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
Printing Ray-Cluster Status
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
NAME AGE
raycluster-autoscaler 4h28m
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
Printing Ray-Pod Status
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
NAME READY STATUS RESTARTS AGE
kuberay-operator-6789c45846-6bdbm 1/1 Running 1 (140m ago) 4h28m
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
Printing Ray-AutoScaler Status
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
NAME READY STATUS RESTARTS AGE
raycluster-autoscaler-head-c6dff 2/2 Running 3 (139m ago) 4h25m
raycluster-autoscaler-worker-small-group-24m2g 1/1 Running 1 (140m ago) 4h25m

Ray supports both Python and Java workloads; in this article we will deal with Python only.

First make sure you have a working Python runtime installed along with pip. You can use the build-Python-from-source script from my GitHub page, as shown below:

$ git clone https://github.com/rangapv/ansible-install.git
$ cd ansible-install
$ ls
README.md ............ openssl-1.0.2o p2.sh py.sh test.sh
$ ./py.sh
...
...
...

ubuntu@ip-172-1-137-152:~$ python -V
Python 3.10.12

ubuntu@ip-172-1-137-152:~$ pip -V
WARNING: pip is being invoked by an old script wrapper. This will fail in a future version of pip.
Please see https://github.com/pypa/pip/issues/5599 for advice on fixing the underlying issue.
To avoid this problem you can invoke Python with '-m pip' instead of running pip directly.
pip 23.2.1 from /usr/local/lib/python3.10/dist-packages/pip (python 3.10)

ubuntu@ip-172-1-137-152:~$ python3 -V
Python 3.10.12

ubuntu@ip-172-1-137-152:~$ pip3 -V
pip 23.2.1 from /usr/local/lib/python3.10/dist-packages/pip (python 3.10)

The script “py.sh” builds python3 from source and installs all the necessary packages, taking care of all the dependencies. If you already have a working Python runtime you can SKIP this step!

The essential piece for running your AI/ML workloads is Ray Core, which turns your Python functions into Ray tasks and your Python classes into Ray actors, and stores the values they produce as Ray objects. You can read more about the Ray Core concepts at https://docs.ray.io/en/latest/ray-core/key-concepts.html

Install Ray with pip:

$ pip -V
pip 23.0.1 from /usr/local/lib/python3.10/dist-packages/pip (python 3.10)
ubuntu@ip-172-1-89-12:~$ sudo pip install -U "ray[default]"
Defaulting to user installation because normal site-packages is not writeable
Collecting ray[default]
Downloading ray-2.3.1-cp310-cp310-manylinux2014_x86_64.whl (58.5 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 58.5/58.5 MB 23.9 MB/s eta 0:00:00
Requirement already satisfied: jsonschema in /usr/lib/python3/dist-packages (from ray[default]) (3.2.0)
Collecting protobuf!=3.19.5,>=3.15.3
Downloading protobuf-4.22.1-cp37-abi3-manylinux2014_x86_64.whl (302 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 302.4/302.4 kB 37.1 MB/s eta 0:00:00
Requirement already satisfied: pyyaml in /usr/lib/python3/dist-packages (from ray[default]) (5.4.1)
Collecting frozenlist
Downloading frozenlist-1.3.3-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (149 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 149.6/149.6 kB 22.7 MB/s eta 0:00:00
Collecting filelock
Downloading filelock-3.10.7-py3-none-any.whl (10 kB)
Requirement already satisfied: attrs in /usr/lib/python3/dist-packages (from ray[default]) (21.2.0)
Collecting aiosignal
Downloading aiosignal-1.3.1-py3-none-any.whl (7.6 kB)
Requirement already satisfied: click>=7.0 in /usr/lib/python3/dist-packages (from ray[default]) (8.0.3)
Requirement already satisfied: requests in /usr/lib/python3/dist-packages (from ray[default]) (2.25.1)
Collecting grpcio>=1.42.0
Downloading grpcio-1.53.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (5.0 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 5.0/5.0 MB 53.9 MB/s eta 0:00:00
Collecting msgpack<2.0.0,>=1.0.0
Downloading msgpack-1.0.5-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (316 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 316.8/316.8 kB 32.7 MB/s eta 0:00:00
Collecting packaging
Downloading packaging-23.0-py3-none-any.whl (42 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 42.7/42.7 kB 6.1 MB/s eta 0:00:00
Collecting numpy>=1.19.3
Downloading numpy-1.24.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (17.3 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 17.3/17.3 MB 52.4 MB/s eta 0:00:00
Collecting virtualenv>=20.0.24
Downloading virtualenv-20.21.0-py3-none-any.whl (8.7 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 8.7/8.7 MB 66.3 MB/s eta 0:00:00
Collecting opencensus
Downloading opencensus-0.11.2-py2.py3-none-any.whl (128 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 128.2/128.2 kB 21.6 MB/s eta 0:00:00
Collecting prometheus-client>=0.7.1
Downloading prometheus_client-0.16.0-py3-none-any.whl (122 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 122.5/122.5 kB 20.0 MB/s eta 0:00:00
Collecting aiohttp-cors
Downloading aiohttp_cors-0.7.0-py3-none-any.whl (27 kB)
Collecting pydantic
Downloading pydantic-1.10.7-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.1 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3.1/3.1 MB 70.3 MB/s eta 0:00:00
Collecting colorful
Downloading colorful-0.5.5-py2.py3-none-any.whl (201 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 201.4/201.4 kB 30.3 MB/s eta 0:00:00
Collecting py-spy>=0.2.0
Downloading py_spy-0.3.14-py2.py3-none-manylinux_2_5_x86_64.manylinux1_x86_64.whl (3.0 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3.0/3.0 MB 67.7 MB/s eta 0:00:00
Collecting smart-open
Downloading smart_open-6.3.0-py3-none-any.whl (56 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 56.8/56.8 kB 10.7 MB/s eta 0:00:00
Collecting gpustat>=1.0.0
Downloading gpustat-1.0.0.tar.gz (90 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 90.5/90.5 kB 15.8 MB/s eta 0:00:00
Preparing metadata (setup.py) ... done
Collecting aiohttp>=3.7
Downloading aiohttp-3.8.4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.0 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.0/1.0 MB 59.8 MB/s eta 0:00:00
Collecting async-timeout<5.0,>=4.0.0a3
Downloading async_timeout-4.0.2-py3-none-any.whl (5.8 kB)
Collecting multidict<7.0,>=4.5
Downloading multidict-6.0.4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (114 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 114.5/114.5 kB 17.8 MB/s eta 0:00:00
Collecting charset-normalizer<4.0,>=2.0
Downloading charset_normalizer-3.1.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (199 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 199.3/199.3 kB 24.5 MB/s eta 0:00:00
Collecting yarl<2.0,>=1.0
Downloading yarl-1.8.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (264 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 264.0/264.0 kB 33.7 MB/s eta 0:00:00
Collecting blessed>=1.17.1
Downloading blessed-1.20.0-py2.py3-none-any.whl (58 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 58.4/58.4 kB 9.0 MB/s eta 0:00:00
Collecting nvidia-ml-py<=11.495.46,>=11.450.129
Downloading nvidia_ml_py-11.495.46-py3-none-any.whl (25 kB)
Collecting psutil>=5.6.0
Downloading psutil-5.9.4-cp36-abi3-manylinux_2_12_x86_64.manylinux2010_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (280 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 280.2/280.2 kB 32.0 MB/s eta 0:00:00
Requirement already satisfied: six>=1.7 in /usr/lib/python3/dist-packages (from gpustat>=1.0.0->ray[default]) (1.16.0)
Collecting distlib<1,>=0.3.6
Downloading distlib-0.3.6-py2.py3-none-any.whl (468 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 468.5/468.5 kB 46.6 MB/s eta 0:00:00
Collecting platformdirs<4,>=2.4
Downloading platformdirs-3.2.0-py3-none-any.whl (14 kB)
Collecting google-api-core<3.0.0,>=1.0.0
Downloading google_api_core-2.11.0-py3-none-any.whl (120 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 120.3/120.3 kB 19.4 MB/s eta 0:00:00
Collecting opencensus-context>=0.1.3
Downloading opencensus_context-0.1.3-py2.py3-none-any.whl (5.1 kB)
Collecting typing-extensions>=4.2.0
Downloading typing_extensions-4.5.0-py3-none-any.whl (27 kB)
Collecting wcwidth>=0.1.4
Downloading wcwidth-0.2.6-py2.py3-none-any.whl (29 kB)
Collecting googleapis-common-protos<2.0dev,>=1.56.2
Downloading googleapis_common_protos-1.59.0-py2.py3-none-any.whl (223 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 223.6/223.6 kB 26.6 MB/s eta 0:00:00
Collecting google-auth<3.0dev,>=2.14.1
Downloading google_auth-2.17.1-py2.py3-none-any.whl (178 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 178.1/178.1 kB 22.3 MB/s eta 0:00:00
Requirement already satisfied: idna>=2.0 in /usr/lib/python3/dist-packages (from yarl<2.0,>=1.0->aiohttp>=3.7->ray[default]) (3.3)
Requirement already satisfied: rsa<5,>=3.1.4 in /usr/lib/python3/dist-packages (from google-auth<3.0dev,>=2.14.1->google-api-core<3.0.0,>=1.0.0->opencensus->ray[default]) (4.8)
Collecting cachetools<6.0,>=2.0.0
Downloading cachetools-5.3.0-py3-none-any.whl (9.3 kB)
Requirement already satisfied: pyasn1-modules>=0.2.1 in /usr/lib/python3/dist-packages (from google-auth<3.0dev,>=2.14.1->google-api-core<3.0.0,>=1.0.0->opencensus->ray[default]) (0.2.1)
Building wheels for collected packages: gpustat
Building wheel for gpustat (setup.py) ... done
Created wheel for gpustat: filename=gpustat-1.0.0-py3-none-any.whl size=19888 sha256=a9b8107e4bb661bed2839074a3631f9c85d115a4ebc0c6319a7e2adaa64a312d
Stored in directory: /home/ubuntu/.cache/pip/wheels/d2/48/27/33e31726d2001b997a11c23a7c76f7a48d8d96851f14ef0cd2
Successfully built gpustat
Installing collected packages: wcwidth, py-spy, opencensus-context, nvidia-ml-py, msgpack, distlib, colorful, typing-extensions, smart-open, psutil, protobuf, prometheus-client, platformdirs, packaging, numpy, multidict, grpcio, frozenlist, filelock, charset-normalizer, cachetools, blessed, async-timeout, yarl, virtualenv, pydantic, gpustat, googleapis-common-protos, google-auth, aiosignal, ray, google-api-core, aiohttp, opencensus, aiohttp-cors
WARNING: The scripts f2py, f2py3 and f2py3.10 are installed in '/home/ubuntu/.local/bin' which is not on PATH.
Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
WARNING: The script normalizer is installed in '/home/ubuntu/.local/bin' which is not on PATH.
Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
WARNING: The script virtualenv is installed in '/home/ubuntu/.local/bin' which is not on PATH.
Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
WARNING: The script gpustat is installed in '/home/ubuntu/.local/bin' which is not on PATH.
Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
WARNING: The scripts ray, rllib, serve and tune are installed in '/home/ubuntu/.local/bin' which is not on PATH.
Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
Successfully installed aiohttp-3.8.4 aiohttp-cors-0.7.0 aiosignal-1.3.1 async-timeout-4.0.2 blessed-1.20.0 cachetools-5.3.0 charset-normalizer-3.1.0 colorful-0.5.5 distlib-0.3.6 filelock-3.10.7 frozenlist-1.3.3 google-api-core-2.11.0 google-auth-2.17.1 googleapis-common-protos-1.59.0 gpustat-1.0.0 grpcio-1.53.0 msgpack-1.0.5 multidict-6.0.4 numpy-1.24.2 nvidia-ml-py-11.495.46 opencensus-0.11.2 opencensus-context-0.1.3 packaging-23.0 platformdirs-3.2.0 prometheus-client-0.16.0 protobuf-4.22.1 psutil-5.9.4 py-spy-0.3.14 pydantic-1.10.7 ray-2.3.1 smart-open-6.3.0 typing-extensions-4.5.0 virtualenv-20.21.0 wcwidth-0.2.6 yarl-1.8.2

Check that the “ray” command is available before starting the Ray head:

ubuntu@ip-172-1-182-92:~$ which ray
/usr/local/bin/ray

ubuntu@ip-172-1-182-92:~$ ray --version
ray, version 2.6.3

Now let us check the Kubernetes cluster for all the nodes, pods and services. This is the working setup:

$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
ip-172-1-237-177.us-west-2.compute.internal Ready <none> 18h v1.26.3
ip-172-1-89-12.us-west-2.compute.internal Ready control-plane 18h v1.26.3
$ kubectl get po --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
default raycluster-autoscaler-head-6hdgb 2/2 Running 3 (92s ago) 18h
default raycluster-autoscaler-worker-small-group-fstn9 1/1 Running 1 (2m19s ago) 18h
kube-system cloud-controller-manager-lfx4w 1/1 Running 1 (2m8s ago) 18h
kube-system coredns-787d4945fb-2z8qz 1/1 Running 1 (2m8s ago) 18h
kube-system coredns-787d4945fb-wxzg4 1/1 Running 1 (2m8s ago) 18h
kube-system etcd-ip-172-1-89-12.us-west-2.compute.internal 1/1 Running 1 (2m8s ago) 18h
kube-system kube-apiserver-ip-172-1-89-12.us-west-2.compute.internal 1/1 Running 1 (2m8s ago) 18h
kube-system kube-controller-manager-ip-172-1-89-12.us-west-2.compute.internal 1/1 Running 1 (2m8s ago) 18h
kube-system kube-flannel-ds-gdzvq 1/1 Running 1 (2m19s ago) 18h
kube-system kube-flannel-ds-xwmfk 1/1 Running 1 (2m8s ago) 18h
kube-system kube-proxy-66p2x 1/1 Running 1 (2m8s ago) 18h
kube-system kube-proxy-p69ml 1/1 Running 1 (2m19s ago) 18h
kube-system kube-scheduler-ip-172-1-89-12.us-west-2.compute.internal 1/1 Running 1 (2m8s ago) 18h
kubernetes-dashboard dashboard-metrics-scraper-7bc864c59-hkhfj 1/1 Running 1 (2m19s ago) 18h
kubernetes-dashboard kubernetes-dashboard-6c7ccbcf87-rwbjw 1/1 Running 1 (2m19s ago) 18h
ray-system kuberay-operator-86f7c988d4-sqt27 1/1 Running 1 (2m19s ago) 18h
ubuntu@ip-172-1-89-12:~$ kubectl get svc --all-namespaces
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
default kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 18h
default raycluster-autoscaler-head-svc ClusterIP 10.103.255.21 <none> 10001/TCP,6379/TCP,8265/TCP 18h
kube-system kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP,9153/TCP 18h
kubernetes-dashboard dashboard-metrics-scraper ClusterIP 10.103.156.129 <none> 8000/TCP 18h
kubernetes-dashboard kubernetes-dashboard NodePort 10.111.67.154 <none> 443:30002/TCP 18h
ray-system kuberay-operator ClusterIP 10.99.83.230 <none> 8080/TCP 18h

We can see that the “kuberay-operator” is running in the “ray-system” namespace, and that the autoscaler head service (“raycluster-autoscaler-head-svc”) is up as well.
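
If you would rather check this from a script than by reading the tables, the sketch below flags pods that are not fully ready. It is only an illustration: the helper names `unready_pods` and `fetch_pods` are my own, `fetch_pods` assumes `kubectl` is on the PATH with access to the cluster, and the `sample` data is hand-written rather than real cluster output.

```python
import json
import subprocess


def unready_pods(pod_list):
    """Return names of pods that are not Running with every container ready."""
    bad = []
    for pod in pod_list.get("items", []):
        status = pod.get("status", {})
        containers = status.get("containerStatuses", [])
        all_ready = bool(containers) and all(c.get("ready") for c in containers)
        if status.get("phase") != "Running" or not all_ready:
            bad.append(pod["metadata"]["name"])
    return bad


def fetch_pods(namespace):
    # Assumes kubectl is installed and the kubeconfig points at the cluster.
    result = subprocess.run(
        ["kubectl", "get", "pods", "-n", namespace, "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)


# Hand-written sample in the shape of "kubectl get pods -o json" output.
sample = {
    "items": [
        {"metadata": {"name": "head"},
         "status": {"phase": "Running",
                    "containerStatuses": [{"ready": True}, {"ready": True}]}},
        {"metadata": {"name": "worker"},
         "status": {"phase": "Running",
                    "containerStatuses": [{"ready": False}]}},
    ]
}
print(unready_pods(sample))
```

Against a live cluster you would call `unready_pods(fetch_pods("default"))` and `unready_pods(fetch_pods("ray-system"))`; an empty list means every pod in that namespace is Running and ready.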

Let us connect to the Ray cluster for testing purposes.

$ kubectl get pods --selector=ray.io/cluster=raycluster-autoscaler
NAME READY STATUS RESTARTS AGE
raycluster-autoscaler-head-6hdgb 2/2 Running 6 (14m ago) 4d18h
raycluster-autoscaler-worker-small-group-fstn9 1/1 Running 2 (14m ago) 4d18h

ubuntu@ip-172-1-89-12:~$ kubectl get pods --selector=ray.io/cluster=raycluster-autoscaler,ray.io/node-type=head -o custom-columns=POD:metadata.name --no-headers
raycluster-autoscaler-head-6hdgb
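
kubectl treats a comma-separated selector as a logical AND of the label requirements, so the cluster label and the head label can go into a single `--selector` value. The sketch below builds that command from Python. The function names are hypothetical, and `find_head_pod` assumes `kubectl` is on the PATH with access to the cluster.

```python
import subprocess


def label_selector(labels):
    """Join label pairs into one comma-separated selector (logical AND)."""
    return ",".join(f"{key}={value}" for key, value in labels.items())


def head_pod_command(cluster_name):
    # Labels as used in this article: ray.io/cluster and ray.io/node-type.
    selector = label_selector({
        "ray.io/cluster": cluster_name,
        "ray.io/node-type": "head",
    })
    return [
        "kubectl", "get", "pods",
        f"--selector={selector}",
        "-o", "custom-columns=POD:metadata.name",
        "--no-headers",
    ]


def find_head_pod(cluster_name):
    # Runs kubectl and returns the first pod name, or None if nothing matched.
    result = subprocess.run(
        head_pod_command(cluster_name),
        check=True, capture_output=True, text=True,
    )
    names = result.stdout.split()
    return names[0] if names else None


# Show the command that would be run, without touching the cluster.
print(" ".join(head_pod_command("raycluster-autoscaler")))
```

Calling `find_head_pod("raycluster-autoscaler")` on the control-plane node would return the head pod name, which can then be passed to `kubectl exec`.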

ubuntu@ip-172-1-89-12:~$ kubectl exec raycluster-autoscaler-head-6hdgb -it -c ray-head -- python -c "import ray; ray.init()"
2023-04-02 23:42:40,992 INFO worker.py:1243 -- Using address 127.0.0.1:6379 set in the environment variable RAY_ADDRESS
2023-04-02 23:42:40,993 INFO worker.py:1364 -- Connecting to existing Ray cluster at address: 10.244.1.16:6379...
2023-04-02 23:42:41,000 INFO worker.py:1550 -- Connected to Ray cluster. View the dashboard at http://10.244.1.16:8265

Let us now deploy a Ray workload. For this we need to start the Ray process on the node:

$ ray start --head
Enable usage stats collection? This prompt will auto-proceed in 10 seconds to avoid blocking cluster startup. Confirm [Y/n]: Y
Usage stats collection is enabled. To disable this, add `--disable-usage-stats` to the command that starts the cluster, or run the following command: `ray disable-usage-stats` before starting the cluster. See https://docs.ray.io/en/master/cluster/usage-stats.html for more details.

Local node IP: 172.1.89.12

--------------------
Ray runtime started.
--------------------

Next steps
To connect to this Ray runtime from another node, run
ray start --address='172.1.89.12:6379'

Alternatively, use the following Python code:
import ray
ray.init(address='auto')

To see the status of the cluster, use
ray status
To monitor and debug Ray, view the dashboard at
127.0.0.1:8265

If connection fails, check your firewall settings and network configuration.

To terminate the Ray runtime, run
ray stop

Submit jobs: To test the setup let us run a demo script in Python.

$ cd rayapp ; ls
README.md rayinstall.sh script.py

$ vi script.py
# script.py
import ray

@ray.remote
def hello_world():
    return "hello world"

# Automatically connect to the running Ray cluster.
ray.init()
print(ray.get(hello_world.remote()))
$

We are going to submit the above Python job in this demo. Set the environment variable RAY_ADDRESS to the address of the Ray dashboard (http://IP:PORT, port 8265 by default) and submit the job as shown below.

ubuntu@ip-172-1-89-12:~$ export RAY_ADDRESS="http://127.0.0.1:8265"
ubuntu@ip-172-1-89-12:~$ ray job submit --working-dir ./ -- python script.py
Job submission server address: http://127.0.0.1:8265
2023-04-06 08:37:47,302 INFO dashboard_sdk.py:315 -- Uploading package gcs://_ray_pkg_32efc9ab97ab3947.zip.
2023-04-06 08:37:47,302 INFO packaging.py:503 -- Creating a file package for local directory './'.

-------------------------------------------------------
Job 'raysubmit_cuZjbMFdDHYDpnLm' submitted successfully
-------------------------------------------------------

Next steps
Query the logs of the job:
ray job logs raysubmit_cuZjbMFdDHYDpnLm
Query the status of the job:
ray job status raysubmit_cuZjbMFdDHYDpnLm
Request the job to be stopped:
ray job stop raysubmit_cuZjbMFdDHYDpnLm

Tailing logs until the job exits (disable with --no-wait):
2023-04-06 08:37:48,986 INFO worker.py:1242 -- Using address 172.1.182.92:6379 set in the environment variable RAY_ADDRESS
2023-04-06 08:37:48,986 INFO worker.py:1364 -- Connecting to existing Ray cluster at address: 172.1.182.92:6379...
2023-04-06 08:37:48,994 INFO worker.py:1544 -- Connected to Ray cluster. View the dashboard at http://127.0.0.1:8265
hello world

------------------------------------------
Job 'raysubmit_cuZjbMFdDHYDpnLm' succeeded
------------------------------------------
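
The same submission can also be done from Python with Ray's job submission SDK instead of the `ray job submit` CLI. This is a rough sketch, not part of the Rayapp repo. It assumes `ray[default]` is installed, the dashboard is reachable at http://127.0.0.1:8265, and `script.py` is in the current directory. The helper names `wait_for_job` and `submit_script` are my own.

```python
import time

# Job states after which the status will not change any more.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "STOPPED"}


def wait_for_job(get_status, timeout_s=120.0, poll_s=2.0):
    """Poll get_status() until the job reaches a terminal state."""
    deadline = time.monotonic() + timeout_s
    while True:
        raw = get_status()
        # JobStatus is a string-valued enum; reduce it to a plain string.
        status = str(getattr(raw, "value", raw))
        if status in TERMINAL_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {status} after {timeout_s}s")
        time.sleep(poll_s)


def submit_script(address="http://127.0.0.1:8265"):
    # Imported here so the polling helper above works without Ray installed.
    from ray.job_submission import JobSubmissionClient

    client = JobSubmissionClient(address)
    # Equivalent of: ray job submit --working-dir ./ -- python script.py
    job_id = client.submit_job(
        entrypoint="python script.py",
        runtime_env={"working_dir": "./"},
    )
    final = wait_for_job(lambda: client.get_job_status(job_id))
    print(client.get_job_logs(job_id))
    return job_id, final
```

Calling `submit_script()` from the rayapp directory should upload the working directory, run the script on the cluster, print its logs and return the submission ID together with the final status.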

Ray Dashboard: This is running on the node where the Operator was installed. Let us bring up a browser to check the status. To install Chrome on your box:

$ git clone https://github.com/rangapv/ichrome.git browser
$ cd browser
$ ls
ichrome.sh
$ ./ichrome.sh

This will install chrome and you can open the browser by typing $HOME/chrome

Let us navigate to http://127.0.0.1:8265. This will bring up the Ray Dashboard, where you can check the submitted jobs and their status.

To stop the Ray process:

$ ray stop
Stopped all 7 Ray processes.

If you have any issues, you can open an issue on the GitHub repo page, contact me at rangapv@yahoo.com, or find me on Twitter @rangapv.

Rangaswamy P V

Works on Devops in Startups, reach me @rangapv on twitter or email: rangapv@gmail.com