Common Problems

How to diagnose and resolve common problems in Seldon Deploy.

Insufficient ephemeral storage in EKS clusters

When using eksctl, the volume size for each node defaults to 20 GB. With large images, this may not be enough. This is discussed at length in this thread in the eksctl repository.

When this happens, pods usually start to get evicted. If you run kubectl describe on any of these pods, you should see errors about insufficient ephemeral storage. You should also see DiskPressure events in the output of kubectl describe nodes.

To fix it, increasing the available disk space should be enough. With eksctl, you can do so by editing the nodeGroups config and adding the volumeSize and volumeType keys. For instance, to change the volume size to 100 GB, you could add the following to your ClusterConfig spec:

kind: ClusterConfig

nodeGroups:
  - volumeSize: 100
    volumeType: gp2
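
For context, here is a sketch of how these keys might sit inside a complete eksctl config file. The cluster name, region, node group name, instance type and node count are hypothetical placeholders and do not come from this guide; only volumeSize and volumeType are the settings discussed above.

```yaml
# Hypothetical eksctl config: name, region, node group name,
# instance type and capacity are placeholders to adapt.
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: my-cluster          # placeholder cluster name
  region: eu-west-1         # placeholder region

nodeGroups:
  - name: ng-large-disk     # placeholder node group name
    instanceType: m5.xlarge # placeholder instance type
    desiredCapacity: 3      # placeholder node count
    volumeSize: 100         # volume size per node, in GB
    volumeType: gp2
```

Note that the new volume size will most likely only apply to nodes created after the change, so you may need to create a new node group from this config and move workloads onto it rather than expect existing nodes to be resized.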

Last modified April 22, 2020: Fix on hierarchy (92ba48c)