feature: on demand docker environments #1796
base: master
Conversation
…anges to conda_environment
…e images inside flows.

```python
    get_pinned_conda_libs,
)

BAKERY_METAFILE = ".imagebakery-cache"
```
The `imagebakery-cache` can also be made an optimization detail within `docker_environment.py`.
also, can you help me understand the contents of this cache?
Moved caching to the environment side. What is in the cache at the moment:
- a unique hash derived from the requested environment (Python version, packages, image type, etc.)
- the image tag that was returned from the bakery endpoint for the request
- the request payload that was sent to the bakery, persisted so that details of cached images can be listed via `docker cache list`
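A minimal sketch of how such a cache entry might be derived and stored (the field names, hashing scheme, and helper below are illustrative assumptions, not the actual on-disk format):

```python
import hashlib
import json


def environment_hash(python_version, packages, image_kind):
    # Derive a stable, unique hash from the requested environment spec.
    # sort_keys and sorted packages keep the hash order-independent.
    spec = json.dumps(
        {"python": python_version, "packages": sorted(packages), "kind": image_kind},
        sort_keys=True,
    )
    return hashlib.sha256(spec.encode("utf-8")).hexdigest()


# One hypothetical entry in .imagebakery-cache, keyed by the environment hash:
entry_key = environment_hash("3.11", ["pandas==2.2.0"], "esgz-zstd")
cache_entry = {
    entry_key: {
        # image tag returned by the bakery endpoint for this request
        "image": "registry.example.com/baked:abc123",
        # request payload persisted so `docker cache list` can show details
        "request": {
            "pythonVersion": "3.11",
            "pipRequirements": ["pandas==2.2.0"],
            "imageKind": "esgz-zstd",
        },
    }
}
```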
```python
import hashlib
import json

import requests
```
Would something of this sort work here?
```python
import json

import requests


class FastBakery(object):
    def __init__(self, url):
        self.url = url

    def bake(
        self,
        python_ver=None,  # TODO - Ensure the keys are the same in the API spec and client
        pypi_packages=None,
        conda_packages=None,
        base_image=None,
        image_kind="esgz-zstd",
    ):
        # Recursively drop None values and empty dicts from the payload.
        deep_clean = (lambda f: lambda d: f(f, d))(
            lambda self, item: {
                k: v
                for k, v in (
                    (k, self(self, v)) if isinstance(v, dict) else (k, v)
                    for k, v in item.items()
                )
                if v is not None and v != {}
            }
        )
        payload = deep_clean(
            {
                "pythonVersion": python_ver,
                "imageKind": image_kind,
                "pipRequirements": pypi_packages,
                "condaMatchspecs": conda_packages,
                "baseImage": {"imageReference": base_image},
            }
        )
        return self.aws_iam_invoker(payload)

    def aws_iam_invoker(self, payload):
        # AWS_IAM requires a signed request to be made. Import before the
        # try block so the exception types are defined for the except clauses.
        import boto3
        from botocore.auth import SigV4Auth
        from botocore.awsrequest import AWSRequest
        from botocore.exceptions import BotoCoreError, NoCredentialsError

        try:
            headers = {"Content-Type": "application/json"}
            payload = json.dumps(payload)
            session = boto3.Session()
            credentials = session.get_credentials().get_frozen_credentials()
            # credits to https://github.com/boto/botocore/issues/1784#issuecomment-659132830
            # We need to jump through some hoops when calling the endpoint with IAM auth,
            # as botocore does not offer a direct utility for signing arbitrary requests.
            req = AWSRequest("POST", self.url, headers, payload)
            SigV4Auth(
                credentials, service_name="lambda", region_name=session.region_name
            ).add_auth(req)
            response = requests.post(req.url, data=req.body, headers=req.headers)
            # Manually check the status code and raise an exception if it's an error
            if response.status_code >= 400:
                raise requests.exceptions.HTTPError(
                    f"HTTP {response.status_code} Error: {response.reason}\nResponse Body: {response.text}",
                    response=response,
                )
            return response
        except requests.exceptions.HTTPError:
            raise
        except (requests.exceptions.RequestException, BotoCoreError, NoCredentialsError) as e:
            # Handle specific errors related to requests and AWS services
            raise Exception(f"Failed to invoke AWS service due to network or AWS error: {e}")
        except Exception as e:
            # Handle other possible exceptions
            raise Exception(f"An unexpected error occurred: {e}")
```
```python
    self.bake_image_for_step(step)
echo("Environments are ready!")
if self.steps_to_delegate:
    # TODO: add debug echo to output steps that required a conda environment.
```
these comments too
```python
self.delegate.validate_environment(echo, self.datastore_type)
self.delegate.init_environment(echo, self.steps_to_delegate)


def bake_image_for_step(self, step):
```
multiple steps might share the same image - maybe we can check if that is the case and only bake unique images?
The images are baked in sequence, so consecutive steps with identical packages should hit the cache at this point, skipping the baking. Previously the caching happened inside `fast_bakery`, but it has been moved to the environment side now, so it should be clearer.
rather than baking in sequence, we can perhaps bake them in parallel - that should save precious seconds
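A rough sketch of the dedup-plus-parallel idea. The helpers `image_spec_for_step` and `bake_image` are hypothetical stand-ins for the PR's actual baking calls, not the real API:

```python
import json
from concurrent.futures import ThreadPoolExecutor


def bake_unique_images(steps, image_spec_for_step, bake_image):
    # Group steps by their image spec so identical environments are baked once.
    spec_to_steps = {}
    for step in steps:
        key = json.dumps(image_spec_for_step(step), sort_keys=True)
        spec_to_steps.setdefault(key, []).append(step)

    # Bake the unique specs in parallel instead of sequentially.
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = {
            key: pool.submit(bake_image, json.loads(key)) for key in spec_to_steps
        }
        images = {key: fut.result() for key, fut in futures.items()}

    # Map every step to its (possibly shared) baked image tag.
    return {
        step: images[key]
        for key, grouped in spec_to_steps.items()
        for step in grouped
    }
```

This combines both suggestions above: the bakery is only called once per unique spec, and the unique calls overlap in time.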
metaflow/plugins/aws/batch/batch.py (Outdated)

```diff
@@ -23,6 +24,9 @@
     DEFAULT_SECRETS_BACKEND_TYPE,
     AWS_SECRETS_MANAGER_DEFAULT_REGION,
     S3_SERVER_SIDE_ENCRYPTION,
+    FAST_BAKERY_URL,
+    FAST_BAKERY_AUTH,
+    FAST_BAKERY_TYPE,
```
We can avoid introducing these as env vars here and in `@kubernetes`, with the expectation that using fast bakery may involve using `@secrets` or `@environment` to configure the URL and auth. The client can try to read from env vars unless explicitly configured with the URL and auth?
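One way that fallback could look. The class name is hypothetical; the env var names mirror the ones in the diff above, but this is a sketch of the suggestion, not the proposed implementation:

```python
import os


class FastBakeryConfig:
    """Resolve bakery settings: explicit arguments win, env vars are the fallback."""

    def __init__(self, url=None, auth=None, auth_type=None):
        # Explicit configuration takes precedence; otherwise read the
        # environment, which @secrets or @environment can populate at runtime.
        self.url = url or os.environ.get("FAST_BAKERY_URL")
        self.auth = auth or os.environ.get("FAST_BAKERY_AUTH")
        self.auth_type = auth_type or os.environ.get("FAST_BAKERY_TYPE", "AWS_IAM")
        if not self.url:
            raise ValueError("Fast bakery URL not configured")
```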
```python
# a conda decorator always exists alongside pypi so this needs to be accounted for
dependency_deco = pypi_deco if pypi_deco is not None else conda_deco
if dependency_deco is not None:
    python = dependency_deco.attributes["python"]
```
`python` comes from `conda_deco`.
Adds support for on-demand Docker images. Open for discussion.

Open TODOs:
- conda environment when executing locally with `--environment docker`