A quick disclaimer: visualneurons.com, together with this post, is a two-man effort by Francesco Pochetti and Gabriele Lanaro. The code is available here on GitHub.
First things first. Before diving into the implementation, let’s see what the web application looks like in action. You can check it out yourself at visualneurons.com, and below you can find a quick demo. The user chooses an artwork from the drop-down menu, uploads a personal picture, and then lets the magic happen. Under the hood, the Deep Learning model blends the style of the piece of art with the content of the photo, creating a brand new image.
Neural Style Transfer was first introduced by Gatys et al. in a famous 2015 paper. The researchers addressed the following question: given a picture, how would it look had it been painted by Van Gogh? The Dutch master is just an example, of course. The core idea is not tied to any specific artist: pick two images, a style one and a content one, and teach a neural network to paint the content of the second with the style of the first. The trick behind the technique is nothing more than choosing the right loss function. As usual. The MSE loss for the content side, the Gram loss for the style side. Sum them up and minimize with respect to the pixel values of the new image. Strictly speaking, there is no model, really. No parameters to optimize for future inference. We just take a pre-trained CNN (VGG19, in our case) and use it as a feature extractor; the loss is calculated directly on top of those features. Given those, the optimizer’s job is to iterate over the style, content and new images, tuning the pixels of the blended picture to minimize the loss.
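For the curious, here is a back-of-the-envelope NumPy sketch of those two losses for a single VGG layer (the weights, normalization and function names are illustrative; the real implementation sums the style loss over several layers and runs inside TensorFlow):

```python
import numpy as np

def gram_matrix(features):
    # features: (channels, height * width) activations from one VGG layer
    c, n = features.shape
    # correlations between filter responses: this is what captures "style"
    return features @ features.T / n

def content_loss(gen_feats, content_feats):
    # plain MSE between the feature maps of the generated and content images
    return np.mean((gen_feats - content_feats) ** 2)

def style_loss(gen_feats, style_feats):
    # MSE between Gram matrices of the generated and style images
    return np.mean((gram_matrix(gen_feats) - gram_matrix(style_feats)) ** 2)

def total_loss(gen_feats, content_feats, style_feats, alpha=1.0, beta=1e3):
    # weighted sum, minimized with respect to the pixels of the generated image
    return alpha * content_loss(gen_feats, content_feats) + \
           beta * style_loss(gen_feats, style_feats)
```

The optimizer then repeatedly nudges the pixels of the generated image in the direction that decreases `total_loss`.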
You get the point.
As the goal of this post is not to cover in detail the science behind Neural Style Transfer, we’ll cut it short here. If we managed to tickle your curiosity, you can find a more detailed explanation in this blog post.
Now that you know what we achieved and the science behind it, let’s focus on how we got there. This section is a deep dive into the technical implementation of our web application: what we used, why we did what we did, and how we debugged the pipeline as it grew in complexity.
Here is a diagram of the architecture we ended up implementing.
Everything starts with users hitting visualneurons.com. Amazon Route 53 is in charge of resolving the domain name and redirecting traffic to the S3 bucket we used to host our static website.
The user then clicks Run Style Transfer. This action triggers a POST request to API Gateway, which invokes a Lambda function in charge of spinning up an EC2 instance. Once the instance is in the Running state, Lambda returns the EC2’s public DNS to the frontend, which needs it to fire a JavaScript websocket and connect to the backend. The p2.xlarge EC2 instance is set to auto-terminate after either 7 minutes from its spin-up or model-run completion, whichever happens first.
Here is a deep dive into the communication between the client and server websockets. The logic is explained in greater detail in the upcoming sections.
Style transfer (at least in its original form) is an iterative process as it requires minimizing the combined content loss and style loss. The procedure usually takes several iterations to obtain visible results.
To achieve a more interesting and pleasing effect, we need to stream images in real time to the browser, and to do that it’s convenient to open a bidirectional channel of communication between the browser and the server.
Enter WebSockets. Through a websocket we can open a channel, talk to the server, and let the server talk to the browser directly. For WebSockets to work, we need to implement a Client in the browser and a Server in a powerful machine capable of running style transfer (let’s assume that the machine is already running, we’ll describe how it is activated more in detail further below). The client and server can send each other messages (typically as JSON strings, but you are free to use the protocol that you prefer).
At its core, a very bare-bones client would look like the following JavaScript snippet. We open a new websocket connection, instruct the client to send a message as soon as the connection is established (the onopen callback), and provide a callback to execute when a message is received.

```javascript
ws = new WebSocket(url);
ws.onopen = function (event) { ws.send(JSON.stringify({"my message": "whatever"})); };
ws.onmessage = function (event) { /* do something with the message received */ };
ws.onclose = function (event) { /* clean up */ };
```
On the other side of the wire we have the server, implemented in an equivalent way, but in Python and using the networking framework tornado.
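A minimal sketch of such a server is shown below. It uses tornado’s websocket module; the handler name, route and message schema are ours for illustration, not the exact ones in the repo:

```python
import json

import tornado.ioloop
import tornado.web
import tornado.websocket


class StyleTransferHandler(tornado.websocket.WebSocketHandler):
    def check_origin(self, origin):
        # the browser client lives on a different domain (S3),
        # so cross-origin websocket upgrades must be allowed
        return True

    def open(self):
        print("client connected")

    def on_message(self, message):
        # messages travel as JSON strings in both directions
        payload = json.loads(message)
        # here the real server kicks off the style transfer loop,
        # streaming intermediate images back as they are produced
        self.write_message(json.dumps({"status": "received", "echo": payload}))

    def on_close(self):
        print("client disconnected")


def make_app():
    return tornado.web.Application([(r"/styletransfer", StyleTransferHandler)])


if __name__ == "__main__":
    make_app().listen(8000)
    tornado.ioloop.IOLoop.current().start()
```

Because the channel stays open, the server can push a fresh image to the browser after every few optimization steps, which is exactly what makes the real-time streaming effect possible.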
To implement our style transfer application, we designed the following workflow:
The workflow is best explained by the timeline in the following diagram, which illustrates how information flows back and forth between the client and the server.
An in-depth description of the implementation is beyond the scope of this post (the full code is available in the GitHub repo). However, it’s worth briefly mentioning the design of the classes involved.
The implementation revolves around three classes:
Now that we have an idea of the components and how they interact, what is left to do is deploy the client and the server and make them talk!
To serve the client’s code, all that is needed is a simple frontend server capable of serving static content. S3 makes this kind of operation extremely easy: we just create a bucket, drop our index.html, styles.css and scripts.js into it, and we are ready to go.
When it comes to redirecting traffic to the bucket, it is as simple as buying a domain on Amazon Route 53 and pointing it to S3. An in-depth tutorial on how to achieve this is available here and here.
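Uploading the three files can also be scripted with boto3 (we may just as well have used the console or the AWS CLI; the bucket name and helper function below are ours for illustration). The one subtlety is that S3 serves each object with the Content-Type set at upload time, so the browser needs text/html, text/css, etc. to render the site correctly:

```python
import mimetypes

def guess_content_type(filename):
    # map index.html -> text/html, styles.css -> text/css, and so on
    ctype, _ = mimetypes.guess_type(filename)
    return ctype or "binary/octet-stream"

def upload_site(bucket_name, files=("index.html", "styles.css", "scripts.js")):
    import boto3  # imported lazily so the helper above stays dependency-free
    s3 = boto3.client("s3")
    for name in files:
        s3.upload_file(
            name, bucket_name, name,
            ExtraArgs={"ContentType": guess_content_type(name)},
        )
```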
Something worth mentioning when setting up the website, and the client websocket that comes with it, is a quick note on security. For obvious reasons, whenever building a web application it is highly recommended to buy an SSL certificate and secure the domain with https, so that any communication between a browser and a server is encrypted. This also means, though, that some extra work is needed to decrypt messages whenever either the browser or the server has to do something useful with them. This is the case for our application, which is based on JSON-formatted strings sent back and forth across the web.

While running the dev version of the app directly via S3 (i.e. via https), we noticed that the Python server websocket would not understand the initial connection message from the JavaScript client websocket. Given the additional complexity of dealing with this issue, we decided to compromise on security and stick to the http protocol instead. This is why our application runs on http://visualneurons.com/ and not on https://visualneurons.com/. It also means the URL the JavaScript websocket needs to connect to is of the form "ws://" + EC2_public_dns + ":8000/styletransfer" and not "wss://" + EC2_public_dns + ":8000/styletransfer". Small details making a whole lot of difference.
Since we didn’t want to keep an expensive GPU instance running 24/7 for our demo, we designed a way to spin up EC2 instances on demand using API Gateway and Lambda. The code that spins up the instance and returns its full address relies on Amazon’s boto library for Python. For reference, the full code of the function can be found at this link. In this section we illustrate the steps necessary to configure and run the backend.
To prepare our base image, we started from one of the Amazon Deep Learning AMIs and customized it to our needs (we only needed to run style transfer once, so that the VGG model files would already be cached on the server). We then created our own AMI from it, so that the instance is fully set up at start time.
The port the websocket server listens on (8000) has to be open. This is easily achieved using Security Groups.
Lambda is responsible for creating an EC2 instance using the run_instances method, and needs to provide the AMI_ID (the ID of our newly created AMI) as well as our security group. It’s also necessary to provide an initialization script (the UserData argument), which we discuss further in the following section. Below is an example of the run_instances call.
```python
instance = ec2_object.run_instances(
    ImageId=AMI_ID,
    InstanceType=INSTANCE_TYPE,
    MinCount=1,
    MaxCount=1,
    KeyName=KEY_NAME,
    SecurityGroups=[
        SECURITY_GROUP,
    ],
    # make `shutdown` in the init script terminate the EC2 instance
    InstanceInitiatedShutdownBehavior='terminate',
    UserData=script  # script to run on instance init
)
```
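To complete the picture, here is a hedged sketch of what such a Lambda handler could look like end to end: run the instance, wait until it is running, and hand its public DNS back through API Gateway. The environment-variable names, the USER_DATA_SCRIPT placeholder and the helper function are ours for illustration; the actual function lives in the repo:

```python
import json
import os

# placeholder for the init script shown in the next section
USER_DATA_SCRIPT = "#!/bin/bash\n# ..."

def extract_public_dns(description):
    # pull the public DNS name out of a describe_instances response
    return description["Reservations"][0]["Instances"][0]["PublicDnsName"]

def handler(event, context):
    import boto3  # imported lazily so extract_public_dns stays testable offline

    ec2 = boto3.client("ec2")
    run = ec2.run_instances(
        ImageId=os.environ["AMI_ID"],
        InstanceType=os.environ["INSTANCE_TYPE"],
        MinCount=1,
        MaxCount=1,
        KeyName=os.environ["KEY_NAME"],
        SecurityGroups=[os.environ["SECURITY_GROUP"]],
        InstanceInitiatedShutdownBehavior="terminate",
        UserData=USER_DATA_SCRIPT,
    )
    instance_id = run["Instances"][0]["InstanceId"]

    # block until the instance reaches the Running state, then fetch its DNS
    ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])
    desc = ec2.describe_instances(InstanceIds=[instance_id])

    # shape expected by API Gateway, CORS header included (see the CORS section)
    return {
        "statusCode": 200,
        "headers": {"Access-Control-Allow-Origin": "*"},
        "body": json.dumps({"public_dns": extract_public_dns(desc)}),
    }
```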
Once the instance is spun up, we need to execute an initialization script (code below) that:
Note that the init script runs as root, while we wanted to run our websocket server as a normal user (this was also necessary to properly load our settings). To do that, we used sudo to invoke a bash process and passed the initialization script for the user between two EOF delimiters. Troubleshooting this was quite a pain, and we relied on the init script’s logfile, located in /var/log/cloud-init-output.log.
```bash
#!/bin/bash
cd /home/ubuntu
cd ml-prototypes
git pull
# auto-terminate after 7 minutes, no matter what
shutdown -h +7
# run as the ubuntu user, necessary to be able to load the full environment
sudo -i -u ubuntu bash <<-EOF
source ~/.bashrc
source activate tensorflow_p36
export PYTHONPATH=/home/ubuntu/ml-prototypes
python -m prototypes.styletransfer.app --address=0.0.0.0 --port=8000
EOF
shutdown -h now
```
A better practice would have been to use something like supervisor to manage starting, stopping and logging of the websocket server, but we opted for simplicity.
CORS stands for Cross-Origin Resource Sharing. It is a mechanism that gets triggered as soon as two resources sitting on different domains try to exchange data. In our case, the two resources are the frontend, i.e. the website sitting on http://visualneurons.com, and the backend API Gateway, sitting on https://myapi.execute-api.eu-west-1.amazonaws.com. When the API returns the output of Lambda to the frontend, the browser checks whether this communication is allowed by looking at the headers of the resource being sent; specifically, the header in question is 'Access-Control-Allow-Origin': '*'. If it is missing, the browser blocks the incoming JSON. When working within the AWS ecosystem and dealing with web development, make sure you check the following two boxes to avoid suffering from some nasty headaches:
enable CORS on the API Gateway resource, and make sure the Lambda response itself carries the 'Access-Control-Allow-Origin' header: this is what format_response is for, here.

While we were quite happy with the overall results, we did face some issues while building the application, and we have ideas for further improvements.
We experienced startup times of 3-5 minutes for the EC2 instances, which translates into having to wait a long time before seeing any image on the screen. Having an instance running 24/7 would easily solve this issue ($$$). Another solution would be to implement a proper job runner that automatically reuses existing instances instead of spinning up a new one every time.
Also, as VGG19 is very memory-intensive, we could have tried using a smaller model or a more efficient style transfer implementation.
Overall, we decided to live with these issues for the purpose of this demo, and to showcase a cost-effective productionization of a machine learning application.