AWS Auto Scaling Group – CodeDeploy Challenges
First here is my setup
- A single development / test server in the AWS cloud, backed by a separate Git Repository.
- WHen code is completed in the development environment it is commited to the development branch (using whichever branching scheme best fits the project)
- At the same time the code is merged to the test branch, and the code is available for client testing on the ‘test_stage’ site if they would like
- Then on an as needed basis the code in in the test branch (on the test_stage server) is deployed to AWS using their CodeDeploy api
- git archive test -> deploy.zip
- upload the file an S3 bucket (s3cmd)
- register the zip file as a revision using the AWS Register Revision API call
- This creates a file that can be deployed to any deployment group
- I setup two groups in my AWS account , test and live.
- When the client is ready, I run a script which deploys thes the ziped up revision to the Test server, where they are able to look atit and approve.
- Then I use the same method but move it instead of the www deployment group.
(The complexities of setting this up are deeper than I am going in this article, but for future prospects, all of this programming knowledges is stored in our deploy.php file)
A couple of tricks “they” dont tell you.
- Errors can be difficult to debug – if you update your code deployment to do more verbose logging it can help you to determine what some of the errors were.
- update /etc/codedeploy-agent/conf/codedeployment.yml, set verbose to yes.
- restart the service /etc/init.d/code-deployment restart (it can take several minutes to restart, this is normal)
- tail the log files to watch a deployment in real time, or investigate it after the fact (tail /var/log/aws/codedeploy-agent)
- Deploying a Revision to servers while they may be going through some termination instability, may likely cause your deployment to fail when one of you servers terminates.
- To prevent this, update the deployment autoscaling plan to have a minim and a maximum of the server, and do not take it under load during the 10 – 15 minutes (up to 2 hours) issues will cause errors
- Depending on the load on your servers, your deployment could take a lot of cpu and could generate an autoscaling alert and could spin up new tasks or send you an email. There is not a correct way to deal with this, however it is a good idea to know about it before you deploy.
- Finally the item that I wrote this because of, it appears that when you attempt to deploy a revision to an autoscaling group, it can cause some failures.
- The obvious one is that the deployment will fail if it is attempted while the server is shutting down
- However, it seems that if you have decided to upgrade your AMI, and your Launch Configuration, that a deployment will fail. And for me, it actually caused a key failure to login as well (this could have been because of multiple server terminations and then another server took over the IPs within a few minutes) Anyway, much caution about these things.
Well, the problem was actually that the by ‘afterinstall.sh’ script, was cleaning up the /opt/codedeployment/ directory (so we didn’t run out of space after a couple dozen deployments), but I was also removing the appspec.yml file.
So I updated the command that runs in the afterinstall to be
/usr/bin/find /opt/codedeploy-agent/deployment-root/ -mindepth 2 -mtime +1 -not -path '*deployment-instruction*' -delete