AWS Auto Scaling Group – CodeDeploy Challenges

First, here is my setup:

  • A single development / test server in the AWS cloud, backed by a separate Git repository.
  • When code is completed in the development environment, it is committed to the development branch (using whichever branching scheme best fits the project).
  • At the same time the code is merged to the test branch, and is available for client testing on the ‘test_stage’ site if they would like.
  • Then, on an as-needed basis, the code in the test branch (on the test_stage server) is deployed to AWS using the CodeDeploy API:
    • git archive test -> deploy.zip
    • upload the file to an S3 bucket (s3cmd)
    • register the zip file as a revision using the CodeDeploy RegisterApplicationRevision API call
  • This creates a revision that can be deployed to any deployment group.
  • I set up two deployment groups in my AWS account, test and live.
  • When the client is ready, I run a script which deploys the zipped-up revision to the Test server, where they are able to look at it and approve.
  • Then I use the same method, but deploy it to the www (live) deployment group instead.
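
The package, upload, register and deploy steps above can be sketched as a small shell script. This is only an illustration, not the contents of deploy.php: the application name, bucket name, object key and helper function names are all hypothetical placeholders, and it assumes that git, s3cmd and the AWS CLI are installed and configured. It defaults to a dry run that prints the commands instead of executing them.

```shell
#!/bin/sh
# Sketch of the package -> upload -> register -> deploy flow.
# APP, BUCKET and the object keys are hypothetical placeholders.
set -eu

APP="${APP:-my-app}"
BUCKET="${BUCKET:-my-deploy-bucket}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Build the --s3-location value for a given object key ($1).
s3_location() {
  printf 'bucket=%s,key=%s,bundleType=zip' "$BUCKET" "$1"
}

# Zip the test branch, upload it under key $1, and register it as a revision.
register_revision() {
  run git archive --format=zip --output=deploy.zip test
  run s3cmd put deploy.zip "s3://$BUCKET/$1"
  run aws deploy register-application-revision \
    --application-name "$APP" \
    --s3-location "$(s3_location "$1")"
}

# Deploy the revision stored under key $1 to deployment group $2.
deploy_revision() {
  run aws deploy create-deployment \
    --application-name "$APP" \
    --deployment-group-name "$2" \
    --s3-location "$(s3_location "$1")"
}
```

With these helpers, something like `register_revision releases/deploy.zip` followed by `deploy_revision releases/deploy.zip test` would print the commands for a test deployment. Running the same with `DRY_RUN=0` would execute them, and swapping the group name would target the other deployment group.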

(The complexities of setting this up go deeper than I am covering in this article, but for future reference, all of this programming knowledge is stored in our deploy.php file.)

A couple of tricks “they” don’t tell you.

  • Errors can be difficult to debug. Turning on more verbose logging in the CodeDeploy agent can help you determine what went wrong.
    • update /etc/codedeploy-agent/conf/codedeployagent.yml, and set :verbose: to true.
    • restart the service: /etc/init.d/codedeploy-agent restart (it can take several minutes to restart, this is normal)
    • tail the log file to watch a deployment in real time, or investigate it after the fact (tail /var/log/aws/codedeploy-agent/codedeploy-agent.log)
  • Deploying a revision to servers while the group is going through some termination instability will likely cause your deployment to fail when one of your servers terminates.
    • To prevent this, set the Auto Scaling group’s minimum and maximum to the same number of servers, and do not put the group under load during the deployment. Scaling activity in that window, usually 10 to 15 minutes but up to 2 hours, will cause errors.
    • Depending on the load on your servers, your deployment could use a lot of CPU, which could trigger an autoscaling alarm and spin up new instances or send you an email. There is no single correct way to deal with this, but it is a good idea to know about it before you deploy.
    • Finally, the item that prompted this article: deploying a revision to an autoscaling group can cause some failures.
      • The obvious one is that the deployment will fail if it is attempted while a server is shutting down.
      • However, it seems that if you have upgraded your AMI and your Launch Configuration, a deployment will also fail. For me, it also caused a key failure on login (this could have been because of multiple server terminations, after which another server took over the IPs within a few minutes). In any case, be cautious about these things.
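
Two of the tips above, verbose agent logging and holding the group at a fixed size, can be sketched as small shell helpers. This is a rough illustration under stated assumptions: the function names and the group name are hypothetical, the first helper assumes GNU sed and a config file that already contains a `:verbose:` line, and the second only prints the AWS CLI command so it can be reviewed before it is run.

```shell
#!/bin/sh
# Sketch only: helper names and the example group name are hypothetical.

# Run on the instance: set ":verbose: true" in the agent config file ($1).
# Assumes GNU sed and that the file already has a ":verbose:" line.
enable_verbose_logging() {
  sed -i 's/^:verbose:.*/:verbose: true/' "$1"
}

# Run wherever the AWS CLI is configured: print the command that holds
# group $1 at exactly $2 instances (min = max = desired).
pin_group_command() {
  printf 'aws autoscaling update-auto-scaling-group --auto-scaling-group-name %s --min-size %s --max-size %s --desired-capacity %s\n' \
    "$1" "$2" "$2" "$2"
}
```

On the instance this might be used as `enable_verbose_logging /etc/codedeploy-agent/conf/codedeployagent.yml` (as root), followed by a restart of the agent and `tail -f /var/log/aws/codedeploy-agent/codedeploy-agent.log`. Before a deployment, `pin_group_command my-asg 2` prints the command to review and run. Remember to restore the original minimum and maximum afterwards.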

 

UPDATE:

Well, the problem was actually that my ‘afterinstall.sh’ script was cleaning up the /opt/codedeploy-agent/ directory (so we didn’t run out of space after a couple dozen deployments), but it was also removing the appspec.yml file.

So I updated the command that runs in afterinstall.sh to be:

 /usr/bin/find /opt/codedeploy-agent/deployment-root/ -mindepth 2 -mtime +1 -not -path '*deployment-instruction*' -delete
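
Since this command deletes files, it can be worth previewing what it would match first. Below is a minimal sketch of a wrapper with a preview mode. The function name and the `preview` argument are hypothetical; the find expression is the same as above, except that the root directory is passed in as a parameter.

```shell
#!/bin/sh
# Hypothetical wrapper around the cleanup command above.
# Usage: cleanup_deployment_root ROOT [preview]
cleanup_deployment_root() {
  cleanup_root="$1"
  cleanup_action="-delete"
  # With "preview" as the second argument, list matches instead of deleting.
  if [ "${2:-}" = "preview" ]; then
    cleanup_action="-print"
  fi
  # Entries at least two levels below ROOT whose age, in whole days, is
  # greater than 1, skipping any path that contains "deployment-instruction".
  find "$cleanup_root" -mindepth 2 -mtime +1 \
    -not -path '*deployment-instruction*' "$cleanup_action"
}
```

For example, `cleanup_deployment_root /opt/codedeploy-agent/deployment-root/ preview` lists what would be removed, and dropping `preview` performs the deletion. Note that `-delete` cannot remove a directory that still contains files, so old directories with newer contents are reported as errors rather than removed.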