AWS Auto Scaling Group – CodeDeploy Challenges

First, here is my setup:

  • A single development / test server in the AWS cloud, backed by a separate Git repository.
  • When code is completed in the development environment, it is committed to the development branch (using whichever branching scheme best fits the project).
  • At the same time, the code is merged to the test branch and is available for client testing on the ‘test_stage’ site if they would like.
  • Then, on an as-needed basis, the code in the test branch (on the test_stage server) is deployed to AWS using the CodeDeploy API:
    • git archive test -> deploy.zip
    • upload the file to an S3 bucket (s3cmd)
    • register the zip file as a revision using the AWS RegisterApplicationRevision API call
  • This creates a revision that can be deployed to any deployment group.
  • I set up two deployment groups in my AWS account, test and live.
  • When the client is ready, I run a script which deploys the zipped-up revision to the Test server, where they are able to look at it and approve.
  • Then I use the same method, but deploy it to the www (live) deployment group instead.
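The steps above can be sketched as a small shell script. This is only an illustration, not the contents of deploy.php: it assumes the AWS CLI stands in for the API calls, and the application name, bucket, and key are placeholders. By default it only prints the commands it would run.

```shell
#!/usr/bin/env bash
# Sketch of the bundle -> upload -> register -> deploy flow.
# APP_NAME, BUCKET, and KEY are placeholder values, not real settings.
APP_NAME="${APP_NAME:-my-app}"
BUCKET="${BUCKET:-my-deploy-bucket}"
KEY="${KEY:-deploy.zip}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Zip the test branch, upload it, and register it as a revision.
register_revision() {
  run git archive --format=zip --output="$KEY" test
  run s3cmd put "$KEY" "s3://$BUCKET/$KEY"
  run aws deploy register-application-revision \
    --application-name "$APP_NAME" \
    --s3-location "bucket=$BUCKET,key=$KEY,bundleType=zip"
}

# Deploy the registered revision to one deployment group (for example test or www).
deploy_revision() {
  run aws deploy create-deployment \
    --application-name "$APP_NAME" \
    --deployment-group-name "$1" \
    --s3-location "bucket=$BUCKET,key=$KEY,bundleType=zip"
}
```

Calling register_revision once and then deploy_revision test, followed later by deploy_revision www, mirrors the test-then-live order described above. Set DRY_RUN=0 only after checking the printed commands.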

(The complexities of setting this up go deeper than I am covering in this article, but for future reference, all of this logic is stored in our deploy.php file.)

A couple of tricks “they” don’t tell you:

  • Errors can be difficult to debug. Turning on verbose logging in the CodeDeploy agent can help you determine what the errors were.
    • update /etc/codedeploy-agent/conf/codedeployagent.yml and set verbose to true.
    • restart the service: /etc/init.d/codedeploy-agent restart (it can take several minutes to restart, this is normal)
    • tail the log file to watch a deployment in real time, or investigate it after the fact (tail /var/log/aws/codedeploy-agent/codedeploy-agent.log)
  • Deploying a revision while servers are going through termination instability will likely cause your deployment to fail when one of your servers terminates.
    • To prevent this, update the Auto Scaling group so that the minimum and the maximum are both set to the current number of servers, and do not put it under load during the deployment. Issues during those 10 – 15 minutes (up to 2 hours) will cause errors.
    • Depending on the load on your servers, your deployment could use a lot of CPU, which could trigger an autoscaling alarm and spin up new servers or send you an email. There is no single correct way to deal with this, but it is a good idea to know about it before you deploy.
  • Finally, the item that prompted this article: deploying a revision to an autoscaling group can cause some failures.
    • The obvious one is that the deployment will fail if it is attempted while a server is shutting down.
    • However, it seems that if you have upgraded your AMI and your Launch Configuration, a deployment will also fail. For me, it actually caused a key failure at login as well (this could have been because of multiple server terminations, after which another server took over the IPs within a few minutes). Anyway, use much caution with these things.
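One reading of the advice about the minimum and maximum is to hold the group at a fixed size for the duration of a deployment. The sketch below assumes the AWS CLI and a placeholder group name (my-asg); it prints the commands by default rather than running them.

```shell
#!/usr/bin/env bash
# Sketch: freeze an Auto Scaling group's size around a deployment.
# ASG_NAME is a placeholder, not a real group name.
ASG_NAME="${ASG_NAME:-my-asg}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Hold the group at exactly $1 servers so it cannot scale mid-deployment.
pin_group_size() {
  run aws autoscaling update-auto-scaling-group \
    --auto-scaling-group-name "$ASG_NAME" \
    --min-size "$1" --max-size "$1" --desired-capacity "$1"
}

# Put the normal limits back ($1 = minimum, $2 = maximum) afterwards.
restore_group_size() {
  run aws autoscaling update-auto-scaling-group \
    --auto-scaling-group-name "$ASG_NAME" \
    --min-size "$1" --max-size "$2"
}
```

For example, pin_group_size 2 before the deployment and restore_group_size 1 4 once it has finished. The numbers here are made up.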

UPDATE:

Well, the problem was actually that my ‘afterinstall.sh’ script was cleaning up the /opt/codedeploy-agent/deployment-root/ directory (so we didn’t run out of space after a couple dozen deployments), but it was also removing the appspec.yml file.

So I updated the command that runs in afterinstall.sh to be:

 /usr/bin/find /opt/codedeploy-agent/deployment-root/ -mindepth 2 -mtime +1 -not -path '*deployment-instruction*' -delete
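To see what that filter keeps and removes without touching a real server, the same find expression can be run against a scratch directory. The layout and file names below are made up for illustration, and the example assumes GNU touch (as on Ubuntu) for the "3 days ago" date.

```shell
#!/usr/bin/env bash
# Same filter as the command above, with the root directory as a parameter.
prune_old_deployments() {
  find "$1" -mindepth 2 -mtime +1 \
    -not -path '*deployment-instruction*' -delete
}

# Build a throwaway tree that loosely imitates a deployment root.
sandbox="$(mktemp -d)"
mkdir -p "$sandbox/group/d-OLD/deployment-archive" \
         "$sandbox/group/d-NEW/deployment-archive" \
         "$sandbox/deployment-instructions"

# One stale deployment, one fresh deployment, one stale instruction file.
touch -d '3 days ago' "$sandbox/group/d-OLD/deployment-archive/appspec.yml"
touch "$sandbox/group/d-NEW/deployment-archive/appspec.yml"
touch -d '3 days ago' "$sandbox/deployment-instructions/group_last_install"

prune_old_deployments "$sandbox"

# List what survived the cleanup.
find "$sandbox" -type f | sort
```

Only the stale appspec.yml should be removed: the fresh one is too new for -mtime +1, and the instruction file is protected by the -not -path test.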

Debugging CodeDeploy on AWS

This article is being written well after I installed the CodeDeploy agent on an Ubuntu server, created an AMI out of it, and set it up to launch automatically from an Auto Scaling group.

I am documenting the process I went through to dig into the error a bit more. This helps me identify and remember where the log files are and how to get additional information, even if the issue is never the same again.

Problems running CodeDeploy showed up as an error in the agent log:

more /var/log/aws/codedeploy-agent/codedeploy-agent.log
 put_host_command_complete(command_status:"Failed",diagnostics:{format:"JSON",payload:"{"error_code":5,"script_name":"","message":"not opened for reading","log":""}"

I decided this was not enough information to troubleshoot the error, so I had to dig in and find a way to make the log more verbose. I found this file; just update verbose from false to true:

vi /etc/codedeploy-agent/conf/codedeployagent.yml
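If you would rather not edit the file by hand, a sed one-liner can flip the flag. This sketch assumes the key is written as ":verbose:" (or "verbose:") at the start of a line, so check your own file first. It runs against a scratch copy here; the sample contents are illustrative.

```shell
#!/usr/bin/env bash
# Set the verbose flag to true in the config file passed as $1.
# A backup of the original is kept next to it with a .bak suffix.
enable_verbose() {
  sed -E -i.bak 's/^(:?verbose:)[[:space:]]*.*$/\1 true/' "$1"
}

# Try it on a scratch file first rather than the live config.
conf="$(mktemp)"
printf '%s\n' ':program_name: codedeploy-agent' ':verbose: false' > "$conf"

enable_verbose "$conf"

# Show the resulting setting.
grep verbose "$conf"
```

On a real server the function would be pointed at /etc/codedeploy-agent/conf/codedeployagent.yml and run as root.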

Then restart the codedeploy-agent:

/etc/init.d/codedeploy-agent restart

This can take quite a while, since it runs quite a bit of background installing and checking for duplicate processes, but once it is complete you can check that the process is running again:

ps ax|grep codedeploy

Once this is running in verbose mode, monitor the log:

tail -f /var/log/aws/codedeploy-agent/codedeploy-agent.log

Then re-run the deployment and view the results in the log. The most useful thing for me was to grep for the folder that more specific error information was written to:

grep -i "Creating deployment" /var/log/aws/codedeploy-agent/codedeploy-agent.log

This showed me the folder that all of the code WAS going to be extracted to. Since there was an error, the system actually dumped the contents of the error into a file called bundle.tar in the folder that it would have extracted to:

cat /opt/codedeploy-agent/deployment-root/7ddce865-0611-45f0-bf74-459fcf806f23/d-YK4NWBJD7/bundle.tar
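A quick way to tell whether a bundle.tar is really an archive or an error message in disguise is to look for error markup near the start of the file. The sketch below assumes the error is S3-style XML containing an "<Error>" element; the sample file is a stand-in, not the output I actually saw.

```shell
#!/usr/bin/env bash
# Succeed (exit 0) if the file looks like an S3 XML error rather than an archive.
# Only the first 1024 bytes are inspected.
bundle_is_s3_error() {
  head -c 1024 "$1" | grep -q '<Error>'
}

# Scratch file standing in for a failed download.
bundle="$(mktemp)"
printf '%s\n' '<?xml version="1.0" encoding="UTF-8"?>' \
  '<Error><Code>AccessDenied</Code><Message>Access Denied</Message></Error>' \
  > "$bundle"

if bundle_is_s3_error "$bundle"; then
  echo "bundle contains an S3 error:"
  cat "$bundle"
fi
```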

This returned an error from S3 showing that CodeDeploy was failing to download the bundle, so I had to add permission to download from the S3 bucket to the policy as well.
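As a rough illustration of the kind of permission involved, the function below prints a minimal read-only policy document for one bucket. It assumes the policy in question is the one attached to the instance's IAM role; the bucket, role, and policy names are placeholders.

```shell
#!/usr/bin/env bash
# Print a minimal read-only S3 policy document for the bucket named in $1.
s3_read_policy() {
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:ListBucket"],
      "Resource": ["arn:aws:s3:::$1", "arn:aws:s3:::$1/*"]
    }
  ]
}
EOF
}

# Print the document for a placeholder bucket.
s3_read_policy my-deploy-bucket

# To attach it, save the output as policy.json and run (names are placeholders):
#   aws iam put-role-policy --role-name my-instance-role \
#     --policy-name codedeploy-s3-read --policy-document file://policy.json
```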
