Note

You are viewing the documentation for an older version of boto (boto2).

Boto3, the next version of Boto, is now stable and recommended for general use. It can be used side-by-side with Boto in the same project, so it is easy to start using Boto3 in your existing projects as well as new projects. Going forward, API updates and all new feature work will be focused on Boto3.

For more information, see the documentation for boto3.


An integrated interface to current and future infrastructural services offered by Amazon Web Services.

Currently, all features work with Python 2.6 and 2.7. Work is under way to support Python 3.3+ in the same codebase. Modules are being ported one at a time with the help of the open source community, so please check below for compatibility with Python 3.3+.

To port a module to Python 3.3+, please view our Contributing Guidelines and the Porting Guide. If you would like, you can open an issue to let others know about your work in progress. Tests must pass on Python 2.6, 2.7, 3.3, and 3.4 for pull requests to be accepted.

Getting Started

If you’ve never used boto before, you should read the Getting Started with Boto guide to get familiar with boto & its usage.

Currently Supported Services

Release Notes

Bumped to 2.46.1

boto v2.46.1

date:2017/02/20

Fixes a bug where a recently added module was not added to setup.py

Changes

boto v2.45.0

date:2016/12/14

Add support for eu-west-2 region.

Changes

boto v2.44.0

date:2016/12/08

Adds support for ca-central-1 region and gs object-level storage class.

Changes

boto v2.43.0

date:2016/10/17

Adds support for us-east-2 endpoint.

Changes

boto v2.42.0

date:2016/07/19

Updates the Mechanical Turk API and fixes some bugs.

Changes

boto v2.41.0

date:2016/06/27

Update documentation and endpoints file.

Changes

boto v2.40.0

date:2016/04/28

Fixes several bugs.

Changes

  • ryansydnor-s3: Allow s3 bucket lifecycle policies with multiple transitions (commit c6d5af3)
  • Fixes upload parts for glacier (issue 3524, commit d1973a4)
  • pslawski-unicode-parse-qs: Move utility functions over to compat; add S3 integ test for non-ascii keys with sigv4; fix quoting of tilde in S3 canonical_uri for sigv4; parse unicode query string properly in Python 2 (issue 2844, commit 5092c6d)
  • ninchat-config-fix: Add __setstate__ to fix pickling test fail; add unit tests for config parsing; don’t access parser through __dict__; Config: catch specific exceptions when wrapping ConfigParser methods; Config: don’t inherit from ConfigParser (issue 3474, commit c21aa54)

boto v2.39.0

date:2016/01/18

Add support for ap-northeast-2, update documentation, and fix several bugs.

Changes

boto v2.38.0

date:2015/04/09

This release adds support for Amazon Machine Learning and fixes a couple of issues.

Changes

boto v2.37.0

date:2015/04/02

This release updates AWS CloudTrail to the latest API to support the LookupEvents operation, adds new regional service endpoints and fixes bugs in several services.

Note

The CloudTrail create_trail operation no longer supports the deprecated trail parameter, which has been marked for removal by the service since early 2014. Instead, you now pass each trail attribute as a keyword argument. Please see the reference to help port over existing code.
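
As a rough sketch of the keyword-argument style (the region, trail name and bucket below are placeholders, and the parameter names follow the CloudTrail API; check the reference for your boto version):

>>> import boto.cloudtrail
>>> ct = boto.cloudtrail.connect_to_region('us-east-1')
>>> # Pass each trail attribute directly instead of the removed trail dict.
>>> ct.create_trail(name='my-trail', s3_bucket_name='my-cloudtrail-logs')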

Changes

boto v2.36.0

date:2015/01/27

This release adds support for AWS Key Management Service (KMS), AWS Lambda, AWS CodeDeploy, AWS Config, AWS CloudHSM, Amazon EC2 Container Service (ECS), Amazon DynamoDB online indexing, and fixes a few issues.

Changes

boto v2.35.2

date:2015/01/19

This release adds ClassicLink support for Auto Scaling and fixes a few issues.

Changes

boto v2.35.1

date:2015/01/09

This release fixes a regression which results in an infinite while loop of requests if you query an empty Amazon DynamoDB table.

Changes

boto v2.35.0

date:2015/01/08

This release adds support for Amazon EC2 Classic Link which allows users to link classic instances to Classic Link enabled VPCs, adds support for Amazon CloudSearch Domain, adds sigv4 support for Elastic Load Balancing, and fixes several other issues including issues making anonymous AWS Security Token Service requests.

Changes

boto v2.34.0

date:2014/10/23

This release adds region support for eu-central-1, support for creating virtual MFA devices for Identity and Access Management, and fixes several sigv4 issues.

Changes

boto v2.33.0

date:2014/10/08

This release adds support for Amazon Route 53 Domains, Amazon Cognito Identity, Amazon Cognito Sync, the DynamoDB document model feature, and fixes several issues.

Changes

boto v2.32.1

date:2014/08/04

This release fixes an incorrect Amazon VPC peering connection call, and fixes several minor issues related to Python 3 support including a regression when pickling authentication information.

Fixes

boto v2.32.0

date:2014/07/30

This release includes backward-compatible support for Python 3.3 and 3.4, support for IPv6, Amazon VPC connection peering, Amazon SNS message attributes, new regions for Amazon Kinesis, and several fixes.

Python 3 Support

Features

Fixes

boto v2.31.1

date:2014/07/10

This release fixes an installation bug in the 2.31.0 release.

boto v2.31.0

date:2014/07/10

This release adds support for Amazon CloudWatch Logs.

Changes

boto v2.30.0

date:2014/07/01

This release adds new Amazon EC2 instance types, new regions for AWS CloudTrail and Amazon Kinesis, Amazon S3 presigning using signature version 4, and several documentation and bugfixes.

Changes

boto v2.29.1

date:2014/05/30

This release fixes a critical bug when the provider is not set to aws, e.g. for Google Storage. It also fixes a problem with connection pooling in Amazon CloudSearch.

Changes

boto v2.29.0

date:2014/05/29

This release adds support for the AWS shared credentials file, adds support for Amazon Elastic Block Store (EBS) encryption, and contains a handful of fixes for Amazon EC2, AWS CloudFormation, AWS CloudWatch, AWS CloudTrail, Amazon DynamoDB and Amazon Relational Database Service (RDS). It also includes fixes for Python wheel support.

A bug has been fixed such that an exception is now raised when a profile name is explicitly passed either via code (profile="foo") or an environment variable (AWS_PROFILE=foo) and that profile does not exist in any configuration file. Previously this was silently ignored and the default credentials were used without informing the user.
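
A minimal sketch of the new behavior (the dev profile is a placeholder; connection classes accept the profile via the profile_name keyword argument):

>>> import boto
>>> # Raises an error if no such profile exists in your credentials/config
>>> # files, instead of silently falling back to the default credentials.
>>> s3 = boto.connect_s3(profile_name='dev')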

Changes

boto v2.28.0

date:2014/05/08

This release adds support for Amazon SQS message attributes, Amazon DynamoDB query filters and enhanced conditional operators, adds support for the new Amazon CloudSearch 2013-01-01 API and includes various features and fixes for Amazon Route 53, Amazon EC2, Amazon Elastic Beanstalk, Amazon Glacier, AWS Identity and Access Management (IAM), Amazon S3, Mechanical Turk and MWS.

Changes

boto v2.27.0

date:2014/03/06

This release adds support for configuring access logs on Elastic Load Balancing (including what Amazon Simple Storage Service (S3) bucket to use & how frequently logs should be added to the bucket), adds request hook documentation & a host of doc updates/bugfixes.

Changes

boto v2.26.1

date:2014/03/03

This release fixes an issue with the newly-added boto.rds2 module when trying to use boto.connect_rds2. Parameters were not being passed correctly, which would cause an immediate error.

Changes

boto v2.26.0

date:2014/02/27

This release adds support for MFA tokens in the AWS STS assume_role & introduces the boto.rds2 module (which has full support for the entire RDS API). It also includes the addition of request hooks & many bugfixes.

Changes

boto v2.25.0

date:2014/02/07

This release includes Amazon Route53 service and documentation updates, preliminary log file support for Amazon Relational Database Service (RDS), as well as various other small fixes. Also included is an opt-in to use signature version 4 with Amazon EC2.

IMPORTANT - This release also includes a SIGNIFICANT underlying change to the Amazon S3 get_bucket method, to address the issues raised in the blog post by AppNeta. We’ve altered the default behavior to now perform a HEAD on the bucket, in place of the old GET behavior (which would fetch a zero-length list of keys).

This should reduce all users’ costs & should also be mostly backward-compatible. HOWEVER, if you were previously parsing the exception message from S3Connection.get_bucket, you will have to change your code (see the S3 tutorial for details). HEAD does not return error messages as detailed as GET’s, & while we’ve attempted to patch over as many of the differences as we can, there may still be edge cases compared to the prior behavior.
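
A minimal sketch of working with the new behavior (the bucket name is a placeholder):

>>> import boto
>>> from boto.exception import S3ResponseError
>>> s3 = boto.connect_s3()
>>> # get_bucket now issues a HEAD request by default; failures surface as an
>>> # S3ResponseError with an HTTP status rather than a detailed error body.
>>> try:
...     bucket = s3.get_bucket('my-bucket')
... except S3ResponseError as e:
...     print('Bucket lookup failed with HTTP status %s' % e.status)
>>> # Skip the round trip entirely if you already know the bucket exists.
>>> bucket = s3.get_bucket('my-bucket', validate=False)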

Features

Bugfixes

boto v2.24.0

date:2014/01/29

This release adds M3 instance types to Amazon EC2, adds support for dead letter queues to Amazon Simple Queue Service (SQS), adds a single JSON file for all region and endpoint information and provides several fixes to a handful of services and documentation. Additionally, the SDK now supports using AWS Signature Version 4 with Amazon S3.

Features

Bugfixes

boto v2.23.0

date:2014/01/10

This release adds new pagination & date range filtering to Amazon Glacier, more support for selecting specific attributes within Amazon DynamoDB, security tokens from environment/config variables & many bugfixes/small improvements.

Features

Bugfixes

boto v2.22.1

date:2014/01/06

This release fixes working with keys with special characters in them while using Signature V4 with Amazon Simple Storage Service (S3). It also fixes a regression in the ResultSet object, re-adding the nextToken attribute. This was most visible from within Amazon Elastic Compute Cloud (EC2) when calling the get_spot_price_history method.

Users in the cn-north-1 region or who make active use of get_spot_price_history are recommended to upgrade.

Bugfixes

boto v2.22.0

date:2014/01/02

This release updates Auto Scaling to support the latest API, adds the ability to control response sizes in Amazon DynamoDB queries/scans & includes a number of bugfixes as well.

Features

Bugfixes

boto v2.21.2

date:2013/12/24

This release is a bugfix release which corrects one more bug in the Mechanical Turk objects.

Bugfixes

boto v2.21.1

date:2013/12/23

This release is a bugfix release which corrects how the Mechanical Turk objects work & a threading issue when using datetime.strptime.

Bugfixes

boto v2.21.0

date:2013/12/19

This release adds support for the latest AWS OpsWorks, AWS Elastic Beanstalk, Amazon DynamoDB, Amazon Elastic MapReduce (EMR), Amazon Simple Storage Service (S3), Amazon Elastic Transcoder, AWS CloudTrail, and AWS Support APIs. It also includes documentation and other fixes.

Note

Although Boto now includes support for the newly announced China (Beijing) Region, the service endpoints will not be accessible until the Region’s limited preview is launched in early 2014. To find out more about the new Region and request a limited preview account, please visit http://www.amazonaws.cn/.

Features

Bugfixes

boto v2.20.1

date:2013/12/13

This release fixes an important Amazon EC2 bug related to fetching security credentials via the meta-data service. It is recommended that users of boto-2.20.0 upgrade to boto-2.20.1.

Bugfixes

boto v2.20.0

date:2013/12/12

This release adds support for Amazon Kinesis and AWS Direct Connect. Amazon EC2 gets support for new i2 instance types and is more resilient against metadata failures, Amazon DynamoDB gets support for global secondary indexes and Amazon Relational Database Service (RDS) supports new DBInstance and DBSnapshot attributes. There are several other fixes for various services, including updated support for CloudStack and Eucalyptus.

Features

Bugfixes

boto v2.19.0

date:2013/11/27

This release adds support for max result limits for Amazon EC2 calls, adds support for Amazon RDS database snapshot copies and fixes links to the changelog.

Features

Bugfixes

boto v2.18.0

date:2013/11/22

This release adds support for new AWS Identity and Access Management (IAM), AWS Security Token Service (STS), Elastic Load Balancing (ELB), Amazon Elastic Compute Cloud (EC2), Amazon Relational Database Service (RDS), and Amazon Elastic Transcoder APIs and parameters. Amazon Redshift SNS notifications are now supported. CloudWatch is updated to use signature version four, issues encoding HTTP headers are fixed and several services received documentation fixes.

Features

Bugfixes

boto v2.17.0

date:2013/11/14

This release adds support for the new AWS CloudTrail service and for Amazon Redshift’s new features related to encryption, audit logging, data load from external hosts, WLM configuration, database distribution styles and functions, as well as cross-region snapshot copying.

Features

Bugfixes

  • Add missing argument for Google Storage resumable uploads. (commit b777b62)

boto v2.16.0

date:2013/11/08

This release adds new Amazon Elastic MapReduce functionality, provides updates and fixes for Amazon EC2, Amazon VPC, Amazon DynamoDB, Amazon SQS, Amazon Elastic MapReduce, and documentation updates for several services.

Features

BugFixes

boto v2.15.0

date:2013/10/17

This release adds support for Amazon Elastic Transcoder audio transcoding, new regions for Amazon Simple Storage Service (S3), Amazon Glacier, and Amazon Redshift as well as new parameters in Amazon Simple Queue Service (SQS), Amazon Elastic Compute Cloud (EC2), and the lss3 utility. Also included are documentation updates and fixes for S3, Amazon DynamoDB, Amazon Simple Workflow Service (SWF) and Amazon Marketplace Web Service (MWS).

Features

Bugfixes

boto v2.14.0

date:2013/10/09

This release makes s3put region-aware, adds some missing features to EC2 and SNS, enables EPUB documentation output, and makes the HTTP(S) connection pooling port-aware, which in turn enables connecting to e.g. mock services running on localhost. It also includes support for the latest EC2 and OpsWorks features, as well as several important bugfixes for EC2, DynamoDB, MWS, and Python 2.5 support.

Features

Bugfixes

boto v2.13.3

date:2013/09/16

This release fixes a packaging error with the previous version of boto. The version v2.13.2 was provided instead of 2.13.2, causing things like pip to incorrectly resolve the latest release.

That release was only available for several minutes & was removed from PyPI due to the way it would break installation for users.

boto v2.13.2

date:2013/09/16

This release is a bugfix-only release, correcting several problems in EC2 as well as S3, DynamoDB v2 & SWF.

Note

There was no v2.13.1 release made public. There was a packaging error that was discovered before it was published to PyPI.

We apologise for the fault in the releases. Those responsible have been sacked.

Bugfixes

boto v2.13.0

date:2013/09/12

This release adds support for VPC within AWS OpsWorks, adds dry-run support & the ability to modify reserved instances in EC2, as well as several important bugfixes for EC2, SNS & DynamoDBv2.

Features

  • Added support for VPC within Opsworks. (commit 56e1df3)
  • Added support for dry_run within EC2. (commit dd7774c)
  • Added support for modify_reserved_instances & describe_reserved_instances_modifications within EC2. (commit 7a08672)

Bugfixes

boto v2.12.0

date:2013/09/04

This release adds support for Redis & replication groups to Elasticache as well as several bug fixes.

Features

  • Added support for Redis & replication groups to Elasticache. (commit f744ff6)

Bugfixes

boto v2.11.0

date:2013/08/29

This release adds Public IP address support for VPCs created by EC2. It also makes the GovCloud region available for all services. Finally, this release also fixes a number of bugs.

Features

Bugfixes

boto v2.10.0

date:2013/08/13

This release adds Mobile Push Notification support to Amazon Simple Notification Service, better reporting for Amazon Redshift, SigV4 authorization for Amazon Elastic MapReduce & lots of bugfixes.

Features

  • Added support for Mobile Push Notifications to SNS. This enables you to send push notifications to mobile devices (such as iOS or Android) using SNS. (commit ccba574)
  • Added support for better reporting within Redshift. (commit 9d55dd3)
  • Switched Elastic MapReduce to use SigV4 for authorization. (commit b80aa48)

Bugfixes

boto v2.9.9

date:2013/07/24

This release updates OpsWorks to add AMI & Chef 11 support, adds DBSubnetGroup support in RDS & contains many other bugfixes.

Features

Bugfixes

boto v2.9.8

date:2013/07/18

This release adds new methods in AWS Security Token Service (STS) & AWS CloudFormation, and updates Amazon Relational Database Service (RDS) & Google Storage. It also has several bugfixes & documentation improvements.

Features

Bugfixes

boto v2.9.7

date:2013/07/08

This release is primarily a bugfix release, but also includes support for Elastic Transcoder updates (variable bit rate, max frame rate & watermark features).

Features

  • Added support for selecting specific attributes in DynamoDB v2. (issue 1567, commit d9e5c2)
  • Added support for variable bit rate, max frame rate & watermark features in Elastic Transcoder. (commit 3791c9)

Bugfixes

boto v2.9.6

date:2013/06/18

This release adds large payload support to Amazon SNS/SQS (from 32k to 256k bodies), several minor API additions, new regions for Redshift/Cloudsearch & a host of bugfixes.

Features

  • Added large body support to SNS/SQS. There’s nothing to change in your application code, but you can now send payloads of up to 256k in size. (commit b64947)
  • Added Vault.retrieve_inventory_job to Glacier. (issue 1532, commit 33de29)
  • Added Item.get(...) support to DynamoDB v2. (commit 938cb6)
  • Added the ap-northeast-1 region to Redshift. (commit d3eb61)
  • Added all the current regions to Cloudsearch. (issue 1465, commit 22b3b7)

Bugfixes

boto v2.9.5

date:2013/05/28

This release adds support for web identity federation within the AWS Security Token Service (STS) & fixes several bugs.

Features

  • Added support for web identity federation - You can now delegate token access via either an OAuth 2.0 or OpenID provider. (commit 9bd0a3)

Bugfixes

boto v2.9.4

date:2013/05/20

This release adds updated Elastic Transcoder support & fixes several bugs from recent releases & API updates.

Features

Bugfixes

  • Fixed a bug in the canonicalization of URLs on Windows. (commit 09ef8c)

  • Fixed glacier part size bug (issue 1478, commit 9e04171)

  • Fixed a bug in the bucket regex for S3 involving capital letters. (commit 950031)

  • Fixed a bug where timestamps from Cloudformation would fail to be parsed. (commit b40542)

  • Several documentation improvements/fixes:

boto v2.9.3

date:2013/05/15

This release adds ELB support to Opsworks, optimized EBS support in EC2 AutoScale, Parallel Scan support to DynamoDB v2, a higher-level interface to DynamoDB v2 and API updates to DataPipeline.

Features

  • ELB support in Opsworks - You can now attach & describe the Elastic Load Balancers within the Opsworks client. (commit ecda87)
  • Optimized EBS support in EC2 AutoScale - You can now specify whether an AutoScale instance should be optimized for EBS I/O. (commit f8acaa)
  • Parallel Scan support in DynamoDB v2 - If you have extra read capacity & a large amount of data, you can scan over the records in parallel by telling DynamoDB to split the table into segments, then spinning up threads/processes to each run over their own segment (see the sketch after this list). (commit db7f7b & commit 7ed73c)
  • Higher-level interface to DynamoDB v2 - A more convenient API for using DynamoDB v2. The DynamoDB v2 Tutorial has more information on how to use the new API. (commit 0f7c8b)
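
A minimal sketch of the parallel scan idea described above (the users table is a placeholder, and the segment/total_segments keyword arguments are assumed to mirror the DynamoDB Scan API parameters; each thread or process would run over one segment):

>>> from boto.dynamodb2.table import Table
>>> users = Table('users')
>>> # This worker handles segment 0 of 4; three more workers would
>>> # scan segments 1, 2 and 3 concurrently.
>>> for user in users.scan(segment=0, total_segments=4):
...     print(user['username'])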

Backward-Incompatible Changes

  • API Update for DataPipeline - The error_code (integer) argument to set_task_status changed to error_id (string). Many documentation updates were also added. (commit a78572)

Bugfixes

  • Bumped the AWS Support API version. (commit 0323f4)

  • Fixed the S3 ResumableDownloadHandler so that it no longer tries to use a hashing algorithm when used outside of GCS. (commit 29b046)

  • Fixed a bug where Sig V4 URIs were improperly canonicalized. (commit 5269d8)

  • Fixed a bug where Sig V4 ports were not included. (commit cfaba3)

  • Fixed a bug in CloudWatch’s build_put_params that would overwrite existing/necessary variables. (commit 550e00)

  • Several documentation improvements/fixes:

boto v2.9.2

date:2013/04/30

A hotfix release that adds the boto.support module to setup.py.

Features

  • None.

Bugfixes

boto v2.9.1

date:2013/04/30

Primarily a bugfix release, this release also includes support for the new AWS Support API.

Features

  • AWS Support API - A client was added to support the new AWS Support API. It gives programmatic access to Support cases opened with AWS. A short example might look like:

    >>> from boto.support.layer1 import SupportConnection
    >>> conn = SupportConnection()
    >>> new_case = conn.create_case(
    ...     subject='Description of the issue',
    ...     service_code='amazon-cloudsearch',
    ...     category_code='performance',
    ...     communication_body="We're seeing some latency from one of our...",
    ...     severity_code='low'
    ... )
    >>> new_case['caseId']
    u'case-...'
    

    The Support Tutorial has more information on how to use the new API. (commit 8c0451)

Bugfixes

  • The reintroduction of ResumableUploadHandler.get_upload_id that was accidentally removed in a previous commit. (commit 758322)

  • Added OrdinaryCallingFormat to support Google Storage’s certificate verification. (commit 4ca83b)

  • Added the eu-west-1 region for Redshift. (commit e98b95)

  • Added support for overriding the port any connection in boto uses. (commit 08e893)

  • Added retry/checksumming support to the DynamoDB v2 client. (commit 969ae2)

  • Several documentation improvements/fixes:

boto v2.9.0

The 2.9.0 release of boto is now available on PyPI.

You can get a comprehensive list of all commits made between the 2.8.0 release and the 2.9.0 release at https://github.com/boto/boto/compare/2.8.0...2.9.0.

This release includes:

  • Support for Amazon Redshift
  • Support for Amazon DynamoDB’s new API
  • Support for AWS Opsworks
  • Add copy_image to EC2 (AMI copy)
  • Add describe_account_attributes and describe_vpc_attribute, and modify_vpc_attribute operations to EC2.

There were 240 commits made by 34 different authors:

  • g2harris
  • Michael Barrett
  • Pascal Hakim
  • James Saryerwinnie
  • Mitch Garnaat
  • ChangMin Jeon
  • Mike Schwartz
  • Jeremy Katz
  • Alex Schoof
  • reinhillmann
  • Travis Hobrla
  • Zach Wilt
  • Daniel Lindsley
  • ksacry
  • Michael Wirth
  • Eric Smalling
  • pingwin
  • Chris Moyer
  • Olivier Hervieu
  • Iuri de Silvio
  • Joe Sondow
  • Max Noel
  • Nate
  • Chris Moyer
  • Lars Otten
  • Nathan Grigg
  • Rein Hillmann
  • Øyvind Saltvik
  • Rayson HO
  • Martin Matusiak
  • Royce Remer
  • Jeff Terrace
  • Yaniv Ovadia
  • Eduardo S. Klein

boto v2.8.0

The 2.8.0 release of boto is now available on PyPI.

You can get a comprehensive list of all commits made between the 2.7.0 release and the 2.8.0 release at https://github.com/boto/boto/compare/2.7.0...2.8.0.

This release includes:

  • Added support for Amazon Elasticache
  • Added support for Amazon Elastic Transcoding Service

As well as numerous bug fixes and improvements.

Commits

There were 115 commits in this release from 21 different authors. The authors are listed below, in alphabetical order:

  • conorbranagan
  • dkavanagh
  • gaige
  • garnaat
  • halfaleague
  • jamesls
  • jjhooper
  • jordansissel
  • jterrace
  • Kodiologist
  • kopertop
  • mfschwartz
  • nathan11g
  • pasc
  • phobologic
  • schworer
  • seandst
  • SirAlvarex
  • Yaniv Ovadia
  • yig
  • yovadia12

boto v2.7.0

The 2.7.0 release of boto is now available on PyPI.

You can get a comprehensive list of all commits made between the 2.6.0 release and the 2.7.0 release at https://github.com/boto/boto/compare/2.6.0...2.7.0.

This release includes:

  • Added support for AWS Data Pipeline - commit 999902
  • Integrated Slick53 into Route53 module - issue 1186
  • Add ability to use Decimal for DynamoDB numeric types - issue 1183
  • Query/Scan Count/ScannedCount support and TableGenerator improvements - issue 1181
  • Added support for keyring in config files - issue 1157
  • Add concurrent downloader to glacier - issue 1106
  • Add support for tagged RDS DBInstances - issue 1050
  • Updating RDS API Version to 2012-09-17 - issue 1033
  • Added support for provisioned IOPS for RDS - issue 1028
  • Add ability to set SQS Notifications in Mechanical Turk - issue 1018

Commits

There were 447 commits in this release from 60 different authors. The authors are listed below, in alphabetical order:

  • acrefoot
  • Alex Schoof
  • Andy Davidoff
  • anoopj
  • Benoit Dubertret
  • bobveznat
  • dahlia
  • dangra
  • disruptek
  • dmcritchie
  • emtrane
  • focus
  • fsouza
  • g2harris
  • garnaat
  • georgegoh
  • georgesequeira
  • GitsMcGee
  • glance-
  • gtaylor
  • hashbackup
  • hinnerk
  • hoov
  • isaacbowen
  • jamesls
  • JerryKwan
  • jimfulton
  • jimbrowne
  • jorourke
  • jterrace
  • jtriley
  • katzj
  • kennu
  • kevinburke
  • khagler
  • Kodiologist
  • kopertop
  • kotnik
  • Leftium
  • lpetc
  • marknca
  • matthewandrews
  • mfschwartz
  • mikek
  • mkmt
  • mleonhard
  • mraposa
  • oozie
  • phunter
  • potix2
  • Rafael Cunha de Almeida
  • reinhillmann
  • reversefold
  • Robie Basak
  • seandst
  • siroken3
  • staer
  • tpodowd
  • vladimir-sol
  • yovadia12

boto v2.6.0

The 2.6.0 release of boto is now available on PyPI.

You can get a comprehensive list of all commits made between the 2.5.2 release and the 2.6.0 release at https://github.com/boto/boto/compare/2.5.2...2.6.0.

This release includes:

  • Support for Amazon Glacier
  • Support for AWS Elastic Beanstalk
  • CORS support for Amazon S3
  • Support for Reserved Instances Resale in Amazon EC2
  • Support for IAM Roles

SSL Certificate Verification

In addition, this release of boto changes the default behavior with respect to SSL certificate verification. Our friends at Google contributed code to boto well over a year ago that implemented SSL certificate verification. At the time, we felt the most prudent course of action was to make this feature an opt-in but we always felt that at some time in the future we would enable cert verification as the default behavior. Well, that time is now!

However, in implementing this change, we came across a bug in Python for all versions prior to 2.7.3 (see http://bugs.python.org/issue13034 for details). The net result of this bug is that Python is able to check only the commonName in the SSL cert for verification purposes. Any subjectAltNames are ignored in large SSL keys. So, in addition to enabling verification as the default behavior we also changed some of the service endpoints in boto to match the commonName in the SSL certificate.

If you want to disable verification for any reason (not advised, btw) you can still do so by editing your boto config file (see https://gist.github.com/3762068) or you can override it by passing validate_certs=False to the Connection class constructor or the connect_* function.
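
For example, a minimal sketch of the connection-level override mentioned above (S3 is used purely as an illustration):

>>> import boto
>>> # Disable SSL certificate verification for this connection only
>>> # (not advised; shown solely to illustrate the override).
>>> s3 = boto.connect_s3(validate_certs=False)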

Commits

There were 440 commits in this release from 53 different authors. The authors are listed below, in alphabetical order:

  • acorley
  • acrefoot
  • aedeph
  • allardhoeve
  • almost
  • awatts
  • buzztroll
  • cadams
  • cbednarski
  • cosmin
  • dangra
  • darjus-amzn
  • disruptek
  • djw
  • garnaat
  • gertjanol
  • gimbel0893
  • gochist
  • graphaelli
  • gtaylor
  • gz
  • hardys
  • jamesls
  • jijojv
  • jimbrowne
  • jtlebigot
  • jtriley
  • kopertop
  • kotnik
  • marknca
  • mark_nunnikhoven
  • mfschwartz
  • moliware
  • NeilW
  • nkvoll
  • nsitarz
  • ohe
  • pasieronen
  • patricklucas
  • pfig
  • rajivnavada
  • reversefold
  • robie
  • scott
  • shawnps
  • smoser
  • sopel
  • staer
  • tedder
  • yamatt
  • Yossi
  • yovadia12
  • zachhuff386

boto v2.5.2

Release 2.5.2 is a bugfix release. It fixes the following critical issue:

  • issue 830

This issue only affects you if you are using DynamoDB on an EC2 instance with IAM Roles.

boto v2.5.1

Release 2.5.1 is a bugfix release. It fixes the following critical issue:

  • issue 819

boto v2.5.0

The 2.5.0 release of boto is now available on PyPI.

You can get a comprehensive list of all commits made between the 2.4.1 release and the 2.5.0 release at https://github.com/boto/boto/compare/2.4.1...2.5.0.

This release includes:

  • Support for IAM Roles for EC2 Instances
  • Added support for Capabilities in CloudFormation
  • Spot instances in autoscaling groups
  • Internal ELBs
  • Added tenancy option to run_instances

There were 77 commits in this release from 18 different authors. The authors are listed below, in no particular order:

  • jimbrowne
  • cosmin
  • gtaylor
  • garnaat
  • brianjaystanley
  • jamesls
  • trevorsummerssmith
  • Bryan Donlan
  • davidmarble
  • jtriley
  • rdodev
  • toby
  • tpodowd
  • srs81
  • mfschwartz
  • rdegges
  • gholms

boto v2.4.0

The 2.4.0 release of boto is now available on PyPI.

You can get a comprehensive list of all commits made between the 2.3.0 release and the 2.4.0 release at https://github.com/boto/boto/compare/2.3.0...2.4.0.

This release includes:

  • Initial support for Amazon Cloudsearch Service.
  • Support for Amazon’s Marketplace Web Service.
  • Latency-based routing for Route53
  • Support for new domain verification features of SES.
  • A full rewrite of the FPS module.
  • Support for BatchWriteItem in DynamoDB.
  • Additional EMR steps for installing and running Pig scripts.
  • Support for additional batch operations in SQS.
  • Better support for VPC group-ids.
  • Many, many bugfixes from the community. Thanks for the reports and pull requests!

There were 175 commits in this release from 32 different authors. The authors are listed below, in no particular order:

  • estebistec
  • tpodowd
  • Max Noel
  • garnaat
  • mfschwartz
  • jtriley
  • akoumjian
  • jreese
  • mulka
  • Nuutti Kotivuori
  • mboersma
  • ryansb
  • dampier
  • crschmidt
  • nithint
  • sievlev
  • eckamm
  • imlucas
  • disruptek
  • trevorsummerssmith
  • tmorgan
  • evanworley
  • iandanforth
  • oozie
  • aedeph
  • alexanderdean
  • abrinsmead
  • dlecocq
  • bsimpson63
  • jamesls
  • cosmin
  • gtaylor

boto v2.3.0

The 2.3.0 release of boto is now available on PyPI.

You can view a list of issues that have been closed in this release at https://github.com/boto/boto/issues?milestone=6&state=closed.

You can get a comprehensive list of all commits made between the 2.2.2 release and the 2.3.0 release at https://github.com/boto/boto/compare/2.2.2...2.3.0.

This release includes initial support for Amazon Simple Workflow Service.

The API version of the FPS module was updated to 2010-08-28.

This release also includes many bug fixes and improvements in the Amazon DynamoDB module. One change of particular note is the behavior of the new_item method of the Table object. See http://readthedocs.org/docs/boto/en/2.3.0/ref/dynamodb.html#module-boto.dynamodb.table for more details.

There were 109 commits in this release from 21 different authors. The authors are listed below, in no particular order:

  • theju
  • garnaat
  • rdodev
  • mfschwartz
  • kopertop
  • tpodowd
  • gtaylor
  • kachok
  • croach
  • tmorgan
  • Erick Fejta
  • dherbst
  • marccohen
  • Arif Amirani
  • yuzeh
  • Roguelazer
  • awblocker
  • blinsay
  • Peter Broadwell
  • tierney
  • georgekola

boto v2.2.2

The 2.2.2 release of boto is now available on PyPI.

You can view a list of issues that have been closed in this release at https://github.com/boto/boto/issues?milestone=8&state=closed.

You can get a comprehensive list of all commits made between the 2.2.1 release and the 2.2.2 release at https://github.com/boto/boto/compare/2.2.1...2.2.2.

This is a bugfix release.

There were 71 commits in this release from 11 different authors. The authors are listed below, in no particular order:

  • aficionado
  • jimbrowne
  • rdodev
  • milancermak
  • garnaat
  • kopertop
  • samuraisam
  • tpodowd
  • psa
  • mfschwartz
  • gtaylor

boto v2.2.1

The 2.2.1 release fixes a packaging problem that was causing problems when installing via pip.

boto v2.2.0

The 2.2.0 release of boto is now available on PyPI.

You can view a list of issues that have been closed in this release at https://github.com/boto/boto/issues?milestone=5&state=closed.

You can get a comprehensive list of all commits made between the 2.1.1 release and the 2.2.0 release at https://github.com/boto/boto/compare/fa0d6a1e49c8468abbe2c99cdc9f5fd8fd19f8f8...26c8eb108873bf8ce1b9d96d642eea2beef78c77.

Some highlights of this release:

  • Support for Amazon DynamoDB service.
  • Support for S3 Object Lifecycle (Expiration).
  • Allow anonymous request for S3.
  • Support for creating Load Balancers in VPC.
  • Support for multi-dimension metrics in CloudWatch.
  • Support for Elastic Network Interfaces in EC2.
  • Support for Amazon S3 Multi-Delete capability.
  • Support for new AMI version and overriding of parameters in EMR.
  • Support for SendMessageBatch request in SQS.
  • Support for DescribeInstanceStatus request in EC2.
  • Many, many improvements and additions to API documentation and Tutorials. Special thanks to Greg Taylor for all of the Sphinx cleanups and new docs.

There were 336 commits in this release from 40 different authors. The authors are listed below, in no particular order:

  • Garrett Holmstrom
  • mLewisLogic
  • Warren Turkal
  • Nathan Binkert
  • Scott Moser
  • Jeremy Edberg
  • najeira
  • Marc Cohen
  • Jim Browne
  • Mitch Garnaat
  • David Ormsbee
  • Blake Maltby
  • Thomas O’Dowd
  • Victor Trac
  • David Marin
  • Greg Taylor
  • rdodev
  • Jonathan Sabo
  • rdoci
  • Mike Schwartz
  • l33twolf
  • Keith Fitzgerald
  • Oleksandr Gituliar
  • Jason Allum
  • Ilya Volodarsky
  • Rajesh
  • Felipe Reyes
  • Andy Grimm
  • Seth Davis
  • Dave King
  • andy
  • Chris Moyer
  • ruben
  • Spike Gronim
  • Daniel Norberg
  • Justin Riley
  • Milan Cermak timtebeek
  • unknown
  • Yotam Gingold
  • Brian Oldfield

We processed 21 pull requests for this release from 40 different contributors. Here are the GitHub user IDs for all of the pull request authors:

  • milancermak
  • jsabo
  • gituliar
  • rdodev
  • marccohen
  • tpodowd
  • trun
  • jallum
  • binkert
  • ormsbee
  • timtebeek

boto v2.1.1

The 2.1.1 release fixes one serious issue with the RDS module.

https://github.com/boto/boto/issues/382

boto v2.1.0

The 2.1.0 release of boto is now available on PyPI and Google Code.

You can view a list of issues that have been closed in this release at https://github.com/boto/boto/issues?milestone=4&state=closed.

You can get a comprehensive list of all commits made between the 2.0 release and the 2.1.0 release at https://github.com/boto/boto/compare/033457f30d...a0a1fd54ef.

Some highlights of this release:

  • Server-side encryption now supported in S3.
  • Better support for VPC in EC2.
  • Support for combiner in StreamingStep for EMR.
  • Support for CloudFormations.
  • Support for streaming uploads to Google Storage.
  • Support for generating signed URLs in CloudFront.
  • MTurk connection now uses HTTPS by default, like all other Connection objects.
  • You can now PUT multiple data points to CloudWatch in one call.
  • CloudWatch Dimension object now correctly supports multiple values for same dimension name.
  • Lots of documentation fixes/additions

There were 235 commits in this release from 35 different authors. The authors are listed below, in no particular order:

  • Erick Fejta
  • Joel Barciauskas
  • Matthew Tai
  • Hyunjung Park
  • Mitch Garnaat
  • Victor Trac
  • Andy Grimm
  • ZerothAngel
  • Dan Lecocq
  • jmallen
  • Greg Taylor
  • Brian Grossman
  • Marc Brinkmann
  • Hunter Blanks
  • Steve Johnson
  • Keith Fitzgerald
  • Kamil Klimkiewicz
  • Eddie Hebert
  • garnaat
  • Samuel Lucidi
  • Kazuhiro Ogura
  • David Arthur
  • Michael Budde
  • Vineeth Pillai
  • Trevor Pounds
  • Mike Schwartz
  • Ryan Brown
  • Mark
  • Chetan Sarva
  • Dan Callahan
  • INADA Naoki
  • Mitchell Hashimoto
  • Chris Moyer
  • Riobard
  • Ted Romer
  • Justin Riley
  • Brian Beach
  • Simon Ratner

We processed 60 pull requests for this release from 40 different contributors. Here are the GitHub user IDs for all of the pull request authors:

  • jtriley
  • mbr
  • jbarciauskas
  • hyunjung
  • bugi
  • ryansb
  • gtaylor
  • ehazlett
  • secretmike
  • riobard
  • simonratner
  • irskep
  • sanbornm
  • methane
  • jumping
  • mansam
  • miGlanz
  • dlecocq
  • fdr
  • mitchellh
  • ehebert
  • memory
  • hblanks
  • mbudde
  • ZerothAngel
  • goura
  • natedub
  • tpounds
  • bwbeach
  • mumrah
  • chetan
  • jmallen
  • a13m
  • mtai
  • fejta
  • jibs
  • callahad
  • vineethrp
  • JDrosdeck
  • gholms

If you are trying to reconcile that data (i.e. 35 different authors and 40 users with pull requests), well so am I. I’m just reporting on the data that I get from the Github api 8^)

Release Notes for boto 2.0

Highlights

There have been many, many changes since the 2.0b4 release. This overview highlights some of those changes.

  • Fix connection pooling bug: don’t close before reading.
  • Added AddInstanceGroup and ModifyInstanceGroup to boto.emr
  • Merge pull request #246 from chetan/multipart_s3put
  • AddInstanceGroupsResponse class to boto.emr.emrobject.
  • Removed extra print statement
  • Merge pull request #244 from ryansb/master
  • Added add_instance_groups function to boto.emr.connection. Built some helper methods for it, and added AddInstanceGroupsResponse class to boto.emr.emrobject.
  • Added a new class, InstanceGroup, with just a __init__ and __repr__.
  • Adding support for GetLoginProfile request to IAM. Removing commented lines in connection.py. Fixes GoogleCode issue 532.
  • Fixed issue #195
  • Added correct sax reader for boto.emr.emrobject.BootstrapAction
  • Fixed a typo bug in ConsoleOutput sax parsing and some PEP8 cleanup in connection.py.
  • Added initial support for generating a registration url for the aws marketplace
  • Fix add_record and del_record to support multiple values, like change_record does
  • Add support to accept SecurityGroupId as a parameter for ec2 run instances. This is required to create EC2 instances under VPC security groups
  • Added support for aliases to the add_change method of ResourceRecordSets.
  • Resign each request in a retry situation. Some services are starting to incorporate replay detection algorithms and the boto approach of simply re-trying the original request triggers them. Also a small bug fix to roboto and added a delay in the ec2 test to wait for consistency.
  • Fixed a problem with InstanceMonitoring parameter of LaunchConfigurations for autoscale module.
  • Route 53 Alias Resource Record Sets
  • Fixed App Engine support
  • Fixed incorrect host on App Engine
  • Fixed issue 199 on github.
  • First pass at put_metric_data
  • Changed boto.s3.Bucket.set_acl_xml() to ISO-8859-1 encode the Unicode ACL text before sending over HTTP connection.
  • Added GetQualificationScore for mturk.
  • Added UpdateQualificationScore for mturk
  • import_key_pair base64 fix
  • Fixes for ses send_email method better handling of exceptions
  • Add optional support for SSL server certificate validation.
  • Specify a reasonable socket timeout for httplib
  • Support for ap-northeast-1 region
  • Close issue #153
  • Close issue #154
  • we must POST autoscale user-data, not GET. otherwise a HTTP 505 error is returned from AWS. see: http://groups.google.com/group/boto-dev/browse_thread/thread/d5eb79c97ea8eecf?pli=1
  • autoscale userdata needs to be base64 encoded.
  • Use the unversioned streaming jar symlink provided by EMR
  • Updated lss3 to allow for prefix based listing (more like actual ls)
  • Deal with the groupSet element that appears in the instanceSet element in the DescribeInstances response.
  • Add a change_record command to bin/route53
  • Incorporating a patch from AWS to allow security groups to be tagged.
  • Fixed an issue with extra headers in generated URLs. Fixes http://code.google.com/p/boto/issues/detail?id=499
  • Incorporating a patch to handle obscure bug in apache/fastcgi. See http://goo.gl/0Tdax.
  • Reorganizing the existing test code. Part of a long-term project to completely revamp and improve boto tests.
  • Fixed an invalid parameter bug (ECS) #102
  • Adding initial cut at s3 website support.

Stats

  • 465 commits since boto 2.0b4
  • 70 authors
  • 111 Pull requests from 64 different authors

Contributors (in order of last commits)

  • Mitch Garnaat
  • Chris Moyer
  • Garrett Holmstrom
  • Justin Riley
  • Steve Johnson
  • Sean Talts
  • Brian Beach
  • Ryan Brown
  • Chetan Sarva
  • spenczar
  • Jonathan Drosdeck
  • garnaat
  • Nathaniel Moseley
  • Bradley Ayers
  • jibs
  • Kenneth Falck
  • chirag
  • Sean O’Connor
  • Scott Moser
  • Vineeth Pillai
  • Greg Taylor
  • root
  • darktable
  • flipkin
  • brimcfadden
  • Samuel Lucidi
  • Terence Honles
  • Mike Schwartz
  • Waldemar Kornewald
  • Lucas Hrabovsky
  • thaDude
  • Vinicius Ruan Cainelli
  • David Marin
  • Stanislav Ievlev
  • Victor Trac
  • Dan Fairs
  • David Pisoni
  • Matt Robenolt
  • Matt Billenstein
  • rgrp
  • vikalp
  • Christoph Kern
  • Gabriel Monroy
  • Ben Burry
  • Hinnerk
  • Jann Kleen
  • Louis R. Marascio
  • Matt Singleton
  • David Park
  • Nick Tarleton
  • Cory Mintz
  • Robert Mela
  • rlotun
  • John Walsh
  • Keith Fitzgerald
  • Pierre Riteau
  • ryancustommade
  • Fabian Topfstedt
  • Michael Thompson
  • sanbornm
  • Seth Golub
  • Jon Colverson
  • Steve Howard
  • Roberto Gaiser
  • James Downs
  • Gleicon Moraes
  • Blake Maltby
  • Mac Morgan
  • Rytis Sileika
  • winhamwr

Major changes for release 2.0b1

  • Support for versioning in S3
  • Support for MFA Delete in S3
  • Support for Elastic Map Reduce
  • Support for Simple Notification Service
  • Support for Google Storage
  • Support for Consistent Reads and Conditional Puts in SimpleDB
  • Significant updates and improvements to Mechanical Turk (mturk) module
  • Support for Windows Bundle Tasks in EC2
  • Support for Reduced Redundancy Storage (RRS) in S3
  • Support for Cluster Computing instances and Placement Groups in EC2

Getting Started with Boto

This tutorial will walk you through installing and configuring boto, as well as how to use it to make API calls.

This tutorial assumes you are familiar with Python & that you have registered for an Amazon Web Services account. You’ll need to retrieve your Access Key ID and Secret Access Key from the web-based console.

Installing Boto

You can use pip to install the latest released version of boto:

pip install boto

If you want to install boto from source:

git clone git://github.com/boto/boto.git
cd boto
python setup.py install

Note

For most services, this is enough to get going. However, to support everything Boto ships with, you should additionally run pip install -r requirements.txt.

This installs all additional, non-stdlib modules, enabling use of things like boto.cloudsearch, boto.manage & boto.mashups, as well as covering everything needed for the test suite.

Using Virtual Environments

Another common way to install boto is to use a virtualenv, which provides isolated environments. First, install the virtualenv Python package:

pip install virtualenv

Next, create a virtual environment by using the virtualenv command and specifying where you want the virtualenv to be created (you can specify any directory you like, though this example allows for compatibility with virtualenvwrapper):

mkdir ~/.virtualenvs
virtualenv ~/.virtualenvs/boto

You can now activate the virtual environment:

source ~/.virtualenvs/boto/bin/activate

Now, any usage of python or pip (within the current shell) will default to the new, isolated version within your virtualenv.

You can now install boto into this virtual environment:

pip install boto

When you are done using boto, you can deactivate your virtual environment:

deactivate

If you are creating a lot of virtual environments, virtualenvwrapper is an excellent tool that lets you easily manage your virtual environments.

Configuring Boto Credentials

You have a few options for configuring boto (see Boto Config). For this tutorial, we’ll be using a configuration file. First, create a ~/.boto file with these contents:

[Credentials]
aws_access_key_id = YOURACCESSKEY
aws_secret_access_key = YOURSECRETKEY

boto supports a number of configuration values. For more information, see Boto Config. The above file, however, is all we need for now. You’re now ready to use boto.

Making Connections

boto provides a number of convenience functions to simplify connecting to a service. For example, to work with S3, you can run:

>>> import boto
>>> s3 = boto.connect_s3()

If you want to connect to a different region, you can import the service module and use its connect_to_region function. For example, to create an EC2 client in the us-west-2 region, you’d run the following:

>>> import boto.ec2
>>> ec2 = boto.ec2.connect_to_region('us-west-2')

Troubleshooting Connections

When calling the various connect_* functions, you might run into an error like this:

>>> import boto
>>> s3 = boto.connect_s3()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "boto/__init__.py", line 121, in connect_s3
    return S3Connection(aws_access_key_id, aws_secret_access_key, **kwargs)
  File "boto/s3/connection.py", line 171, in __init__
    validate_certs=validate_certs)
  File "boto/connection.py", line 548, in __init__
    host, config, self.provider, self._required_auth_capability())
  File "boto/auth.py", line 668, in get_auth_handler
    'Check your credentials' % (len(names), str(names)))
boto.exception.NoAuthHandlerFound: No handler was ready to authenticate. 1 handlers were checked. ['HmacAuthV1Handler'] Check your credentials

This is because boto cannot find credentials to use. Verify that you have created a ~/.boto file as shown above. You can also turn on debug logging to verify where your credentials are coming from:

>>> import boto
>>> boto.set_stream_logger('boto')
>>> s3 = boto.connect_s3()
2012-12-10 17:15:03,799 boto [DEBUG]:Using access key found in config file.
2012-12-10 17:15:03,799 boto [DEBUG]:Using secret key found in config file.

Interacting with AWS Services

Once you have a client for the specific service you want, there are methods on that object that will invoke API operations for that service. The following code demonstrates how to create a bucket and put an object in that bucket:

>>> import boto
>>> import time
>>> s3 = boto.connect_s3()

# Create a new bucket. Buckets must have a globally unique name (not just
# unique to your account).
>>> bucket = s3.create_bucket('boto-demo-%s' % int(time.time()))

# Create a new key/value pair.
>>> key = bucket.new_key('mykey')
>>> key.set_contents_from_string("Hello World!")

# Sleep to ensure the data is eventually there.
>>> time.sleep(2)

# Retrieve the contents of ``mykey``.
>>> print key.get_contents_as_string()
'Hello World!'

# Delete the key.
>>> key.delete()
# Delete the bucket.
>>> bucket.delete()

Each service supports a different set of commands. You’ll want to refer to the other guides & API references in this documentation, as well as to the official AWS API documentation.

Next Steps

For many of the services that boto supports, there are tutorials as well as detailed API documentation. If you are interested in a specific service, the tutorial for the service is a good starting point. For instance, if you’d like more information on S3, check out the S3 Tutorial and the S3 API reference.

An Introduction to boto’s EC2 interface

This tutorial focuses on the boto interface to the Elastic Compute Cloud from Amazon Web Services. This tutorial assumes that you have already downloaded and installed boto.

Creating a Connection

The first step in accessing EC2 is to create a connection to the service. The recommended way of doing this in boto is:

>>> import boto.ec2
>>> conn = boto.ec2.connect_to_region("us-west-2",
...    aws_access_key_id='<aws access key>',
...    aws_secret_access_key='<aws secret key>')

At this point the variable conn will point to an EC2Connection object. In this example, the AWS access key and AWS secret key are passed in to the method explicitly. Alternatively, you can set the boto config environment variables and then simply specify which region you want as follows:

>>> conn = boto.ec2.connect_to_region("us-west-2")

In either case, conn will point to an EC2Connection object which we will use throughout the remainder of this tutorial.

Launching Instances

Possibly the most important and common task you’ll use EC2 for is to launch, stop and terminate instances. In its most primitive form, you can launch an instance as follows:

>>> conn.run_instances('<ami-image-id>')

This will launch an instance in the specified region with the default parameters. You will not be able to SSH into this machine, as it doesn’t have a security group set. See EC2 Security Groups for details on creating one.

Now, let’s say that you already have a key pair, want a specific type of instance, and you have your security group all set up. In this case we can use the keyword arguments to accomplish that:

>>> conn.run_instances(
        '<ami-image-id>',
        key_name='myKey',
        instance_type='c1.xlarge',
        security_groups=['your-security-group-here'])

The main caveat with the above call is that it is possible to request an instance type that is not compatible with the provided AMI (for example, the AMI was created for a 64-bit instance and you choose an m1.small instance_type). For more details on the plethora of possible keyword parameters, be sure to check out boto’s EC2 API reference.

Stopping Instances

Once you have your instances up and running, you might wish to shut them down if they’re not in use. Please note that this will only de-allocate virtual hardware resources (as well as instance store drives), but won’t destroy your EBS volumes – this means you’ll pay nominal provisioned EBS storage fees even if your instance is stopped. You can do so as follows:

>>> conn.stop_instances(instance_ids=['instance-id-1','instance-id-2', ...])

This will request a ‘graceful’ stop of each of the specified instances. If you wish to request the equivalent of unplugging your instance(s), simply add the force=True keyword argument to the call above. Please note that stopping an instance is not allowed with Spot instances.
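
For example, a forced stop might look like this (the instance ID is a placeholder):

>>> conn.stop_instances(instance_ids=['instance-id-1'], force=True)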

Terminating Instances

Once you are completely done with your instance and wish to surrender the virtual hardware, root EBS volume and all other underlying components, you can request instance termination. To do so, use the call below:

>>> conn.terminate_instances(instance_ids=['instance-id-1','instance-id-2', ...])

Please use with care since once you request termination for an instance there is no turning back.

Checking What Instances Are Running

You can also get information on your currently running instances:

>>> reservations = conn.get_all_reservations()
>>> reservations
[Reservation:r-00000000]

A reservation corresponds to a command to start instances. You can see what instances are associated with a reservation:

>>> instances = reservations[0].instances
>>> instances
[Instance:i-00000000]

An instance object allows you to get more metadata about the instance:

>>> inst = instances[0]
>>> inst.instance_type
u'c1.xlarge'
>>> inst.placement
u'us-west-2'

In this case, we can see that our instance is a c1.xlarge instance in the us-west-2 availability zone.

Checking Health Status Of Instances

You can also get the health status of your instances, including any scheduled events:

>>> statuses = conn.get_all_instance_status()
>>> statuses
[InstanceStatus:i-00000000]

An instance status object allows you to get information about impaired functionality or scheduled / system maintenance events:

>>> status = statuses[0]
>>> status.events
[Event:instance-reboot]
>>> event = status.events[0]
>>> event.description
u'Maintenance software update.'
>>> event.not_before
u'2011-12-11T04:00:00.000Z'
>>> event.not_after
u'2011-12-11T10:00:00.000Z'
>>> status.instance_status
Status:ok
>>> status.system_status
Status:ok
>>> status.system_status.details
{u'reachability': u'passed'}

This will by default include the health status only for running instances. If you wish to request the health status for all instances, simply add the include_all_instances=True keyword argument to the call above.
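
For example:

>>> statuses = conn.get_all_instance_status(include_all_instances=True)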

Using Elastic Block Storage (EBS)

EBS Basics

EBS can be used by EC2 instances for permanent storage. Note that EBS volumes must be in the same availability zone as the EC2 instance you wish to attach them to.

To actually create a volume you will need to specify a few details. The following example will create a 50GB EBS volume in one of the us-west-2 availability zones:

>>> vol = conn.create_volume(50, "us-west-2")
>>> vol
Volume:vol-00000000

You can check that the volume is now ready and available:

>>> curr_vol = conn.get_all_volumes([vol.id])[0]
>>> curr_vol.status
u'available'
>>> curr_vol.zone
u'us-west-2'

We can now attach this volume to the EC2 instance we created earlier, making it available as a new device:

>>> conn.attach_volume(vol.id, inst.id, "/dev/sdx")
u'attaching'

You will now have a new volume attached to your instance. Note that with some Linux kernels, /dev/sdx may get translated to /dev/xvdx. This device can now be used as a normal block device within Linux.

Working With Snapshots

Snapshots allow you to capture a point-in-time copy of an EBS volume for future recovery, to create incremental backups, and to instantiate multiple new volumes. Snapshots can also be used to move EBS volumes across availability zones or to make backups to S3.

Creating a snapshot is easy:

>>> snapshot = conn.create_snapshot(vol.id, 'My snapshot')
>>> snapshot
Snapshot:snap-00000000

Once you have a snapshot, you can create a new volume from it. Volumes are created lazily from snapshots, which means you can start using such a volume straight away:

>>> new_vol = snapshot.create_volume('us-west-2')
>>> conn.attach_volume(new_vol.id, inst.id, "/dev/sdy")
u'attaching'

If you no longer need a snapshot, you can also easily delete it:

>>> conn.delete_snapshot(snapshot.id)
True

Working With Launch Configurations

Launch Configurations allow you to create a re-usable set of properties for an instance. These are used with AutoScaling groups to produce consistent, repeatable instance sets.

Creating a Launch Configuration is easy:

>>> from boto.ec2.autoscale import LaunchConfiguration
>>> conn = boto.connect_autoscale()
>>> config = LaunchConfiguration(name='foo', image_id='ami-abcd1234', key_name='foo.pem')
>>> conn.create_launch_configuration(config)

Once you have a launch configuration, you can list your current configurations:

>>> conn = boto.connect_autoscale()
>>> config = conn.get_all_launch_configurations(names=['foo'])

If you no longer need a launch configuration, you can delete it:

>>> conn = boto.connect_autoscale()
>>> conn.delete_launch_configuration('foo')

Changed in version 2.27.0.

Note

If use_block_device_types=True is passed to the connection it will deserialize Launch Configurations with Block Device Mappings into a re-usable format with BlockDeviceType objects, similar to how AMIs are deserialized currently. Legacy behavior is to put them into a format that is incompatible with creating new Launch Configurations. This switch is in place to preserve backwards compatibility, but the new format is the preferred one going forward.

If you would like to use the new format, you should use something like:

>>> conn = boto.connect_autoscale(use_block_device_types=True)
>>> config = conn.get_all_launch_configurations(names=['foo'])

EC2 Security Groups

Amazon defines a security group as:

“A security group is a named collection of access rules. These access rules
specify which ingress, i.e. incoming, network traffic should be delivered to your instance.”

To get a listing of all currently defined security groups:

>>> rs = conn.get_all_security_groups()
>>> print rs
[SecurityGroup:appserver, SecurityGroup:default, SecurityGroup:vnc, SecurityGroup:webserver]

Each security group can have an arbitrary number of rules which represent different network ports which are being enabled. To find the rules for a particular security group, use the rules attribute:

>>> sg = rs[1]
>>> sg.name
u'default'
>>> sg.rules
[IPPermissions:tcp(0-65535),
 IPPermissions:udp(0-65535),
 IPPermissions:icmp(-1--1),
 IPPermissions:tcp(22-22),
 IPPermissions:tcp(80-80)]

In addition to listing the available security groups you can also create a new security group. I’ll follow through the “Three Tier Web Service” example included in the EC2 Developer’s Guide for an example of how to create security groups and add rules to them.

First, let’s create a group for our Apache web servers that allows HTTP access to the world:

>>> web = conn.create_security_group('apache', 'Our Apache Group')
>>> web
SecurityGroup:apache
>>> web.authorize('tcp', 80, 80, '0.0.0.0/0')
True

The first argument is the IP protocol, which can be one of tcp, udp or icmp. The second argument is the FromPort, i.e. the beginning port in the range; the third argument is the ToPort, i.e. the ending port in the range; and the last argument is the CIDR IP range to authorize access to.

Next we create another group for the app servers:

>>> app = conn.create_security_group('appserver', 'The application tier')

We then want to grant access between the web server group and the app server group. So, rather than specifying an IP address as we did in the last example, this time we will specify another SecurityGroup object.:

>>> app.authorize(src_group=web)
True

Now, to verify that the web group now has access to the app servers, we want to temporarily allow SSH access to the web servers from our computer. Let’s say that our IP address is 192.168.1.130 as it is in the EC2 Developer Guide. To enable that access:

>>> web.authorize(ip_protocol='tcp', from_port=22, to_port=22, cidr_ip='192.168.1.130/32')
True

Now that this access is authorized, we could ssh into an instance running in the web group and then try to telnet to specific ports on servers in the appserver group, as shown in the EC2 Developer’s Guide. When this testing is complete, we would want to revoke SSH access to the web server group, like this:

>>> web.rules
[IPPermissions:tcp(80-80),
 IPPermissions:tcp(22-22)]
>>> web.revoke('tcp', 22, 22, cidr_ip='192.168.1.130/32')
True
>>> web.rules
[IPPermissions:tcp(80-80)]

An Introduction to boto’s Elastic Mapreduce interface

This tutorial focuses on the boto interface to Elastic Mapreduce from Amazon Web Services. This tutorial assumes that you have already downloaded and installed boto.

Creating a Connection

The first step in accessing Elastic Mapreduce is to create a connection to the service. There are two ways to do this in boto. The first is:

>>> from boto.emr.connection import EmrConnection
>>> conn = EmrConnection('<aws access key>', '<aws secret key>')

At this point the variable conn will point to an EmrConnection object. In this example, the AWS access key and AWS secret key are passed in to the method explicitly. Alternatively, you can set the environment variables:

  • AWS_ACCESS_KEY_ID - Your AWS Access Key ID
  • AWS_SECRET_ACCESS_KEY - Your AWS Secret Access Key

and then call the constructor without any arguments, like this:

>>> conn = EmrConnection()

There is also a shortcut function in boto that makes it easy to create EMR connections:

>>> import boto.emr
>>> conn = boto.emr.connect_to_region('us-west-2')

In either case, conn points to an EmrConnection object which we will use throughout the remainder of this tutorial.

Creating Streaming JobFlow Steps

Upon creating a connection to Elastic Mapreduce you will next want to create one or more jobflow steps. There are two types of steps, streaming and custom jar, both of which have a class in the boto Elastic Mapreduce implementation.

Creating a streaming step that runs the AWS wordcount example, itself written in Python, can be accomplished by:

>>> from boto.emr.step import StreamingStep
>>> step = StreamingStep(name='My wordcount example',
...                      mapper='s3n://elasticmapreduce/samples/wordcount/wordSplitter.py',
...                      reducer='aggregate',
...                      input='s3n://elasticmapreduce/samples/wordcount/input',
...                      output='s3n://<my output bucket>/output/wordcount_output')

where <my output bucket> is a bucket you have created in S3.

Note that this statement does not run the step; that is accomplished later when we create a jobflow.

Additional arguments of note for the streaming jobflow step are cache_files, cache_archives and step_args. The cache_files and cache_archives options enable you to use Hadoop's distributed cache to share files amongst the instances that run the step. The step_args argument allows you to pass additional arguments to Hadoop streaming, for example modifications to the Hadoop job configuration.
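
For example, here is a hedged sketch of a streaming step that tunes the Hadoop job configuration through step_args and ships a helper file via the distributed cache (the bucket names are placeholders, as above):

from boto.emr.step import StreamingStep

step = StreamingStep(name='My wordcount example with extras',
                     mapper='s3n://elasticmapreduce/samples/wordcount/wordSplitter.py',
                     reducer='aggregate',
                     input='s3n://elasticmapreduce/samples/wordcount/input',
                     output='s3n://<my output bucket>/output/wordcount_output',
                     # The '#name' suffix is the symlink name Hadoop exposes
                     # to the tasks for this cached file.
                     cache_files=['s3n://<my bucket>/stopwords.txt#stopwords.txt'],
                     # Extra arguments handed straight to Hadoop streaming.
                     step_args=['-D', 'mapred.reduce.tasks=2'])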

Creating Custom Jar Job Flow Steps

The second type of jobflow step executes tasks written with a custom jar. Creating a custom jar step for the AWS CloudBurst example can be accomplished by:

>>> from boto.emr.step import JarStep
>>> step = JarStep(name='Cloudburst example',
...                jar='s3n://elasticmapreduce/samples/cloudburst/cloudburst.jar',
...                step_args=['s3n://elasticmapreduce/samples/cloudburst/input/s_suis.br',
...                           's3n://elasticmapreduce/samples/cloudburst/input/100k.br',
...                           's3n://<my output bucket>/output/cloudfront_output',
...                            36, 3, 0, 1, 240, 48, 24, 24, 128, 16])

Note that this statement does not actually run the step; that is accomplished later when we create a jobflow. Also note that this JarStep does not include a main_class argument since the jar MANIFEST.MF has a Main-Class entry.

Creating JobFlows

Once you have created one or more jobflow steps, you will next want to create and run a jobflow. Creating a jobflow that executes either of the steps we created above can be accomplished by:

>>> import boto.emr
>>> conn = boto.emr.connect_to_region('us-west-2')
>>> jobid = conn.run_jobflow(name='My jobflow',
...                          log_uri='s3://<my log uri>/jobflow_logs',
...                          steps=[step])

The method will not block for the completion of the jobflow, but will immediately return. The status of the jobflow can be determined by:

>>> status = conn.describe_jobflow(jobid)
>>> status.state
u'STARTING'

You can poll this state to wait for the jobflow to complete. Valid jobflow states currently defined in the AWS API are COMPLETED, FAILED, TERMINATED, RUNNING, SHUTTING_DOWN, STARTING and WAITING.
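
A minimal polling sketch that waits for the jobflow started above to reach a terminal state:

import time

# Poll the jobflow until it finishes, one way or another.
while True:
    state = conn.describe_jobflow(jobid).state
    if state in ('COMPLETED', 'FAILED', 'TERMINATED'):
        break
    time.sleep(30)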

In some cases you may not have built all of the steps prior to running the jobflow. In these cases additional steps can be added to a jobflow by running:

>>> conn.add_jobflow_steps(jobid, [second_step])

If you wish to add additional steps to a running jobflow you may want to set the keep_alive parameter to True in run_jobflow so that the jobflow does not automatically terminate when the first step completes.

The run_jobflow method has a number of important parameters that are worth investigating. They include parameters to change the number and type of EC2 instances on which the jobflow is executed, to set an SSH key for manual debugging, and to enable AWS console debugging.
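
For illustration, a hedged sketch of some of those keyword arguments (the values are placeholders; see the run_jobflow reference for the full list and defaults):

jobid = conn.run_jobflow(name='My bigger jobflow',
                         log_uri='s3://<my log uri>/jobflow_logs',
                         ec2_keyname='my-keypair',          # SSH key for manual debugging
                         master_instance_type='m1.small',
                         slave_instance_type='m1.small',
                         num_instances=5,
                         enable_debugging=True,             # AWS console debugging
                         keep_alive=True,                   # keep running after the last step
                         steps=[step])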

Terminating JobFlows

By default, when all the steps of a jobflow have finished or failed, the jobflow terminates. However, if you set the keep_alive parameter to True or just want to halt the execution of a jobflow early, you can terminate a jobflow by:

>>> import boto.emr
>>> conn = boto.emr.connect_to_region('us-west-2')
>>> conn.terminate_jobflow('<jobflow id>')

An Introduction to boto’s Autoscale interface

This tutorial focuses on the boto interface to the Autoscale service. This assumes you are familiar with boto’s EC2 interface and concepts.

Autoscale Concepts

The AWS Autoscale service is built around three core concepts:

  1. Autoscale Group (AG): An AG can be viewed as a collection of criteria for maintaining or scaling a set of EC2 instances over one or more availability zones. An AG is limited to a single region.
  2. Launch Configuration (LC): An LC is the set of information needed by the AG to launch new instances - this can encompass image ids, startup data, security groups and keys. Only one LC is attached to an AG.
  3. Triggers: A trigger is essentially a set of rules for determining when to scale an AG up or down. These rules can encompass a set of metrics such as average CPU usage across instances, or incoming requests, a threshold for when an action will take place, as well as parameters to control how long to wait after a threshold is crossed.

Creating a Connection

The first step in accessing autoscaling is to create a connection to the service. There are two ways to do this in boto. The first is:

>>> from boto.ec2.autoscale import AutoScaleConnection
>>> conn = AutoScaleConnection('<aws access key>', '<aws secret key>')

A Note About Regions and Endpoints

Like EC2 the Autoscale service has a different endpoint for each region. By default the US endpoint is used. To choose a specific region, instantiate the AutoScaleConnection object with that region’s endpoint.

>>> import boto.ec2.autoscale
>>> autoscale = boto.ec2.autoscale.connect_to_region('eu-west-1')

Alternatively, edit your boto.cfg with the default Autoscale endpoint to use:

[Boto]
autoscale_endpoint = autoscaling.eu-west-1.amazonaws.com

Getting Existing AutoScale Groups

To retrieve existing autoscale groups:

>>> conn.get_all_groups()

You will get back a list of AutoScale group objects, one for each AG you have.

Creating Autoscaling Groups

An Autoscaling group has a number of parameters associated with it.

  1. Name: The name of the AG.
  2. Availability Zones: The list of availability zones it is defined over.
  3. Minimum Size: Minimum number of instances running at one time.
  4. Maximum Size: Maximum number of instances running at one time.
  5. Launch Configuration (LC): A set of instructions on how to launch an instance.
  6. Load Balancer: An optional ELB load balancer to use. See the ELB tutorial for information on how to create a load balancer.

For the purposes of this tutorial, let's assume we want to create one autoscale group over the us-east-1a and us-east-1b availability zones. We want to have two instances in each availability zone, thus a minimum size of 4. For now we won't worry about scaling up or down - we'll introduce that later when we talk about scaling policies - but we'll set a maximum size of 8 so the group has room to grow. We'll also associate the AG with a load balancer, called 'my-lb', which we assume we've already created.

Our LC tells us how to start an instance. This will at least include the image id to use, security_group, and key information. We assume the image id, key name and security groups have already been defined elsewhere - see the EC2 tutorial for information on how to create these.

>>> from boto.ec2.autoscale import LaunchConfiguration
>>> from boto.ec2.autoscale import AutoScalingGroup
>>> lc = LaunchConfiguration(name='my-launch_config', image_id='my-ami',
                             key_name='my_key_name',
                             security_groups=['my_security_groups'])
>>> conn.create_launch_configuration(lc)

We have now created a launch configuration called 'my-launch_config'. We are now ready to associate it with our new autoscale group.

>>> ag = AutoScalingGroup(group_name='my_group', load_balancers=['my-lb'],
                          availability_zones=['us-east-1a', 'us-east-1b'],
                          launch_config=lc, min_size=4, max_size=8,
                          connection=conn)
>>> conn.create_auto_scaling_group(ag)

We now have a new autoscaling group defined! At this point instances should be starting to launch. To view activity on an autoscale group:

>>> ag.get_activities()
 [Activity:Launching a new EC2 instance status:Successful progress:100,
  ...]

or alternatively:

>>> conn.get_all_activities(ag)

This autoscale group is fairly useful in that it will maintain the minimum size without breaching the maximum size defined. That means if one instance crashes, the autoscale group will use the launch configuration to start a new one in an attempt to maintain its minimum defined size. It can determine instance health either from EC2 status checks or from the health check defined on its associated load balancer.
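
A sketch of configuring the group to use the load balancer's health check, assuming the health_check_type and health_check_period keyword arguments on AutoScalingGroup:

from boto.ec2.autoscale import AutoScalingGroup

ag = AutoScalingGroup(group_name='my_group', load_balancers=['my-lb'],
                      availability_zones=['us-east-1a', 'us-east-1b'],
                      launch_config=lc, min_size=4, max_size=8,
                      # Use the ELB health check, with a 300 second grace
                      # period after launch before health is evaluated.
                      health_check_type='ELB', health_check_period=300,
                      connection=conn)
conn.create_auto_scaling_group(ag)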

Scaling a Group Up or Down

It can also be useful to scale a group up or down depending on certain criteria. For example, if the average CPU utilization of the group goes above 70%, you may want to scale up the number of instances to deal with demand. Likewise, you might want to scale down if usage drops again. These rules for how to scale are defined by Scaling Policies, and the rules for when to scale are defined by CloudWatch Metric Alarms.

For example, let’s configure scaling for the above group based on CPU utilization. We’ll say it should scale up if the average CPU usage goes above 70% and scale down if it goes below 40%.

Firstly, define some Scaling Policies. These tell Auto Scaling how to scale the group (but not when to do it, we’ll specify that later).

We need one policy for scaling up and one for scaling down.

>>> from boto.ec2.autoscale import ScalingPolicy
>>> scale_up_policy = ScalingPolicy(
            name='scale_up', adjustment_type='ChangeInCapacity',
            as_name='my_group', scaling_adjustment=1, cooldown=180)
>>> scale_down_policy = ScalingPolicy(
            name='scale_down', adjustment_type='ChangeInCapacity',
            as_name='my_group', scaling_adjustment=-1, cooldown=180)

The policy objects are now defined locally. Let’s submit them to AWS.

>>> conn.create_scaling_policy(scale_up_policy)
>>> conn.create_scaling_policy(scale_down_policy)

Now that the policies have been digested by AWS, they have extra properties that we aren't aware of locally. We need to refresh them by requesting them back again.

>>> scale_up_policy = conn.get_all_policies(
            as_group='my_group', policy_names=['scale_up'])[0]
>>> scale_down_policy = conn.get_all_policies(
            as_group='my_group', policy_names=['scale_down'])[0]

Specifically, we’ll need the Amazon Resource Name (ARN) of each policy, which will now be a property of our ScalingPolicy objects.

Next we’ll create CloudWatch alarms that will define when to run the Auto Scaling Policies.

>>> import boto.ec2.cloudwatch
>>> cloudwatch = boto.ec2.cloudwatch.connect_to_region('us-west-2')

It makes sense to measure the average CPU usage across the whole Auto Scaling Group, rather than individual instances. We express that as CloudWatch Dimensions.

>>> alarm_dimensions = {"AutoScalingGroupName": 'my_group'}

Create an alarm for when to scale up, and one for when to scale down.

>>> from boto.ec2.cloudwatch import MetricAlarm
>>> scale_up_alarm = MetricAlarm(
            name='scale_up_on_cpu', namespace='AWS/EC2',
            metric='CPUUtilization', statistic='Average',
            comparison='>', threshold='70',
            period='60', evaluation_periods=2,
            alarm_actions=[scale_up_policy.policy_arn],
            dimensions=alarm_dimensions)
>>> cloudwatch.create_alarm(scale_up_alarm)
>>> scale_down_alarm = MetricAlarm(
            name='scale_down_on_cpu', namespace='AWS/EC2',
            metric='CPUUtilization', statistic='Average',
            comparison='<', threshold='40',
            period='60', evaluation_periods=2,
            alarm_actions=[scale_down_policy.policy_arn],
            dimensions=alarm_dimensions)
>>> cloudwatch.create_alarm(scale_down_alarm)

Auto Scaling will now create a new instance if the existing cluster averages more than 70% CPU for two minutes. Similarly, it will terminate an instance when CPU usage sits below 40%. Auto Scaling will not add or remove instances beyond the limits of the Scaling Group’s ‘max_size’ and ‘min_size’ properties.
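
You can also trigger one of these policies by hand, which is handy when testing. A quick sketch using the connection's execute_policy method (check the reference for the exact signature):

>>> conn.execute_policy('scale_up', as_group='my_group', honor_cooldown=True)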

To retrieve the instances in your autoscale group:

>>> import boto.ec2
>>> ec2 = boto.ec2.connect_to_region('us-west-2')
>>> group = conn.get_all_groups(names=['my_group'])[0]
>>> instance_ids = [i.instance_id for i in group.instances]
>>> instances = ec2.get_only_instances(instance_ids)

To delete your autoscale group, we first need to shutdown all the instances:

>>> ag.shutdown_instances()

Once the instances have been shutdown, you can delete the autoscale group:

>>> ag.delete()

You can also delete your launch configuration:

>>> lc.delete()

CloudFront

This boto module provides an interface to Amazon's content delivery service, CloudFront.

Warning

This module is not well tested. Paging of distributions is not yet supported. CNAME support is completely untested. Use with caution. Feedback and bug reports are greatly appreciated.

Creating a CloudFront connection

If you’ve placed your credentials in your $HOME/.boto config file then you can simply create a CloudFront connection using:

>>> import boto
>>> c = boto.connect_cloudfront()

If you do not have this file you will need to specify your AWS access key and secret access key:

>>> import boto
>>> c = boto.connect_cloudfront('your-aws-access-key-id', 'your-aws-secret-access-key')

Working with CloudFront Distributions

Create a new boto.cloudfront.distribution.Distribution:

>>> origin = boto.cloudfront.origin.S3Origin('mybucket.s3.amazonaws.com')
>>> distro = c.create_distribution(origin=origin, enabled=False, comment='My new distribution')
>>> distro.domain_name
u'd2oxf3980lnb8l.cloudfront.net'
>>> distro.id
u'ECH69MOIW7613'
>>> distro.status
u'InProgress'
>>> distro.config.comment
u'My new distribution'
>>> distro.config.origin
<S3Origin: mybucket.s3.amazonaws.com>
>>> distro.config.caller_reference
u'31b8d9cf-a623-4a28-b062-a91856fac6d0'
>>> distro.config.enabled
False

Note that a new caller reference is created automatically, using uuid.uuid4(). The boto.cloudfront.distribution.Distribution, boto.cloudfront.distribution.DistributionConfig and boto.cloudfront.distribution.DistributionSummary objects are defined in the boto.cloudfront.distribution module.

To get a listing of all current distributions:

>>> rs = c.get_all_distributions()
>>> rs
[<boto.cloudfront.distribution.DistributionSummary instance at 0xe8d4e0>,
 <boto.cloudfront.distribution.DistributionSummary instance at 0xe8d788>]

This returns a list of boto.cloudfront.distribution.DistributionSummary objects. Note that paging is not yet supported! To get a boto.cloudfront.distribution.Distribution object from a boto.cloudfront.distribution.DistributionSummary object:

>>> ds = rs[1]
>>> distro = ds.get_distribution()
>>> distro.domain_name
u'd2oxf3980lnb8l.cloudfront.net'

To change a property of a distribution object:

>>> distro.comment
u'My new distribution'
>>> distro.update(comment='This is a much better comment')
>>> distro.comment
'This is a much better comment'

You can also enable/disable a distribution using the following convenience methods:

>>> distro.enable()  # just calls distro.update(enabled=True)

or:

>>> distro.disable()  # just calls distro.update(enabled=False)

The only attributes that can be updated for a Distribution are comment, enabled and cnames.
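
For example, a hedged sketch of replacing the distribution's CNAMEs through the same update method (the hostname is a placeholder):

>>> distro.update(cnames=['cdn.example.com'])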

To delete a boto.cloudfront.distribution.Distribution:

>>> distro.delete()

Invalidating CloudFront Distribution Paths

Invalidate a list of paths in a CloudFront distribution:

>>> paths = ['/path/to/file1.html', '/path/to/file2.html', ...]
>>> inval_req = c.create_invalidation_request(u'ECH69MOIW7613', paths)
>>> print inval_req
<InvalidationBatch: IFCT7K03VUETK>
>>> print inval_req.id
u'IFCT7K03VUETK'
>>> print inval_req.paths
[u'/path/to/file1.html', u'/path/to/file2.html', ..]

Warning

Each CloudFront invalidation request can only specify up to 1000 paths. If you need to invalidate more than 1000 paths you will need to split up the paths into groups of 1000 or less and create multiple invalidation requests.
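
A small sketch of splitting a long list of paths into batches of 1000 and issuing one invalidation request per batch:

# Split the full 'paths' list from above into chunks of at most 1000 paths.
batch_size = 1000
requests = []
for start in range(0, len(paths), batch_size):
    batch = paths[start:start + batch_size]
    requests.append(c.create_invalidation_request(u'ECH69MOIW7613', batch))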

The create_invalidation_request call returns a boto.cloudfront.invalidation.InvalidationBatch object representing the invalidation request. You can also fetch a single invalidation request for a given distribution using invalidation_request_status:

>>> inval_req = c.invalidation_request_status(u'ECH69MOIW7613', u'IFCT7K03VUETK')
>>> print inval_req
<InvalidationBatch: IFCT7K03VUETK>

The first parameter is the CloudFront distribution id the request belongs to and the second parameter is the invalidation request id.

It’s also possible to get all invalidations for a given CloudFront distribution:

>>> invals = c.get_invalidation_requests(u'ECH69MOIW7613')
>>> print invals
<boto.cloudfront.invalidation.InvalidationListResultSet instance at 0x15d28d0>

This will return an instance of boto.cloudfront.invalidation.InvalidationListResultSet which is an iterable object that contains a list of boto.cloudfront.invalidation.InvalidationSummary objects that describe each invalidation request and its status:

>>> for inval in invals:
...     print 'Object: %s, ID: %s, Status: %s' % (inval, inval.id, inval.status)
Object: <InvalidationSummary: ICXT2K02SUETK>, ID: ICXT2K02SUETK, Status: Completed
Object: <InvalidationSummary: ITV9SV0PDNY1Y>, ID: ITV9SV0PDNY1Y, Status: Completed
Object: <InvalidationSummary: I1X3F6N0PLGJN5>, ID: I1X3F6N0PLGJN5, Status: Completed
Object: <InvalidationSummary: I1F3G9N0ZLGKN2>, ID: I1F3G9N0ZLGKN2, Status: Completed
...

Simply iterating over the boto.cloudfront.invalidation.InvalidationListResultSet object will automatically paginate the results on-the-fly as needed by repeatedly requesting more results from CloudFront until there are none left.

If you wish to paginate the results manually you can do so by specifying the max_items option when calling get_invalidation_requests:

>>> invals = c.get_invalidation_requests(u'ECH69MOIW7613', max_items=2)
>>> print len(list(invals))
2
>>> for inval in invals:
...     print 'Object: %s, ID: %s, Status: %s' % (inval, inval.id, inval.status)
Object: <InvalidationSummary: ICXT2K02SUETK>, ID: ICXT2K02SUETK, Status: Completed
Object: <InvalidationSummary: ITV9SV0PDNY1Y>, ID: ITV9SV0PDNY1Y, Status: Completed

In this case, iterating over the boto.cloudfront.invalidation.InvalidationListResultSet object will only make a single request to CloudFront and only max_items invalidation requests are returned by the iterator. To get the next “page” of results pass the next_marker attribute of the previous boto.cloudfront.invalidation.InvalidationListResultSet object as the marker option to the next call to get_invalidation_requests:

>>> invals = c.get_invalidation_requests(u'ECH69MOIW7613', max_items=10, marker=invals.next_marker)
>>> print len(list(invals))
2
>>> for inval in invals:
...     print 'Object: %s, ID: %s, Status: %s' % (inval, inval.id, inval.status)
Object: <InvalidationSummary: I1X3F6N0PLGJN5>, ID: I1X3F6N0PLGJN5, Status: Completed
Object: <InvalidationSummary: I1F3G9N0ZLGKN2>, ID: I1F3G9N0ZLGKN2, Status: Completed

You can get the boto.cloudfront.invalidation.InvalidationBatch object representing the invalidation request pointed to by a boto.cloudfront.invalidation.InvalidationSummary object using:

>>> inval_req = inval.get_invalidation_request()
>>> print inval_req
<InvalidationBatch: IFCT7K03VUETK>

Similarly you can get the parent boto.cloudfront.distribution.Distribution object for the invalidation request from a boto.cloudfront.invalidation.InvalidationSummary object using:

>>> dist = inval.get_distribution()
>>> print dist
<boto.cloudfront.distribution.Distribution instance at 0x304a7e8>

An Introduction to boto’s SimpleDB interface

This tutorial focuses on the boto interface to AWS’ SimpleDB. This tutorial assumes that you have boto already downloaded and installed.

Note

If you’re starting a new application, you might want to consider using DynamoDB2 instead, as it has a more comprehensive feature set & has guaranteed performance throughput levels.

Creating a Connection

The first step in accessing SimpleDB is to create a connection to the service. The most straightforward way to do so is the following:

>>> import boto.sdb
>>> conn = boto.sdb.connect_to_region(
...     'us-west-2',
...     aws_access_key_id='<YOUR_AWS_KEY_ID>',
...     aws_secret_access_key='<YOUR_AWS_SECRET_KEY>')
>>> conn
SDBConnection:sdb.amazonaws.com
>>>

Bear in mind that if you have your credentials in the boto config in your home directory, the two keyword arguments in the call above are not needed. Also note that, just as with any other AWS service, SimpleDB is region-specific, so you might want to specify which region to connect to; by default, it'll connect to the US-EAST-1 region.

Creating Domains

Once you have your connection established, you'll want to create one or more domains. Creating new domains is a fairly straightforward operation. To do so, you can proceed as follows:

>>> conn.create_domain('test-domain')
Domain:test-domain
>>>
>>> conn.create_domain('test-domain-2')
Domain:test-domain-2
>>>

Please note that SimpleDB, unlike its newer sibling DynamoDB, is truly and completely schema-less. Thus, there's no need to specify domain keys or ranges.

Listing All Domains

Unlike DynamoDB or other database systems, SimpleDB uses the concept of ‘domains’ instead of tables. So, to list all your domains for your account in a region, you can simply do as follows:

>>> domains = conn.get_all_domains()
>>> domains
[Domain:test-domain, Domain:test-domain-2]
>>>

The get_all_domains() method returns a boto.resultset.ResultSet containing all boto.sdb.domain.Domain objects associated with this connection’s Access Key ID for that region.

Retrieving a Domain (by name)

If you wish to retrieve a specific domain whose name is known, you can do so as follows:

>>> dom = conn.get_domain('test-domain')
>>> dom
Domain:test-domain
>>>

The get_domain call has an optional validate parameter, which defaults to True. This will make sure to raise an exception if the domain you are looking for doesn't exist. If you set it to False, it will return a Domain object blindly, regardless of whether the domain actually exists.

Getting Domain Metadata

There are times when you might want to know your domains' machine usage, approximate item count and other such data. To this end, boto offers a simple and convenient way to do so, as shown below:

>>> domain_meta = conn.domain_metadata(dom)
>>> domain_meta
<boto.sdb.domain.DomainMetaData instance at 0x23cd440>
>>> dir(domain_meta)
['BoxUsage', 'DomainMetadataResponse', 'DomainMetadataResult', 'RequestId', 'ResponseMetadata',
'__doc__', '__init__', '__module__', 'attr_name_count', 'attr_names_size', 'attr_value_count', 'attr_values_size',
'domain', 'endElement', 'item_count', 'item_names_size', 'startElement', 'timestamp']
>>> domain_meta.item_count
0
>>>

Please bear in mind that while in the example above we used a previously retrieved domain object as the parameter, you can retrieve the domain metadata via its name (string).
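
For example, using the domain name directly:

>>> domain_meta = conn.domain_metadata('test-domain')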

Adding Items (and attributes)

Once you have your domain set up, presumably you'll want to start adding items to it. In its most straightforward form, you need to provide a name for the item – think of it as a record id – and a collection of the attributes you want to store in the item (often a dictionary-like object). So, adding an item to a domain looks as follows:

>>> item_name = 'ABC_123'
>>> item_attrs = {'Artist': 'The Jackson 5', 'Genre': 'Pop'}
>>> dom.put_attributes(item_name, item_attrs)
True
>>>

Now let’s check if it worked:

>>> domain_meta = conn.domain_metadata(dom)
>>> domain_meta.item_count
1
>>>

Batch Adding Items (and attributes)

You can also add a number of items at the same time in a similar fashion. All you have to provide to the batch_put_attributes() method is a Dictionary-like object with your items and their respective attributes, as follows:

>>> items = {'item1':{'attr1':'val1'},'item2':{'attr2':'val2'}}
>>> dom.batch_put_attributes(items)
True
>>>

Now, let’s check the item count once again:

>>> domain_meta = conn.domain_metadata(dom)
>>> domain_meta.item_count
3
>>>

A few words of warning: both batch_put_attributes() and put_attributes() will, by default, overwrite the values of attributes if both the item and the attribute already exist. If the item exists but the attribute does not, the new attribute will be appended to the item's attribute list. If you do not wish these methods to behave in that manner, simply supply them with a replace=False parameter.
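
For instance, continuing with the item created above, this sketch appends a second value to an existing attribute instead of overwriting it:

>>> dom.put_attributes('ABC_123', {'Genre': 'Motown'}, replace=False)
True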

Retrieving Items

Retrieving an item along with its attributes is a fairly straightforward operation and can be accomplished as follows:

>>> dom.get_item('item1')
{u'attr1': u'val1'}
>>>

Since SimpleDB works in an “eventual consistency” manner, we can also request a forced consistent read (though this will invariably adversely affect read performance). The way to accomplish that is as shown below:

>>> dom.get_item('item1', consistent_read=True)
{u'attr1': u'val1'}
>>>

Retrieving One or More Items

Another way to retrieve items is through boto's select() method. This method requires, at a bare minimum, a SQL-like select query string, and you would do something along the lines of:

>>> query = 'select * from `test-domain` where attr1="val1"'
>>> rs = dom.select(query)
>>> for j in rs:
...   print 'o hai'
...
o hai
>>>

This method returns a ResultSet collection you can iterate over.

Updating Item Attributes

The easiest way to modify an item’s attributes is by manipulating the item’s attributes and then saving those changes. For example:

>>> item = dom.get_item('item1')
>>> item['attr1'] = 'val_changed'
>>> item.save()

Deleting Items (and their attributes)

Deleting an item is a very simple operation. All you are required to provide is either the name of the item or an item object to the delete_item() method, boto will take care of the rest:

>>> dom.delete_item(item)
True

Deleting Domains

To delete a domain and all the items under it (so be very careful), you can do so as follows:

>>> conn.delete_domain('test-domain')
True
>>>

An Introduction to boto’s DynamoDB interface

This tutorial focuses on the boto interface to AWS’ DynamoDB. This tutorial assumes that you have boto already downloaded and installed.

Warning

This tutorial covers the ORIGINAL release of DynamoDB. It has since been supplanted by a second major version & an updated API to talk to the new version. The documentation for the new version of DynamoDB (& boto’s support for it) is at DynamoDB v2.

Creating a Connection

The first step in accessing DynamoDB is to create a connection to the service. The most straightforward way to do so is the following:

>>> import boto.dynamodb
>>> conn = boto.dynamodb.connect_to_region(
        'us-west-2',
        aws_access_key_id='<YOUR_AWS_KEY_ID>',
        aws_secret_access_key='<YOUR_AWS_SECRET_KEY>')
>>> conn
<boto.dynamodb.layer2.Layer2 object at 0x3fb3090>

Bear in mind that if you have your credentials in boto config in your home directory, the two keyword arguments in the call above are not needed. More details on configuration can be found in Boto Config.

The boto.dynamodb.connect_to_region() function returns a boto.dynamodb.layer2.Layer2 instance, which is a high-level API for working with DynamoDB. Layer2 is a set of abstractions that sit atop the lower level boto.dynamodb.layer1.Layer1 API, which closely mirrors the Amazon DynamoDB API. For the purpose of this tutorial, we’ll just be covering Layer2.

Listing Tables

Now that we have a DynamoDB connection object, we can then query for a list of existing tables in that region:

>>> conn.list_tables()
['test-table', 'another-table']

Creating Tables

DynamoDB tables are created with the Layer2.create_table method. While DynamoDB’s items (a rough equivalent to a relational DB’s row) don’t have a fixed schema, you do need to create a schema for the table’s hash key element, and the optional range key element. This is explained in greater detail in DynamoDB’s Data Model documentation.

We’ll start by defining a schema that has a hash key and a range key that are both strings:

>>> message_table_schema = conn.create_schema(
        hash_key_name='forum_name',
        hash_key_proto_value=str,
        range_key_name='subject',
        range_key_proto_value=str
    )

The next few things to determine are the table name and the read/write throughput. We'll defer explaining throughput to DynamoDB's Provisioned Throughput docs.

We’re now ready to create the table:

>>> table = conn.create_table(
        name='messages',
        schema=message_table_schema,
        read_units=10,
        write_units=10
    )
>>> table
Table(messages)

This returns a boto.dynamodb.table.Table instance, which provides simple ways to create (put), update, and delete items.

Getting a Table

To retrieve an existing table, use Layer2.get_table:

>>> conn.list_tables()
['test-table', 'another-table', 'messages']
>>> table = conn.get_table('messages')
>>> table
Table(messages)

Layer2.get_table, like Layer2.create_table, returns a boto.dynamodb.table.Table instance.

Keep in mind that Layer2.get_table will make an API call to retrieve various attributes of the table including the creation time, the read and write capacity, and the table schema. If you already know the schema, you can save an API call and create a boto.dynamodb.table.Table object without making any calls to Amazon DynamoDB:

>>> table = conn.table_from_schema(
    name='messages',
    schema=message_table_schema)

If you do this, the following fields will have None values:

  • create_time
  • status
  • read_units
  • write_units

In addition, the item_count and size_bytes will be 0. If you create a table object directly from a schema object and decide later that you need to retrieve any of these additional attributes, you can use the Table.refresh method:

>>> from boto.dynamodb.schema import Schema
>>> table = conn.table_from_schema(
        name='messages',
        schema=Schema.create(hash_key=('forum_name', 'S'),
                             range_key=('subject', 'S')))
>>> print table.write_units
None
>>> # Now we decide we need to know the write_units:
>>> table.refresh()
>>> print table.write_units
10

The recommended best practice is to retrieve a table object once and use that object for the duration of your application. So, for example, instead of this:

class Application(object):
    def __init__(self, layer2):
        self._layer2 = layer2

    def retrieve_item(self, table_name, key):
        return self._layer2.get_table(table_name).get_item(key)

You can do something like this instead:

class Application(object):
    def __init__(self, layer2):
        self._layer2 = layer2
        self._tables_by_name = {}

    def retrieve_item(self, table_name, key):
        table = self._tables_by_name.get(table_name)
        if table is None:
            table = self._layer2.get_table(table_name)
            self._tables_by_name[table_name] = table
        return table.get_item(key)

Describing Tables

To get a complete description of a table, use Layer2.describe_table:

>>> conn.list_tables()
['test-table', 'another-table', 'messages']
>>> conn.describe_table('messages')
{
    'Table': {
        'CreationDateTime': 1327117581.624,
        'ItemCount': 0,
        'KeySchema': {
            'HashKeyElement': {
                'AttributeName': 'forum_name',
                'AttributeType': 'S'
            },
            'RangeKeyElement': {
                'AttributeName': 'subject',
                'AttributeType': 'S'
            }
        },
        'ProvisionedThroughput': {
            'ReadCapacityUnits': 10,
            'WriteCapacityUnits': 10
        },
        'TableName': 'messages',
        'TableSizeBytes': 0,
        'TableStatus': 'ACTIVE'
    }
}

Adding Items

Continuing on with our previously created messages table, adding an item works as follows:

>>> table = conn.get_table('messages')
>>> item_data = {
        'Body': 'http://url_to_lolcat.gif',
        'SentBy': 'User A',
        'ReceivedTime': '12/9/2011 11:36:03 PM',
    }
>>> item = table.new_item(
        # Our hash key is 'forum_name'
        hash_key='LOLCat Forum',
        # Our range key is 'subject'
        range_key='Check this out!',
        # The remaining attributes come from item_data
        attrs=item_data
    )

The Table.new_item method creates a new boto.dynamodb.item.Item instance with your specified hash key, range key, and attributes already set. Item is a dict sub-class, meaning you can edit your data as such:

item['a_new_key'] = 'testing'
del item['a_new_key']

After you are happy with the contents of the item, use Item.put to commit it to DynamoDB:

>>> item.put()

Retrieving Items

Now, let’s check if it got added correctly. Since DynamoDB works under an ‘eventual consistency’ mode, we need to specify that we wish a consistent read, as follows:

>>> table = conn.get_table('messages')
>>> item = table.get_item(
        # Your hash key was 'forum_name'
        hash_key='LOLCat Forum',
        # Your range key was 'subject'
        range_key='Check this out!',
        # Request a consistent read
        consistent_read=True
    )
>>> item
{
    # Note that this was your hash key attribute (forum_name)
    'forum_name': 'LOLCat Forum',
    # This is your range key attribute (subject)
    'subject': 'Check this out!',
    'Body': 'http://url_to_lolcat.gif',
    'ReceivedTime': '12/9/2011 11:36:03 PM',
    'SentBy': 'User A',
}

Updating Items

To update an item’s attributes, simply retrieve it, modify the value, then Item.put it again:

>>> table = conn.get_table('messages')
>>> item = table.get_item(
        hash_key='LOLCat Forum',
        range_key='Check this out!'
    )
>>> item['SentBy'] = 'User B'
>>> item.put()

Working with Decimals

To avoid the loss of precision, you can stipulate that the decimal.Decimal type be used for numeric values:

>>> import decimal
>>> conn.use_decimals()
>>> table = conn.get_table('messages')
>>> item = table.new_item(
        hash_key='LOLCat Forum',
        range_key='Check this out!'
    )
>>> item['decimal_type'] = decimal.Decimal('1.12345678912345')
>>> item.put()
>>> print table.get_item('LOLCat Forum', 'Check this out!')
{u'forum_name': 'LOLCat Forum', u'decimal_type': Decimal('1.12345678912345'),
 u'subject': 'Check this out!'}

You can enable the usage of decimal.Decimal by using either the use_decimals method, or by passing in the Dynamizer class for the dynamizer param:

>>> from boto.dynamodb.types import Dynamizer
>>> conn = boto.dynamodb.connect_to_region('us-west-2', dynamizer=Dynamizer)

This mechanism can also be used if you want to customize the encoding/decoding process of DynamoDB types.
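
As a sketch, here is a hypothetical Dynamizer subclass that simply logs each value as it is encoded, plugged in via the same dynamizer parameter:

import boto.dynamodb
from boto.dynamodb.types import Dynamizer

class LoggingDynamizer(Dynamizer):
    """Hypothetical example: trace every value boto encodes for DynamoDB."""
    def encode(self, attr):
        print 'encoding %r' % (attr,)
        return super(LoggingDynamizer, self).encode(attr)

conn = boto.dynamodb.connect_to_region('us-west-2',
                                        dynamizer=LoggingDynamizer)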

Deleting Items

To delete items, use the Item.delete method:

>>> table = conn.get_table('messages')
>>> item = table.get_item(
        hash_key='LOLCat Forum',
        range_key='Check this out!'
    )
>>> item.delete()

Deleting Tables

Warning

Deleting a table will also permanently delete all of its contents without prompt. Use carefully.

There are two easy ways to delete a table. Through your top-level Layer2 object:

>>> conn.delete_table(table)

Or by getting the table, then using Table.delete:

>>> table = conn.get_table('messages')
>>> table.delete()

An Introduction to boto’s RDS interface

This tutorial focuses on the boto interface to the Relational Database Service from Amazon Web Services. This tutorial assumes that you have boto already downloaded and installed, and that you wish to set up a MySQL instance in RDS.

Warning

This tutorial covers the ORIGINAL module for RDS. It has since been supplanted by a second major version & an updated API complete with all service operations. The documentation for the new version of boto’s support for RDS is at RDS v2.

Creating a Connection

The first step in accessing RDS is to create a connection to the service. The recommended method of doing this is as follows:

>>> import boto.rds
>>> conn = boto.rds.connect_to_region(
...     "us-west-2",
...     aws_access_key_id='<aws access key>',
...     aws_secret_access_key='<aws secret key>')

At this point the variable conn will point to an RDSConnection object in the US-WEST-2 region. Bear in mind that just as any other AWS service, RDS is region-specific. In this example, the AWS access key and AWS secret key are passed in to the method explicitly. Alternatively, you can set the environment variables:

  • AWS_ACCESS_KEY_ID - Your AWS Access Key ID
  • AWS_SECRET_ACCESS_KEY - Your AWS Secret Access Key

and then simply call:

>>> import boto.rds
>>> conn = boto.rds.connect_to_region("us-west-2")

In either case, conn will point to an RDSConnection object which we will use throughout the remainder of this tutorial.

Starting an RDS Instance

Creating a DB instance is easy. You can do so as follows:

>>> db = conn.create_dbinstance("db-master-1", 10, 'db.m1.small', 'root', 'hunter2')

This example would create a DB instance identified as db-master-1 with 10GB of storage. The instance would run on a db.m1.small instance type, with the login name root and the password hunter2.

To check on the status of your RDS instance, you will have to query the RDS connection again:

>>> instances = conn.get_all_dbinstances("db-master-1")
>>> instances
[DBInstance:db-master-1]
>>> db = instances[0]
>>> db.status
u'available'
>>> db.endpoint
(u'db-master-1.aaaaaaaaaa.us-west-2.rds.amazonaws.com', 3306)

Creating a Security Group

Before you can actually connect to this RDS service, you must first create a security group. You can add a CIDR range or an EC2 security group to your DB security group.

>>> sg = conn.create_dbsecurity_group('web_servers', 'Web front-ends')
>>> sg.authorize(cidr_ip='10.3.2.45/32')
True

You can then associate this security group with your RDS instance:

>>> db.modify(security_groups=[sg])

Connecting to your New Database

Once you have reached this step, you can connect to your RDS instance as you would with any other MySQL instance:

>>> db.endpoint
(u'db-master-1.aaaaaaaaaa.us-west-2.rds.amazonaws.com', 3306)

% mysql -h db-master-1.aaaaaaaaaa.us-west-2.rds.amazonaws.com -u root -phunter2
mysql>

Making a backup

You can also create snapshots of your database very easily:

>>> db.snapshot('db-master-1-2013-02-05')
DBSnapshot:db-master-1-2013-02-05

Once this snapshot is complete, you can create a new database instance from it:

>>> db2 = conn.restore_dbinstance_from_dbsnapshot(
...    'db-master-1-2013-02-05',
...    'db-restored-1',
...    'db.m1.small',
...    'us-west-2')

An Introduction to boto’s SQS interface

This tutorial focuses on the boto interface to the Simple Queue Service from Amazon Web Services. This tutorial assumes that you have boto already downloaded and installed.

Creating a Connection

The first step in accessing SQS is to create a connection to the service. The recommended method of doing this is as follows:

>>> import boto.sqs
>>> conn = boto.sqs.connect_to_region(
...     "us-west-2",
...     aws_access_key_id='<aws access key>',
...     aws_secret_access_key='<aws secret key>')

At this point the variable conn will point to an SQSConnection object in the US-WEST-2 region. Bear in mind that just as any other AWS service, SQS is region-specific. In this example, the AWS access key and AWS secret key are passed in to the method explicitly. Alternatively, you can set the environment variables:

  • AWS_ACCESS_KEY_ID - Your AWS Access Key ID
  • AWS_SECRET_ACCESS_KEY - Your AWS Secret Access Key

and then simply call:

>>> import boto.sqs
>>> conn = boto.sqs.connect_to_region("us-west-2")

In either case, conn will point to an SQSConnection object which we will use throughout the remainder of this tutorial.

Creating a Queue

Once you have a connection established with SQS, you will probably want to create a queue. In its simplest form, that can be accomplished as follows:

>>> q = conn.create_queue('myqueue')

The create_queue method will create (and return) the requested queue if it does not exist or will return the existing queue if it does. There is an optional parameter to create_queue called visibility_timeout. This basically controls how long a message will remain invisible to other queue readers once it has been read (see SQS documentation for more detailed explanation). If this is not explicitly specified the queue will be created with whatever default value SQS provides (currently 30 seconds). If you would like to specify another value, you could do so like this:

>>> q = conn.create_queue('myqueue', 120)

This would establish a default visibility timeout for this queue of 120 seconds. As you will see later on, this default value for the queue can also be overridden each time a message is read from the queue. If you want to check what the default visibility timeout is for a queue:

>>> q.get_timeout()
30

Listing all Queues

To retrieve a list of the queues for your account in the current region:

>>> conn.get_all_queues()
[
    Queue(https://queue.amazonaws.com/411358162645/myqueue),
    Queue(https://queue.amazonaws.com/411358162645/another_queue),
    Queue(https://queue.amazonaws.com/411358162645/another_queue2)
]

This will leave you with a list of all of your boto.sqs.queue.Queue instances. Alternatively, if you wanted to only list the queues that started with 'another':

>>> conn.get_all_queues(prefix='another')
[
    Queue(https://queue.amazonaws.com/411358162645/another_queue),
    Queue(https://queue.amazonaws.com/411358162645/another_queue2)
]

Getting a Queue (by name)

If you wish to explicitly retrieve an existing queue and the name of the queue is known, you can retrieve the queue as follows:

>>> my_queue = conn.get_queue('myqueue')
>>> my_queue
Queue(https://queue.amazonaws.com/411358162645/myqueue)

This leaves you with a single boto.sqs.queue.Queue, which abstracts the SQS Queue named ‘myqueue’.

Writing Messages

Once you have a queue set up, presumably you will want to write some messages to it. SQS doesn't care what kind of information you store in your messages or what format you use to store it. As long as the amount of data per message is less than or equal to 256 KB, SQS won't complain.

So, first we need to create a Message object:

>>> from boto.sqs.message import Message
>>> m = Message()
>>> m.set_body('This is my first message.')
>>> q.write(m)

The write method will return the Message object. The id and md5 attributes of the Message object will be updated with the values of the message that was written to the queue.

Arbitrary message attributes can be defined by setting a simple dictionary of values on the message object:

>>> m = Message()
>>> m.message_attributes = {
...     "name1": {
...         "data_type": "String",
...         "string_value": "I am a string"
...     },
...     "name2": {
...         "data_type": "Number",
...         "string_value": "12"
...     }
... }

Note that by default, these arbitrary attributes are not returned when you request messages from a queue. Instead, you must request them via the message_attributes parameter (see below).

If the message cannot be written an SQSError exception will be raised.

Writing Messages (Custom Format)

The technique above will work only if you use boto’s default Message payload format; however, you may have a lot of specific requirements around the format of the message data. For example, you may want to store one big string or you might want to store something that looks more like RFC822 messages or you might want to store a binary payload such as pickled Python objects.

The way boto deals with this issue is to define a simple Message object that treats the message data as one big string which you can set and get. If that Message object meets your needs, you’re good to go. However, if you need to incorporate different behavior in your message or handle different types of data you can create your own Message class. You just need to register that class with the boto queue object so that it knows that, when you read a message from the queue, it should create one of your message objects rather than the default boto Message object. To register your message class, you would:

>>> import MyMessage
>>> q.set_message_class(MyMessage)
>>> m = MyMessage()
>>> m.set_body('This is my first message.')
>>> q.write(m)

where MyMessage is the class definition for your message class. Your message class should subclass the boto Message because there is a small bit of Python magic happening in the __setattr__ method of the boto Message class.
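
As a rough sketch, here is a hypothetical message class that stores a Python dictionary as a JSON-encoded body by overriding the encode and decode hooks (boto also ships a ready-made boto.sqs.jsonmessage.JSONMessage that does essentially this):

import json

from boto.sqs.message import Message

class MyMessage(Message):
    """Hypothetical custom message class: the body is a JSON-encoded dict."""
    def encode(self, value):
        # Called when the message is written to the queue.
        return json.dumps(value)

    def decode(self, value):
        # Called when the message is read back from the queue.
        try:
            return json.loads(value)
        except ValueError:
            return {}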

Reading Messages

So, now we have a message in our queue. How would we go about reading it? Here’s one way:

>>> rs = q.get_messages()
>>> len(rs)
1
>>> m = rs[0]
>>> m.get_body()
u'This is my first message.'

The get_messages method also returns a ResultSet object as described above. In addition to the special attributes that we already talked about, the ResultSet object also contains any results returned by the request. To get at the results you can treat the ResultSet as a sequence object (e.g. a list). We can check the length (how many results) and access particular items within the list using the slice notation familiar to Python programmers.

At this point, we have read the message from the queue and SQS will make sure that this message remains invisible to other readers of the queue until the visibility timeout period for the queue expires. If you delete the message before the timeout period expires then no one else will ever see the message again. However, if you don't delete it (because your reader crashed or failed in some way, for example) it will magically reappear in the queue for someone else to read. If you aren't happy with the default visibility timeout defined for the queue, you can override it when you read a message:

>>> q.get_messages(visibility_timeout=60)

This means that regardless of what the default visibility timeout is for the queue, this message will remain invisible to other readers for 60 seconds.

The get_messages method can also return more than a single message. By passing a num_messages parameter (defaults to 1) you can control the maximum number of messages that will be returned by the method. To show this feature off, first let’s load up a few more messages.

>>> for i in range(1, 11):
...   m = Message()
...   m.set_body('This is message %d' % i)
...   q.write(m)
...
>>> rs = q.get_messages(10)
>>> len(rs)
10

Don’t be alarmed if the length of the result set returned by the get_messages call is less than 10. Sometimes it takes some time for new messages to become visible in the queue. Give it a minute or two and they will all show up.

If you want a slightly simpler way to read messages from a queue, you can use the read method. It will either return the message read or it will return None if no messages were available. You can also pass a visibility_timeout parameter to read, if you desire:

>>> m = q.read(60)
>>> m.get_body()
u'This is my first message.'

Reading Message Attributes

By default, no arbitrary message attributes are returned when requesting messages. You can change this behavior by specifying the names of attributes you wish to have returned:

>>> rs = q.get_messages(message_attributes=['name1', 'name2'])
>>> print rs[0].message_attributes['name1']['string_value']
I am a string

A special value of All or .* may be passed to return all available message attributes.

Deleting Messages and Queues

As stated above, messages are never deleted by the queue unless explicitly told to do so. To remove a message from a queue:

>>> q.delete_message(m)
[]

If you want to delete the entire queue, you would use:

>>> conn.delete_queue(q)

This will delete the queue, even if there are still messages within the queue.

Additional Information

The above tutorial covers the basic operations of creating queues, writing messages, reading messages, deleting messages, and deleting queues. There are a few utility methods in boto that might be useful as well. For example, to count the number of messages in a queue:

>>> q.count()
10

Removing all messages in a queue is as simple as calling purge:

>>> q.purge()

Be REAL careful with that one! Finally, if you want to dump all of the messages in a queue to a local file:

>>> q.dump('messages.txt', sep='\n------------------\n')

This will read all of the messages in the queue and write the bodies of each of the messages to the file messages.txt. The optional sep argument is a separator that will be printed between each message body in the file.

Simple Email Service Tutorial

This tutorial focuses on the boto interface to AWS’ Simple Email Service (SES). This tutorial assumes that you have boto already downloaded and installed.

Creating a Connection

The first step in accessing SES is to create a connection to the service. The most straightforward way to do so is the following:

>>> import boto.ses
>>> conn = boto.ses.connect_to_region(
        'us-west-2',
        aws_access_key_id='<YOUR_AWS_KEY_ID>',
        aws_secret_access_key='<YOUR_AWS_SECRET_KEY>')
>>> conn
SESConnection:email.us-west-2.amazonaws.com

Bear in mind that if you have your credentials in boto config in your home directory, the two keyword arguments in the call above are not needed. More details on configuration can be found in Boto Config.

The boto.ses.connect_to_region() function returns a boto.ses.connection.SESConnection instance, which is the boto API for working with SES.

Notes on Sending

It is important to keep in mind that while emails appear to come “from” the address that you specify via Reply-To, the sending is done through Amazon. Some clients do pick up on this disparity, and leave a note on emails.

Verifying a Sender Email Address

Before you can send email “from” an address, you must prove that you have access to the account. When you send a validation request, an email is sent to the address with a link in it. Clicking on the link validates the address and adds it to your SES account. Here’s how to send the validation email:

>>> conn.verify_email_address('some@address.com')
{
    'VerifyEmailAddressResponse': {
        'ResponseMetadata': {
            'RequestId': '4a974fd5-56c2-11e1-ad4c-c1f08c91d554'
        }
    }
}

After a short amount of time, you’ll find an email with the validation link inside. Click it, and this address may be used to send emails.

Listing Verified Addresses

If you’d like to list the addresses that are currently verified on your SES account, use list_verified_email_addresses:

>>> conn.list_verified_email_addresses()
{
    'ListVerifiedEmailAddressesResponse': {
        'ListVerifiedEmailAddressesResult': {
            'VerifiedEmailAddresses': [
                'some@address.com',
                'another@address.com'
            ]
        },
        'ResponseMetadata': {
            'RequestId': '2ab45c18-56c3-11e1-be66-ffd2a4549d70'
        }
    }
}

Deleting a Verified Address

In the event that you’d like to remove an email address from your account, use delete_verified_email_address:

>>> conn.delete_verified_email_address('another@address.com')

Sending an Email

Sending an email is done via send_email:

>>> conn.send_email(
        'some@address.com',
        'Your subject',
        'Body here',
        ['recipient-address-1@gmail.com'])
{
    'SendEmailResponse': {
        'ResponseMetadata': {
            'RequestId': '4743c2b7-56c3-11e1-bccd-c99bd68002fd'
        },
        'SendEmailResult': {
            'MessageId': '000001357a177192-7b894025-147a-4705-8455-7c880b0c8270-000000'
        }
    }
}

If you’re wanting to send a multipart MIME email, see the reference for send_raw_email, which is a bit more of a low-level alternative.

Checking your Send Quota

Staying within your quota is critical, since the upper limit is a hard cap. Once you have hit your quota, no further email may be sent until enough time has elapsed that your rolling 24-hour email count falls back within acceptable limits. Use get_send_quota:

>>> conn.get_send_quota()
{
    'GetSendQuotaResponse': {
        'GetSendQuotaResult': {
            'Max24HourSend': '100000.0',
            'SentLast24Hours': '181.0',
            'MaxSendRate': '28.0'
        },
        'ResponseMetadata': {
            'RequestId': u'8a629245-56c4-11e1-9c53-9d5f4d2cc8d3'
        }
    }
}

Checking your Send Statistics

In order to fight spammers and ensure quality mail is being sent from SES, Amazon tracks bounces, rejections, and complaints. This is done via get_send_statistics. Please be warned that the output is extremely verbose, to the point where we’ll just show a short excerpt here:

>>> conn.get_send_statistics()
{
    'GetSendStatisticsResponse': {
        'GetSendStatisticsResult': {
            'SendDataPoints': [
                {
                    'Complaints': '0',
                    'Timestamp': '2012-02-13T05:02:00Z',
                    'DeliveryAttempts': '8',
                    'Bounces': '0',
                    'Rejects': '0'
                },
                {
                    'Complaints': '0',
                    'Timestamp': '2012-02-13T05:17:00Z',
                    'DeliveryAttempts': '12',
                    'Bounces': '0',
                    'Rejects': '0'
                }
            ]
        }
    }
}

Amazon Simple Workflow Tutorial

This tutorial focuses on boto’s interface to AWS SimpleWorkflow service.

What is a workflow?

A workflow is a sequence of multiple activities aimed at accomplishing a well-defined objective. For instance, booking an airline ticket as a workflow may encompass multiple activities, such as selection of itinerary, submission of personal details, payment validation and booking confirmation.

Except for the start and completion of a workflow, each step has a well-defined predecessor and successor. With that structure:
  • on successful completion of an activity the workflow can progress with its execution,
  • when one of the workflow’s activities fails it can be retried,
  • and when it keeps failing repeatedly the workflow may regress to the previous step to gather alternative inputs, or it may simply fail at that stage.

Why use workflows?

Modelling an application on a workflow provides a useful abstraction layer for writing highly-reliable programs for distributed systems, as individual responsibilities can be delegated to a set of redundant, independent and non-critical processing units.

How does Amazon SWF help you accomplish this?

Amazon SimpleWorkflow service defines an interface for workflow orchestration and provides state persistence for workflow executions.

Amazon SWF applications involve communication between the following entities:
  • The Amazon Simple Workflow Service - providing centralized orchestration and workflow state persistence,
  • Workflow Executors - entities that start workflow executions, typically through an action taken by a user or from a cron job.
  • Deciders - programs that codify the business logic, i.e. a set of instructions and decisions. Deciders make decisions based on the initial set of conditions and on the outcomes of activities.
  • Activity Workers - their objective is very straightforward: to take inputs, execute the tasks and return a result to the Service.

The Workflow Executor contacts the SWF service and requests instantiation of a workflow. A new workflow is created and its state is stored in the service. The next time a decider contacts the SWF service to ask for a decision task, it will be informed that a new workflow execution is taking place and asked to advise the SWF service on what the next steps should be. The decider then instructs the service to dispatch specific tasks to activity workers. At the next activity worker poll, the task is dispatched, executed, and the results reported back to SWF, which then passes them on to the deciders. This exchange repeats until the decider is satisfied and instructs the service to complete the execution.

Prerequisites

You need a valid access and secret key. The examples below assume that you have exported them to your environment, as follows:

bash$ export AWS_ACCESS_KEY_ID=<your access key>
bash$ export AWS_SECRET_ACCESS_KEY=<your secret key>

Before workflows and activities can be used, they have to be registered with SWF service:

# register.py
import boto.swf.layer2 as swf
from boto.swf.exceptions import SWFTypeAlreadyExistsError, SWFDomainAlreadyExistsError
DOMAIN = 'boto_tutorial'
VERSION = '1.0'

registerables = []
registerables.append(swf.Domain(name=DOMAIN))
for workflow_type in ('HelloWorkflow', 'SerialWorkflow', 'ParallelWorkflow', 'SubWorkflow'):
    registerables.append(swf.WorkflowType(domain=DOMAIN, name=workflow_type, version=VERSION, task_list='default'))

for activity_type in ('HelloWorld', 'ActivityA', 'ActivityB', 'ActivityC'):
    registerables.append(swf.ActivityType(domain=DOMAIN, name=activity_type, version=VERSION, task_list='default'))

for swf_entity in registerables:
    try:
        swf_entity.register()
        print swf_entity.name, 'registered successfully'
    except (SWFDomainAlreadyExistsError, SWFTypeAlreadyExistsError):
        print swf_entity.__class__.__name__, swf_entity.name, 'already exists'

Execution of the above should produce no errors.

bash$ python -i register.py
Domain boto_tutorial already exists
WorkflowType HelloWorkflow already exists
SerialWorkflow registered successfully
ParallelWorkflow registered successfully
ActivityType HelloWorld already exists
ActivityA registered successfully
ActivityB registered successfully
ActivityC registered successfully
>>>

HelloWorld

This example is an implementation of a minimal Hello World workflow. Its execution should unfold as follows:

  1. A workflow execution is started.
  2. The SWF service schedules the initial decision task.
  3. A decider polls for decision tasks and receives one.
  4. The decider requests scheduling of an activity task.
  5. The SWF service schedules the greeting activity task.
  6. An activity worker polls for activity task and receives one.
  7. The worker completes the greeting activity.
  8. The SWF service schedules a decision task to inform about work outcome.
  9. The decider polls and receives a new decision task.
  10. The decider schedules workflow completion.
  11. The workflow execution finishes.

Workflow logic is encoded in the decider:

# hello_decider.py
import boto.swf.layer2 as swf

DOMAIN = 'boto_tutorial'
ACTIVITY = 'HelloWorld'
VERSION = '1.0'
TASKLIST = 'default'

class HelloDecider(swf.Decider):

    domain = DOMAIN
    task_list = TASKLIST
    version = VERSION

    def run(self):
        history = self.poll()
        if 'events' in history:
            # Find workflow events not related to decision scheduling.
            workflow_events = [e for e in history['events']
                if not e['eventType'].startswith('Decision')]
            last_event = workflow_events[-1]

            decisions = swf.Layer1Decisions()
            if last_event['eventType'] == 'WorkflowExecutionStarted':
                decisions.schedule_activity_task('saying_hi', ACTIVITY, VERSION, task_list=TASKLIST)
            elif last_event['eventType'] == 'ActivityTaskCompleted':
                decisions.complete_workflow_execution()
            self.complete(decisions=decisions)
            return True

The activity worker is responsible for printing the greeting message when the activity task is dispatched to it by the service:

# hello_worker.py
import boto.swf.layer2 as swf

DOMAIN = 'boto_tutorial'
VERSION = '1.0'
TASKLIST = 'default'

class HelloWorker(swf.ActivityWorker):

    domain = DOMAIN
    version = VERSION
    task_list = TASKLIST

    def run(self):
        activity_task = self.poll()
        if 'activityId' in activity_task:
            print 'Hello, World!'
            self.complete()
            return True

With actors implemented we can spin up a workflow execution:

$ python
>>> import boto.swf.layer2 as swf
>>> execution = swf.WorkflowType(name='HelloWorkflow', domain='boto_tutorial', version='1.0', task_list='default').start()
>>>

From separate terminals run an instance of a worker and a decider to carry out a workflow execution (the worker and decider may run from two independent machines).

$ python -i hello_decider.py
>>> while HelloDecider().run(): pass
...
$ python -i hello_worker.py
>>> while HelloWorker().run(): pass
...
Hello, World!

Great. Now, to see what just happened, go back to the original terminal from which the execution was started, and read its history.

>>> execution.history()
[{'eventId': 1,
  'eventTimestamp': 1381095173.2539999,
  'eventType': 'WorkflowExecutionStarted',
  'workflowExecutionStartedEventAttributes': {'childPolicy': 'TERMINATE',
                                              'executionStartToCloseTimeout': '3600',
                                              'parentInitiatedEventId': 0,
                                              'taskList': {'name': 'default'},
                                              'taskStartToCloseTimeout': '300',
                                              'workflowType': {'name': 'HelloWorkflow',
                                                               'version': '1.0'}}},
 {'decisionTaskScheduledEventAttributes': {'startToCloseTimeout': '300',
                                           'taskList': {'name': 'default'}},
  'eventId': 2,
  'eventTimestamp': 1381095173.2539999,
  'eventType': 'DecisionTaskScheduled'},
 {'decisionTaskStartedEventAttributes': {'scheduledEventId': 2},
  'eventId': 3,
  'eventTimestamp': 1381095177.5439999,
  'eventType': 'DecisionTaskStarted'},
 {'decisionTaskCompletedEventAttributes': {'scheduledEventId': 2,
                                           'startedEventId': 3},
  'eventId': 4,
  'eventTimestamp': 1381095177.855,
  'eventType': 'DecisionTaskCompleted'},
 {'activityTaskScheduledEventAttributes': {'activityId': 'saying_hi',
                                           'activityType': {'name': 'HelloWorld',
                                                            'version': '1.0'},
                                           'decisionTaskCompletedEventId': 4,
                                           'heartbeatTimeout': '600',
                                           'scheduleToCloseTimeout': '3900',
                                           'scheduleToStartTimeout': '300',
                                           'startToCloseTimeout': '3600',
                                           'taskList': {'name': 'default'}},
  'eventId': 5,
  'eventTimestamp': 1381095177.855,
  'eventType': 'ActivityTaskScheduled'},
 {'activityTaskStartedEventAttributes': {'scheduledEventId': 5},
  'eventId': 6,
  'eventTimestamp': 1381095179.427,
  'eventType': 'ActivityTaskStarted'},
 {'activityTaskCompletedEventAttributes': {'scheduledEventId': 5,
                                           'startedEventId': 6},
  'eventId': 7,
  'eventTimestamp': 1381095179.6989999,
  'eventType': 'ActivityTaskCompleted'},
 {'decisionTaskScheduledEventAttributes': {'startToCloseTimeout': '300',
                                           'taskList': {'name': 'default'}},
  'eventId': 8,
  'eventTimestamp': 1381095179.6989999,
  'eventType': 'DecisionTaskScheduled'},
 {'decisionTaskStartedEventAttributes': {'scheduledEventId': 8},
  'eventId': 9,
  'eventTimestamp': 1381095179.7420001,
  'eventType': 'DecisionTaskStarted'},
 {'decisionTaskCompletedEventAttributes': {'scheduledEventId': 8,
                                           'startedEventId': 9},
  'eventId': 10,
  'eventTimestamp': 1381095180.026,
  'eventType': 'DecisionTaskCompleted'},
 {'eventId': 11,
  'eventTimestamp': 1381095180.026,
  'eventType': 'WorkflowExecutionCompleted',
  'workflowExecutionCompletedEventAttributes': {'decisionTaskCompletedEventId': 10}}]

Serial Activity Execution

The following example implements a basic workflow with activities executed one after another.

The business logic, i.e. the serial execution of activities, is encoded in the decider:

# serial_decider.py
import time
import boto.swf.layer2 as swf

class SerialDecider(swf.Decider):

    domain = 'boto_tutorial'
    task_list = 'default_tasks'
    version = '1.0'

    def run(self):
        history = self.poll()
        if 'events' in history:
            # Get a list of non-decision events to see what event came in last.
            workflow_events = [e for e in history['events']
                               if not e['eventType'].startswith('Decision')]
            decisions = swf.Layer1Decisions()
            # Record latest non-decision event.
            last_event = workflow_events[-1]
            last_event_type = last_event['eventType']
            if last_event_type == 'WorkflowExecutionStarted':
                # Schedule the first activity.
                decisions.schedule_activity_task('%s-%i' % ('ActivityA', time.time()),
                   'ActivityA', self.version, task_list='a_tasks')
            elif last_event_type == 'ActivityTaskCompleted':
                # Take decision based on the name of activity that has just completed.
                # 1) Get activity's event id.
                last_event_attrs = last_event['activityTaskCompletedEventAttributes']
                completed_activity_id = last_event_attrs['scheduledEventId'] - 1
                # 2) Extract its name.
                activity_data = history['events'][completed_activity_id]
                activity_attrs = activity_data['activityTaskScheduledEventAttributes']
                activity_name = activity_attrs['activityType']['name']
                # 3) Optionally, get the result from the activity.
                result = last_event['activityTaskCompletedEventAttributes'].get('result')

                # Take the decision.
                if activity_name == 'ActivityA':
                    decisions.schedule_activity_task('%s-%i' % ('ActivityB', time.time()),
                        'ActivityB', self.version, task_list='b_tasks', input=result)
                elif activity_name == 'ActivityB':
                    decisions.schedule_activity_task('%s-%i' % ('ActivityC', time.time()),
                        'ActivityC', self.version, task_list='c_tasks', input=result)
                elif activity_name == 'ActivityC':
                    # Final activity completed. We're done.
                    decisions.complete_workflow_execution()

            self.complete(decisions=decisions)
            return True

The workers only need to know which task lists to poll.

# serial_worker.py
import time
import boto.swf.layer2 as swf

class MyBaseWorker(swf.ActivityWorker):

    domain = 'boto_tutorial'
    version = '1.0'
    task_list = None

    def run(self):
        activity_task = self.poll()
        if 'activityId' in activity_task:
            # Get input.
            # Get the method for the requested activity.
            try:
                print 'working on activity from tasklist %s at %i' % (self.task_list, time.time())
                self.activity(activity_task.get('input'))
            except Exception as error:
                self.fail(reason=str(error))
                raise error

            return True

    def activity(self, activity_input):
        raise NotImplementedError

class WorkerA(MyBaseWorker):
    task_list = 'a_tasks'
    def activity(self, activity_input):
        self.complete(result="Now don't be givin him sambuca!")

class WorkerB(MyBaseWorker):
    task_list = 'b_tasks'
    def activity(self, activity_input):
        self.complete()

class WorkerC(MyBaseWorker):
    task_list = 'c_tasks'
    def activity(self, activity_input):
        self.complete()

Spin up a workflow execution and run the decider:

$ python
>>> import boto.swf.layer2 as swf
>>> execution = swf.WorkflowType(name='SerialWorkflow', domain='boto_tutorial', version='1.0', task_list='default_tasks').start()
>>>
$ python -i serial_decider.py
>>> while SerialDecider().run(): pass
...

Run the workers. The activities will be executed in order:

$ python -i serial_worker.py
>>> while WorkerA().run(): pass
...
working on activity from tasklist a_tasks at 1382046291
$ python -i serial_worker.py
>>> while WorkerB().run(): pass
...
working on activity from tasklist b_tasks at 1382046541
$ python -i serial_worker.py
>>> while WorkerC().run(): pass
...
working on activity from tasklist c_tasks at 1382046560

Looks good. Now, do the following to inspect the state and history of the execution:

>>> execution.describe()
{'executionConfiguration': {'childPolicy': 'TERMINATE',
  'executionStartToCloseTimeout': '3600',
  'taskList': {'name': 'default_tasks'},
  'taskStartToCloseTimeout': '300'},
 'executionInfo': {'cancelRequested': False,
  'closeStatus': 'COMPLETED',
  'closeTimestamp': 1382046560.901,
  'execution': {'runId': '12fQ1zSaLmI5+lLXB8ux+8U+hLOnnXNZCY9Zy+ZvXgzhE=',
   'workflowId': 'SerialWorkflow-1.0-1382046514'},
  'executionStatus': 'CLOSED',
  'startTimestamp': 1382046514.994,
  'workflowType': {'name': 'SerialWorkflow', 'version': '1.0'}},
 'latestActivityTaskTimestamp': 1382046560.632,
 'openCounts': {'openActivityTasks': 0,
  'openChildWorkflowExecutions': 0,
  'openDecisionTasks': 0,
  'openTimers': 0}}
>>> execution.history()
...

Parallel Activity Execution

When activities are independent from one another, their execution may be scheduled in parallel.

The decider schedules all activities at once and marks progress until all activities are completed, at which point the workflow is completed.

# parallel_decider.py

import boto.swf.layer2 as swf
import time

SCHED_COUNT = 5

class ParallelDecider(swf.Decider):

    domain = 'boto_tutorial'
    task_list = 'default'
    def run(self):
        decision_task = self.poll()
        if 'events' in decision_task:
            decisions = swf.Layer1Decisions()
            # Decision* events are irrelevant here and can be ignored.
            workflow_events = [e for e in decision_task['events']
                               if not e['eventType'].startswith('Decision')]
            # Record latest non-decision event.
            last_event = workflow_events[-1]
            last_event_type = last_event['eventType']
            if last_event_type == 'WorkflowExecutionStarted':
                # At start, kick off SCHED_COUNT activities in parallel.
                for i in range(SCHED_COUNT):
                    decisions.schedule_activity_task('activity%i' % i, 'ActivityA', '1.0',
                                                     task_list=self.task_list)
            elif last_event_type == 'ActivityTaskCompleted':
                # Monitor progress. When all activities complete, complete workflow.
                completed_count = sum([1 for a in decision_task['events']
                                       if a['eventType'] == 'ActivityTaskCompleted'])
                print '%i/%i' % (completed_count, SCHED_COUNT)
                if completed_count == SCHED_COUNT:
                    decisions.complete_workflow_execution()
            self.complete(decisions=decisions)
            return True

Again, the only bit of information a worker needs is which task list to poll.

# parallel_worker.py
import time
import boto.swf.layer2 as swf

class ParallelWorker(swf.ActivityWorker):

    domain = 'boto_tutorial'
    task_list = 'default'

    def run(self):
        """Report current time."""
        activity_task = self.poll()
        if 'activityId' in activity_task:
            print 'working on', activity_task['activityId']
            self.complete(result=str(time.time()))
            return True

Spin up a workflow execution and run the decider:

$ python -i parallel_decider.py
>>> execution = swf.WorkflowType(name='ParallelWorkflow', domain='boto_tutorial', version='1.0', task_list='default').start()
>>> while ParallelDecider().run(): pass
...
1/5
2/5
4/5
5/5

Run two or more workers to see how the service partitions work execution in parallel.

$ python -i parallel_worker.py
>>> while ParallelWorker().run(): pass
...
working on activity1
working on activity3
working on activity4
$ python -i parallel_worker.py
>>> while ParallelWorker().run(): pass
...
working on activity2
working on activity0

As seen above, the work was partitioned between the two running workers.

Sub-Workflows

Sometimes it is desirable or necessary to break the process up into multiple workflows.

Since the decider is stateless, it’s up to you to determine which workflow is being used and which action you would like to take.

import boto.swf.layer2 as swf

class SubWorkflowDecider(swf.Decider):

    domain = 'boto_tutorial'
    task_list = 'default'
    version = '1.0'

    def run(self):
        history = self.poll()
        events = []
        if 'events' in history:
            events = history['events']
            # Collect the entire history if there are enough events to become paginated
            while 'nextPageToken' in history:
                history = self.poll(next_page_token=history['nextPageToken'])
                if 'events' in history:
                    events = events + history['events']

            workflow_type = history['workflowType']['name']

            # Get all of the relevant events that have happened since the last decision task was started
            workflow_events = [e for e in events
                    if e['eventId'] > history['previousStartedEventId'] and
                    not e['eventType'].startswith('Decision')]

            decisions = swf.Layer1Decisions()

            for event in workflow_events:
                last_event_type = event['eventType']
                if last_event_type == 'WorkflowExecutionStarted':
                    if workflow_type == 'SerialWorkflow':
                        decisions.start_child_workflow_execution('SubWorkflow', self.version,
                            "subworkflow_1", task_list=self.task_list, input="sub_1")
                    elif workflow_type == 'SubWorkflow':
                        for i in range(2):
                            decisions.schedule_activity_task("activity_%d" % i, 'ActivityA', self.version, task_list='a_tasks')
                    else:
                        decisions.fail_workflow_execution(reason="Unknown workflow %s" % workflow_type)
                        break

                elif last_event_type == 'ChildWorkflowExecutionCompleted':
                    decisions.schedule_activity_task("activity_2", 'ActivityB', self.version, task_list='b_tasks')

                elif last_event_type == 'ActivityTaskCompleted':
                    attrs = event['activityTaskCompletedEventAttributes']
                    activity = events[attrs['scheduledEventId'] - 1]
                    activity_name = activity['activityTaskScheduledEventAttributes']['activityType']['name']

                    if activity_name == 'ActivityA':
                        completed_count = sum([1 for a in events if a['eventType'] == 'ActivityTaskCompleted'])
                        if completed_count == 2:
                            # Complete the child workflow
                            decisions.complete_workflow_execution()
                    elif activity_name == 'ActivityB':
                        # Complete the parent workflow
                        decisions.complete_workflow_execution()

            self.complete(decisions=decisions)
        return True

Misc

Some of these things are not obvious from reading the API documents, so hopefully the notes below help you avoid some time-consuming pitfalls.

Decision Tasks

When first running deciders and activities, it may seem that the decider gets called for every event that an activity triggers; however, this is not the case. More than one event can happen between decision tasks. The decision task will contain a key previousStartedEventId that lets you know the eventId of the last DecisionTaskStarted event that was processed. Your script will need to handle all of the events that have happened since then, not just the last activity.

workflow_events = [e for e in events if e['eventId'] > decision_task['previousStartedEventId']]

You may also wish to filter out events whose type starts with ‘Decision’, or filter them in some other way that fits your needs. You will then need to iterate over the workflow_events list and respond to each event, since it may contain more than one.
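
For example, combining the previousStartedEventId cutoff with the Decision filter might look like this (a sketch only; decision_task here is the dictionary returned by poll()):

workflow_events = [e for e in decision_task['events']
                   if e['eventId'] > decision_task['previousStartedEventId']
                   and not e['eventType'].startswith('Decision')]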

Filtering Events

When running many tasks in parallel, a common task is searching through the history to see how many events of a particular activity type started, completed, and/or failed. Some basic list comprehension makes this trivial.

def filter_completed_events(self, events, activity_type):
    """Return the scheduling events of completed activities of the given type."""
    completed = [e for e in events if e['eventType'] == 'ActivityTaskCompleted']
    # Look up the ActivityTaskScheduled event that corresponds to each completion.
    scheduled = [events[e['activityTaskCompletedEventAttributes']['scheduledEventId'] - 1]
                 for e in completed]
    return [e for e in scheduled
            if e['activityTaskScheduledEventAttributes']['activityType']['name'] == activity_type]

An Introduction to boto’s Cloudsearch interface

This tutorial focuses on the boto interface to Amazon CloudSearch. It assumes that you have boto already downloaded and installed.

Creating a Connection

The first step in accessing CloudSearch is to create a connection to the service.

The recommended method of doing this is as follows:

>>> import boto.cloudsearch
>>> conn = boto.cloudsearch.connect_to_region("us-west-2",
...             aws_access_key_id='<aws access key>',
...             aws_secret_access_key='<aws secret key>')

At this point, the variable conn will point to a CloudSearch connection object in the us-west-2 region. Available regions for CloudSearch can be listed with boto.cloudsearch.regions(). In this example, the AWS access key and AWS secret key are passed in to the method explicitly. Alternatively, you can set the environment variables:

  • AWS_ACCESS_KEY_ID - Your AWS Access Key ID
  • AWS_SECRET_ACCESS_KEY - Your AWS Secret Access Key

and then simply call:

>>> import boto.cloudsearch
>>> conn = boto.cloudsearch.connect_to_region("us-west-2")

In either case, conn will point to the Connection object which we will use throughout the remainder of this tutorial.

Creating a Domain

Once you have a connection established with the CloudSearch service, you will want to create a domain. A domain encapsulates the data that you wish to index, as well as indexes and metadata relating to it:

>>> from boto.cloudsearch.domain import Domain
>>> domain = Domain(conn, conn.create_domain('demo'))

This domain can be used to control access policies, indexes, and the actual document service, which you will use to index and search.

Setting access policies

Before you can connect to a document service, you need to set the correct access properties. For example, if you were connecting from 192.168.1.0, you could give yourself access as follows:

>>> our_ip = '192.168.1.0'

>>> # Allow our IP address to access the document and search services
>>> policy = domain.get_access_policies()
>>> policy.allow_search_ip(our_ip)
>>> policy.allow_doc_ip(our_ip)

You can use the allow_search_ip and allow_doc_ip methods to give different CIDR blocks access to searching and the document service respectively.

Creating index fields

Each domain can have up to twenty index fields which are indexed by the CloudSearch service. For each index field, you will need to specify whether it’s a text or integer field, as well as optionally a default value:

>>> # Create a 'text' index field called 'username'
>>> uname_field = domain.create_index_field('username', 'text')

>>> # Epoch time of when the user last did something
>>> time_field = domain.create_index_field('last_activity',
...                                        'uint',
...                                        default=0)

It is also possible to mark an index field as a facet. Doing so allows a search query to return categories into which results can be grouped, or to create drill-down categories:

>>> # But it would be neat to drill down into different countries
>>> loc_field = domain.create_index_field('location', 'text', facet=True)

Finally, you can also mark a field so that its contents can be returned directly in your search results by using the result option:

>>> # Directly insert user snippets in our results
>>> snippet_field = domain.create_index_field('snippet', 'text', result=True)

You can add up to 20 index fields in this manner:

>>> follower_field = domain.create_index_field('follower_count',
...                                            'uint',
...                                            default=0)

Adding Documents to the Index

Now, we can add some documents to our new search domain. First, you will need a document service object, which is used to submit documents for indexing:

>>> doc_service = domain.get_document_service()

For this example, we will use a pre-populated list of sample content for our import. You would normally pull such data from your database or another document store:

>>> users = [
    {
        'id': 1,
        'username': 'dan',
        'last_activity': 1334252740,
        'follower_count': 20,
        'location': 'USA',
        'snippet': 'Dan likes watching sunsets and rock climbing',
    },
    {
        'id': 2,
        'username': 'dankosaur',
        'last_activity': 1334252904,
        'follower_count': 1,
        'location': 'UK',
        'snippet': 'Likes to dress up as a dinosaur.',
    },
    {
        'id': 3,
        'username': 'danielle',
        'last_activity': 1334252969,
        'follower_count': 100,
        'location': 'DE',
        'snippet': 'Just moved to Germany!'
    },
    {
        'id': 4,
        'username': 'daniella',
        'last_activity': 1334253279,
        'follower_count': 7,
        'location': 'USA',
        'snippet': 'Just like Dan, I like to watch a good sunset, but heights scare me.',
    }
]

When adding documents to our document service, we will batch them together. You can schedule a document to be added by using the add method. Whenever you are adding a document, you must provide a unique ID, a version ID, and the actual document to be indexed. In this case, we are using the user ID as our unique ID. The version ID is used to determine which is the latest version of an object to be indexed. If you wish to update a document, you must use a higher version ID. In this case, we are using the time of the user’s last activity as a version number:

>>> for user in users:
>>>     doc_service.add(user['id'], user['last_activity'], user)

When you are ready to send the batched request to the document service, you can do so with the commit method. Note that CloudSearch charges per 1000 batch uploads, and each batch upload must be under 5MB:

>>> result = doc_service.commit()

The result is an instance of CommitResponse, which wraps the plain dictionary response in a nicer object (i.e. result.adds, result.deletes) and raises an exception if any of our documents failed to commit.

If you wish to use the same document service connection after a commit, you must use clear_sdf to clear its internal cache.
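
A rough sketch of inspecting the response and resetting the connection; the counts shown are illustrative only:

>>> result.adds       # number of add operations accepted in this batch
4
>>> result.deletes
0
>>> doc_service.clear_sdf()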

Searching Documents

Now, let’s try performing a search. First, we will need a SearchServiceConnection:

>>> search_service = domain.get_search_service()

A standard search will return documents which contain the exact words being searched for:

>>> results = search_service.search(q="dan")
>>> results.hits
2
>>> map(lambda x: x['id'], results)
[u'1', u'4']

The standard search does not look at word order:

>>> results = search_service.search(q="dinosaur dress")
>>> results.hits
1
>>> map(lambda x: x['id'], results)
[u'2']

It’s also possible to do more complex queries using the bq argument (Boolean Query). When you are using bq, your search terms must be enclosed in single quotes:

>>> results = search_service.search(bq="'dan'")
>>> results.hits
2
>>> map(lambda x: x['id'], results)
[u'1', u'4']

When you are using boolean queries, it’s also possible to use wildcards to extend your search to all words which start with your search terms:

>>> results = search_service.search(bq="'dan*'")
>>> results.hits
4
>>> map(lambda x: x['id'], results)
[u'1', u'2', u'3', u'4']

The boolean query also allows you to create more complex queries. You can OR terms together using “|”, AND terms together using “+” or a space, and remove words from the query using the “-” operator:

>>> results = search_service.search(bq="'watched|moved'")
>>> results.hits
2
>>> map(lambda x: x['id'], results)
[u'3', u'4']
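
Following the same quoting convention, a query that requires one term and excludes another could be written as below; this is only a sketch, and the matching documents depend on your data, so no output is shown:

>>> results = search_service.search(bq="'dan* -dinosaur'")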

By default, the search will return up to 10 results, but it is possible to adjust this by using the size argument as follows:

>>> results = search_service.search(bq="'dan*'", size=2)
>>> results.hits
4
>>> map(lambda x: x['id'], results)
[u'1', u'2']

It is also possible to offset the start of the search by using the start argument as follows:

>>> results = search_service.search(bq="'dan*'", start=2)
>>> results.hits
4
>>> map(lambda x: x['id'], results)
[u'3', u'4']

Ordering search results and rank expressions

If your search query is going to return many results, it is good to be able to sort them. You can order your search results by using the rank argument. You can sort on any field which has the result option turned on:

>>> results = search_service.search(bq=query, rank=['-follower_count'])

You can also create your own rank expressions to sort your results according to other criteria, such as showing the most recently active users, or combining the recency score with text_relevance:

>>> domain.create_rank_expression('recently_active', 'last_activity')

>>> domain.create_rank_expression('activish',
...   'text_relevance + ((follower_count/(time() - last_activity))*1000)')

>>> results = search_service.search(bq=query, rank=['-recently_active'])

Viewing and Adjusting Stemming for a Domain

A stemming dictionary maps related words to a common stem. A stem is typically the root or base word from which variants are derived. For example, run is the stem of running and ran. During indexing, Amazon CloudSearch uses the stemming dictionary when it performs text-processing on text fields. At search time, the stemming dictionary is used to perform text-processing on the search request. This enables matching on variants of a word. For example, if you map the term running to the stem run and then search for running, the request matches documents that contain run as well as running.

To get the current stemming dictionary defined for a domain, use the get_stemming method:

>>> stems = domain.get_stemming()
>>> stems
{u'stems': {}}
>>>

This returns a dictionary object that can be manipulated directly to add additional stems for your search domain by adding pairs of term:stem to the stems dictionary:

>>> stems['stems']['running'] = 'run'
>>> stems['stems']['ran'] = 'run'
>>> stems
{u'stems': {u'ran': u'run', u'running': u'run'}}
>>>

This has changed the value locally. To update the information in Amazon CloudSearch, you need to save the data:

>>> stems.save()

You can also access certain CloudSearch-specific attributes related to the stemming dictionary defined for your domain:

>>> stems.status
u'RequiresIndexDocuments'
>>> stems.creation_date
u'2012-05-01T12:12:32Z'
>>> stems.update_date
u'2012-05-01T12:12:32Z'
>>> stems.update_version
19
>>>

The status indicates that, because you have changed the stems associated with the domain, you will need to re-index the documents in the domain before the new stems are used.
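
A minimal sketch of kicking off that re-indexing, assuming your boto version exposes the IndexDocuments call as index_documents on the Domain object (it is otherwise available on the connection as conn.index_documents(domain.name)):

>>> domain.index_documents()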

Viewing and Adjusting Stopwords for a Domain

Stopwords are words that should typically be ignored both during indexing and at search time because they are either insignificant or so common that including them would result in a massive number of matches.

To view the stopwords currently defined for your domain, use the get_stopwords method:

>>> stopwords = domain.get_stopwords()
>>> stopwords
{u'stopwords': [u'a',
 u'an',
 u'and',
 u'are',
 u'as',
 u'at',
 u'be',
 u'but',
 u'by',
 u'for',
 u'in',
 u'is',
 u'it',
 u'of',
 u'on',
 u'or',
 u'the',
 u'to',
 u'was']}
>>>

You can add additional stopwords by simply appending the values to the list:

>>> stopwords['stopwords'].append('foo')
>>> stopwords['stopwords'].append('bar')
>>> stopwords

Similarly, you could remove currently defined stopwords from the list. To save the changes, use the save method:

>>> stopwords.save()

The stopwords object has attributes similar to those described above for stemming, providing additional information about the stopwords in your domain.

Viewing and Adjusting Synonyms for a Domain

You can configure synonyms for terms that appear in the data you are searching. That way, if a user searches for the synonym rather than the indexed term, the results will include documents that contain the indexed term.

If you want two terms to match the same documents, you must define them as synonyms of each other. For example:

cat, feline
feline, cat

To view the synonyms currently defined for your domain, use the get_synonyms method:

>>> synonyms = domain.get_synonyms()
>>> synonyms
{u'synonyms': {}}
>>>

You can define new synonyms by adding new term:synonyms entries to the synonyms dictionary object:

>>> synonyms['synonyms']['cat'] = ['feline', 'kitten']
>>> synonyms['synonyms']['dog'] = ['canine', 'puppy']

To save the changes, use the save method:

>>> synonyms.save()

The synonyms object has attributes similar to those described above for stemming, providing additional information about the synonyms in your domain.

Deleting Documents

It is also possible to delete documents:

>>> import time
>>> from datetime import datetime

>>> doc_service = domain.get_document_service()

>>> # Again we'll cheat and use the current epoch time as our version number

>>> doc_service.delete(4, int(time.mktime(datetime.utcnow().timetuple())))
>>> doc_service.commit()

CloudWatch

First, make sure you have something to monitor. You can either create a LoadBalancer or enable monitoring on an existing EC2 instance. To enable monitoring, you can either call the monitor_instance method on the EC2Connection object or call the monitor method on the Instance object.
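
For example, a minimal sketch of turning monitoring on for an existing instance (the instance id here is just an example):

>>> import boto.ec2
>>> ec2 = boto.ec2.connect_to_region('us-west-2')
>>> ec2.monitor_instance('i-4ca81747')           # via the connection
>>> # or, equivalently, via the Instance object:
>>> reservation = ec2.get_all_instances(instance_ids=['i-4ca81747'])[0]
>>> reservation.instances[0].monitor()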

It takes a while for the monitoring data to start accumulating but once it does, you can do this:

>>> import boto.ec2.cloudwatch
>>> c = boto.ec2.cloudwatch.connect_to_region('us-west-2')
>>> metrics = c.list_metrics()
>>> metrics
[Metric:DiskReadBytes,
 Metric:CPUUtilization,
 Metric:DiskWriteOps,
 Metric:DiskWriteOps,
 Metric:DiskReadOps,
 Metric:DiskReadBytes,
 Metric:DiskReadOps,
 Metric:CPUUtilization,
 Metric:DiskWriteOps,
 Metric:NetworkIn,
 Metric:NetworkOut,
 Metric:NetworkIn,
 Metric:DiskReadBytes,
 Metric:DiskWriteBytes,
 Metric:DiskWriteBytes,
 Metric:NetworkIn,
 Metric:NetworkIn,
 Metric:NetworkOut,
 Metric:NetworkOut,
 Metric:DiskReadOps,
 Metric:CPUUtilization,
 Metric:DiskReadOps,
 Metric:CPUUtilization,
 Metric:DiskWriteBytes,
 Metric:DiskWriteBytes,
 Metric:DiskReadBytes,
 Metric:NetworkOut,
 Metric:DiskWriteOps]

The list_metrics call will return a list of all of the available metrics that you can query against. Each entry in the list is a Metric object. As you can see from the list above, some of the metrics are repeated. The repeated metrics are across different dimensions (per instance, per image, per instance type), which can be identified by looking at the dimensions property.

Because, for this example, I’m only monitoring a single instance, the set of metrics available to me is fairly limited. If I were monitoring many instances, using many different instance types and AMIs, and also several load balancers, the list of available metrics would grow considerably.

Once you have the list of available metrics, you can actually query the CloudWatch system for a metric. Let’s choose the CPU utilization metric for one of the images:

>>> m_image = metrics[7]
>>> m_image
Metric:CPUUtilization
>>> m_image.dimensions
{u'ImageId': [u'ami-6ac2a85a']}

Let’s choose another CPU utilization metric, this time for our instance:

>>> m = metrics[20]
>>> m
Metric:CPUUtilization
>>> m.dimensions
{u'InstanceId': [u'i-4ca81747']}

The Metric object has a query method that lets us actually perform the query against the collected data in CloudWatch. To call that, we need a start time and end time to control the time span of data that we are interested in. For this example, let’s say we want the data for the previous hour:

>>> import datetime
>>> end = datetime.datetime.utcnow()
>>> start = end - datetime.timedelta(hours=1)

We also need to supply the Statistic that we want reported and the Units to use for the results. The Statistic can be one of these values:

['Minimum', 'Maximum', 'Sum', 'Average', 'SampleCount']

And Units must be one of the following:

['Seconds', 'Microseconds', 'Milliseconds', 'Bytes', 'Kilobytes', 'Megabytes', 'Gigabytes', 'Terabytes', 'Bits', 'Kilobits', 'Megabits', 'Gigabits', 'Terabits', 'Percent', 'Count', 'Bytes/Second', 'Kilobytes/Second', 'Megabytes/Second', 'Gigabytes/Second', 'Terabytes/Second', 'Bits/Second', 'Kilobits/Second', 'Megabits/Second', 'Gigabits/Second', 'Terabits/Second', 'Count/Second', None]

The query method also takes an optional parameter, period. This parameter controls the granularity (in seconds) of the data returned. The smallest period is 60 seconds and the value must be a multiple of 60 seconds. So, let’s ask for the average as a percent:

>>> datapoints = m.query(start, end, 'Average', 'Percent')
>>> len(datapoints)
60

Our period was 60 seconds and our duration was one hour, so we should get 60 data points back, and we can see that we did. Each element in the datapoints list is a DataPoint object, which is a simple subclass of a Python dict. Each DataPoint contains all of the information available about that particular data point:

>>> d = datapoints[0]
>>> d
{u'Timestamp': datetime.datetime(2014, 6, 23, 22, 25),
 u'Average': 20.0,
 u'Unit': u'Percent'}

My server obviously isn’t very busy right now!
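
Since each data point is just a dict, aggregating the results is plain Python. For instance, to compute the mean CPU utilization over the hour:

>>> sum(d['Average'] for d in datapoints) / len(datapoints)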

An Introduction to boto’s VPC interface

This tutorial is based on the examples in the Amazon Virtual Private Cloud Getting Started Guide (http://docs.amazonwebservices.com/AmazonVPC/latest/GettingStartedGuide/). In each example, it tries to show the boto requests that correspond to the AWS command line tools.

Creating a VPC connection

First, we need to create a new VPC connection:

>>> from boto.vpc import VPCConnection
>>> c = VPCConnection()

To create a VPC

Now that we have a VPC connection, we can create our first VPC.

>>> vpc = c.create_vpc('10.0.0.0/24')
>>> vpc
VPC:vpc-6b1fe402
>>> vpc.id
u'vpc-6b1fe402'
>>> vpc.state
u'pending'
>>> vpc.cidr_block
u'10.0.0.0/24'
>>> vpc.dhcp_options_id
u'default'
>>>

To create a subnet

The next step is to create a subnet to associate with your VPC.

>>> subnet = c.create_subnet(vpc.id, '10.0.0.0/25')
>>> subnet.id
u'subnet-6a1fe403'
>>> subnet.state
u'pending'
>>> subnet.cidr_block
u'10.0.0.0/25'
>>> subnet.available_ip_address_count
123
>>> subnet.availability_zone
u'us-east-1b'
>>>

To create a customer gateway

Next, we create a customer gateway.

>>> cg = c.create_customer_gateway('ipsec.1', '12.1.2.3', 65534)
>>> cg.id
u'cgw-b6a247df'
>>> cg.type
u'ipsec.1'
>>> cg.state
u'available'
>>> cg.ip_address
u'12.1.2.3'
>>> cg.bgp_asn
u'65534'
>>>

To create a VPN gateway

>>> vg = c.create_vpn_gateway('ipsec.1')
>>> vg.id
u'vgw-44ad482d'
>>> vg.type
u'ipsec.1'
>>> vg.state
u'pending'
>>> vg.availability_zone
u'us-east-1b'
>>>

Attaching a VPN Gateway to a VPC

>>> vg.attach(vpc.id)
>>>

Associating an Elastic IP with a VPC Instance

>>> ec2.connection.associate_address('i-71b2f60b', None, 'eipalloc-35cf685d')
>>>

Releasing an Elastic IP Attached to a VPC Instance

>>> ec2.connection.release_address(None, 'eipalloc-35cf685d')
>>>

To Get All VPN Connections

>>> vpns = c.get_all_vpn_connections()
>>> vpns[0].id
u'vpn-12ef67bv'
>>> tunnels = vpns[0].tunnels
>>> tunnels
[VpnTunnel: 177.12.34.56, VpnTunnel: 177.12.34.57]

To Create VPC Peering Connection

>>> vpcs = c.get_all_vpcs()
>>> vpc_peering_connection = c.create_vpc_peering_connection(vpcs[0].id, vpcs[1].id)
>>> vpc_peering_connection
VpcPeeringConnection:pcx-18987471

To Accept VPC Peering Connection

>>> vpc_peering_connections = c.get_all_vpc_peering_connections()
>>> vpc_peering_connection = vpc_peering_connections[0]
>>> vpc_peering_connection.status_code
u'pending-acceptance'
>>> vpc_peering_connection = c.accept_vpc_peering_connection(vpc_peering_connection.id)
>>> vpc_peering_connection.update()
u'active'

To Reject VPC Peering Connection

>>> vpc_peering_connections = c.get_all_vpc_peering_connections()
>>> vpc_peering_connection = vpc_peering_connections[0]
>>> vpc_peering_connection.status_code
u'pending-acceptance'
>>> c.reject_vpc_peering_connection(vpc_peering_connection.id)
>>> vpc_peering_connection.update()
u'rejected'

An Introduction to boto’s Elastic Load Balancing interface

This tutorial focuses on the boto interface for Elastic Load Balancing from Amazon Web Services. This tutorial assumes that you have already downloaded and installed boto, and are familiar with the boto ec2 interface.

Elastic Load Balancing Concepts

Elastic Load Balancing (ELB) is intimately connected with Amazon’s Elastic Compute Cloud (EC2) service. Using the ELB service allows you to create a load balancer - a DNS endpoint and set of ports that distributes incoming requests to a set of EC2 instances. The advantage of using a load balancer is that it allows you to truly scale up or down a set of backend instances without disrupting service. Before the ELB service, you had to do this manually by launching an EC2 instance and installing load balancer software on it (nginx, haproxy, perlbal, etc.) to distribute traffic to other EC2 instances.

Recall that the EC2 service is split into Regions, which are further divided into Availability Zones (AZ). For example, the US-East region is divided into us-east-1a, us-east-1b, us-east-1c, us-east-1d, and us-east-1e. You can think of AZs as data centers - each runs off a different set of ISP backbones and power providers. ELB load balancers can span multiple AZs but cannot span multiple regions. That means that if you’d like to create a set of instances spanning both the US and Europe Regions you’d have to create two load balancers and have some sort of other means of distributing requests between the two load balancers. An example of this could be using GeoIP techniques to choose the correct load balancer, or perhaps DNS round robin. Keep in mind also that traffic is distributed equally over all AZs the ELB balancer spans. This means you should have an equal number of instances in each AZ if you want to equally distribute load amongst all your instances.

Creating a Connection

The first step in accessing ELB is to create a connection to the service.

Like EC2, the ELB service has a different endpoint for each region. By default the US East endpoint is used. To choose a specific region, use the connect_to_region function:

>>> import boto.ec2.elb
>>> conn = boto.ec2.elb.connect_to_region('us-west-2')

Here’s yet another way to discover what regions are available and then connect to one:

>>> import boto.ec2.elb
>>> regions = boto.ec2.elb.regions()
>>> regions
[RegionInfo:us-east-1,
 RegionInfo:ap-northeast-1,
 RegionInfo:us-west-1,
 RegionInfo:us-west-2,
 RegionInfo:ap-southeast-1,
 RegionInfo:eu-west-1]
>>> conn = regions[-1].connect()

Alternatively, edit your boto.cfg with the default ELB endpoint to use:

[Boto]
elb_region_name = eu-west-1
elb_region_endpoint = elasticloadbalancing.eu-west-1.amazonaws.com

Getting Existing Load Balancers

To retrieve any existing load balancers:

>>> conn.get_all_load_balancers()
[LoadBalancer:load-balancer-prod, LoadBalancer:load-balancer-staging]

You can also filter by name:

>>> conn.get_all_load_balancers(load_balancer_names=['load-balancer-prod'])
[LoadBalancer:load-balancer-prod]

get_all_load_balancers returns a boto.resultset.ResultSet that contains instances of boto.ec2.elb.loadbalancer.LoadBalancer, each of which abstracts access to a load balancer. ResultSet works very much like a list.

>>> balancers = conn.get_all_load_balancers()
>>> balancers[0]
LoadBalancer:load-balancer-prod

Creating a Load Balancer

To create a load balancer you need the following:
  1. The specific ports and protocols you want to load balance over, and the port you want to connect to on each instance.
  2. A health check - the ELB concept of a heart beat or ping. ELB will use this health check to see whether your instances are up or down. If they go down, the load balancer will no longer send requests to them.
  3. A list of Availability Zones you’d like to create your load balancer over.

Ports and Protocols

An incoming connection to your load balancer will arrive on one or more ports - for example 80 (HTTP) and 443 (HTTPS). Each uses a protocol - currently, the supported protocols are TCP and HTTP. We also need to tell the load balancer which port to route connections to on each instance. For example, to create a load balancer for a website that accepts connections on 80 and 443, and that routes connections to ports 8080 and 8443 on each instance, you would specify that the load balancer ports and protocols are:

  • 80, 8080, HTTP
  • 443, 8443, TCP

This says that the load balancer will listen on two ports - 80 and 443. Connections on 80 will use an HTTP load balancer to forward connections to port 8080 on instances. Likewise, the load balancer will listen on 443 to forward connections to 8443 on each instance using the TCP balancer. We need to use TCP for the HTTPS port because the traffic is encrypted at the application layer. Of course, we could specify that the load balancer use TCP for port 80 as well; however, specifying HTTP lets ELB handle some work for you - for example, HTTP header parsing.
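
In boto these listeners are expressed as (load balancer port, instance port, protocol) tuples, so the two listeners above would be written as:

>>> ports = [(80, 8080, 'http'), (443, 8443, 'tcp')]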

Configuring a Health Check

A health check allows ELB to determine which instances are alive and able to respond to requests. A health check is essentially a tuple consisting of:

  • Target: What to check on an instance. For a TCP check this is comprised of:

    TCP:PORT_TO_CHECK
    

    Which attempts to open a connection on PORT_TO_CHECK. If the connection opens successfully, that specific instance is deemed healthy, otherwise it is marked temporarily as unhealthy. For HTTP, the situation is slightly different:

    HTTP:PORT_TO_CHECK/RESOURCE
    

    This means that the health check will connect to the resource /RESOURCE on PORT_TO_CHECK. If an HTTP 200 status is returned the instance is deemed healthy.

  • Interval: How often the check is made. This is given in seconds and defaults to 30. The valid range of intervals goes from 5 seconds to 600 seconds.

  • Timeout: The number of seconds the load balancer will wait for a check to return a result.

  • Unhealthy threshold: The number of consecutive failed checks required to deem the instance unhealthy. The default is 5, and the range of valid values lies from 2 to 10.

The following example creates a health check that checks instances every 20 seconds on port 8080, requesting the resource /health over HTTP and treating a 200 response as success.

>>> from boto.ec2.elb import HealthCheck
>>> hc = HealthCheck(
        interval=20,
        healthy_threshold=3,
        unhealthy_threshold=5,
        target='HTTP:8080/health'
    )

Putting It All Together

Finally, let’s create a load balancer in the US region that listens on ports 80 and 443 and distributes requests to instances on 8080 and 8443 over HTTP and TCP. We want the load balancer to span the availability zones us-east-1a and us-east-1b:

>>> zones = ['us-east-1a', 'us-east-1b']
>>> ports = [(80, 8080, 'http'), (443, 8443, 'tcp')]
>>> lb = conn.create_load_balancer('my-lb', zones, ports)
>>> # This is from the previous section.
>>> lb.configure_health_check(hc)

The load balancer has been created. To see where you can actually connect to it, do:

>>> print lb.dns_name
my-lb-123456789.us-east-1.elb.amazonaws.com

You can then CNAME a friendlier name, e.g. www.MYWEBSITE.com, to the above address.

Adding Instances To a Load Balancer

Now that the load balancer has been created, there are two ways to add instances to it:

  1. Manually, adding each instance in turn.
  2. Mapping an autoscale group to the load balancer. Please see the Autoscale tutorial for information on how to do this.

Manually Adding and Removing Instances

Assuming you have a list of instance ids, you can add them to the load balancer:

>>> instance_ids = ['i-4f8cf126', 'i-0bb7ca62']
>>> lb.register_instances(instance_ids)

Keep in mind that these instances should be in Security Groups that match the internal ports of the load balancer you just created (for this example, they should allow incoming connections on 8080 and 8443).
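
If those ports are not yet open, a rough sketch of authorizing them with the EC2 API follows; the security group name is hypothetical, and in practice you would probably restrict the source to your load balancer rather than to the whole internet:

>>> import boto.ec2
>>> ec2 = boto.ec2.connect_to_region('us-west-2')
>>> for port in (8080, 8443):
...     ec2.authorize_security_group('my-backend-sg', ip_protocol='tcp',
...                                  from_port=port, to_port=port,
...                                  cidr_ip='0.0.0.0/0')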

To remove instances:

>>> lb.deregister_instances(instance_ids)

Modifying Availability Zones for a Load Balancer

If you wanted to disable one or more zones from an existing load balancer:

>>> lb.disable_zones(['us-east-1a'])

You can then terminate each instance in the disabled zone and deregister them from your load balancer.

To enable zones:

>>> lb.enable_zones(['us-east-1c'])

Deleting a Load Balancer

>>> lb.delete()

An Introduction to boto’s S3 interface

This tutorial focuses on the boto interface to the Simple Storage Service from Amazon Web Services. This tutorial assumes that you have already downloaded and installed boto.

Creating a Connection

The first step in accessing S3 is to create a connection to the service. There are two ways to do this in boto. The first is:

>>> from boto.s3.connection import S3Connection
>>> conn = S3Connection('<aws access key>', '<aws secret key>')

At this point the variable conn will point to an S3Connection object. In this example, the AWS access key and AWS secret key are passed in to the method explicitly. Alternatively, you can set the environment variables:

  • AWS_ACCESS_KEY_ID - Your AWS Access Key ID
  • AWS_SECRET_ACCESS_KEY - Your AWS Secret Access Key

and then call the constructor without any arguments, like this:

>>> conn = S3Connection()

There is also a shortcut function in the boto package, called connect_s3 that may provide a slightly easier means of creating a connection:

>>> import boto
>>> conn = boto.connect_s3()

In either case, conn will point to an S3Connection object which we will use throughout the remainder of this tutorial.

Creating a Bucket

Once you have a connection established with S3, you will probably want to create a bucket. A bucket is a container used to store key/value pairs in S3. A bucket can hold an unlimited amount of data, so you could potentially have just one bucket in S3 for all of your information. Or, you could create separate buckets for different types of data. You can figure all of that out later; first, let’s just create a bucket. That can be accomplished like this:

>>> bucket = conn.create_bucket('mybucket')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "boto/connection.py", line 285, in create_bucket
    raise S3CreateError(response.status, response.reason)
boto.exception.S3CreateError: S3Error[409]: Conflict

Whoa. What happened there? Well, the thing you have to know about buckets is that they are kind of like domain names. It’s one flat name space that everyone who uses S3 shares. So, someone has already created a bucket called “mybucket” in S3 and that means no one else can grab that bucket name. So, you have to come up with a name that hasn’t been taken yet. For example, something that uses a unique string as a prefix. Your AWS_ACCESS_KEY (NOT YOUR SECRET KEY!) could work but I’ll leave it to your imagination to come up with something. I’ll just assume that you found an acceptable name.
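
For example, a common pattern is to prefix the bucket name with something unique to you; the prefix below is just a placeholder:

>>> bucket = conn.create_bucket('mycompany-mybucket')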

The create_bucket method will create the requested bucket if it does not exist or will return the existing bucket if it does exist.

Creating a Bucket In Another Location

The example above assumes that you want to create a bucket in the standard US region. However, it is possible to create buckets in other locations. To do so, first import the Location object from the boto.s3.connection module, like this:

>>> from boto.s3.connection import Location
>>> print '\n'.join(i for i in dir(Location) if i[0].isupper())
APNortheast
APSoutheast
APSoutheast2
DEFAULT
EU
EUCentral1
SAEast
USWest
USWest2

As you can see, the Location object defines a number of possible locations. By default, the location is the empty string which is interpreted as the US Classic Region, the original S3 region. However, by specifying another location at the time the bucket is created, you can instruct S3 to create the bucket in that location. For example:

>>> conn.create_bucket('mybucket', location=Location.EU)

will create the bucket in the EU region (assuming the name is available).

Storing Data

Once you have a bucket, presumably you will want to store some data in it. S3 doesn’t care what kind of information you store in your objects or what format you use to store it. All you need is a key that is unique within your bucket.

The Key object is used in boto to keep track of data stored in S3. To store new data in S3, start by creating a new Key object:

>>> from boto.s3.key import Key
>>> k = Key(bucket)
>>> k.key = 'foobar'
>>> k.set_contents_from_string('This is a test of S3')

The net effect of these statements is to create a new object in S3 with a key of “foobar” and a value of “This is a test of S3”. To validate that this worked, quit out of the interpreter and start it up again. Then:

>>> import boto
>>> c = boto.connect_s3()
>>> b = c.get_bucket('mybucket') # substitute your bucket name here
>>> from boto.s3.key import Key
>>> k = Key(b)
>>> k.key = 'foobar'
>>> k.get_contents_as_string()
'This is a test of S3'

So, we can definitely store and retrieve strings. A more interesting example may be to store the contents of a local file in S3 and then retrieve the contents to another local file.

>>> k = Key(b)
>>> k.key = 'myfile'
>>> k.set_contents_from_filename('foo.jpg')
>>> k.get_contents_to_filename('bar.jpg')

There are a couple of things to note about this. When you send data to S3 from a file or filename, boto will attempt to determine the correct mime type for that file and send it as a Content-Type header. The boto package uses the standard mimetypes package in Python to do the mime type guessing. The other thing to note is that boto does stream the content to and from S3 so you should be able to send and receive large files without any problem.
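
If you want to see what boto will guess, or override the guess, here is a minimal sketch (the file name is illustrative, and passing an explicit Content-Type header this way is an assumption worth verifying for your boto version):

>>> import mimetypes
>>> mimetypes.guess_type('foo.jpg')
('image/jpeg', None)

# Supply the header yourself if the guess isn't what you want.
>>> k.set_contents_from_filename('foo.jpg', headers={'Content-Type': 'image/jpeg'})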

When fetching a key that already exists, you have two options. If you’re uncertain whether a key exists (or if you need the metadata set on it), you can call Bucket.get_key(key_name_here). However, if you’re sure a key already exists within a bucket, you can skip the check for a key on the server.

>>> import boto
>>> c = boto.connect_s3()
>>> b = c.get_bucket('mybucket') # substitute your bucket name here

# Will hit the API to check if it exists.
>>> possible_key = b.get_key('mykey') # substitute your key name here

# Won't hit the API.
>>> key_we_know_is_there = b.get_key('mykey', validate=False)

Storing Large Data

At times the data you may want to store will be hundreds of megabytes or more in size. S3 allows you to split such files into smaller components. You upload each component in turn and then S3 combines them into the final object. While this is fairly straightforward, it requires a few extra steps to be taken. The example below makes use of the FileChunkIO module, so pip install FileChunkIO if it isn’t already installed.

>>> import math, os
>>> import boto
>>> from filechunkio import FileChunkIO

# Connect to S3
>>> c = boto.connect_s3()
>>> b = c.get_bucket('mybucket')

# Get file info
>>> source_path = 'path/to/your/file.ext'
>>> source_size = os.stat(source_path).st_size

# Create a multipart upload request
>>> mp = b.initiate_multipart_upload(os.path.basename(source_path))

# Use a chunk size of 50 MiB (feel free to change this)
>>> chunk_size = 52428800
>>> chunk_count = int(math.ceil(source_size / float(chunk_size)))

# Send the file parts, using FileChunkIO to create a file-like object
# that points to a certain byte range within the original file. We
# set bytes to never exceed the original file size.
>>> for i in range(chunk_count):
...     offset = chunk_size * i
...     bytes = min(chunk_size, source_size - offset)
...     with FileChunkIO(source_path, 'r', offset=offset,
...                      bytes=bytes) as fp:
...         mp.upload_part_from_file(fp, part_num=i + 1)

# Finish the upload
>>> mp.complete_upload()

It is also possible to upload the parts in parallel using threads. The s3put script that ships with Boto provides an example of doing so using a thread pool.

Note that if you forget to call either mp.complete_upload() or mp.cancel_upload() you will be left with an incomplete upload and charged for the storage consumed by the uploaded parts. A call to bucket.get_all_multipart_uploads() can help to show lost multipart upload parts.
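
A minimal sketch of inspecting, and optionally discarding, incomplete uploads might look like this (make sure you really want to throw the parts away before calling cancel_upload):

>>> for upload in b.get_all_multipart_uploads():
...     print upload.key_name, upload.id
...     upload.cancel_upload()  # discards the uploaded parts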

Accessing A Bucket

Once a bucket exists, you can access it by getting the bucket. For example:

>>> mybucket = conn.get_bucket('mybucket') # Substitute in your bucket name
>>> mybucket.list()
...listing of keys in the bucket...

By default, this method tries to validate the bucket’s existence. You can override this behavior by passing validate=False.:

>>> nonexistent = conn.get_bucket('i-dont-exist-at-all', validate=False)

Changed in version 2.25.0.

Warning

If validate=False is passed, no request is made to the service (no charge/communication delay). This is only safe to do if you are sure the bucket exists.

If the default validate=True is passed, a request is made to the service to ensure the bucket exists. Prior to Boto v2.25.0, this fetched a list of keys (but with a max limit set to 0, always returning an empty list) in the bucket (& included better error messages), at an increased expense. As of Boto v2.25.0, this now performs a HEAD request (less expensive but worse error messages).

If you were relying on parsing the error message before, you should call something like:

bucket = conn.get_bucket('<bucket_name>', validate=False)
bucket.get_all_keys(maxkeys=0)

If the bucket does not exist, an S3ResponseError will commonly be thrown. If you’d rather not deal with any exceptions, you can use the lookup method.:

>>> nonexistent = conn.lookup('i-dont-exist-at-all')
>>> if nonexistent is None:
...     print "No such bucket!"
...
No such bucket!

Deleting A Bucket

Removing a bucket can be done using the delete_bucket method. For example:

>>> conn.delete_bucket('mybucket') # Substitute in your bucket name

The bucket must be empty of keys or this call will fail & an exception will be raised. You can remove a non-empty bucket by doing something like:

>>> full_bucket = conn.get_bucket('bucket-to-delete')
# It's full of keys. Delete them all.
>>> for key in full_bucket.list():
...     key.delete()
...
# The bucket is empty now. Delete it.
>>> conn.delete_bucket('bucket-to-delete')

Warning

This method can cause data loss! Be very careful when using it.

Additionally, be aware that using the above method for removing all keys and deleting the bucket involves a request for each key. As such, it’s not particularly fast & is very chatty.
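
If the bucket holds many keys, S3’s multi-object delete (exposed via Bucket.delete_keys, which batches up to 1000 keys per request) is far less chatty. A sketch, reusing the bucket name from above:

>>> full_bucket = conn.get_bucket('bucket-to-delete')
>>> full_bucket.delete_keys([key.name for key in full_bucket.list()])
>>> conn.delete_bucket('bucket-to-delete')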

Listing All Available Buckets

In addition to accessing specific buckets via the create_bucket method, you can also get a list of all the buckets that you have created.

>>> rs = conn.get_all_buckets()

This returns a ResultSet object (see the SQS Tutorial for more info on ResultSet objects). The ResultSet can be used as a sequence or list type object to retrieve Bucket objects.

>>> len(rs)
11
>>> for b in rs:
...     print b.name
...
<listing of available buckets>
>>> b = rs[0]

Setting / Getting the Access Control List for Buckets and Keys

The S3 service provides the ability to control access to buckets and keys within S3 via the Access Control List (ACL) associated with each object in S3. There are two ways to set the ACL for an object:

  1. Create a custom ACL that grants specific rights to specific users. At the moment, the users that are specified within grants have to be registered users of Amazon Web Services so this isn’t as useful or as general as it could be.
  2. Use a “canned” access control policy. There are four canned policies defined:
    1. private: Owner gets FULL_CONTROL. No one else has any access rights.
    2. public-read: Owner gets FULL_CONTROL and the anonymous principal is granted READ access.
    3. public-read-write: Owner gets FULL_CONTROL and the anonymous principal is granted READ and WRITE access.
    4. authenticated-read: Owner gets FULL_CONTROL and any principal authenticated as a registered Amazon S3 user is granted READ access.

To set a canned ACL for a bucket, use the set_acl method of the Bucket object. The argument passed to this method must be one of the four permissible canned policies named in the list CannedACLStrings contained in acl.py. For example, to make a bucket readable by anyone:

>>> b.set_acl('public-read')

You can also set the ACL for Key objects, either by passing an additional argument to the above method:

>>> b.set_acl('public-read', 'foobar')

where ‘foobar’ is the key of some object within the bucket b or you can call the set_acl method of the Key object:

>>> k.set_acl('public-read')

You can also retrieve the current ACL for a Bucket or Key object using the get_acl method. This method parses the AccessControlPolicy response sent by S3 and creates a set of Python objects that represent the ACL.

>>> acp = b.get_acl()
>>> acp
<boto.acl.Policy instance at 0x2e6940>
>>> acp.acl
<boto.acl.ACL instance at 0x2e69e0>
>>> acp.acl.grants
[<boto.acl.Grant instance at 0x2e6a08>]
>>> for grant in acp.acl.grants:
...   print grant.permission, grant.display_name, grant.email_address, grant.id
...
FULL_CONTROL <boto.user.User instance at 0x2e6a30>

The Python objects representing the ACL can be found in the acl.py module of boto.

Both the Bucket object and the Key object also provide shortcut methods to simplify the process of granting individuals specific access. For example, if you want to grant an individual user READ access to a particular object in S3 you could do the following:

>>> key = b.lookup('mykeytoshare')
>>> key.add_email_grant('READ', 'foo@bar.com')

The email address provided should be the one associated with the user’s AWS account. There is a similar method called add_user_grant that accepts the canonical id of the user rather than the email address.
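
For example, a minimal sketch using a placeholder canonical user id:

>>> key = b.lookup('mykeytoshare')
>>> key.add_user_grant('READ', '<canonical-user-id>')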

Setting/Getting Metadata Values on Key Objects

S3 allows arbitrary user metadata to be assigned to objects within a bucket. To take advantage of this S3 feature, you should use the set_metadata and get_metadata methods of the Key object to set and retrieve metadata associated with an S3 object. For example:

>>> k = Key(b)
>>> k.key = 'has_metadata'
>>> k.set_metadata('meta1', 'This is the first metadata value')
>>> k.set_metadata('meta2', 'This is the second metadata value')
>>> k.set_contents_from_filename('foo.txt')

This code associates two metadata key/value pairs with the Key k. To retrieve those values later:

>>> k = b.get_key('has_metadata')
>>> k.get_metadata('meta1')
'This is the first metadata value'
>>> k.get_metadata('meta2')
'This is the second metadata value'
>>>

Setting/Getting/Deleting CORS Configuration on a Bucket

Cross-origin resource sharing (CORS) defines a way for client web applications that are loaded in one domain to interact with resources in a different domain. With CORS support in Amazon S3, you can build rich client-side web applications with Amazon S3 and selectively allow cross-origin access to your Amazon S3 resources.

To create a CORS configuration and associate it with a bucket:

>>> from boto.s3.cors import CORSConfiguration
>>> cors_cfg = CORSConfiguration()
>>> cors_cfg.add_rule(['PUT', 'POST', 'DELETE'], 'https://www.example.com', allowed_header='*', max_age_seconds=3000, expose_header='x-amz-server-side-encryption')
>>> cors_cfg.add_rule('GET', '*')

The above code creates a CORS configuration object with two rules.

  • The first rule allows cross-origin PUT, POST, and DELETE requests from the https://www.example.com/ origin. The rule also allows all headers in preflight OPTIONS request through the Access-Control-Request-Headers header. In response to any preflight OPTIONS request, Amazon S3 will return any requested headers.
  • The second rule allows cross-origin GET requests from all origins.

To associate this configuration with a bucket:

>>> import boto
>>> c = boto.connect_s3()
>>> bucket = c.lookup('mybucket')
>>> bucket.set_cors(cors_cfg)

To retrieve the CORS configuration associated with a bucket:

>>> cors_cfg = bucket.get_cors()

And, finally, to delete all CORS configurations from a bucket:

>>> bucket.delete_cors()

Transitioning Objects

S3 buckets support transitioning objects to various storage classes. This is done using lifecycle policies. You can currently transition objects to Infrequent Access or Glacier, or simply expire them. All of these options can be applied after a number of days or after a given date. Lifecycle configurations are assigned to buckets and require these parameters:

  • The object prefix that identifies the objects you are targeting (or no prefix, to target all objects in the bucket).
  • The action you want S3 to perform on the identified objects.
  • The date or number of days when you want S3 to perform these actions.

For example, given a bucket s3-lifecycle-boto-demo, we can first retrieve the bucket:

>>> import boto
>>> c = boto.connect_s3()
>>> bucket = c.get_bucket('s3-lifecycle-boto-demo')

Then we can create a lifecycle object. In our example, we want all objects under logs/* to transition to Standard IA 30 days after the object is created, to Glacier 90 days after creation, and to be deleted 120 days after creation.

>>> from boto.s3.lifecycle import Lifecycle, Transitions, Rule, Expiration
>>> transitions = Transitions()
>>> transitions.add_transition(days=30, storage_class='STANDARD_IA')
>>> transitions.add_transition(days=90, storage_class='GLACIER')
>>> expiration = Expiration(days=120)
>>> rule = Rule(id='ruleid', prefix='logs/', status='Enabled', expiration=expiration, transition=transitions)
>>> lifecycle = Lifecycle()
>>> lifecycle.append(rule)

Note

For API docs for the lifecycle objects, see boto.s3.lifecycle

We can now configure the bucket with this lifecycle policy:

>>> bucket.configure_lifecycle(lifecycle)
True

You can also retrieve the current lifecycle policy for the bucket:

>>> current = bucket.get_lifecycle_config()
>>> print current[0].transition
[<Transition: in: 90 days, GLACIER>, <Transition: in: 30 days, STANDARD_IA>]
>>> print current[0].expiration
<Expiration: in: 120 days>

Note: We have deprecated directly accessing transition properties from the lifecycle object. You must index into the transition array first.

When an object transitions, the storage class will be updated. This can be seen when you list the objects in a bucket:

>>> for key in bucket.list():
...   print key, key.storage_class
...
<Key: s3-lifecycle-boto-demo,logs/testlog1.log> STANDARD_IA
<Key: s3-lifecycle-boto-demo,logs/testlog2.log> GLACIER

You can also use the prefix argument to the bucket.list method:

>>> list(bucket.list(prefix='logs/testlog1.log'))[0].storage_class
u'STANDARD_IA'
>>> list(bucket.list(prefix='logs/testlog2.log'))[0].storage_class
u'GLACIER'

Restoring Objects from Glacier

Once an object has been transitioned to Glacier, you can restore the object back to S3. To do so, you can use the boto.s3.key.Key.restore() method of the key object. The restore method takes an integer that specifies the number of days to keep the object in S3.

>>> import boto
>>> c = boto.connect_s3()
>>> bucket = c.get_bucket('s3-glacier-boto-demo')
>>> key = bucket.get_key('logs/testlog1.log')
>>> key.restore(days=5)

It takes about 4 hours for a restore operation to make a copy of the archive available for you to access. While the object is being restored, the ongoing_restore attribute will be set to True:

>>> key = bucket.get_key('logs/testlog1.log')
>>> print key.ongoing_restore
True

When the restore is finished, this value will be False and the expiry date of the object will no longer be None:

>>> key = bucket.get_key('logs/testlog1.log')
>>> print key.ongoing_restore
False
>>> print key.expiry_date
"Fri, 21 Dec 2012 00:00:00 GMT"

Note

If there is no restore operation either in progress or completed, the ongoing_restore attribute will be None.

Once the object is restored you can then download the contents:

>>> key.get_contents_to_filename('testlog1.log')

An Introduction to boto’s Route53 interface

This tutorial focuses on the boto interface to Route53 from Amazon Web Services. This tutorial assumes that you have already downloaded and installed boto.

Route53 is a Domain Name System (DNS) web service. It can be used to route requests to services running on AWS such as EC2 instances or load balancers, as well as to external services. Route53 also allows you to have automated checks to send requests where you require them.

In this tutorial, we will be setting up our services for example.com.

Creating a connection

To start using Route53 you will need to create a connection to the service as normal:

>>> import boto.route53
>>> conn = boto.route53.connect_to_region('us-west-2')

You will be using this conn object for the remainder of the tutorial to send commands to Route53.

Working with domain names

You can manipulate domains through a zone object. For example, you can create a domain name:

>>> zone = conn.create_zone("example.com.")

Note that the trailing dot on that domain name is significant. This is known as a fully qualified domain name (FQDN).

>>> zone
<Zone:example.com.>

You can also retrieve all your domain names:

>>> conn.get_zones()
[<Zone:example.com.>]

Or you can retrieve a single domain:

>>> conn.get_zone("example.com.")
<Zone:example.com.>

Finally, you can retrieve the list of nameservers that AWS has set up for this domain name as follows:

>>> zone.get_nameservers()
[u'ns-1000.awsdns-42.org.', u'ns-1001.awsdns-30.com.', u'ns-1002.awsdns-59.net.', u'ns-1003.awsdns-09.co.uk.']

Once you have finished configuring your domain name, you will need to change your nameservers at your registrar to point to those nameservers for Route53 to work.

Setting up dumb records

You can also add, update and delete records on a zone:

>>> status = zone.add_record("MX", "example.com.", "10 mail.isp.com")

When you send a change request through, the status of the update will be PENDING:

>>> status
<Status:PENDING>

You can call the API again and ask for the current status as follows:

>>> status.update()
'INSYNC'
>>> status
<Status:INSYNC>

When the status has changed to INSYNC, the change has been propagated to remote servers.

Updating a record

You can create, upsert or delete a single record like this:

>>> from boto.route53.record import ResourceRecordSets
>>> zone = conn.get_zone("example.com.")
>>> change_set = ResourceRecordSets(conn, zone.id)
>>> changes1 = change_set.add_change("UPSERT", "www" + ".example.com", type="CNAME", ttl=3600)
>>> changes1.add_value("webserver.example.com")
>>> change_set.commit()

In this example we create or update, depending on the existence of the record, the CNAME www.example.com to webserver.example.com.

Working with Change Sets

You can also do bulk updates using ResourceRecordSets. For example updating the TTL

>>> zone = conn.get_zone('example.com')
>>> change_set = boto.route53.record.ResourceRecordSets(conn, zone.id)
>>> for rrset in conn.get_all_rrsets(zone.id):
...     u = change_set.add_change("UPSERT", rrset.name, rrset.type, ttl=3600)
...     u.add_value(rrset.resource_records[0])
...
>>> results = change_set.commit()

In this example we update the TTL to 1 hour (3600 seconds) for all records in the example.com zone. Note: this will also change the SOA and NS records, which may not be ideal for many users.

Boto Config

Introduction

There is a growing list of configuration options for the boto library. Many of these options can be passed into the constructors for top-level objects such as connections. Some options, such as credentials, can also be read from environment variables (e.g. AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_SECURITY_TOKEN and AWS_PROFILE). It is also possible to manage these options in a central place through the use of boto config files.

Details

A boto config file is a text file formatted like an .ini configuration file that specifies values for options that control the behavior of the boto library. In Unix/Linux systems, on startup, the boto library looks for configuration files in the following locations and in the following order:

  • /etc/boto.cfg - for site-wide settings that all users on this machine will use
  • (if profile is given) ~/.aws/credentials - for credentials shared between SDKs
  • (if profile is given) ~/.boto - for user-specific settings
  • ~/.aws/credentials - for credentials shared between SDKs
  • ~/.boto - for user-specific settings

Comments: You can comment out a line by putting a ‘#’ at the beginning of the line, just like in Python code.

In Windows, create a text file that has any name (e.g. boto.config). It’s recommended that you put this file in your user folder. Then set a user environment variable named BOTO_CONFIG to the full path of that file.

The options in the config file are merged into a single, in-memory configuration that is available as boto.config. The boto.pyami.config.Config class is a subclass of the standard Python ConfigParser.SafeConfigParser object and inherits all of the methods of that object. In addition, the boto Config class defines additional methods that are described on the PyamiConfigMethods page.

An example boto config file might look like:

[Credentials]
aws_access_key_id = <your_access_key_here>
aws_secret_access_key = <your_secret_key_here>

Sections

The following sections and options are currently recognized within the boto config file.

Credentials

The Credentials section is used to specify the AWS credentials used for all boto requests. The order of precedence for authentication credentials is:

  • Credentials passed into the Connection class constructor.
  • Credentials specified by environment variables
  • Credentials specified as named profiles in the shared credential file.
  • Credentials specified by default in the shared credential file.
  • Credentials specified as named profiles in the config file.
  • Credentials specified by default in the config file.

This section defines the following options: aws_access_key_id and aws_secret_access_key. The former is your AWS access key id and the latter is your AWS secret access key.

For example:

[profile name_goes_here]
aws_access_key_id = <access key for this profile>
aws_secret_access_key = <secret key for this profile>

[Credentials]
aws_access_key_id = <your default access key>
aws_secret_access_key = <your default secret key>

Please note that quote characters are not used on either side of the ‘=’ operator even though both your AWS access key ID and secret key are strings.

If you have multiple AWS keypairs that you use for different purposes, use the profile style shown above. You can set an arbitrary number of profiles within your configuration files and then reference them by name when you instantiate your connection. If you specify a profile that does not exist in the configuration, the keys used under the [Credentials] heading will be applied by default.
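
For example, a minimal sketch of selecting one of the profiles shown above when creating a connection (the profile_name argument is supported by recent boto connection classes; verify it for your boto version):

>>> import boto
>>> conn = boto.connect_s3(profile_name='name_goes_here')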

The shared credentials file in ~/.aws/credentials uses a slightly different format. For example:

[default]
aws_access_key_id = <your default access key>
aws_secret_access_key = <your default secret key>

[name_goes_here]
aws_access_key_id = <access key for this profile>
aws_secret_access_key = <secret key for this profile>

[another_profile]
aws_access_key_id = <access key for this profile>
aws_secret_access_key = <secret key for this profile>
aws_security_token = <optional security token for this profile>

For greater security, the secret key can be stored in a keyring and retrieved via the keyring package. To do so, specify a keyring option rather than aws_secret_access_key:

[Credentials]
aws_access_key_id = <your access key>
keyring = <keyring name>

To use a keyring, you must have the Python keyring package installed and in the Python path. To learn about setting up keyrings, see the keyring documentation.

Credentials can also be supplied for a Eucalyptus service:

[Credentials]
euca_access_key_id = <your access key>
euca_secret_access_key = <your secret key>

Finally, this section can also be used to provide credentials for the Internet Archive API:

[Credentials]
ia_access_key_id = <your access key>
ia_secret_access_key = <your secret key>
Boto

The Boto section is used to specify options that control the operation of boto itself. This section defines the following options:

debug:

Controls the level of debug messages that will be printed by the boto library. The following values are defined:

0 - no debug messages are printed
1 - basic debug messages from boto are printed
2 - all boto debugging messages plus request/response messages from httplib
proxy:

The name of the proxy host to use for connecting to AWS.

proxy_port:

The port number to use to connect to the proxy host.

proxy_user:

The user name to use when authenticating with the proxy host.

proxy_pass:

The password to use when authenticating with the proxy host.

num_retries:

The number of times to retry failed requests to an AWS server. If boto receives an error from AWS, it will attempt to recover and retry the request. The default number of retries is 5 but you can change the default with this option.

For example:

[Boto]
debug = 0
num_retries = 10

proxy = myproxy.com
proxy_port = 8080
proxy_user = foo
proxy_pass = bar
connection_stale_duration:
 Amount of time to wait in seconds before a connection will stop getting reused. AWS will disconnect connections which have been idle for 180 seconds.
is_secure:Is the connection over SSL. This setting will override passed in values.
https_validate_certificates:
 Validate HTTPS certificates. This is on by default
ca_certificates_file:
 Location of CA certificates or the keyword “system”. Using the system keyword lets boto get out of the way and makes SSL certificate validation the responsibility of the underlying SSL implementation provided by the system.
http_socket_timeout:
 Timeout used to overwrite the system default socket timeout for httplib.
send_crlf_after_proxy_auth_headers:
 Change line ending behaviour with proxies. For more details see this discussion
endpoints_path:Allows customizing the regions/endpoints available in Boto. Provide an absolute path to a custom JSON file, which gets merged into the defaults. (This can also be specified with the BOTO_ENDPOINTS environment variable instead.)
use_endpoint_heuristics:
 Allows using endpoint heuristics to guess endpoints for regions that aren’t built in. This can also be specified with the BOTO_USE_ENDPOINT_HEURISTICS environment variable.

These settings will default to:

[Boto]
connection_stale_duration = 180
is_secure = True
https_validate_certificates = True
ca_certificates_file = cacerts.txt
http_socket_timeout = 60
send_crlf_after_proxy_auth_headers = False
endpoints_path = /path/to/my/boto/endpoints.json
use_endpoint_heuristics = False

You can control the timeouts and number of retries used when retrieving information from the Metadata Service (this is used for retrieving credentials for IAM roles on EC2 instances):

metadata_service_timeout:
 Number of seconds until requests to the metadata service will timeout (float).
metadata_service_num_attempts:
 Number of times to attempt to retrieve information from the metadata service before giving up (int).

These settings will default to:

[Boto]
metadata_service_timeout = 1.0
metadata_service_num_attempts = 1

This section is also used for specifying endpoints for non-AWS services such as Eucalyptus and Walrus.

eucalyptus_host:
 Select a default endpoint host for eucalyptus
walrus_host:Select a default host for Walrus

For example:

[Boto]
eucalyptus_host = somehost.example.com
walrus_host = somehost.example.com

Finally, the Boto section is used to set default versions for many AWS services.

AutoScale settings:

autoscale_version:
 Set the API version
autoscale_endpoint:
 Endpoint to use
autoscale_region_name:
 Default region to use

For example:

[Boto]
autoscale_version = 2011-01-01
autoscale_endpoint = autoscaling.us-west-2.amazonaws.com
autoscale_region_name = us-west-2

Cloudformation settings can also be defined:

cfn_version:Cloud formation API version
cfn_region_name:
 Default region name
cfn_region_endpoint:
 Default endpoint

For example:

[Boto]
cfn_version = 2010-05-15
cfn_region_name = us-west-2
cfn_region_endpoint = cloudformation.us-west-2.amazonaws.com

Cloudsearch settings:

cs_region_name:Default cloudsearch region
cs_region_endpoint:
 Default cloudsearch endpoint

For example:

[Boto]
cs_region_name = us-west-2
cs_region_endpoint = cloudsearch.us-west-2.amazonaws.com

Cloudwatch settings:

cloudwatch_version:
 Cloudwatch API version
cloudwatch_region_name:
 Default region name
cloudwatch_region_endpoint:
 Default endpoint

For example:

[Boto]
cloudwatch_version = 2010-08-01
cloudwatch_region_name = us-west-2
cloudwatch_region_endpoint = monitoring.us-west-2.amazonaws.com

EC2 settings:

ec2_version:EC2 API version
ec2_region_name:
 Default region name
ec2_region_endpoint:
 Default endpoint

For example:

[Boto]
ec2_version = 2012-12-01
ec2_region_name = us-west-2
ec2_region_endpoint = ec2.us-west-2.amazonaws.com

ELB settings:

elb_version:ELB API version
elb_region_name:
 Default region name
elb_region_endpoint:
 Default endpoint

For example:

[Boto]
elb_version = 2012-06-01
elb_region_name = us-west-2
elb_region_endpoint = elasticloadbalancing.us-west-2.amazonaws.com

EMR settings:

emr_version:EMR API version
emr_region_name:
 Default region name
emr_region_endpoint:
 Default endpoint

For example:

[Boto]
emr_version = 2009-03-31
emr_region_name = us-west-2
emr_region_endpoint = elasticmapreduce.us-west-2.amazonaws.com

Precedence

Even if you have your boto config set up, you can also have credentials and options stored in environment variables, or you can explicitly pass them to method calls, e.g.:

>>> boto.ec2.connect_to_region(
...     'us-west-2',
...     aws_access_key_id='foo',
...     aws_secret_access_key='bar')

When these options can be found in more than one place, boto will first use the explicitly supplied arguments; if none are found, it will then look for them in environment variables, and if that fails, it will fall back to the values in the boto config.

Notification

If you are using notifications for boto.pyami, you can specify the email details through the following variables.

smtp_from:Used as the sender in notification emails.
smtp_to:Destination to which emails should be sent
smtp_host:Host to connect to when sending notification emails.
smtp_port:Port to connect to when connecting to smtp_host.

Default values are:

[notification]
smtp_from = boto
smtp_to = None
smtp_host = localhost
smtp_port = 25
smtp_tls = True
smtp_user = john
smtp_pass = hunter2
SWF

The SWF section allows you to configure the default region to be used for the Amazon Simple Workflow service.

region:Set the default region

Example:

[SWF]
region = us-west-2
Pyami

The Pyami section is used to configure the working directory for PyAMI.

working_dir:Working directory used by PyAMI

Example:

[Pyami]
working_dir = /home/foo/
DB

The DB section is used to configure access to databases through the boto.sdb.db.manager.get_manager() function.

db_type:Type of the database. Current allowed values are SimpleDB and XML.
db_user:AWS access key id.
db_passwd:AWS secret access key.
db_name:Database that will be connected to.
db_table:Table name (note: this doesn’t appear to be used).
db_host:Host to connect to
db_port:Port to connect to
enable_ssl:Use SSL

More examples:

[DB]
db_type = SimpleDB
db_user = <aws access key id>
db_passwd = <aws secret access key>
db_name = my_domain
db_table = table
db_host = sdb.amazonaws.com
enable_ssl = True
debug = True

[DB_TestBasic]
db_type = SimpleDB
db_user = <another aws access key id>
db_passwd = <another aws secret access key>
db_name = basic_domain
db_port = 1111
SDB

This section is used to configure SimpleDB

region:Set the region to which SDB should connect

Example:

[SDB]
region = us-west-2
DynamoDB

This section is used to configure DynamoDB

region:Choose the default region
validate_checksums:
 Check checksums returned by DynamoDB

Example:

[DynamoDB]
region = us-west-2
validate_checksums = True

About the Documentation

boto’s documentation uses the Sphinx documentation system, which in turn is based on docutils. The basic idea is that lightly-formatted plain-text documentation is transformed into HTML, PDF, and any other output format.

To actually build the documentation locally, you’ll currently need to install Sphinx – easy_install Sphinx should do the trick.

Then, building the html is easy; just make html from the docs directory.

To get started contributing, you’ll want to read the ReStructuredText Primer. After that, you’ll want to read about the Sphinx-specific markup that’s used to manage metadata, indexing, and cross-references.

The main thing to keep in mind as you write and edit docs is that the more semantic markup you can add the better. So:

Import ``boto`` to your script...

Isn’t nearly as helpful as:

Add :mod:`boto` to your script...

This is because Sphinx will generate a proper link for the latter, which greatly helps readers. There’s basically no limit to the amount of useful markup you can add.

The fabfile

There is a Fabric file that can be used to build and deploy the documentation to a webserver that you have ssh access to.

To build and deploy:

cd docs/
fab deploy:remote_path='/var/www/folder/whatever' --hosts=user@host

This will get the latest code from subversion, add the revision number to the docs conf.py file, and call make html to build the documentation. It will then tarball the output, scp it up to the host you specified, untarball it in the folder you specified, and create a symbolic link from the untarballed versioned folder to {remote_path}/boto-docs.

Contributing to Boto

Setting Up a Development Environment

While not strictly required, it is highly recommended to do development in a virtualenv. You can install virtualenv using pip:

$ pip install virtualenv

Once the package is installed, you’ll have a virtualenv command you can use to create a virtual environment:

$ virtualenv venv

You can then activate the virtualenv:

$ . venv/bin/activate

Note

You may also want to check out virtualenvwrapper, which is a set of extensions to virtualenv that makes it easy to manage multiple virtual environments.

A requirements.txt is included with boto which contains all the additional packages needed for boto development. You can install these packages by running:

$ pip install -r requirements.txt

Running the Tests

All of the tests for boto are under the tests/ directory. The tests for boto have been split into two main categories, unit and integration tests:

  • unit - These are tests that do not talk to any AWS services. Anyone should be able to run these tests without having any credentials configured. These are the types of tests that could be run in something like a public CI server. These tests tend to be fast.
  • integration - These are tests that will talk to AWS services, and will typically require a boto config file with valid credentials. Due to the nature of these tests, they tend to take a while to run. Also keep in mind anyone who runs these tests will incur any usage fees associated with the various AWS services.

To run all the unit tests, cd to the tests/ directory and run:

$ python test.py unit

You should see output like this:

$ python test.py unit
................................
----------------------------------------------------------------------
Ran 32 tests in 0.075s

OK

To run the integration tests, run:

$ python test.py integration

Note that running the integration tests may take a while.

Various integration tests have been tagged with service names to allow you to easily run tests by service type. For example, to run the ec2 integration tests you can run:

$ python test.py -t ec2

You can specify the -t argument multiple times. For example, to run the s3 and ec2 tests you can run:

$ python test.py -t ec2 -t s3

Warning

In the examples above no top level directory was specified. By default, nose will assume the current working directory, so the above command is equivalent to:

$ python test.py -t ec2 -t s3 .

Be sure that you are in the tests/ directory when running the tests, or explicitly specify the top level directory. For example, if you are in the root directory of the boto repo, you could run the ec2 and s3 tests by running:

$ python tests/test.py -t ec2 -t s3 tests/

You can use nose’s collect plugin to see what tests are associated with each service tag:

$ python test.py -t s3 -t ec2 --with-id --collect -v
Testing Details

The tests/test.py script is a lightweight wrapper around nose. In general, you should be able to run nosetests directly instead of tests/test.py. The tests/unit and tests/integration args in the commands above were referring to directories. The command line arguments are forwarded to nose when you use tests/test.py. For example, you can run:

$ python tests/test.py -x -vv tests/unit/cloudformation

And the -x -vv tests/unit/cloudformation are forwarded to nose. See the nose docs for the supported command line options, or run nosetests --help.

The only thing that tests/test.py does before invoking nose is to inject an argument that specifies that any testcase tagged with “notdefault” should not be run. A testcase may be tagged with “notdefault” if the test author does not want everyone to run the tests. In general, there shouldn’t be many of these tests, but some reasons a test may be tagged “notdefault” include:

  • An integration test that requires specific credentials.
  • An interactive test (the S3 MFA tests require you to type in the S/N and code).

Tagging is done using nose’s tagging plugin. To summarize, you can tag a specific testcase by setting an attribute on the object. Nose provides an attr decorator for convenience:

from nose.plugins.attrib import attr

@attr('notdefault')
def test_s3_mfs():
    pass

You can then run these tests by specifying:

nosetests -a 'notdefault'

Or you can exclude any tests tagged with ‘notdefault’ by running:

nosetests -a '!notdefault'

Conceptually, tests/test.py is injecting the “-a !notdefault” arg into nosetests.

Testing Supported Python Versions

Boto supports python 2.6 and 2.7. An easy way to verify functionality across multiple python versions is to use tox. A tox.ini file is included with boto. You can run tox with no args and it will automatically test all supported python versions:

$ tox
GLOB sdist-make: boto/setup.py
py26 sdist-reinst: boto/.tox/dist/boto-2.4.1.zip
py26 runtests: commands[0]
................................
----------------------------------------------------------------------
Ran 32 tests in 0.089s

OK
py27 sdist-reinst: boto/.tox/dist/boto-2.4.1.zip
py27 runtests: commands[0]
................................
----------------------------------------------------------------------
Ran 32 tests in 0.087s

OK
____ summary ____
  py26: commands succeeded
  py27: commands succeeded
  congratulations :)

Writing Documentation

The boto docs use sphinx to generate documentation. All of the docs are located in the docs/ directory. To generate the html documentation, cd into the docs directory and run make html:

$ cd docs
$ make html

The generated documentation will be in the docs/build/html directory. The source for the documentation is located in docs/source directory, and uses restructured text for the markup language.

Merging A Branch (Core Devs)

  • All features/bugfixes should go through a review.
    • This includes new features added by core devs themselves. The usual branch/pull-request/merge flow that happens for community contributions should also apply to core.
  • Ensure there is proper test coverage. If there’s a change in behavior, there should be a test demonstrating the failure before the change & passing with the change.
    • This helps ensure we don’t regress in the future as well.
  • Merging of pull requests is typically done with git merge --no-ff <remote/branch_name>.
    • GitHub’s big green button is probably OK for very small PRs (like doc fixes), but you can’t run tests on GH, so most things should get pulled down locally.

Command Line Tools

Introduction

Boto ships with a number of command line utilities, which are installed when the package is installed. This guide outlines which ones are available & what they do.

Note

If you’re not already depending on these utilities, you may wish to check out the AWS-CLI (http://aws.amazon.com/cli/ - User Guide & Reference Guide). It provides much wider & more complete access to the AWS services.

The included utilities available are:

asadmin
Works with Autoscaling
bundle_image
Creates a bundled AMI in S3 based on an EC2 instance
cfadmin
Works with CloudFront & invalidations
cq
Works with SQS queues
cwutil
Works with CloudWatch

dynamodb_dump dynamodb_load

Handle dumping/loading data from DynamoDB tables
elbadmin
Manages Elastic Load Balancer instances
fetch_file
Downloads an S3 key to disk
glacier
Lists vaults, jobs & uploads files to Glacier
instance_events
Lists all events for EC2 reservations
kill_instance
Kills a list of EC2 instances
launch_instance
Launches an EC2 instance
list_instances
Lists all of your EC2 instances
lss3
Lists what keys you have within a bucket in S3
mturk
Provides a number of facilities for interacting with Mechanical Turk
pyami_sendmail
Sends an email from the Pyami instance
route53
Interacts with the Route53 service
s3put
Uploads a directory or a specific file(s) to S3
sdbadmin
Allows for working with SimpleDB domains
taskadmin
A tool for working with the tasks in SimpleDB

An Introduction to boto’s Support interface

This tutorial focuses on the boto interface to Amazon Web Services Support, allowing you to programmatically interact with cases created with Support. This tutorial assumes that you have already downloaded and installed boto.

Creating a Connection

The first step in accessing Support is to create a connection to the service. There are two ways to do this in boto. The first is:

>>> from boto.support.connection import SupportConnection
>>> conn = SupportConnection('<aws access key>', '<aws secret key>')

At this point the variable conn will point to a SupportConnection object. In this example, the AWS access key and AWS secret key are passed in to the method explicitly. Alternatively, you can set the environment variables:

AWS_ACCESS_KEY_ID
Your AWS Access Key ID
AWS_SECRET_ACCESS_KEY
Your AWS Secret Access Key

and then call the constructor without any arguments, like this:

>>> conn = SupportConnection()

There is also a shortcut function in boto that makes it easy to create Support connections:

>>> import boto.support
>>> conn = boto.support.connect_to_region('us-west-2')

In either case, conn points to a SupportConnection object which we will use throughout the remainder of this tutorial.

Describing Existing Cases

If you have existing cases or want to fetch cases in the future, you’ll use the SupportConnection.describe_cases method. For example:

>>> cases = conn.describe_cases()
>>> len(cases['cases'])
1
>>> cases['cases'][0]['title']
'A test case.'
>>> cases['cases'][0]['caseId']
'case-...'

You can also fetch a set of cases (or single case) by providing a case_id_list parameter:

>>> cases = conn.describe_cases(case_id_list=['case-1'])
>>> len(cases['cases'])
1
>>> cases['cases'][0]['title']
'A test case.'
>>> cases['cases'][0]['caseId']
'case-...'

Describing Service Codes

In order to create a new case, you’ll need to fetch the service (& category) codes available to you. Fetching them is a simple call to:

>>> services = conn.describe_services()
>>> services['services'][0]['code']
'amazon-cloudsearch'

If you only care about certain services, you can pass a list of service codes:

>>> service_details = conn.describe_services(service_code_list=[
...     'amazon-cloudsearch',
...     'amazon-dynamodb',
... ])

Describing Severity Levels

In order to create a new case, you’ll also need to fetch the severity levels available to you. Fetching them looks like:

>>> severities = conn.describe_severity_levels()
>>> severities['severityLevels'][0]['code']
'low'

Creating a Case

Upon creating a connection to Support, you can now work with existing Support cases, create new cases or resolve them. We’ll start with creating a new case:

>>> new_case = conn.create_case(
...     subject='This is a test case.',
...     service_code='',
...     category_code='',
...     communication_body="",
...     severity_code='low'
... )
>>> new_case['caseId']
'case-...'

For the service_code/category_code parameters, you’ll need to do a SupportConnection.describe_services call, then select the appropriate service code (& appropriate category code within that service) from the response.

For the severity_code parameter, you’ll need to do a SupportConnection.describe_severity_levels call, then select the appropriate severity code from the response.
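
A minimal sketch of pulling those codes out of the responses and feeding them to create_case (the indexes chosen here are purely illustrative; pick the entries that match your issue):

>>> services = conn.describe_services()
>>> service_code = services['services'][0]['code']
>>> category_code = services['services'][0]['categories'][0]['code']
>>> severity_code = conn.describe_severity_levels()['severityLevels'][0]['code']
>>> new_case = conn.create_case(
...     subject='This is a test case.',
...     service_code=service_code,
...     category_code=category_code,
...     communication_body="Describe the problem here.",
...     severity_code=severity_code
... )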

Adding to a Case

Since the purpose of a support case involves back-and-forth communication, you can add additional communication to the case as well. Providing a response might look like:

>>> result = conn.add_communication_to_case(
...     communication_body="This is a followup. It's working now.",
...     case_id='case-...'
... )

Fetching all Communications for a Case

Getting all communications for a given case looks like:

>>> communications = conn.describe_communications('case-...')

Resolving a Case

Once a case is finished, you should mark it as resolved to close it out. Resolving a case looks like:

>>> closed = conn.resolve_case(case_id='case-...')
>>> closed['result']
True

An Introduction to boto’s DynamoDB v2 interface

This tutorial focuses on the boto interface to AWS’ DynamoDB v2. This tutorial assumes that you have boto already downloaded and installed.

Warning

This tutorial covers the SECOND major release of DynamoDB (including local secondary index support). The documentation for the original version of DynamoDB (& boto’s support for it) is at DynamoDB v1.

The v2 DynamoDB API has both a high-level & low-level component. The low-level API (contained primarily within boto.dynamodb2.layer1) provides an interface that maps closely onto what is provided by the service API. It supports all options available to the service.

The high-level API attempts to make interacting with the service more natural from Python. It supports most of the featureset.
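
As a quick illustration of the low-level flavor, a minimal sketch (credentials are assumed to come from your config or environment, and the table name in the output is hypothetical):

>>> import boto.dynamodb2
>>> conn = boto.dynamodb2.connect_to_region('us-west-2')
>>> conn.list_tables()
{u'TableNames': [u'users']}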

The High-Level API

Most of the interaction centers around a single object, the Table. Tables act as a way to effectively namespace your records. If you’re familiar with database tables from an RDBMS, tables will feel somewhat familiar.

Creating a New Table

To create a new table, you need to call Table.create & specify (at a minimum) both the table’s name as well as the key schema for the table:

>>> from boto.dynamodb2.fields import HashKey
>>> from boto.dynamodb2.table import Table
>>> users = Table.create('users', schema=[HashKey('username')])

Since both the key schema and local secondary indexes can not be modified after the table is created, you’ll need to plan ahead of time how you think the table will be used. Both the keys & indexes are also used for querying, so you’ll want to represent the data you’ll need when querying there as well.

For the schema, you can either have a single HashKey or a combined HashKey+RangeKey. The HashKey by itself should be thought of as a unique identifier (for instance, like a username or UUID). It is typically looked up as an exact value. A HashKey+RangeKey combination is slightly different, in that the HashKey acts like a namespace/prefix & the RangeKey acts as a value that can be referred to by a sorted range of values.

For the local secondary indexes, you can choose from an AllIndex, a KeysOnlyIndex or an IncludeIndex field. Each builds an index of values that can be queried on. The AllIndex duplicates all values onto the index (to prevent additional reads to fetch the data). The KeysOnlyIndex duplicates only the keys from the schema onto the index. The IncludeIndex lets you specify a list of fieldnames to duplicate over.
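
For instance, a minimal sketch of a table declaring a local secondary index (the table and field names are purely illustrative; a local index must reuse the table’s HashKey):

>>> from boto.dynamodb2.fields import HashKey, RangeKey, AllIndex
>>> from boto.dynamodb2.table import Table
>>> messages = Table.create('messages', schema=[
...     HashKey('forum'),
...     RangeKey('subject'),
... ], indexes=[
...     AllIndex('PostedIndex', parts=[
...         HashKey('forum'),
...         RangeKey('posted_on'),
...     ]),
... ])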

A full example:

>>> import boto.dynamodb2
>>> from boto.dynamodb2.fields import HashKey, RangeKey, KeysOnlyIndex, GlobalAllIndex
>>> from boto.dynamodb2.table import Table
>>> from boto.dynamodb2.types import NUMBER

# Uses your ``aws_access_key_id`` & ``aws_secret_access_key`` from either a
# config file or environment variable & the default region.
>>> users = Table.create('users', schema=[
...     HashKey('username'), # defaults to STRING data_type
...     RangeKey('last_name'),
... ], throughput={
...     'read': 5,
...     'write': 15,
... }, global_indexes=[
...     GlobalAllIndex('EverythingIndex', parts=[
...         HashKey('account_type'),
...     ],
...     throughput={
...         'read': 1,
...         'write': 1,
...     })
... ],
... # If you need to specify custom parameters, such as credentials or region,
... # use the following:
... # connection=boto.dynamodb2.connect_to_region('us-east-1')
... )
Using an Existing Table

Once a table has been created, using it is relatively simple. You can either specify just the table_name (allowing the object to lazily do an additional call to get details about itself if needed) or provide the schema/indexes again (same as what was used with Table.create) to avoid extra overhead.

Lazy example:

>>> from boto.dynamodb2.table import Table
>>> users = Table('users')

Efficient example:

>>> from boto.dynamodb2.fields import HashKey, RangeKey, GlobalAllIndex
>>> from boto.dynamodb2.table import Table
>>> from boto.dynamodb2.types import NUMBER
>>> users = Table('users', schema=[
...     HashKey('username'),
...     RangeKey('last_name'),
... ], global_indexes=[
...     GlobalAllIndex('EverythingIndex', parts=[
...         HashKey('account_type'),
...     ])
... ])
Creating a New Item

Once you have a Table instance, you can add new items to the table. There are two ways to do this.

The first is to use the Table.put_item method. Simply hand it a dictionary of data & it will create the item on the server side. This dictionary should be relatively flat (though you can nest other dictionaries) & must contain the keys used in the schema.

Example:

>>> from boto.dynamodb2.table import Table
>>> users = Table('users')

# Create the new user.
>>> users.put_item(data={
...     'username': 'johndoe',
...     'first_name': 'John',
...     'last_name': 'Doe',
...     'account_type': 'standard_user',
... })
True

The alternative is to manually construct an Item instance & tell it to save itself. This is useful if the object will be around for a while & you don’t want to re-fetch it.

Example:

>>> from boto.dynamodb2.items import Item
>>> from boto.dynamodb2.table import Table
>>> users = Table('users')

# WARNING - This doesn't save it yet!
>>> janedoe = Item(users, data={
...     'username': 'janedoe',
...     'first_name': 'Jane',
...     'last_name': 'Doe',
...     'account_type': 'standard_user',
... })

# The data now gets persisted to the server.
>>> janedoe.save()
True
Getting an Item & Accessing Data

With data now in DynamoDB, if you know the key of the item, you can fetch it back out. Specify the key value(s) as kwargs to Table.get_item.

Example:

>>> from boto.dynamodb2.table import Table
>>> users = Table('users')

>>> johndoe = users.get_item(username='johndoe', last_name='Doe')

Once you have an Item instance, it presents a dictionary-like interface to the data.:

# Read a field out.
>>> johndoe['first_name']
'John'

# Change a field (DOESN'T SAVE YET!).
>>> johndoe['first_name'] = 'Johann'

# Delete data from it (DOESN'T SAVE YET!).
>>> del johndoe['account_type']
Updating an Item

Just creating new items or changing only the in-memory version of the Item isn’t particularly effective. To persist the changes to DynamoDB, you have three choices.

The first is sending all the data with the expectation that nothing has changed since you read the data. DynamoDB will verify the data is in the original state and, if so, all of the item’s data will be written. If that expectation fails, the call will fail:

>>> from boto.dynamodb2.table import Table
>>> users = Table('users')

>>> johndoe = users.get_item(username='johndoe', last_name='Doe')
>>> johndoe['first_name'] = 'Johann'
>>> johndoe['whatever'] = "man, that's just like your opinion"
>>> del johndoe['account_type']

# Affects all fields, even the ones not changed locally.
>>> johndoe.save()
True

The second is a full overwrite. If you can be confident your version of the data is the most correct, you can force an overwrite of the data.:

>>> johndoe = users.get_item(username='johndoe', last_name='Doe')
>>> johndoe['first_name'] = 'Johann'
>>> johndoe['whatever'] = "Man, that's just like your opinion"

# Specify ``overwrite=True`` to fully replace the data.
>>> johndoe.save(overwrite=True)
True

The last is a partial update. If you’ve only modified certain fields, you can send a partial update that only writes those fields, allowing other (potentially changed) fields to go untouched.:

>>> johndoe = users.get_item(username='johndoe', last_name='Doe')
>>> johndoe['first_name'] = 'Johann'
>>> johndoe['whatever'] = "man, that's just like your opinion"
>>> del johndoe['account_type']

# Partial update, only sending/affecting the
# ``first_name/whatever/account_type`` fields.
>>> johndoe.partial_save()
True
Deleting an Item

You can also delete items from the table. You have two choices, depending on what data you have present.

If you already have an Item instance, the easiest approach is just to call Item.delete.:

>>> johndoe.delete()
True

If you don’t have an Item instance & you don’t want to incur the Table.get_item call to get it, you can call the Table.delete_item method.:

>>> from boto.dynamodb2.table import Table
>>> users = Table('users')

>>> users.delete_item(username='johndoe', last_name='Doe')
True
Batch Writing

If you’re loading a lot of data at a time, making use of batch writing can both speed up the process & reduce the number of write requests made to the service.

Batch writing involves wrapping the calls you want batched in a context manager. The context manager imitates the Table.put_item & Table.delete_item APIs. Getting & using the context manager looks like:

>>> import time
>>> from boto.dynamodb2.table import Table
>>> users = Table('users')

>>> with users.batch_write() as batch:
...     batch.put_item(data={
...         'username': 'anotherdoe',
...         'first_name': 'Another',
...         'last_name': 'Doe',
...         'date_joined': int(time.time()),
...     })
...     batch.put_item(data={
...         'username': 'joebloggs',
...         'first_name': 'Joe',
...         'last_name': 'Bloggs',
...         'date_joined': int(time.time()),
...     })
...     batch.delete_item(username='janedoe', last_name='Doe')

However, there are some limitations on what you can do within the context manager.

  • It can’t read data at all, nor can it batch any other operations.
  • You can’t put & delete the same data within a batch request.

Note

Additionally, the context manager can only batch 25 items at a time for a request (this is a DynamoDB limitation). It is handled for you so you can keep writing additional items, but you should be aware that 100 put_item calls is 4 batch requests, not 1.

Querying

Warning

The Table object has both a query & a query_2 method. If you are writing new code, DO NOT use Table.query. It presents results in a different order than expected & exists strictly for backward-compatibility.

Manually fetching out each item by itself isn’t tenable for large datasets. To cope with fetching many records, you can either perform a standard query, query via a local secondary index or scan the entire table.

A standard query typically gets run against a hash+range key combination. Filter parameters are passed as kwargs & use a __ to separate the fieldname from the operator being used to filter the value.

In terms of querying, our original schema is less than optimal. For the following examples, we’ll be using the following table setup:

>>> from boto.dynamodb2.fields import HashKey, RangeKey, GlobalAllIndex
>>> from boto.dynamodb2.table import Table
>>> from boto.dynamodb2.types import NUMBER
>>> import time
>>> users = Table.create('users2', schema=[
...     HashKey('account_type'),
...     RangeKey('last_name'),
... ], throughput={
...     'read': 5,
...     'write': 15,
... }, global_indexes=[
...     GlobalAllIndex('DateJoinedIndex', parts=[
...         HashKey('account_type'),
...         RangeKey('date_joined', data_type=NUMBER),
...     ],
...     throughput={
...         'read': 1,
...         'write': 1,
...     }),
... ])

And the following data:

>>> with users.batch_write() as batch:
...     batch.put_item(data={
...         'account_type': 'standard_user',
...         'first_name': 'John',
...         'last_name': 'Doe',
...         'is_owner': True,
...         'email': True,
...         'date_joined': int(time.time()) - (60*60*2),
...     })
...     batch.put_item(data={
...         'account_type': 'standard_user',
...         'first_name': 'Jane',
...         'last_name': 'Doering',
...         'date_joined': int(time.time()) - 2,
...     })
...     batch.put_item(data={
...         'account_type': 'standard_user',
...         'first_name': 'Bob',
...         'last_name': 'Doerr',
...         'date_joined': int(time.time()) - (60*60*3),
...     })
...     batch.put_item(data={
...         'account_type': 'super_user',
...         'first_name': 'Alice',
...         'last_name': 'Liddel',
...         'is_owner': True,
...         'email': True,
...         'date_joined': int(time.time()) - 1,
...     })

When executing the query, you get an iterable back that contains your results. These results may be spread over multiple requests as DynamoDB paginates them. This is done transparently, but you should be aware it may take more than one request.

To run a query for last names starting with the letter “D”:

>>> names_with_d = users.query_2(
...     account_type__eq='standard_user',
...     last_name__beginswith='D'
... )

>>> for user in names_with_d:
...     print user['first_name']
'John'
'Jane'
'Bob'

You can also reverse results (reverse=True) as well as limiting them (limit=2):

>>> rev_with_d = users.query_2(
...     account_type__eq='standard_user',
...     last_name__beginswith='D',
...     reverse=True,
...     limit=2
... )

>>> for user in rev_with_d:
...     print user['first_name']
'Bob'
'Jane'

You can also run queries against the global secondary index defined above. Simply provide the index name (index='DateJoinedIndex') & filter parameters against its fields:

# Users who joined within the last hour.
>>> recent = users.query_2(
...     account_type__eq='standard_user',
...     date_joined__gte=time.time() - (60 * 60),
...     index='DateJoinedIndex'
... )

>>> for user in recent:
...     print user['first_name']
'Jane'

By default, DynamoDB can return a large amount of data per request (up to 1 MB). To prevent these requests from drowning out other, smaller gets, you can specify a smaller page size via the max_page_size argument to Table.query_2 & Table.scan. Doing so looks like:

# Small pages yield faster responses & less potential for drowning other
# requests.
>>> all_users = users.query_2(
...     account_type__eq='standard_user',
...     date_joined__gte=0,
...     index='DateJoinedIndex',
...     max_page_size=10
... )

# Usage is the same, but now many smaller requests are done.
>>> for user in all_users:
...     print user['first_name']
'Bob'
'John'
'Jane'

Finally, if you need to query on data that’s not in either a key or an index, you can run a Table.scan across the whole table, which accepts a similar but expanded set of filters. If you’re familiar with the Map/Reduce concept, a scan is akin to the “map” step: every item in the table is read & the filters are applied to each one.

Warning

Scans are eventually consistent & run over the entire table, so relatively speaking, they’re more expensive than plain queries or queries against an LSI.

An example scan of all records in the table looks like:

>>> all_users = users.scan()

Filtering a scan looks like:

>>> owners_with_emails = users.scan(
...     is_owner__eq=True,
...     email__null=False,
... )

>>> for user in owners_with_emails:
...     print user['first_name']
'John'
'Alice'
The ResultSet

Both Table.query_2 & Table.scan return an object called ResultSet. It’s a lazily-evaluated object that uses the Iterator protocol. It delays your queries until you request the next item in the result set.

Typical use is simply a standard for loop to iterate over the results:

>>> result_set = users.scan()
>>> for user in result_set:
...     print user['first_name']
'John'
'Jane'
'Bob'
'Alice'

However, this throws away results as it fetches more data. As a result, you can’t index it like a list:

>>> len(result_set)
TypeError: object of type 'ResultSet' has no len()

Because of this, if you need to loop over your results more than once (or do things like negative indexing or length checks), you should wrap the result set in a call to list(). For example:

>>> result_set = users.scan()
>>> all_users = list(result_set)
# Slice it for every other user.
>>> for user in all_users[::2]:
...     print user['first_name']
'John'
'Bob'

Warning

Wrapping calls like the above in list(...) WILL cause it to evaluate the ENTIRE potentially large data set.

Appropriate use of the limit=... kwarg to Table.query_2 & Table.scan calls is VERY important should you choose to do this.

Alternatively, you can build your own list by iterating over the ResultSet with a for loop, appending only the items you want & potentially stopping early.
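
For instance, to collect just the first twenty users & stop fetching pages as soon as you have them (a minimal sketch, assuming the users table from above):

>>> first_twenty = []
>>> for user in users.scan():
...     first_twenty.append(user)
...     if len(first_twenty) >= 20:
...         # Stop iterating; no further pages are fetched.
...         break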

Parallel Scan

DynamoDB also includes a feature called “Parallel Scan”, which allows you to make use of extra read capacity to divide up your result set & scan an entire table faster.

This does require extra code on your part, & you should ensure that you actually need the speed boost, have enough data to justify it & have the extra read capacity available so that other queries/scans aren’t impacted.

To run it, pick a total_segments value, an integer representing the number of temporary partitions to divide the table into. You then spin up a thread/process for each one, giving each a segment, a zero-based integer identifying which portion of the table that worker should scan.

An example of using parallel scan to send out email to all users might look something like:

#!/usr/bin/env python
import threading

import boto.ses
import boto.dynamodb2
from boto.dynamodb2.table import Table


AWS_ACCESS_KEY_ID = '<YOUR_AWS_KEY_ID>'
AWS_SECRET_ACCESS_KEY = '<YOUR_AWS_SECRET_KEY>'
APPROVED_EMAIL = 'some@address.com'


def send_email(email):
    # Using Amazon's Simple Email Service, send an email to a given
    # email address. You must already have an email address you've verified
    # with AWS before this will work.
    conn = boto.ses.connect_to_region(
        'us-east-1',
        aws_access_key_id=AWS_ACCESS_KEY_ID,
        aws_secret_access_key=AWS_SECRET_ACCESS_KEY
    )
    conn.send_email(
        APPROVED_EMAIL,
        "[OurSite] New feature alert!",
        "We've got some exciting news! We added a new feature to...",
        [email]
    )


def process_segment(segment=0, total_segments=10):
    # This method/function is executed in each thread, each getting its
    # own segment to process through.
    conn = boto.dynamodb2.connect_to_region(
        'us-east-1',
        aws_access_key_id=AWS_ACCESS_KEY_ID,
        aws_secret_access_key=AWS_SECRET_ACCESS_KEY
    )
    table = Table('users', connection=conn)

    # We pass in the segment & total_segments to scan here.
    for user in table.scan(segment=segment, total_segments=total_segments):
        send_email(user['email'])


def send_all_emails():
    pool = []
    # We're choosing to divide the table into 3, then...
    pool_size = 3

    # ...spinning up a thread for each segment.
    for i in range(pool_size):
        worker = threading.Thread(
            target=process_segment,
            kwargs={
                'segment': i,
                'total_segments': pool_size,
            }
        )
        pool.append(worker)
        # We start them to let them start scanning & consuming their
        # assigned segment.
        worker.start()

    # Finally, we wait for each to finish.
    for thread in pool:
        thread.join()


if __name__ == '__main__':
    send_all_emails()
Batch Reading

Similar to batch writing, batch reading can also help reduce the number of API requests necessary to access a large number of items. The Table.batch_get method takes a list (or any sliceable collection) of keys & fetches all of them, presented as an iterator interface.

This is done lazily, so if you never iterate over the results, no requests are executed. Additionally, if you only iterate over part of the set, the minimum number of calls is made to fetch those results (typically a maximum of 100 items per response).

Example:

>>> from boto.dynamodb2.table import Table
>>> users = Table('users2')

# No request yet.
>>> many_users = users.batch_get(keys=[
...     {'account_type': 'standard_user', 'last_name': 'Doe'},
...     {'account_type': 'standard_user', 'last_name': 'Doering'},
...     {'account_type': 'super_user', 'last_name': 'Liddel'},
... ])

# Now the request is performed, fetching all three items in one request.
>>> for user in many_users:
...     print user['first_name']
'Alice'
'John'
'Jane'
Deleting a Table

Deleting a table is a simple exercise. When you no longer need a table, simply run:

>>> users.delete()
DynamoDB Local

Amazon DynamoDB Local is a utility which can be used to mock DynamoDB during development. Connecting to a running DynamoDB Local server is easy:

#!/usr/bin/env python
from boto.dynamodb2.layer1 import DynamoDBConnection


# Connect to DynamoDB Local
conn = DynamoDBConnection(
    host='localhost',
    port=8000,
    aws_access_key_id='anything',
    aws_secret_access_key='anything',
    is_secure=False)

# List all local tables
tables = conn.list_tables()
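
You can also use the high-level Table API against DynamoDB Local by passing the connection in explicitly. A minimal sketch (the 'users' table name is just an example):

from boto.dynamodb2.table import Table

# Reuse the local connection with the high-level Table abstraction.
users = Table('users', connection=conn)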
Next Steps

You can find additional information about other calls & parameter options in the API docs.

Migrating from DynamoDB v1 to DynamoDB v2

For the v2 release of AWS’ DynamoDB, the high-level API for interacting via boto was rewritten. Since there were several new features added in v2, people using the v1 API may wish to transition their code to the new API. This guide covers the high-level APIs.

Creating New Tables

DynamoDB v1:

>>> import boto.dynamodb
# 'us-east-1' is just an example region.
>>> conn = boto.dynamodb.connect_to_region('us-east-1')
>>> message_table_schema = conn.create_schema(
...     hash_key_name='forum_name',
...     hash_key_proto_value=str,
...     range_key_name='subject',
...     range_key_proto_value=str
... )
>>> table = conn.create_table(
...     name='messages',
...     schema=message_table_schema,
...     read_units=10,
...     write_units=10
... )

DynamoDB v2:

>>> from boto.dynamodb2.fields import HashKey
>>> from boto.dynamodb2.fields import RangeKey
>>> from boto.dynamodb2.table import Table

>>> table = Table.create('messages', schema=[
...     HashKey('forum_name'),
...     RangeKey('subject'),
... ], throughput={
...     'read': 10,
...     'write': 10,
... })

Using an Existing Table

DynamoDB v1:

>>> import boto.dynamodb
>>> conn = boto.dynamodb.connect_to_region('us-east-1')
# With API calls.
>>> table = conn.get_table('messages')

# Without API calls.
>>> message_table_schema = conn.create_schema(
...     hash_key_name='forum_name',
...     hash_key_proto_value=str,
...     range_key_name='subject',
...     range_key_proto_value=str
... )
>>> table = conn.table_from_schema(
...     name='messages',
...     schema=message_table_schema)

DynamoDB v2:

>>> from boto.dynamodb2.table import Table
# With API calls.
>>> table = Table('messages')

# Without API calls.
>>> from boto.dynamodb2.fields import HashKey, RangeKey
>>> from boto.dynamodb2.table import Table
>>> table = Table('messages', schema=[
...     HashKey('forum_name'),
...     RangeKey('subject'),
... ])

Updating Throughput

DynamoDB v1:

>>> import boto.dynamodb
>>> conn = boto.dynamodb.connect_to_region('us-east-1')
>>> table = conn.get_table('messages')
>>> conn.update_throughput(table, read_units=5, write_units=15)

DynamoDB v2:

>>> from boto.dynamodb2.table import Table
>>> table = Table('messages')
>>> table.update(throughput={
...     'read': 5,
...     'write': 15,
... })

Deleting a Table

DynamoDB v1:

>>> import boto.dynamodb
>>> conn = boto.dynamodb.connect_to_region('us-east-1')
>>> table = conn.get_table('messages')
>>> conn.delete_table(table)

DynamoDB v2:

>>> from boto.dynamodb2.table import Table
>>> table = Table('messages')
>>> table.delete()

Creating an Item

DynamoDB v1:

>>> import boto.dynamodb
>>> conn = boto.dynamodb.connect_to_region('us-east-1')
>>> table = conn.get_table('messages')
>>> item_data = {
...     'Body': 'http://url_to_lolcat.gif',
...     'SentBy': 'User A',
...     'ReceivedTime': '12/9/2011 11:36:03 PM',
... }
>>> item = table.new_item(
...     # Our hash key is 'forum_name'
...     hash_key='LOLCat Forum',
...     # Our range key is 'subject'
...     range_key='Check this out!',
...     # This has the rest of the attributes.
...     attrs=item_data
... )

DynamoDB v2:

>>> from boto.dynamodb2.table import Table
>>> table = Table('messages')
>>> item = table.put_item(data={
...     'forum_name': 'LOLCat Forum',
...     'subject': 'Check this out!',
...     'Body': 'http://url_to_lolcat.gif',
...     'SentBy': 'User A',
...     'ReceivedTime': '12/9/2011 11:36:03 PM',
... })

Getting an Existing Item

DynamoDB v1:

>>> table = conn.get_table('messages')
>>> item = table.get_item(
...     hash_key='LOLCat Forum',
...     range_key='Check this out!'
... )

DynamoDB v2:

>>> table = Table('messages')
>>> item = table.get_item(
...     forum_name='LOLCat Forum',
...     subject='Check this out!'
... )

Updating an Item

DynamoDB v1:

>>> item['a_new_key'] = 'testing'
>>> del item['a_new_key']
>>> item.put()

DynamoDB v2:

>>> item['a_new_key'] = 'testing'
>>> del item['a_new_key']

# Conditional save, only if data hasn't changed.
>>> item.save()

# Forced full overwrite.
>>> item.save(overwrite=True)

# Partial update (only changed fields).
>>> item.partial_save()

Deleting an Item

DynamoDB v1:

>>> item.delete()

DynamoDB v2:

>>> item.delete()

Querying

DynamoDB v1:

>>> import boto.dynamodb
>>> conn = boto.dynamodb.connect_to_region('us-east-1')
>>> table = conn.get_table('messages')
>>> from boto.dynamodb.condition import BEGINS_WITH
>>> items = table.query('Amazon DynamoDB',
...                     range_key_condition=BEGINS_WITH('DynamoDB'),
...                     request_limit=1, max_results=1)
>>> for item in items: