Scan DynamoDB Items with DynamoDBMapper

Previously we covered how to query a DynamoDB database using either DynamoDBMapper or the low-level Java API.

Apart from queries, DynamoDB also offers scans.
A scan reads every item in your DynamoDB table.
Therefore a scan does not require any conditions on your partition key or on your global/local secondary indexes.
What a scan does offer is filtering of the items it has already read, and returning only specific attributes of those items.
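To make the read-then-filter behaviour concrete, here is a minimal in-memory sketch. It is not the DynamoDB or DynamoDBMapper API: the `ScanModel` class, its `scan` method and the sample items are hypothetical. Items are modelled as plain maps; every item is read, the filter is applied only afterwards, and then only the projected attributes are kept, so the number of items scanned differs from the number returned.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

/**
 * Hypothetical in-memory model of a scan (not the AWS SDK API):
 * read every item, apply the filter afterwards, then keep the projected attributes.
 */
public class ScanModel {

    /** How many items were read versus which items were returned. */
    public static final class ScanResult {
        public final int scannedCount;
        public final List<Map<String, Object>> items;

        ScanResult(int scannedCount, List<Map<String, Object>> items) {
            this.scannedCount = scannedCount;
            this.items = items;
        }
    }

    public static ScanResult scan(List<Map<String, Object>> table,
                                  Predicate<Map<String, Object>> filter,
                                  Set<String> projection) {
        int scanned = 0;
        List<Map<String, Object>> returned = new ArrayList<>();
        for (Map<String, Object> item : table) {
            scanned++; // every item is read, whether or not it passes the filter
            if (!filter.test(item)) {
                continue; // discarded only after it has already been read
            }
            // Keep only the requested attributes of a matching item
            Map<String, Object> projected = new LinkedHashMap<>();
            for (String attribute : projection) {
                if (item.containsKey(attribute)) {
                    projected.put(attribute, item.get(attribute));
                }
            }
            returned.add(projected);
        }
        return new ScanResult(scanned, returned);
    }

    public static void main(String[] args) {
        List<Map<String, Object>> logins = new ArrayList<>();
        logins.add(Map.of("email", "a@example.com", "timestamp", 100L));
        logins.add(Map.of("email", "b@example.com", "timestamp", 200L));
        logins.add(Map.of("email", "c@example.com", "timestamp", 300L));

        // Same condition as the filter expression "#timestamp < :from" with :from = 250
        ScanResult result = scan(logins,
                item -> (Long) item.get("timestamp") < 250L,
                Set.of("email"));

        System.out.println(result.scannedCount + " scanned, " + result.items.size() + " returned");
    }
}
```

Running it prints "3 scanned, 2 returned": the filter narrows what comes back, not what gets read.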

The snippet below scans the Logins table and keeps only the items whose timestamp is lower than the given date.

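    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    import com.amazonaws.services.dynamodbv2.datamodeling.DynamoDBScanExpression;
    import com.amazonaws.services.dynamodbv2.model.AttributeValue;

    public List<Login> scanLogins(Long date) {

        // "timestamp" is a DynamoDB reserved word, so it is referenced through a placeholder
        Map<String, String> attributeNames = new HashMap<>();
        attributeNames.put("#timestamp", "timestamp");

        // Numeric values are passed as strings with the N (number) type
        Map<String, AttributeValue> attributeValues = new HashMap<>();
        attributeValues.put(":from", new AttributeValue().withN(date.toString()));

        // The filter is evaluated on every scanned item; only the matches are returned
        DynamoDBScanExpression dynamoDBScanExpression = new DynamoDBScanExpression()
                .withFilterExpression("#timestamp < :from")
                .withExpressionAttributeNames(attributeNames)
                .withExpressionAttributeValues(attributeValues);

        // Returns a lazily loaded, paginated list; further pages are fetched while iterating
        List<Login> logins = dynamoDBMapper.scan(Login.class, dynamoDBScanExpression);

        return logins;
    }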

Another great feature of DynamoDBMapper is parallel scan. A parallel scan divides the scan task among multiple workers, one for each logical segment of the table. The workers process their segments in parallel and their results are combined.
The performance of a scan request depends largely on the number of items stored in the table, since every item has to be read. Therefore a parallel scan can relieve some of the performance issues of a scan when you have to deal with large amounts of data.

    public List<Login> scanLogins(Long date, Integer workers) {

        Map<String, String> attributeNames = new HashMap<>();
        attributeNames.put("#timestamp", "timestamp");

        Map<String, AttributeValue> attributeValues = new HashMap<>();
        attributeValues.put(":from", new AttributeValue().withN(date.toString()));

        DynamoDBScanExpression dynamoDBScanExpression = new DynamoDBScanExpression()
                .withFilterExpression("#timestamp < :from")
                .withExpressionAttributeNames(attributeNames)
                .withExpressionAttributeValues(attributeValues);

        // The third argument is the total number of segments; the segments are scanned in parallel
        List<Login> logins = dynamoDBMapper.parallelScan(Login.class, dynamoDBScanExpression, workers);

        return logins;
    }
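The idea of splitting a scan into segments can be sketched in plain Java. This is a hypothetical in-memory model, not how DynamoDBMapper implements `parallelScan`: the `ParallelScanModel` class is made up for illustration, items are reduced to their timestamps, and assigning an item to a segment with `Math.floorMod(hashCode, totalSegments)` is an assumption standing in for DynamoDB's own partitioning. One thread per segment filters its own items, and the partial results are merged at the end.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Predicate;

/**
 * Hypothetical in-memory model of a parallel scan (not the DynamoDBMapper implementation).
 * Items are reduced to timestamps; segment assignment by hash is an illustrative assumption.
 */
public class ParallelScanModel {

    public static List<Long> parallelScan(List<Long> timestamps,
                                          Predicate<Long> filter,
                                          int totalSegments) {
        // Split the items into logical segments (assumed: hash modulo segment count)
        List<List<Long>> segments = new ArrayList<>();
        for (int i = 0; i < totalSegments; i++) {
            segments.add(new ArrayList<>());
        }
        for (Long timestamp : timestamps) {
            segments.get(Math.floorMod(timestamp.hashCode(), totalSegments)).add(timestamp);
        }

        // One worker thread per segment, each filtering only its own items
        ExecutorService workers = Executors.newFixedThreadPool(totalSegments);
        try {
            List<Future<List<Long>>> partialResults = new ArrayList<>();
            for (List<Long> segment : segments) {
                partialResults.add(workers.submit(() -> {
                    List<Long> matches = new ArrayList<>();
                    for (Long timestamp : segment) {
                        if (filter.test(timestamp)) {
                            matches.add(timestamp);
                        }
                    }
                    return matches;
                }));
            }

            // Merge the partial results once every worker has finished its segment
            List<Long> merged = new ArrayList<>();
            for (Future<List<Long>> partial : partialResults) {
                merged.addAll(partial.get());
            }
            return merged;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for a segment", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("A segment worker failed", e.getCause());
        } finally {
            workers.shutdown();
        }
    }

    public static void main(String[] args) {
        List<Long> timestamps = List.of(100L, 200L, 300L, 400L, 500L);
        // Same condition as "#timestamp < :from" with :from = 350, split across 3 segments
        List<Long> matches = parallelScan(timestamps, timestamp -> timestamp < 350L, 3);
        System.out.println(matches.size() + " matching items"); // prints "3 matching items"
    }
}
```

Because the results are merged segment by segment, their order differs from the input order, so callers should not rely on the order of the merged list.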

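Before using scan in our application, we have to take into consideration that a scan reads every item in the table. Therefore it is costly both in terms of charges and performance, and it might consume your provisioned read capacity. A filter expression does not reduce this cost, because it is applied after the items have been read.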
Generally it is better to stick to queries and avoid scans.
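To put that cost in numbers, here is a rough estimate in Java. It relies on assumptions worth checking against the DynamoDB documentation: the sizes of all items read are summed and rounded up to the next 4 KB, each 4 KB costs 1 RCU with strongly consistent reads and 0.5 RCU with the (default) eventually consistent reads, and rounding per 1 MB scan page is ignored. The `ScanCostEstimate` class and the table size in `main` are hypothetical.

```java
/**
 * Back-of-the-envelope estimate of the read capacity consumed by one full scan.
 * Assumptions: item sizes are summed and rounded up to the next 4 KB,
 * 1 RCU per 4 KB strongly consistent, 0.5 RCU per 4 KB eventually consistent,
 * and per-page (1 MB) rounding is ignored.
 */
public class ScanCostEstimate {

    static final long READ_UNIT_BYTES = 4 * 1024;

    public static double estimateReadUnits(long itemCount, long averageItemBytes, boolean consistentRead) {
        long totalBytes = itemCount * averageItemBytes;
        // Ceiling division: a partial 4 KB block still costs a whole read unit
        long fourKbUnits = (totalBytes + READ_UNIT_BYTES - 1) / READ_UNIT_BYTES;
        return consistentRead ? fourKbUnits : fourKbUnits / 2.0;
    }

    public static void main(String[] args) {
        // Hypothetical Logins table: 1,000,000 items of about 1 KB each.
        // Every item is read, so the filter expression does not change this estimate.
        double units = estimateReadUnits(1_000_000L, 1024L, false);
        System.out.println(units + " read capacity units"); // prints "125000.0 read capacity units"
    }
}
```

Under these assumed figures a single full scan consumes about 125,000 read capacity units, whether the filter matches one item or all of them; on a table provisioned with, say, 1,000 RCUs, that is roughly two minutes of the table's entire read capacity.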

You can find the full source code with unit tests on GitHub.
