Centralize near-real-time governance through alerts on Amazon Redshift data warehouses for sensitive queries


Amazon Redshift is a fully managed, petabyte-scale data warehouse service in the cloud that delivers powerful and secure insights on all your data with the best price-performance. With Amazon Redshift, you can analyze your data to derive holistic insights about your business and your customers. In many organizations, one or several Amazon Redshift data warehouses run daily for data and analytics purposes. Over time, a number of Data Definition Language (DDL) or Data Control Language (DCL) queries, such as CREATE, ALTER, DROP, GRANT, or REVOKE SQL queries, are run on the Amazon Redshift data warehouse. These queries are sensitive in nature because they could lead to dropping tables or deleting data, causing disruptions or outages. Monitoring such user queries as part of the centralized governance of the data warehouse helps stakeholders understand potential risks and take prompt action to mitigate them, following the operational excellence pillar of the AWS Data Analytics Lens. Therefore, for a robust governance mechanism, it's crucial to alert or notify the database and security administrators about the kind of sensitive queries that are run on the data warehouse, so that prompt remediation actions can be taken if needed.

To address this, in this post we show you how to automate near-real-time notifications over a Slack channel when certain queries are run on the data warehouse. We also create a simple governance dashboard using a combination of Amazon DynamoDB, Amazon Athena, and Amazon QuickSight.

Solution overview

An Amazon Redshift data warehouse logs information about connections and user activities taking place in databases, which helps monitor the database for security and troubleshooting purposes. These logs can be stored in Amazon Simple Storage Service (Amazon S3) buckets or Amazon CloudWatch. Amazon Redshift logs information in the following log files, and this solution is based on using the Amazon Redshift audit log with CloudWatch as a destination:

  • Connection log – Logs authentication attempts, connections, and disconnections
  • User log – Logs information about changes to database user definitions
  • User activity log – Logs each query before it's run on the database

The following diagram illustrates the solution architecture.

Solution Architecture

The solution workflow consists of the following steps:

  1. Audit logging is enabled in each Amazon Redshift data warehouse to capture the user activity log in CloudWatch.
  2. Subscription filters on CloudWatch capture the required DDL and DCL commands by providing filter criteria.
  3. The subscription filter triggers an AWS Lambda function for pattern matching.
  4. The Lambda function processes the event data and sends the notification over a Slack channel using a webhook.
  5. The Lambda function stores the data in a DynamoDB table, over which a simple dashboard is built using Athena and QuickSight.

Prerequisites

Before starting the implementation, make sure the following requirements are met:

  • You have an AWS account.
  • The AWS Region used for this post is us-east-1. However, this solution is relevant in any other Region where the necessary AWS services are available.
  • You have permissions to create a Slack workspace.

Create and configure an Amazon Redshift cluster

To set up your cluster, complete the following steps:

  1. Create a provisioned Amazon Redshift data warehouse.

For this post, we use three Amazon Redshift data warehouses: demo-cluster-ou1, demo-cluster-ou2, and demo-cluster-ou3. In this post, all the Amazon Redshift data warehouses are provisioned clusters. However, the same solution applies to Amazon Redshift Serverless.

  2. To enable audit logging with CloudWatch as the log export destination, open an Amazon Redshift cluster and go to the Properties tab.
  3. On the Edit menu, choose Edit audit logging.

Redshift edit audit logging

  4. Select Turn on under Configure audit logging.
  5. Select CloudWatch for Log export type.
  6. Select all three options for User log, Connection log, and User activity log.
  7. Choose Save changes.

  8. Create a parameter group for the clusters with enable_user_activity_logging set to true for each of the clusters.
  9. Modify the cluster to attach the new parameter group to the Amazon Redshift cluster.

For this post, we create three custom parameter groups: custom-param-grp-1, custom-param-grp-2, and custom-param-grp-3 for the three clusters.

Note that if you enable only the audit logging feature, but not the associated parameter, the database audit logs capture information only for the connection log and user log, not for the user activity log.
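If you prefer to script these steps, the following boto3 sketch mirrors them for demo-cluster-ou1. The API calls are the standard Amazon Redshift boto3 operations, but treat the specific values (parameter group family, apply type) as assumptions to verify for your environment.

import boto3

redshift = boto3.client('redshift')

# Export the connection, user, and user activity logs to CloudWatch
redshift.enable_logging(
    ClusterIdentifier='demo-cluster-ou1',
    LogDestinationType='cloudwatch',
    LogExports=['connectionlog', 'userlog', 'useractivitylog']
)

# Create a custom parameter group and turn on user activity logging
redshift.create_cluster_parameter_group(
    ParameterGroupName='custom-param-grp-1',
    ParameterGroupFamily='redshift-1.0',
    Description='Custom parameter group with user activity logging enabled'
)
redshift.modify_cluster_parameter_group(
    ParameterGroupName='custom-param-grp-1',
    Parameters=[{
        'ParameterName': 'enable_user_activity_logging',
        'ParameterValue': 'true',
        'ApplyType': 'static'
    }]
)

# Attach the parameter group to the cluster (a cluster reboot is needed for static parameters)
redshift.modify_cluster(
    ClusterIdentifier='demo-cluster-ou1',
    ClusterParameterGroupName='custom-param-grp-1'
)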

  10. On the CloudWatch console, choose Log groups under Logs in the navigation pane.
  11. Search for /aws/redshift/cluster/demo.

This will show all the log groups created for the Amazon Redshift clusters.

Create a DynamoDB audit table

To create your audit table, complete the following steps:

  1. On the DynamoDB console, choose Tables in the navigation pane.
  2. Choose Create table.
  3. For Table name, enter demo_redshift_audit_logs.
  4. For Partition key, enter partKey with the data type as String.

  5. Keep the table settings as default.
  6. Choose Create table.
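If you prefer to create the table programmatically, the following boto3 sketch defines the same partition key; on-demand billing is an assumption here rather than part of the console's default settings.

import boto3

dynamodb = boto3.client('dynamodb')

# Audit table keyed on a random UUID string written by the Lambda function
dynamodb.create_table(
    TableName='demo_redshift_audit_logs',
    AttributeDefinitions=[{'AttributeName': 'partKey', 'AttributeType': 'S'}],
    KeySchema=[{'AttributeName': 'partKey', 'KeyType': 'HASH'}],
    BillingMode='PAY_PER_REQUEST'  # assumption; any capacity mode works for this demo
)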

Create Slack resources

Slack Incoming Webhooks expect a JSON request with a message string corresponding to a "text" key. They also support message customization, such as adding a user name and icon, or overriding the webhook's default channel. For more information, see Sending messages using Incoming Webhooks on the Slack website.
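As a quick sanity check of a webhook, a minimal sketch like the following posts a test message using the same urllib3 approach as the Lambda function later in this post (the webhook URL is a placeholder):

import json
import urllib3

http = urllib3.PoolManager()
webhook_url = 'https://hooks.slack.com/services/xxxxxxx'  # placeholder; use your own webhook URL

# Slack expects a JSON body with at least a "text" key
payload = {'text': 'Test notification from the Amazon Redshift governance demo'}
resp = http.request('POST', webhook_url, body=json.dumps(payload).encode('utf-8'),
                    headers={'Content-Type': 'application/json'})
print(resp.status)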

The following resources are created for this post:

  • A Slack workspace named demo_rc
  • A channel named #blog-demo in the newly created Slack workspace
  • A new Slack app in the Slack workspace named demo_redshift_ntfn (using the From Scratch option)
  • Note down the incoming webhook URL, which will be used in this post for sending the notifications

Create an IAM role and policy

In this section, we create an AWS Identity and Access Management (IAM) policy that will be attached to an IAM role. The role is then used to grant a Lambda function access to a DynamoDB table. The policy also includes permissions to allow the Lambda function to write log files to Amazon CloudWatch Logs.

  1. On the IAM console, choose Policies in the navigation pane.
  2. Choose Create policy.
  3. In the Create policy section, choose the JSON tab and enter the following IAM policy. Make sure you replace your AWS account ID in the policy (replace XXXXXXXX with your AWS account ID).
{
    "Model": "2012-10-17",
    "Assertion": [
        {
            "Sid": "ReadWriteTable",
            "Effect": "Allow",
            "Action": [
                "dynamodb:BatchGetItem",
                "dynamodb:BatchWriteItem",
                "dynamodb:PutItem",
                "dynamodb:GetItem",
                "dynamodb:Scan",
                "dynamodb:Query",
                "dynamodb:UpdateItem",
                "logs:PutLogEvents"
            ],
            "Useful resource": [
                "arn:aws:dynamodb:us-east-1:XXXXXXXX:table/demo_redshift_audit_logs",
                "arn:aws:logs:*:XXXXXXXX:log-group:*:log-stream:*"
            ]
        },
        {
            "Sid": "WriteLogStreamsAndGroups",
            "Impact": "Permit",
            "Motion": [
                "logs:CreateLogStream",
                "logs:CreateLogGroup"
            ],
            "Useful resource": "arn:aws:logs:*:XXXXXXXX:log-group:*"
        }
    ]
}

  4. Choose Next: Tags, then choose Next: Review.
  5. Provide the policy name demo_post_policy and choose Create policy.

To apply demo_post_policy to a Lambda function, you first need to attach the policy to an IAM role.

  1. On the IAM console, choose Roles in the navigation pane.
  2. Choose Create role.
  3. Select AWS service and then select Lambda.
  4. Choose Next.

  5. On the Add permissions page, search for demo_post_policy.
  6. Select demo_post_policy from the list of returned search results, then choose Next.

  7. On the Review page, enter demo_post_role for the role name and an appropriate description, then choose Create role.
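The same policy and role can be created with boto3; the following sketch assumes the JSON policy shown earlier is saved locally as demo_post_policy.json (the file name is an assumption for illustration):

import json
import boto3

iam = boto3.client('iam')

# Trust policy so Lambda can assume the role
trust_policy = {
    'Version': '2012-10-17',
    'Statement': [{
        'Effect': 'Allow',
        'Principal': {'Service': 'lambda.amazonaws.com'},
        'Action': 'sts:AssumeRole'
    }]
}

# Load the permissions policy shown earlier (saved locally as demo_post_policy.json)
with open('demo_post_policy.json') as f:
    policy_document = f.read()

policy = iam.create_policy(PolicyName='demo_post_policy', PolicyDocument=policy_document)

iam.create_role(
    RoleName='demo_post_role',
    AssumeRolePolicyDocument=json.dumps(trust_policy),
    Description='Grants the audit notification Lambda function access to DynamoDB and CloudWatch Logs'
)

iam.attach_role_policy(RoleName='demo_post_role', PolicyArn=policy['Policy']['Arn'])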

Create a Lambda function

We create a Lambda function with Python 3.9. In the following code, replace the slack_hook parameter with the Slack webhook URL you copied earlier:

import gzip
import base64
import json
import boto3
import uuid
import re
import urllib3

http = urllib3.PoolManager()
dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table("demo_redshift_audit_logs")
slack_hook = "https://hooks.slack.com/services/xxxxxxx"

def exe_wrapper(data):
    # The log stream name corresponds to the Redshift cluster that produced the event
    cluster_name = data['logStream']
    for event in data['logEvents']:
        message = event['message']
        # Parse the user activity log line: timestamp, database, user, and the SQL text after "LOG:"
        reg = re.match(r"'(?P<ts>\d{4}-\d\d-\d\dT\d\d:\d\d:\d\dZ).*?\bdb=(?P<db>\S*).*?\buser=(?P<user>\S*).*?LOG:\s+(?P<query>.*?);?$", message)
        if reg is not None:
            filter = reg.groupdict()
            ts = filter['ts']
            db = filter['db']
            user = filter['user']
            query = filter['query']
            # First two words (for example, CREATE TABLE) and the object the statement targets
            query_type = " ".join((query.split(" "))[0:2]).upper()
            object = query.split(" ")[2]
            put_dynamodb(ts, cluster_name, db, user, query, query_type, object, message)
            slack_api(cluster_name, db, user, query, query_type, object)

def put_dynamodb(timestamp, cluster, database, user, sql, query_type, object, event):
    # Persist the parsed query details in the DynamoDB audit table
    table.put_item(Item={
        'partKey': str(uuid.uuid4()),
        'redshiftCluster': cluster,
        'sqlTimestamp': timestamp,
        'databaseName': database,
        'userName': user,
        'sqlQuery': sql,
        'queryType': query_type,
        'objectName': object,
        'rawData': event
    })

def slack_api(cluster, database, user, sql, query_type, object):
    # Build a Slack Block Kit message and post it to the incoming webhook
    payload = {
        'channel': '#blog-demo',
        'username': 'demo_redshift_ntfn',
        'blocks': [{
                'type': 'section',
                'text': {
                    'type': 'mrkdwn',
                    'text': 'Detected *{}* command\n *Affected Object*: `{}`'.format(query_type, object)
                }
            },
            {
                'type': 'divider'
            },
            {
                'type': 'section',
                'fields': [{
                        'type': 'mrkdwn',
                        'text': ':desktop_computer: *Cluster Name:*\n{}'.format(cluster)
                    },
                    {
                        'type': 'mrkdwn',
                        'text': ':label: *Query Type:*\n{}'.format(query_type)
                    },
                    {
                        'type': 'mrkdwn',
                        'text': ':card_index_dividers: *Database Name:*\n{}'.format(database)
                    },
                    {
                        'type': 'mrkdwn',
                        'text': ':technologist: *User Name:*\n{}'.format(user)
                    }
                ]
            },
            {
                'type': 'section',
                'text': {
                    'type': 'mrkdwn',
                    'text': ':page_facing_up: *SQL Query*\n ```{}```'.format(sql)
                }
            }
        ]
    }
    encoded_msg = json.dumps(payload).encode('utf-8')
    resp = http.request('POST', slack_hook, body=encoded_msg)
    print(encoded_msg)

def lambda_handler(event, context):
    print(f'Logging Event: {event}')
    print(f"Awslog: {event['awslogs']}")
    encoded_zipped_data = event['awslogs']['data']
    print(f'data: {encoded_zipped_data}')
    print(f'type: {type(encoded_zipped_data)}')
    # CloudWatch Logs delivers subscription data base64-encoded and gzip-compressed
    zipped_data = base64.b64decode(encoded_zipped_data)
    data = json.loads(gzip.decompress(zipped_data))
    exe_wrapper(data)

Create your function with the following steps:

  1. On the Lambda console, choose Create function.
  2. Select Author from scratch and for Function name, enter demo_function.
  3. For Runtime, choose Python 3.9.
  4. For Execution role, select Use an existing role and choose demo_post_role as the IAM role.
  5. Choose Create function.

  6. On the Code tab, enter the preceding Lambda function and replace the Slack webhook URL.
  7. Choose Deploy.
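Before wiring up CloudWatch, you can sanity-check the handler locally with a synthetic event. The following sketch gzips and base64-encodes a sample user activity log line in the shape CloudWatch Logs uses for subscription deliveries; it assumes the code above is saved as lambda_function.py, and running it will attempt the real DynamoDB write and Slack post, so valid credentials and a working webhook are needed.

import base64
import gzip
import json

from lambda_function import lambda_handler  # assumes the preceding code is saved as lambda_function.py

# Minimal synthetic payload mimicking a CloudWatch Logs subscription delivery
sample = {
    'logStream': 'demo-cluster-ou1',
    'logEvents': [{
        'id': '1',
        'timestamp': 1680408000000,
        'message': "'2023-04-02T04:18:43Z UTC [ db=dev user=awsuser pid=100 userid=100 xid=100 ]' LOG: drop table dev.public.demo_test_table_1;"
    }]
}

encoded = base64.b64encode(gzip.compress(json.dumps(sample).encode('utf-8'))).decode('utf-8')
lambda_handler({'awslogs': {'data': encoded}}, None)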

Create a CloudWatch subscription filter

We need to create the CloudWatch subscription filter on the useractivitylog log group created by the Amazon Redshift clusters.

  1. On the CloudWatch console, navigate to the log group /aws/redshift/cluster/demo-cluster-ou1/useractivitylog.
  2. On the Subscription filters tab, on the Create menu, choose Create Lambda subscription filter.

  3. Choose demo_function as the Lambda function.
  4. For Log format, choose Other.
  5. Provide the subscription filter pattern as ?create ?alter ?drop ?grant ?revoke.
  6. Provide the filter name as Sensitive Queries demo-cluster-ou1.
  7. Test the filter by selecting the actual log stream. If it has any queries with a matching pattern, you can see some results. For testing, use the following pattern and choose Test pattern.
'2023-04-02T04:18:43Z UTC [ db=dev user=awsuser pid=100 userid=100 xid=100 ]' LOG: alter table my_table alter column string type varchar(16);
'2023-04-02T04:06:08Z UTC [ db=dev user=awsuser pid=100 userid=100 xid=200 ]' LOG: create user rs_user with password '***';

  8. Choose Start streaming.

  9. Repeat the same steps for /aws/redshift/cluster/demo-cluster-ou2/useractivitylog and /aws/redshift/cluster/demo-cluster-ou3/useractivitylog by giving unique subscription filter names.
  10. Complete the preceding steps to create a second subscription filter for each of the Amazon Redshift data warehouses with the filter pattern ?CREATE ?ALTER ?DROP ?GRANT ?REVOKE, ensuring uppercase SQL commands are also captured through this solution.
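If you script the subscription filters, remember that CloudWatch Logs also needs permission to invoke the Lambda function. A boto3 sketch for one cluster might look like the following (the account ID is a placeholder, and a log group supports only a limited number of subscription filters):

import boto3

logs = boto3.client('logs')
lambda_client = boto3.client('lambda')

account_id = 'XXXXXXXX'  # placeholder; replace with your AWS account ID
region = 'us-east-1'
log_group = '/aws/redshift/cluster/demo-cluster-ou1/useractivitylog'
function_arn = f'arn:aws:lambda:{region}:{account_id}:function:demo_function'

# Allow CloudWatch Logs to invoke the Lambda function for this log group
lambda_client.add_permission(
    FunctionName='demo_function',
    StatementId='demo-cluster-ou1-useractivitylog',
    Action='lambda:InvokeFunction',
    Principal='logs.amazonaws.com',
    SourceArn=f'arn:aws:logs:{region}:{account_id}:log-group:{log_group}:*'
)

# Lowercase and uppercase filter patterns, matching the console steps above
for name, pattern in [
    ('Sensitive Queries demo-cluster-ou1', '?create ?alter ?drop ?grant ?revoke'),
    ('Sensitive Queries demo-cluster-ou1 upper', '?CREATE ?ALTER ?DROP ?GRANT ?REVOKE'),
]:
    logs.put_subscription_filter(
        logGroupName=log_group,
        filterName=name,
        filterPattern=pattern,
        destinationArn=function_arn
    )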

Test the solution

In this section, we test the solution in the three Amazon Redshift clusters that we created in the previous steps and check for notifications of the commands on the Slack channel as per the CloudWatch subscription filters, as well as data getting ingested in the DynamoDB table. We use the following commands to test the solution; however, the solution is not restricted to these commands only. You can check with other DDL commands as per the filter criteria in your Amazon Redshift cluster.

create schema sales;
create schema marketing;
create table dev.public.demo_test_table_1  (id int, string varchar(10));
create table dev.public.demo_test_table_2  (empid int, empname varchar(100));
alter table dev.public.category alter column catdesc type varchar(65);
drop table dev.public.demo_test_table_1;
drop table dev.public.demo_test_table_2;

In the Slack channel, details of the notifications look like the following screenshot.

To get the results in DynamoDB, complete the following steps:

  1. On the DynamoDB console, choose Explore items under Tables in the navigation pane.
  2. In the Tables pane, select demo_redshift_audit_logs.
  3. Select Scan and Run to get the results in the table.
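The same check can be done programmatically; the following boto3 sketch scans a few items to confirm the Lambda function is writing audit records:

import boto3

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('demo_redshift_audit_logs')

# Pull back a handful of audit records written by the Lambda function
response = table.scan(Limit=10)
for item in response['Items']:
    print(item['sqlTimestamp'], item['redshiftCluster'], item['queryType'], item['sqlQuery'])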

Athena federation over the DynamoDB table

The Athena DynamoDB connector enables Athena to communicate with DynamoDB so that you can query your tables with SQL. As part of the prerequisites for this, deploy the connector to your AWS account using the Athena console or the AWS Serverless Application Repository. For more details, refer to Deploying a data source connector or Using the AWS Serverless Application Repository to deploy a data source connector. For this post, we use the Athena console.

  1. On the Athena console, under Administration in the navigation pane, choose Data sources.
  2. Choose Create data source.

  3. Select the data source as Amazon DynamoDB, then choose Next.

  4. For Data source name, enter dynamo_db.
  5. For Lambda function, choose Create Lambda function to open a new window with the Lambda console.

  6. Under Application settings, enter the following information:
    • For Application name, enter AthenaDynamoDBConnector.
    • For SpillBucket, enter the name of an S3 bucket.
    • For AthenaCatalogName, enter dynamo.
    • For DisableSpillEncryption, enter false.
    • For LambdaMemory, enter 3008.
    • For LambdaTimeout, enter 900.
    • For SpillPrefix, enter athena-spill-dynamo.

  7. Select I acknowledge that this app creates custom IAM roles and choose Deploy.
  8. Wait for the function to deploy, then return to the Athena window and choose the refresh icon next to Lambda function.
  9. Select the newly deployed Lambda function and choose Next.

  10. Review the information and choose Create data source.
  11. Navigate back to the query editor, then choose dynamo_db for Data source and default for Database.
  12. Run the following query in the editor to check the sample data:
SELECT partkey,
       redshiftcluster,
       databasename,
       objectname,
       username,
       querytype,
       sqltimestamp,
       sqlquery,
       rawdata
FROM dynamo_db.default.demo_redshift_audit_logs limit 10;
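You can also run a federated query programmatically. The following boto3 sketch is a rough example under stated assumptions: the results bucket is a placeholder you need to replace, and behavior of the federated dynamo_db catalog should be verified in your environment.

import time
import boto3

athena = boto3.client('athena')

# Run the federated query through the dynamo_db catalog
execution = athena.start_query_execution(
    QueryString='SELECT * FROM dynamo_db.default.demo_redshift_audit_logs LIMIT 10',
    QueryExecutionContext={'Catalog': 'dynamo_db', 'Database': 'default'},
    ResultConfiguration={'OutputLocation': 's3://your-athena-results-bucket/'}  # placeholder bucket
)
query_id = execution['QueryExecutionId']

# Poll until the query finishes, then print the rows
while True:
    state = athena.get_query_execution(QueryExecutionId=query_id)['QueryExecution']['Status']['State']
    if state not in ('QUEUED', 'RUNNING'):
        break
    time.sleep(1)

for row in athena.get_query_results(QueryExecutionId=query_id)['ResultSet']['Rows']:
    print([col.get('VarCharValue') for col in row['Data']])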

Visualize the data in QuickSight

In this section, we create a simple governance dashboard in QuickSight using Athena in direct query mode to query the record set, which is persistently stored in a DynamoDB table.

  1. Sign up for QuickSight on the QuickSight console.
  2. Select Amazon Athena as a resource.
  3. Choose Lambda and select the Lambda function created for DynamoDB federation.

  4. Create a new dataset in QuickSight with Athena as the source.
  5. Provide the data source name as demo_blog.
  6. Choose dynamo_db for Catalog, default for Database, and demo_redshift_audit_logs for Table.
  7. Choose Edit/Preview data.

  8. Choose String in the sqlTimestamp column and choose Date.

  9. In the dialog box that appears, enter the date format yyyy-MM-dd'T'HH:mm:ssZZ.
  10. Choose Validate and Update.

  11. Choose PUBLISH & VISUALIZE.
  12. Choose Interactive sheet and choose CREATE.

This will take you to the visualization page to create the analysis in QuickSight.

  13. Create a governance dashboard with the appropriate visualization type.

Refer to the Amazon QuickSight learning videos in the QuickSight community for basic to advanced levels of authoring. The following screenshot is a sample visualization created on this data.

Clean up

Clean up your resources with the following steps:

  1. Delete all the Amazon Redshift clusters.
  2. Delete the Lambda function.
  3. Delete the CloudWatch log groups for Amazon Redshift and Lambda.
  4. Delete the Athena data source for DynamoDB.
  5. Delete the DynamoDB table.

Conclusion

Amazon Redshift is a powerful, fully managed data warehouse that can offer significantly increased performance and lower cost in the cloud. In this post, we discussed a pattern to implement a governance mechanism to identify and notify on sensitive DDL/DCL queries run on an Amazon Redshift data warehouse, and created a quick dashboard to enable the DBA and security team to take timely and prompt action as required. Additionally, you can extend this solution to include DDL commands used for Amazon Redshift data sharing across clusters.

Operational excellence is an integral part of overall data governance when creating a modern data architecture, because it's a great enabler to drive our customers' business. Ideally, any data governance implementation is a combination of people, process, and technology that organizations use to ensure the quality and security of their data throughout its lifecycle. Use these instructions to set up an automated notification mechanism for when sensitive queries are detected, as well as a quick dashboard on QuickSight to track the activities over time.


About the Authors

Rajdip Chaudhuri is a Senior Solutions Architect with Amazon Web Services specializing in data and analytics. He enjoys working with AWS customers and partners on data and analytics requirements. In his spare time, he enjoys soccer and movies.

Dhiraj Thakur is a Solutions Architect with Amazon Web Services. He works with AWS customers and partners to provide guidance on enterprise cloud adoption, migration, and strategy. He is passionate about technology and enjoys building and experimenting in the analytics and AI/ML space.
