Empower your Jira data in a data lake with Amazon AppFlow and AWS Glue


In the world of software engineering and development, organizations use project management tools like Atlassian Jira Cloud. Managing projects with Jira leads to rich datasets, which can provide historical and predictive insights about project and development efforts.

Although Jira Cloud provides reporting capability, loading this data into a data lake will facilitate enrichment with other business data, as well as support the use of business intelligence (BI) tools and artificial intelligence (AI) and machine learning (ML) applications. Companies often take a data lake approach to their analytics, bringing data from many different systems into one place to simplify how the analytics are performed.

This post shows you how to use Amazon AppFlow and AWS Glue to create a fully automated data ingestion pipeline that will synchronize your Jira data into your data lake. Amazon AppFlow provides software as a service (SaaS) integration with Jira Cloud to load the data into your AWS account. AWS Glue is a serverless data discovery, load, and transformation service that will prepare data for consumption in BI and AI/ML activities. Additionally, this post strives to achieve a low-code and serverless solution for operational efficiency, and the solution supports incremental loading for cost optimization.

Solution overview

This solution uses Amazon AppFlow to retrieve data from the Jira Cloud. The data is synchronized to an Amazon Simple Storage Service (Amazon S3) bucket using an initial full download and subsequent incremental downloads of changes. When new data arrives in the S3 bucket, an AWS Step Functions workflow is triggered that orchestrates extract, transform, and load (ETL) activities using AWS Glue crawlers and AWS Glue DataBrew. The data is then available in the AWS Glue Data Catalog and can be queried by services such as Amazon Athena, Amazon QuickSight, and Amazon Redshift Spectrum. The solution is fully automated and serverless, resulting in low operational overhead. When this setup is complete, your Jira data will be automatically ingested and kept up to date in your data lake!
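To make the event-driven trigger concrete, the following is a minimal boto3 sketch of how an S3 bucket can be wired to a Step Functions state machine through Amazon EventBridge. The bucket name, state machine ARN, and role ARN are hypothetical placeholders, and the CloudFormation stack you deploy later provisions its own equivalent wiring, so treat this as an illustration rather than a required step.

import json
import boto3

s3 = boto3.client("s3")
events = boto3.client("events")

BUCKET = "my-jira-appflow-bucket"  # hypothetical bucket name
STATE_MACHINE_ARN = "arn:aws:states:us-east-1:123456789012:stateMachine:jira-etl"  # hypothetical
EVENTS_ROLE_ARN = "arn:aws:iam::123456789012:role/eventbridge-invoke-sfn"  # hypothetical

# The bucket must be configured to emit object-level events to EventBridge
s3.put_bucket_notification_configuration(
    Bucket=BUCKET,
    NotificationConfiguration={"EventBridgeConfiguration": {}},
)

# Rule: fire whenever a new object lands in the bucket
events.put_rule(
    Name="jira-s3-object-created",
    EventPattern=json.dumps({
        "source": ["aws.s3"],
        "detail-type": ["Object Created"],
        "detail": {"bucket": {"name": [BUCKET]}},
    }),
    State="ENABLED",
)

# Target: start the Step Functions ETL workflow
events.put_targets(
    Rule="jira-s3-object-created",
    Targets=[{"Id": "sfn", "Arn": STATE_MACHINE_ARN, "RoleArn": EVENTS_ROLE_ARN}],
)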

The following diagram illustrates the solution architecture.

The Jira AppFlow architecture is shown. The Jira Cloud data is retrieved by Amazon AppFlow and stored in Amazon S3. This triggers an Amazon EventBridge event that runs an AWS Step Functions workflow. The workflow uses AWS Glue to catalog and transform the data. The data is then queried with QuickSight.

The Step Functions workflow orchestrates the following ETL activities, resulting in two tables:

  • An AWS Glue crawler collects all downloads into a single AWS Glue table named jira_raw. This table comprises a mix of full and incremental downloads from Jira, with many versions of the same records representing changes over time.
  • A DataBrew job prepares the data for reporting by unpacking key-value pairs in the fields, as well as removing obsolete records as they are updated in subsequent change data captures. This reporting-ready data will be available in an AWS Glue table named jira_data.

The following figure shows the Step Functions workflow.

A diagram represents the AWS Step Functions workflow. It contains the steps to run an AWS Glue crawler, wait for its completion, and then run an AWS Glue DataBrew data transformation job.
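Expressed imperatively, the workflow's logic amounts to the following boto3 sketch. The crawler and DataBrew job names are hypothetical placeholders; the deployed state machine implements this sequence natively in Step Functions rather than in Python.

import time
import boto3

glue = boto3.client("glue")
databrew = boto3.client("databrew")

CRAWLER_NAME = "jira-raw-crawler"  # hypothetical; the stack defines the real name
DATABREW_JOB = "jira-data-prep"    # hypothetical

# Step 1: catalog the newly downloaded files into the jira_raw table
glue.start_crawler(Name=CRAWLER_NAME)

# Step 2: wait for the crawler to finish
while glue.get_crawler(Name=CRAWLER_NAME)["Crawler"]["State"] != "READY":
    time.sleep(30)

# Step 3: run the DataBrew job that produces the jira_data table
databrew.start_job_run(Name=DATABREW_JOB)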

Prerequisites

This solution requires the following:

  • A Jira Cloud account, with access to a project to download and to the Jira developer portal
  • An AWS account with permissions to deploy the CloudFormation stack and its resources

Configure the Jira instance

After logging in to your Jira Cloud instance, you establish a Jira project with associated epics and issues to download into a data lake. If you're starting with a new Jira instance, it helps to have at least one project with a sampling of epics and issues for the initial data download, because it allows you to create an initial dataset without errors or missing fields. Note that you may have multiple projects as well.

An image shows a Jira Cloud example, with several issues arranged in a Kanban board.

After you have established your Jira project and populated it with epics and issues, make sure you also have access to the Jira developer portal. In later steps, you use this developer portal to establish authentication and permissions for the Amazon AppFlow connection.

Provision resources with AWS CloudFormation

For the initial setup, you launch an AWS CloudFormation stack to create an S3 bucket to store data, IAM roles for data access, and the AWS Glue crawler and Data Catalog components. Complete the following steps:

  1. Sign in to your AWS account.
  2. Choose Launch Stack:
  3. For Stack name, enter a name for the stack (the default is aws-blog-jira-datalake-with-AppFlow).
  4. For GlueDatabaseName, enter a unique name for the Data Catalog database to hold the Jira data table metadata (the default is jiralake).
  5. For InitialRunFlag, choose Setup. This mode will scan all data and disable the change data capture (CDC) features of the stack. (Because this is the initial load, the stack needs an initial data load before you configure CDC in later steps.)
  6. Under Capabilities and transforms, select the acknowledgement check boxes to allow IAM resources to be created within your AWS account.
  7. Review the parameters and choose Create stack to deploy the CloudFormation stack. This process will take around 5–10 minutes to complete.
    An image depicts the AWS CloudFormation configuration steps, including setting a stack name, setting parameters to "jiralake" and "Setup" mode, and checking all IAM capabilities requested.
  8. After the stack is deployed, review the Outputs tab for the stack and collect the following values to use when you set up Amazon AppFlow:
    • Amazon AppFlow destination bucket (o01AppFlowBucket)
    • Amazon AppFlow destination bucket path (o02AppFlowPath)
    • Role for Amazon AppFlow Jira connector (o03AppFlowRole)
      An image demonstrating the AWS CloudFormation "Outputs" tab, highlighting the values to add to the Amazon AppFlow configuration.
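If you prefer to pull these outputs programmatically rather than copy them from the console, a short boto3 sketch like the following works, assuming the default stack name from step 3:

import boto3

cfn = boto3.client("cloudformation")
stack = cfn.describe_stacks(StackName="aws-blog-jira-datalake-with-AppFlow")["Stacks"][0]

# Map the output keys (o01AppFlowBucket, o02AppFlowPath, o03AppFlowRole) to their values
outputs = {o["OutputKey"]: o["OutputValue"] for o in stack["Outputs"]}
print(outputs["o01AppFlowBucket"], outputs["o02AppFlowPath"], outputs["o03AppFlowRole"])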

Configure Jira Cloud

Next, you configure your Jira Cloud instance for access by Amazon AppFlow. For full instructions, refer to Jira Cloud connector for Amazon AppFlow. The following steps summarize these instructions and discuss the specific configuration to enable OAuth in the Jira Cloud:

  1. Open the Jira developer portal.
  2. Create the OAuth 2 integration from the developer application console by choosing Create an OAuth 2.0 Integration. This will provide a login mechanism for AppFlow.
  3. Enable fine-grained permissions. See Recommended scopes for the permission settings to grant AppFlow appropriate access to your Jira instance.
  4. Add the following permission scopes to your OAuth app:
    1. manage:jira-configuration
    2. read:field-configuration:jira
  5. Under Authorization, set the Call Back URL to return to Amazon AppFlow with the URL https://us-east-1.console.aws.amazon.com/appflow/oauth.
  6. Under Settings, note the client ID and secret to use in later steps to set up authentication from Amazon AppFlow.

Create the Amazon AppFlow Jira Cloud connection

In this step, you configure Amazon AppFlow to run a one-time full data fetch of all your data, establishing the initial data lake:

  1. On the Amazon AppFlow console, choose Connectors in the navigation pane.
  2. Search for the Jira Cloud connector.
  3. Choose Create flow on the connector tile to create the connection to your Jira instance.
    An image of Amazon AppFlow, showing the search for the "Jira Cloud" connector.
  4. For Flow name, enter a name for the flow (for example, JiraLakeFlow).
  5. Leave the Data encryption setting as the default.
  6. Choose Next.
    The Amazon AppFlow Jira connector configuration, showing the Flow name set to "JiraLakeFlow" and clicking the "Next" button.
  7. For Source name, keep the default of Jira Cloud.
  8. Choose Create new connection under Jira Cloud connection.
  9. In the Connect to Jira Cloud section, enter the values for Client ID, Client secret, and Jira Cloud Site that you collected earlier. This provides the authentication from AppFlow to Jira Cloud.
  10. For Connection Name, enter a connection name (for example, JiraLakeCloudConnection).
  11. Choose Connect. You will be prompted to allow your OAuth app to access your Atlassian account to verify authentication.
    An image of the Amazon AppFlow configuration, reflecting the completion of the prior steps.
  12. In the Authorize App window that pops up, choose Accept.
  13. With the connection created, return to the Configure flow section on the Amazon AppFlow console.
  14. For API version, choose V2 to use the latest Jira query API.
  15. For Jira Cloud object, choose Issue to query and download all issues and associated details.
    An image of the Amazon AppFlow configuration, reflecting the completion of the prior steps.
  16. For Destination Name in the Destination Details section, choose Amazon S3.
  17. For Bucket details, choose the S3 bucket name that matches the Amazon AppFlow destination bucket value that you collected from the outputs of the CloudFormation stack.
  18. Enter the Amazon AppFlow destination bucket path to complete the full S3 path. This will send the Jira data to the S3 bucket created by the CloudFormation script.
  19. Leave Catalog your data in the AWS Glue Data Catalog unselected. The CloudFormation script uses an AWS Glue crawler to update the Data Catalog in a different manner, grouping all the downloads into a common table, so we disable the update here.
  20. For File format settings, select Parquet format and select Preserve source data types in Parquet output. Parquet is a columnar format that optimizes subsequent querying.
  21. Select Add a timestamp to the file name for Filename preference. This will help you easily find data files downloaded at a specific date and time.
    An image of the Amazon AppFlow configuration, reflecting the completion of the prior steps.
  22. For now, select Run on demand for the Flow trigger to run the full load flow manually. You will schedule downloads in a later step when implementing CDC.
  23. Choose Next.
    An image of the Amazon AppFlow Flow trigger configuration, reflecting the completion of the prior steps.
  24. On the Map data fields page, select Manually map fields.
  25. For Source to destination field mapping, choose the drop-down box under Source field name and select Map all fields directly. This will bring down all fields as they are received, because we will instead implement data preparation in later steps.
    An image of the Amazon AppFlow configuration, reflecting the completion of steps 24 and 25.
  26. Under Partition and aggregation settings, you can set up the partitions in a way that works for your use case. For this example, we use a daily partition, so select Date and time and choose Daily.
  27. For Aggregation settings, leave it as the default of Don't aggregate.
  28. Choose Next.
    An image of the Amazon AppFlow configuration, reflecting the completion of steps 26–28.
  29. On the Add filters page, you can create filters to only download specific data. For this example, you download all the data, so choose Next.
  30. Review and choose Create flow.
  31. When the flow is created, choose Run flow to start the initial data seeding. After some time, you should receive a banner indicating the run finished successfully.
    An image of the Amazon AppFlow configuration, reflecting the completion of step 31.
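The on-demand run can also be scripted. The following is a minimal boto3 sketch, assuming the flow name JiraLakeFlow from step 4:

import boto3

appflow = boto3.client("appflow")

# Kick off the on-demand full load (flow name from step 4)
appflow.start_flow(flowName="JiraLakeFlow")

# Check the status of recent executions
records = appflow.describe_flow_execution_records(flowName="JiraLakeFlow")
for run in records["flowExecutions"]:
    print(run["executionId"], run["executionStatus"])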

Review seed data

At this stage in the process, you now have data in your S3 environment. When new data files are created in the S3 bucket, an AWS Glue crawler automatically runs to catalog the new data. You can see if it's complete by reviewing the Step Functions state machine for a Succeeded run status. There is a link to the state machine on the CloudFormation stack's Resources tab, which will redirect you to the Step Functions state machine.

An image showing the CloudFormation Resources tab of the stack, with a link to the AWS Step Functions workflow.
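You can also check the most recent runs without the console. The following sketch assumes you have copied the state machine ARN from the Resources tab (a placeholder ARN is shown):

import boto3

sfn = boto3.client("stepfunctions")

# ARN copied from the CloudFormation stack's Resources tab (placeholder shown)
STATE_MACHINE_ARN = "arn:aws:states:us-east-1:123456789012:stateMachine:jira-etl"

# List the most recent executions and their statuses
for ex in sfn.list_executions(stateMachineArn=STATE_MACHINE_ARN, maxResults=5)["executions"]:
    print(ex["name"], ex["status"])  # look for SUCCEEDED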

When the state machine is complete, it's time to review the raw Jira data with Athena. The database is as you specified in the CloudFormation stack (jiralake by default), and the table name is jira_raw. If you kept the default AWS Glue database name of jiralake, the Athena SQL is as follows:

SELECT * FROM "jiralake"."jira_raw" limit 10;
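If you want to run this query outside the Athena console, the following boto3 sketch does so; the query results location is a hypothetical bucket you would replace with your own:

import time
import boto3

athena = boto3.client("athena")

# Start the query; the output location is a hypothetical results bucket
qid = athena.start_query_execution(
    QueryString='SELECT * FROM "jiralake"."jira_raw" limit 10;',
    QueryExecutionContext={"Database": "jiralake"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results-bucket/"},
)["QueryExecutionId"]

# Poll until the query finishes
while True:
    state = athena.get_query_execution(QueryExecutionId=qid)["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(2)

print(state)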

When you explore the data, you will notice that most of the data you would want to work with is actually packed into a column called fields. This means the data is not available as columns in your Athena queries, making it harder to select, filter, and sort individual fields within an Athena SQL query. This will be addressed in the next steps.

An image demonstrating the Amazon Athena query SELECT * FROM "jiralake"."jira_raw" limit 10;

Set up CDC and unpack the fields column

To add the ongoing CDC and reformat the data for analytics, we introduce a DataBrew job to transform the data and filter to the most recent version of each record as changes come in. You can do this by updating the CloudFormation stack with a flag that includes the CDC and data transformation steps (a scripted equivalent of the stack update appears after these steps).

  1. On the AWS CloudFormation console, return to the stack.
  2. Choose Update.
  3. Select Use current template and choose Next.
    An image showing AWS CloudFormation, with steps 1–3 complete.
  4. For SetupOrCDC, choose CDC, then choose Next. This will enable both the CDC steps and the data transformation steps for the Jira data.
    An image showing AWS CloudFormation, with step 4 complete.
  5. Continue choosing Next until you reach the Review section.
  6. Select I acknowledge that AWS CloudFormation might create IAM resources, then choose Submit.
    An image showing AWS CloudFormation, with steps 5–6 complete.
  7. Return to the Amazon AppFlow console and open your flow.
  8. On the Actions menu, choose Edit flow. We will now edit the flow trigger to run an incremental load on a periodic basis.
  9. Select Run flow on schedule.
  10. Configure the desired repeats, as well as start time and date. For this example, we choose Daily for Repeats and enter 1 for the number of days between flow triggers. For Starting at, enter 01:00.
  11. Select Incremental transfer for Transfer mode.
  12. Choose Updated on the drop-down menu so that changes will be captured based on when the records were updated.
  13. Choose Save. With these settings in our example, the run will happen nightly at 1:00 AM.
    An image showing the Flow trigger, with incremental transfer selected.
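For reference, steps 1–6 can be scripted roughly as follows. The parameter name SetupOrCDC is taken from the console label above, and the remaining parameters reuse their previous values:

import boto3

cfn = boto3.client("cloudformation")

# Re-deploy the same template with the mode flag switched to CDC
cfn.update_stack(
    StackName="aws-blog-jira-datalake-with-AppFlow",
    UsePreviousTemplate=True,
    Parameters=[
        {"ParameterKey": "GlueDatabaseName", "UsePreviousValue": True},
        {"ParameterKey": "SetupOrCDC", "ParameterValue": "CDC"},  # mode flag per the console label
    ],
    Capabilities=["CAPABILITY_IAM", "CAPABILITY_NAMED_IAM"],
)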

Review the analytics data

When the next incremental load occurs that results in new data, the Step Functions workflow will start the DataBrew job and populate a new staged analytical data table named jira_data in your Data Catalog database. If you don't want to wait, you can trigger the Step Functions workflow manually.
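A manual trigger is a single boto3 call, again assuming a placeholder state machine ARN:

import boto3

sfn = boto3.client("stepfunctions")

# Manually kick off the ETL workflow instead of waiting for the next scheduled load
sfn.start_execution(
    stateMachineArn="arn:aws:states:us-east-1:123456789012:stateMachine:jira-etl"  # placeholder ARN
)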

The DataBrew job performs data transformation and filtering tasks. The job unpacks the key-value pairs from the Jira JSON in the raw Jira data, resulting in a tabular data schema that facilitates use with BI and AI/ML tools. As Jira items are changed, the changed item's data is resent, resulting in multiple versions of an item in the raw data feed. The DataBrew job filters the raw data feed so that the resulting data table only contains the most recent version of each item. You could enhance this DataBrew job to further customize the data for your needs, such as renaming the generic Jira custom field names to reflect their business meaning.
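Conceptually, this "latest version wins" filter behaves like the following plain-Python sketch; the id and updated field names are illustrative stand-ins for the actual Jira issue key and update timestamp columns:

# Keep only the most recent version of each record, illustrated with sample rows.
# "id" and "updated" are illustrative stand-ins for the real column names.
rows = [
    {"id": "PROJ-1", "updated": "2023-01-05T10:00:00Z", "status": "In Progress"},
    {"id": "PROJ-1", "updated": "2023-01-07T09:30:00Z", "status": "Done"},
    {"id": "PROJ-2", "updated": "2023-01-06T14:15:00Z", "status": "To Do"},
]

latest = {}
for row in rows:  # later updates replace earlier versions of the same issue
    if row["id"] not in latest or row["updated"] > latest[row["id"]]["updated"]:
        latest[row["id"]] = row

print(list(latest.values()))  # PROJ-1 keeps only its 2023-01-07 version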

When the Step Functions workflow is complete, we can query the data in Athena again using the following query:

SELECT * FROM "jiralake"."jira_data" limit 10;

You can see that in our transformed jira_data table, the nested JSON fields are broken out into their own columns for each field. You will also notice that we have filtered out obsolete records that have been superseded by newer record updates in later data loads, so the data is fresh. If you want to rename custom fields, remove columns, or restructure what comes out of the nested JSON, you can modify the DataBrew recipe to accomplish this. At this point, the data is ready to be used by your analytics tools, such as Amazon QuickSight.

An image demonstrating the Amazon Athena query SELECT * FROM "jiralake"."jira_data" limit 10;

Clean up

If you would like to discontinue this solution, you can remove it with the following steps:

  1. On the Amazon AppFlow console, deactivate the flow for Jira, and optionally delete it.
  2. On the Amazon S3 console, select the S3 bucket for the stack, and empty the bucket to delete the existing data.
  3. On the AWS CloudFormation console, delete the CloudFormation stack that you deployed.

Conclusion

In this post, we created a serverless incremental data load process for Jira that will synchronize data while handling custom fields using Amazon AppFlow, AWS Glue, and Step Functions. The approach uses Amazon AppFlow to incrementally load the data into Amazon S3. We then use AWS Glue and Step Functions to manage the extraction of the Jira custom fields and load them in a format to be queried by analytics services such as Athena, QuickSight, or Redshift Spectrum, or AI/ML services like Amazon SageMaker.

To learn more about AWS Glue and DataBrew, refer to Getting started with AWS Glue DataBrew. With DataBrew, you can take the sample data transformation in this project and customize the output to meet your specific needs. This could include renaming columns, creating additional fields, and more.

To learn more about Amazon AppFlow, refer to Getting started with Amazon AppFlow. Note that Amazon AppFlow supports integrations with many SaaS applications in addition to Jira Cloud.

To learn more about orchestrating flows with Step Functions, see Create a Serverless Workflow with AWS Step Functions and AWS Lambda. The workflow could be enhanced to load the data into a data warehouse, such as Amazon Redshift, or trigger a refresh of a QuickSight dataset for analytics and reporting.

In future posts, we will cover how to unnest parent-child relationships within the Jira data using Athena and how to visualize the data using QuickSight.


About the Authors

Tom Romano is a Sr. Solutions Architect for AWS World Wide Public Sector from Tampa, FL, and assists GovTech and EdTech customers as they create new solutions that are cloud native, event driven, and serverless. He is an enthusiastic Python programmer for both application development and data analytics, and is an Analytics Specialist. In his free time, Tom flies remote control model airplanes and enjoys vacationing with his family around Florida and the Caribbean.

Shane Thompson is a Sr. Solutions Architect based out of San Luis Obispo, California, working with AWS Startups. He works with customers who use AI/ML in their business model and is passionate about democratizing AI/ML so that all customers can benefit from it. In his free time, Shane likes to spend time with his family and travel around the world.
