Discover real-world use circumstances for Amazon CodeWhisperer powered by AWS Glue Studio notebooks

Many shoppers are inquisitive about boosting productiveness of their software program improvement lifecycle by utilizing generative AI. Lately, AWS introduced the final availability of Amazon CodeWhisperer, an AI coding companion that makes use of foundational fashions underneath the hood to enhance software program developer productiveness. With Amazon CodeWhisperer, you may rapidly settle for the highest suggestion, view extra options, or proceed writing your personal code. This integration reduces the general time spent in writing knowledge integration and extract, rework, and cargo (ETL) logic. It additionally helps beginner-level programmers write their first traces of code. AWS Glue Studio notebooks means that you can creator knowledge integration jobs with a web-based serverless pocket book interface.

On this submit, we talk about real-world use circumstances for CodeWhisperer powered by AWS Glue Studio notebooks.

Answer overview

For this submit, you utilize the CSV eSports Earnings dataset, obtainable to obtain by way of Kaggle. The information is scraped from eSportsEarnings.com, which gives info on earnings of eSports gamers and groups. The target is to carry out transformations utilizing an AWS Glue Studio pocket book with CodeWhisperer suggestions after which write the info again to Amazon Easy Storage Service (Amazon S3) in Parquet file format in addition to to Amazon Redshift.

Conditions

Our answer has the next stipulations:

Arrange AWS Glue Studio.

Configure an AWS Id and Entry Administration (IAM) position to work together with CodeWhisperer. Connect the next coverage to your IAM position that’s connected to the AWS Glue Studio pocket book:

{
    "Model": "2012-10-17",
    "Assertion": [{
        "Sid": "CodeWhispererPermissions",
        "Effect": "Allow",
        "Action": [
            "codewhisperer:GenerateRecommendations"
        ],
        "Useful resource": "*"
    }]
}

Obtain the CSV eSports Earnings dataset and add the CSV file highest_earning_players.csv to the S3 folder you’ll be utilizing on this use case.

Create an AWS Glue Studio pocket book

Let’s get began. Create a brand new AWS Glue Studio pocket book job by finishing the next steps:

On the AWS Glue console, select Notebooks underneath ETL jobs within the navigation pane.
Choose Jupyter Pocket book and select Create.
For Job title, enter CodeWhisperer-s3toJDBC.

A brand new pocket book shall be created with the pattern cells as proven within the following screenshot.

We use the second cell for now, so you may take away all the opposite cells.

Within the second cell, replace the interactive session configuration by setting the next:
1. Employee sort to G.1X
2. Variety of staff to three
3. AWS Glue model to 4.0

Furthermore, import the DynamicFrame module and current_timestamp perform as follows:

from pyspark.sql.capabilities import current_timestamp
from awsglue.dynamicframe import DynamicFrame

After you make these modifications, the pocket book ought to be wanting like the next screenshot.

Now, let’s guarantee CodeWhisperer is working as supposed. On the backside proper, you can find the CodeWhisperer possibility beside the Glue PySpark standing, as proven within the following screenshot.

You may select CodeWhisperer to view the choices to make use of Auto-Options.

Develop your code utilizing CodeWhisperer in an AWS Glue Studio pocket book

On this part, we present find out how to develop an AWS Glue pocket book job for Amazon S3 as a knowledge supply and JDBC knowledge sources as a goal. For our use case, we have to guarantee Auto-Options are enabled. Write your advice utilizing CodeWhisperer utilizing the next steps:

Write a remark in pure language (in English) to learn Parquet recordsdata out of your S3 bucket:

After you enter the previous remark and press Enter, the CodeWhisperer button on the finish of the web page will present that it’s operating to write down the advice. The output of the CodeWhisperer advice will seem within the subsequent line and the code is chosen after you press Tab. You may study extra in Person actions.

After you enter the previous remark, CodeWhisperer will generate a code snippet that’s much like the next:

df = (spark.learn.format("csv")
      .possibility("header", "true")
      .possibility("inferSchema", "true")
      .load("s3://<bucket>/<path>/highest_earning_players.csv"))

Word that you should replace the paths to match the S3 bucket you’re utilizing as a substitute of the CodeWhisperer-generated bucket.

From the previous code snippet, CodeWhisperer used Spark DataFrames to learn the CSV recordsdata.

Now you can strive some rephrasing to get a suggestion with DynamicFrame capabilities:

# Learn CSV file from S3 with the header format possibility utilizing DynamicFrame"

Now CodeWhisperer will generate a code snippet that’s near the next:

dyF = glueContext.create_dynamic_frame.from_options(
    connection_type="s3",
    connection_options={
        "paths": ["s3://<bucket>/<path>/highest_earning_players.csv"],
        "recurse": True,
    },
    format="csv",
    format_options={
        "withHeader": True,
    },
    transformation_ctx="dyF")

Rephrasing the sentences written now has proved that after some modifications to the feedback we wrote, we acquired the proper advice from CodeWhisperer.

Subsequent, use CodeWhisperer to print the schema of the previous AWS Glue DynamicFrame by utilizing the next remark:
```
# Print the schema of the above DynamicFrame
```

CodeWhisperer will generate a code snippet that’s near the next:

We get the next output.

Now we use CodeWhisperer to create some transformation capabilities that may manipulate the AWS Glue DynamicFrame learn earlier. We begin by getting into code in a brand new cell.

First, check if CodeWhisperer can use the proper AWS Glue context capabilities like ResolveChoice:
```
# Convert the "PlayerId" sort from string to integer
```

CodeWhisperer has really useful a code snippet much like the next:

dyF = dyF.resolveChoice(specs=[('PlayerId', 'cast:long')])
dyF.printSchema()

The previous code snippet doesn’t precisely characterize the remark that we entered.

You may apply sentence paraphrasing and simplifying by offering the next three feedback. Each has totally different ask and we use the withColumn Spark Body methodology, which is utilized in casting columns sorts:
```
# Convert the DynamicFrame to spark knowledge body
# Forged the 'PlayerId' column from string to Integer utilizing WithColumn perform
 # Convert the spark body again to DynamicFrame and print the schema
```

CodeWhisperer will choose up the previous instructions and suggest the next code snippet in sequence:

df = dyF.toDF()
df = df.withColumn("PlayerId", df["PlayerId"].forged("integer"))
dyF = DynamicFrame.fromDF(df, glueContext, "dyF")
dyF.printSchema()

The next output confirms the PlayerId column is modified from string to integer.