Data is crucial for companies to make informed decisions, improve operations, and innovate. Integrating data from different sources can be a complex and time-consuming process. AWS offers AWS Glue to help you integrate your data from multiple sources on serverless infrastructure for analysis, machine learning (ML), and application development. AWS Glue provides different authoring experiences for you to build data integration jobs. One of the most common options is the notebook. Data scientists tend to run queries interactively and retrieve results immediately to author data integration jobs. This interactive experience can accelerate building data integration pipelines.
Recently, AWS announced general availability of Amazon CodeWhisperer. Amazon CodeWhisperer is an AI coding companion that uses foundational models under the hood to improve developer productivity. It works by generating code suggestions in real time based on developers’ comments in natural language and prior code in their integrated development environment (IDE). AWS also announced the Amazon CodeWhisperer Jupyter extension to help Jupyter users by generating real-time, single-line, or full-function code suggestions for Python notebooks on Jupyter Lab and Amazon SageMaker Studio.
Today, we’re excited to announce that AWS Glue Studio notebooks now support Amazon CodeWhisperer for AWS Glue users to improve your experience and help boost development productivity. Now, in your Glue Studio notebook, you can write a comment in natural language (in English) that outlines a specific task, such as “Create a Spark DataFrame from a json file.” Based on this information, CodeWhisperer recommends one or more code snippets directly in the notebook that can accomplish the task. You can quickly accept the top suggestion, view more suggestions, or continue writing your own code.

This post demonstrates how the user experience on the AWS Glue Studio notebook has changed with the Amazon CodeWhisperer integration.
Prerequisites
Before going forward with this tutorial, you need to complete the following prerequisites:
- Set up AWS Glue Studio.
- Configure an AWS Identity and Access Management (IAM) role to interact with Amazon CodeWhisperer. Attach the following policy to your IAM role for the AWS Glue Studio notebook:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "CodeWhispererPermissions",
            "Effect": "Allow",
            "Action": [
                "codewhisperer:GenerateRecommendations"
            ],
            "Resource": "*"
        }
    ]
}
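As a minimal sketch, the same policy can be assembled and serialized in Python before you attach it to the role. The variable names `codewhisperer_policy` and `policy_json` are illustrative assumptions, and the sketch only builds the JSON document; it does not call any AWS API.

```python
import json

# Policy document from the prerequisite above, expressed as a Python dict.
codewhisperer_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "CodeWhispererPermissions",
            "Effect": "Allow",
            "Action": ["codewhisperer:GenerateRecommendations"],
            "Resource": "*",
        }
    ],
}

# Serialize the dict to JSON text suitable for pasting into a policy editor.
policy_json = json.dumps(codewhisperer_policy, indent=2)
print(policy_json)
```

Building the document as a dict and serializing it with `json.dumps` avoids hand-typed key names and quoting mistakes.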
Getting Started
Let’s get started. Create a new AWS Glue Studio notebook job by completing the following steps:
- On the AWS Glue console, choose Notebooks under ETL jobs in the navigation pane.
- Select Jupyter Notebook and choose Create.
- For Job name, enter codewhisperer-demo.
- For IAM Role, choose the IAM role that you configured as a prerequisite.
- Choose Start notebook.
A new notebook is created with sample cells.

At the bottom, there’s a menu named CodeWhisperer. By choosing this menu, you can see the shortcuts and several options, including disabling auto-suggestions.

Let’s try your first recommendation by Amazon CodeWhisperer. Note that this post contains examples of recommendations, but you may see different code snippets recommended by Amazon CodeWhisperer.
Add a new cell and enter your comment to describe what you want to achieve. After you press Enter, the recommended code is shown.

If you press Tab, the recommended code is accepted. If you press the arrow keys, you can select other recommendations. You can learn more in User actions.
Now let’s read a JSON file from Amazon Simple Storage Service (Amazon S3). Enter the following code comment into a notebook cell and press Enter:
# Create a Spark DataFrame from a json file
CodeWhisperer will recommend a code snippet similar to the following:
def create_spark_df_from_json(spark, file_path):
    return spark.read.json(file_path)
Now call this method to use the suggested code snippet:
df = create_spark_df_from_json(spark, "s3://awsglue-datasets/examples/us-legislators/all/persons.json")
df.show()
The preceding code returns the following output:
+----------+--------------------+----------+-----------+------+----------+--------------------+--------------------+--------------------+--------------------+--------------------+------------------+--------------------+----------------+
|birth_date| contact_details|death_date|family_name|gender|given_name| id| identifiers| image| images| links| name| other_names| sort_name|
+----------+--------------------+----------+-----------+------+----------+--------------------+--------------------+--------------------+--------------------+--------------------+------------------+--------------------+----------------+
|1944-10-15| null| null| Collins| male| Michael|0005af3a-9471-4d1...|[{C000640, biogui...|https://theunited...|[{https://theunit...|[{Wikipedia (de),...| Mac Collins|[{bar, Mac Collin...|Collins, Michael|
|1969-01-31|[{fax, 202-226-07...| null| Huizenga| male| Bill|00aa2dc0-bfb6-441...|[{Bill Huizenga, ...|https://theunited...|[{https://theunit...|[{Wikipedia (de),...| Bill Huizenga|[{da, Bill Huizen...| Huizenga, Bill|
|1959-09-28|[{phone, 202-225-...| null| Clawson| male| Curtis|00aca284-9323-495...|[{C001102, biogui...|https://theunited...|[{https://theunit...|[{Wikipedia (comm...| Curt Clawson|[{bar, Curt Claws...| Clawson, Curtis|
|1930-08-14| null|2001-10-26| Solomon| male| Gerald|00b73df5-4180-441...|[{S000675, biogui...|https://theunited...|[{https://theunit...|[{Wikipedia (de),...| Gerald Solomon|[{null, Gerald B....| Solomon, Gerald|
|1960-05-28|[{fax, 202-225-42...| null| Rigell| male| Edward|00bee44f-db04-4a7...|[{R000589, biogui...|https://theunited...|[{https://theunit...|[{Wikipedia (de),...| E. Scott Rigell|[{null, Scott Rig...| Rigell, Edward|
|1951-05-20|[{twitter, MikeCr...| null| Crapo| male| Michael|00f8f12d-6e27-4a2...|[{Mike Crapo, bal...|https://theunited...|[{https://theunit...|[{Wikipedia (da),...| Mike Crapo|[{da, Mike Crapo,...| Crapo, Michael|
|1926-05-12| null| null| Hutto| male| Earl|015d77c8-6edb-4ed...|[{H001018, biogui...|https://theunited...|[{https://theunit...|[{Wikipedia (de),...| Earl Hutto|[{null, Earl Dewi...| Hutto, Earl|
|1937-11-07| null|2015-11-19| Ertel| male| Allen|01679bc3-da21-482...|[{E000208, biogui...|https://theunited...|[{https://theunit...|[{Wikipedia (de),...| Allen Ertel|[{null, Allen E. ...| Ertel, Allen|
|1916-09-01| null|2007-11-24| Minish| male| Joseph|018247d0-2961-423...|[{M000796, biogui...|https://theunited...|[{https://theunit...|[{Wikipedia (de),...| Joseph Minish|[{bar, Joseph Min...| Minish, Joseph|
|1957-08-04|[{phone, 202-225-...| null| Andrews| male| Robert|01b100ac-192e-4b5...|[{A000210, biogui...|https://theunited...|[{https://theunit...|[{Wikipedia (de),...| Robert E. Andrews|[{null, Rob Andre...| Andrews, Robert|
|1957-01-10|[{fax, 202-225-57...| null| Walden| male| Greg|01bc21bf-8939-487...|[{Greg Walden, ba...|https://theunited...|[{https://theunit...|[{Wikipedia (comm...| Greg Walden|[{bar, Greg Walde...| Walden, Greg|
|1919-01-17| null|1987-11-29| Kazen| male| Abraham|02059c1e-0bdf-481...|[{K000025, biogui...|https://theunited...|[{https://theunit...|[{Wikipedia (de),...|Abraham Kazen, Jr.|[{null, Abraham K...| Kazen, Abraham|
|1960-01-11|[{fax, 202-225-67...| null| Turner| male| Michael|020aa7dd-54ef-435...|[{Michael R. Turn...|https://theunited...|[{https://theunit...|[{Wikipedia (comm...| Michael R. Turner|[{null, Mike Turn...| Turner, Michael|
|1942-06-28| null| null| Kolbe| male| James|02141651-eca2-4aa...|[{K000306, biogui...|https://theunited...|[{https://theunit...|[{Wikipedia (de),...| Jim Kolbe|[{ca, Jim Kolbe, ...| Kolbe, James|
|1941-03-08|[{fax, 202-225-79...| null| Lowenthal| male| Alan|0231c6ef-6e92-49b...|[{Alan Lowenthal,...|https://theunited...|[{https://theunit...|[{Wikipedia (de),...| Alan S. Lowenthal|[{null, Alan Lowe...| Lowenthal, Alan|
|1952-01-09|[{fax, 202-225-93...| null| Capuano| male| Michael|0239032f-be5c-4af...|[{Michael Capuano...|https://theunited...|[{https://theunit...|[{Wikipedia (de),...|Michael E. Capuano|[{null, Mike Capu...|Capuano, Michael|
|1951-10-19|[{fax, 202-225-56...| null| Schrader| male| Kurt|0263f619-eff8-4e1...|[{Kurt Schrader, ...|https://theunited...|[{https://theunit...|[{Wikipedia (de),...| Kurt Schrader|[{bar, Kurt Schra...| Schrader, Kurt|
|1947-06-13|[{fax, 202-225-69...| null| Nadler| male| Jerrold|029e793d-ec40-4a1...|[{N000002, biogui...|https://theunited...|[{https://theunit...|[{Wikipedia (de),...| Jerrold Nadler|[{ca, Jerrold Nad...| Nadler, Jerrold|
|1970-02-03|[{fax, 202-225-82...| null| Graves| male| Tom|02b621fc-0523-449...|[{Tom Graves, bal...|https://theunited...|[{https://theunit...|[{Wikipedia (de),...| Tom Graves|[{bar, Tom Graves...| Graves, Tom|
|1932-05-09| null| null| McMillan| male| John|03018f7c-f866-419...|[{M000566, biogui...|https://theunited...|[{https://theunit...|[{Wikipedia (de),...| John McMillan|[{null, Alex McMi...| McMillan, John|
+----------+--------------------+----------+-----------+------+----------+--------------------+--------------------+--------------------+--------------------+--------------------+------------------+--------------------+----------------+
only showing top 20 rows
As you can see from the result, you can quickly utilize the code snippet recommended by Amazon CodeWhisperer.
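By default, `spark.read.json` treats each line of the input as one JSON object (the JSON Lines layout). The sketch below mimics that layout in plain Python, without Spark, to show what a line-oriented reader expects. The field values are copied from the first two rows of the output above; the variable names `json_lines` and `records` are assumptions for illustration.

```python
import json

# Two records in JSON Lines form: one JSON object per line.
# Values are taken from the first two rows of the DataFrame output above.
json_lines = (
    '{"family_name": "Collins", "given_name": "Michael", "birth_date": "1944-10-15"}\n'
    '{"family_name": "Huizenga", "given_name": "Bill", "birth_date": "1969-01-31"}\n'
)

# Parse each non-empty line independently, as a line-oriented reader would.
records = [json.loads(line) for line in json_lines.splitlines() if line.strip()]

for record in records:
    print(record["family_name"], record["birth_date"])
```

If your own file holds a single JSON document spread over many lines, this per-line parsing would fail, which is why Spark offers a separate `multiLine` read option for that case.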
Examples of code recommendations
In this section, we provide additional examples of code recommendations. Note that these are just our examples, and different code snippets may be suggested by Amazon CodeWhisperer.
Add a column with a calculation
In extract, transform, and load (ETL) use cases, it’s common to add new columns from existing columns. When we need to add columns to our Spark DataFrame, we can articulate with a high level of detail to Amazon CodeWhisperer what type of column we need added and its respective attributes:
# Add age column to a given person DataFrame
# age is calculated from current date and birth_date. When death_date is not null, then use death_date to calculate age
Amazon CodeWhisperer will recommend a code snippet similar to the following:
# Imports required to run the snippet in a notebook cell
from pyspark.sql.functions import current_timestamp, datediff, when

def add_age_column(df):
    # Use current date to calculate age
    current_date = current_timestamp()
    # Use birth_date column to calculate age
    df = df.withColumn("age", datediff(current_date, df.birth_date) / 365)
    # Use death_date column to calculate age
    df = df.withColumn("age", when(df.death_date.isNull(), df.age).otherwise(datediff(df.death_date, df.birth_date) / 365))
    return df
It can be utilized as follows:
df = add_age_column(df)
df.show()
The preceding code returns the following output:
+----------+--------------------+----------+-----------+------+----------+--------------------+--------------------+--------------------+--------------------+--------------------+------------------+--------------------+----------------+------------------+--------------------+
|birth_date| contact_details|death_date|family_name|gender|given_name| id| identifiers| image| images| links| name| other_names| sort_name| age| current_date|
+----------+--------------------+----------+-----------+------+----------+--------------------+--------------------+--------------------+--------------------+--------------------+------------------+--------------------+----------------+------------------+--------------------+
|1944-10-15| null| null| Collins| male| Michael|0005af3a-9471-4d1...|[{C000640, biogui...|https://theunited...|[{https://theunit...|[{Wikipedia (de),...| Mac Collins|[{bar, Mac Collin...|Collins, Michael| 78.71506849315068|2023-06-14 06:12:...|
|1969-01-31|[{fax, 202-226-07...| null| Huizenga| male| Bill|00aa2dc0-bfb6-441...|[{Bill Huizenga, ...|https://theunited...|[{https://theunit...|[{Wikipedia (de),...| Bill Huizenga|[{da, Bill Huizen...| Huizenga, Bill| 54.4027397260274|2023-06-14 06:12:...|
|1959-09-28|[{phone, 202-225-...| null| Clawson| male| Curtis|00aca284-9323-495...|[{C001102, biogui...|https://theunited...|[{https://theunit...|[{Wikipedia (comm...| Curt Clawson|[{bar, Curt Claws...| Clawson, Curtis| 63.75342465753425|2023-06-14 06:12:...|
|1930-08-14| null|2001-10-26| Solomon| male| Gerald|00b73df5-4180-441...|[{S000675, biogui...|https://theunited...|[{https://theunit...|[{Wikipedia (de),...| Gerald Solomon|[{null, Gerald B....| Solomon, Gerald| 71.24931506849315|2023-06-14 06:12:...|
|1960-05-28|[{fax, 202-225-42...| null| Rigell| male| Edward|00bee44f-db04-4a7...|[{R000589, biogui...|https://theunited...|[{https://theunit...|[{Wikipedia (de),...| E. Scott Rigell|[{null, Scott Rig...| Rigell, Edward|63.087671232876716|2023-06-14 06:12:...|
|1951-05-20|[{twitter, MikeCr...| null| Crapo| male| Michael|00f8f12d-6e27-4a2...|[{Mike Crapo, bal...|https://theunited...|[{https://theunit...|[{Wikipedia (da),...| Mike Crapo|[{da, Mike Crapo,...| Crapo, Michael| 72.11780821917809|2023-06-14 06:12:...|
|1926-05-12| null| null| Hutto| male| Earl|015d77c8-6edb-4ed...|[{H001018, biogui...|https://theunited...|[{https://theunit...|[{Wikipedia (de),...| Earl Hutto|[{null, Earl Dewi...| Hutto, Earl| 97.15616438356165|2023-06-14 06:12:...|
|1937-11-07| null|2015-11-19| Ertel| male| Allen|01679bc3-da21-482...|[{E000208, biogui...|https://theunited...|[{https://theunit...|[{Wikipedia (de),...| Allen Ertel|[{null, Allen E. ...| Ertel, Allen| 78.08493150684932|2023-06-14 06:12:...|
|1916-09-01| null|2007-11-24| Minish| male| Joseph|018247d0-2961-423...|[{M000796, biogui...|https://theunited...|[{https://theunit...|[{Wikipedia (de),...| Joseph Minish|[{bar, Joseph Min...| Minish, Joseph| 91.2904109589041|2023-06-14 06:12:...|
|1957-08-04|[{phone, 202-225-...| null| Andrews| male| Robert|01b100ac-192e-4b5...|[{A000210, biogui...|https://theunited...|[{https://theunit...|[{Wikipedia (de),...| Robert E. Andrews|[{null, Rob Andre...| Andrews, Robert| 65.9041095890411|2023-06-14 06:12:...|
|1957-01-10|[{fax, 202-225-57...| null| Walden| male| Greg|01bc21bf-8939-487...|[{Greg Walden, ba...|https://theunited...|[{https://theunit...|[{Wikipedia (comm...| Greg Walden|[{bar, Greg Walde...| Walden, Greg| 66.46849315068494|2023-06-14 06:12:...|
|1919-01-17| null|1987-11-29| Kazen| male| Abraham|02059c1e-0bdf-481...|[{K000025, biogui...|https://theunited...|[{https://theunit...|[{Wikipedia (de),...|Abraham Kazen, Jr.|[{null, Abraham K...| Kazen, Abraham| 68.91232876712328|2023-06-14 06:12:...|
|1960-01-11|[{fax, 202-225-67...| null| Turner| male| Michael|020aa7dd-54ef-435...|[{Michael R. Turn...|https://theunited...|[{https://theunit...|[{Wikipedia (comm...| Michael R. Turner|[{null, Mike Turn...| Turner, Michael|63.465753424657535|2023-06-14 06:12:...|
|1942-06-28| null| null| Kolbe| male| James|02141651-eca2-4aa...|[{K000306, biogui...|https://theunited...|[{https://theunit...|[{Wikipedia (de),...| Jim Kolbe|[{ca, Jim Kolbe, ...| Kolbe, James| 81.01643835616439|2023-06-14 06:12:...|
|1941-03-08|[{fax, 202-225-79...| null| Lowenthal| male| Alan|0231c6ef-6e92-49b...|[{Alan Lowenthal,...|https://theunited...|[{https://theunit...|[{Wikipedia (de),...| Alan S. Lowenthal|[{null, Alan Lowe...| Lowenthal, Alan| 82.32328767123288|2023-06-14 06:12:...|
|1952-01-09|[{fax, 202-225-93...| null| Capuano| male| Michael|0239032f-be5c-4af...|[{Michael Capuano...|https://theunited...|[{https://theunit...|[{Wikipedia (de),...|Michael E. Capuano|[{null, Mike Capu...|Capuano, Michael| 71.47671232876712|2023-06-14 06:12:...|
|1951-10-19|[{fax, 202-225-56...| null| Schrader| male| Kurt|0263f619-eff8-4e1...|[{Kurt Schrader, ...|https://theunited...|[{https://theunit...|[{Wikipedia (de),...| Kurt Schrader|[{bar, Kurt Schra...| Schrader, Kurt| 71.7013698630137|2023-06-14 06:12:...|
|1947-06-13|[{fax, 202-225-69...| null| Nadler| male| Jerrold|029e793d-ec40-4a1...|[{N000002, biogui...|https://theunited...|[{https://theunit...|[{Wikipedia (de),...| Jerrold Nadler|[{ca, Jerrold Nad...| Nadler, Jerrold| 76.05479452054794|2023-06-14 06:12:...|
|1970-02-03|[{fax, 202-225-82...| null| Graves| male| Tom|02b621fc-0523-449...|[{Tom Graves, bal...|https://theunited...|[{https://theunit...|[{Wikipedia (de),...| Tom Graves|[{bar, Tom Graves...| Graves, Tom|53.394520547945206|2023-06-14 06:12:...|
|1932-05-09| null| null| McMillan| male| John|03018f7c-f866-419...|[{M000566, biogui...|https://theunited...|[{https://theunit...|[{Wikipedia (de),...| John McMillan|[{null, Alex McMi...| McMillan, John| 91.15890410958905|2023-06-14 06:12:...|
+----------+--------------------+----------+-----------+------+----------+--------------------+--------------------+--------------------+--------------------+--------------------+------------------+--------------------+----------------+------------------+--------------------+
only showing top 20 rows
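The age values can be checked by hand with plain Python. The sketch below repeats the same arithmetic (days between two dates, divided by 365) for the Gerald Solomon row, whose `birth_date` and `death_date` both appear in the output above. The helper name `approximate_age` is an assumption for illustration and is not part of the recommended code.

```python
from datetime import date


def approximate_age(birth_date, end_date):
    """Return the age in years as (days between the two dates) / 365."""
    return (end_date - birth_date).days / 365


# Gerald Solomon's row: birth_date 1930-08-14, death_date 2001-10-26.
age = approximate_age(date(1930, 8, 14), date(2001, 10, 26))
print(age)
```

The result should agree with the `age` column for that row. Because the divisor is a fixed 365, leap days are not accounted for, so the value slightly overstates the age in calendar years.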
Sort and extract records
You can use Amazon CodeWhisperer for sorting data and extracting records within a Spark DataFrame as well:
# Show top 5 oldest persons from DataFrame
# Use age column
Amazon CodeWhisperer will recommend a code snippet similar to the following:
# Import required to run the snippet in a notebook cell
from pyspark.sql.functions import desc

def get_oldest_person(df):
    return df.orderBy(desc("age")).limit(5)
It can be utilized as follows:
get_oldest_person(df).show()
The preceding code returns the following output:
+----------+---------------+----------+-----------+------+----------+--------------------+--------------------+--------------------+--------------------+--------------------+---------------+--------------------+---------------+------------------+--------------------+
|birth_date|contact_details|death_date|family_name|gender|given_name| id| identifiers| image| images| links| name| other_names| sort_name| age| current_date|
+----------+---------------+----------+-----------+------+----------+--------------------+--------------------+--------------------+--------------------+--------------------+---------------+--------------------+---------------+------------------+--------------------+
|1919-08-22| null| null| Winn| male| Edward|942d20ed-d838-436...|[{W000636, biogui...|https://theunited...|[{https://theunit...|[{Wikipedia (de),...|Larry Winn, Jr.|[{null, Larry Win...| Winn, Edward|103.88219178082191|2023-06-14 06:13:...|
|1920-03-23| null| null| Smith| male| Neal|84a9cbe4-651b-46d...|[{S000596, biogui...|https://theunited...|[{https://theunit...|[{Wikipedia (de),...| Neal Smith|[{null, Neal Edwa...| Smith, Neal| 103.2958904109589|2023-06-14 06:13:...|
|1920-09-17| null| null| Holt|female| Marjorie|8bfb671a-3147-4bc...|[{H000747, biogui...|https://theunited...|[{https://theunit...|[{Wikipedia (de),...| Marjorie Holt|[{bar, Marjorie H...| Holt, Marjorie| 102.8082191780822|2023-06-14 06:13:...|
|1921-03-05| null| null| Bedell| male| Berkley|896f0ce3-afe4-4ea...|[{B000298, biogui...|https://theunited...|[{https://theunit...|[{Wikipedia (de),...| Berkley Bedell|[{ca, Berkley Bed...|Bedell, Berkley|102.34520547945205|2023-06-14 06:13:...|
|1921-06-23| null| null| Findley| male| Paul|2811f793-1108-4fb...|[{F000123, biogui...|https://theunited...|[{https://theunit...|[{Wikipedia (azb)...| Paul Findley|[{azb, پاول فایند...| Findley, Paul|102.04383561643836|2023-06-14 06:13:...|
+----------+---------------+----------+-----------+------+----------+--------------------+--------------------+--------------------+--------------------+--------------------+---------------+--------------------+---------------+------------------+--------------------+
Generate sample datasets in a Spark DataFrame
Amazon CodeWhisperer is powerful enough to generate sample Spark DataFrames as well, which can be done like so:
# Generate sample Spark DataFrame of country name and country code
# First column name is country_name, and second column name is country_code
Amazon CodeWhisperer will recommend a code snippet similar to the following:
def get_country_code_df(spark):
    return spark.createDataFrame(
        [("United States", "US"), ("United Kingdom", "UK"), ("Canada", "CA")],
        ["country_name", "country_code"]
    )
It can be utilized as follows:
df = get_country_code_df(spark)
df.show()
The preceding code returns the following output:
+--------------+------------+
| country_name|country_code|
+--------------+------------+
| United States| US|
|United Kingdom| UK|
| Canada| CA|
+--------------+------------+
Generate transformations in SQL
We can also use Amazon CodeWhisperer to create a code snippet for a transformation in SQL and create a new table from the SQL query results (CTAS) like so:
# Generate CTAS query by selecting all the records in a table with grouping by a given column
Amazon CodeWhisperer will recommend a code snippet similar to the following:
def generate_ctas_query_with_group_by(table_name, group_by_col):
    ctas_query = "CREATE TABLE " + table_name + " AS SELECT * FROM " + table_name + " GROUP BY " + group_by_col
    return ctas_query
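To see what the recommended function produces, the sketch below calls it with two made-up arguments, `persons` and `gender` (both are assumptions chosen for illustration). The generated statement reuses one name for the new table and the source table and combines `SELECT *` with `GROUP BY`, so it would typically need adjusting before a SQL engine accepts it; treat the recommendation as a starting point rather than a finished query.

```python
def generate_ctas_query_with_group_by(table_name, group_by_col):
    # Same string concatenation as the recommended snippet.
    ctas_query = (
        "CREATE TABLE " + table_name
        + " AS SELECT * FROM " + table_name
        + " GROUP BY " + group_by_col
    )
    return ctas_query


# "persons" and "gender" are hypothetical example arguments.
query = generate_ctas_query_with_group_by("persons", "gender")
print(query)
```

Printing the query before running it is a cheap way to review generated SQL for problems such as the duplicated table name.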
Conclusion
In this post, we demonstrated how AWS Glue Studio notebook integration with Amazon CodeWhisperer helps you build data integration jobs faster. This integration is available today in US East (N. Virginia). You can start using the AWS Glue Studio notebook with Amazon CodeWhisperer to accelerate building your data integration jobs. To get started with AWS Glue, visit AWS Glue.
Learn more
To learn more about using AWS Glue notebooks and Amazon CodeWhisperer, check out the following video.
About the authors
Noritaka Sekiyama is a Principal Big Data Architect on the AWS Glue team. He is based in Tokyo, Japan. He is responsible for building software artifacts to help customers. In his spare time, he enjoys cycling with his road bike.
Gal Heyne is a Product Manager for AWS Glue with a strong focus on AI/ML, data engineering, and BI, and is based in California. She is passionate about developing a deep understanding of customers’ business needs and collaborating with engineers to design easy-to-use data products. In her spare time, she enjoys playing card games.