Exercise 4: ETL pipeline, CSV Source & FTP Connector using Sections
In this exercise, store data in an Entity with a formatted date.
Requirements
For this exercise, use a CSV file with the following fields containing customer information.
This is just test data, not real, and has been auto-generated by http://www.convertcsv.com/generate-test-data.htm. In this exercise, the joining date information is in the wrong format, and actions will be defined to correct it before storing it in the entity.
Store data through the ETL pipeline.
Use an FTP tool to establish a connection with the Langstack application. In this exercise, the tool in use is Filezilla.
File must be residing on the FTP server before the execution of the ETL pipeline starts.
Create entity “UserName_Acc_customers”.
Store the data from the CSV file in the entity “UserName_Acc_customers” after the following:
Format the date to align with the format of the entity
Create additional fields for
DayMonthJoinDate
Customer_Value
Rating
Solution
The CSV file stores customer information, which needs to be updated in the Customer entity. However, two issues need to be sorted beforehand:
The joining date for the customer in this file is in the format (M/DD/YYYY), whereas the format of the Join_Date in the entity is (MM/DD/YYYY). The joining date values must be reformatted before storing in the entity.
Three fields DateMonthJoinDate, Customer_Value, and Rating are not present in the CSV file. These fields must be created in the ETL pipeline and assigned relevant values. The solution is based on the following steps:
Step 1: Create the ETL pipeline adding the three extra fields for:
DateMonthJoinDate
Customer_Value
Rating. Map the Reader and Writer fields.
Step 2: When the ETL pipeline is created, define actions to convert the format of the Join_Date received in the Reader to align with the format of the Join_Date field in the target entity. Also, define Actions to assign values to Customer_Value and Rating fields before storing in the entity.
Last updated