AWS S3 connector as a source
Last updated
Last updated
The ETL pipeline, to transfer customer data from a csv file on AWS S3 to a Langstack entity, is created as follows.
To see step-by-step instructions of how to create an ETL pipeline, click here.
An ETL pipeline named “TestS3toEntity” is created.
To connect to the data source, the necessary information is added as follows:
The “Connector” tab is selected.
For this example, “TestS3Connector” is selected from the drop-down menu. The connector can be added by selecting a connector from the drop-down menu or a new connector can be created by clicking the [+] button.
To go to the settings, click the “Edit the settings” arrow.
To connect to the data destination, the necessary information is added in the Data destination section:
The “Entity” tab is selected.
The “CustomersData” entity is selected from the drop-down menu.
To disallow multiple simultaneous runs of the ETL pipeline, the toggle button is left enabled for “skip execution while in progress”. Enabling this toggle button defines that the execution of this ETL pipeline will be skipped when there is one already in progress.
The default selection for ETL pipeline execution is “Immediate”.
To align the source fields with destination fields, the settings for the reader and writer format are defined in the “Data Format” tab. The “Reader” tab is selected by default.
To update the settings for how the data should be read:
The reader stream should be “CSV Stream.”
To add the reader details, click on the “Edit the settings” arrow.
To add details necessary to read the records, the settings in this section “AWSS3 CSV format details:” are defined as follows:
In the field “File path or File URL or Folder path or Folder URL” the S3 URI is copied for the file.
The “File name” is left blank as it is included in the S3 URI.
The “CharacterSet” is selected as “Unicode(UTF-8)”.
The “Language” is selected as “English”.
The “Start reading CSV from line” is defined as “1”.
The “separator” is selected as “Comma”.
In the “Sample data”, the column names of the source file are pasted: “Customer_ID,Name,Age”.
When the Sample data is saved, source fields are displayed.
The column names are copied to Source Fields by clicking “Copy to Source Fields”.
Click on "Accept & Collapse" to save the information.
For this example, the source fields are added as per the image below.
The “Writer” mode is “Append”.
In the Field Mapping section, all the “Mapped Fields” are added and aligned.
When the ETL pipeline is executed (after Save and Publish), the records will be added to the destination.