AWS S3 connector as a destination

The ETL pipeline to transfer customer data from a Langstack entity to a CSV file on AWS S3 is created as follows.

To see step-by-step instructions on how to create an ETL pipeline, click here.

  • An ETL pipeline named "TestEntitytoS3" is created.

  • To connect to the data source, the necessary information is added in the Data source section:

    1. The "Entity" tab is selected.

    2. The entity "CustomersData" is selected from the drop-down menu.

  • To connect to the data destination, the necessary information is added as follows:

    1. The "Connector" tab is selected.

    2. For this example, "TestS3Connector" is selected from the drop-down menu. A connector can be added by selecting an existing connector from the drop-down menu, or a new connector can be created by clicking the [+] button.

    3. To go to the settings, click the "Edit the settings" arrow.

  • To disallow multiple simultaneous runs of the ETL pipeline, the toggle button for "skip execution while in progress" is left enabled. When this toggle button is enabled, execution of this ETL pipeline is skipped while a previous execution is still in progress.
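
Conceptually, this toggle acts as a simple re-entrancy guard: a new run is dropped if the previous one has not finished. The sketch below only illustrates that behavior; it is not Langstack's implementation, and all names in it are hypothetical.

```python
import threading

# Hypothetical guard illustrating "skip execution while in progress".
_run_lock = threading.Lock()

def run_pipeline(execute):
    # Try to take the lock without waiting; if another run holds it,
    # skip this execution instead of queuing it.
    if not _run_lock.acquire(blocking=False):
        print("Execution skipped: a run is already in progress.")
        return
    try:
        execute()
    finally:
        _run_lock.release()
```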

  • The default selection for ETL pipeline execution is "Immediate".

  • To align the source fields with destination fields, the settings for the reader and writer format are defined in the "Data Format" tab. The "Reader" tab is selected by default.

  • For this example, the fields are added as per the image below.

  • To update the settings for how the data should be written, select the "Writer" tab.

    1. The writer stream is "CSV Stream".

    2. To add the table name, click on the "Edit the settings" arrow.

  • To add the details necessary to write the records, the settings in the "AWS S3 CSV format details" section are defined as follows:

    1. In the field "File path or File URL or Folder path or Folder URL", the Object URL of the file is pasted (see the example URL format after this list).

    2. The "File name" is left blank as it is included in the Object URL.

    3. The "CharacterSet" is selected as "Unicode(UTF-8)".

    4. The "Language" is selected as "English".

    5. The "Start reading CSV from line" is defined as "1".

    6. The "separator" is selected as "Comma".
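
For reference, an S3 Object URL generally has the form https://&lt;bucket-name&gt;.s3.&lt;region&gt;.amazonaws.com/&lt;key&gt;, for example https://my-bucket.s3.eu-west-1.amazonaws.com/exports/customers.csv. The bucket, region, and key shown here are placeholders, not values from this example.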

  • In the "Sample data", the column names of the source file are pasted: "Customer_ID,Name,Age".
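
As an illustration, a CSV file with this header could look like the following; the data row shown here is purely hypothetical and not taken from this example.

```
Customer_ID,Name,Age
1001,Jane Doe,34
```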

  • Once the sample data is added, the writer fields are displayed:

    1. The column names are copied to Field Mapping by clicking "Copy to Field Mapping".

    2. Click on "Accept & Collapse" to save the information.

  • The "Writer" mode is "Append".

  • In the Field Mapping section, all the "Mapped Fields" are aligned.

  • When the ETL pipeline is executed (after Save and Publish), the records will be added to the destination. The changes can be checked by downloading the file from the S3 bucket, for example with a short script such as the sketch below.
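
The following is a minimal sketch for verifying the result, assuming boto3 credentials are already configured; the bucket name and object key are hypothetical placeholders for the values used in "TestS3Connector".

```python
import csv

import boto3

# Hypothetical bucket and key; replace with the values used in the connector.
BUCKET = "my-test-bucket"
KEY = "exports/customers.csv"
LOCAL_FILE = "customers.csv"

# Download the CSV written by the ETL pipeline and print its rows.
s3 = boto3.client("s3")
s3.download_file(BUCKET, KEY, LOCAL_FILE)

with open(LOCAL_FILE, newline="", encoding="utf-8") as f:
    for row in csv.reader(f):
        print(row)
```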
