Step 1: Create the ETL pipeline

The first step is to create an ETL pipeline.

  • For this exercise, use a CSV file with the following fields: Customer_ID, Join_Date

  • Open Filezilla (or any FTP/SFTP) utility tool and connect to the remote site.

  • Place the CSV file in the connecting remote site.

  • To go to the ETL pipeline template, click on โ€œETL pipelineโ€ under the โ€œData managementโ€ menu on the left side panel.

  • To create a new ETL pipeline, click the [+ ETL pipeline] button under the โ€œETL pipelineโ€ tab.

  • To label the ETL pipeline and execution log entity, add the following information:

    1. Enter the โ€œETL pipelineโ€ name as โ€œUserName_Acc_customersโ€.

    2. Optionally add a description for this ETL pipeline.

    3. Enter a name for the execution log for this ETL pipeline as โ€œcsvtoentityโ€. This log will be updated each time the ETL pipeline is executed and can be viewed by clicking โ€œGo to recordsโ€.

  • To connect to the data source, add the necessary information in the โ€œData Sourceโ€ function. To add an existing connector:

    1. Select the โ€œConnectorโ€ tab.

    2. Select the relevant connector source. In this exercise, โ€œTutorial_SFTPโ€ is selected. (The source can be selected from the drop-down menu or a new source can be added by clicking the [+] button and adding relevant details.)

    3. To edit the settings, click the โ€œEdit the settingsโ€ arrow. In the settings, to add or update the details for the FTP/SFTP Connector, update the required fields (click here for the steps). If a connector does not exist, to create a new connector In the Data Source in the ETL pipeline, click the [+] button and add the required fields.

  • To add the necessary details to connect with the data destination, in the โ€œData Destinationโ€ section:

    1. Select the โ€œEntityโ€ tab.

    2. Select destination as โ€œUserName_Acc_skipCustomersโ€.

  • To disallow multiple simultaneous runs of the ETL pipeline leave the toggle button enabled for โ€œskip execution while in progressโ€. Enabling this toggle button defines that the execution of this ETL pipeline will be skipped when there is one already in progress.

  • The default value for ETL pipeline execution is selected as โ€œImmediateโ€. For this exercise, keep it as is.

  • To define the shared variables, go to the โ€œVariablesโ€ tab. The โ€œVariablesโ€ tab consists of the shared variable โ€œStartupParametersโ€ with the data type as โ€œListโ€. To add the variables for Actions to use, create the following additional three shared variables by clicking the [+ Variable] button.

  • These variables will cover the scope of all the sections in the ETL pipeline.

  • To store the the value for the joining month, define the following variable: Variable name: โ€œdefJoinMonthโ€, Data Type: โ€œintegerโ€, Initialized value: โ€œ3โ€

  • To store the the value for the joining day, define the following variable: Variable name: โ€œdefJoinDayโ€, Data Type: integer, Not initialized

  • To store the the value for the complete joining date, define the following variable: Variable name: โ€œjoinedDateโ€, Data Type: DateTime, Not initialized

  • To define the Reader (reading from the source) and Writer (writing on the destination) settings, go to the โ€œData Formatโ€ tab. To define the stream, select โ€œReader Streamโ€ as โ€œCSV Streamโ€.

  • To define the settings for the โ€œCSV Streamโ€, click on the โ€œEdit the settingsโ€ arrow to add details for the โ€œReader streamโ€.

  • โ€œFTP CSV Format Detailsโ€ box will be displayed. To define the โ€œFTP CSV format detailsโ€ settings, add the following details:

    1. Add the exact file path. This is the path to the file from which the CSV data will be read.

    2. Select โ€œCharacter Setโ€ as โ€œUniversal (UTF-8)โ€.

    3. Select โ€œLanguageโ€ as โ€œEnglishโ€.

    4. To define that the data is read from the first row, in the โ€œStart reading CSV from lineโ€ field, enter the digit โ€œ1โ€. The field โ€œHeader present in the first rowโ€ is selected by default.

    5. To define the separator based on which the fields are distinguished, select Separator as โ€œCommaโ€.

    6. The [Sample Data] button gets activated. To define the sample entity fields, click the [Sample Data] button.

  • When the [Sample Data]button is clicked, a box will be displayed to add the header file sample that is in the CSV file. Add the header file sample as follows: Customer_ID,Join_Date

    The header sample will contain the details of the fields to be read from the target Entity. To save the sample, click the [OK] button.

  • The fields of the entity will be displayed (based on the sample data entered).

  • To copy the CSV fields to Source fields in the entity, click the activated [Copy Sample to Source Fields] button.

  • The fields will be populated with string data types in the Source Fields section.

  • Update the data type for the โ€œJoin_Dateโ€ field to DateTime (as per the field's data type in the destination entity).

  • To accept the Ftp CSV format details settings, click the [Accept & Collapse] button.

  • After the โ€œReaderโ€ settings are defined, to define โ€œWriterโ€ settings, select the Data Format>Writer tab. To define that the data is written to the destination as mentioned in reader source fields without transforming it, select โ€œWriter Modeโ€ as โ€œAppendโ€.

  • To map the reader source fields to the entity destination fields, select the โ€œField Mappingโ€ tab. To add the Mapped Fields, click the [+ Field] button and add relevant information.

  • In case of CSV files, the sequence of fields top-down must be exactly the same as the sequence specified in the Reader.

    1. Mapping Sources: Select the field specified in the Reader. Select Variables>Reader>(field name). It displays as โ€œreader.(Field Name)โ€.

    2. Select field from Entity: Select mapping field from the Entity.

    3. Data Type: Select the data type for the mapping fields.

  • Add both the required fields in the โ€œField Mappingโ€ section.

    1. The Customer_ID field contains the odd numbers from 1 to 17 and is mapped to the target entity field โ€œCustomer_IDโ€.

    2. The โ€œJoin_Dateโ€ contains only the value for the day in the date and is mapped to a Shared variable โ€œjoinedDateโ€ of data type DateTime that is not initialized. (When a DateTime variable is not initialized to any value, it picks up the default date of โ€œ1970โ€“01-01 00:00:00โ€.)

  • To add a target entity, go to the Target Entities tab and click [+ Target entity].

  • Select the entity โ€œcsvtoentityโ€.

Last updated