Build A StreamSets Pipeline In 5 Minutes

Chloooo
5 min read · Jul 1, 2020

A demonstration of building a real-time data pipeline using StreamSets

StreamSets is a modern data streaming and integration platform built by StreamSets, Inc. It is used by many multinational companies, such as Shell and Dell.

StreamSets Data Collector (SDC) is an open source data ingestion pipeline tool and one part of the StreamSets DataOps platform, which you can download and try out. Today we will use SDC to demonstrate real-time data transformation from Aurora Postgres to Kinesis Firehose.

1. Create a new pipeline by clicking the blue button at the top left of the StreamSets UI and typing in a pipeline name; I’m using “Demo”. Then click “Save”. We now have an empty pipeline.

name pipeline

You will see a small error triangle at the top right of the user interface, reminding you to add a pipeline origin.

empty data pipeline interface

We will use JDBC Query Consumer as the origin to ingest the source data from Aurora Postgres. There is one more popular data ingestion…
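
To make the idea of a query-based origin more concrete, here is a minimal Python sketch of offset-based incremental reading, which is roughly what a JDBC origin in incremental mode does: it remembers the last offset value it saw and only asks the database for newer rows. This is an illustration under assumptions, not StreamSets code. It uses an in-memory SQLite database as a stand-in for Aurora Postgres, and the `orders` table, its columns, and the `read_batch` helper are all hypothetical names made up for this example.

```python
import sqlite3

# Stand-in for the source database (the article's real source is Aurora Postgres).
# The "orders" table and its columns are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT)")
conn.executemany(
    "INSERT INTO orders (id, item) VALUES (?, ?)",
    [(1, "apple"), (2, "pear"), (3, "plum")],
)


def read_batch(conn, last_offset, max_batch_size=2):
    """Return rows whose id is greater than last_offset, plus the new offset.

    Hypothetical helper: it mimics an incremental query that filters on an
    offset column and orders by it, so no row is read twice.
    """
    rows = conn.execute(
        "SELECT id, item FROM orders WHERE id > ? ORDER BY id LIMIT ?",
        (last_offset, max_batch_size),
    ).fetchall()
    # Keep the old offset when no new rows arrived.
    new_offset = rows[-1][0] if rows else last_offset
    return rows, new_offset


offset = 0  # initial offset, before anything has been read
first_batch, offset = read_batch(conn, offset)
second_batch, offset = read_batch(conn, offset)
third_batch, offset = read_batch(conn, offset)

print(first_batch)
print(second_batch)
print(third_batch)
```

Each call picks up where the previous one stopped, and once the table is exhausted the batch comes back empty while the offset stays put. A real origin would keep polling on an interval so that newly inserted rows flow into the pipeline.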
