Serverless ETL for Sirocco on Google Cloud

I published the architecture of a serverless ETL solution for Sirocco on the GCP Big Data blog. The solution scales from a few news articles in a cloud bucket to millions of news posts in a database, taking advantage of Cloud Dataflow’s autoscaling features. With this blog, you should now have all the components for building a news monitoring or an opinion tracking solution. I know this because I am using exactly the same setup for an actual news monitoring solution; more about that in a future post.

Here is what I suggest you do:

  • Read about Plutchik’s framework for emotion analysis to understand the theory behind this solution
  • Read about the ETL solution
  • Go to the GitHub repo and follow the instructions in the README. Set up your own processing pipeline and run a test that crunches the few news articles I uploaded to the test folder.
