Welcome to Siva's Blog

Scribbles by Sivananda Hanumanthu
My experiences and learnings on Technology, Leadership, Domains, Life and on various topics as a reference!
What you can expect here could be something on Java, J2EE, Databases, or a newer programming language altogether; Software Engineering Best Practices, Software Architecture, SOA, REST, Web Services, Microservices, APIs, Technical Architecture, Design, Programming, Cloud, Application Security, Artificial Intelligence, Machine Learning, Big Data and Analytics, Integrations, Middleware, Continuous Delivery, DevOps, Cyber Security, QA/QE, Automation, Emerging Technologies, B2B, B2C, ERP, SCM, PLM, FinTech, IoT, RegTech, or any other domain; Tips & Traps, News, Books, Life experiences, Notes, latest trends and many more...
Showing posts with label scaling.

Sunday, May 1, 2022

Upgrade to the Modern Data Stack


Solution: The Modern Data Stack

There's already a tool perfectly suited to storing massive amounts of data, one that can be queried easily and is connected to everything: a database or data warehouse. You probably already have one running in your company that you can reuse, so you don't need to buy another CRM, CDP, DMP, MAP, or any other acronym.

Building around a data warehouse has additional benefits such as:

  • You own your data. It helps you comply with different regulations.
  • Get value quicker. It is 10x easier to dump historical data into a DB than to import it into yet another tool.
  • Easier to sync with other tools. Databases integrate with everything, contrary to SaaS tools that have limited APIs (and please don't get me started on APIs like Marketo's).
  • Reusability. Other teams in the company can use this trusted source of truth.

In addition to a data warehouse, you will need 4 other key components:

  1. An event tracking tool. You can continue using Segment here. It does the job well and allows you to collect events across all of your websites & apps (see the sketch after this list).
  2. A data loader. I recommend Fivetran. It’s easy to set up in a couple of clicks and amazingly reliable.
  3. A data modeling tool. dbt is the new power tool here. It allows you to transform and model your data.
  4. An integration platform. I’m 100% biased here, but I recommend using Census. We integrate well with dbt and enable you to sync your clean and unified data models back to all of your other tools.
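
As a quick illustration of item 1, here is a minimal event-tracking sketch using Segment's analytics-python library (an assumption on my part; the write key, user id, event name, and properties below are all placeholders):

# pip install analytics-python  -- Segment's Python library (assumed)
import analytics

analytics.write_key = "YOUR_SEGMENT_WRITE_KEY"  # placeholder

# Track a product event; Segment fans it out to your warehouse
# and any other connected destinations.
analytics.track(
    "user_123",      # hypothetical user id
    "Signed Up",     # hypothetical event name
    {"plan": "pro", "source": "landing_page"},
)

analytics.flush()  # send any queued events before the process exits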

As a bonus, you can replace Amplitude with a BI tool like Mode or Chart.io, which is cheap and as good as Looker.

Reference: https://www.getcensus.com/blog/graduating-to-the-modern-data-stack-for-startups

Wednesday, November 3, 2021

When to use Airbyte along with Airflow


Airflow shines as a workflow orchestrator. Because Airflow is widely adopted, many data teams also use its transfer and transformation operators to schedule and author their ETL pipelines, and several of those teams have since migrated their pipelines to follow the ELT paradigm. We have seen some of the challenges of building full data replication and incremental-load DAGs with Airflow. More troublesome is that sources and destinations are tightly coupled in Airflow transfer operators. Because of this, it is hard for Airflow to cover the long tail of integrations for your business applications.

One alternative is to keep using Airflow as a scheduler and integrate it with two other open-source projects that are better suited for ELT pipelines: Airbyte for the EL parts and dbt for the T part. Airbyte sources are decoupled from destinations, so you can already sync data from 100+ sources (databases, APIs, ...) to 10+ destinations (databases, data warehouses, data lakes, ...) and remove the boilerplate code needed with Airflow. With dbt you can transform data with SQL in your data warehouse and avoid having to handle dependencies between tables in your Airflow DAGs.
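
To make this concrete, here is a minimal sketch of such an ELT DAG. It assumes the apache-airflow-providers-airbyte package is installed and an Airflow connection named airbyte_conn points at your Airbyte instance; the Airbyte connection UUID and the dbt project path are placeholders:

from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.providers.airbyte.operators.airbyte import AirbyteTriggerSyncOperator

with DAG(
    dag_id="elt_airbyte_dbt",
    start_date=datetime(2021, 11, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    # Airbyte handles the EL: trigger an existing source -> warehouse sync.
    extract_load = AirbyteTriggerSyncOperator(
        task_id="airbyte_sync",
        airbyte_conn_id="airbyte_conn",  # assumed Airflow connection name
        connection_id="00000000-0000-0000-0000-000000000000",  # placeholder UUID
    )

    # dbt handles the T: run the SQL models once the raw data has landed.
    transform = BashOperator(
        task_id="dbt_run",
        bash_command="cd /opt/dbt_project && dbt run",  # assumed project path
    )

    extract_load >> transform

Airflow remains the scheduler, while source/destination logic lives in Airbyte and transformation logic lives in dbt.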

References:

  • Airbyte: https://github.com/airbytehq/airbyte
  • Airflow ETL pipelines with Airbyte: https://airbyte.io/blog/airflow-etl-pipelines
  • dbt: https://github.com/dbt-labs/dbt-core
  • dbt implementation at The Telegraph: https://medium.com/the-telegraph-engineering/dbt-a-new-way-to-handle-data-transformation-at-the-telegraph-868ce3964eb4

Sunday, August 15, 2021

Five enterprise-architecture practices that add value to digital transformations


  1. Engage top executives in key decisions
  2. Emphasize strategic planning
  3. Focus on business outcomes
  4. Use capabilities to connect business and IT
  5. Develop and retain high-caliber talent
Reference: https://www.mckinsey.com/business-functions/mckinsey-digital/our-insights/five-enterprise-architecture-practices-that-add-value-to-digital-transformations

Friday, June 19, 2020

A more highly available system means less downtime to deal with


Imagine,

The five nines, 99.999% availability, means ~5.3 minutes of downtime a year
The four nines, 99.99% availability, means ~53 minutes of downtime a year
The three nines, 99.9% availability, means ~8.8 hours of downtime a year
The two nines, 99% availability, means roughly 3.7 days of downtime a year
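
These numbers are easy to sanity-check with a quick back-of-the-envelope calculation (Python used purely for illustration):

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year

for availability in (99.0, 99.9, 99.99, 99.999):
    downtime_minutes = MINUTES_PER_YEAR * (1 - availability / 100)
    print(f"{availability}% -> {downtime_minutes:,.1f} minutes/year "
          f"(~{downtime_minutes / 60:.1f} hours)")

# Prints roughly:
# 99.0%   -> 5,256.0 minutes/year (~87.6 hours)
# 99.9%   -> 525.6 minutes/year (~8.8 hours)
# 99.99%  -> 52.6 minutes/year (~0.9 hours)
# 99.999% -> 5.3 minutes/year (~0.1 hours)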

So, why are we concerned about downtime? The more downtime you have, the fewer customers you attract and retain, since most systems and applications are expected to be accessible 24/7, round the clock!

We might be thinking, "oh, yeah, we have ample downtime budget in our hands to deal with," but believe me, if you don't design the system well, with the right practices and processes in place, it is very hard to achieve even two nines of availability. You have to work very hard and smart to deal with it in an effective manner.

Some tips to keep downtime under control for your systems or platforms:

  1. Make sure to eliminate any single points of failure, especially when dealing with third-party API endpoints or dependent systems that are not in your control, by having the right retry frameworks and circuit breaker patterns in place (see the first sketch after this list)
  2. Observability and monitoring, with clear alerts and notifications in place
  3. Have efficient CI/CD pipelines, with the right checks at the various stages of your configured environments to stop promoting bad code (faulty code that has issues, which can be performance issues as well)
  4. Have well-defined deployments and changed-code propagation with
    1. Blue-green deployments (having two identical production environments, one blue and the other green)
    2. Canary deployments (routing a small share of traffic to the new changes and slowly increasing it; see the second sketch after this list)
  5. Encourage A/B testing in your product feature releases where you are not sure about some of the features; also promote canary releases
  6. Lastly, have a very diligent process that imagines all the dimensions and scenarios in which your system can fail (informed by your proactive resiliency testing activities), so that you have the right tools to troubleshoot, tune, and respond to any unknown hardware or network failures
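
For tip 1, here is a toy sketch (not production code) of retries with exponential backoff plus a minimal circuit breaker; the thresholds, delays, and the function being wrapped are all hypothetical:

import random
import time

class CircuitBreaker:
    """Toy breaker: opens after N consecutive failures, probes again after a cooldown."""

    def __init__(self, failure_threshold=5, reset_timeout=30.0):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at = None

    def allow(self):
        if self.opened_at is None:
            return True
        # Half-open: allow a probe call once the cooldown has expired.
        return time.monotonic() - self.opened_at >= self.reset_timeout

    def record_success(self):
        self.failures = 0
        self.opened_at = None

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = time.monotonic()

def call_with_retries(fn, breaker, max_attempts=4, base_delay=0.5):
    """Call fn() with exponential backoff and jitter, honoring the circuit breaker."""
    for attempt in range(max_attempts):
        if not breaker.allow():
            raise RuntimeError("circuit open: failing fast instead of hammering the dependency")
        try:
            result = fn()
            breaker.record_success()
            return result
        except Exception:
            breaker.record_failure()
            if attempt == max_attempts - 1:
                raise
            # Backoff: 0.5s, 1s, 2s, ... plus jitter so callers don't retry in lockstep.
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))

The point is to fail fast when a third-party dependency is clearly down, instead of letting every request pile up behind slow timeouts.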
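
And for tip 4.2, a toy illustration of canary traffic splitting; in practice this is done at the load balancer or service mesh rather than in application code, and the percentages here are made up:

import random

def pick_version(canary_share=0.05):
    """Route a request: ~5% of traffic goes to the canary, the rest to stable."""
    return "canary" if random.random() < canary_share else "stable"

# Ramp the share up gradually as confidence grows: 1% -> 5% -> 25% -> 100%.
counts = {"stable": 0, "canary": 0}
for _ in range(10_000):
    counts[pick_version(canary_share=0.05)] += 1
print(counts)  # roughly {'stable': 9500, 'canary': 500}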