<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Software Engineering on Koby Bibas</title><link>https://kobybibas.github.io/tags/software-engineering/</link><description>Recent content in Software Engineering on Koby Bibas</description><generator>Hugo -- gohugo.io</generator><language>en</language><lastBuildDate>Fri, 12 Jun 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://kobybibas.github.io/tags/software-engineering/index.xml" rel="self" type="application/rss+xml"/><item><title>[Summary] MapReduce for Software Engineers</title><link>https://kobybibas.github.io/posts/20260612_mapreduce_for_software_engineers/summary/</link><pubDate>Fri, 12 Jun 2026 00:00:00 +0000</pubDate><guid>https://kobybibas.github.io/posts/20260612_mapreduce_for_software_engineers/summary/</guid><description>TL;DR: Processing large-scale data sequentially is slow. MapReduce is a framework for parallel batch processing: you write the map and reduce functions, and the system handles splitting the work, grouping intermediate results, retries, and execution across machines.
The Problem: Suppose we have a very large set of web server logs and want to count how many times each URL was accessed. On one machine, the logic is simple:
Read every log line. Extract the URL.</description></item></channel></rss>