Recommendations
Apache Spark
Memory management
- Project Tungsten: Bringing Spark Closer to Bare Metal
- Memory Management and Binary Processing: leveraging application semantics to manage memory explicitly and eliminate the overhead of the JVM object model and garbage collection.
- Cache-aware computation: algorithms and data structures to exploit memory hierarchy.
- Code generation: using code generation to exploit modern compilers and CPUs.
Performance
- Tutorials and Courses
- Others
Streaming
- Microbenchmarking Apache Storm 1.0 Performance
- High-throughput, low-latency, and exactly-once stream processing with Apache Flink
- Comparison of Apache Stream Processing Frameworks: Part 1, Part 2
- Carrier Payments Big Data Pipeline using Apache Storm
- Introduction to Apache Flink
- Throughput, Latency, and Yahoo! Performance Benchmarks. Is there a winner?
Java
- Direct Memory Alignment in Java
- What every programmer should know about memory
- On Heap vs Off Heap Memory Usage
- G1: One Garbage Collector To Rule Them All
- Tips for Tuning the Garbage First Garbage Collector
- Garbage Collection Optimization for High-Throughput and Low-Latency Java Applications
- Java Performance Tuning Guide: java.io.ByteArrayOutputStream
- Java Performance Tuning Guide: Performance of various methods of binary serialization in Java
- Compact Strings In Java 9
Python
Distributed Systems
- The Log: What every software engineer should know about real-time data’s unifying abstraction
- Four Really Real Meanings of Real-Time
- An Overview of Apache Streaming Technologies
- A Brief History of Apache Storm
- Understanding Consensus and Paxos in Distributed Systems
- Inside Cloud Spanner and the CAP Theorem
- Introducing Cloud Dataflow Shuffle: For up to 5x performance improvement in data analytic pipelines
Machine Learning
- Neural Networks and Deep Learning
- Deep Learning
- Practical Deep Learning For Coders
- The real prerequisite for machine learning isn’t math, it’s data analysis
Git
- Git Tutorial by Liao Xuefeng in Chinese
- Git Workflow by Ruan Yifeng in Chinese
- Git Use Process by Ruan Yifeng in Chinese
- Little Things I Like to Do with Git
About Trajectory Data
- All 1.1 Billion Taxi Rides on Redshift
- Monitoring Real-Time Uber Data Using Spark Machine Learning, Streaming, and the Kafka API: Part 1, Part 2
- Detecting Abuse at Scale: Locality Sensitive Hashing at Uber Engineering
Insights
- Expert Interview Series: IBM’s Holden Karau on Hadoop, ETL, Machine Learning and the Future of Spark, Part 2
- Dataflow as Database
- BI & Analytics on a Data Lake: Part 1, Part 2
- Hadoop Best Practices and Anti-Patterns
- Server-side I/O Performance: Node vs. PHP vs. Java vs. Go
- Enough with the microservices
- Can your CTO still code?
Tech Stack
- The Uber Engineering Tech Stack, Part I: The Foundation
- The Uber Engineering Tech Stack, Part II: The Edge and Beyond
- How Uber Uses Spark and Hadoop to Optimize Customer Experience
- PayPal From Big Data to Fast Data: Part 1, Part 2
Algorithms
Open Source
Serverless
- Serverless on Kubernetes with Soam Vasani