Apache Kafka: a distributed event streaming platform used for high-performance data pipelines, streaming analytics, and data integration.
AWS PrivateLink: a secure connection service that enables private connectivity between VPCs and AWS services without traversing the public internet.
AWS Transit Gateway: a cloud router that connects VPCs and on-premises networks through a central hub, enabling network isolation between attached networks even when their CIDR ranges overlap.
Backfill Operations: the process of inserting historical data into compressed chunks, often requiring decompression of affected chunks first for optimal performance.
Bottomless Storage: low-cost object storage built on Amazon S3 that provides unlimited storage capacity for infrequently accessed data while maintaining queryability.
Candlestick Chart: a financial visualization showing open, high, low, and close (OHLC) values for asset price movements over time intervals.
CDC (Change Data Capture): a pattern that captures changes in database tables and streams them to other systems in real time, implemented through tools like Debezium.
Chunk: a child table in a hypertable that contains data for a specific time range, automatically managed by TimescaleDB for partitioning.
Chunk Compression: the process of converting chunks from rowstore to columnstore format to achieve up to 90% storage reduction and improve query performance.
Chunk Interval: the time span that determines how data is partitioned into chunks, typically configured as a time duration such as 7 days.
Chunk Skipping: an optimization technique that allows queries to skip chunks that don't contain relevant data, based on metadata.
Columnstore: the compressed, columnar storage format in TimescaleDB that optimizes data for analytical queries and reduces storage requirements.
Connection Pooling: a technique that optimizes database connections by efficiently managing and reusing them, reducing overhead for high-concurrency applications.
Continuous Aggregates (CAggs): materialized views that automatically refresh in the background as new data is added, providing pre-computed aggregations for faster analytical queries.
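As a toy illustration of how chunk intervals and chunk skipping interact, the sketch below (plain Python, all names hypothetical; a real hypertable manages this internally) assigns rows to fixed 7-day chunks and keeps per-chunk min/max metadata so a time-range query can skip chunks that cannot contain matching rows:

```python
from datetime import datetime, timedelta

CHUNK_INTERVAL = timedelta(days=7)   # hypothetical 7-day chunk interval
EPOCH = datetime(2000, 1, 1)         # arbitrary origin for this sketch

def chunk_index(ts: datetime) -> int:
    """Map a timestamp to the index of the chunk covering it."""
    return (ts - EPOCH) // CHUNK_INTERVAL

# Toy chunk metadata: chunk index -> (min_ts, max_ts), as used for skipping.
chunks = {}

def insert(ts: datetime) -> None:
    idx = chunk_index(ts)
    lo, hi = chunks.get(idx, (ts, ts))
    chunks[idx] = (min(lo, ts), max(hi, ts))

def chunks_for_range(start: datetime, end: datetime):
    """Chunk skipping: only chunks whose min/max overlap the query range."""
    return [i for i, (lo, hi) in chunks.items() if hi >= start and lo <= end]

insert(datetime(2025, 1, 1))
insert(datetime(2025, 1, 20))
# A query over early January never touches the chunk holding the Jan 20 row.
scanned = chunks_for_range(datetime(2025, 1, 1), datetime(2025, 1, 5))
```

The same min/max bookkeeping is what makes skipping cheap: the decision uses only metadata, never the chunk's rows.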
Data Tiering: a storage strategy that automatically moves data between high-performance storage and low-cost object storage based on access patterns and age.
Debezium: an open-source distributed platform for change data capture that enables real-time streaming of database changes.
Dual-Write and Backfill: a migration strategy for large-scale workloads that involves implementing dual writes to both source and target systems while backfilling historical data.
HA Replicas: exact, up-to-date copies of your database hosted in multiple AWS availability zones that automatically take over if the primary node fails.
Hierarchical Continuous Aggregates: CAggs built on top of other CAggs, for example seconds → minutes → hours → days, to reduce the computational cost of multi-level aggregations.
Hypercore: TimescaleDB's hybrid row-columnar storage engine that seamlessly switches between row-oriented and column-oriented storage for optimal performance.
Hypertable: a PostgreSQL table optimized for time-series data that automatically partitions data by time into chunks for improved performance.
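The hierarchy idea — each coarser aggregate is computed from the next-finer aggregate rather than from raw data — can be sketched in a few lines of Python (a toy model, not the real continuous-aggregate machinery). Keeping a (sum, count) pair per bucket is what makes the levels composable:

```python
from collections import defaultdict

def rollup(samples, width):
    """Aggregate {epoch_seconds: (sum, count)} into buckets of `width` seconds.
    Because each bucket keeps (sum, count), coarser levels can be built
    from finer levels instead of rescanning raw data."""
    out = defaultdict(lambda: [0.0, 0])
    for ts, (s, n) in samples.items():
        bucket = ts - ts % width
        out[bucket][0] += s
        out[bucket][1] += n
    return dict(out)

# Two hours of per-second readings, stored as (sum, count) with count = 1.
raw = {t: (float(t % 5), 1) for t in range(0, 7200)}
per_minute = rollup(raw, 60)          # level 1: seconds -> minutes
per_hour = rollup(per_minute, 3600)   # level 2: built on minutes, not raw rows
```

Averages fall out of the stored pair (`sum / count`) at any level, which is why sums and counts, not averages, are what a hierarchical aggregate carries upward.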
Iceberg Tables: an open table format for large analytical datasets that enables reliable data lake capabilities with ACID transactions.
Insights: Tiger Cloud's built-in query monitoring tool that captures per-query statistics in real time, providing visibility into database performance.
I/O Boost: an add-on feature that provides enhanced IOPS and bandwidth performance for demanding workloads.
IoT (Internet of Things): physical objects with embedded computing capabilities that collect sensor data and generate time-series datasets.
Live Migration: an end-to-end migration solution that moves databases with minimal downtime using PostgreSQL logical replication and pgcopydb.
Livesync: a feature that enables continuous real-time synchronization between a PostgreSQL source database and Tiger Cloud.
Logical Replication: a PostgreSQL feature that replicates changes to database objects based on their replication identity, used for real-time data streaming.
Materialized View: a database object that contains the results of a query and can be refreshed to update the data, used in continuous aggregates.
Multi-tenancy: a system architecture that enables multiple users (tenants) to share the same application or database while keeping their data isolated.
Object Storage: low-cost, scalable storage service built on Amazon S3, used for storing infrequently accessed data in Tiger Cloud's tiered storage architecture.
OHLCV: Open, High, Low, Close, Volume - standard financial data points used in market analysis and candlestick charts.
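Computing one candlestick from raw trades is a small fold over the interval's trades; the sketch below (plain Python, hypothetical names, trades assumed time-ordered) derives the five OHLCV values used in the charts described above:

```python
def ohlcv(trades):
    """Compute one candlestick (Open, High, Low, Close, Volume) from a
    time-ordered list of (price, size) trades within a single interval."""
    prices = [price for price, _ in trades]
    return {
        "open": prices[0],            # first trade in the interval
        "high": max(prices),
        "low": min(prices),
        "close": prices[-1],          # last trade in the interval
        "volume": sum(size for _, size in trades),
    }

bar = ohlcv([(101.0, 5), (103.5, 2), (99.8, 1), (102.2, 4)])
```

In practice the same computation is expressed per time bucket with aggregate functions (first/last/max/min/sum) rather than in application code.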
pgcopydb: a PostgreSQL tool used for copying databases efficiently, particularly in live migration scenarios.
PITR (Point-in-Time Recovery): a backup feature that allows restoration of a database to any specific point in time within the retention period.
Pricing Plans: service tiers (Performance, Scale, Enterprise) that determine available features, resources, and support levels in Tiger Cloud.
RAG (Retrieval-Augmented Generation): an AI technique that combines retrieval of relevant information with generation capabilities to provide more accurate responses.
Read Replica: a read-only copy of the primary database kept in sync for scaling read operations and analytical workloads.
Read Replica Sets: an improved version of read replicas that allows up to 10 replica nodes behind a single read endpoint for horizontal read scaling.
Real-time Analytics: the capability to process and analyze data as it is generated, providing immediate insights and enabling quick decision-making.
Rollup Compression: a feature that combines multiple smaller, uncompressed chunks into a single, larger compressed chunk to reduce storage costs.
Rowstore: the uncompressed, row-oriented storage format in TimescaleDB, optimized for transactional operations and recent data access.
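The rollup idea — many small uncompressed pieces merged into one larger compressed unit — can be mimicked with a toy sketch (plain Python, zlib standing in for columnar compression; none of this is the actual implementation):

```python
import zlib

def roll_up(small_chunks):
    """Combine several small uncompressed chunks (here, byte strings of
    sorted rows) into one larger compressed chunk. zlib is only a stand-in
    for real columnar compression."""
    merged = b"".join(small_chunks)
    return zlib.compress(merged)

small = [b"row1;row2;", b"row3;", b"row4;row5;row6;"]
big = roll_up(small)
restored = zlib.decompress(big)   # the merged data is still fully recoverable
```

The practical benefit is fewer, larger compressed units: compression ratios and scan efficiency both improve when related rows are compressed together instead of in many tiny batches.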
Schema per Tenant: a multi-tenancy approach where each tenant's data is isolated within its own database schema while sharing the same database instance.
Segmentby: a configuration option for hypertables that determines how data is segmented, typically using frequently queried columns for optimization.
Service: a managed database instance in Tiger Cloud that provides Postgres functionality extended with TimescaleDB capabilities.
Service per Tenant: a multi-tenancy model where each tenant gets a dedicated service for complete data isolation.
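What a segmentby column does can be pictured as grouping rows by that column before compressing, so each compressed batch holds a single value of the column and queries filtering on it touch only the relevant batches. A minimal sketch (plain Python, `device_id` is a hypothetical segmentby column):

```python
from collections import defaultdict

def segment_rows(rows, segment_col):
    """Group rows by a segmentby-style column so each compressed batch
    would hold exactly one value of that column."""
    segments = defaultdict(list)
    for row in rows:
        segments[row[segment_col]].append(row)
    return dict(segments)

rows = [
    {"device_id": "d1", "temp": 20.1},
    {"device_id": "d2", "temp": 18.4},
    {"device_id": "d1", "temp": 20.3},
]
by_device = segment_rows(rows, "device_id")
```

This is why frequently filtered columns make good segmentby choices: a query for one device can decompress only that device's batches.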
Time Bucket: a function that aggregates data by time intervals, for example 5-minute or 1-hour buckets, for time-series analysis.
Time-series Data: data that represents how a system, process, or behavior changes over time, typically timestamped and sequential.
Tiger Lake: a connector that synchronizes data from Tiger Cloud services to Iceberg tables in Amazon S3 in real time.
Tiered Storage: a storage architecture that automatically moves data between high-performance and low-cost storage tiers based on usage patterns.
timescaledb-parallel-copy: a tool for efficiently ingesting CSV data into TimescaleDB in parallel for improved performance.
TimescaleDB: an open-source time-series database built on PostgreSQL, providing the core technology behind Tiger Cloud services.
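Time bucketing is just truncating a timestamp down to the start of its interval. The sketch below mimics that behavior in plain Python (the origin value is an assumption of this sketch, chosen only so buckets align consistently; it is not necessarily what the real function uses):

```python
from datetime import datetime, timedelta

def time_bucket(width: timedelta, ts: datetime,
                origin: datetime = datetime(2000, 1, 3)) -> datetime:
    """Truncate `ts` down to the start of its `width`-sized bucket,
    measured from `origin`. A toy stand-in for a time-bucketing function."""
    return ts - (ts - origin) % width

ts = datetime(2025, 6, 1, 12, 34, 56)
five_min = time_bucket(timedelta(minutes=5), ts)   # start of the 5-minute bucket
hourly = time_bucket(timedelta(hours=1), ts)       # start of the hour
```

Grouping rows by the bucketed timestamp then yields one aggregate row per interval, which is the building block for candlesticks and continuous aggregates alike.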
Vector Search: a capability that enables similarity searches on high-dimensional vector data, commonly used in AI and machine learning applications.
VPC (Virtual Private Cloud): a private network environment in AWS that provides network isolation and security for cloud resources.
VPC Peering: a connection between two VPCs that enables private communication between resources in different networks.
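At its core, vector search ranks stored embeddings by a similarity metric against a query vector. A minimal brute-force sketch using cosine similarity (plain Python, hypothetical toy embeddings; production systems use indexes rather than a full scan):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest(query, corpus):
    """Return the key of the corpus embedding most similar to `query`."""
    return max(corpus, key=lambda k: cosine_similarity(query, corpus[k]))

docs = {
    "kafka": [0.9, 0.1, 0.0],    # toy 3-dimensional embeddings
    "candles": [0.0, 0.8, 0.6],
}
best = nearest([0.85, 0.2, 0.05], docs)
```

Real deployments replace the linear scan with an approximate nearest-neighbor index, but the ranking logic is the same.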
WAL (Write-Ahead Log): PostgreSQL's method of ensuring data integrity by writing changes to a log before applying them to the database.
WebSocket: a communication protocol that provides full-duplex communication channels over a single TCP connection, used for real-time data streaming.
Wide Table Layout: a table design approach with many columns (one per metric), suitable when all potential metrics are known upfront.
Workload Isolation: the ability to separate read and write workloads to prevent performance interference and optimize resource utilization.
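The write-ahead discipline — log first, apply second, replay after a crash — can be shown with a toy key-value store (plain Python; a list stands in for a durable, fsync'd log file, and this is only a sketch of the principle, not PostgreSQL's WAL format):

```python
class ToyWAL:
    """Minimal write-ahead log sketch: every change is appended to the log
    before it touches the in-memory table, so replaying the log alone
    reproduces every acknowledged write."""

    def __init__(self):
        self.log = []    # stand-in for a durable log file
        self.table = {}  # volatile in-memory data

    def put(self, key, value):
        self.log.append(("put", key, value))  # 1) write ahead
        self.table[key] = value               # 2) then apply

    def recover(self):
        """Rebuild state from the log only, as after a crash."""
        state = {}
        for op, key, value in self.log:
            if op == "put":
                state[key] = value
        return state

db = ToyWAL()
db.put("a", 1)
db.put("a", 2)
db.put("b", 3)
```

Because the log is written first, a crash between steps 1 and 2 loses nothing that was acknowledged: recovery replays the log and arrives at the same state.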