This glossary defines technical terms and concepts commonly used in the documentation and services.

A

Apache Kafka: a distributed event streaming platform used for high-performance data pipelines, streaming analytics, and data integration.

AWS PrivateLink: a secure connection service that enables private connectivity between VPCs and AWS services without traversing the public internet.

AWS Transit Gateway: a cloud router that connects VPCs and on-premises networks through a central hub, enabling network isolation with overlapping CIDRs.

B

Backfill Operations: the process of inserting historical data into compressed chunks, often requiring decompression of affected chunks first for optimal performance.

Bottomless Storage: low-cost object storage built on Amazon S3 that provides unlimited storage capacity for infrequently accessed data while maintaining queryability.

C

Candlestick Chart: a financial visualization showing open, high, low, and close (OHLC) values for asset price movements over time intervals.

CDC (Change Data Capture): a pattern that captures changes in database tables and streams them to other systems in real time, implemented through tools like Debezium.

Chunk: a child table in a hypertable that contains data for a specific time range, automatically managed by TimescaleDB for partitioning.

Compression: the process of converting chunks from rowstore to columnstore format to achieve up to 90% storage reduction and improve query performance.

Chunk Interval: the time span that determines how data is partitioned into chunks, typically configured as a time duration, for example 7 days.

Chunk Skipping: an optimization technique that allows queries to skip chunks that don’t contain relevant data based on metadata.

Columnstore: the compressed, columnar storage format in TimescaleDB that optimizes data for analytical queries and reduces storage requirements.

Connection Pooling: a technique that optimizes database connections by efficiently managing and reusing them, reducing overhead for high-concurrency applications.

Continuous Aggregates (CAggs): materialized views that automatically refresh in the background as new data is added, providing pre-computed aggregations for faster analytical queries.
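
The chunk, chunk interval, and chunk-skipping entries above can be illustrated with a small, self-contained sketch. It is a toy model, not the database's implementation: the names `chunk_start`, `build_chunks`, and `query` are hypothetical, the 7-day interval is taken from the example above, and alignment to the Unix epoch is an assumption made only for this illustration.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)  # assumed alignment origin
CHUNK_INTERVAL = timedelta(days=7)                  # assumed 7-day chunk interval


def chunk_start(ts: datetime) -> datetime:
    """Return the start of the interval-aligned chunk that contains ts."""
    whole_intervals = (ts - EPOCH) // CHUNK_INTERVAL  # integer division of timedeltas
    return EPOCH + whole_intervals * CHUNK_INTERVAL


def build_chunks(rows):
    """Group (timestamp, value) rows into chunks keyed by chunk start time."""
    chunks = {}
    for ts, value in rows:
        chunks.setdefault(chunk_start(ts), []).append((ts, value))
    return chunks


def query(chunks, lo, hi):
    """Return rows with lo <= ts < hi and the number of chunks actually scanned."""
    hits, scanned = [], 0
    for start, rows in chunks.items():
        end = start + CHUNK_INTERVAL
        if end <= lo or start >= hi:
            continue  # the chunk's time range cannot match, so skip it entirely
        scanned += 1
        hits.extend(row for row in rows if lo <= row[0] < hi)
    return hits, scanned
```

In this sketch the only metadata is each chunk's time range. The same idea extends to other per-chunk metadata, such as minimum and maximum values of a column, whenever the metadata can prove that a chunk holds no matching rows.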

D

Data Tiering: a storage strategy that automatically moves data between high-performance storage and low-cost object storage based on access patterns and age.

Debezium: an open-source distributed platform for change data capture that enables real-time streaming of database changes.

Dual-Write and Backfill: a migration strategy for large-scale workloads that involves implementing dual writes to both source and target systems while backfilling historical data.

F

Fork: an exact copy of a database at a specific point in time that operates independently after creation, used for testing or troubleshooting.

H

HA Replicas: exact, up-to-date copies of your database hosted in multiple AWS availability zones that automatically take over if the primary node fails.

Hierarchical Continuous Aggregates: CAggs built on top of other CAggs, for example seconds → minutes → hours → days, to reduce computational costs for multi-level aggregations.

Hypercore: TimescaleDB’s hybrid row-columnar storage engine that seamlessly switches between row-oriented and column-oriented storage for optimal performance.

Hypertable: a PostgreSQL table optimized for time-series data that automatically partitions data by time into chunks for improved performance.
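
The idea of building aggregates on top of other aggregates can be sketched in a few lines. This is an illustration only, with a hypothetical `rollup` helper: each level stores a partial state of (sum, count) per bucket so that the next level can be computed from the previous level instead of from raw data. Timestamps are assumed to be distinct integer seconds.

```python
from collections import defaultdict


def rollup(partials, width):
    """Merge {bucket_start: (total, count)} partials into buckets `width` seconds wide."""
    out = defaultdict(lambda: (0.0, 0))
    for start, (total, count) in partials.items():
        key = start - start % width  # start of the wider bucket
        t, c = out[key]
        out[key] = (t + total, c + count)
    return dict(out)


# Raw readings as (epoch_seconds, value); each becomes a one-sample partial.
readings = [(0, 1.0), (30, 3.0), (70, 5.0), (3700, 7.0)]
per_second = rollup({ts: (value, 1) for ts, value in readings}, 1)
per_minute = rollup(per_second, 60)    # computed from per_second, not from readings
per_hour = rollup(per_minute, 3600)    # computed from per_minute

# An average at any level is recovered from the stored partial state.
hourly_avg = {start: total / count for start, (total, count) in per_hour.items()}
```

Storing sums and counts rather than finished averages is the design choice that makes the levels composable: an average of averages would be wrong whenever buckets hold different numbers of samples.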

I

Iceberg Tables: an open table format for large analytical datasets that enables reliable data lake capability with ACID transactions.

Insights: a built-in query monitoring tool that captures per-query statistics in real time, providing visibility into database performance.

I/O Boost: an add-on feature that provides enhanced IOPS and bandwidth performance for demanding workloads.

IoT (Internet of Things): physical objects with embedded computing capabilities that collect sensor data and generate time-series datasets.

L

Live Migration: an end-to-end migration solution that moves databases with minimal downtime using PostgreSQL logical replication and pgcopydb.

Livesync: a feature that enables continuous real-time synchronization between a PostgreSQL source database and a service.

Logical Replication: a PostgreSQL feature that replicates changes to database objects based on their replication identity, used for real-time data streaming.

M

Materialized View: a database object that contains the results of a query and can be refreshed to update the data, used in continuous aggregates.

Multi-tenancy: a system architecture that enables multiple users (tenants) to share the same application or database while keeping their data isolated.

O

Object Storage: a low-cost, scalable storage service built on Amazon S3, used for storing infrequently accessed data in the tiered storage architecture.

OHLCV (Open, High, Low, Close, Volume): standard financial data points used in market analysis and candlestick charts.
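
As a small illustration of the OHLCV entry above, the following sketch derives the five values from the trades in one time interval. The function name `ohlcv` and the (price, volume) tuple layout are assumptions made for this example.

```python
def ohlcv(trades):
    """Summarize one interval of trades, given as (price, volume) in time order."""
    if not trades:
        raise ValueError("at least one trade is required")
    prices = [price for price, _ in trades]
    return {
        "open": prices[0],     # first trade in the interval
        "high": max(prices),
        "low": min(prices),
        "close": prices[-1],   # last trade in the interval
        "volume": sum(volume for _, volume in trades),
    }


candle = ohlcv([(10.0, 5), (12.5, 2), (9.5, 1), (11.0, 4)])
```

One such summary per interval is what a candlestick chart draws as a single candle.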

P

pgcopydb: a PostgreSQL tool used for copying databases efficiently, particularly in live migration scenarios.

PITR (Point-in-Time Recovery): a backup feature that allows restoration of a database to any specific point in time within the retention period.

Pricing Plans: service tiers that determine available features, resources, and support levels.

R

RAG (Retrieval-Augmented Generation): an AI technique that combines retrieval of relevant information with generation capabilities to provide more accurate responses.

Read Replica: a read-only copy of the primary database kept in sync for scaling read operations and analytical workloads.

Read Replica Sets: an improved version of read replicas that allows up to 10 replica nodes behind a single read endpoint for horizontal read scaling.

Real-time Analytics: the capability to process and analyze data as it’s generated, providing immediate insights and enabling quick decision-making.

Rollup Compression: a feature that combines multiple smaller, uncompressed chunks into a single, larger compressed chunk to reduce storage costs.

Rowstore: the uncompressed, row-oriented storage format in TimescaleDB optimized for transactional operations and recent data access.

S

Schema per Tenant: a multi-tenancy approach where each tenant’s data is isolated within its own database schema while sharing the same database instance.

Segmentby: a configuration option in hypertables that determines how data is segmented, typically using frequently queried columns for optimization.

Service: a managed database instance that provides Postgres functionality extended with TimescaleDB capabilities.

Service per Tenant: a multi-tenancy model where each tenant gets a dedicated service for complete data isolation.

T

time_bucket: a function that aggregates data by time intervals, for example 5-minute or 1-hour buckets, for time-series analysis.

Time-series Data: data that represents how a system, process, or behavior changes over time, typically timestamped and sequential.

Tiger Lake: a connector that synchronizes data from services to Iceberg tables in Amazon S3 in real time.

Tiered Storage: a storage architecture that automatically moves data between high-performance and low-cost storage tiers based on usage patterns.

timescaledb-parallel-copy: a tool for efficiently ingesting CSV data into TimescaleDB in parallel for improved performance.

TimescaleDB: an open-source time-series database built on PostgreSQL, providing the core technology behind these services.
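
Aggregating data by fixed time intervals, as in the first entry of this section, can be sketched in plain Python. The helpers `bucket_start` and `average_by_bucket` are hypothetical names, and the alignment origin below is an assumption chosen for the example; the sketch shows the concept only and is not the database function.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

ORIGIN = datetime(2000, 1, 1, tzinfo=timezone.utc)  # assumed alignment origin


def bucket_start(ts: datetime, width: timedelta) -> datetime:
    """Floor ts to the start of its bucket of the given width."""
    return ts - (ts - ORIGIN) % width


def average_by_bucket(rows, width):
    """Average the values of (timestamp, value) rows per time bucket."""
    acc = defaultdict(lambda: [0.0, 0])  # bucket start -> [sum, count]
    for ts, value in rows:
        state = acc[bucket_start(ts, width)]
        state[0] += value
        state[1] += 1
    return {start: total / n for start, (total, n) in sorted(acc.items())}


utc = timezone.utc
rows = [
    (datetime(2024, 5, 1, 12, 1, tzinfo=utc), 10.0),
    (datetime(2024, 5, 1, 12, 4, tzinfo=utc), 20.0),
    (datetime(2024, 5, 1, 12, 7, tzinfo=utc), 40.0),
]
five_minute_avg = average_by_bucket(rows, timedelta(minutes=5))
```

With 5-minute buckets, the first two readings fall into the 12:00 bucket and the third into the 12:05 bucket.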

V

Vector Search: a capability that enables similarity searches on high-dimensional vector data, commonly used in AI and machine learning applications.

VPC (Virtual Private Cloud): a private network environment in AWS that provides network isolation and security for cloud resources.

VPC Peering: a connection between two VPCs that enables private communication between resources in different networks.
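
Similarity search over vectors, as described in the Vector Search entry, can be sketched with a brute-force scan. This is a minimal illustration that assumes cosine similarity as the metric and uses hypothetical names (`cosine_similarity`, `nearest`); production systems typically use an index rather than comparing against every vector.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def nearest(query, items, k=1):
    """Return the names of the k items whose vectors are most similar to query."""
    ranked = sorted(items, key=lambda item: cosine_similarity(query, item[1]), reverse=True)
    return [name for name, _ in ranked[:k]]


items = [("a", [1.0, 0.0]), ("b", [0.0, 1.0]), ("c", [0.9, 0.1])]
top_two = nearest([1.0, 0.0], items, k=2)
```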

W

WAL (Write-Ahead Log): PostgreSQL’s method of ensuring data integrity by writing changes to a log before applying them to the database.

WebSocket: a communication protocol that provides full-duplex communication channels over a single TCP connection, used for real-time data streaming.

Wide Table Layout: a table design approach with many columns (one per metric), suitable when all potential metrics are known upfront.

Workload Isolation: the ability to separate read and write workloads to prevent performance interference and optimize resource utilization.
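
The write-ahead principle from the WAL entry can be shown with a toy key-value store. This is a conceptual sketch with a hypothetical `TinyStore` class: the log is an in-memory list standing in for durable storage, and it does not reflect PostgreSQL's actual log format or recovery process.

```python
class TinyStore:
    """Toy store that records every change in a log before applying it."""

    def __init__(self):
        self.wal = []    # stands in for a durable, append-only log
        self.data = {}   # state that could be lost in a crash

    def put(self, key, value):
        self.wal.append((key, value))  # 1. log the change first
        self.data[key] = value         # 2. then apply it

    def recover(self):
        """Rebuild the state by replaying the log in order."""
        self.data = {}
        for key, value in self.wal:
            self.data[key] = value


store = TinyStore()
store.put("sensor", 1)
store.put("sensor", 2)
store.data = {}    # simulate losing the applied state
store.recover()    # replaying the log restores it
```

Because every change reaches the log before it is applied, replaying the log in order is enough to reconstruct the latest state.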