Sync from S3

You use the S3 connector to synchronize CSV and Parquet files from an S3 bucket in real time. The connector runs continuously, so your database is constantly synced with data from S3. This lets you take full advantage of real-time analytics capabilities without having to develop or manage custom ETL solutions between S3 and your database.

You can use the S3 connector to synchronize your existing and new data. Here is what the connector can do:

Sync data from an S3 bucket:
- Use glob patterns to identify the objects to sync.
- Watch an S3 bucket for new files and import them automatically. The connector runs on a configurable schedule and tracks processed files.
- Important: The connector processes files in lexicographical order. It uses the name of the last file processed as a marker and fetches only files later in the alphabet in subsequent queries. Files added with names earlier in the alphabet than the marker are skipped and never synced. For example, if you add the file Bob when the marker is at Elephant, Bob is never processed.
- For large backlogs, the connector checks every minute until it has caught up.
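
The marker behavior described above can be modeled in a few lines. This is a minimal sketch, not the connector's actual implementation: the function names are hypothetical, and it assumes plain string comparison, which matches S3's key ordering for ASCII names. It also ignores the per-iteration file limit.

```python
def select_new_files(keys, marker):
    """Return keys that sort strictly after the marker, in lexicographical order."""
    return sorted(k for k in keys if marker is None or k > marker)


def run_sync(bucket_keys, marker=None):
    """Process pending keys and return (processed, new_marker)."""
    pending = select_new_files(bucket_keys, marker)
    # The name of the last file processed becomes the marker for the next run.
    new_marker = pending[-1] if pending else marker
    return pending, new_marker


# First run: every file in the bucket is new.
bucket = ["Apple", "Elephant"]
processed, marker = run_sync(bucket)  # marker is now "Elephant"

# "Bob" sorts before the marker, "Zebra" sorts after it.
bucket += ["Bob", "Zebra"]
processed_again, marker_again = run_sync(bucket, marker)
print(processed_again)  # ['Zebra']: "Bob" is skipped and never picked up
```

Because the marker only moves forward, no later run revisits "Bob". Renaming the file so that it sorts after the current marker is the only way to have it synced.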
Sync data from multiple file formats:
- CSV: check for compression in GZ and ZIP format, then process using timescaledb-parallel-copy.
- Parquet: convert to CSV, then process using timescaledb-parallel-copy.
The S3 connector offers an option to enable a hypertable during the file-to-table schema mapping setup. You can enable columnstore and continuous aggregates through the SQL editor once the connector has started running.
By default, the connector polls the S3 source for new data every minute. To customize this interval, set a cron expression.

The S3 connector continuously imports data from an Amazon S3 bucket into your database. It monitors your S3 bucket for new files matching a specified pattern and automatically imports them into your designated database table.

The connector currently only syncs existing and new files. It does not update or delete records in your tables when objects are updated or deleted in S3.

Prerequisites

To follow the steps on this page:

Create a target service with the Real-time analytics capability enabled. You need your connection details.

Ensure access to a standard Amazon S3 bucket containing your data files. Directory buckets are not supported.
Configure access credentials for the S3 bucket. The following credentials are supported:
- IAM Role.
  - Configure the trust policy. Set the:
    - Principal: arn:aws:iam::142548018081:role/timescale-s3-connections.
    - ExternalID: set to the project ID and service ID of the service you are syncing to, in the format <projectId>/<serviceId>. This avoids the confused deputy problem.
  - Give the following access permissions:
    - s3:GetObject.
    - s3:ListBucket.
- Public anonymous user.
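
For the IAM role option, the trust policy and access permissions listed above can be written out as IAM policy documents. The following is a minimal sketch that builds both as JSON, assuming the standard AWS policy document layout. The bucket name and the project and service IDs are placeholders, not values from this page; only the principal ARN and the two S3 actions come from the list above.

```python
import json

# Principal from this page: the role that the connector assumes.
CONNECTOR_PRINCIPAL = "arn:aws:iam::142548018081:role/timescale-s3-connections"


def trust_policy(project_id, service_id):
    """Trust policy that lets the connector role assume your IAM role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": CONNECTOR_PRINCIPAL},
                "Action": "sts:AssumeRole",
                # The external ID guards against the confused deputy problem.
                "Condition": {
                    "StringEquals": {"sts:ExternalId": f"{project_id}/{service_id}"}
                },
            }
        ],
    }


def access_policy(bucket):
    """Permissions policy granting list access to the bucket and read access to its objects."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            # ListBucket applies to the bucket itself.
            {"Effect": "Allow", "Action": "s3:ListBucket", "Resource": f"arn:aws:s3:::{bucket}"},
            # GetObject applies to the objects inside the bucket.
            {"Effect": "Allow", "Action": "s3:GetObject", "Resource": f"arn:aws:s3:::{bucket}/*"},
        ],
    }


# Placeholder values for illustration only.
print(json.dumps(trust_policy("my-project-id", "my-service-id"), indent=2))
print(json.dumps(access_policy("my-bucket"), indent=2))
```

You can paste the printed documents into the IAM console or pass them to your own provisioning tooling. Scoping the object resource to a folder prefix instead of the whole bucket is a reasonable tightening if you only sync one folder.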

This feature is currently not supported for services on Microsoft Azure.

Limitations

File naming: Files must follow lexicographical ordering conventions. Files with names that sort earlier than already-processed files are permanently skipped. Example: if file_2024_01_15.csv has been processed, a file named file_2024_01_10.csv added later will never be synced. Recommended naming patterns: timestamps (for example, YYYY-MM-DD-HHMMSS), sequential numbers with fixed padding (for example, file_00001, file_00002).
CSV:
- Maximum file size: 1 GB. To increase this limit, contact sales@tigerdata.com.
- Maximum row size: 2 MB
- Supported compressed formats:
  - GZ
  - ZIP
- Advanced settings:
  - Delimiter: the default character is a comma (,). You can choose a different delimiter.
  - Skip header: skip the first row if your file has headers.
Parquet:
- Maximum file size: 1 GB
- Maximum row size: 2 MB
Sync iteration: To prevent system overload, the connector tracks up to 100 files for each sync iteration. Additional checks only fill empty queue slots.
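
The file naming and size limits above are easy to check before you upload. The following is a minimal sketch with hypothetical helper names. It assumes that "1 GB" means 10**9 bytes, the more conservative reading, and that names are compared as plain strings.

```python
from datetime import datetime

# Assumption: "1 GB" is read as 10**9 bytes, the more conservative interpretation.
MAX_FILE_BYTES = 1_000_000_000


def timestamped_name(prefix, when, ext="csv"):
    """Build a name such as events_2024-01-15-093000.csv, which sorts chronologically."""
    return f"{prefix}_{when.strftime('%Y-%m-%d-%H%M%S')}.{ext}"


def sequential_name(prefix, n, width=5, ext="csv"):
    """Build a name such as file_00042.csv; fixed padding keeps the sort order numeric."""
    return f"{prefix}_{n:0{width}d}.{ext}"


def preflight(name, size_bytes, last_processed=None):
    """Return a list of reasons why this file would not be synced (empty if none)."""
    problems = []
    if size_bytes > MAX_FILE_BYTES:
        problems.append("file exceeds the 1 GB limit")
    if last_processed is not None and name <= last_processed:
        problems.append("name sorts at or before the last processed file")
    return problems


# Without padding, file 10 sorts before file 9; with padding the order is correct.
print("file_10.csv" < "file_9.csv")                                # True
print(sequential_name("file", 9) < sequential_name("file", 10))    # True

# The example from the limitation above: an older date arrives late.
print(preflight("file_2024_01_10.csv", 1024, last_processed="file_2024_01_15.csv"))
```

The last call reports the naming problem, which mirrors the file_2024_01_10.csv example in the file naming limitation.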

Synchronize data to your service

To sync data from your S3 bucket:

Connect to your service
Select the service to sync live data to.
Connect the source S3 bucket to the target service
1. Click Connectors > Amazon S3.
2. Click the pencil icon, then set the name for the new connector.
3. Set the Bucket name and Authentication method, then click Continue. For instructions on creating the IAM role to connect your S3 bucket, click Learn how. The connector then connects to the source bucket.
4. In Define files to sync, choose the File type and set the Glob pattern. Use the following patterns:
  - <folder name>/*: match all files in a folder. Also, any pattern ending with / is treated as /*.
  - <folder name>/**: match all files recursively.
  - <folder name>/**/*.csv: match a specific file type.
  The connector uses prefix filters where possible, so place wildcard patterns at the end of your glob expression. AWS S3 doesn't support complex filtering. If your expression has to filter too many files, the list operation may time out.
5. Click the search icon. You see the files to sync. Click Continue.
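
To reason about which objects a glob pattern from step 4 selects before you start the connector, you can model the matching locally. This is a minimal sketch with hypothetical function names, not the connector's own matcher. It assumes that * stays within one folder, that ** crosses folder boundaries, and that **/ also matches zero folders. The literal_prefix helper shows the part of a pattern that could be used as an S3 list prefix, which is why wildcards are best placed at the end.

```python
import re


def glob_to_regex(pattern):
    """Translate a glob into a regex: '**' crosses '/', '*' stays within one folder."""
    if pattern.endswith("/"):
        pattern += "*"  # a pattern ending with "/" is treated as "/*"
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            parts.append("(?:.*/)?")  # assumption: zero or more nested folders
            i += 3
        elif pattern.startswith("**", i):
            parts.append(".*")
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts) + r"\Z")


def literal_prefix(pattern):
    """Return the text before the first wildcard, usable as a list prefix."""
    return re.split(r"[*?\[]", pattern, maxsplit=1)[0]


def matching_keys(pattern, keys):
    """Return the keys that the glob pattern selects, in their original order."""
    regex = glob_to_regex(pattern)
    return [k for k in keys if regex.match(k)]


keys = ["data/a.csv", "data/2024/b.csv", "data/2024/c.parquet", "other/d.csv"]
print(matching_keys("data/*", keys))         # ['data/a.csv']
print(matching_keys("data/**/*.csv", keys))  # ['data/a.csv', 'data/2024/b.csv']
print(literal_prefix("data/**/*.csv"))       # data/
print(repr(literal_prefix("**/*.csv")))      # '': no prefix, so every object is listed
```

A pattern that starts with a wildcard has an empty prefix, so every object in the bucket must be listed and filtered. On large buckets this is the case where the list operation is most likely to time out.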
Optimize the data to synchronize in hypertables
The file schema is checked and, if possible, a column is suggested for use as the time dimension in a hypertable.
1. Choose Create a new table for your data or Ingest data to an existing table.
2. Choose the Data type for each column, then click Continue.
3. Choose the polling interval. This can be a minute, an hour, or a custom interval set with a cron expression.
4. Click Start Connector. The connection between the source S3 bucket and the target service starts, and the progress is displayed.
Monitor synchronization
1. To view the amount of data replicated, click Connectors. The diagram in Connector data flow gives you an overview of the connectors you have created, their status, and how much data has been replicated.
2. To view file import statistics and logs, click Connectors > Source connectors, then select the name of your connector in the table.
Manage the connector
1. To pause the connector, click Connectors > Source connectors. Open the three-dot menu next to your connector in the table, then click Pause.
2. To edit the connector, click Connectors > Source connectors. Open the three-dot menu next to your connector in the table, then click Edit and scroll down to Modify your Connector. You must pause the connector before editing it.
3. To delete the connector, click Connectors > Source connectors, then open the three-dot menu next to your connector in the table and select the delete option. You must pause the connector before deleting it.

And that is it. You are using the S3 connector to synchronize all the data, or specific files, from an S3 bucket in real time.
