cloud

What is a Glue Crawler?

An AWS Glue component that automatically scans data sources, infers schemas, and creates or updates table definitions in the Glue Data Catalog.

Detailed Explanation

Crawlers connect to S3 paths, JDBC databases, or DynamoDB tables and sample the data to determine file format, column names, and data types. In S3 they detect partition structures (e.g., year=/month=/day= folder hierarchies) and register the folder keys as partition keys. Crawlers are billed per DPU-hour of run time, so when new partitions arrive frequently and the schema is already known, registering them directly through the Glue API (batch_create_partition) is more cost-efficient than rerunning a crawler.
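
As a rough illustration of the direct-API route, the sketch below registers year=/month=/day= partitions with boto3 instead of running a crawler. It assumes the table already exists in the Data Catalog, that its S3 location follows the folder layout described above, and that AWS credentials are configured. The function names build_partition_input and register_partitions are hypothetical helpers, not part of boto3; the Glue calls used are get_table and batch_create_partition.

```python
from typing import Any


def build_partition_input(
    table_location: str,
    storage_descriptor: dict[str, Any],
    year: str,
    month: str,
    day: str,
) -> dict[str, Any]:
    """Build one PartitionInput entry for a year=/month=/day= S3 layout.

    Hypothetical helper: it assumes the partition keys are (year, month, day),
    in that order, and that partition folders sit directly under the table path.
    """
    # Shallow copy so the table's own descriptor is not mutated.
    sd = dict(storage_descriptor)
    base = table_location.rstrip("/")
    sd["Location"] = f"{base}/year={year}/month={month}/day={day}/"
    return {"Values": [year, month, day], "StorageDescriptor": sd}


def register_partitions(
    database: str,
    table: str,
    dates: list[tuple[str, str, str]],
) -> list[dict[str, Any]]:
    """Register partitions through the Glue API and return any per-partition errors."""
    import boto3  # imported here so the builder above works without AWS access

    glue = boto3.client("glue")

    # Reuse the table's storage descriptor (format, SerDe, columns) for each partition.
    table_sd = glue.get_table(DatabaseName=database, Name=table)["Table"]["StorageDescriptor"]
    inputs = [
        build_partition_input(table_sd["Location"], table_sd, y, m, d)
        for (y, m, d) in dates
    ]

    errors: list[dict[str, Any]] = []
    # BatchCreatePartition accepts at most 100 partitions per call.
    for start in range(0, len(inputs), 100):
        response = glue.batch_create_partition(
            DatabaseName=database,
            TableName=table,
            PartitionInputList=inputs[start:start + 100],
        )
        errors.extend(response.get("Errors", []))
    return errors
```

A call such as register_partitions("analytics", "events", [("2024", "01", "15")]) would register one partition; the database and table names here are placeholders. Partitions that already exist come back in the returned error list rather than raising, so the caller decides whether to ignore them.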

AWS Glue, metadata, schema discovery, Data Catalog