cloud
What is Push-Down Predicate?
An optimization technique where filter conditions are sent to the data source before data is transferred, reducing the amount of data read into memory.
Detailed Explanation
In AWS Glue, push-down predicates on S3 partitioned tables mean that if you filter on year=2024, Glue only reads the year=2024 S3 prefix rather than scanning all partitions. This is critical for performance and cost on large partitioned datasets. Specify predicates using the push_down_predicate parameter in create_dynamic_frame.from_catalog().
Code Example
Examplepython
datasource = glueContext.create_dynamic_frame.from_catalog(
database="db", table_name="orders",
push_down_predicate="(year == 2024 and month == 12)"
)
AWS Glueperformanceoptimizationpartitioning