Postgres anti join8/2/2023 Rows are present to start with only few rows can meet any condition. So, yes, the planner could infer a bit more here - after all, if few LEFT JOIN tmp_san_2 ON tmp_san_1.id = tmp_san_2.id AND Join clause for the planner to make correct estimates again: Instead, it seems, I have to move this condition (inverted) into the WHERE (tmp_san_2.id IS NULL OR tmp_san_2.text IS NULL) įilter: ((tmp_san_2.id IS NULL) OR (tmp_san_2.text IS NULL)) It get's more interesting again, if the text field really could be NULLĪnd I wanted to include those rows. but if the join and the check refer to the same field everything is LEFT JOIN tmp_san_2 ON tmp_san_1.id = tmp_san_2.id So, even a join on the id field is badly estimated if checked on the Join field, but with the selection of the proper field for the Thanks a lot! Right, my problem had nothing to do with the type of the > but it doesn't look much like a typical use-case to me. > Possibly the planner could be smarter about estimating for this case, > but that row chanced to have a null value of id. > antijoin rows, ie tmp_san_1.text *did* have a match in tmp_san_2, > As written, the query could return some rows that weren't actually > join column to be IS NULL, not some random other column. To make it one, you have to constrain the RHS > left join tmp_san_2 on tmp_san_1.text = tmp_san_2.text > resulting row, while there are still of course 9,999 of them: > The same anti-join using the text fields, however estimates just 1 Unfortunately my real-world example has to use varchar for the join. I cannot explain that behavior and much less think of a fix or workaround. =# explain analyze select tmp_san_1.id from tmp_san_1 left join tmp_san_2 on tmp_san_1.text = tmp_san_2.text where tmp_san_2.id is null The same anti-join using the text fields, however estimates just 1 resulting row, while there are still of course 9,999 of them: =# explain analyze select tmp_san_1.id from tmp_san_1 left join tmp_san_2 on tmp_san_1.id = tmp_san_2.id where tmp_san_2.id is null QUERY PLAN The number of resulting rows are estimated correctly as 9,999: Now I do an anti-join between the two tables via the id field (integer). Table "public.tmp_san_2"Column | Type | Modifiers =# \d tmp_san_* Table "public.tmp_san_1"Column | Type | Modifiers =# select generate_series(1, 1) as id, generate_series(1,1)::text as text into table tmp_san_2 =# select generate_series(1, 10000) as id, generate_series(1,10000)::text as text into table tmp_san_1 The first table has 10,000 rows the second table just one: FIrst generate two tables with a series of numbers - once as integers once as text. I did some experiments to extract the problem in a simple form. I have encountered a strange problem when doing an anti-join with a very small table via a varchar or text field as opposed to an integer field.
0 Comments
Leave a Reply.AuthorWrite something about yourself. No need to be fancy, just an overview. ArchivesCategories |