[CALCITE-5406] Support the SELECT DISTINCT ON statement for PostgreSQL dialect#4933
[CALCITE-5406] Support the SELECT DISTINCT ON statement for PostgreSQL dialect#4933xuzifu666 wants to merge 7 commits into
Conversation
99a667c to
a5629e3
Compare
|
"DISTINCT ON" is a unique syntax of PGSQL. Would it be more appropriate to implement the parsing support for this function on the babel parser |
Configuring additional keywords, join types, and binary operators in Adding custom parser methods (such as PostgreSQL's Babel itself does not have a separate syntax file; all SELECT-related syntax must be defined in
QUALIFY is a syntax specific to Snowflake, which Calcite implements in the core: The The The Validator/SqlToRelConverter logic is in the core;
The current Parser unconditionally parses This is consistent with the handling of other extensions such as QUALIFY, LATERAL, and TABLESAMPLE: the Parser is as lenient as possible, and the Validator decides whether to allow based on the dialect. so in my view:The current implementation conforms to Calcite's architectural conventions and does not require migration to Babel. For adjustments, finer-grained switches can be added to SqlConformance to control the availability of DISTINCT ON. By the way: Besides PG, ClickHouse also supports DISTINCT ON. |
|
I attempted to verify this across several databases (though my testing may not have been exhaustive) and indeed found that only PostgreSQL supports this syntax; furthermore, |
Good suggestion, I had add a config to control it in babel. |
| "parserImpls.ftl" | ||
| ] | ||
|
|
||
| includeDistinctOn: true |
There was a problem hiding this comment.
It seems like there’s a better place for this parameter.
There was a problem hiding this comment.
Would it be better to add a test case similar to "SELECT DISTINCT ON (deptno)" here? I tried executing "SELECT DISTINCT ON (deptno)" and it didn't generate any error
There was a problem hiding this comment.
OK, I had added related test cases in SqlParserTest.java.
There was a problem hiding this comment.
Perhaps it's a problem with my input method. I intended to add"SELECT DISTINCTON (deptno) empno" and "SELECT DISTINCT ON(deptno) empno". Scenarios like this, where text is written consecutively, should normally result in an error. However, the outcomes of the two situations I executed did not meet my expectations.
https://www.postgresql.org/docs/current/queries-select-lists.html#QUERIES-DISTINCT
There was a problem hiding this comment.
As far as I know, distinct on needs to be followed by an order by statement.
There was a problem hiding this comment.
As far as I know,
distinct onneeds to be followed by anorder bystatement.
Perhaps I should elaborate and add test cases similar to "SELECT DISTINCTON (deptno) empno, ename FROM emp ORDER BY deptno, empno"
|
|
What is the status of this PR? |
Currently I should had addressed all comments, maybe need a further review @mihaibudiu If you're interested, please help review this PR~ |
| @BaseMessage("SELECT DISTINCT ON requires an ORDER BY clause") | ||
| ExInst<SqlValidatorException> distinctOnRequiresOrderBy(); | ||
|
|
||
| @BaseMessage("SELECT DISTINCT ON expressions must match initial ORDER BY expressions") |
There was a problem hiding this comment.
what does "initial" mean here?
There was a problem hiding this comment.
This could be ambiguous, so I deleted the word.
| // not in the SELECT list. | ||
| final SqlValidatorScope selectScope3 = | ||
| select.isDistinct() | ||
| (select.isDistinct() && !select.isDistinctOn()) |
There was a problem hiding this comment.
So for distinctOn() isDistinct is true as well?
Or you can have SELECT DISTINCT DISTINCT ON(...)?
If yes, I hope there is a test
There was a problem hiding this comment.
And then you won't need an aggregate because the result has only 1 row?
There was a problem hiding this comment.
And then you won't need an aggregate because the result has only 1 row?
Yes, that’s the reason. I also added tests for DISTINCT and DISTINCT ON; in fact, this syntax isn't supported and results in an immediate error.
There was a problem hiding this comment.
SELECT DISTINCT DISTINCT ON(...) is also not support in PostgreSQL, we can refer this link: https://onecompiler.com/postgresql/44sqp4qfd
| * | ||
| * <ul> | ||
| * <li>0: distinct ({@link SqlLiteral})</li> | ||
| * <li>0: keywordList ({@link SqlNodeList})</li> |
There was a problem hiding this comment.
looks like there were some bugs in this comment...
There was a problem hiding this comment.
Yes, there is no need. I changed it.
| .ok(); | ||
|
|
||
| // DISTINCT ON with qualified column reference | ||
| f.withSql("SELECT DISTINCT ON (e.deptno) e.deptno AS x, e.empno " |
There was a problem hiding this comment.
Does this work if the ORDER BY column is not qualified?
Maybe there is such a test already
There was a problem hiding this comment.
It work well, I also added related test in last commit.
| includeIntervalWithoutQualifier: true | ||
| includeStarExclude: true | ||
| includeSelectBy: true | ||
| includeDistinctOn: true |
There was a problem hiding this comment.
I think you need to edit the reference.md grammar too, if it's possible to show babel-only extensions
|
|
||
| if (select.isDistinct()) { | ||
| distinctify(bb, true); | ||
| if (!select.isDistinctOn()) { |
There was a problem hiding this comment.
apparently it's impossible to have both DISTINCT and DISTINCT ON, so perhaps isDistinct() should return 'false' for DISTINCT ON
There was a problem hiding this comment.
You are right, I changed here.
|
What's going on with this PR? |
Sorry! I missed this message earlier. I will fix the address related issues soon. Thank you very much for the review. @mihaibudiu |
f7012ed to
9107937
Compare
|
I think I had addressed the comments, PTAL @mihaibudiu |
| SELECT * REPLACE list contains unknown column(s): DEPTNO | ||
| !error | ||
|
|
||
| # [CALCITE-5406] Support the SELECT DISTINCT ON statement for PostgreSQL dialect |
There was a problem hiding this comment.
I think I made this suggestion before: if these were validated on postgres, pleas say so in the comment, to spare future reviewers the need to validate these. I have seen tests with incorrect results in the past.
There was a problem hiding this comment.
I sincerely apologize; this taught me a valuable lesson. This time, I conducted rigorous testing and cross-verification using PostgreSQL, and the results matched perfectly. Comment had been added and refer to the result: https://onecompiler.com/postgresql/44sqjg9t6
| fetch = operand; | ||
| break; | ||
| case 11: | ||
| hints = (SqlNodeList) operand; |
There was a problem hiding this comment.
this looks like a bug you are fixing
There was a problem hiding this comment.
This is a pre-existing bug. While adding case 12 for DISTINCT ON, I also filled in the missing case 11 (hint handling). The code is now correct.
| return distinctOn; | ||
| } | ||
|
|
||
| public void setDistinctOn(@Nullable SqlNodeList distinctOn) { |
There was a problem hiding this comment.
is this needed?
In general immutable data structures are preferred.
There was a problem hiding this comment.
Yes, it is not needed; the corresponding code has been removed.
|
I’ve addressed the recent relevant comments; please help to review them again. @mihaibudiu |
|
|
I have approved |



jira: https://issues.apache.org/jira/browse/CALCITE-5406