Druid官方文档翻译:教程查询数据

Tutorial: Querying data

This tutorial will demonstrate how to query data in Druid, with examples for Druid’s native query format and Druid SQL.

The tutorial assumes that you’ve already completed one of the 4 ingestion tutorials, as we will be querying the sample Wikipedia edits data.

本教程将演示如何在Druid中查询数据,并举例说明Druid的原始JSON查询格式和Druid SQL。

本教程假设您已经完成了4个摄取教程中的一个,因为我们将查询Wikipedia编辑数据的示例。

Native JSON queries

Druid’s native query format is expressed in JSON. We have included a sample native TopN query under quickstart/tutorial/wikipedia-top-pages.json:

Druid原始查询格式使用JSON表达。我们包含一个示例本机TopN查询

 {
“queryType” : “topN”,
“dataSource” : “wikipedia”,
“intervals” : [“2015-09-12/2015-09-13”],
“granularity” : “all”,
“dimension” : “page”,
“metric” : “count”,
“threshold” : 10,
“aggregations” : [
{
“type” : “count”,
“name” : “count”
}]
}

This query retrieves the 10 Wikipedia pages with the most page edits on 2015-09-12.

这个查询获取在Wikipedia上2015-09-12被编辑最多的10个页面

Let’s submit this query to the Druid broker:(提交这个查询到Druid broker)

 curl -X ‘POST’ -H ‘Content-Type:application/json’ -d @quickstart/tutorial/wikipedia-top-pages.json http://localhost:8082/druid/v2?pretty

You should see the following query results:(你应该可以看到如下返回查询结果)

 [ {
“timestamp” : “2015-09-12T00:46:58.771Z”,
“result” : [ {
“count” : 33,
“page” : “Wikipedia:Vandalismusmeldung”
}, {
“count” : 28,
“page” : “User:Cyde/List of candidates for speedy deletion/Subpage”
}, {
“count” : 27,
“page” : “Jeremy Corbyn”
}, {
“count” : 21,
“page” : “Wikipedia:Administrators’ noticeboard/Incidents”
}, {
“count” : 20,
“page” : “Flavia Pennetta”
}, {
“count” : 18,
“page” : “Total Drama Presents: The Ridonculous Race”
}, {
“count” : 18,
“page” : “User talk:Dudeperson176123”
}, {
“count” : 18,
“page” : “Wikipédia:Le Bistro/12 septembre 2015”
}, {
“count” : 17,
“page” : “Wikipedia:In the news/Candidates”
}, {
“count” : 17,
“page” : “Wikipedia:Requests for page protection”
} ]
} ]

Druid SQL queries

Druid also supports a dialect of SQL for querying. Let’s run a SQL query that is equivalent to the native JSON query shown above:

Druid SQL查询
Druid也支持SQL语法查询。让我们运行一个与上面所示的本地JSON查询等效的SQL查询:

SELECT page, COUNT(*) AS Edits FROM wikipedia WHERE “__time” BETWEEN TIMESTAMP ‘2015-09-12 00:00:00’ AND TIMESTAMP ‘2015-09-13 00:00:00’ GROUP BY page ORDER BY Edits DESC LIMIT 10;

The SQL queries are submitted as JSON over HTTP.

这个SQL查询通过HTTP JSON提交

TopN query example

The tutorial package includes an example file that contains the SQL query shown above at quickstart/tutorial/wikipedia-top-pages-sql.json. Let’s submit that query to the Druid broker:

curl -X ‘POST’ -H ‘Content-Type:application/json’ -d @quickstart/tutorial/wikipedia-top-pages-sql.json http://localhost:8082/druid/v2/sql

The following results should be returned:

 [
{
“page”: “Wikipedia:Vandalismusmeldung”,
“Edits”: 33
},
{
“page”: “User:Cyde/List of candidates for speedy deletion/Subpage”,
“Edits”: 28
},
{
“page”: “Jeremy Corbyn”,
“Edits”: 27
},
{
“page”: “Wikipedia:Administrators’ noticeboard/Incidents”,
“Edits”: 21
},
{
“page”: “Flavia Pennetta”,
“Edits”: 20
},
{
“page”: “Total Drama Presents: The Ridonculous Race”,
“Edits”: 18
},
{
“page”: “User talk:Dudeperson176123”,
“Edits”: 18
},
{
“page”: “Wikipédia:Le Bistro/12 septembre 2015”,
“Edits”: 18
},
{
“page”: “Wikipedia:In the news/Candidates”,
“Edits”: 17
},
{
“page”: “Wikipedia:Requests for page protection”,
“Edits”: 17
}
]

dsql client

For convenience, the Druid package includes a SQL command-line client, located at bin/dsql from the Druid package root.

交互方式查询SQL,Druid包包括一个SQL命令行客户端,它位于Druid包根目录的bin/dsql。

Let’s now run bin/dsql; you should see the following prompt:

 Welcome to dsql, the command-line client for Druid SQL.
Type “\h” for help.
dsql>

To submit the query, paste it to the dsql prompt and press enter:

 dsql> SELECT page, COUNT(*) AS Edits FROM wikipedia WHERE “__time” BETWEEN TIMESTAMP ‘2015-09-12 00:00:00’ AND TIMESTAMP ‘2015-09-13 00:00:00′ GROUP BY page ORDER BY Edits DESC LIMIT 10;
┌──────────────────────────────────────────────────────────┬───────┐
│ page │ Edits │
├──────────────────────────────────────────────────────────┼───────┤
│ Wikipedia:Vandalismusmeldung │ 33 │
│ User:Cyde/List of candidates for speedy deletion/Subpage │ 28 │
│ Jeremy Corbyn │ 27 │
│ Wikipedia:Administrators’ noticeboard/Incidents │ 21 │
│ Flavia Pennetta │ 20 │
│ Total Drama Presents: The Ridonculous Race │ 18 │
│ User talk:Dudeperson176123 │ 18 │
│ Wikipédia:Le Bistro/12 septembre 2015 │ 18 │
│ Wikipedia:In the news/Candidates │ 17 │
│ Wikipedia:Requests for page protection │ 17 │
└──────────────────────────────────────────────────────────┴───────┘
Retrieved 10 rows in 0.06s.

Additional Druid SQL queries

Timeseries

SELECT FLOOR(__time to HOUR) AS HourTime, SUM(deleted) AS LinesDeleted FROM wikipedia WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 00:00:00' AND TIMESTAMP '2015-09-13 00:00:00' GROUP BY FLOOR(__time to HOUR);

GroupBy

SELECT channel, SUM(added) FROM wikipedia WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 00:00:00' AND TIMESTAMP '2015-09-13 00:00:00' GROUP BY channel ORDER BY SUM(added) DESC LIMIT 5;

Scan

SELECT user, page FROM wikipedia WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 02:00:00' AND TIMESTAMP '2015-09-12 03:00:00' LIMIT 5;

EXPLAIN PLAN FOR

By prepending EXPLAIN PLAN FOR to a Druid SQL query, it is possible to see what native Druid queries a SQL query will plan into.

Using the TopN query above as an example:

EXPLAIN PLAN FOR SELECT page, COUNT(*) AS Edits FROM wikipedia WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 00:00:00' AND TIMESTAMP '2015-09-13 00:00:00' GROUP BY page ORDER BY Edits DESC LIMIT 10;

Further reading

The Queries documentation has more information on Druid’s native JSON queries.

The Druid SQL documentation has more information on using Druid SQL queries.

进一步阅读

Queries文档中有关于Druid的本地JSON查询的更多信息。

Druid SQL文档中有更多关于使用Druid SQL查询的信息。

推荐文章

沪公网安备 31010702002009号