Hands-On Machine Learning on Google Cloud Platform
上QQ阅读APP看书,第一时间看更新

Querying the database

So far we have seen how to move from unstructured information to structured information. In particular, we have learned that this operation requires a lot of resources. The need for properly structured data derives from the need to search for information in a database to extract knowledge.

Any Database management system (DBMS) provides a very powerful tool for consulting the contents of database tables: queries.

A DBMS, is a software system designed to allow the creation, manipulation (by a DBA administrator) and the efficient query (by one or more client users) of databases, so also called a database manager or an engine, and hosted on a dedicated hardware architecture (server) or on a simple computer.

A query is a method for querying the database, that is, to display information extracted from the tables. For this purpose it is possible to perform several operations:

  • Filter the data contained in a table according to various criteria
  • Reassociate data contained in different tables
  • Choose which fields to view
  • Sort the result based on the values of some fields
  • Group (aggregate) records that have the same values in a certain field

Queries can be executed to display the result only once, or they can be saved in the database to be executed several times. If you use a query again after changing the table, the query result returns the new table contents.

Suppose, for example, that you have a library database containing two tables:

  • The Books table, represents the books in the catalog and contains the fields ID_Book (primary key), Title, Author, Year, Price, Publisher
  • The Publisher table, represents the publishing houses with which the library has contacts and includes the fields Publishing House (primary key), City, Telephone

Think of wanting to define a query that shows the title and price of all books. This query will need to access the Books table and present the user with only the two fields chosen (Title and Price). In case the Books table contains 80 volumes, the query will display 80 Title and Price pairs in the result. If one of the records were deleted from the Books table, rerunning the already-defined query would result in only 79 Title and Price pairs, without having to intervene on the definition of the query.