Data Base

DDL.“DDL是数据定义语言的缩写。定义了不同的数据段数据库表格索引和其他数据库对象。普通语句的关键词主要包括CreateDropalter等。

RDBMS Relational Database Management System

Data Base

Lecture1 Introduction

Ted Codd. defines the relational data model

Oracle releases first commercial relational database

Rational Algebrea is the theoretical foundation for relational databases

Key

Duplicates are forbidden in relational tables

key可以是单独的一列，也可以是由多列组合成的

Primary Key

Normalization

“First Normal Form” (1NF)

• Each column should only contain ONE piece of information

一般就把一列拆成几列或者几个表

Normal Form 有UNF（universal），1NF到6NF

Every non key attribute must provide a fact about the key, the whole key, and nothing but the key——William Kent (1936 – 2005)

Entity（实体） and Relationship

• Entity Relationship Diagram (E/R Diagram, ER Diagram, ERD)

A way of representing entity tables and their relationships (relationship tables)

Data Definition Language (DDL)

DDL compiler generates a set of table templates stored in a data dictionary

• Database schema

• Integrity constraints (primary key, etc.)

• Authorization (who can access it)

Data Manipulation Language (DML)

• Language for accessing and updating the data organized by the appropriate data model (also known as query language)

SQL query language

• Takes several tables as input (possibly only one) and always returns a single table

Database Applications

• Database applications are usually partitioned into two or three parts

• Application programs generally access databases through one of

Language extensions to allow embedded SQL

Application Program Interface (e.g., ODBC/JDBC) which allow SQL queries to be sent to a database

Lecture2 Introduction to SQL

history: ALPHA Codd’s querying language

SEQUEL: A Structured English Query Language

IBM Sequel language developed as part of System R project at the IBM San Jose Research Laboratory

It is now renamed as Structured Query Language (SQL)

Standardizaion of SQL

ANSI and ISO standard SQL:

• SQL-86

• SQL-89

• SQL-92

• SQL:1999 (language name became Y2K compliant!)

• SQL:2003

• Commercial systems offer most, if not all, SQL-92 features, plus varying feature sets from later standards and special proprietary features

• Not all examples here may work on your particular system.

其他类似的查询语言有：

QUEL--Ingres (后来升级成了postgres的前身)
QBE（Query by Example），IBM开发的。

Basic Syntax of SQL

select * from lab where time = '3-34';

数据库语言有两个重要的部分，contents (data) ，containers (tables) 所以就有了DDL 和DML

Query Language

Data Definition Language (DDL)

create
alter
drop

Data Manipulation Language (DML)

select
insert
delete
update

Data Types

Text data types

• char(length) -- fixed-length strings

• varchar(max length) -- non-fixed-length text

• varchar2(max length) -- Oracle’s transformation of varchar

• clob -- very long text (like GB-level text) 在MySQL中就是text

Numerical types

• int -- Integer (a finite subset of the integers that is machine-dependent)

• float(n) -- Floating point number, with user-specified precision of at least n digits

• real -- Floating point and double-precision floating point numbers, with machine-dependent precision

• numeric(p, d)

• Fixed point number, with user-specified precision of p digits, with d digits to the right of decimal point

• E.g., numeric(3,1), allows 44.5 to be stores exactly, but not 444.5 or 0.32 • In SQL Server, it is also called decimal

Date types

• date -- YYYY-MM-DD

• datetime -- YYYY-MM-DD HH:mm:SS

• timestamp -- YYYY-MM-DD HH:mm:SS

• But it is in the UNIX timestamp

• Value range: 1970-01-01 00:00:01 UTC - 2038-01-19 03:14:07 UTC

• More reading about the “Year 2038 Problem” of the timestamp data type: https://en.wikipedia.org/wiki/Year_2038_problem

Binary data types

• raw(max length)

• varbinary(max length)

• blob -- binary large object

• bytea -- used in PostgreSQL

DDL

CREATE TABLE

CREATE TABLE table_name(
	column1 datatype,
	column2 datatype,
	column3 datatype,
	column4 datatype
);

对大多数sql语言来说，table_name 是大小写不敏感的。keywords（create table）也是。

推荐使用下划线命名法。

双引号""代表着case-sensitive

Constraints 由 Chris Date 先提出的来保证数据的准确性

Constraints are declarative rules that the DBMS will check every time new data will be added, when data is changed, or even when data is deleted, in order to prevent any inconsistency.

DROP TABLE

DROP TABLE table_name;

ALTER TABLE

ALTER TABLE table_name ADD column_name datatype;
ALTER TABLE table_name DROP COLUMN column_name;
ALTER TABLE table_name ALTER COLUMN column_name TYPE datatype; -- 更改某列的类型。

ALTER TABLE table_name
ADD CONSTRAINT MyPrimaryKey PRIMARY KEY (column1, column2...);
ALTER TABLE table_name MODIFY column_name datatype NOT NULL;
ALTER TABLE table_name
ADD CONSTRAINT MyUniqueConstraint UNIQUE(column1, column2...);
ALTER TABLE table_name
ADD CONSTRAINT MyUniqueConstraint CHECK (CONDITION);

ALTER TABLE 主要用于添加，删除，或者修改现有表中的列，也可以用于在现有表格上添加和删除各种约束。

Constraints

Not NULL

CREATE TABLE table_name(
	column1 datatype not null,
	column2 datatype,
	column3 datatype,
	column4 datatype
);

Primary Key

CREATE TABLE table_name(
	column1 datatype not null primary key,
	column2 datatype,
	column3 datatype,
	column4 datatype,
);

primary key implies not null, 所以此处的 not null 是多余的

Unique

CREATE TABLE table_name(
	column1 datatype not null primary key,
	column2 datatype,
	column3 datatype,
	column4 datatype,
    unique (column1, column2)
);

CREATE TABLE table_name(
	column1 datatype not null primary key,
	column2 datatype unique,
	column3 datatype,
	column4 datatype,
);

unique可以是一列也可以是多列

Check

CREATE TABLE table_name(
	column1 datatype not null primary key,
	column2 datatype,
	column3 datatype,
	column4 datatype,
    unique (column1, column2),
    check(column1 >= 0),
    constraint validate_1 check(column2 >= 0) 
);

trick: 在名字中存在大小写不好去重时，可以用upper(name)全部转化成大写。

可以给每一个check取名字。如果每取，postgreSQL会生成一个默认的名字。

Referential Integrity 参照完整性/应用完整性

check语句是静态的，为了应对动态的变化，所以引入了 foreign key

Foreign Key

CREATE TABLE table_name(
	column1 datatype not null primary key,
	column2 datatype,
	column3 datatype,
	column4 datatype,
    unique (column1, column2),
    check(column1 >= 0),
    constraint validate_1 check(column2 >= 0),
    foreign key(column1) references table_2 (column3)
);

在大数据应用时，外键的使用要格外注意。

外键与级联更新更适用于单机低并发，不适合分布式、高并发集群。

Comments 注释

/*Multi-line
comments */
-- Single line comments
// Some DBMS support this one.

DML

INSERT

insert into table1(column1, column3, column6) values('1',1,'ds-dfds')

escape charactor: 文本中含'单引号怎么办，外层用双引号。

Lecture 3 Retrieving Data from One Table

What are you installing when you install PostgreSQL?

• Server-side program: the database management system itself

• Client-side program: the client tools to manipulate the server via networks

Search for the keywords after the class:

• basics in computer networking (IP address, port, client, server, web browser, HTTP)

• client-server architecture, browser-server architecture

超文本传输协议（Hyper Text Transfer Protocol，HTTP）是一个简单的请求-响应协议，它通常运行在TCP之上。它指定了客户端可能发送给服务器什么样的消息以及得到什么样的响应。

SELECT

select * from movies where country = 'us'

select 在交互程序可以多用，但是在应用程序中不应使用。

字符串常量应该用单引号括起来。

用法：

用法1. A common way to test expressions

select '437'

用法2. Can give the column a name

select '437' as FOO

用法3. 用于生成一个n行一列的表格，每个元素值都为'A'

select 'A' from movies

用法4. 用于对列进行算术计算

select runtime from movies;

select runtime * 10 as runtime10 from movies;

Logical Connectives

and, or, not

not: < > 或者 !=

select * from movies
where ( country = 'us' or country = 'gb' ) and (year_released between 1940 and 1949);

in()

select * from movies
where country = 'us' or country = 'gb';

--equivalent to

select * from movies
where coutry in ('us', 'gb');

Negation

key: not

select * from movies
where coutry not in ('us', 'gb');

Like

a kind of regex (regular expression) 正则表达式

% meaning "any number of characters, including none”

_ meaning "one and only one character"

https://www.jianshu.com/p/6dfeb91d6b51

如果要查%和_，就用escape character

select * from movies where upper(title) not like '%A%';
select username from gg_user where username like '%xiao/_%' escape '/';  
select username from gg_user where username like '%xiao/%%' escape '/';

Date

Date Formats

DD/MM/YYYY

MM/DD/YYYY

YYYY/MM/DD

select * from forum_posts where post_date >= '2018-03-12';
select * from forum_posts where post_date >= date('2018-03-12');
select * from forum_posts where post_date >= date('12 March, 2018');

select date_eq_timestamp(date('2018-03-12'), date('2018-02-12') + inteval '1 month'); --true

NULL

在其他很多语言中null是一个值

在sql中，null 不是一个值

select * from movies where runtime is null;
select * from movies where runtime = null; -- warning in DataGrip; not the same as 'is null'

Some Functions

Show DDL of a table

desc movies; -- Oracle, MySQL
describle table movies; --IBM DB2
\d movies -- PostgreSQL
.schema movies --SQLite

Concatenating Two Strings

Most products use || (two vertical bars) to indicate string concatenation

• SQL Server, though, uses +

• MySQL a special concat() function that also exists in some other products

select title 
	|| 'was released in '
	|| year_released movie_release
from movies
where country = 'us';

相当于选择格式为title || 'was released in ' || year_released ，把他们连接所形成的string 放在一列叫做 movie_release

select title
	|| 'was release in '
	|| cast(year_released as varchar) movie_release
from movies
where country = 'us'

cast(year_released as varchar): 把 year_released 这个数据类型原本是integer的转换成string

When to use functions

Computing age

You should never store an age; it changes all the time！！！

select peopleid, surname,
	date_part('year', now()) - born as age
from people
where died is null;

date_part 用于从一些时间标识符领域提取资源，like

century
decade
year
month
day
hour
minute
second
microseconds
milliseconds
dow
doy
epoch
isodow
isoyear
timezone
timezone_hour
timezone_minute

Numerical functions

round(3.141592,3) --3.142
trunc(3.141592,3) --3.141

round 四舍五入，trunc截断

String functions

upper('Citizen Kane')
lower('Citizen Kane')
substr('Citizen Kane',5,3) -- 'zen'
trim('   Oops    ') -- 'Oops'
replace('Sheep', 'ee', 'i') --'Ship'

Type casting

select cast(born as char)||'abc' from people;
select cast(born as char(2)) ||'abc' from people;

Case When

select peopleid, surname,
	case gender
		when 'M' then 'male'
		when 'F' then 'female'
	end as gender_2
from people
where died is null;

select peopleid,surname,
	case(date_part('year', now()) - born > 44)
		when true then 'older than 44'
		when false then 'younger than 44'
		else '44 years old'
	end as status
from people
where died is null;

Lecture4 More on Retrieving Data

retrieving data: 检索数据

it's not a relation because many rows cannot be distinguished

Retrieving Data from a Single Table

distinct

select distinct country from movies 
where year_released < 2000;

select distince country, year_released from movies where year_released in (2000,2001);

计算先后：先算where，再去重

第二个语句中distinct是作用在（country，year_released）上的。

aggregate functions (count)

聚合函数

Aggregate function will aggregate all rows that share a feature (such as being movies from the same country) and return a characteristic of each group of aggregated rows.

select country, count(*) number_of_movies
from movies group by country;

select country, year_releases count(*) number_of_movies_per_year from movies group by country, year_released;

notice: group by 后面的列要和聚合函数里的列一一对应。

number_of_movies 是生成的列的名称。

注意：执行顺序是先aggregate 再select from，所以时间复杂度较高。

其他常见的aggregate function有 count(*)/count(col), min(col), max(col), stddev(col), avg(col), 例如：

select country,min(year_released) oldest_movie
from movies group by country;

select * from(
	select country,
	min(year_released) oldest_movie
	from movies
	group by country
	) earliest_movies_per_country
where oldest_movie < 1940;

双重select，外层的select是在内层的结果集上进行查找。

However, 也可以不用双重select，用 having 来实现。

select country,
	min(year_released) oldest_movie
from movies
group by country
having min(year_release d) < 1940

注意：aggregate rows 隐喻着排序。

在DMS（database management systems）中有个"query optimizer" 会搞一些事情。。。可能会使情况变好/变坏。

aggregate functions 会忽视 null

count(*) 会count所有row包括null，而count(col) 只count非null的row

select count(distinct colname) 只count colname 中有多少个distinct的

既是演员又是导演

select count(*) number_of_acting_directors
from (
	select peopleid, count(*) as number_of_roles
	from(
		select distinct peopleid, credit_as 
		from credits
		where credits_as in('A','D')
		) all_actors_and_directors
	group by peopleid
	having count(*) == 2) acting_directors;

Lecture5 Retrieving Data from Multiple Tables

JOIN

select title, country_name, year_released
from movies
join countries
on country_code = country
where country_code <> 'us';

natural join

select * from people natural join credits;
-- The same as:
select *
from people join credits
on people.peopleid = credits.peopleid;
-- Or use "using"
select *
from people natural join credits using (peopleid);
-- A better practice : just write all of them in a unified way
select *
from people join credits
on people.peopleid = credits.peopleid;

if we don't specify the column.

如果要join的表中有两列名称相同，natural join 会自动找出来join，最终表格中只保留名字不变的一列。

Notice:

“If a column has the same name, then we should join on it”

• Bad idea!

• Same name != Same meaning

It is preferred not to depend on how database designers name their columns

Self Join

Join the same table together

• For example: How can we find all the pairs of people with the same first name?

select *
from people p1 join people p2 -- rename the tables, or you cannot refer to them respectively
on p1.first_name = p2.first_name -- p1 = first people table; p2 = second people table
where p1.peopleid <> p2.peopleid;

Chaining Joins Together

Example: Show names of actors and directors for Chinese movies

select m.title, c.credited_as, p.first_name, p.surname
from 
	movies m join credits c on movieid = c.movieid join people p on c.peopleid = p.peopleid
where m.country = 'cn';

上面的代码把 movies rename成 m

join was introduced in SQL-1999

join 默认是inner，不然就需要left join/right join/full join

Left outer join

select * from movies m left join credits c on m.movieid = c.movieid
where m.year_released = 2018;

select c.country_name, number_of_movies
from countries c left join(
    select country as stat_country_code,
    	count(*) as number_of_movies
    from movies
    group by country
) stat
on c.country_code = stat_country_code;

在合并表格的同时replace nulls

select c.country_name,
	case
		when stat.number_of_movies is null then 0
		else stat.number_of_movies
	end
from countries c left join(
	select country as stat_country_code,
		count(*) as number_of_movies
	from movies
	group by country
) stat
on c.country_code = stat_country_code;

Set Operators

Unoin

Takes two result sets and combines them into a single result set

把两个五列的表格unoin在一起会得到一个五列的表格，把他们join在一起是十列表格。

Union requires two (commonsensical) conditions:

• They must return the same number of columns

• The data types of corresponding columns must match.

select movieid
from movies
where country = 'us'
union
select movieid
from movies
where country = 'gb'

Usage scenario: combine movies from two tables, one for standard accounts and one for VIP accounts

(select movieid
from movies limit 5 offset 0)
union
(select movieid
from movies limit 5 offset 3)

limit 表示读取5条数据，offset 表示跳过 3个数据，即从第4个数据开始读5个。limit也可单独使用。

union 会自动去重，如果不要自动去重，就用union all

intersect

Return the rows that appears in both tables

except

Return the rows that appear in the first table but not the second one

Sometimes written as minus in some database products

However, they are not used as much as union

• intersect -> inner join

• except -> left outer join with an “is null” condition

Subquery

Subquery after Select

Example: show titles, released years, and country names for non-US movies

-- solution 1 : join
select m.title, m.year_released, c.country_name
from movies m join countries c
on m.country  = c.country_code
where m.country <> 'us';
-- solution 2 : Nested selection
select m.title,
	m.year_released,
	(
        select c.country_name
        from countries c
        where c.country_code = m.country
    ) country_name
from movies m
where m.country <> 'us';

Subquery after Where

Example: Select all European movies

-- solution1:list them all
select country, year_released, title
from movies
where country in('fr','de',...)

-- proper solution: use subquery after where
select country, year_released, title
from movies
where country in(
    select country_code
    from countries
    where continent = 'EUROPE'
);

Especially useful when the table in the subquery changes often

Some products (Oracle, DB2, PostgreSQL with some twisting) even allow comparing a set of column values (the correct word is "tuple") to the result of a subquery. e.g.

(col1, col2) in
	(select col3, col4
	from t
	where ...)

Notice:

in() means an implicit distinct in the subquery

• in(‘cn’, ‘us’, ‘cn’, ‘us’, ‘us’) is equal to in(‘cn’, ‘us’)

null values in in()

• Be extremely cautious if you are using not in(…) with a null value in it

value not in(2, 3, null)

等价于 not (value=2 or value=3 or value=null)

等价于 value<>2 and value<>3 and value<>null

因为null不是一个数，所以对任意的value，value = null 和value <> null 都永远为 false。所以not in(2,3,null) 的返回值一直是false。

Lecture6 Update，Delete，Applications

CRUD/CURD（就是顺序不一样，都是指数据库中最基本的操作） crud是指在做计算处理时的增加(Create)、检索(Retrieve)、更新(Update)和删除(Delete)几个单词的首字母简写。crud主要被用在描述软件系统中数据库或者持久层的基本操作功能。

• In SQL: insert, select, update, delete

• In RESTful API: Post, Get, Put, Delete

Update

When you are doing any experiments with writing operations (update, delete), backup first

在操作之前，先用select看一下效果

select replace('von Neumann', 'von ', '') || '(von)';

然后再操作

update people
set surname = replace(surname, 'von ', '') || '(von)'
where surname like 'von %';

如果surname中不含von，就是在末尾加上（von）

如果没有where的限制，会对整张表进行更改。在一些IDEs中如datagrip会有warning

update people p
set num_movies = (
	select count(*) from credits c where c.peopleid = p.peopleid
	);

Delete

delete from countries
where country_code = 'us';

many database products provide a roll-back mechanism when deleting rows 意思是如果执行到某一行出错了，之前已经执行了的delete的操作也会被恢复。

Application Programs and User Interfaces

Applications split into

• front-end : user interface

• Forms

• Graphical user interfaces

• Many interfaces are Web-based

• middle layer

• backend

Application Architecture Evolution

Three distinct era’s of application architecture

• Mainframe (1960’s and 70’s)

• Personal computer era (1980’s)

• Web era (mid 1990’s onwards)

• Web and Smartphone era (2010 onwards)

Web Interface

Web browsers have become the de-facto(事实上) standard user interface to databases

The World Wide Web

Most Web documents are hypertext documents formatted via the HyperText Markup Language (HTML)

HTML documents contain

• text along with font specifications, and other formatting instructions

• hypertext links to other documents, which can be associated with regions of the text.

• forms, enabling users to enter data which can then be sent back to the Web server

Three-Layer Web Architecture

HTML and HTTP

• HTML provides formatting, hypertext link, and image display features

• including tables, stylesheets (to alter default formatting), etc.

• HTML also provides input features

• Select from a set of options

• Pop-up menus, radio buttons, check lists

• Enter values

• Text boxes • Filled in input sent back to the server, to be acted upon by an executable at the server

• HyperText Transfer Protocol (HTTP) used for communication with the Web server

JavaScript

Lecture 7 - 1:More about NULL; Ordering; Window Function

NULL

大部分和null一起的运算结果都为null

col+NULL -> NULL
(col > NULL) -> NULL

需要注意的logical operator（and，or）

TRUE and NULL -> NULL
FALSE and NULL -> FALSE

TRUE or NULL -> TRUE
FALSE or NULL -> NULL

col is NULL -> Ture or False

-- 注意
value not in (2,3,null) -- 这句表达式的值不可能为true
-- 可以理解为 value not in (2,3) and (value <> NULL)
-- (value <> NULL) 的取值只可能为NULL

Ordering in SQL

order ... by ... asc/desc

可以 order by 多列

单表可以 order by

多表 join 时，可以 order by join 前的一个表的一列。

还可以有自定义的顺序

select * from credits
order by
	case credited_as
		when 'D' then 1
		when 'A' then 2
	end desc

data types in ordering

NULL: 在不同的数据库中有不同的级别

Oracle/PostgreSQL：NULL greater than anything

SQL Server， MySQL， SQLite：NULL smaller than anything

limit offset

limit k offset p

Return the top-k rows in the result set and skip the first p rows

select * from movies
where country = 'us'
order by year_released
limit 10 offset 5

注意：limit offset 一定和要 order by 一起使用

Lecture8 - Window Function

Scalar Functions vs Aggregation Functions

Scalar Function: 使用某行的数据或某个数据

Aggregation Functions:使用多行的数据

Window Function

select title, country, year_released,
	rank() over(partition by country order by year_released) movie_rank
from movies;

select country,title, year_released, min(year_released) over (partition by country order by year_released) movie_rank
from movies;

aggregate funcion 只能返回一个数据， window function会多加上一列

Lection10 Function

Procedural vs Declarative

Procedural language 过程式语言

Imperative: Describe the algorithms step-by-step (How to do)

• Procedural: C (and many other legacy languages) procedural 类似命令型的语言

• Object-oriented: Java

Declarative: Describe the result without specifying the detailed steps (What to do)

• (Pure) declarative: SQL, Regular Expressions, Markup (HTML, XML), CSS

• Functional: Scheme, Haskell, Scala, Erlang

• Logic programming: Prolog

Procedural Extension to SQL

Many DBMS products provide a procedural （私有的，每个DBMS会不一样）extension to the standard SQL

SQL Server Transact-SQL

ORACLE PL/SQL

PostgreSQL PL/PGSQL

MySQL (No specific name)

SQLite (Not supported) … well, sometimes SQLite is even not considered a DBMS

select case
	when first_name is null then ''
	else first_name || ''
end || case position('(' in surname)
	when 0 then surname
	else trim(')' from substr(surname, postition('(' in surname) + 1)) || '' || trim(substr(surname, 1, position('(' in surname) - 1))
end
from people
where surname not like '%(von)';


create function function_name(p_fname varchar, p_sname varchar)
returns varchar
as $$
begin
	return case
		when p_fname is null then ''
		else p_fname || ''
	end || case position('(' in surname)
		when 0 then surname
		else trim(')' from substr(surname, postition('(' in surname) + 1)) || '' || trim(substr(surname, 1, position('(' in surname) - 1))
	end;
end;
$$ language plpgsql;

Language Type

PostgreSQL supports 4 procedural languages: PL/pgSQL, PL/Tcl, PL/Perl, and PL/Python (Tcl, Perl, Python 都是著名的scripting language)

Trim

built-in function 去掉一个字符串中的(第一个遇到的）指定字符。默认是去空格，直接用trim（）的话，默认是使用BTRIM（）

trim (LEADING FROM string)
LTRIM(string, [character])

trim (TRAILING FROM string)
RTRIM(string, [character])

trim (BOTH FROM string)
BTRIM(string, [character])

function 里也可以使用select等SQL语句

create function get_country_name(p_code varchar)
returns countries.country_name%type
-- %type 求这一列对应的类型
as $$
declare
	v_name countries.country_name%type;
begin
	select country_name
	into v_name
	from countries
	where country_code = p.code;
	return v_name;
end;
$$ language plpgsql;

drop function get_country_name(p_code varchar)

--调用函数
select get_country_name("dfs")
perform get_country_name("dfs")

Functions vs. Procedures

“Function” comes from mathematics （map a value to another，Thus, functions always have a return value ）

”Procedure” comes from programming （describe a set of instructions that will be executed in order，does NOT (necessarily) have a return value）

在一些语言中他们是等价的

Functions and Procedures in (Postgre)SQL

Function: return a value
Procedure: return NO value

但是在postgreSQL中Procedure通过void function实现

用procedure的好处：

Network overhead

减少和服务器交互的时间消耗
Security

Access to the Oracle data can be restricted by allowing users to manipulate the data only through stored procedures that execute with their definer’s privileges

How can we pack them into a single execution unit?

• Minimize the communication between the client program we are using and the database server

• Client program = DataGrip, psql

insert into select

insert into table2
select * form table1
where condition;

在使用insert into 之后，可以判断一下是不是正确插入了。

get diagnostics n_rowcount = row_count;
if n_rowcount != n_people
then 
	raise exception 'Some people couldn't be found';
end if;
-- 写在函数里

Lecture12 Trigger

Trigger

Trigger 触发器，在对表格的某种修改后会自动触发

作用：

Validate data
Checking complex rules
Manage data redundancy

On-the-fly modification（计算机：运行中）：Change the input directly when the input arrives

比如：插入同一部电影的英文信息，中文信息等。

PostgreSQL 触发器可以在下面几种情况下触发：在执行操作之前BEFROE（在检查约束并尝试INSERT、UPDATE或DELETE之前）。在执行操作之后AFTER（在检查约束并INSERT、UPDATE或DELETE之后）。更新操作（在对一个视图进行插入、更新、删除时）。
每个DBMS对允许触发trigger的时间会不一样。
触发器的 FOR EACH ROW 属性是可选的，如果选中，当操作修改时每行调用一次；相反，选中 FOR EACH STATEMENT，不管修改了多少行，每个语句标记的触发器执行一次。
WHEN 子句和触发器操作在引用 NEW.column-name 和 OLD.column-name 表单插入、删除或更新时可以访问每一行元素。其中 column-name 是与触发器关联的表中的列的名称。
如果存在 WHEN 子句，PostgreSQL 语句只会执行 WHEN 子句成立的那一行，如果没有 WHEN 子句，PostgreSQL 语句会在每一行执行。
BEFORE 或 AFTER 关键字决定何时执行触发器动作，决定是在关联行的插入、修改或删除之前或者之后执行触发器动作。
要修改的表必须存在于同一数据库中，作为触发器被附加的表或视图，且必须只使用 tablename，而不是 database.tablename。
当创建约束触发器时会指定约束选项。这与常规触发器相同，只是可以使用这种约束来调整触发器触发的时间。当约束触发器实现的约束被违反时，它将抛出异常。

Fire a trigger

create trigger test_trigger
	before update
	-- before/after/instead of + insert/update/delete
	on people_1
	for each row
	-- for each row/statement
	execute procedure fill_in_num_movies();
	
	
	
create or replace function fill_in_num_movies()
	returns trigger
as
$$
begin
	select count(distinct c.movieid)
	into new.num_movies
	-- "new" 和 "old" 是两个内部变量，that represent the row before and after the changes
	from credits c
	where c.peopleid = new.peopleid;
	return new;
end;
$$ language plpgsql;	

update people_l set num_movies = 0 where people_l.peopleid <= 100;

执行顺序：

create the procedure fill_in_num_movies()
then, create the trigger
finally, we can run some test updates statements

常见使用场景：

Modify input on the fly

before insert / update
for each row

Check complex rules

before insert / update / delete
for each row

Manage data redundancy

after insert / update / delete
for each row

Lecture13: View, Access Control

View

View（视图）实际上是一个以预定义的 PostgreSQL 查询形式存在的表的组合。

View（视图）可以包含一个表的所有行或从一个或多个表选定行。

View（视图）可以从一个或多个表创建，这取决于要创建视图的 PostgreSQL 查询。

View（视图）是一种虚拟表，允许用户实现以下几点：

用户或用户组认为更自然或直观查找结构数据的方式。
限制数据访问，用户只能看到有限的数据，而不是完整的表。
汇总各种表中的数据，用于生成报告。

PostgreSQL 视图是只读的，因此可能无法在视图上执行 DELETE、INSERT 或 UPDATE 语句。但是可以在视图上创建一个触发器，当尝试 DELETE、INSERT 或 UPDATE 视图时触发，需要做的动作在触发器内容中定义。i

create view my_view(col1, col2, col3) as 
select m.movieid,m.title,m.year_released, c.country_name
from movies m join countries c
on c.country_code = m.country;
--这样会把前三列重命名成col1,col2,col3,第四列会使用它原来的名字。

drop view my_view;

View（视图）实际上是一个以预定义的 PostgreSQL 查询形式存在的表的组合。

View（视图）可以包含一个表的所有行或从一个或多个表选定行。

View（视图）可以从一个或多个表创建，这取决于要创建视图的 PostgreSQL 查询。

View（视图）是一种虚拟表，允许用户实现以下几点：

用户或用户组认为更自然或直观查找结构数据的方式。
限制数据访问，用户只能看到有限的数据，而不是完整的表。
汇总各种表中的数据，用于生成报告。

PostgreSQL 视图是只读的，因此可能无法在视图上执行 DELETE、INSERT 或 UPDATE 语句。但是可以在视图上创建一个触发器，当尝试 DELETE、INSERT 或 UPDATE 视图时触发，需要做的动作在触发器内容中定义。

使用场景：

Simplify Complex Queries
An alternative way to implement E-R models

view可以作为另一种实现ER图的方式，比如 access control, dirty and messy original data.

Access Control

Operations (select, update, insert, delete, etc.)

Objects (table, database, views, trigger, etc.)

要向用户分配权限，可以使用 GRANT 命令。

grant select on movies to test_user;

GRANT privilege [,...]
on object [,...]
TO {PUBLIC | Group group | username}

privilege − 值可以为：SELECT，INSERT，UPDATE，DELETE， RULE，ALL。
object − 要授予访问权限的对象名称。可能的对象有： table， view，sequence。
PUBLIC − 表示所有用户。
GROUP group − 为用户组授予权限。
username − 要授予权限的用户名。PUBLIC 是代表所有用户的简短形式。

可以使用 REVOKE 命令取消权限。

revoke select on movies from test_user;

REVOKE privilege [,...]
ON object [,...]
FROM {PUBLIC | Group groupname | username}

和view的联动：

create view user_view as
select movieid，title
from movies
where user_name = user;

--user 是一个内置的变量，返回现有的所有user_name

Lecture14：Indexing

Concept：An index is a data structure which improves the efficiency of retrieving data with specific values from a database

In PostgreSQL, indexes are built automatically on columns with primary key or unique constraints

create index index_name
on table_name (column_name [, ...])
create index salary_index on company(salary);

Index Taxonomy

索引的分类

能否完全和数据分开

能：external index（postgreSQL）

不能：integrated index
是否说明了数据的存储顺序，是不是排了序

是：clustered index

否：non-clustered index
Does every search key in the data file correspond to an index entry?

Yes：Dense Index

No： Sparse Index

Does the search key contain more than one attribute?

Yes：Multi-key/Multi-column index

No：Single-key/Single-column index

Index implementation

索引的实现

主要是两种数据结构：

B-tree 和B+-tree 中间那个是连字符，不是B-

Hash table

B tree

虽然索引的目的在于提高数据库的性能，但这里有几个情况需要避免使用索引。

使用索引时，需要考虑下列准则：

索引不应该使用在较小的表上。
索引不应该使用在有频繁的大批量的更新或插入操作的表上。
索引不应该使用在含有大量的 NULL 值的列上。
索引不应该使用在频繁操作的列上。