7. Join Algorithms

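To eliminate redundant information in tables, we normalize the database design so that the tables conform to a normal form. Queries then have to use joins to reconstruct the original tuples.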

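The most common case is the inner equijoin, which combines the tuples of two tables whose keys are equal. Other kinds of join can be supported by adapting the same algorithms.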
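As a minimal sketch of what an inner equijoin computes (not of how a DBMS executes it), the following Python snippet joins two normalized tables on a shared key. The table contents and the names `students`, `enrollments`, and `inner_equijoin` are made up for illustration.

```python
# Hypothetical normalized tables: each row is a dict.
students = [
    {"sid": 1, "name": "Ann"},
    {"sid": 2, "name": "Bob"},
    {"sid": 3, "name": "Cid"},
]
enrollments = [
    {"sid": 1, "course": "Databases"},
    {"sid": 1, "course": "Compilers"},
    {"sid": 3, "course": "Databases"},
]


def inner_equijoin(outer, inner, key):
    """Return one merged row for every (outer, inner) pair whose keys are equal."""
    result = []
    for o in outer:
        for i in inner:
            if o[key] == i[key]:
                # Merge the attributes of both rows into a new output row.
                result.append({**o, **i})
    return result


joined = inner_equijoin(students, enrollments, "sid")
for row in joined:
    print(row)
```

Rows without a partner (here the student with `sid` 2) do not appear in the output, which is what makes the join an inner join.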

7.1. Join operator output

Copy the attributes of the outer and inner tuples into a new tuple.
Subsequent operators in the query plan never need to go back to the base tables to get more data.

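There are two ways to produce the output:
- During the join, put all of the non-join attributes into the new tuple as well, so that operators after the join never need to fetch data from the tables again.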
![image](https://2836672763-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-LMjQD5UezC9P8miypMG%2F-La5GNw04x8dghkKNnFu%2F-La5KYkuE3zEy5hImXRs%2FScreen Shot 2019-03-16 at 7.49.45 PM.jpg?alt=media&token=5d9c0b76-0bd8-4224-b105-52edd6b2cb79)
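- Alternatively, copy only the join attributes and the record ids during the join, and let later operators use the record ids to fetch the data they need from the tables. This is the preferable approach for column stores and is known as Late Materialization.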
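The two output styles above can be contrasted with a small Python sketch. It assumes that tables are lists of tuples and that a record id is simply a row's position in its list. The names `join_early`, `join_late`, and `fetch_attr` are hypothetical.

```python
# Hypothetical tables; a record id is the row's index in its list.
R = [(1, "a1", "x1"), (2, "a2", "x2"), (3, "a3", "x3")]  # (id, attr_a, attr_x)
S = [(1, "b1"), (3, "b3"), (3, "b3b")]                    # (id, attr_b)


def join_early(outer, inner):
    """Early materialization: copy every attribute into the output tuple."""
    return [o + i[1:] for o in outer for i in inner if o[0] == i[0]]


def join_late(outer, inner):
    """Late materialization: output only the join key and the two record ids."""
    return [
        (o[0], o_rid, i_rid)
        for o_rid, o in enumerate(outer)
        for i_rid, i in enumerate(inner)
        if o[0] == i[0]
    ]


def fetch_attr(table, rid, col):
    """A later operator goes back to the base table using the record id."""
    return table[rid][col]


early = join_early(R, S)
late = join_late(R, S)
# A later operator that needs attr_b resolves it through the record id.
attr_b_values = [fetch_attr(S, i_rid, 1) for _, _, i_rid in late]
```

In this sketch the late variant carries three small fields per output tuple regardless of how wide the base tables are, at the price of an extra lookup whenever a later operator needs another attribute.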

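Because a database usually holds more data than can be loaded into memory at once, join algorithms are designed to reduce disk I/O, so we judge a join algorithm by the number of I/Os it performs. We do not need to consider the size of the join result, because the result set is the same size whichever join algorithm is used.
Implementing a join as a Cartesian product followed by a predicate filter is very inefficient.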
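To give a rough sense of why the Cartesian-product approach is wasteful, the sketch below counts the intermediate tuples it builds before filtering. It works on in-memory lists rather than disk pages, so it only illustrates the blow-up in intermediate size, not real I/O cost. The table sizes and the name `cross_product_then_filter` are assumptions for illustration.

```python
from itertools import product


def cross_product_then_filter(outer, inner, key):
    """Build every (outer, inner) pair first, then keep the pairs with equal keys."""
    pairs = list(product(outer, inner))  # len(outer) * len(inner) intermediate tuples
    matches = [(o, i) for o, i in pairs if o[key] == i[key]]
    return matches, len(pairs)


outer = [{"id": k} for k in range(100)]        # ids 0..99
inner = [{"id": k} for k in range(0, 200, 2)]  # 100 rows with even ids 0..198
matches, intermediate = cross_product_then_filter(outer, inner, "id")
print(f"intermediate tuples: {intermediate}, matching tuples: {len(matches)}")
```

With 100 rows on each side, the product materializes 10,000 pairs to produce only 50 matches, and the gap grows with the product of the two table sizes.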