Introduction to Data Structures and Algorithms¶
Introduction¶
- An elegant Problem: a problem that has a solution that is both simple and efficient.
- A solution is said to be efficient if:
- it solves the problem within the required resource constraints.
- it requires fewer resources than known alternatives, regardless of whether it meets any particular requirements.
- An exact-match query is a query that accesses records by their primary key (ID).
- A range query is a query that finds all records that match a particular value or range of values for a set of attributes.
- A Type is a set (collection) of values. e.g the set (true, false) forms the type Boolean.
- A Simple Type is a type whose values are indivisible (atomic, has no parts). e.g. Boolean, Integer, String, etc.
- A Composite Type (aggregate type) is a type whose values are made up of other values. e.g. BankAccount (balance, name, etc), Student (name, age, etc), etc.
- A Data Item is a value that is part of a type, e.g. the value “John” is a data item of type String; the value 100 is a data item of type Integer.
- A Data Type is a type together with a collection of operations that can be performed on values of that type. e.g. the type String has the operations length, substring, etc.
- An Abstract Data Type (ADT) is the software representation of a data type.
- It is a specification of the data type and its operations, but not the implementation of the operations.
- The implementation of the operations is left to the programmer according to the needs of the application.
- A Data Structure is an implementation of an ADT. Usually held in the main memory.
- Let’s study the
Integer
: - Type: Integer (Simple Type).
- Data Type: the mathematical concept of integers along with the operations of addition, subtraction, multiplication, division, etc.
- ADT: (in Java), the
int
type (limited representation of the mathematical concept of integers, since it is limited to 32 bits and can not hold all Integer values). - Hash tables and B-trees are both efficient in insertion, deletion, and exact-match queries. However, B-trees are more efficient in range queries. Therefore, if the application requires range queries, then B-trees are the preferred data structure. Otherwise, hash tables are always preferred.
- An Example of ADT is
Car
where all cars have the same operations (e.g. steer, accelerate, brake), while each car model implements those functions differently. - Data types have both logical and physical representations.
- The logical representation is the abstract data type.
- The physical representation is the data structure.
- Here are some examples of problems and their best data structure solutions:
Problem | Specification | Preferred Data Structure | Reason |
---|---|---|---|
Bank accounts | open/close, deposit/withdraw, balance | Hash table |
Allow fast exact-match queries. Finding records quickly => insertions/modifications are done quickly. |
Cities and towns database | get by name (exact match), search (range queries) | B-tree | Allow fast range queries. |
Design patterns¶
- Flyweight:
- When there are many identical objects, it is more efficient to store only one copy of the object and to use a reference to it.
- Example: the
1 space string
in a web document: instead of every empty space being a new object, a1-space-string flyweight
object is created and used by all empty spaces. - Used in: PR quad-tree - Visitor: - When an operation needs to be performed on all objects in a data structure, it is more efficient to use a visitor object which may be a general function, where we can pass the object as a parameter. - Example: in a Binary Tree, if we need a function that works on the leaves level; instead of adding the function to each leaf, we can create on visitor method in the tree class, and then pass the leaf as a parameter to the visitor method. - Used in: Tree traversal, and Graph traversal.
- Composite:
- When the data structure contains a collection of different objects and actions to be performed on them; instead of defining the method in the root class and checking the type of the object, we can define the method in the leaf class and override it in the composite class.
- Example: The
render
function in a web document:- We don’t define the render function in the
Document
object and check the type of the object (e.g. div, a, etc). - We can define the function in the
Element
class (or subclass) and override it in theDiv
andA
classes.
- We don’t define the render function in the
- Strategy: - With Strategy, we define the default behavior of functions and allow the user to pass their own implementation of the function as a parameter. - The passed parameters can be a function or a class
Problems, Algorithms, and Programs¶
- A Problem is a task that needs to be solved. and:
- It should not include any constraints on the solution.
- It should include constraints on the resources that the solution may use.
- A Function (in mathematics) is a matching between input (domain) and output (range).
- An Instance of a Problem is a particular problem with specific values for the input.
- An Algorithm is a method or process followed to solve a problem.
- If the problem is a function, then the algorithm is the implementation of the function.
- It must be correct, i.e. it must produce the correct output for every input.
- It must be composed of a series of concrete steps (clearly defined, doable, and consuming a finite amount of time).
- It must NOT include any ambiguity in determining the next step.
- It must have a finite number of steps.
- It must terminate in a finite amount of time.
- A Computer program is an instance or concrete representation of an algorithm in some programming language.
Mathematical Preliminaries¶
2.1 Sets and Relations¶
- A Set is a collection of indistinguishable members or elements (all are part of a greater population called the base type).
- Each element in the set is a primitive member of the base type or another set.
- Operations on set include Union, Intersection, Difference, Cartesian Product, Power Set, etc.
- A Power set is the set of all subsets of a given set. e.g. the power set of {1, 2, 3} is
{null, {1}, {2}, {3}, {1, 2}, {1, 3}, {2, 3}, {1, 2, 3}}
. - Sets have no order, no duplicates, and no indexing.
- A Bag is a set that allows duplicates and is denoted by
[1, 2, 1, 3]
. - A Sequence is a set that allows duplicates and has an order and is denoted by
<1, 2, 1, 3>
. It is also called Tuple or Vector. - A Relation is a set of ordered pairs of elements from a given set. e.g. the relation
R = {<1, 2>, <2, 3>, <3, 1>}
is a relation on the set{1, 2, 3}
. - Define the properties of relations as follows, with R a binary relation over set S:
- Reflexive. if
aRa for all a ϵ S
. - Symmetric. if
aRb => bRa for all a, b ϵ S
. - AntiSymmetric. if
aRb and bRa => a = b for all a, b ϵ S
. - Transitive. if
aRb and bRc => aRc for all a, b, c ϵ S
. - An Equivalence relation is a relation that is reflexive, symmetric, and transitive which can be used to partition a set into equivalence classes.
- A Partition of a set S is a collection of subsets of S such that:
- The union of all the subsets is S.
- The intersection of any two subsets is the empty set (disjoint).
2.2 Miscellaneous Notation¶
- Factorial function written n! for n an integer greater than 0, is the product of the integers between 1 and n, inclusive. and
0! = 1
and can be defined recursively asn! = n * (n - 1)!
or using the formulan! ≈ (√(2πn)(n/e))ⁿ
. - A Permutation of a sequence S is one possible ordering of the elements of S.
- Example: the permutations of {1, 2, 3} are
{1, 2, 3}, {1, 3, 2}, {2, 1, 3}, {2, 3, 1}, {3, 1, 2}, {3, 2, 1}
. - The total number of permutations of a sequence of n elements is n!
- A Floor of a real number x is the largest integer less than or equal to x. e.g.
floor(3.14) = 3
andfloor(-3.14) = -4
. - A Ceiling of a real number x is the smallest integer greater than or equal to x. e.g.
ceiling(3.14) = 4
andceiling(-3.14) = -3
. - The Modulus operator returns the remainder of a division.
n mod m is the integer r such that n = q*m + r for q an integer, and |r| < |m|
. - A logarithm of base b for value y is the power to which b is raised to get y.
- if logb y = x then bx = y, and b^(logb y) = y.
2.6 Mathematical Proof Techniques¶
- Solving any problem has two distinct parts: the investigation and the argument.
- Three commonly used proof techniques: 1. Direct Proof (Deduction): a logical explanation of how the statement is true. 2. Proof by Contradiction: assume the statement is false and show that it leads to a contradiction, then conclude that the statement is true. 3. Proof by mathematical induction.
Readings¶
Algorithms
by Robert Sedgewick.Introduction to Algorithms: A Creative Approach
by Udi Manber.How to Solve It
by George P´olya: good for problem-solving techniques.The Origin of Consciousness in the Breakdown of the Bicameral Mind
by Julian Jaynes: Deep philosophy.The Practice of Programming
by Brian W. Kernighan and Rob Pike: Good for C programming.Design Patterns: Elements of Reusable Object-Oriented Software
by Gamma, Helm, Johnson, and Vlissides