JA8. Searching and Indexing¶
Statement¶
The Learning Journal is a tool for self-reflection on the learning process. In addition to completing directed tasks, you should use the Learning Journal to document your activities, record problems you may have encountered and to draft answers for Discussion Forums and Assignments.
Your learning journal entry must be a reflective statement that considers the following questions¶
1. Describe what you did. This does not mean that you copy and paste from what you have posted or the assignments you have prepared. You need to describe what you did and how you did it¶
This was the final week of this course, we learned about searching and indexing. I started the week on Sunday by reading chapter 9 of the Shaffer’s book, but that took me long time; I consumed a few hours on Monday to reading chapter 10, then I did the discussion forum and I’m doing the journal now. Since this week is the last week of the course, I tried the review quiz as well and it is not easy; I have to retry it again before the final exam on Saturday.
2. Describe your reactions to what you did¶
The searching problem is interesting, and the indexing is even more; however, they present a difficult issues where people have been trying solutions for decades. I do feel comfortable implementing these solutions from scratch, but I have different feeling when I’m doing database queries or searches.
3. Describe any feedback you received or any specific interactions you had. Discuss how they were helpful¶
There was no written assignment this week, and the journal feedback was good; but I noticed the Zipf distribution question in the forum go to mathematics, rather than computer science. I think students should thought more about data structures and algorithms, rather than mathematics in their answers.
4. Describe your feelings and attitudes¶
The searching is one of most important and difficult problems in computer science; indexing is a solution to the searching problem on big on-disk data sets. With every week, I feel more confident in my understanding about databases; although, the course is meant for databases, but we don’t face data structures on daily basis; but we use databases every day, so everything I learn just takes me to database realm.
5. Describe what you learned¶
The week starts with sequential searching of normal lists, then it suggests the self-organizing lists to improve that search putting the 80/20 in mind. The rule says that the first 20% of elements will be requested 80% of the time, so if we put them at the beginning of the list and do sequential search on the list has a practical usefulness.
Then we discussed bitmaps and hashing; including open hashing (using linked lists to store hashed elements in the table slots), or closed hashing (using linear probing to store hashed elements in the table slots). The closed hashing is more space efficient, but it has a problem of clustering, which is solved by double hashing.
Indexing was the next topic where we discussed linear indexing (two level linear indexing, two dimensional arrays indexing, and inverted lists), ISAM, and Tree-based indexing (2-3 trees, B-trees, B+-trees, and B*-trees).
6. What surprised me or caused me to wonder?¶
I was surprised with the number of possible tree structures: binary trees, binary search trees, k-ary trees, 2-3 trees, B-trees, B+-trees, B*-trees, and so on. You just change some specifications and tree rules to get a new tree structure.
I was also surprised with the number of possible hashing functions, and the number of possible ways to implement the hashing functions as it is critical that the hashing function returns the same hash when searching for the same key again.
7. What happened that felt particularly challenging? Why was it challenging to me?¶
Understanding the operations on the trees that used in indexing was challenging as it is more complex than doing the same operations on the BST; I think I need to practice more on these operations to get more comfortable with them.
8. What skills and knowledge do I recognize that I am gaining?¶
As mentioned, with every week my understanding to database servers and engines and how data stored internally within a database is sharpened. I’m also gaining more knowledge about the data structures and formal definitions of common problems like searching and indexing.
9. What am I realizing about myself as a learner?¶
I leaned things in stages or iterations; where I have to visit the same topic again later with different goal or different perspective. I think this is a good thing as it helps me to understand the topic better and deeper, but it is more time consuming.
10. In what ways am I able to apply the ideas and concepts gained to my own experience?¶
I’m planning to do a side personal project to implement a simple search engine that scrapes some data from the web and index it in a database. I think this will help me to understand the concepts better and deeper. I’m think I’ll use most of the concepts we learned in this course in this project, but I’ll use typescript instead of Java as I’m more comfortable with it.
11. Finally, describe one important thing that you are thinking about in relation to the activity¶
Finally, thank you very much for being our instructor for this course; the course was very interesting and challenging, but the Jeliot tool is a bit too old and it does not work well; I hope you can find a better tool for the next offering of this course. There was very little coding assignments in the course, I think putting more coding assignments will help students to understand the concepts better and deeper. The discussion forum questions were good as they emphasized on the theoretical knowledge, but I would suggest to use also coding questions in the discussion forum and the learning journals to help students to understand the concepts better and deeper.
References¶
- Shaffer, C. (2011). A Practical Introduction to Data Structures and Algorithm Analysis. Blacksburg: Virginia. Tech. https://people.cs.vt.edu/shaffer/Book/JAVA3e20130328.pdf. Chapter 9: Searching & Chapter 10: Indexing.