Choosing Methods for Sorting and Searching
In the literature you will see comparisons of the scalability of sorting and searching algorithms as the number of items in the list becomes large. Order of growth is usually expressed in Big O notation. For example, an algorithm with growth O(n log n) will be much faster for large lists (with a high value of n) than an algorithm with O(n2).
We tabulate below typical orders quoted for sorting and searching algorithms.
|Tree sort||O(n log n)|
|Quicksort||O(n log n)|
|Binary Search||O(log n)|
|Binary Search Tree||O(log n)|
Relevant questions you should consider when choosing a sorting method include the following.
- How many items need to be sorted?
- What is the state of the initial data set? Is it nearly sorted?
- How crucial is the sort time? Is there a time that must not be exceeded?
- How does the sort time depend on the number of items in (i) the best case (ii) the average case and (iii) the worst case.
- What is the size of the records? Do you need to use pointers to avoid shifting large records?
- Are duplicate values present?
- What is the storage medium of the data?
- How will you search the data? (You will sort using a tree sort if you are going to search using a BST).
- How much additional memory is needed for the sort?
- Is a combination of methods likely to be more efficient than a single one?
In general, you can use a bubble sort or insertion sort for small lists. For large lists, choose tree sort or quicksort.
Relevant questions to be considered when choosing a searching method have much in common with those for sorting and include the following.
- How many items are in the list to be searched?
- How crucial is the search time? Is there a time that must not be exceeded?
- How does the search time depend on the number of items in (i) the best case (ii) the average case and (iii) the worst case.
- What is the state of the initial data set? If it is not sorted, you cannot use a binary search or a binary search tree.
- Are duplicate values present? Do you need to report each occurrence?
- What is the storage medium of the data? Is random access possible?
- How has the data be sorted? (Search using a BST if you have sorted using a tree insertion sort).
In general, if you only have a small number of items in your list, a linear/sequential search is adequate and does not require the items to be sorted. For large lists, binary search and the binary search tree are much faster. If speed of searching is of the utmost importance, you should use a hash search.