Step-by-Step
Topic 2: Sorting & Organizing Your Data
2.1. Indexing a Data File
A database can be like a deck of cards, with hundreds or thousands of records shuffled indiscriminately.
But, just as a deck of cards can be sorted to make it trivial to find any specific card, so SharkBase provides a number of tools to help you sort and organize your data.
Consider the records in the EMPLOYEE data file, arranged by record number. This is not very useful if you want to look up an employee by name; for this, you would need the records ordered alphabetically. On the other hand, when you print mailing labels, you want the labels ordered by zip code
In this chapter you learn how to make a data file behave as if it were arranged in various different orders. The method used is called indexing.
The easiest way to understand indexing is through an example:
1>USE employee 1>LIST name, fname, addr, tel:no 1 Marek Joe 231 River Drive 206-566-7012 2 Balzer Joan 2407 E 38th Street 206-566-1212 3 Steiner Tom 114 North Pitt St. 206-596-0017 4 Rayme Pamela 42368 Wedgewood Dr. 206-566-1789 5 Poyner Roger 2757 Regency Road 206-403-1193 6 Wilson Robert 16255 Ventura Street 206-566-7701 1>INDEX ON name TO name 6 RECORDS READ 6 RECORDS INDEXED 1>LIST name, fname, addr, tel:no 2 Balzer Joan 2407 E 38th Street 206-566-1212 1 Marek Joe 231 River Drive 206-566-7012 5 Poyner Roger 2757 Regency Road 206-403-1193 4 Rayme Pamela 42368 Wedgewood Dr. 206-566-1789 3 Steiner Tom 114 North Pitt St. 206-596-0017 6 Wilson Robert 16255 Ventura Street 206-566-7701 1>INDEX ON tel:no TO telno 6 RECORDS READ 6 RECORDS INDEXED 1>LIST name, fname, addr, tel:no 5 Poyner Roger 2757 Regency Road 206-403-1193 2 Balzer Joan 2407 E 38th Street 206-566-1212 4 Rayme Pamela 42368 Wedgewood Dr. 206-566-1789 1 Marek Joe 231 River Drive 206-566-7012 6 Wilson Robert 16255 Ventura Street 206-566-7701 3 Steiner Tom 114 North Pitt St. 206-596-0017•
First, use the EMPLOYEE file. To remind you of what is in the file, list four fields from each record. Then use the new command, INDEX ON field name TO index file name:
1>INDEX ON name TO name
The command, of course, refers to the file in use: the EMPLOYEE file. It says: rearrange the data file so that it is ordered alphabetically by NAME. (NAME is of character type. You cannot index on a numeric field.)
INDEX is the command to rearrange the records. ON name instructs SharkBase that the records should be rearranged alphabetically by the field NAME. name is the key of the index. As a rule the key is a field, the key field, although you shall see examples of more complicated keys later in this Tutorial.
So far, everything is as one would have expected it. But then what is "TO name"?
Instead of rearranging all the records (that is, actually rearranging the cards to be in alphabetical order), make a new file, NAME.NDX. In NAME.NDX, the first part, NAME, comes from the last part of the command: TO name; the extension, NDX, indicates that NAME is an index file.
The index file NAME.NDX contains (in a skillfully arranged way) the following information:
Balzer 2 Marek 1 Poyner 5 Rayme 4 Steiner 3 Wilson 6
Now when you give the
1>LIST
command, SharkBase goes to NAME.NDX, decides that Balzer is the first record to be listed. NAME.NDX then tells SharkBase that Balzer is record 2 of the EMPLOYEE file; SharkBase goes to record 2 of EMPLOYEE, and displays it. Then SharkBase looks up in NAME.NDX what the next record is, and so on.
NAME.NDX acts like an index to the information in the EMPLOYEE file. Index files have an obvious advantage over an actual rearrangement of the records: you can have many index files. The command:
1>INDEX ON tel:no TO telno
makes a second index file: TELNO.NDX. This index file contains the following information:
5 Poyner Roger 2 Balzer Joan 4 Rayme Pamela 1 Marek Joe 6 Wilson Robert 3 Steiner Tom
Notice that using this index tile, you get a completely different order of the EMPLOYEE file.
Activating Index Files. When a data file is indexed, the index file is activated and may be used. If the index file already exists, you can activate it when you specify the file in use. Examples:
1>USE employee INDEX name or 1>USE employee INDEX telno
2.2. How to Name Index Files
In this chapter, you use the data file EMPLOYEE, and name the index file by the field on which you are indexing. (Did you notice that the field name is TEL:NO, but the index file is named TELNO? That's because ":" is not allowed in a file name.) In a production environment, this may not be a very sound practice. First, there may be many data files indexed on a field called NAME. Second, when you are finished with a data file and the associated index file(s), you may want to make a backup copy of the files. The index files should be named so as to make this process easy.
Use the following convention: the first few characters of the name of an index file should always be the same as the first characters of the name of the data file it indexes. Use the remaining characters to distinguish one index file from another. In fact, if there is only one index file, it may have the same name as the data file.
Examples:
1. If the EMPLOYEE file has only one index, call it EMPLOYEE. 2. If the EMPLOYEE file has three indexes: on NAME, on TEL:NO and on ZIP, call the index files: EMPLNAME, EMPLTEL, EMPLZIP.
In the second example, suppose that the SharkBase files are on drive C, and you want to back them up to drive A. Do this with the single operating system command:
C>COPY empl*.* a:
When you use an indexed file - that is, a data file with an index file - everything you learned in Chapter 1.8 has to be understood relative to the indexed order.
TOP and BOTTOM, now, refer to the first and last record in the indexed order. SKIP gives the next record in the indexed order. However, GO and # mean the same as before.
LIST and BROWSE work the same way as before, but the records are listed in the indexed order.
In the following conversation, the EMPLOYEE file is used first without indexing; then it is used with the index file NAME:
1>USE employee
1>? #
1.00
1>SKIP 3
1>SKIP 2
1>? #
6.00
1>G0 TOP
1>? #
1.00
1>USE employee INDEX name
1>? #
2.00
1>DISPLAY name
Balzer
1>SKIP 3
1>SKIP 2
1>? #
6.00
1>? name Wilson
1>GO TOP
1>? name
Balzer
How do you visualize indexed files? The easiest is to think of an indexed file as if the physical rearrangement took place. For instance, now the BOTTOM card (the last in the indexed order) is attached to the back of the filing cabinet and not the card with the largest record number.
Of course, if you have a data file with three index files, you have to think of it as three filing cabinets. However, if you edit a record in one, then - as if by magic - the change is carried out in all three filing cabinets at the same time!
You may bring a file and its index into use simultaneously with the USE file INDEX indexfile command, as in the previous conversation. If the file is already in use without an index, or with another index, it is preferable to open the index with the SET INDEX TO command. For example, the command:
1>SET INDEX TO employee
could replace the
1>USE employee INDEX name
command in the previous conversation.
The heart of any database management system is the capability of arranging records in such a way that they be easy and fast to find. SharkBase accomplishes this with indexed files and with the command FIND.
Try the command FIND with the EMPLOYEE data file. To facilitate the demonstration, please, change the index so it puts the records in order of the names in capital letters. You can do this quickly with the
1>INDEX ON UPPER(name) TO name
command. (This uses the upper-case function UPPER( )
1>USE employee INDEX name
1>FIND Balzer
1>DISP name Balzer
1>FIND W
1>DISP name
Wilson
1>FIND Ba
1>DISP name
Balzer
1>USE employee
1>FIND Balzer
27. File is not indexed.
1>USE employee INDEX name
1>FIND Balzer
1>DISP name
Balzer
Open the data file EMPLOYEE with the index NAME. The first command:
1>FIND Balzer
instructs SharkBase to find the first record (from the TOP record) whose field NAME matches "Balzer". SharkBase converts all commands to upper case, so SharkBase is looking for a match for BALZER; this is why we converted all keys to capital letters, even though the actual contents of the NAME field remains a mixture of capital and lower-case letters. SharkBase promptly finds record 2. The next command:
1>FIND W
illustrates that a complete match is not necessary. SharkBase produces the first record whose NAME field starts with "W'. The next command:
1>FIND ba
shows that SharkBase always starts looking from the top. You were at the bottom of the data file, still you got Balzer. And even though you used lower-case letters, it found the correct record.
Then again you use the EMPLOYEE file, but this time without an index. Now if you issue a FIND command it is not accepted. You get an error message: "27. File is not indexed." Then, you use EMPLOYEE with the index file NAME, and FIND works again.
The FIND command with an & can also be issued, enclosing what has to be found in quotation marks:
1>FIND &'W'
is the same as
1>FIND W
The FIND with & does not convert lower-case letters to upper case. So
1>FIND &'Ba' NO FIND
All the data file examples in this book are small; just a few records. The effectiveness of a database is due to the speed of FIND in larger data files. In a data file with 100,000 records, FIND locates a record in a fraction of a second!
First, you created the EMPLOYEE data file. Next, you appended six records. Lastly, you made two index files: NAME and TELNO.
How do you maintain these index files? That is, how do you ensure that commands such as APPEND and EDIT do not upset the order of the data file?
As an example, suppose that Joan Balzer got married, and her name is changed to Joan Tarak. EDIT record 2 in the EMPLOYEE file, change the NAME field from Balzer to Tarak. How will the NAME index file be told that now record 2 is no longer the TOP record but the last but one record? This readjustment of the index file is called maintaining the index file.
It is clear that any time you want to use an index file, say, NAME, you can issue the commands:
1>USE employee 1>INDEX ON name TO name
but this is clearly an inefficient way of maintaining the index files. In fact, if you do things right, the index files will always be maintained automatically by SharkBase.
This is what you do. Whenever you want to change anything in the EMPLOYEE file, call it out with the command:
1>USE employee INDEX name, telno
Now every APPEND, DELETE, EDIT, BROWSE (and other related commands not yet discussed) will automatically maintain the two index files.
The first of the two index files, NAME, will be in use, so FIND, TOP, BOTTOM, SKIP are all relative to NAME. TELNO will only be maintained. In this example, NAME is the master index file. If you wish to use TELNO and maintain NAME, of course, you use the command:
1>USE employee INDEX telno, name
You can list up to four index files in a USE command. It may happen that an index file gets out of kilter. For instance, by accident you APPEND to a file forgetting about an index file. In such a case, either simply do the INDEX command again:
1>USE employee 1>INDEX ON name TO name
or use the data file with the index, and give a REINDEX command:
1>USE employee INDEX name 1>REINDEX
The latter is preferable if you have an index file with a complicated key or if you want to reindex more than one index file.
2.3. Creating a New Index from an Existing One
When data comes to you with indexes made in another database language, like VP-Info, dBase, FoxPro, etc., their indexes will not be immediately usable by SharkBase. That's why SharkBase has the convenient INDEX FROM command, which allows you to create a new SharkBase/dBase index (NDX extension) directly without you're having to enter the index expression unless it cannot be understood by SharkBase.
Imagine that you have a file CUSTOMER and three index files CUST1, CUST2 and CUST2 created by VP-Info or dBase. All you have to do to make SharkBase indexes is:
1>USE customer 1>INDEX FROM custl TO custl 1>INDEX FROM cust2 TO cust2 1>INDEX FROM cust3 TO cust3
Clipper is slightly different with its unique NTX index type. Since SharkBase, dBase, FoxPro use a default NDX extension on indexes, and Clipper uses NTX, you do not have to specify any extensions.
2.4. Physically Sorting the Data
Indexing makes a data file appear as though it has been sorted in a specific order. There may be times when it is useful to sort the file into a new physical order. With SharkBase, this is very easy to accomplish:
1>USE employee 1>SORT ON name TO temp 1>USE temp 1>BROWSE;check new file to make sure it's OK 1>USE ;close the new file 1>DELETE FILE employee ;delete the old file 1>RENAME temp TO employee ;change name of new file to old name
The new command is SORT. Always SORT ON a field TO a new data file.
In actual practice, you should very seldom SORT. Indexing makes FIND possible, SORT does not. You can use the commands APPEND, DELETE, EDIT, and so on, and the data file stays correctly indexed. A sorted file may not stay sorted even after a single APPEND. Finally, indexing is faster than sorting. So why bother?
In some cases it may be argued that a sorted file is preferable. For instance, you may want to make a copy of a file for use by another; SORTing the file will make it easier for a novice to understand and use the data.