In the previous post in this series, we saw the limitations of Microsoft Excel files for storing our data. Storing billions of pieces of data is a real challenge. Retrieving the relevant data from such a huge heap is even more challenging. The root cause of both problems is the sheer volume of data that is created and later has to be retrieved. Let us find out how relational databases solve this problem.

In a relational database, you can create many tables. Each table can have many columns, and you can put the most atomized data in these columns. A single table can hold a very large number of rows. But putting every kind of data into one table brings back the same problem we saw with a Microsoft Excel file: the data cannot easily be related to other data. In relational databases, you can create joins among tables. These joins allow the data stored in those tables to have one-to-many, many-to-many or other kinds of relationships. Thanks to these relationships, you can spread billions of pieces of data over many tables and still keep them related to each other, so retrieval of any piece of data becomes easy. Since each table holds only one kind of data and can be indexed, retrieval and searching of data are also fast.

Relationships among tables are created through primary and foreign keys. These keys ensure that data integrity and consistency remain intact even when some data in one of the tables is changed or deleted.
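Let us look at a minimal sketch of how primary and foreign keys relate two tables. It uses Python's standard sqlite3 module with an in-memory database. The table names (customers, orders) and the sample rows are assumptions made up for illustration only.

```python
import sqlite3

# In-memory database; nothing is written to disk.
conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical tables: one customer can have many orders (one-to-many).
conn.execute("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    )
""")
conn.execute("""
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
        product     TEXT NOT NULL
    )
""")

conn.execute("INSERT INTO customers VALUES (1, 'Asha')")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(10, 1, "bolt"), (11, 1, "nut")],
)

# The join follows the foreign key from each order back to its customer.
rows = conn.execute("""
    SELECT c.name, o.product
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.customer_id
    ORDER BY o.order_id
""").fetchall()

# An order pointing at a customer that does not exist is rejected.
try:
    conn.execute("INSERT INTO orders VALUES (12, 99, 'washer')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```

Here the foreign key on orders.customer_id is what keeps every order tied to a real customer, and the join is what brings the related data back together.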

Data once written to a relational database is safe from accidental change. A piece of data changes or gets deleted only when this is done explicitly. Generally, data manipulation for a database that is part of a software product is carried out through the business logic written for that product.

Old data in a database can be archived safely on disk. Data can be archived for many years and retrieved easily when required.

In the first part of this series, we saw that atomizing data is powerful. We saw that we can do many things with our data saved in a Microsoft Excel file. But can a Microsoft Excel file be linked to another Microsoft Excel file so that we can establish relationships among the data saved in those files? If we could do that, we would be able to solve one problem. Let us discuss the issues in this post.

Modern software products are built to have thousands and even millions of users. Naturally, such systems can generate billions of pieces of data on a daily basis. All this generated data needs to be stored in a database. At the same time, all this data should be available easily and quickly for many purposes. For example, the data is needed to generate reports of various kinds. Similarly, many people need a quick search facility to find out things like how many customer orders contain the same product over a specific period of time, so that back office work can be carried out smoothly, or whether a product part exists in the store so that it can be used in the assembly department.
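The kind of search described above can be sketched as a single query. The example below uses Python's standard sqlite3 module. The order_items table, its columns and the sample rows are hypothetical and only serve to illustrate the idea.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: one row per product line on a customer order.
conn.execute("""
    CREATE TABLE order_items (
        order_id   INTEGER NOT NULL,
        product    TEXT NOT NULL,
        order_date TEXT NOT NULL   -- ISO dates sort correctly as text
    )
""")
conn.executemany(
    "INSERT INTO order_items VALUES (?, ?, ?)",
    [
        (1, "gear", "2016-01-05"),
        (1, "bolt", "2016-01-05"),
        (2, "gear", "2016-01-20"),
        (3, "gear", "2016-02-02"),
    ],
)

def orders_with_product(product, start, end):
    """Count distinct orders that contain `product` between two dates (inclusive)."""
    (count,) = conn.execute(
        """
        SELECT COUNT(DISTINCT order_id)
        FROM order_items
        WHERE product = ? AND order_date BETWEEN ? AND ?
        """,
        (product, start, end),
    ).fetchone()
    return count

# Orders placed in January 2016 that contain the product "gear".
january_gear_orders = orders_with_product("gear", "2016-01-01", "2016-01-31")
```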

Coming back to our discussion on data saved in a Microsoft Excel file: can we save all our generated data (which could run into billions of pieces of data) in an Excel file? That is not possible, because a single worksheet in current versions of Excel is limited to 1,048,576 rows. Opening and using an Excel file containing a large amount of data is also extremely slow. Saving all these pieces of data in many Excel files also poses problems. It is difficult to relate data saved across many Excel files. Essentially, this means that Excel files are not relational.

The best solution for storing large amounts of data is a relational database.

We will be learning about relational databases in our next post.


We have worked with relational databases, but we do not know what the term “relational” means. We also do not know the significance of this “relational” thing. Let us find out.

There are many types of databases being used in the software industry. Some of these types include relational databases, file databases, NoSQL databases, object-relational databases etc. There are many differences among these types of databases, but there is one crucial difference which separates relational databases from all the other types. Let us understand it.

You use Microsoft Word in your work every day. A Microsoft Word document is a file which stores your data. Microsoft Excel is also used to store your data. The difference is that data in a Microsoft Excel file is stored in a more structured format than data in a Microsoft Word document. You can do a lot of things with the data stored in a Microsoft Excel file: you can summarize, add, sort, pivot and so on. This is possible because a Microsoft Excel file stores data in a more atomic way. The data in each field is separate from the data in other fields of the same file. This is why it is possible to carry out an operation on specific fields only, without affecting the other fields in the same file. It is also the reason why such things are not possible in a Microsoft Word document: the data in such files is not stored in atomic form.

Thus we can learn from this example that the first thing required of a good database is the ability to store data in the most atomic way.
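To make the idea of atomic data concrete, here is a small sketch in Python. The sales records and field names are made up for illustration. The point is only that separate fields can be summed or sorted on their own, while the same facts written as free text cannot.

```python
# Hypothetical sales records stored atomically: one value per field.
sales = [
    {"product": "bolt", "quantity": 40, "unit_price": 0.25},
    {"product": "gear", "quantity": 5, "unit_price": 12.00},
    {"product": "nut", "quantity": 100, "unit_price": 0.10},
]

# Because each field stands alone, we can operate on just one of them.
total_quantity = sum(row["quantity"] for row in sales)
by_price = sorted(sales, key=lambda row: row["unit_price"], reverse=True)
most_expensive = by_price[0]["product"]

# The same facts as free text, the way a word-processor document holds them.
# Nothing marks where a quantity or a price starts, so there is no field
# to sum or sort.
free_text = "We sold 40 bolts at 0.25, 5 gears at 12.00 and 100 nuts at 0.10."
```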

We will learn more about relational databases in our next post.

Posted by: ahmedashfaque | January 25, 2016

Why is software programming difficult to automate?

Software products help to automate many manual tasks. For example, if you want to take a customer order then you write down the order details on a piece of paper. Then you pass this information to the production department. The production department then finds out what raw materials will be needed and what production processes will be employed to produce the required order items. The production department also aggregates many orders so that production runs can be scheduled.

Using manual processes (pieces of paper) to do everything results in many problems. For example, going through all the orders and segregating the order items from each of them may be error-prone, and the production run scheduling may thus be faulty. Doing all the paperwork is also laborious. If something goes wrong during the entire process of information processing, tracing it back is also difficult.

Due to these reasons, software systems have been built which help in doing all the information processing. Software systems have become extremely useful for many kinds of information processing.

Now the question is: if software products help in automating things, then why can building software products itself not be automated?

The simple answer is that software products are the results of innovative thinking. If a software product is already available for resolving a problem, then building another one is simply not needed. But if people find some manual work difficult and no software product currently exists for automating it, then a software product can be thought of which provides a solution to this problem. This implies that building software products always requires innovative thinking. Present-day computers are not capable of innovative thinking.

Even though the field of artificial intelligence is advancing fast, it is still not possible to take help from this field in automating the process of building software products.

Posted by: ahmedashfaque | January 23, 2016

Databases and software engineering

Most software engineering books do not include a discussion on databases. Is this approach good? Let us find out.

Software engineering is the process which helps in building software products. To build a software product, you first need software requirement specifications. These specifications state what the software product will do. Based on them, software engineers create a design of the software product. This software design is implemented by writing source code. The executable machine code built from this source code is your software product. The software product is tested to find out whether it has any defects. Any defects found are fixed. The fully tested software product can then be used by users.

We can see that software engineering includes the processes of building software requirement specifications, designing the software product, implementing the software design and finally testing the software product. Now what about databases? Most software products use databases. Databases are a specialized area, and a separate computer science discipline could be devoted entirely to the study of database engine design.

However, since software products use databases, you need to create a database design in the form of Entity Relationship diagrams. This database design does not include database engine design. It concerns creating database schemas, database tables, indexes etc. Once you have created these database objects and entities, you can connect the database to your business logic and your software product will be able to use the database.
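The two steps described above, creating the database objects and then connecting business logic to them, can be sketched as follows. The example uses Python's standard sqlite3 module. The parts table, the index name idx_parts_name and the function part_in_stock are hypothetical names chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Database design step: a hypothetical table plus an index on the lookup column.
conn.executescript("""
    CREATE TABLE parts (
        part_id  INTEGER PRIMARY KEY,
        name     TEXT NOT NULL,
        quantity INTEGER NOT NULL
    );
    CREATE INDEX idx_parts_name ON parts(name);
""")
conn.execute("INSERT INTO parts (name, quantity) VALUES ('gear', 3)")

# Business logic step: application code that only talks to the schema above.
def part_in_stock(name):
    """Return True if at least one unit of the named part is in the store."""
    row = conn.execute(
        "SELECT quantity FROM parts WHERE name = ?", (name,)
    ).fetchone()
    return row is not None and row[0] > 0

# The database catalog lists the objects that the design created.
schema_objects = sorted(
    name for (name,) in conn.execute(
        "SELECT name FROM sqlite_master WHERE type IN ('table', 'index')"
    )
)
```

Notice that the business logic function knows nothing about how the database engine stores or indexes the data. It depends only on the table and column names fixed by the database design.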

Since databases are an integral part of most software products being built today, I strongly feel that database programming and database design should always be included in software engineering courses.

Posted by: ahmedashfaque | January 21, 2016

Software design and data structures

When database programming is done, we need to create, manipulate or view data stored inside a database. On the programming side, we use objects like resultsets and statements to view, edit or create data in the database. These objects are available in many programming languages, such as Java, and provide special facilities for database programming.

One such facility is a data structure which makes it easy to hold the data values of a single record in a database table. When a query selects all columns of a table, the resultset has the same structure as the database table: it has the same number of columns, and the data type and size of each column are the same as defined in the table. This facility makes database programming much easier.
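The post talks about resultsets and statements as found in Java. As a rough analogue, the sketch below uses Python's standard sqlite3 module, where a cursor plays a similar role. The employees table and its row are made up for illustration. The sketch shows that the result of a query on all columns mirrors the columns of the table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows can then be read by column name

conn.execute("""
    CREATE TABLE employees (
        employee_id INTEGER PRIMARY KEY,
        name        TEXT,
        salary      REAL
    )
""")
conn.execute("INSERT INTO employees VALUES (1, 'Ravi', 52000.5)")

# The cursor is Python's rough counterpart of a statement plus its resultset.
cursor = conn.execute("SELECT * FROM employees")
# cursor.description holds one entry per result column; item 0 is the name.
column_names = [column[0] for column in cursor.description]

row = cursor.fetchone()
# Here each value comes back as the Python type that corresponds to its column.
value_types = [type(row[name]).__name__ for name in column_names]
```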

So far so good. But what about business logic implementation? How is the business logic implemented in the system, and how is this implementation related to database programming objects like resultsets and statements?

Business logic implementation is in fact separate from database programming. It involves creating objects (from classes), defining various methods, and using the interaction among objects to do all the computation.

Once all this computation is done, the results need to be passed on to the database. It is exactly at this place that resultset and statement objects are used.
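This separation can be sketched in a few lines of Python using the standard sqlite3 module. The Invoice class, the save_invoice function and the invoices table are hypothetical names used only for illustration. The class does the computation without touching the database, and a separate function passes the result on to the database.

```python
import sqlite3

class Invoice:
    """Hypothetical business object: computes a total with no database access."""

    def __init__(self, customer, unit_price, quantity):
        self.customer = customer
        self.unit_price = unit_price
        self.quantity = quantity

    def total(self):
        return self.unit_price * self.quantity

def save_invoice(conn, invoice):
    """Database programming step: hand the computed result to the database."""
    conn.execute(
        "INSERT INTO invoices (customer, total) VALUES (?, ?)",
        (invoice.customer, invoice.total()),
    )

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoices (customer TEXT, total REAL)")

invoice = Invoice("Asha", unit_price=12.5, quantity=4)  # business logic
save_invoice(conn, invoice)                             # database programming

stored = conn.execute("SELECT customer, total FROM invoices").fetchall()
```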

This division may not seem obvious to beginners, but experienced developers keep business logic and database programming separate in this way.



Posted by: ahmedashfaque | January 17, 2016

Why is software testing important?

Some time back I was working on a project where tax needed to be computed for sold goods. The software developers wrote pieces of code for doing the tax calculations. When unit testing was performed on that code, it was found that the calculations were coming out wrong.

After investigation it was found that though the tax formula used was correct, the error lay in the data type of a variable used in the tax computation. For example, suppose the tax rate was 7.5% for an article which had a selling price of USD 25. The correct tax amount for this article is USD 1.875. But it was coming out as USD 1.75.

This was because the variable storing the tax rate (e.g. 7.5%) was defined as an integer data type, which cannot store decimal places. So it was storing only 7%, and the 0.5% part was getting truncated. Thus the tax amount was coming out as USD 1.75 instead of USD 1.875.
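The arithmetic can be reproduced in a short Python sketch. This is not the project's code. The function name tax_amount is hypothetical, and since Python does not declare variable types, int() is used here to imitate storing the rate in an integer variable.

```python
def tax_amount(price, rate_percent):
    """Tax formula from the example: price multiplied by the rate in percent."""
    return price * rate_percent / 100

price = 25
correct_rate = 7.5           # stored with its decimal part
truncated_rate = int(7.5)    # an integer variable keeps only the 7

correct_tax = tax_amount(price, correct_rate)    # 25 * 7.5 / 100 = 1.875
wrong_tax = tax_amount(price, truncated_rate)    # 25 * 7 / 100 = 1.75
```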

Thus, even though the tax formula was correct and most things were implemented correctly, wrong tax amounts were still coming out because of the wrong data type.

In eXtreme Programming, the test-driven development approach is used. Tests for the business logic are written even before the source code that implements it. This approach traps many kinds of errors which could otherwise creep into the source code.

Definitely, this is the best approach for developing any software product.
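A minimal sketch of the test-first idea, applied to the tax example above, could look like this in Python. The names compute_tax and test_tax_keeps_fractional_rate are hypothetical. A real project would normally use a test framework rather than a bare function call.

```python
# Step 1: the test is written first and states the expected behaviour.
def test_tax_keeps_fractional_rate():
    # 7.5% of USD 25 must be USD 1.875, not the truncated USD 1.75.
    assert compute_tax(25, 7.5) == 1.875

# Step 2: the implementation is written afterwards to make the test pass.
def compute_tax(price, rate_percent):
    return price * rate_percent / 100

# Step 3: run the test; it raises AssertionError if the code is wrong.
test_tax_keeps_fractional_rate()
tests_passed = True
```

Had the rate been truncated to an integer inside compute_tax, the test in step 1 would have failed as soon as it was run.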

Posted by: ahmedashfaque | January 16, 2016

How can your software design go wrong?

Recently I was working on a project. Some great software requirement documents were prepared. The software design also looked great. So the project team implemented the software design by writing the source code. The implemented software product looked great too.

The software testing team then started system testing. It was then that a major software defect was caught. One product feature was to delete the account of any user from the system. But during testing it was discovered that this feature was not working.

An investigation was made to find the problem. It was discovered that the defect was due to a faulty database design. There were child records related to users in another database table. Because of these child records, the database was not allowing the master records for user accounts to be deleted.

This meant that the database design had to be changed. The foreign key definition was changed to include a clause which deletes the child records when a master record is deleted from the database (a cascading delete). After the database design was changed, the system was tested again. This time the system worked fine and the defect was removed.
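The defect and its fix can be reproduced in a small sketch. It uses Python's standard sqlite3 module and the standard SQL clause ON DELETE CASCADE. The project's actual database and table names are not known, so the users and user_orders tables here are hypothetical.

```python
import sqlite3

def build(cascade):
    """Create hypothetical users/user_orders tables, with or without cascade."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this to enforce keys
    on_delete = "ON DELETE CASCADE" if cascade else ""
    conn.execute("CREATE TABLE users (user_id INTEGER PRIMARY KEY)")
    conn.execute(f"""
        CREATE TABLE user_orders (
            order_id INTEGER PRIMARY KEY,
            user_id  INTEGER REFERENCES users(user_id) {on_delete}
        )
    """)
    conn.execute("INSERT INTO users VALUES (1)")
    conn.execute("INSERT INTO user_orders VALUES (100, 1)")
    return conn

def delete_user(conn):
    """Try to delete user 1; report whether the database allowed it."""
    try:
        conn.execute("DELETE FROM users WHERE user_id = 1")
        return True
    except sqlite3.IntegrityError:
        return False

# Original design: the child record blocks deletion of the master record.
original_design = build(cascade=False)
deleted_before_fix = delete_user(original_design)

# Changed design: deleting the master record also deletes its child records.
fixed_design = build(cascade=True)
deleted_after_fix = delete_user(fixed_design)
remaining_children = fixed_design.execute(
    "SELECT COUNT(*) FROM user_orders"
).fetchone()[0]
```

Cascading deletes are a design choice and not always the right one, since they silently remove related data. In this project the requirement was to remove the user account completely, so the cascade matched the requirement.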

No matter how good a software design looks, unforeseen design problems are encountered on software projects. Even the most experienced software designers make such mistakes.

Posted by: ahmedashfaque | January 2, 2016

Source code for software product uploaded on this site

I have uploaded the following material on the software engineering page:

  1. Complete source code for the OBAAS system (both version 1.1 and 1.2). Installation instructions are also provided.
  2. Teaching slides for my book “Foundations of software engineering”.
  3. Video Lecture series on a course on software engineering based on my book “Foundations of software engineering”.
  4. Videos on how to build a software product which is based on the case study provided in my book “Foundations of software engineering”.

All this material is a complete reference for learning software engineering. Anybody can learn software engineering from the material I have provided.


Posted by: ahmedashfaque | December 31, 2015

What is the product and what is the process?

When people work on projects, they often forget about the product as they get more involved with what needs to be done on the project. For example, in the huddle and muddle of project activities, it becomes difficult for people to realize that they are actually working to create a part of a software product. Often they are given deadlines by the project manager for some project work, but it is not obvious to them how this work is related to building the software product.

In today’s world of business, product quality is extremely important. To achieve high product quality, industry standards have been defined (e.g. Capability Maturity Model, ISO etc.). These standards define how to carry out processes on projects so that they result in better product quality. Thus they stress process quality as the way to achieve product quality. For example, a good process adopted for creating a software design will lead to better software designs. Similarly, a better process standard adopted for writing source code will lead to a better quality software product, as the source code will have fewer software defects.

These imposed process standards impede the freedom of project teams to do things their own way. They are no longer allowed to write source code in an unstructured manner. They must write source code in a structured manner as defined by the process standard adopted for the project. For example, you are not allowed to create a class for your own convenience. You must use an existing class, even though you may need to adapt this class to your needs. This may impede your freedom, but working this way will lead to a better piece of source code with a lower chance of software defects.

This is the real issue when team members work on projects. Rigid process requirements may impede creative freedom. But they ensure better product quality.

