Mimir:Draft3 Chapter2

= Chapter 2: An Overview of Programming Languages =

"It has been raining all day on my elephant skull and also elsewhere." - Diary of a Genius, Salvador Dali

In this chapter you will learn...

 * ... About the generations of programming languages.
 * ... About programming language paradigms.
 * ... About compilers and interpreters.
 * ... About programming language variables.

Programming Language Generations
Our knowledge of computer programming has grown from generation to generation, with each generation building on the previous one. The impact can be clearly seen in the advances each generation made over its predecessors. Every generation of programming languages faced certain challenges or failed outright to live up to expectations or needs. This provided the catalyst for change and, given the rapidly changing technology landscape, new programming languages not only solved the challenges their predecessors faced but also incorporated more powerful features which took advantage of the latest advances in computing technology.

First Generation (1GL)
First Generation languages are machine code: sequences of binary instructions coded by the programmer as specific patterns of 1s and 0s. They are executed directly by the CPU, with no compilation or interpretation. The instructions consist of the lowest level, hardware level tasks such as a jump, a copy of data from one register to another, a comparison of data, or an arithmetic operation on data in a register. First generation languages are seldom used today. They still have their place among programmers who work on hardware drivers and device-direct commands, since the resulting machine code is extremely fast and efficient: it is executed immediately by the CPU. Higher level languages do not share this advantage. However, that is the only real advantage of a first generation language. Their lack of readability prevents code errors from being easily found and fixed. Likewise, they are written for one particular device and are not transferable. Code is therefore not portable from one CPU to another, and a solution to the same problem must be written from the ground up for each CPU architecture.

Second Generation (2GL)
Second Generation programming languages, also known as Assembly Languages, have a direct relationship to machine code: one statement in assembly language translates to exactly one statement in machine language. The difference lies in their grammars. Assembly language is much easier to read and write because it uses labels and operations which carry readable and easily recognizable names. The collection of all the commands is called the Instruction Set. It consists of fairly intuitive mnemonics representing what each command does. In machine code, the only allowed values for a code statement are 1s and 0s, which are largely unreadable or at least not intuitive. The following example illustrates how the same statement compares in each language.

Machine Code: 01110001 10001111 01110010
Assembly Language: CPY 0x71 0x72

In this example, the statement written in each language copies memory contents from one location to another. Given only the machine code, no one but the programmer would likely know what the statement was meant to do. Perhaps even the programmer would have to think twice. By contrast, anyone reading the same command in assembly language can immediately recognize that some form of copy is being requested, because the abbreviated word, or mnemonic, is far more readable than the binary numbers alone.

Second generation, or assembly, languages therefore introduce a layer of abstraction. This is good in that it aids readability, but it also introduces an intermediary step between source code and execution. Although there is a one-to-one relationship between a line of second generation code and a line of first generation code, a second generation language needs a translation step which transforms each line of source code into machine code for the CPU to execute. This step is performed by a computer program called an assembler.

However, like first generation languages, the instruction set and specific grammar of a second generation programming language are CPU dependent. Programming on this type of platform requires detailed specification documents which include coding syntax, register names, the instruction set or operation mnemonics, and CPU electrical hardware information such as pin-outs, peripherals, etc. This information is specific to, and different for, every CPU platform.

Example listing of assembly language source code (a subroutine) written for NASM, an assembler for 32-bit Intel x86 CPUs:

See Code Sample

Third Generation (3GL)
Third Generation programming languages brought logical structure to software development and made programming languages more programmer friendly. This came from the next higher level of abstraction these languages offered, which places them at a higher level than the previously discussed generations. Third generation languages come closer to natural human speech and provide a much more efficient way to communicate what the code is intended to do.

Early examples of 3GLs are Fortran, ALGOL, and COBOL, first introduced in the late 1950s. These were later followed by C, C++, Java, BASIC, Pascal and others, and even today these later third generation languages are still widely used in all forms of software development, from business applications to video games. The original 3GLs focus on structured programming and have no concept of object encapsulation. Later on, C++, Java, and C# followed a different path from the one originally mapped out by 3GL guidelines.

Third generation programming languages are characterized by a one-to-many mapping with assembly and machine language, i.e. one statement in a third generation language translates to numerous statements in a first or second generation language. As the syntax of these new languages moved further from the hardware level languages, the source code needed to be translated in a more complex manner before it could be executed by the CPU. This characteristic led to the creation of compilers and interpreters. The role of the compiler and interpreter is not only to translate the source code to machine code, but also to verify that the source code is syntactically correct based on the programming language's grammar.

Another key characteristic of a third generation language is its hardware independence. A specific compiler is still required to translate code written in a 3GL down to a machine executable form, but the grammar of the language itself is no longer dependent on any one CPU. Code is therefore portable across many platforms.

Fourth Generation (4GL)
The Fourth Generation of programming languages arose when a new, higher level of abstraction was created to produce a more natural flowing language, one which could express the equivalent of very complicated 3GL instructions with fewer errors.

On early computing machines, it was common to use a few punched cards written in a 4GL to replace boxes of punched cards written in a 3GL.

We can identify a few types of 4GLs:
 * Table-driven programming - the programmer uses control tables instead of code to define the logic. Example: PowerBuilder
 * Report-generator - generates a report, or a program to generate a report, based on data and report format. Examples: Oracle Reports, OpenOffice Base
 * Form-generator - manages online interactions or generates programs to do so
 * Data management - provides sophisticated commands for data manipulation and documentation, for statistical analysis and reporting. Examples: SAS, SPSS, Stata
 * Other - attempts to generate whole systems from CASE tool outputs have been made

Some engineering systems have been automated to use data flow diagrams, relationship diagrams and life history diagrams to generate an impressive number of lines of code as the final product.

4GL examples: DataFlex, FoxPro, IBM Rational EGL, LabView, Perl, Python, Ruby

A sample of code written in a database access 4GL can be seen below:

EXTRACT ALL CUSTOMERS WHERE "PREVIOUS PURCHASES" TOTAL MORE THAN $1000

Fifth Generation (5GL)
Fifth Generation programming languages are based on solving problems using constraints rather than an algorithm written by a programmer. Most logic programming languages are considered to be 5GLs. They are designed to make the computer solve a problem without the programmer spelling out the steps, and they are used primarily in artificial intelligence research. Some examples of 5GLs are PROLOG, OPS5, and Mercury. This section restricts the discussion to PROLOG to illustrate the features of 5GLs.

PROLOG is a declarative language: programs are created by entering facts and rules which govern the execution of the program. These facts and rules are called clauses. When a program is executed, some input is provided to the program and is queried against the clauses to produce a result. The following is an example of a simple program in PROLOG and the output it produces when executed:

/* Clauses */
has(jack,apples).
has(ann,plums).
has(dan,money).
fruit(apples).
fruit(plums).

/* Sample queries */
?- has(jack,_).
yes
?- has(X,apples),has(Y,plums).
X = jack
Y = ann
?- has(X,apples),has(X,plums).
no

In this example, the clauses against which the program runs its queries are declared first. The clauses are fairly arbitrary and are given only as an example. The "has" clauses indicate that a person has some item, and could be read as "Jack has apples" or "Dan has money". The "fruit" clauses indicate that apples and plums are considered fruit, or "apples are fruit".

The sample queries show the PROLOG interpreter and how a user can query against the clauses. The first query asks whether Jack has anything, which based on the clause definitions is true. Thus, the response is "yes". The second query asks who has apples and who has plums. The answers, if they exist, are returned in the variables declared in the query. In this case the answers are Jack and Ann. The last query asks whether there is a single person who has both apples and plums, as denoted by the shared X variable. Reviewing the clauses, this can be seen not to be true, and the response is "no".

Obviously, this is an entirely different paradigm in programming. Only five clauses were provided. Unlike what would be common in other languages, there are no loops or any kind of variable initialization. Given the initial set of clauses, numerous queries relating to the clauses can be made, and the answers are then provided.
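For readers more used to imperative code, the same facts and queries can be mimicked in Python. This is only a rough sketch: the helper names who_has and has_anything are invented for illustration, and the explicit searching written out here is exactly the work that a PROLOG interpreter performs on the programmer's behalf.

```python
# Facts mirroring the "has" clauses above, stored as (predicate, person, item) tuples.
FACTS = {
    ("has", "jack", "apples"),
    ("has", "ann", "plums"),
    ("has", "dan", "money"),
}

def has_anything(person):
    """Answer the query has(person, _)."""
    return any(pred == "has" and who == person for (pred, who, _) in FACTS)

def who_has(item):
    """Return every X for which has(X, item) is a known fact."""
    return sorted(who for (pred, who, thing) in FACTS
                  if pred == "has" and thing == item)

print(has_anything("jack"))                  # True
print(who_has("apples"), who_has("plums"))   # ['jack'] ['ann']

# has(X, apples), has(X, plums): the same X must satisfy both facts.
both = set(who_has("apples")) & set(who_has("plums"))
print(both)                                  # set()
```

The PROLOG version needs none of this search code, which is the point of the declarative style.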

A Few Words on Paradigms
What is a paradigm? A paradigm is a model or a pattern by which a programming language constructs its programs. It is a type of template. There are six main programming paradigms:


 * Imperative
 * Functional
 * Logic
 * Visual
 * Procedural
 * Object-oriented

Every programming language can be classified into at least one of these paradigms, but sometimes multiple paradigms apply depending on the features of the language. Most older programming languages are typically classified as having a single paradigm. Some of the newer languages can be classified as having multiple paradigms. Classifying programming languages using these paradigms helps in determining which languages may be appropriate for specific tasks or problems.

Compiler
A compiler translates, or compiles, a program written in a high-level programming language such as C or C++ into the machine language which the CPU can run (a Java compiler instead produces bytecode for the Java virtual machine). This translation is needed because writing complex programs directly in machine language would be tedious and prone to human error. Finding such errors in a program written in 0s and 1s would be no less tedious. Thus people write programs in one of the high-level languages, and a compiler then translates the program into a form suitable for the machine architecture to read and execute.

How does a Compiler Work?

A better understanding of how a compiler works can be obtained by separating its overall task into phases and then breaking each phase down into the steps it performs. The three main phases and their steps are as follows:


 * The front end phase, where the first step consists of a lexical analysis, the second step is syntax analysis or parsing of the code and the third step is type checking.
 * The middle phase, where the intermediate code generation takes place.
 * The back end phase, where register allocations are assigned, the machine code is generated, and lastly, the assembly and linking occurs.

Conceptually these steps operate in sequence: each step (except the first) takes the output of the previous step as its input.
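As a rough illustration of the very first step, lexical analysis, the sketch below splits a line of source text into tokens. The token names and the tiny set of patterns are assumptions chosen for this example and do not correspond to any particular compiler.

```python
import re

# Hypothetical token categories for a toy language.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT",  r"[A-Za-z_]\w*"),
    ("OP",     r"[+\-*/=]"),
    ("SEMI",   r";"),
    ("SKIP",   r"\s+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(source):
    """Turn source text into a list of (kind, text) pairs, dropping whitespace."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens

print(tokenize("count = count + 10;"))
# [('IDENT', 'count'), ('OP', '='), ('IDENT', 'count'), ('OP', '+'), ('NUMBER', '10'), ('SEMI', ';')]
```

The resulting token list is what the next step, syntax analysis, would take as its input.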

[Figure: pictorial representation of a compiler]



Pros and Cons of a Compiler
Advantages
 * Bridges the semantic gap between the modern high-level languages in which programs are typically written and the low-level languages which the computer needs in order to execute them.
 * Creates a program for the specific architecture which can execute efficiently.
 * Code can be optimized for most efficient execution on specific hardware.
 * Takes up less memory when the compiled program is executed.

Disadvantages
 * It may be necessary to develop and/or maintain several copies of source code to be compatible with different hardware platforms, for example when different peripheral sets are involved.
 * Potentially lengthy compile times due to large amounts of source code in projects

Interpreter
How Does an Interpreter Work?

An interpreter reads source code one instruction at a time, converts the current line into machine code and executes it, "live". That line of machine code is then discarded and the next line of source is read, thereby executing each line as it is converted.

This is achieved by making the interpreter simulate a virtual machine which uses the base set of instructions of the programming language as its machine language. In other words, it is a high level machine which no longer reads just 0s and 1s but reads the programming language itself. Another way to look at this is to compare an interpreter to a program which implements, in machine language, a library containing the basic set of instructions of the language being interpreted. The interpreter's parser translates the grammar which the user types in the high-level language into a more abstract form which the machine can better utilize.
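The read, convert, execute, discard cycle can be sketched with a toy interpreter. The three-instruction language used here (SET, ADD, PRINT) is entirely made up for this example.

```python
def run(source):
    """Interpret a made-up language one line at a time and return what it printed."""
    variables = {}
    output = []
    for line_number, line in enumerate(source.splitlines(), start=1):
        parts = line.split()
        if not parts:
            continue                      # skip blank lines
        op, args = parts[0], parts[1:]
        if op == "SET":                   # SET name value
            variables[args[0]] = int(args[1])
        elif op == "ADD":                 # ADD name value
            variables[args[0]] += int(args[1])
        elif op == "PRINT":               # PRINT name
            output.append(variables[args[0]])
        else:
            raise SyntaxError(f"line {line_number}: unknown instruction {op!r}")
        # Nothing from this line is kept except its effect on the variables.
    return output

program = """
SET x 5
ADD x 3
PRINT x
"""
print(run(program))  # [8]
```

An error on a later line is only discovered when execution reaches that line, which is one reason interpreted programs are usually tested by running them.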

[Figure: pictorial representation of an interpreter]



Pros and Cons of an Interpreter
Advantages
 * Easier for beginner programmers to become familiar with a language and start writing useful programs.
 * Flexibility to make changes at run-time and to execute new code immediately or interactively.
 * Faster to troubleshoot and fix issues since there is no need to save and re-compile the source code at every change.
 * Fairly portable across computers provided the interpreter software is installed on each new computer

Disadvantages
 * Slower execution time due to each line needing to be interpreted before it is executed. There is more background overhead.
 * Lack of an Integrated Development Environment to help manage code.
 * Not a good platform for commercial software as the source code is not hidden in any way.

Assembler
What is an Assembler?

An assembler is a program which reads in a low level assembly language source code file and translates that source code into the equivalent machine code for the specific processor to read and run.

For example, one can write a line of code to execute a subroutine (the assembly equivalent of a C/C++ function or an OOP method). The instruction "JSR $0F5A" might get assembled, or translated, to "1001101110" or the like. Every instruction has a unique translation from mnemonic to binary code. The values here are, of course, fabricated and serve only to illustrate the point.
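A minimal sketch of this mnemonic-to-binary translation follows. The opcode table is invented for illustration, just like the values in the text, and does not belong to any real processor.

```python
# Hypothetical opcode table: these bit patterns are made up for this example.
OPCODES = {"JSR": 0b10011011, "CPY": 0b10001111}

def operand_bits(text):
    """Convert a hexadecimal operand such as $0F5A or 0x71 to a bit string."""
    if text.startswith("$"):
        digits = text[1:]
    elif text.lower().startswith("0x"):
        digits = text[2:]
    else:
        raise ValueError(f"unsupported operand {text!r}")
    # Each hexadecimal digit expands to exactly four bits.
    return format(int(digits, 16), f"0{4 * len(digits)}b")

def assemble(line):
    """Translate one assembly statement into one machine code statement."""
    mnemonic, *operands = line.split()
    words = [format(OPCODES[mnemonic], "08b")]
    words.extend(operand_bits(operand) for operand in operands)
    return " ".join(words)

print(assemble("JSR $0F5A"))      # 10011011 0000111101011010
print(assemble("CPY 0x71 0x72"))  # 10001111 01110001 01110010
```

Because the mapping is one-to-one, the assembler needs no more than a lookup table and a number conversion for each statement.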

Parsing
Parsing is an essential first step for both interpreters and compilers. Parsing is how text is extracted from files and translated into sensible commands. This means that the grammar of a programming language must be clearly defined for its text to be parsed, or translated, into unique commands. That is the general purpose of a parser, but parsers also perform some additional tasks. A common additional feature is the ability to check for correct syntax and to flag errors.

Grammars
A Context Free Grammar, or CFG, is a grammar made up of a series of terminals and non terminals. Non terminals are denoted by a string placed in angle brackets, like <statement>. Since non terminals stand for other parts of the grammar, they can be expanded further and can be used recursively if the grammar permits it. Terminals, on the other hand, are actual symbols; the most common example is the semicolon. They cannot be expanded further and so mark the point where expansion stops, hence the name 'terminal'.

Below is an example of a Backus Naur Form, or BNF, grammar. A production rule is how the grammar usage is defined. In BNF, a production rule has exactly one non terminal on the left hand side, but there can be any number of non terminals and/or terminals on the right hand side.

Note to reader: 'code' is a recognized command in wiki-dom and in order to avoid a wacky wiki, 'code' has been replaced with 'cde'.

::= ;   ::=   ;    ::= ;     ::= + | = | - | ::= |    ::= number ...    ::= ... ::= ;   ::= ;  ::= { }  ::= if ;  ::= if ;

This shows that code can consist of a single statement or of a statement followed by more code. Every statement is terminated with a semicolon, as is the case in many programming languages. If a statement contains a variable, that variable traces back to a value because of the grammar rule stating that variables must have an identifier followed by a value. Because of the way BNF is constructed, what the parser requires is independent of how the source looks: indents, spaces, or any other formatting added for readability will not affect how the program works. This lets programmers lay code out so that it is visually easier to read.
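To make the idea concrete, here is a small hand-written parser for a hypothetical grammar in the same spirit. It is not the grammar listed above; the rule names in the comments and the function names are assumptions made for this sketch.

```python
import re

# Hypothetical grammar, written in BNF style:
#   <code>       ::= <statement> | <statement> <code>
#   <statement>  ::= <identifier> = <expression> ;
#   <expression> ::= <value> | <value> + <expression>
#   <value>      ::= number | <identifier>

def tokenize(source):
    # Whitespace never becomes a token, so layout cannot affect parsing.
    # Characters outside the toy alphabet are silently skipped in this sketch.
    return re.findall(r"[A-Za-z_]\w*|\d+|[=+;]", source)

def parse_code(source):
    """Return a list of (identifier, [values]) pairs, one per statement."""
    tokens = tokenize(source)
    pos = 0
    statements = []
    while pos < len(tokens):
        target = tokens[pos]
        if not re.fullmatch(r"[A-Za-z_]\w*", target):
            raise SyntaxError(f"expected identifier, found {target!r}")
        if tokens[pos + 1:pos + 2] != ["="]:
            raise SyntaxError("expected '=' after identifier")
        pos += 2
        values = []
        while True:
            if pos >= len(tokens) or tokens[pos] in ("=", "+", ";"):
                raise SyntaxError("expected a value")
            values.append(tokens[pos])
            pos += 1
            if pos < len(tokens) and tokens[pos] == "+":
                pos += 1
                continue
            break
        if tokens[pos:pos + 1] != [";"]:
            raise SyntaxError("expected ';' at end of statement")
        pos += 1
        statements.append((target, values))
    return statements

compact = parse_code("x=1+2;y=x;")
spread = parse_code("x = 1 + 2 ;\n\n    y   =   x ;")
print(compact)            # [('x', ['1', '2']), ('y', ['x'])]
print(compact == spread)  # True
```

Both layouts produce the same result, and leaving off a semicolon raises a SyntaxError, which is the syntax checking role of the parser described earlier.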

Summary of Paradigms with Examples of Each

 * Imperative - the language is statement driven. Examples: Perl, MS-DOS batch, csh/bash
 * Functional - the language is based on functions. Example: LISP, (times (plus 1 2) 3)
 * Logic - the language uses logic to perform computations. Example: Prolog, "A cheetah is a mammal, a mammal is an animal, is a cheetah an animal?"
 * Visual - the language uses graphics to express code or algorithms. Examples: AppInventor, Scratch, Snap
 * Procedural - the language groups statements of code into procedures (also known as subroutines or functions). Examples: C, Pascal, Fortran, BASIC
 * Object-Oriented - the language organizes code into classes to encapsulate information. Examples: C++, Java, PHP
 * Multi paradigm - the language supports more than one paradigm. Examples: C++, Java, PHP, Perl, Python, C#, JavaScript

Variable Types
In computer programming, types, or datatypes, classify the different kinds of values a variable can hold. Some common datatypes include boolean (true or false), string, and integer. In a typed language such as Java, variables must have a type associated with them. The type of the variable is important because it determines a variety of characteristics of that specific variable, such as:
 * Possible values which can be assigned
 * The actual meaning of the variable
 * The allowable operations with variables of a certain datatype

For example, consider the Java code below:

int count = 10;

This line of code says there is a variable called count and it is of type int, which in Java and many other languages is short for integer. The int type ONLY allows whole number values such as -5, 0, 5, 10, 12, 20, etc. If a Java program tries to assign any other kind of value to count, the compiler reports a type error and the compile fails. The failure occurs because the type of the variable count was specified as an int.

Why are data types important? Computer memory stores data as well as information about programs. This memory is made up of binary digits, known as bits, each of which is either a 0 or a 1. Computers do not access these bits individually but as collections, usually of 8, 16, 32 or 64 bits. The size of the collection a machine handles at once, its word size, is determined by the design of the CPU. As a general rule of thumb, the larger the word size, the more data the machine can process in a single operation.

With the context of memory established, the next step is to look at how data is stored. Data, such as a variable which has the value 10, is stored in memory at an actual physical location. Such a physical location is known as an address. It is difficult for us to know exactly where this location is or how to access it, which is why we assign a variable to the address. Variables are a much easier way of reserving a spot in memory to hold information for the program. Having to remember raw numeric addresses would make programming a much more painful process.

When the datatype of a variable is specified, for example an int, an actual block of memory is set aside for the data. The number of bytes reserved for a data type varies from one programming language to another. As mentioned above, the variable count is of type int. When executed, its declaration tells the computer exactly how much memory to reserve because of the data type specified (in Java, an int always occupies 32 bits).
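The link between a declared type and the amount of memory set aside can be illustrated with Python's standard struct module, which packs values into fixed-width layouts like those used by C or Java. Python's own int is not fixed-width, so this is only an analogy; the format codes below are real struct codes, while the labels are just for display.

```python
import struct

# Standard-size format codes: b = 8-bit, h = 16-bit, i = 32-bit, q = 64-bit integer.
sizes = {
    "8-bit integer": struct.calcsize("<b"),
    "16-bit integer": struct.calcsize("<h"),
    "32-bit integer": struct.calcsize("<i"),
    "64-bit integer": struct.calcsize("<q"),
}
for label, size in sizes.items():
    print(label, "occupies", size, "byte(s)")

# The value 10 stored as a 32-bit little-endian integer fills exactly 4 bytes.
packed = struct.pack("<i", 10)
print(len(packed), packed)  # 4 b'\n\x00\x00\x00'
```

Choosing a wider type reserves more bytes even when the stored value, such as 10, would fit in a single byte.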

Strong vs Weak Typed Languages
Strongly typed variables, such as integers and strings, have already been mentioned. When using a language with strong types, variables are known to have specific characteristics, and those characteristics cannot change. Conversely, in a weakly typed language, the type and characteristics of a variable are determined by how it is used. For example, consider the following expression:

a = 5;
b = "10";
c = a + b;

Depending on the language, the value of c will be 15 if the language can interpret "10" as the integer value 10.

We could also set the b variable to a string that is not a number, and some languages will convert the string value to the corresponding ASCII values it represents.

If this were attempted in a strongly typed language, the compiler would raise an error, since assigning a string value to an integer typed variable is not allowed.
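Python can serve as a quick way to see the strongly typed behavior, with the caveat that Python is an assumption chosen for this sketch and that it reports the problem when the line runs rather than at compile time. The conversion has to be requested explicitly.

```python
a = 5
b = "10"

# Mixing an integer and a string is rejected instead of being converted silently.
try:
    c = a + b
    rejected = False
except TypeError:
    rejected = True
print(rejected)      # True

# The programmer must state which behavior is intended.
c = a + int(b)       # numeric addition
print(c)             # 15
print(str(a) + b)    # 510 (string concatenation)
```

The two explicit conversions give two different results, which is the kind of "hidden" behavior a weakly typed language decides on the programmer's behalf.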

Neither approach is better; each has pros and cons. In a strongly typed language, the programmer is forced to state the behavior of the program explicitly. This excludes the possibility of "hidden" behavior. It also helps when another programmer later modifies legacy code: even if parameter names are not descriptive, there is no confusion about what type of variable is being worked with.

For weakly typed languages, the advantage comes from writing less code. It is sometimes also argued that such code executes faster because the programmer does not handle each distinct data type explicitly, although the implicit conversions themselves take time at run time.

Dynamic vs Static Typed Language
Dynamic typing and static typing can cause confusion among programmers. The explanations below define each one in detail and include an example snippet of code.

Dynamic typing
A dynamically typed programming language is one in which types are determined at runtime. The advantage of this kind of language is that one can write less code more quickly, because the type of each variable does not have to be specified. The downside appears when checking for errors. Because types are computed at runtime, there is no type-checking beforehand, so in order to test, a programmer must run the program and clean up afterwards. Common dynamically typed languages include Python and PHP. Below is some Python code:

firstName = "Joshua"
firstName = 10

We have defined the variable firstName as the string value "Joshua", then right after that we changed the value to the integer 10. In a dynamically typed language, this runs perfectly fine and the value of firstName (until you change it again) is 10.
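The snippet can be extended slightly to show both sides of dynamic typing: the type follows the value, and a type mistake only surfaces when the offending line actually runs. The function shout is a hypothetical helper added for this sketch.

```python
firstName = "Joshua"
print(type(firstName).__name__)   # str
firstName = 10
print(type(firstName).__name__)   # int

def shout(name):
    # Nothing checks beforehand that name is a string.
    return name.upper()

print(shout("Joshua"))            # JOSHUA

try:
    shout(firstName)              # firstName is now the integer 10
    failed = False
except AttributeError:
    failed = True
print(failed)                     # True
```

The program starts and runs normally up to the faulty call, which is why dynamically typed code has to be exercised by running it.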

Static typing
In a statically typed language, the type is computed at compile time instead of runtime. If the code compiles without errors, it is guaranteed to be free of type mismatches, though not of logic errors. One advantage of a statically typed language is the speed with which a programmer can find and fix bugs and type mismatches, because the compiler pinpoints them. A downside, of course, is that this requires writing more code and making sure before compiling that all variables are typed properly. Popular statically typed programming languages include C, C++, Java, and C#.

// Example from above (dynamically typed).
firstName = "Joshua"
firstName = 10

// The same idea written correctly in Java.
String firstName = "Joshua";
int age = 22;

The first snippet repeats the example from the dynamically typed language, and the second shows how it would be written correctly in Java, a statically typed language. Java is statically typed because all variable names AND their types must be explicitly declared. If we attempted to assign the value of age to firstName, we would get an error at compile time telling us that an int cannot be converted to a String.

Primitive data types
Primitive data types are mostly found in statically typed languages such as Java. As noted above, this means that all variable names and types must be declared explicitly in order to pass the type-check at compile time. Below is a table of the more common primitive types in statically typed languages.

Complex data types
The array is probably the most common complex data type. A complex datatype is any type that holds multiple independent values. An array is a complex data type because it is one object made up of a number of independent values. An example in Java:

int[] songs = new int[10];

This statement defines an array variable named songs and sets aside space for 10 integer values in memory. It's important to note that we did NOT assign any values of our own to those 10 integers; we only allocated the space we want in memory (Java fills each slot with the default value 0).

Complex data types can also be types that you define yourself as the programmer. In object-oriented languages, we can create a new class with properties and functions inside it that define what the class is. For example, consider this Java code:

// Creating the class Car.
public class Car {
    private int year;
    private String make;
    private String model;

    // . . .

    public void setYear(int y) {
        year = y;
    }

    public int getYear() {
        return year;
    }

    // . . .
}

// Using the Car class.
Car myCar = new Car();
myCar.setYear(2013);
System.out.println(myCar.getYear());

The code above creates a new class titled Car which has 3 properties (year, make, model) and, as shown, 2 methods. Of course there would be "setters" and "getters" for the other properties as well. We can now use this class to create a new object, or variable, of type Car. Once the line Car myCar = new Car(); compiles and runs, we have initialized a new object of type Car and can use the public properties and methods of that class throughout the program.

One can see that a primitive data type and a complex data type differ significantly.

Variable Scope
The scope of a variable defines its availability during the execution of the program. Some programmers say that the scope of a variable is actually a property in and of itself. In other words, a variable's scope is the area of a program where that specific variable has meaning and is thus visible to the code within that scope.

In most programming languages, there are different levels of scope: Global, Parametric, and Local.
 * Global variables are commonly known as variables with an indefinite scope, visible to the entire program. Programming languages handle global variables differently. For instance, in languages like C or C++ there is no actual global identifier; however, if a variable is defined outside a function, then that variable is treated as having "file scope", which means it is visible everywhere in the file. In PHP, however, there is an actual global keyword you can place in front of a variable, and then you can use that variable anywhere that file is included. For example:

// config.php 

// index.php 

In those two snippets of code, we are defining a GLOBAL variable $SITE_NAME in a file named config.php. Then in a completely different file, index.php, we are including that config.php file and using the global variable we defined.

Global variables are often viewed as bad practice because they can create confusing and more error-prone code. Code in a larger project can be read, and more importantly maintained, more easily when the scope of variables is limited to their specific code block or function. Using global variables can cause headaches because they can be modified from any part of a program (if they're not memory protected, which some languages provide), which makes it difficult to remember exactly what their intended use is. Below we will introduce better ways to manage your variables within a program.


 * Local variables are ones that can be accessed and used only within the function or block in which they are defined; each has local scope wherever it is declared. For example, say we have a block of code like below:



When we try to run this PHP code, we will get an error. That is because we are trying to echo a variable that is NOT in scope.

In our function, we defined a variable $a. Then directly after the closing brace which ends that scope, we went ahead and echoed out that same variable $a. This is wrong because the rest of the PHP file doesn't have access to LOCAL variables inside functions.
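The same situation can be reproduced in Python, used here as an assumption since the behavior is analogous: a name assigned inside a function does not exist outside it. The function name make_local is hypothetical.

```python
def make_local():
    a = 10          # a exists only while make_local is running
    return a * 2

result = make_local()

# Outside the function, the name a was never defined.
try:
    a
    leaked = True
except NameError:
    leaked = False

print(result, leaked)  # 20 False
```

The function can still hand its work back to the caller through its return value, without exposing the variable itself.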


 * Parametric variables can be accessed inside a method or function which accepts them as arguments. This is how you can solve the problem of using global variables in a program: by passing values to functions, so that the functions can use them, instead of creating global variables.



The snippet above demonstrates the power and simplicity of a parametric variable. We have a class which has a private property; the only way to set its value is to create a custom setter function and pass that function the value we'd like to use. The arguments the function accepts are parametric scoped variables: they can only be accessed within this function.

Using this concept, we can create a program that is harmonious because we always know what the variables are and what they are intended for. If we put the value in a global variable, we may never know how that variable is defined, never mind where. By passing it as a parameter, we are in complete control of where that value is set and who has access to it.
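A comparable sketch in Python is given below. The class Site, its methods, and the function page_title are hypothetical names chosen for this example, and the leading underscore is only Python's convention for marking an attribute as private.

```python
class Site:
    def __init__(self):
        self._name = None          # private by convention

    def set_name(self, name):
        # name is a parameter: it exists only while set_name runs.
        self._name = name

    def get_name(self):
        return self._name

def page_title(site_name, page):
    # Both values arrive as parameters, so nothing here depends on a global.
    return f"{page} | {site_name}"

site = Site()
site.set_name("Mimir")
print(page_title(site.get_name(), "Chapter 2"))  # Chapter 2 | Mimir
```

Every value the function uses appears in its parameter list, so a reader can see where each one comes from by looking at the call.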

Summary
In this chapter we have discussed the evolution of programming languages through their generations, from the most basic machine code to more human readable forms. We introduced the existing programming paradigms, gave some examples of each, and noted how they might be relevant to a particular situation. A few programming language characteristics were also discussed with regard to variables and how they can be represented.

Key Terms

 * Paradigm: A paradigm is a model or a pattern by which a programming language constructs its programs. There are six main programming paradigms: imperative, functional, logic, visual, procedural, and object-oriented. Most older languages are based on one of the paradigms; modern languages, on the other hand, are developed to include multiple paradigms.
 * Variable: In computer programming, a variable or scalar is a storage location paired with an associated symbolic name (an identifier), which contains some known or unknown quantity or information referred to as a value. The variable name is the usual way to reference the stored value; this separation of name and content allows the name to be used independently of the exact information it represents.
 * Dynamic language: A term used in computer science to describe a class of high-level programming languages which, at runtime, execute many common programming behaviors that static programming languages perform during compilation. These behaviors could include extension of the program, by adding new code, by extending objects and definitions, or by modifying the type system. These behaviors can be emulated in nearly any language of sufficient complexity, but dynamic languages provide direct tools to make use of them. Many of these features were first implemented as native features in the Lisp programming language.
 * Static Typed Language: Static typed programming languages are those in which variables must be declared before they're used. This implies that static typing has to do with the explicit declaration (or initialization) of variables before they're employed. Java is an example of a static typed language; C and C++ are also static typed languages. Note that in C (and C++ also), variables can be cast into other types, but they don't get converted; you just read them assuming they are another type.
 * Data type: In computer science and computer programming, a data type or simply type is a classification identifying one of various types of data, such as real, integer or Boolean, that determines the possible values for that type; the operations that can be done on values of that type; the meaning of the data; and the way values of that type can be stored.
 * Variable Scope: The scope of a variable defines its availability during the execution of the program. Some programmers say that the scope of a variable is actually a property in and of itself. In other words, a variable's scope is the area of a program where that specific variable has meaning and is thus visible to the code within that scope.

Problem Sets
A list of practice problems