
Chapter 2: An Overview of Programming Languages

“It’s ridiculous to live 100 years and only be able to remember 30 million bytes. You know, less than a compact disc. The human condition is really becoming more obsolete every minute.” -Marvin Minsky

What You Should Learn In This Chapter:

  • What are the different generations of programming languages?
  • What are the programming language paradigms?
  • What is a compiler?
  • What is an interpreter?
  • What are programming language grammars?
  • What are the basic programming language concepts?
    • What are statements, including conditionals and loops?
    • What are variables, typed languages, and other data types?
    • What is control flow?
    • What are expressions?
    • What are operators, and the order of operations?

Programming Language Generations

Our knowledge of computer programming has grown through levels of sophistication that are referred to as generations. Each generation has built on the one before it, and the impact can clearly be seen in the advancements over previous generations of technology. Every generation of programming languages has faced challenges, or has failed to live up to the expectations or needs of its programmers. This has provided the impetus for change, giving the rapidly evolving technology landscape new programming languages that not only solved difficult challenges, but also incorporated more powerful features drawn from the latest advancements in computing technology [1].

[Figure: Classification of Programming Languages]

First Generation (1GL)

First Generation languages are machine code. These are sequences of binary instructions, coded by the programmer as specific patterns of 1's and 0's, that are executed directly by a CPU; there is no compilation or interpretation of this code. The instructions consist of the lowest level of tasks, called hardware level tasks, such as jumping to a different register, copying data from one register to another, comparing data in two different registers, or performing an arithmetic function on data in a register. First generation languages are seldom used today for system programming. Yet they still have their place among programmers who work on hardware drivers and device-direct commands, since the resulting machine code is extremely fast and efficient, being executed immediately by the CPU. Subsequent higher level languages do not have this advantage. However, that represents the only real advantage of a first generation language. Their lack of readability makes code errors more difficult to find. Likewise, they are written specifically for a particular device and are non-transferable. This prevents code from being portable from one CPU to another: a solution to a problem must be re-written from the ground up for each separate CPU architecture [2].

Machine Code:

01110001 10001111 01110010

Second Generation (2GL)

Second Generation programming languages, known as assembly languages, have a very strong, but not always one-to-one, relationship with machine code, i.e. one statement in assembly language often translates to one statement in machine language. The difference between the two is in the grammar. In assembly language, the programmer can read and write code much more easily, using labels and operations that are represented as text, whereas in machine code the same program would be represented only as binary code, or 1's and 0's, which is largely unreadable, or at least not intuitive. The collection of all the assembly commands is called the Instruction Set. These consist of fairly intuitive mnemonics representing what each command does. The following example illustrates how code statements in the two languages compare.

In machine code, copying data from one register to another could look something like this:

01110001 10001111 01110010

In Assembly language the same instruction could look something like:

CPY 0x71 0x72

As you can see, it is much easier to decipher what the code is doing in assembly language, i.e. copying the contents at memory location 0x71 to location 0x72.

Like first generation languages, the instruction set and specific grammar of a second generation programming language are CPU dependent. Coding on this type of platform requires specification documents that include the coding syntax, register names, built-in operation names, and CPU electrical hardware information, all specific to the CPU. Unlike first generation languages, assembly language requires a conversion process called assembly, which converts the assembly code into machine language - the only language a CPU can understand.

Below is an example listing of assembly language source code (a subroutine), generated by the Netwide Assembler (NASM), which is an assembler and disassembler for the Intel x86 architecture:

                                                  ;-----------------------------------------------------------
                                                  ; zstr_count:
                                                  ; Counts a zero-terminated ASCII string to determine its size
                                                  ; in:   eax = start address of the zero terminated string
                                                  ; out:  ecx = count = the length of the string

zstr_count:                                       ; Entry point
00000030 B9FFFFFFFF      mov  ecx, -1             ; Init the loop counter, pre-decrement
                                                  ;  to compensate for the increment
.loop:
00000035 41              inc  ecx                 ; Add 1 to the loop counter
00000036 803C0800        cmp  byte [eax + ecx], 0 ; Compare the value at the string's
                                                  ;  [starting memory address Plus the
                                                  ;  loop offset], to zero
0000003A 75F9            jne  .loop               ; If the memory value is not zero,
                                                  ;  then jump to the label called '.loop',
                                                  ;  otherwise continue to the next line
.done:
                                                  ; We don't do a final increment,
                                                  ;  because even though the count is base 1,
                                                  ;  we do not include the zero terminator in the
                                                  ;  string's length
0000003C C3              ret                      ; Return to the calling program

The x86 instruction set applies to 32-bit Intel processors, while the x64 instruction set applies to the newer 64-bit Intel processors. x64 is a generic name for the 64-bit extensions to Intel's and AMD's 32-bit x86 Instruction Set Architecture (ISA). The instruction set is also grouped according to the processor's architecture, i.e. i386, i486, i686, and more generally is referred to as x86-32 and x86-64 (also known as AMD64) [3][4][5][6][7].

Third Generation (3GL)

Third Generation programming languages brought logical structure to software development and made programming languages more programmer friendly. This was brought about by the next higher level of abstraction these languages offered, which places them at a higher level than the previously discussed generations. Third generation languages come closer to natural human speech and provide a much more efficient way to communicate what the code is intended to do.

Early examples of 3GLs are Fortran, ALGOL, and COBOL, first introduced in the late 1950s. These were later followed by C, C++, Java, BASIC, Pascal and others, and even today these later third generation languages are still widely used in all forms of software development, from business applications to video games. Early 3GLs focus on structured programming and have no notion of object encapsulation; later languages such as C++, Java, and C# followed a different path than the one originally mapped out by 3GL guidelines.

Third generation programming languages are characterized by a one-to-many mapping with assembly and machine language, i.e. one statement in a third generation language will translate to numerous statements in a first or second generation language. As the syntax of these new languages moved further from the hardware level languages, the source code needed to be translated in a more complex manner before it could be executed by the CPU. This characteristic led to the creation of compilers and interpreters. The role of the compiler and interpreter is not only to translate the source code to machine code, but also to verify that the source code is syntactically correct based on the programming language's grammar.

Another key characteristic of a third generation language is its hardware independence. A specific compiler is still required to translate code written in 3GL down to a machine executable form but the grammar of the language itself is no longer dependent on any one CPU. Code is therefore portable across many platforms.

Below is an example of a third generation (3GL) function, written in C:

 #include <stdio.h>
 
 /* Recursively sums the first N elements of the array. */
 int add_func(int array[], int N)
 {
     if (N <= 0)
         return 0;
     return (add_func(array, N - 1) + array[N - 1]);
 }
 
 int main(void)
 {
     int arr[] = {1, 2, 3, 4, 5};
     int n = sizeof(arr) / sizeof(arr[0]);  /* number of elements */
     printf("Sum of elements in array is %d\n", add_func(arr, n));
     return 0;
 }

Fourth Generation (4GL)

The Fourth Generation of programming languages resulted as a new, higher level of abstraction was created to produce more natural-flowing languages, in which each statement gives the equivalent of very complicated 3GL instructions but with fewer errors.

On early computing machines, it was common for a few punched cards written in a 4GL language to replace boxes of punched cards written in a 3GL language.

We can identify a few types of 4GLs:

  • Table-driven programming - the programmer uses control tables instead of code to define the logic. Example: PowerBuilder
  • Report-generator - generates a report, or a program to generate a report, based on data and report format. Examples: Oracle Reports, OpenOffice Base
  • Form-generator - manage online interactions or generate programs to do so
  • Data management - provide sophisticated commands for data manipulation/documentation, for statistical analysis and reporting. Examples: SAS, SPSS, Stata
  • Other - attempts to generate whole systems from CASE tool outputs have been made

Some engineering systems have been automated to use data flow diagrams, relationship diagrams and life history diagrams to generate an impressive number of lines of code as the final product.

4GL examples: DataFlex, FoxPro, IBM Rational EGL, LabVIEW, Perl, Python, Ruby

A sample code written in a database access 4GL can be seen below:

 EXTRACT ALL CUSTOMERS WHERE "PREVIOUS PURCHASES" TOTAL MORE THAN $1000

Fifth Generation (5GL)

Fifth Generation programming languages are based on solving problems using constraints rather than an algorithm written by a programmer. They are designed to make the computer solve a given problem without the programmer specifying the steps. Most programming languages of the logic paradigm are considered to be 5GLs. They are used primarily in artificial intelligence research. Some examples of 5GL languages include PROLOG, OPS5, and Mercury. This section will restrict the discussion to PROLOG to illustrate the features of 5GL languages.

PROLOG is a declarative language: programs are created by entering facts and rules which govern the execution of the program. These facts and rules are called clauses. When a program is executed, some input is provided to the program and is queried against the clauses to produce a result. The following is an example of a simple program in PROLOG and the output that results when it is executed:

 /* Clauses */
 has(jack,apples).
 has(ann,plums).
 has(dan,money).
 fruit(apples).
 fruit(plums).
 
 /* Sample queries */
 ?- has(jack,_). 
 yes 
 ?- has(X,apples),has(Y,plums).
 X = jack 
 Y = ann 
 ?- has(X,apples),has(X,plums).
 no

In this example, the clauses are first declared, and the executing program then queries against them. The clauses are fairly arbitrary and just given as an example. The "has" clauses indicate that a person has some item, which could be read as: "Jack has apples", or "Dan has money". The "fruit" clauses indicate that apples and plums are considered fruit, or: "apples are fruit".

The sample queries show the PROLOG interpreter and how a user can query against the clauses. The first query asks whether Jack has anything, which based on the clause definitions is true; thus, the response is "yes". The second query asks to identify who has apples and who has plums. The answers are returned through the variables declared in the query, where matches exist - in this case, Jack and Ann. The last query asks whether there is a single person, denoted by the variable X, having both apples and plums. Reviewing the clauses, this can be seen to be false, and the response is "no".

Obviously, this is an entirely different paradigm of programming. Only five clauses were provided, and there are none of the loops or variable initializations that would be common in other languages. Given the initial set of clauses, numerous queries relating to the clauses can be made, and the answers are provided [8].

Paradigms

What is a paradigm? A paradigm is a model or a pattern by which a programming language constructs its programs. It is a type of template. There are six main programming paradigms:

  • Imperative - these languages are driven by statements.
  • Functional – these languages are based on functions.
  • Logic – these languages perform computations based in logic.
  • Visual – these languages use graphics to express code/algorithms.
  • Procedural – these languages group statements of code into procedures, which are also known as subroutines, or functions.
  • Object-Oriented – these languages organize code into classes or prototypes of objects that encapsulate information.

Every programming language can be classified into at least one paradigm. Sometimes multiple paradigms apply, depending upon the features of the language. Most older programming languages are typically classified as having a single paradigm. Newer languages can generally be classified as having multiple paradigms. Classifying programming languages using paradigms helps in determining which languages may be appropriate for specific tasks or problems.

  • Multi-Paradigm – these languages support more than one paradigm.

Imperative Programming

The imperative paradigm is associated with programming with an explicit sequence of commands that update state, with statements that perform the actions. The imperative paradigm borrows from the imperative mood in natural languages, which expresses commands, in the same way that an imperative computer program consists of commands for the computer to perform. Programming in the imperative paradigm focuses on describing how a program operates. The control flow in imperative programming is explicit, which means that commands show how the computations take place, line by line, with each line affecting the global state of the computation. Low-level imperative languages generally use goto statements and branch tables for flow control.

The term "imperative programming" is often used to contrast declarative programming, which focuses on what the program should accomplish without specifying how the program should achieve the result. In this paradigm, the program's state is defined by the contents of its local memory, and the code statements are instructions in a native machine language that the computer understands. Recipes use similar concepts to the style of imperative programming. Each step in the recipe is an instruction, and the physical world holds the state.

Nearly every computer's hardware is imperative in nature: it is designed to execute machine code, which is native to the hardware and is always written in the imperative style. From this perspective, this low level of programming is the basis of all programming languages, because all programs must ultimately run on computer hardware. Higher-level imperative languages use variables and more complex statements, but still follow the same structure and properties of the paradigm. Since the ideas of imperative programming are both familiar and directly embodied in the computer's hardware, most computer languages are built in the imperative style, especially at the lowest level [9].

Examples of imperative languages include Perl, Batch and Bash scripts.
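
To make the style concrete, below is a minimal sketch of imperative code, written here in Java purely for illustration (the class and variable names are our own, not from any particular source):

 // Imperative style: an explicit, ordered sequence of commands,
 // each of which updates the program's state.
 public class ImperativeDemo {
     public static void main(String[] args) {
         int total = 0;                    // state held in a variable
         int[] values = {3, 1, 4, 1, 5};
         for (int i = 0; i < values.length; i++) {
             total = total + values[i];    // each statement mutates state
         }
         System.out.println("Total: " + total);
     }
 }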

[Figure: Imperative Programming Languages]

Functional Programming

The functional programming paradigm, also called applicative, is a style of building the structure and elements of computer programs with function calls while avoiding any global state - treating computation as the evaluation of mathematical functions and avoiding changing state and mutable data. It is a declarative programming paradigm, which means its programs are made up of expressions, or declarations, instead of statements. With functional programming there are no commands - a program is built from the functions and expressions themselves, without any side effects. The elimination of side effects, i.e. changes in state that do not depend on the function inputs, can make a program much easier to understand and makes its behavior predictable, which was one of the key motivations for the development of functional programming.

Compared to source code from other paradigms, functional code tends to be 1) much shorter, 2) less prone to errors, and 3) easier to prove correct. There is more inherent parallelism in its processing, so good compilers can produce much faster-running code. Lambda calculus, a mathematical abstraction, provides a theoretical framework for describing the evaluation of functions and their syntax, and forms the basis of almost all current functional programming languages. Function-level programming also combines functions with functionals, or combinators. Combinatory logic, a theoretical formulation similar to lambda calculus, actually preceded it and is commonly perceived as more abstract. Both lambda calculus and combinatory logic were developed to achieve a clear approach to the foundations of mathematics.

In functional programming, control flow is expressed by combining function calls instead of assigning values to variables. Programs are written without assignment statements; instead, functions are applied to the arguments that are passed into the function calls. The real power of this paradigm comes from passing functions into other functions, and from returning functions from functions. In functional code, the output value of a function depends solely on the input arguments that are passed to it. Calling a function named f two times with the same input argument value x produces the same result f(x) each time. In contrast, procedures that depend on a local or global state may produce different results in subsequent calls when the same arguments are used with a different program state [10].
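
For comparison with the imperative sketch above, here is a minimal functional-style sketch, again in Java purely for illustration (the names are our own). The function square is pure: its result depends only on its argument, so square.apply(4) returns 16 every time:

 import java.util.List;
 import java.util.function.Function;
 import java.util.stream.Collectors;
 
 public class FunctionalDemo {
     // A pure function: no outside state is read or mutated.
     static final Function<Integer, Integer> square = x -> x * x;
 
     public static void main(String[] args) {
         // Expressions and function application instead of statements
         // that mutate variables; the source list is never modified.
         List<Integer> squares = List.of(1, 2, 3, 4, 5).stream()
                 .map(square)
                 .collect(Collectors.toList());
         System.out.println(squares);  // prints [1, 4, 9, 16, 25]
     }
 }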

[Figure: Functional Programming Languages]

Examples of functional languages include LISP, Scheme, and Haskell.

Logic Programming

The logic paradigm, also known as rule-based, pertains to programming by specifying a set of facts and rules, largely based on formal logic. An engine infers the answers to questions based on inputs or assertions of facts that lack traditional control flow. A program written in a logic programming language, such as Prolog, is a set of sentences in logical form, expressing the facts and the rules about a problem domain.

The first proposal to use a computer program to represent the clausal form of logic was made by Dr. Cordell Green. He established the theoretical basis for the field of logic programming, as well as many formal, inference-based Artificial Intelligence (AI) systems. He reduced a subset of LISP to a set of axioms - fundamental assumptions that serve as a basis for deducing theorems - together with an input-output relation, computing the relation by simulating the execution of the program in LISP.

Logic programming can be traced back to Artificial Intelligence (AI) debates in the late 1960s and early 1970s about declarative versus procedural representations of knowledge. Advocates of declarative representations were working at Stanford, associated with John McCarthy, Bertram Raphael and Cordell Green, and in Edinburgh, with John Alan Robinson, Pat Hayes, and Robert Kowalski. Advocates of procedural representations were working at the Massachusetts Institute of Technology (MIT), under direction from the pioneer of AI, Marvin Minsky - author of The Society of Mind, and Seymour Papert.

The use of a computer program to represent and execute mathematical logic is a feature of lambda calculus, which was developed by Alonzo Church in the 1930s. Absys, by Foster and Elcock, combined equations and lambda calculus in an assertional programming language, which does not place any constraints on the order in which the operations of the program are performed. An assertion is a statement that a predicate - something that is affirmed or denied of the subject in a proposition in logic - is expected to always be true at that point in the code.

The use of predicate logic as a programming language is based on the interpretation of the procedures that are intended to be solved, or reduced in form, by the computer. This involves feeding the computer procedures that represent a mathematical proof or problem in a way that the computer can understand, i.e. providing the computer with sets of facts and rules [11].

In these logical languages, rules are written in clauses:

 F :- X1, …, Xn.

and are read declaratively as logical implications:

 F if X1 and … and Xn.

F is called the head of the rule, and X1, …, Xn is called the body.

Facts are rules that have no body, and are written in the simplified form:

 F.

The declarative form of a logic program can be used by a computer programmer to verify its correctness. Additionally, logic-based program transformation techniques can be used to transform a logic program into an equivalent program with more efficient logic, while remaining congruent with the original program in terms of what its statements compute.

Examples of major programming languages in the logic paradigm include Answer Set Programming (ASP), Datalog, and Prolog.

[Figure: Logic Programming Languages]

Visual Programming

A Visual Programming Language (VPL) is any programming language of the visual paradigm that allows users to create programs through the manipulation of the program's graphical elements rather than by specifying them textually. VPLs may be further classified into icon-based languages, form-based languages, and diagram-based languages, according to the type and extent of visual expression used. A VPL allows programming with visual expressions, through spatial arrangements of text and graphical symbols, used either as elements of a program's syntax or as secondary notation. However, the "visual languages" of the Microsoft Visual Studio IDE, including Visual Basic, Visual C#, and Visual J#, are not visual programming languages - they are textual, even though they use graphics to create Graphical User Interfaces (GUIs), because their code is still written out textually.

Visual programming environments provide graphical icons, forms, or diagram elements which can be manipulated by users in an interactive fashion, according to a specific spatial grammar for program construction. A visually transformed language is a non-visual language, such as Python, with a superimposed visual representation - for example Nuke, a node-based visual programming application for visual effects compositing created by The Foundry. Naturally visual languages have inherent visual expressions for which there are no textual equivalents.

Current developments in the visual language paradigm are trying to integrate visual programming with dataflow programming languages - programs that model a directed graph of data flowing between operations - either to give access to the program state for instantaneous online debugging, or for automatic program generation and documentation. Many VPLs, known as diagrammatic programming in the dataflow paradigm, are based on the idea of "boxes and arrows", where screen objects are treated as entities, or boxes, connected by arcs, arrows, or lines, which represent relations between the objects [12].

Graphical languages of the visual programming paradigm that use graphical code blocks to build visual programs include App Inventor, and Scratch.

[Figure: Visual Programming Languages]

Procedural Programming

The procedural paradigm is derived from structured programming and is associated with imperative programming. It is based upon the concept of the procedure call. Procedures, also known as routines, subroutines, or functions - similar to those used in functional programming - contain a series of computational steps to be carried out by the computer. The focus of procedural programming is to break a programming task down into collections of variables, data structures, and subroutines. Any one of the procedures in a program might be called and executed at any point during that program's execution, including by other procedures or by itself.

Computer processors provide hardware support for procedural programming through 1) a stack register and 2) instructions for calling procedures and returning values from them. While hardware can be designed for other types of programming languages, such as LISP machines or Java processors, no such attempt has been a lasting commercial success. Procedural programming languages are also considered to be imperative languages, because they make explicit references to the state of the execution environment, including the use of variables that correspond to processor registers.

Because procedures can 1) specify a simple interface, 2) be self-contained, and 3) be reused, they are an easy way to bring together, through libraries, many pieces of code written by different people or organizations. In procedural programming, modularity is usually desirable, especially in large, complicated programs. The inputs to a procedure are usually defined syntactically in the form of arguments, while the outputs are usually delivered as return values. Scoping also helps to keep procedures modular, by preventing a procedure from accessing the variables of other procedures (sometimes including other instances of itself) without explicit authorization from within that procedure's instance.

With less modular procedures, a quickly-written program tends to have its procedures interacting with a large number of variables in a global execution environment, which other procedures could conceivably modify, increasing the complexity of the program and potentially creating semantic errors in the process. Even though modularized procedures may seem object-oriented, the difference is that procedural programming uses procedures to operate on data structures, while object-oriented programming bundles the procedures and data structures together; in object-oriented programming, an object operates on its own data structure [13].

The first significant procedural programming languages appeared around 1960, including Fortran, ALGOL, COBOL and BASIC. Pascal and C were published in the 1970s, and Ada was released in 1980. Go is an example of a more modern procedural language, first published in 2009.
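
Below is a minimal procedural sketch, written here in Java for illustration (the procedure and variable names are our own): the task is broken into small, reusable procedures with explicit inputs and return values, one of which calls the other:

 public class ProceduralDemo {
     // A reusable procedure: input as an argument, output as a return value.
     static int sum(int[] values) {
         int total = 0;
         for (int v : values) {
             total += v;
         }
         return total;
     }
 
     // A second procedure built on the first.
     static double average(int[] values) {
         return (double) sum(values) / values.length;
     }
 
     public static void main(String[] args) {
         int[] scores = {70, 85, 90};
         System.out.println("Average: " + average(scores));
     }
 }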

[Figure: Procedural Programming Languages]

Object-Oriented Programming (OOP)

The Object-Oriented Programming (OOP) paradigm pertains to programming by defining objects that send messages to each other. The data objects have their own encapsulated internal state and public interfaces, as defined by their classes, which encompass their attributes and actions. OOP objects respond to the messages by performing operations known as methods. The messages can have arguments, just like function calls. An OOP program is a world of objects, each with its own local (or private) memory and a set of operations unique to its class, which defines its interaction with other objects. These programs have a different feel than non object-oriented programs, because other paradigms execute on the processor with a single shared pool of memory.

In object-oriented programming, an object operates on its own data structure. The focus of OOP is to break a programming task down into objects that expose behavior (methods) and data (members or attributes), using interfaces to communicate with other object instances. One of the more recognizable aspects of some object-oriented languages is that conditionals and loops become messages themselves, whose arguments are usually blocks of executable code.

Object-orientation can include:

  • Class-based: Objects get state and behavior based on membership in a class. The classes, or blueprints, are defined beforehand, and the objects - like houses built from a blueprint - are instantiated from the classes.
  • Prototype-based: Objects get their behavior from a prototype object. Objects are the primary entities, and classes do not exist. The prototype of an object is just another object to which the new object is linked. Every object has exactly one prototype link. New objects can be created based on existing objects as their prototypes. The attributes and methods of a prototype are delegated to all of the objects linked to it, while attributes and methods owned individually by an object may not be shared by other objects with the same prototype. Only single inheritance can be implemented through the prototype.

The first object-oriented language was Simula-67. Smalltalk was next, and was known as the first "pure" object-oriented language, i.e. a language that is 100% based in the object-oriented paradigm. Many languages developed from the 1980s through today have been labeled as object-oriented, most notably Python, C++, Ada 95, C#, and Ruby. Java, released in 1995, was designed as an object-oriented language from the ground up [14].

Many popular object-oriented languages take some elements of OOP and mix them into imperative-looking code. In the following example (written here as Java, assuming a Person class with a public name field), we can see that length() and toUpperCase() are methods rather than top-level functions, but the for loop and if statement are back to being control structures:

 List<String> result = new ArrayList<>();
 for (Person p : people) {
     if (p.name.length() > 5) {
         result.add(p.name.toUpperCase());
     }
 }
 Collections.sort(result);
 return result;

[Figure: Object-Oriented Programming Languages]

Multi-Paradigm Programming

Multi-paradigm programming is a classification of languages by a collection of paradigms. Very few languages implement only one paradigm 100%; when they do, they are known as "pure". It is incredibly rare to have a "pure object-oriented" language or a "pure functional" language. Most languages facilitate programming in more than one paradigm. If a language is purposely designed to allow programming in many different paradigms, it is known as a multi-paradigm language. If a language supports multiple paradigms only by accident, there isn't a special word for that.

Very few languages implement OOP the way Alan Kay envisioned it in the 1970s while developing the Smalltalk language at Xerox PARC. He introduced the term Object-Oriented Programming (OOP) to represent the pervasive use of objects and messages as the basis for computation. Many programming languages in the object-oriented paradigm are also in the imperative, functional, and procedural paradigms. The diagram below shows these multiple paradigms for popular programming languages, including C#, Perl, Python, and Scala [15][16].

Other paradigms include:

  • Actor Programming – supports concurrent computations with actors that make decisions in response to their environment. The actors can be selfish or competitive in behavior.
  • Concurrent Programming – which has concurrency, including multi-threading, support for distributed computing, message passing, and shared resources such as CPU, memory, and persistent storage.
  • Constraint Programming – relations between variables are expressed as constraint networks that direct the allowable solutions, solved with constraint satisfaction techniques or algorithms such as the simplex algorithm.
  • Dataflow Programming – works through forced recalculation of formulas when data values change, e.g. Excel spreadsheets
  • Declarative Programming – e.g. HTML/CSS, which describe what should appear on a page but not how to actually display it
  • Distributed Programming – supports autonomous computers that communicate via computer networks
  • Metaprogramming – programs that manipulate other programs through their data, or computations that are completed during compile time that would normally be done at runtime
  • Template Metaprogramming – metaprogramming methods that use compiler templates to generate temporary source code, which is then merged by the compiler with the rest of the source code when it is compiled
  • Reflective Programming – a metaprogramming method in which a program modifies or extends itself
  • Pipeline Programming – a syntax change that adds the ability to chain function calls in a language that was not designed with any such syntax to begin with
  • Rule-based Programming – a network of rules that comprise a knowledge set that can be used for expert systems, problem deduction, and problem resolution

[Figure: Multi-Paradigm Languages]

Compiler

A compiler is an essential and fundamental tool used by a programmer. Computer programs can become very complex, and in order to produce them in a productive way, instructions are abstracted into higher level languages. An understanding of the compiler mechanism is important to gaining a better understanding of the way programming languages are designed, which, in turn, will lead to a better mastery of their usage.

The compiler is a special type of computer program that translates, or compiles, a program written in a familiar or preferred high-level programming language such as C, C++, or Java into executable machine language, or object code, which the CPU on a target device can run. This abstraction is needed because attempting to write complex programs in machine language would be tedious and prone to natural human error, and finding errors in a program written in 0's and 1's would be equally tedious. Thus people write programs in a high-level language, and a compiler processes, or translates, that program into a form suitable for the machine architecture to execute.

  • The word "compiler" usually refers to a program that translates source code written in a high-level language down to lower level, machine-executable binary code.
  • The reverse of a compiler, which converts machine code back up to a high-level language, is called a "decompiler".
  • A program that translates from one high-level language to another high-level language is called a "transpiler".
  • A compiler written in the language that it compiles is called a "bootstrap compiler".

A proper compiler program will include more than just the final code generation logic. It will generally consist of several program blocks which follow the flow of formatting, error checking, processing, code generation and optimization. These are not representative of any hardware structure - they are logical structures, and they are also modular by nature so that they can be more easily updated and maintained.

How a Compiler Works

A better understanding of how a compiler works can be obtained by separating its overall task into phases and then further breaking down into the steps which each phase performs. The three main phases and the steps which occur are as follows:

[Figure: Compiler Structure]

  • The front end phase, where the first step consists of a lexical analysis, the second step is syntax analysis or parsing of the code and the third step is type checking.
  • The middle phase, where the intermediate code generation takes place.
  • The back end phase, where register allocations are assigned, the machine code is generated, and lastly, the assembly and linking occurs.

Conceptually these steps operate in sequence, each step (except the first) takes the output from the previous step as its input. Here is a detailed view of the three phases and their steps:


 (Phase 1) The Front End
 
           (Step 1) Lexical Analysis: This is where the high-level language text is read in and divided into tokens, each of which corresponds
                    to a symbol in the programming language, for instance a keyword, identifier, or symbol name.
           (Step 2) Syntax Analysis or Parsing: Please refer to the Parsing section as it goes into greater detail.
           (Step 3) Type-Checking: This step analyzes the syntax tree of the program to check whether it violates certain consistency
                     requirements, for example if a variable is used but not declared, or if it is used in a context that does not make
                     sense for the given type of the variable, such as attempting to use a boolean value as a function pointer.
 
 (Phase 2) The Middle End
 
           (Step 4) Intermediate Code Generation: The program then takes what the front end has built from this point and generates a simple
                    machine independent intermediate language. 
 
 (Phase 3) The Back End
 
           (Step 5) Register Allocation: The back end starts by taking the intermediate language that the middle end created and maps the
                     symbolic variable names to numbers, each of which corresponds to a register in the machine's architecture.
           (Step 6) Machine Code Generation: This is where the intermediate language created in the middle end gets translated into assembly 
                    language for the specified machine architecture.
           (Step 7) Assembly and Linking: The assembly-language code is translated into a binary representation, and the addresses of
                     variables, functions, etc., are determined.
 

Below is a pictorial representation of a compiler:

[Figure: Pictorial representation of a compiler]

Compiler Structure

As we studied in the previous section, compilers translate high-level source code into low-level machine code. The question is: how is a compiler constructed to do so? In the early days of programming languages, designing a compiler seemed a very complex problem, since the language had to be processed with whatever resources were available. Today, many tools have been created for building compilers, and even Computer Science students can build their own small language and design a compiler for it.

In order to convert the source language into the target language, a compiler must first build an abstract representation of the source program, and from that representation construct the target language output. Four main elements are important for this purpose, as follows:

  • Procedures: Subroutines, or sequences of information that act as program instructions performing a specific task. A procedure may be a function, a routine, a method, or a small program.
  • Packages: A collection of actions. These can also be written programs of procedures or rules, and they are always associated with the documentation of operations performed on the computer system.
  • Abstract Data Types (ADT): This is an abstraction of the data in the language. Any number of actions can be performed by the ADT set.
  • Variables: This is the smallest part of the ADT; only two operations can be performed on a single variable: assigning a value to it and reading the value from it.

[Figure: Compiler Structure]

Pre-Processor

A pre-processor is a program that provides initial processing of the input source code to produce an output that is suitably formatted to be ready as the input to the next stage. It is usually not a standalone program. It performs many basic error checking functions as well as the removal of commented sections of code. Pre-processing can also replace global variables with their actual value at every occurrence, process macros, perform file inclusion, and include certain language extensions which might have been implied in the programming language but now need to be explicitly stated in the compiled code.

This initial stage also includes lexical analysis and tokenization. The component responsible is called a lexer: a scan routine in the pre-processor stage which reads the sequence of text characters of the program and generates a corresponding series of tokens based on their logical grouping. A token is a collection of characters which represents a single logical entity. Lexing is still only a grouping mechanism, and much of the complexity of semantic analysis is deferred to the next stage, parsing. A classic related tool is lex, a program which generates lexical analyzers: lex reads a specification of the lexical analyzer's functionality and outputs source code implementing the desired lexer in the C programming language.
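
To make tokenization concrete, here is a minimal lexer sketch, written in Java for illustration (the token categories and names are our own, and real lexers handle far more cases):

 import java.util.ArrayList;
 import java.util.List;
 
 public class TinyLexer {
     // A token pairs a category with the characters it groups together.
     record Token(String type, String text) {}
 
     static List<Token> tokenize(String src) {
         List<Token> tokens = new ArrayList<>();
         int i = 0;
         while (i < src.length()) {
             char c = src.charAt(i);
             if (Character.isWhitespace(c)) { i++; continue; }
             int start = i;
             if (Character.isDigit(c)) {          // group digits into a NUMBER
                 while (i < src.length() && Character.isDigit(src.charAt(i))) i++;
                 tokens.add(new Token("NUMBER", src.substring(start, i)));
             } else if (Character.isLetter(c)) {  // group letters into an IDENT
                 while (i < src.length() && Character.isLetterOrDigit(src.charAt(i))) i++;
                 tokens.add(new Token("IDENT", src.substring(start, i)));
             } else {                             // anything else: a one-char OPERATOR
                 tokens.add(new Token("OPERATOR", String.valueOf(c)));
                 i++;
             }
         }
         return tokens;
     }
 
     public static void main(String[] args) {
         // Produces IDENT "count", OPERATOR "=", NUMBER "10", OPERATOR "+", IDENT "n"
         System.out.println(tokenize("count = 10 + n"));
     }
 }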

Parser

Parsing is an essential step for both interpreters and compilers. It is the processing step of analyzing the string of input characters of the program and building a parse tree from the sequence of tokens produced by the lexical scanner of the previous stage. Parsing is how text is extracted from files and translated into sensible commands. This also means that the grammar of a programming language must be clearly defined in order to parse, or translate, it into unique commands. The parser may also perform some additional tasks; a common additional feature is the ability to check for correct syntax and to flag errors. This stage produces the parse tree, which reflects the grammar of the program. Every expression is resolved into its grammatical constituents. The result is an abstracted form of the original source.
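
Below is a minimal recursive descent parser sketch, in Java for illustration (the two-rule grammar is our own invention): it consumes the input string directly and prints a nested, parenthesized form that mirrors the parse tree, showing how "1 + 2 * 3" resolves into its grammatical constituents:

 public class TinyParser {
     private final String src;
     private int pos = 0;
 
     TinyParser(String src) { this.src = src.replace(" ", ""); }
 
     // Grammar rule: expr -> term ('+' term)*
     String expr() {
         String left = term();
         while (pos < src.length() && src.charAt(pos) == '+') {
             pos++;
             left = "(+ " + left + " " + term() + ")";
         }
         return left;
     }
 
     // Grammar rule: term -> number ('*' number)*
     String term() {
         String left = number();
         while (pos < src.length() && src.charAt(pos) == '*') {
             pos++;
             left = "(* " + left + " " + number() + ")";
         }
         return left;
     }
 
     String number() {
         int start = pos;
         while (pos < src.length() && Character.isDigit(src.charAt(pos))) pos++;
         return src.substring(start, pos);
     }
 
     public static void main(String[] args) {
         // Prints (+ 1 (* 2 3)): '*' binds tighter than '+' in the tree.
         System.out.println(new TinyParser("1 + 2 * 3").expr());
     }
 }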

Code Generator

At this stage, the source code has been pre-processed, tokenized, and checked for semantic correctness resulting in the parse tree, which is an internal abstracted version of the original source code. This abstracted intermediate representation of the source program is now ready to be converted into the object program output. The code generation stage is more specifically dependent on the target machine and operating system.

Optimizer

The optimizer rearranges and adapts the compiled program to produce a more efficient object program output. Optimization can modify the original source program by streamlining the logical flow; this usually does not depend on the target machine, but is more a process of "cleaning up" the code - something the programmer could have done, but was perhaps prevented from doing by time constraints. Optimization can also modify the generated code with the intent of improving run efficiency on a particular target processor. This does depend on the target machine, and it is something the programmer would usually not even attempt.
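
As one small, concrete example of the kind of transformation an optimizer performs, here is a constant folding sketch in Java (the tiny expression type is our own invention): a sub-expression whose operands are all known at compile time is replaced by its computed value, leaving no work for run time:

 public class ConstantFolding {
     // A tiny expression tree: either a numeric constant or an addition.
     sealed interface Expr permits Num, Add {}
     record Num(int value) implements Expr {}
     record Add(Expr left, Expr right) implements Expr {}
 
     // Replace Add nodes whose children are constants with a single Num.
     static Expr fold(Expr e) {
         if (e instanceof Add add) {
             Expr left = fold(add.left());
             Expr right = fold(add.right());
             if (left instanceof Num a && right instanceof Num b) {
                 return new Num(a.value() + b.value()); // computed at "compile time"
             }
             return new Add(left, right);
         }
         return e;
     }
 
     public static void main(String[] args) {
         // (1 + 2) + 3 folds down to the single constant 6.
         Expr program = new Add(new Add(new Num(1), new Num(2)), new Num(3));
         System.out.println(fold(program)); // prints Num[value=6]
     }
 }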

Assembler

An assembler is a program that reads in a low-level assembly language source code file and translates that source code into the equivalent machine code for a specific processor, so the program can be read and run by that target device. For example, one can write a line of code to execute a subroutine (the assembly equivalent of a C/C++ function or an OOP method). The instruction "JSR $0F5A" might get assembled, or translated, to "1001101110". Every instruction has a unique translation from mnemonic to binary code; these are, of course, fabricated values serving only to illustrate the point.

Linker

The Linker uses the process of combining object files and libraries into a single output image. Static linking simply combines all library routines used in a program into the final executable file. Dynamic linking postpones the final linking until run time and then combines them interactively while the process is running.

Phases of the Compiler

The compilation process is sequential: each of the processes in compilation takes place in a different phase of the compiler.

Below is the representation of such phases of the compiler.

[Figure: Phases of a Compiler]

  1. Lexical Analysis: The first phase, in which the source code is analyzed. The source code is scanned as a stream of characters and converted into meaningful lexemes, which the lexical analyzer represents as tokens.
  2. Syntax Analysis: The next phase, in which the syntax of the source code is checked. This phase is also called parsing. It takes the tokens from the lexical analyzer as input and cross-checks them against the grammar and syntax of the source language. Any error found here is reported by the compiler as a syntax error. The output of this phase is a parse tree that has been verified as syntactically correct with respect to the grammar of the language.
  3. Semantic Analysis: This phase checks that the rules of the language are being followed properly. The semantic analyzer produces an annotated syntax tree as output, and the parse tree is checked again for correctness against the rules of the language.
  4. Intermediate Code Generation: After semantic analysis, the compiler generates an intermediate code of the source code for the target machine, representing the program in the form of an abstract machine. This intermediate code sits between the high-level source language and the low-level target language, and is generated in a way that makes it easier to translate into machine language.
  5. Code Optimization: This phase optimizes the intermediate code. Optimization can be thought of as removing unnecessary code lines from the intermediate code and re-arranging the remaining lines. The end result of this phase is more effective, better optimized intermediate code that is easier for the machine to convert to machine code.
  6. Code Generation: In this phase, the output of the code optimizer is translated into machine code for the target machine. The generator sequentially translates the intermediate code into relocatable machine code that performs all the tasks specified in the source code.
  7. Symbol Table: The symbol table is a data structure, either linear or a hash table, that maintains an entry for each name used in the program, and it is consulted throughout all the phases of the compiler. Symbols, tokens, identifiers, etc. are stored here by name, along with their data types. The table makes it easy for the compiler to search for and retrieve elements, it is used for scope management, and it supports verification of the assignment operations performed on variables and the semantic correctness of the source code. The symbol table supports operations such as search (lookup) and insert, as sketched below.
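
Here is a minimal symbol table sketch, in Java for illustration (the scope-per-hash-table design is one common choice, not the only one): each nested scope gets its own table, insert records a declaration in the current scope, and lookup walks outward to the nearest enclosing declaration:

 import java.util.ArrayDeque;
 import java.util.Deque;
 import java.util.HashMap;
 import java.util.Map;
 
 public class SymbolTable {
     // A stack of scopes; the innermost scope is at the head.
     private final Deque<Map<String, String>> scopes = new ArrayDeque<>();
 
     void enterScope() { scopes.push(new HashMap<>()); }
     void exitScope()  { scopes.pop(); }
 
     // insert: record a name and its declared type in the current scope.
     void insert(String name, String type) { scopes.peek().put(name, type); }
 
     // lookup: search from the innermost scope outward.
     String lookup(String name) {
         for (Map<String, String> scope : scopes) {
             if (scope.containsKey(name)) return scope.get(name);
         }
         return null; // undeclared identifier
     }
 
     public static void main(String[] args) {
         SymbolTable table = new SymbolTable();
         table.enterScope();                        // global scope
         table.insert("count", "int");
         table.enterScope();                        // inner block
         table.insert("name", "String");
         System.out.println(table.lookup("count")); // int (found in outer scope)
         table.exitScope();
         System.out.println(table.lookup("name"));  // null (out of scope)
     }
 }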

Pros and Cons of a Compiler

Advantages

  • Bridges the semantic gap between the modern high-level languages in which usable programs are typically written and the low-level language the computer requires in order to execute the program.
  • Creates a program for the specific architecture which can execute efficiently.
  • The compiler can identify some of the mistakes made by the programmer while coding.
  • Code can be optimized for most efficient execution on specific hardware.
  • Takes up less memory when the compiled program is executed.

Disadvantages

  • It may be necessary to develop and/or maintain several copies of source code to be compatible with different hardware platforms, for example when different peripheral sets are involved.
  • Potentially lengthy compile times due to large amounts of source code in projects

Interpreter

An interpreter is a computer program which translates a high-level program (source code) into an intermediate form and executes it. An interpreter is often called an abstract machine or a virtual machine. With an interpreter, programs are executed directly, without previously being compiled into machine language.

How an Interpreter Works

An interpreter reads source code one instruction at a time, converts the current line into machine code and executes it, "live". That line of machine code is then discarded and the next line of source is read, thereby executing each line as it is converted.

This is achieved by making the interpreter simulate a virtual machine whose machine language is the base set of instructions of the programming language. In other words, it is a high-level machine which no longer reads just 0's and 1's; it reads the actual programming language itself. Another way to look at this is to compare an interpreter to a program which implements a library containing the basic set of instructions of the programming language being interpreted. This is achieved by utilizing the interpreter's parsing abilities to translate the grammar that the user types in for the high-level language into a more abstract form that the machine can better utilize.
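
The sketch below, in Java for illustration (the two-command instruction set is invented for the example), shows this read-convert-execute-discard cycle as a simple read-eval-print loop:

 import java.util.Scanner;
 
 public class TinyInterpreter {
     public static void main(String[] args) {
         Scanner in = new Scanner(System.in);
         System.out.println("Commands: ADD a b | MUL a b | QUIT");
         while (in.hasNextLine()) {
             // Read one instruction, execute it immediately, then discard it.
             String[] parts = in.nextLine().trim().split("\\s+");
             if (parts[0].equalsIgnoreCase("QUIT")) break;
             if (parts.length < 3) { System.out.println("Bad input"); continue; }
             int a = Integer.parseInt(parts[1]);
             int b = Integer.parseInt(parts[2]);
             switch (parts[0].toUpperCase()) {
                 case "ADD" -> System.out.println(a + b);
                 case "MUL" -> System.out.println(a * b);
                 default    -> System.out.println("Unknown command");
             }
         }
     }
 }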

Below is a pictorial representation of an interpreter:

[Figure: Pictorial representation of an interpreter]

Structure of an Interpreter

An interpreter typically consists of a set of functions, one for each syntactic category of the language. Each function has its own role to play while executing the high-level program, and these functions can be implemented in any language available. Ultimately, all that is needed to run a program is either an interpreter, which converts the high-level language to machine actions on the fly, or a compiler, which performs the same translation ahead of time.

Pre-Processor

An interpreter's pre-processor works the same way as a compiler's, described earlier: it provides initial processing of the input to produce output suitably formatted for the next stage, performs basic error checking, removes commented sections of code, replaces global variables with their actual values, processes macros and file inclusions, and expands language extensions. As before, this initial stage also includes lexical analysis and tokenization by a lexer, which groups the program's text characters into tokens for the parser.

Parser

Parsing is the processing step that analyzes the string of input characters that make up the program, building a parse tree from the sequence of tokens produced by the lexical scanner of the previous stage. The parse tree embodies the grammar: every expression is resolved into its grammatical constituents. The result is an abstracted form of the original source.

The parser is a principal component of both interpreters and compilers. The overall process of parsing involves the following stages:

  • Lexical Analysis: A lexical analyzer is used to produce tokens from the input stream of characters, which are broken into small components that form meaningful expressions.
  • Syntactic Analysis: Checks for errors in the output of the lexical analysis stage, making use of a context-free grammar that defines algorithmic procedures for the components.
  • Semantic Parsing: The final stage of parsing, which derives the meaning of the validated expressions; errors are flagged here and actions are taken on the previously created expressions.

The main purpose of the parser is to determine whether the input data can be derived from the grammar of the language and, if this condition is satisfied, in what way. This work is done in one of two ways:

  • Top-Down Parsing: builds the parse tree from the top, searching for leftmost derivations of the input stream before proceeding to the next ones. LL parsers and recursive descent parsers use this method.
  • Bottom-Up Parsing: starts from the input and rewrites it back toward the start symbol of the grammar. This type of parsing is also known as shift-reduce parsing; LR parsing is an example.

Optimizer

Optimization is the process of rearranging or adapting the parse tree to produce a more efficient program flow, or to result in faster execution of the code. The optimization can modify the original source program by streamlining the logical flow into a more optimized abstract syntax tree. This step can also include partial compilation of repeated steps, or of recursive statements.

Engine

There is no executable output file from an interpreter. An interpreter engine performs the actions described by the high-level program, and the results of the program's execution are displayed in the interpreter or command prompt.

Pros and Cons of an Interpreter

Advantages

  • Easier for beginner programmers to become familiar with a language and start writing useful programs.
  • Flexibility to make changes at run-time and to execute new code immediately or interactively.
  • Faster to troubleshoot and fix issues since there is no need to save and re-compile the source code at every change.
  • Fairly portable across computers provided the interpreter software is installed on each new computer

Disadvantages

  • Slower execution time due to each line needing to be interpreted before it is executed. There is more background overhead.
  • Lack of an Integrated Development Environment to help manage code.
  • Not a good platform for commercial software as the source code is not hidden in any way.

Applications of the Interpreter

  • Interpreters are used in the execution of a language, stitching the language's grammar together into the interpreted results.
  • Interpreters support virtualization, the creation of virtual versions of resources. Interpreted environments can help in analyzing and carrying out a desired task using virtual concepts; for instance, virtual storage devices and virtual computer networks can be made.
  • Self-modifying code can be generated by interpreters, which helps in Artificial Intelligence (AI) research through the interpretation of logical languages such as LISP.

Compilers versus Interpreters

  • Compilers take more time to process the source code, because they go through the entire set of source code, and then analyze it before compiling it to the machine code.
  • Interpreters take less time than compilers to analyze statements, since they only consider a single statement at a time.
  • With an interpreter, no object code is generated, so it can be considered memory efficient, while compilers generate object code for linking, which occupies more space in memory.
  • Interpreter debugging is easier than a compiler's, as it scans every statement separately.
  • With compilers, error messages are only sent out after compilation of the entire program.

Language Concepts

Programming language concepts basically cover abstract syntax, interpretation, stack machines, compilation, type checking, real machine code, and garbage collection.

Statements

A statement can consist of a single line of code or multiple lines of code, which are often enclosed between brackets; statements may contain internal components as well. Statements are the main instructions the programmer gives to the computer. A few common conventions for statements:

  • The parameters in a statement are surrounded by parentheses.
  • In many languages, each statement is terminated by a semicolon.
  • In some languages, the verb that prints is called puts, for "put string"; the word string is the computer term for a piece of text.
  • In curly-brace languages, a statement block is enclosed in {} brackets and can contain nested blocks.
 y = 1;
 if (y == 1) {
   do_something();
 }

Multiple lines of statements are usually referred to as statement blocks.

There are many different types of statements in programming languages:

 int main()
 {
     int n = 1;                        // declaration statement
     n = n + 1;                        // expression statement
     std::cout << "n = " << n << '\n'; // expression statement
     return 0;                         // return statement
 }

Declaration Statements

Often paired with assignment statements - these statements declare a new variable or constant in a programming block. They initialize a variable the first time it is referred to in the program. Examples:

 y = 97
 my_variable = "Hello, World!"
 checkCondition = True
Variables

Variables are used to store information in memory. They act as containers for values that will be used during the program's execution, and they label data with a name, which helps the programmer understand the code better. The name provides easy access to the data throughout the program: the data is retrieved simply by referring to the variable's name.

Variable Types

In computer programming, types or data types help classify the many different types of values a variable can hold. Some common data types include Boolean (TRUE or FALSE), String, and Integer. In a typed language such as Java, variables must have a type associated with them. The type of the variable is important because it determines a variety of characteristics of that specific variable such as:

  • Possible types of values that can be assigned
  • The actual meaning of the variable
  • The allowable operations with the variables based on its data type

For example, in the Java code below:

 int count = 10;

This line of code says there is a variable called count, and it is of type int, which in Java and many other languages is short for Integer. The Integer type ONLY allows values which are whole numbers like: -5, 0, 5, 10, 12, 20 etc. Writing a Java program with code such as: int count = "10"; will produce an error and the build will fail at compile time. The failure is caused because the type of the variable count was specified as an int, while a String was provided17.

Data Types are Important

In computer memory, data as well as other information about programs is stored. This memory is made up of binary digits known as bits which are stored as either a 0 or 1. Computers then access these bits - not as individual bits, but rather as a collection of bits usually consisting of 8, 16, 32 or 64 bit aggregates. The aggregate bit collection length on a machine is determined by how many simultaneous operations the main CPU can process. A general rule of thumb is that the larger the number of bits a machine has, the larger the amount of data it can process.

With the context of memory established, the next step is to look at how our data is stored. Data, like a variable that may have the value of 10, is stored in memory at an actual physical location known as an address. It is difficult for us to know exactly where this location is or how to access it and this is why we assign a variable to this address. Variables are a much easier way of reserving a spot in memory to hold information on the program. Having to remember random numbers would make it a much more painful process.

When the data type of a variable is specified, for example an int, an actual block of memory is set aside for the data. The number of bytes which can be reserved for a data type can vary based on which programming language is in use. As mentioned above, the variable used is of type int. When executed, this line of code tells the computer exactly how much memory to reserve because of the data type specified: int.
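
For instance, Python's standard struct module reports how many bytes common fixed-size C types occupy; a small illustration (the exact sizes are platform-dependent):

 import struct
 
 # Bytes reserved for some common C data types (platform-dependent).
 print(struct.calcsize("c"))  # char:   typically 1 byte
 print(struct.calcsize("i"))  # int:    typically 4 bytes
 print(struct.calcsize("d"))  # double: typically 8 bytes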

Strong versus Weak Typed Languages

Revisiting the differences between a strong versus a weakly typed language: strongly typed variables, such as integers and strings, have already been mentioned. When using a language with strong types, variables are known to have specific characteristics, and those characteristics cannot change. Conversely, with weakly typed languages, the type and characteristics are determined by how the variable is used. For example, looking at the following expression:

 x = 5;
 y = "10";
 z = x + y;

Depending on the language, the value of z will be 15 if the language can interpret "10" as the integer value 10. In JavaScript - a weakly typed language - z would equal "510", which is the string value of x concatenated with the string value of y. In a strongly typed language such as Python, the expression instead raises a TypeError at run time, since a string cannot be added to an integer without an explicit conversion. A way around this would be to use the following code:

 z = x + int(y)     //Converts y to an integer

Note that this conversion only succeeds when the string contains digits; setting y to "ten" would cause int(y) to fail, since the language cannot infer a number from arbitrary text. There is no better choice of language - there are pros and cons in each scenario. For strongly typed languages, the programmer is forced to define the behavior of the program explicitly, which excludes the possibility of "hidden" behavior.

Later on, some other programmer modifying legacy code would not have to worry that parameter names were not descriptive enough to reveal what type of variable they were working with. For weakly typed languages, the advantage comes from writing less code. Programs might also execute faster, because there is no overhead for the type checking that a strongly typed language would have to perform.

Dynamic versus Static Typed Languages

Dynamic typing and static typing can sometimes cause confusion among programmers. A dynamically typed programming language is one in which types are interpreted at runtime. The pro of this kind of language is that one can write less code more quickly, since the type of each variable does not have to be specified. The downside appears with error checking: because the type is computed at runtime, there is no type checking beforehand, so in order to test, a programmer must run the program and clean up errors afterwards. Common languages that are dynamically typed include Python and PHP. Below is some Python code:

 firstName = "Joshua"
 firstName = 10;

We have defined the variable firstName with the string value "Joshua" - then right after that we changed the value to the integer 10. In a dynamically typed language, this runs perfectly fine, and the value of firstName would be 10 unless it were changed again.

In a statically typed language, the type is computed at compile time instead of at runtime. If the program compiles without errors, the code is guaranteed to be syntactically correct. One of the many pros of a statically typed language is the speed with which a programmer can fix bugs and type mismatches because of this precision. A downside is that it requires writing more code and making sure that all variables are typed properly before compiling the program. Popular programming languages that are statically typed include C, C++, Java, and C#.

 string firstName = "Joshua";
 int age = 22;

In the above example, we have a statically typed language, because all of the variable names and their types are explicitly declared. If we attempted to assign the value of 10 to firstName, we would get an error at compile time telling us that it cannot assign an integer to a variable of type string.

Data Types

Primitive data types are mostly found in statically typed languages such as Java. This means that all variable names and their types must be declared explicitly in order to pass the type-check at compile time. Below is a table of the more common primitive data types in statically typed languages.

 Type     Description                Example
 int      Whole numbers (integers)   10, 45, 109
 double   Real numbers               10.0, 5.67, -98.5
 char     Single character           'j', 's', 'a'

Complex data types include the array and the hash. A complex data type is any type that holds multiple independent values. An array is a complex data type because it is one object made up of a number of independent values. The Java statement below defines an array variable named songs and sets aside memory to store 10 integer values. It's important to note that we are not setting any of the 10 integer values inside the array - we are just allocating the space we need for them in memory.

 int[] songs = new int[10];

Complex data types can also be types that you define yourself with a class. In Object-Oriented Programming (OOP) languages, we can create a new class which has properties and functions inside it that define what the data type is. In the following Java code, we are creating a new class named Car, which has three properties: year, make, and model, and two methods. We can use this class to create a new object, or variable, of type Car. Once the line Car myCar = new Car() executes, we have initialized a new object of type Car, and we can use the public properties and methods inside the class throughout the program18.

 // Creating the class Car.
 public class Car {
   private int year;
   private String make;
   private String model;
 
     public void setYear(int y) {
       year = y;
     }
     public int getYear() {
       return year;
     }
 }
 // Using the Car class.
 Car myCar = new Car();
 myCar.setYear(2013);
 System.out.println(myCar.getYear());

We can see the vast differences between a primitive data type and a complex data type.

Variable Scope

The scope of a variable is defined by its availability during the execution of the program. Some programmers say that the scope of a variable is actually a property in and of itself; in other words, a variable's scope is the area of a program where that specific variable has meaning and is thus visible to the code within that scope. In most programming languages, there are three different levels of scope - Global, Parametric, and Local.

Global Variables

Global variables are commonly known as having indefinite scope; in other words, they are visible to the entire program. Different programming languages handle global variables differently. For instance, languages such as C and C++ have no global keyword, but a variable defined outside of any function has "file scope", which means it is visible everywhere in the file. In PHP, a variable defined at the top level of a script is global, and the global keyword placed in front of the variable name inside a function brings that global variable into the function's scope. For example:

 // config.php
 <?php
 
     $SITE_NAME = "Josh's CD Shop"; // defined at file scope: global
 
 ?>
 // index.php
 <?php
 
     require_once('config.php');
     echo $SITE_NAME;
 
 ?>

In these two snippets of code, we are defining a global variable named $SITE_NAME in a file named config.php. Then in a completely different file, index.php, we are including the config.php file and using the global variable we defined.

Global variables are often viewed as a bad practice, because they can create confusing, more error-prone code. In a larger project, code is easier to read and maintain when the scope of each variable is limited to a specific code block or function. Using global variables can cause errors because they can be modified from any part of the program if not in protected memory, which makes it difficult to understand their intended use.

Local Variables

Local variables can be accessed and used only within certain functions; they are given a local scope within their function(s). For example:

 <?php
 
     function test() {
       $a = 5;	
     }
     echo $a;
 
 ?>

If we try to run this PHP code, we will get an error, because we are trying to echo a variable that isn't in scope. Our function test() defined a variable with the assignment statement $a = 5. Then, after the }, which closes that scope, we tried to echo that same variable $a. This produces an error because the global scope of the PHP file doesn't have access to local variables inside functions.

Parametric Variables

Parametric variables can be accessed inside a method or function which accepts them, as parameters in a function definition or as arguments in a function call. This is how the problem of needing global variables is solved - by passing values to functions instead of creating global variables the function can use. The following code demonstrates the power and simplicity of a parametric variable. We have a class with a private property $price, and the only way to set this value is to create a custom setter function and pass that function the value we'd like to use. In the function setPrice($myPrice, $amountOff), we specified the parameters $myPrice and $amountOff, which the function accepts as parametric scoped variables. They can only be accessed within this function.


 <?php
     class Item {
       private $price;
       function setPrice($myPrice, $amountOff) {
         if($amountOff != null) {
           // We have a sale.
           $this->price = ($myPrice - $amountOff);
         } else {
           $this->price = $myPrice;
         }
       }
       function getPrice() {
         return $this->price;
       }
     }
 
     //Somewhere else in the program
     $item = new Item();
     $item->setPrice(100, 20);
     echo $item->getPrice();
 
 ?>

Using this concept, we can create a program that is harmonious, because we always know what the variables are and what their intentions are. If we put $amountOff in a global variable, we may never know how that variable is even defined, never mind where. Using it this way, we are in complete control of where that value is set and what function(s) have access to it.

Expression Statements

Expressions are a combination of one or more values, constants, variables, operators, and/or functions that the programming language interprets and computes to return another value. Sometimes expressions can also include method invocations as well. These statements, which can be assignments or function calls, give the result of the expression when they are evaluated. After the statements are evaluated, the output might be stored in a variable or further used in the next programming block.

Variables and operators are the basic building blocks of programs. Certain expressions can be made into statements, which are complete units of execution. Expressions perform the work of a program. Expressions contain only identifiers, literals, and operators, including arithmetic and Boolean operators, the function call operator (), and the subscription operator [], and can be reduced to a value, which can be any type of data object. Example:

 4 + 6                            # Arithmetic
 map(lambda y: y*y, range(25))    # Function call
 this_list[0]                     # Subscription operator

Expressions are used to compute and assign values to variables, and to help control the flow of a program's execution. The job of an expression is to perform the computation indicated by its elements and to return a value that is the result of the computation. In languages that mix imperative and functional programming, such as LISP, the distinction between expressions and statements is not made. In purely functional programming, there are no statements; everything is an expression. The distinction is usually observed in wording: a statement is executed, while an expression is evaluated.

In Python, an expression is a combination of values, variables, and operators. If you type an expression on the command line, the interpreter evaluates it and displays the result:

 >>> 1 + 1
 2

The evaluation of an expression produces a value, which is why expressions can appear on the right hand side of assignment statements. A value all by itself is a simple expression, and so is a variable.

 >>> x = 2
 >>> 17
 17
 >>> x
 2
Boolean Expressions

Boolean expressions are logical statements that are either TRUE or FALSE. Boolean expressions can compare data of any type as long as both parts of the expression have the same basic data type. You can test data to see if it is equal to, greater than, or less than other data.

A Boolean expression can consist of Boolean data, such as:

  • BOOLEAN values (TRUE and FALSE)
  • BOOLEAN variables or formulas
  • Functions that yield BOOLEAN results
  • BOOLEAN values calculated by comparison operators
 >>> yes = True
 >>> yes == False
 False
 >>> yes
 True
Numeric Expression

Numeric expressions are based on arithmetic and result in a value. There are two kinds of numeric values: integers, which are whole numbers like 1 and -17, and floating point numbers, which contain a decimal point, like 2.5 and -0.0001. Operators act on operands to yield a result. Basic arithmetic operations are addition, subtraction, multiplication and division. There are also more advanced operations, such as modulus, square roots, exponents, and logarithmic functions19.

An arithmetic expression contains operators and operands, such as:

 1 + 5
 8 * 7
 7.3 / 2
 80 % 25

Null Statements

An expression statement without an expression is referred to as a null statement. It is often used to provide an empty body to a for or while loop, such as for a wait timer. It can also be used to carry a label at the end of a compound statement.

 for (i = 0; i < 1000000; ++i)
     ;   // null statement: the loop has an empty body and acts as a crude wait timer

Compound Statements

Compound statements, also referred to as "blocks", are brace-enclosed sequences of statements in languages such as C++ and Java, or indented lines in languages such as Python. When one statement is expected but multiple statements need to be executed in sequence - such as in a conditional statement or a loop - a compound statement may be used. Each compound statement introduces its own block scope, which means that variables declared inside the block are destroyed at the closing brace, in reverse order of declaration.

 int main()
 {
     {                                // start of block
         std::ofstream f("test.txt"); // declaration statement
         f << "abc\n";                // expression statement
     }                                // end of block: f is flushed and closed
     std::ifstream f("test.txt"); 
     std::string str;
     f >> str; 
 }

Selection Statements

Selection statements, also known as conditional statements, choose among different options based on the evaluation of Boolean conditions. Conditional statements perform different computations or actions depending on whether a Boolean condition evaluates to TRUE or FALSE. IF a condition is true, then the corresponding statements are executed; otherwise, if an optional ELSE IF condition evaluates to TRUE, its block of code is executed; otherwise, an optional ELSE block of statements may be executed. Selection statements also include SWITCH and CASE statements. Conditional statements generally check the value of a variable or expression in the program's environment and execute the statements that correspond with the condition. This mutates the program state by updating the values of variables. There is no return value from a conditional statement - only the program state mutation.

In imperative programming languages, the term conditional statement is usually used, whereas in functional programming, the terms conditional expression or conditional construct are preferred, because the conditional itself yields a value.

IF

IF.png

The simplest form of IF statement associates a condition with a sequence of statements enclosed by the keywords THEN and END IF (not ENDIF), as follows:

IF condition THEN
  sequence_of_statements
END IF;

The sequence of statements above are executed only if the condition is true. If the condition is false or null, the IF statement does nothing. In either case, control passes to the next statement. Here is an example:

IF x > y THEN
  high := x;
END IF;
IF-ELSE

IF-ELSE.png

The second form of IF statement adds the keyword ELSE followed by an alternative sequence of statements, as follows:

IF condition THEN
  sequence_of_statements1
ELSE
  sequence_of_statements2
END IF;

The sequence of statements in the ELSE clause is executed only if the condition is false or null. Thus, the ELSE clause ensures that a sequence of statements is executed.

IF-THEN-ELSEIF

IF-ELSEIF-ELSE.png

Sometimes you want to select an action from several mutually exclusive alternatives. In many programming languages, the third form of IF statement uses the keyword ELSIF (ELSE IF) to introduce additional conditions, as follows:

IF condition1 THEN
  sequence_of_statements1
ELSIF condition2 THEN
  sequence_of_statements2
ELSE
  sequence_of_statements3
END IF;

If the first condition is false or null, the ELSIF clause tests another condition. An IF statement can have any number of ELSIF clauses; the final ELSE clause is optional. Conditions are evaluated from top to bottom. If any condition is true, its associated sequence of statements is executed and control passes to the next statement. If all conditions are false or null, the sequence in the ELSE clause is executed.
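
As a concrete example, the same three-way selection written in Python with if, elif, and else (the variable names are illustrative):

 x, y = 3, 7
 
 if x > y:
     print("x is larger")
 elif x < y:
     print("y is larger")       # this branch runs, since 3 < 7
 else:
     print("x and y are equal")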

SWITCH

Switch statements in a programming language compare a given value with specified constants and take action according to the first constant that matches. There is usually a provision for a default action to be taken if no match succeeds, but it is optional. SWITCH uses CASE and DEFAULT within its structure to perform a conditional task. The SWITCH statement is preferred when a lengthy list of values needs to be compared with a variable. In C-like languages, each case normally ends with a break statement; without it, execution falls through into the next case.

SWITCH (conditional expression) {
      case 1:
            statement 1;
            break;
      case 2:
            statement 2;
            break;
      case 3:
            statement 3;
            break;
      default:
            statement 4;
}

SWITCH.png
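
Python did not have a switch statement for most of its history; version 3.10 added a comparable match statement. A minimal sketch, assuming a hypothetical command variable:

 command = "stop"
 
 match command:
     case "start":
         print("starting")
     case "stop":
         print("stopping")          # first matching case runs; no fall-through
     case _:
         print("unknown command")   # default action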

Iteration Statements

Iteration types like WHILE, FOR, FOREACH, DO WHILE, and RANGE FOR are included in these statements, which mainly enable us to iterate through collections of data structures such as arrays, queues, etc. A looping statement executes a set of statements repeatedly, mutating the loop variable and the program state with each iteration. There is no return value from a looping statement; however, it does mutate the program's state. There are a few side effects when working with loop statements. A for loop usually specifies the number of times it iterates, whereas a while loop continues to execute until its condition is no longer met. Both options offer some flexibility as to how they are executed.

FOR LOOP

A FOR loop is a statement which allows code to be repeatedly executed. A FOR loop contains 3 parts: Initialization, Condition, and Increment or Decrements.

  • Initialization: This step is executed first, and only once, when the loop is first entered. It allows us to declare and initialize any loop control variable(s).
  • Condition: This is the next step after the initialization step. If the condition is true, the body of the loop is executed; otherwise, the body of the loop does not execute and the flow of control goes to the first statement after the for loop.
  • Increment or Decrement: This step is executed after each pass through the loop body, and allows the programmer to update any loop control variables.
for (initialization; condition; increment) {
  statement(s);
}

FOR-LOOP.png

An example of for loop:

arr = [];
for (count = 0; count < 5; count++) {
  arr.push(count * 10);
}
//arr = [0, 10, 20, 30, 40]

The looping statement mutates the state of the count variable and executes the arr mutation statement multiple times. If the count variable is modified with an assignment statement within the body of looping statement, the side effect of the same looping statement is different:

arr = [];
for (count = 0; count < 5; count++) {
 arr.push(count * 10);
 count = 6; // side effect changes program environment
}
//arr = [ 0 ] 
WHILE LOOP

The WHILE-LOOP statement associates a condition with a sequence of statements enclosed by the keywords LOOP and END LOOP, as follows:

WHILE condition LOOP
  sequence_of_statements 
END LOOP;

Before each iteration of the loop, the condition is evaluated. If the condition is true, the sequence of statements is executed, then control resumes at the top of the loop. If the condition is false or null, the loop is bypassed and control passes to the next statement.

WHILE-LOOP.png
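
For example, a small countdown loop in Python:

 count = 3
 while count > 0:    # the condition is evaluated before each iteration
     print(count)
     count -= 1
 # prints 3, 2, 1; then the condition is false and the loop is bypassed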

DO WHILE LOOP

A do while loop is a control flow statement that executes a block of code at least once, and then repeats it or not depending on a Boolean condition at the end of the block.

The do while construct consists of statements and a condition. First, the statement code within the block is executed, and then the condition is evaluated. If the condition is true, the code within the block is executed again. This repeats until the condition becomes false. Because do while loops check the condition after the block is executed, the control structure is often also known as a post-test loop. Do-while loop is also referred to as an exit-condition loop.

do {
  do_work();  
} while (condition);

is equivalent to:

do_work();
while (condition) {
  do_work();
}

It is possible for the condition to always evaluate to true, creating an infinite loop. When such a loop is created intentionally, there is usually another control structure (like a break statement) that allows termination of the loop.

Break statement provides an exit out of the loop:

while (true) {
  do_work();
  if (!condition) break;
}
RANGE FOR LOOP

In C++ and Python, among other languages, there is such a thing as a for loop that uses a range, such as the following Python code:

 for index in range(5):
     print("This is iteration #" + str(index))

Which will output:

 This is iteration #0
 This is iteration #1
 This is iteration #2
 This is iteration #3
 This is iteration #4

Jump Statements

Jump statements help in transferring the control of the program's execution to another section of the programming block. Jump statements cause a jump, unconditionally, to another statement elsewhere in the code. They are used primarily with switch statements and loops to interrupt the program's execution20.

The jump statements we will mainly use are:

BREAK Statements

A break statement terminates the execution of the immediately enclosed do, for, switch, or while statement. Control passes to the statement immediately following the loop body or the compound statement of a switch statement.

The break statement has the following syntax:

 break;
CONTINUE Statements

The continue statement passes control to the end of the immediately enclosing do, for, or while statement. The continue statement has the following syntax:

 continue;

The continue statement is equivalent to a GOTO statement within an iteration statement that passes control to the end of the loop body.

 while(1)
 {
   do_something();
   do_something_else();
   continue;
   something();
   something_else();
 }

The above statements would execute do_something() and do_something_else() in an infinite loop, but would never execute something() or something_else() because of the continue statement. These statements are equivalent to the following as well:

 while(1)
 {
   do_something();
   do_something_else();
   goto label_1;
   something();
   something_else();
   label_1:
   ;
 }

The continue statement can be used in a loop. If used inside a switch statement that is inside a loop, the continue statement causes continued execution of the enclosing loop after exiting from the body of the switch statement.

GOTO Statements

The GOTO statement has the following syntax:

 goto identifier;

The GOTO statement transfers program control unconditionally to a labeled statement, where the label identifier must be in the scope of the function containing the GOTO statement. The labeled statement is the next statement executed. Care must be taken when branching into a block with a GOTO statement, because storage is allocated for automatic variables declared within a block when the block is activated. Automatic variables declared in the block are not initialized when a GOTO statement branches into it.

RETURN Statements

The return statement terminates the execution of a function, and returns control to the calling function, with or without a return value. A function may contain any number of return statements.

The return statement has the following syntax:

 return <expression>;
 
        -or-
 
 return;


If present, the expression is evaluated and its value is returned to the calling function. When used with an expression, its value either can be, or must be converted into the declared type of the containing function's return value - depending upon the programming language's requirements. A return statement with an expression cannot appear in a function whose return type is void. In Python, it is possible to return None.

If there is no expression and the function is not defined as void, the return value is not defined. For example, the following function, main, returns an unpredictable value to the operating system:

 main ( )
 {
   return;
 }

Reaching a closing brace that terminates the function is equivalent to executing a return statement without an expression, so it can be omitted in this instance.

Exception Handling Statements

These are the types of statements that help us handle exceptions in a program. Exception handling statements help us recover from exceptional conditions that might occur at run time. Without them, there would be a runtime error, and the program would break.

Here are some of the exceptional handling cases which might be used in program sections:

  • Throw
  • Try-Catch
  • Try-Finally
  • Try-Catch-Finally

Here is an example of an exception handling statement in Python:

 try:
     open(filename, 'w')
 except IOError:
     return None
 except KeyboardInterrupt:
     do_something()

This example tries to open a file to be written to, and if an I/O error is detected by Python as a result of the statements inside the try block, the statements inside the corresponding except block are executed. You can also specify multiple except blocks to handle different types of errors in Python - so in this instance, if there is a KeyboardInterrupt, the do_something() code is executed instead of return None; in other words, return None only executes if there is an I/O error.

Sometimes exception handling statements can interfere with debugging a program, if you have a lot of code inside a try with an except that just prints an error. Instead, when developing a program, it may be better to leave out the exception handling statements so that you can see the actual errors and where they are occurring. This way, you can fix the actual errors rather than just exit out of the program.

Atomic and Synchronized Blocks (TM TS)

Atomic and synchronized blocks are used to implement transactional memory. Atomic means that only one change can occur at a time, which is used in instances such as threading, where multiple threads may have access to the same variable. In such a case, a mutex may be used.

In Java, you can declare a variable as volatile to define it as read and write atomic, like this:

 volatile long
 
     -or-
 
 volatile double

In C++, under the experimental Transactional Memory Technical Specification (TM TS), the following constructs may be used for atomic and synchronized blocks:

 synchronized <compound-statement>

These are synchronized blocks, executed in single total order with all synchronized blocks.

 atomic_noexcept <compound-statement>

This is an atomic block that aborts on exceptions.

 atomic_cancel <compound-statement>

This is an atomic block that rolls back on exceptions.

 atomic_commit <compound-statement>

This is an atomic block that commits on exceptions.

An operation executing a command to modify or read shared memory is atomic if and only if it completes in a single step relative to all other threads. When an atomic store is performed on a shared variable, no other thread can observe half-complete modifications. Non-atomic loads and stores do not guarantee that when an operation is performed on a shared variable, it reads the entire value as it appeared at a single moment in time.

 "Any time two threads operate on a shared variable concurrently, and one of those operations performs a write, both threads must use atomic operations."

If this rule is violated, and either thread doesn't use an atomic operation, you’ll have what the C++11 standard refers to as a data race, which is a race condition. The C++11 standard doesn’t tell you why data races are bad; only that if you have one, “undefined behavior” will result (§1.10.21). Such data races are bad because they result in torn reads and/or torn writes21 22.
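
To make the idea concrete in a higher-level language, the Python sketch below uses a threading.Lock as a mutex so that increments of a shared counter behave atomically. This illustrates the concept only; it is not the C++ TM TS syntax shown above:

 import threading
 
 counter = 0
 lock = threading.Lock()
 
 def worker():
     global counter
     for _ in range(100_000):
         with lock:           # the block below runs as one indivisible step
             counter += 1     # read-modify-write is protected from data races
 
 threads = [threading.Thread(target=worker) for _ in range(4)]
 for t in threads:
     t.start()
 for t in threads:
     t.join()
 print(counter)  # always 400000 with the lock; unpredictable without it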

Data Structure Concepts

Data Structures are a form of organizing and storing data in a computer so that it can be accessed and modified efficiently. They are collections of data values, the relationships between them, and the functions, methods, or procedures that can be applied to the collections of data. Data structures implement Abstract Data Types (ADTs), which specify the operations, or procedures, that can be performed on the data structure and the computational complexity of those operations. A data structure is a concrete implementation of the specification provided by an abstract data type.

Various types of data structures are matched to different types of programs, and some are specialized to specific tasks to be performed, such as B-tree indexes for data retrieval from relational databases, and hash tables for compilers to look up identifiers. Efficient data structures are needed when developing efficient algorithms23.

Sequences

Chapter 2 Sequence.png

Sequences are enumerated collections of objects in which repetitions are allowed, such as Strings, Arrays, and Lists24.

Examples of Sequences include:

Arrays:

  • Bit Field - consists of a number of adjacent computer memory addresses that have been registered to hold a sequence of bits, stored in a manner so that any single bit or group of bits within the set can be recalled.
  • Bitmap - a mapping from some domain, such as a range of integers, to bits.
  • Circular Buffer - a data structure that uses a single, fixed-size buffer as if it were connected end-to-end, for the purpose of buffering data streams.
  • Control Table - a table that directs a program's control flow.
  • Image - a serialized copy of a computer system's state stored in some non-volatile form, such as a file.
  • Gap Buffer - a dynamic array that allows insertion and deletion of elements clustered near the same relative location.
  • Heightmap - a raster image for storing values for display in 3D computer graphics.
  • Lookup Table - an array that replaces a runtime computation with a simple array indexing operation.
  • Matrix - a rectangular array arranged of rows and columns consisting of numbers, symbols, or expressions.
  • Sparse Matrix - a matrix in which most elements have a zero value.
  • Iliffe Vector - a data structure, known as a display, used to implement multi-dimensional arrays.

Lists:

  • Doubly Linked List - a linked data structure that consists of nodes, which are a set of sequentially linked records.
  • Array List - a variable-sized list data structure that is accessed randomly, allowing elements to be added or removed in any order.
  • Linked List - a linear collection of data elements that doesn't receive its linear order by their physical placement in memory.
  • Self-Organizing List - a list that improves its average access time by reordering its elements based on a self-organizing heuristic.
  • Skip List - a list that allows fast searching within an ordered sequence of elements.
  • Unrolled Linked List - a list that stores multiple elements in each node (a variation of the linked list).
  • Conc-Tree List - a data structure that stores element sequences and provides amortized O(1) append and prepend operations, and O(log n) insert, remove, and concatenation operations.
  • Doubly Connected Edge List - also known as a Half-Edge, it is a data structure that represents an embedding of a planar graph in the plane, and polytopes in 3D.
  • Difference List - contains two lists, and represents the difference of those two lists.
  • Free List - a list type used in a scheme for dynamic memory allocation.

Arrays

Chapter 2 Arrays.png

An Array is composed of a number of elements, which depending upon the language can either be all of the same type or of any type, in a specific order. The elements of an array are accessed using an integer index to specify which element is needed. Arrays can be either fixed-length or resizable data structures. A language typically allocates a contiguous block of memory for the elements of an array25 26.


A related structure, the associative array, stores a collection of key-value pairs. Its operations allow:

  • Adding a pair of data to the collection
  • Removing a pair of data from the collection
  • Modifying an existing pair of data
  • Looking up a value associated with a particular key


An example of an associative array, or dictionary in Python:

 {
     "hello": "world",
     "Monday": 2,
     "Friday": 6
 }

Queues

Chapter 2 Queue.png

A Queue is an abstract data type - a sequential collection of entities organized in a linear data structure - in which the only operations performed on the collection are 1) enqueuing, or adding elements at the rear terminal position, and 2) dequeuing, or removing elements from the front terminal position. The queue operates as a First-In-First-Out (FIFO) data structure, where the first element added to the queue will be the first one to be removed. This means that when a new element is added to the queue, all of the elements that were added before it have to be removed before the new element can be removed.

In computer science, queues provide services where various objects, holding data that represents subject matter such as persons or events, are stored and held to be processed later. The queue performs the function of a buffer in these contexts. Queues are actually common in computer applications, in which they are used as data structures with their own access routines, as an abstract data structure or as classes when used with object-oriented programming languages. Common implementations of queues are linked lists and circular buffers27.

Queues are used for asynchronous messaging in systems integration, in publish/subscribe systems, where a buffer of messages is published to a set of consumers, and the consumers subscribe to the messages - either collectively or through partitioning. Below is a simple Queue implemented in Ruby:

 class Queue
   def initialize
     @list = Array.new
   end
 
   def enqueue(element)
     @list << element
   end
 
   def dequeue
     @list.shift
   end
 end

Stacks

Chapter 2 Stack.png

A Stack is an abstract data type that serves as a collection of elements, with two main operations: (1) push, which adds an element to the collection, and (2) pop, which removes the most recently added element that was not yet removed. The stack data structure uses the Last-In, First-Out (LIFO) method of adding and removing elements. A peek operation can give a glimpse of the top of a stack without popping the element off of the stack. The stack is considered to be a linear data structure. A stack is more abstractly known as a sequential collection, in which the push and pop operations occur only at one end of the structure, known as the top of the stack. This makes it possible to implement a stack as a single linked list with a pointer at the top element. A stack can also be implemented to have a bounded capacity - or in other words, if the stack is full and does not contain enough space to accept an entity, the stack is then considered to be in overflow state28.

The stack can be implemented as a three-element structure:

 structure stack:
     maxsize : integer
     top : integer
     items : array of item
 procedure initialize(stk : stack, size : integer):
     stk.items ← new array of size items, initially empty
     stk.maxsize ← size
     stk.top ← 0

The push operation adds an element and increments the top index, following the checking for overflow:

 procedure push(stk : stack, x : item):
     if stk.top = stk.maxsize:
         report overflow error
     else:
         stk.items[stk.top] ← x
         stk.top ← stk.top + 1

The pop operation decrements the top index after checking for underflow, and returns the item that was previously the top one:

 procedure pop(stk : stack):
     if stk.top = 0:
         report underflow error
     else:
         stk.top ← stk.top − 1
         r ← stk.items[stk.top]
         return r
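
In a high-level language the same behavior often comes for free; in Python, for example, a plain list already provides push, pop, and peek (a minimal sketch):

 stack = []
 stack.append(10)   # push
 stack.append(20)   # push
 print(stack[-1])   # peek: 20, without removing it
 print(stack.pop()) # pop: 20 (last in, first out)
 print(stack.pop()) # pop: 10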

Trees

Chapter 2 Tree.png

In computer science, a tree is a widely used abstract data type (ADT)—or data structure implementing this ADT—that simulates a hierarchical tree structure, with a root value and subtrees of children with a parent node, represented as a set of linked nodes.

A tree data structure can be defined recursively (locally) as a collection of nodes (starting at a root node), where each node is a data structure consisting of a value, together with a list of references to nodes (the "children"), with the constraints that no reference is duplicated, and none points to the root29 30.

Formal tree definition

A tree T is defined as a set of nodes storing elements in a parent-child relationship with the following properties:

  • If T is nonempty, it has a special node, called the root of the tree, which has no parent.
  • Each node v of the tree other than the root has a unique parent node w; every node with parent w is a child of w.

According to this definition, a tree can be empty, meaning that it doesn't have any nodes in it.


Terminology used in trees:

Node - The fundamental unit of the tree. A tree is built from one or more nodes.

Root - The very first node in any tree. It has no parent.

Child node - A node connected directly below another node, moving from top to bottom.

Parent node - A node which has a child or children below it.

Sibling nodes - A group of nodes at the same level that share the same parent.

Descendant node - A node reachable by repeatedly proceeding from parent to child.

Ancestor node - A node reachable by repeatedly proceeding from child to parent.

Leaf node - A node with no children (an end node).

Branch node - A node with at least one child.

Degree - The number of subtrees a node has.

Edge - The connection between one node and another.

Path - A sequence of nodes and edges connecting a node with a descendant.

Depth - The depth of a node is the number of edges from the tree's root node to that node.

Tree Traversal Algorithms

A traversal is a systematic way to visit all nodes of a tree T. Traversing a tree involves iterating over all of the nodes in the tree.

There are two main ways to perform tree traversal: 1) Depth-First Search and 2) Breadth-First Search.

Depth First Search (DFS): This type of traversal uses recursion at each node, starting with the root node and then recursively visiting the child nodes. Depth-first traversals include:

  • Inorder (left node, root node, right node)
  • Preorder (root node, left node, right node)
  • Postorder (left node, right node, root node)

Breadth-First Search (BFS): This traversal works in level order; it visits every node on one level before moving on to the children at the next lower level. Because the traversal proceeds breadth-wise, it is called Breadth-First Search. Both depth-first and breadth-first traversal are illustrated in the sketch below.
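
A minimal Python sketch, using a hypothetical Node class, of the three depth-first orders and a level-order (breadth-first) traversal of a small binary tree:

 from collections import deque
 
 class Node:
     def __init__(self, value, left=None, right=None):
         self.value, self.left, self.right = value, left, right
 
 def inorder(node):             # left node, root node, right node
     if node:
         yield from inorder(node.left)
         yield node.value
         yield from inorder(node.right)
 
 def preorder(node):            # root node, left node, right node
     if node:
         yield node.value
         yield from preorder(node.left)
         yield from preorder(node.right)
 
 def postorder(node):           # left node, right node, root node
     if node:
         yield from postorder(node.left)
         yield from postorder(node.right)
         yield node.value
 
 def level_order(root):         # breadth-first: one level at a time
     queue = deque([root])
     while queue:
         node = queue.popleft()
         if node:
             yield node.value
             queue.append(node.left)
             queue.append(node.right)
 
 root = Node(2, Node(1), Node(3))
 print(list(inorder(root)))      # [1, 2, 3]
 print(list(preorder(root)))     # [2, 1, 3]
 print(list(postorder(root)))    # [1, 3, 2]
 print(list(level_order(root)))  # [2, 1, 3]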

Graphs

Chapter 2 Graph.png

A graph is a pictorial representation of a set of objects where some pairs of objects are connected by links. The interconnected objects are represented by points termed as vertices, and the links that connect the vertices are called edges.

Formally, a graph is a pair of sets (V, E), where V is the set of vertices and E is the set of edges, connecting the pairs of vertices.

Terminology used in Graph

  • Vertex − Each node of the graph is represented as a vertex. In the following example, the labeled circles represent vertices; thus, A to G are vertices. We can represent them using an array as shown in the following image, where A can be identified by index 0, B by index 1, and so on.
  • Edge − An edge represents a path between two vertices, drawn as a line between them. In the following example, the lines from A to B, B to C, and so on represent edges. We can use a two-dimensional array (an adjacency matrix) to represent the edges as shown in the following image: AB is represented as 1 at row 0, column 1, BC as 1 at row 1, column 2, and so on, keeping other combinations as 0. Edges are of three types:
    • Undirected Edges.
    • Directed Edges.
    • Weighted Edges.
  • Adjacency − Two node or vertices are adjacent if they are connected to each other through an edge. In the following example, B is adjacent to A, C is adjacent to B, and so on.
  • Path − Path represents a sequence of edges between the two vertices. In the following example, ABCD represents a path from A to D.
  • Undirected Graph - A graph with only undirected edges is said to be an undirected graph.
  • Directed Graph - A graph with only directed edges is said to be a directed graph.
  • Mixed Graph - A graph with both directed and undirected edges is said to be a mixed graph.
  • End Vertices or Endpoints - The two vertices joined by an edge are known as its end vertices.
  • Origin - If an edge is directed, its first endpoint is said to be its origin.
  • Destination - If an edge is directed, the endpoint other than the origin is said to be the destination of the edge.
  • Incident -An edge is said to be incident on a vertex if the vertex is one of the endpoints of the edge.
  • Degree -Total number of edges connected to a vertex is said to be degree of that vertex.
  • Indegree -Total number of incoming edges connected to a vertex is said to be indegree of that vertex.
  • Outdegree -Total number of outgoing edges connected to a vertex is said to be outdegree of that vertex.
  • Self-loop -An edge (undirected or directed) is a self-loop if its two endpoints coincide.
  • Simple Graph - A graph is said to be simple if it has no parallel edges or self-loops.
  • Path -A path is a sequence of alternating vertices and edges that starts at a vertex and ends at a vertex such that each edge is incident to its predecessor and successor vertex.

Graph Traversal

Graph traversal is a technique used for searching for a vertex in a graph, and it also decides the order in which vertices are visited during the search. A graph traversal finds the edges to be used in the search process without creating loops; that is, using graph traversal we can visit all the vertices of the graph without following a looping path.

There are two graph traversal techniques, as follows:

1) Depth First Search (DFS)

2) Breadth First Search (BFS)

Depth First Search (DFS): A DFS traversal of a graph produces a tree with no loops (a spanning tree) as its final result. We use a stack data structure, with a maximum size equal to the total number of vertices in the graph, to implement DFS traversal.

Breadth First Search (BFS): A BFS traversal of a graph also produces a spanning tree - a subgraph without any loops - as its final result. We use a queue data structure, with a maximum size equal to the total number of vertices in the graph, to implement BFS traversal. Both techniques are sketched below.
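
A minimal Python sketch of both techniques on a small undirected graph stored as an adjacency-list dictionary (the graph itself is illustrative); DFS uses an explicit stack and BFS a queue, and each marks vertices as visited so that no looping path is followed:

 from collections import deque
 
 graph = {                     # adjacency list for an undirected graph
     "A": ["B", "C"],
     "B": ["A", "D"],
     "C": ["A", "D"],
     "D": ["B", "C"],
 }
 
 def dfs(start):
     visited, stack, order = set(), [start], []
     while stack:
         vertex = stack.pop()              # LIFO: go deep first
         if vertex not in visited:
             visited.add(vertex)
             order.append(vertex)
             stack.extend(graph[vertex])
     return order
 
 def bfs(start):
     visited, queue, order = {start}, deque([start]), []
     while queue:
         vertex = queue.popleft()          # FIFO: go wide first
         order.append(vertex)
         for neighbor in graph[vertex]:
             if neighbor not in visited:
                 visited.add(neighbor)
                 queue.append(neighbor)
     return order
 
 print(dfs("A"))  # ['A', 'C', 'D', 'B']
 print(bfs("A"))  # ['A', 'B', 'C', 'D']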

Hashes

Chapter 2 Hash Table.png

In computing, a hash table (hash map) is a data structure which implements an associative array abstract data type, a structure that can map keys to values. A hash table uses a hash function to compute an index into an array of buckets or slots, from which the desired value can be found.

Ideally, the hash function will assign each key to a unique bucket, but most hash table designs employ an imperfect hash function, which might cause hash collisions where the hash function generates the same index for more than one key. Such collisions must be accommodated in some way.

Hash tables combine the random access ability of an array with the dynamism of a linked list. The core of a hash table is a proper hash function, which it needs in order to perform well31.

Choosing a hash function

A good hash function and implementation algorithm are essential for good hash table performance. A basic requirement is that the function should provide a uniform distribution of hash values. To be effective, a hash function should have the following properties:

  • Deterministic - the same key always produces the same index.
  • Fast to execute - O(1) and fast in practice, using only very fast operations, so that insertion, deletion, and searches stay speedy.
  • Uses the whole range of allowed indexes.
  • Distributes indexes evenly across the table (uniform distribution).
  • Maps similar keys to very different hash values.


Hash table

Hash tables are data structures that have the following properties:

  • Keys are stored in an array (the table), at an index determined by a hash function.
  • Ideally, the array is relatively sparse.
  • Keys are not ordered within the array.
  • The structure supports insert, find, and possibly remove, but does not support min, max, or ordered traversal.
  • These properties make hash tables ideal for unordered sets and maps.


Collision resolution

Hash collisions are practically unavoidable when hashing a random subset of a large set of possible keys. For example, if 2,450 keys are hashed into a million buckets, even with a perfectly uniform random distribution, according to the birthday problem there is approximately a 95% chance of at least two of the keys being hashed to the same slot.

Therefore, almost all hash table implementations have some collision resolution strategy to handle such events. Some common strategies are described below. All these methods require that the keys (or pointers to them) be stored in the table, together with the associated values.

Here is a list of collision resolution techniques; a sketch of the first of them follows the list:

  • Separate chaining
  • Linear probing
  • Quadratic probing
  • Double hashing
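
As an illustration of separate chaining, here is a toy Python hash table in which each bucket holds a list of the key-value pairs that hashed there (a minimal sketch; real implementations also resize the table as it fills):

 class ChainedHashTable:
     def __init__(self, size=8):
         self.buckets = [[] for _ in range(size)]  # one list (chain) per slot
 
     def _index(self, key):
         return hash(key) % len(self.buckets)      # hash function -> bucket
 
     def put(self, key, value):
         bucket = self.buckets[self._index(key)]
         for i, (k, _) in enumerate(bucket):
             if k == key:                           # key exists: overwrite
                 bucket[i] = (key, value)
                 return
         bucket.append((key, value))                # collision: chain grows
 
     def get(self, key):
         for k, v in self.buckets[self._index(key)]:
             if k == key:
                 return v
         raise KeyError(key)
 
 table = ChainedHashTable()
 table.put("Monday", 2)
 table.put("Friday", 6)
 print(table.get("Friday"))   # 6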

Uses of hashing tables

Hash tables are useful in any situation where we have an unpredictable set of symbols (keys) and we want to associate some information with each one, in such a way that we can get at that information in O(1) time. We are assuming that the symbols don’t have any inherent ordering, or we don’t need any ordering-information (don’t need the “largest” or “smallest” symbol).

Hash functions are useful in any situation where we want to substitute a smaller, easier-to-manage proxy for some otherwise big/complex object. We can compare proxies to get a quick check for inequality, or use them to determine whether an object has changed.

  • Hash tables are good for doing quick searches on things.
  • They are used in associative arrays, and to implement many types of in-memory tables.
  • Database indexing: hash tables may also be used as disk-based structures for database indices, although B-trees are more popular in these applications.
  • Caches: hash tables can be used to implement caches, auxiliary tables that speed up access to data that is primarily stored in slower media.
  • Sets: besides retrieving the entry that has a given key, many hash table implementations can also tell whether such an entry exists.
  • Object representation: several dynamic languages, such as Perl and Python, use hash tables to implement objects. In this representation, the keys are the names of the members of the object, and the values are pointers to the corresponding member or method.

Lists

Chapter 2 Lists.png

A linked list is a sequence of data structures, which are connected together via links.

Linked List is a sequence of links which contains items. Each link contains a connection to another link. Linked list is the second most-used data structure after array. Following are the important terms to understand the concept of Linked List.

  • Link - Each link of a linked list can store a piece of data, called an element.
  • Next - Each link of a linked list contains a reference to the next link, called Next.
  • LinkedList - A linked list contains the connection link to the first link, called First.

Types of Linked List

Following are the various types of linked list.

Simple Linked List − Item navigation is forward only.

Doubly Linked List − Items can be navigated forward and backward.

Circular Linked List − The last item contains a link to the first element as its next, and the first element has a link to the last element as its previous.

Basic Operations on Linked List

Following are the basic operations supported by a list.

Insertion − Adds an element at the beginning of the list.

Deletion − Deletes an element at the beginning of the list.

Display − Displays the complete list.

Search − Searches an element using the given key.

Delete − Deletes an element using the given key.

Other Linked List Variations and Related Concepts

  • Singly linked list
  • Doubly linked list
  • Multiply linked list
  • Circular linked list
  • Sentinel nodes
  • Empty lists
  • Hash linking
  • List handles
  • Combining alternatives

Advantages of linked list

  • Dynamic Data Structure - A linked list is a dynamic data structure, so it can grow and shrink at runtime by allocating and deallocating memory.
  • Insertion and Deletion - Insertion and deletion of nodes are easy and efficient, since only the neighboring links need to be updated.
  • No Memory Wastage - Memory is allocated only as nodes are actually needed.
  • Implementation - Data structures such as stacks and queues are easily implemented with a linked list.

Disadvantages of the Linked List

  • They use more memory than arrays because of the storage used by their pointers.
  • Nodes in a linked list must be read in order from the beginning, as linked lists are inherently sequential-access.
  • Nodes are not stored contiguously, which greatly increases the time required to access individual elements within the list, especially with a CPU cache.
  • Difficulties arise when it comes to reverse traversal. For instance, singly linked lists are cumbersome to navigate backwards, and while doubly linked lists are somewhat easier to traverse, memory is consumed in allocating space for the back pointers.
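
To tie the terminology together, here is a minimal singly linked list in Python (the Node and LinkedList names are illustrative), supporting insertion at the beginning and display:

 class Node:
     def __init__(self, element):
         self.element = element   # the data stored in this link
         self.next = None         # reference to the next link
 
 class LinkedList:
     def __init__(self):
         self.first = None        # reference to the first link
 
     def insert(self, element):   # add at the beginning of the list
         node = Node(element)
         node.next = self.first
         self.first = node
 
     def display(self):           # walk the links in order
         node = self.first
         while node:
             print(node.element)
             node = node.next
 
 numbers = LinkedList()
 numbers.insert(3)
 numbers.insert(2)
 numbers.insert(1)
 numbers.display()   # prints 1, 2, 3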

Control Flow

Programming-Flow-Control-Diagram.png

The above diagram shows the flow control for the Python function cube_first_two(input_list). Considering the test function cube_first_two([2, 4, 6, 8]), the following flow control takes place:
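
(The source listing shown in the diagram is not reproduced here; the following Python reconstruction is a plausible reading of it, consistent with the walkthrough below, with the diagram's line numbers noted in comments.)

 def cube_first_two(input_list):                       # line 1
     """Return a copy of input_list whose
     first two items are cubed."""                     # docstring, lines 2-8
     new_list = []                                     # line 9
     item_count = 0                                    # line 10
     while item_count < len(input_list):               # line 11
         item_count = item_count + 1                   # line 12
         if item_count <= 2:                           # line 13
             new_value = input_list[item_count - 1] ** 3   # line 14
             new_list.append(new_value)                # line 15
         else:                                         # line 16
             new_list.append(input_list[item_count - 1])   # line 17
                                                       # line 18 (blank)
     return new_list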

  1. The Python interpreter executes line 1, which tells the engine that a function named cube_first_two is being defined with a single parameter, input_list.
  2. The Python interpreter skips over the docstring located on lines 2 through 8.
  3. The Python interpreter executes line 9 (Sequential Flow 1), creating an empty list named new_list.
  4. The Python interpreter executes line 10 (Sequential Flow 2), initializing item_count as an integer with the value of 0.
  5. The Python interpreter executes line 11 (Sequential Flow 3), which tells it to start a loop until the condition ‘item_count is less than the length of input_list’ is True.
  6. The Python interpreter executes line 12 (Sequential Flow 4), setting item_count to 1.
  7. The Python interpreter executes line 13 (Sequential Flow 5), which tells it to check the condition ‘item_count is less than or equal to 2’ – which it is (1).
  8. The Python interpreter executes line 14 (Conditional Flow 1), initializing a variable named new_value to the value associated with the expression of the (item_count minus 1)th item in the input_list raised to the power of three (cubed), which sets new_value to 8.
  9. The Python interpreter executes line 15 (Sequential Flow 6), which appends the new_value of 8 to new_list.
  10. The Python interpreter checks line 16 (Conditional Flow 2), and finds that the else statement doesn’t apply in this iteration.
  11. The Python interpreter skips line 17, and goes to line 18 (Conditional Flow 3), which is blank, which tells the interpreter to go back to the while loop to check the condition again.
  12. The Python interpreter executes line 11 (Loop Flow 1), which tells it to continue the loop because the condition ‘item_count (1) is still less than the length of input_list’ (4).
  13. The Python interpreter executes line 12 (Sequential Flow 4), setting item_count to 2.
  14. The Python interpreter executes line 13 (Sequential Flow 5), which tells it to check the condition ‘item_count is less than or equal to 2’ – which it is (2).
  15. The Python interpreter executes line 14 (Conditional Flow 1), updating the variable named new_value to the value associated with the expression of the (item_count minus 1)th item in the input_list raised to the power of three (cubed), which sets new_value to 64.
  16. The Python interpreter executes line 15 (Sequential Flow 6), which appends the new_value of 64 to new_list.
  17. The Python interpreter checks line 16 (Conditional Flow 2), and finds that the else statement doesn’t apply in this iteration.
  18. The Python interpreter skips line 17, and goes to line 18 (Conditional Flow 3), which is blank, which tells the interpreter to go back to the while loop to check the condition again.
  19. The Python interpreter executes line 11 (Loop Flow 1), which tells it to continue the loop because the condition ‘item_count (2) is still less than the length of input_list’ (4).
  20. The Python interpreter executes line 12 (Sequential Flow 4), setting item_count to 3.
  21. The Python interpreter executes line 13 (Sequential Flow 5), which tells it to check the condition ‘item_count is less than or equal to 2’ – which it is not (3).
  22. The Python interpreter advances to the else statement on line 16 (Conditional Flow 4), and then executes line 17, which appends the value of the expression of the (item_count minus 1)th item (6) to new_list (Sequential Flow 7).
  23. The Python interpreter advances to line 18 (Sequential Flow 8), which is blank, which tells the interpreter to go back to the while loop to check the condition again.
  24. The Python interpreter executes line 11 (Loop Flow 1), which tells it to continue the loop because the condition ‘item_count (3) is still less than the length of input_list’ (4).
  25. The Python interpreter executes line 12 (Sequential Flow 4), setting item_count to 4.
  26. The Python interpreter executes line 13 (Sequential Flow 5), which tells it to check the condition ‘item_count is less than or equal to 2’ – which it is not (4).
  27. The Python interpreter advances to the else statement on line 16 (Conditional Flow 4), and then executes line 17, which appends the value of the expression of the (item_count minus 1)th item (8) to new_list (Sequential Flow 7).
  28. The Python interpreter advances to line 18 (Sequential Flow 8), which is blank, which tells the interpreter to go back to the while loop to check the condition again.
  29. The Python interpreter executes line 11 (Loop Flow 1), which tells it to end the loop because the condition ‘item_count (4) is not less than the length of input_list’ (4).
  30. The Python interpreter advances to line 19 (Loop Flow 3) and executes the return statement, returning the list [8, 64, 6, 8]. The program has now completed all steps.
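The diagram itself is not reproduced here, but based solely on the walkthrough above, the function would look roughly like this (the docstring text and exact formatting are assumptions; the line-number comments match the line references in the walkthrough):

  def cube_first_two(input_list):                      # line 1
      """
      Return a new list in which the first two
      items of input_list are cubed and the
      remaining items are copied unchanged.
      """                                              # docstring, lines 2-8
      new_list = []                                    # line 9
      item_count = 0                                   # line 10
      while item_count < len(input_list):              # line 11
          item_count = item_count + 1                  # line 12
          if item_count <= 2:                          # line 13
              new_value = input_list[item_count - 1] ** 3   # line 14
              new_list.append(new_value)               # line 15
          else:                                        # line 16
              new_list.append(input_list[item_count - 1])   # line 17
                                                       # line 18 (blank)
      return new_list                                  # line 19

  print(cube_first_two([2, 4, 6, 8]))                  # prints [8, 64, 6, 8]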

Operators

An operator in a programming language is a symbol that tells the compiler or interpreter to perform a specific mathematical, relational, or logical operation and produce a result. This section explains the concept of operators and walks through the important arithmetic and relational operators available in C, Java, and Python.

There are different types of operators listed as follows:

  • Basic arithmetic operators
  • Relational operators
  • Logical operators
  • Bitwise operators
  • Assignment operators
  • Miscellaneous operators

Basic Arithmetic Operators

Operators are special symbols that represent computations, such as addition and multiplication. The values that the operators use are called operands. Commonly used arithmetic operators include +, -, *, / and %32.

Operator   Description                       Example
+          Addition                          1 + 2 == 3
-          Subtraction                       4 - 2 == 2
*          Multiplication                    5 * 5 == 25
/          Division                          10 / 1 == 10
%          Modulus (remainder)               4 % 3 == 1
//         Integer division (Python)         4 // 3 == 1
**         Exponent (Python)                 2 ** 2 == 4
++         Increment (C, Java)               a = 2; a++ leaves a == 3
--         Decrement (C, Java)               a = 4; a-- leaves a == 3
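A quick, illustrative Python session confirms the first several rows (note that // and ** are Python operators, while ++ and -- exist in C and Java but not in Python):

  # Arithmetic operators in Python
  print(1 + 2)     # 3     addition
  print(4 - 2)     # 2     subtraction
  print(5 * 5)     # 25    multiplication
  print(10 / 1)    # 10.0  division (always yields a float in Python 3)
  print(4 % 3)     # 1     modulus (remainder)
  print(4 // 3)    # 1     integer (floor) division
  print(2 ** 2)    # 4     exponent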

Relational operators: these operators test or define the relation between two entities. An expression created using a relational operator forms what is called a relational expression or a relational condition33.

The relative precedence levels of many of the operators found in C-style languages (e.g. C++, Perl, and PHP) are as follows, from highest to lowest:

Operators                                        Description
(), [], ->, ., ::                                Function calls, scope resolution, array/member access.
!, ~, -, +, *, &, sizeof, type cast, ++, --      Unary operators, sizeof, and type casts (evaluated right to left).
*, /, %                                          Multiplication, division, modulus.
+, -                                             Addition and subtraction.
<<, >>                                           Bitwise shift left and right.
<, <=, >, >=                                     Comparisons: less than (or equal), greater than (or equal).
==, !=                                           Comparisons: equal, not equal.

Logical operators: these operators usually operate on Boolean values and, when applied to Boolean values, return Boolean values. However, the && and || operators actually return the value of one of their operands, so when these operators are used with non-Boolean values, they may return a non-Boolean value.

Operator            Usage          Description
Logical AND (&&)    a && b         Returns a if a can be converted to false; otherwise, returns b. Thus, when used with Boolean values, && returns true if both operands are true, and false otherwise.
Logical OR (||)     a || b         Returns a if a can be converted to true; otherwise, returns b. Thus, when used with Boolean values, || returns true if either operand is true.
Logical NOT (!)     !expression    Returns false if its single operand can be converted to true; otherwise, returns true.

Here are some examples of values that can be converted to false (these are JavaScript's falsy values):

  • null
  • NaN
  • 0
  • the empty string ("" or '')
  • undefined
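Python's and and or operators behave analogously to && and ||, which makes the value-returning behavior described above easy to demonstrate:

  # `and`/`or` in Python return one of their operands,
  # just like && and || in JavaScript.
  print(0 and "right")     # 0        (left operand is falsy, so return it)
  print(1 and "right")     # right    (left is truthy, so return the right operand)
  print(0 or "fallback")   # fallback (left is falsy, so return the right operand)
  print("a" or "b")        # a        (left is truthy, so return it)
  print(not "")            # True     (the empty string converts to false)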

Bitwise operators work on bits and perform operations bit by bit. The truth tables for AND (&), OR (|), and XOR (^) are as follows:

p q   p & q   p | q   p ^ q
0 0     0       0       0
0 1     0       1       1
1 0     0       1       1
1 1     1       1       0


The following are the bitwise operators supported in C++:

Operator   Description                                                                                              Usage
&          Binary AND: copies a bit to the result if it is set in both operands.                                    (A & B)
|          Binary OR: copies a bit if it is set in either operand.                                                  (A | B)
^          Binary XOR: copies a bit if it is set in one operand but not in both.                                    (A ^ B)
~          Ones' complement: a unary operator that has the effect of 'flipping' bits.                               (~A)
<<         Binary left shift: the left operand's value is moved left by the number of bits given by the right operand.    (A << 2)
>>         Binary right shift: the left operand's value is moved right by the number of bits given by the right operand.  (A >> 2)
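A short, illustrative Python snippet demonstrating the same operators (Python also writes OR as |):

  a, b = 60, 13      # 0011 1100 and 0000 1101 in binary
  print(a & b)       # 12   (0000 1100)  AND
  print(a | b)       # 61   (0011 1101)  OR
  print(a ^ b)       # 49   (0011 0001)  XOR
  print(~a)          # -61  ones' complement (in Python, ~a == -a - 1)
  print(a << 2)      # 240  (1111 0000)  shift left
  print(a >> 2)      # 15   (0000 1111)  shift right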


Assignment Operators

Operator   Description                                                                                                  Usage
=          Simple assignment: assigns the value of the right-side operand to the left-side operand.                     C = A + B assigns the value of A + B to C
+=         Add AND assignment: adds the right operand to the left operand and assigns the result to the left operand.   C += A is equivalent to C = C + A
-=         Subtract AND assignment: subtracts the right operand from the left operand and assigns the result to the left operand.   C -= A is equivalent to C = C - A
*=         Multiply AND assignment: multiplies the left operand by the right operand and assigns the result to the left operand.    C *= A is equivalent to C = C * A
/=         Divide AND assignment: divides the left operand by the right operand and assigns the result to the left operand.         C /= A is equivalent to C = C / A
%=         Modulus AND assignment: takes the modulus of the two operands and assigns the result to the left operand.                C %= A is equivalent to C = C % A
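For illustration, the same compound assignments in Python (using //= for integer division, since plain /= yields a float):

  c, a = 10, 3
  c += a     # c = c + a   -> 13
  c -= a     # c = c - a   -> 10
  c *= a     # c = c * a   -> 30
  c //= a    # c = c // a  -> 10
  c %= a     # c = c % a   -> 1
  print(c)   # 1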

Order of Operations

PEMDAS is an acronym for Parentheses, Exponents, Multiplication, Division, Addition, and Subtraction. Given two or more operations in a single expression, the order of the letters in PEMDAS tells you what to calculate first, second, third, and so on until the calculation is complete. If there are grouping symbols in the expression, PEMDAS tells you to calculate within the grouping symbols first. Other operators, such as modulus, are often left out of the acronym; in most languages % has the same precedence as * and /, and operators of equal precedence are evaluated from left to right.

Without PEMDAS, there would be no guideline for the order in which operations take place. As an example, to calculate 2 * 5 + 7, you could first multiply 2 * 5 and then add 7 to get 17. You could instead add 5 + 7 first and then multiply by 2, which would give you 24. Which answer is correct? Using PEMDAS as the rules of operation, the correct answer is 17, because the order of the letters in PEMDAS tells us that multiplication should be performed before addition.
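A one-line check in Python shows both readings; parentheses force the alternative:

  print(2 * 5 + 7)     # 17  multiplication happens before addition
  print(2 * (5 + 7))   # 24  parentheses force the addition to happen first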

Here's an explanation of the rules given in PEMDAS:

  • Parentheses – complete any calculations in grouping symbols first, e.g. ( ).
  • Exponents – before any lower-priority operation, raise any numbers with exponents to their respective powers, e.g. 2 ** 5.
  • Multiplication and Division have the same priority. Complete these two operations in the order that they occur from left to right, e.g. 2 * 5.
  • Addition and Subtraction also have the same priority. Look for these last two operations from left to right and complete them in that order, e.g. 1 + 2.

Some programming languages use precedence rules that conform to the conventions of mathematics, while others, such as APL, Smalltalk, and LISP, have no operator precedence rules.

Here is an example in C to help understand the arithmetic operators:

  #include <stdio.h>

  int main() {
     int a, b, c;

     a = 10;
     b = 20;

     c = a + b;                        /* addition */
     printf("Value of c = %d\n", c);   /* 30 */

     c = a - b;                        /* subtraction */
     printf("Value of c = %d\n", c);   /* -10 */

     c = a * b;                        /* multiplication */
     printf("Value of c = %d\n", c);   /* 200 */

     c = b / a;                        /* division */
     printf("Value of c = %d\n", c);   /* 2 */

     c = b % a;                        /* modulus */
     printf("Value of c = %d\n", c);   /* 0 */

     return 0;
  }


Other Order of Operations Method: Immediate Execution

Calculator input methods differ from the PEMDAS rule. Calculators fall into two categories: immediate execution and formula (expression) entry. On a formula calculator, the user types in an entire expression, and then the value is computed. On an immediate-execution calculator, the user presses a key for each operation, calculating all of the intermediate results as the keys are pressed, before the final value is shown. The order of operations is thus not taken into account on immediate-execution calculators, although scientific calculators have buttons for brackets and can take the order of operations into account. With operations such as square root (√) or squaring (x²), the number is entered first, and then the operator. Simple four-function calculators, such as those included with most operating systems, usually use the immediate-execution input method34.

Summary

In this chapter we have discussed the evolution of programming languages through the generations, from the most basic machine code to more human-readable forms. We gave an overview, with examples, of the existing programming paradigms and how each might be relevant to a particular situation. Several programming language characteristics were also discussed, including variables and how they can be represented.

Key Terms

  • Array – a data structure comprised of any number of elements, which depending upon the language can either be all of the same type or of any type, in a specific order, that are accessed using an integer index value, in order to specify which element is needed.
  • Assembler – a language processor that translates symbolic assembly language into equivalent machine language.
  • Assembly Language – a low-level computer language whose expressions are symbolic (mnemonic) equivalents of the machine-language instructions of a particular computer’s CPU.
  • Boolean Expression – an expression that reduces to either the value True or False.
  • Code Generation – the process by which a compiler's code generator converts some intermediate representation of source code into a machine code form that can be readily executed by a machine.
  • Code Optimization (Software Optimization) – a process of modifying a software system to make some aspect of it work more efficiently or use fewer resources – such as executing more rapidly, or operating with less memory storage or other resources, or drawing less power.
  • Compiler – a computer program that translates a program written in a high-level language into another language, usually machine language.
  • Complex Data Types (Composite Data Types) – data types that can be constructed from primitive data types; in contrast to primitive types, these include arrays, hashes, dictionaries, classes, and lists.
  • Conditional Statement – features of a programming language, which perform different computations or actions depending on whether a programmer-specified boolean condition evaluates to true or false.
  • Data Type – a classification of data in a programming language that tells the compiler or interpreter how the programmer intends to use the data.
  • Declarative Programming – a style of programming (that is not imperative) associated with building the structure and elements of computer programs, while expressing the logic of a computation without describing its control flow.
  • Dynamically Typed Language – a programming language that doesn’t have static type-checking; for example, the same variable in a program might store either a number or the Boolean value "false", e.g. JavaScript.
  • Expression – any legal combination of symbols that represents a value.
  • Fifth Generation Programming Languages – languages based on solving problems using constraints rather than an algorithm written by a programmer; they are designed to make the computer solve a given problem without the programmer.
  • First Generation Languages – machine code: sequences of binary instructions coded by the programmer as specific patterns of 1's and 0's that are executed directly by a CPU, with no compilation or interpretation of this code.
  • Flow Control – the order in which individual statements, instructions or function calls of an imperative program are executed or evaluated.
  • Fourth Generation Programming Languages – languages that resulted as a new, higher level of abstraction was created to produce a more natural-flowing language, giving the equivalent of very complicated 3GL instructions with fewer errors.
  • Functional Language – a programming language that belongs to the functional paradigm, which is a style of building the structure and elements of a program with function calls, while avoiding the changing of state and mutable data.
  • Global Variable – a variable with global scope, meaning that it is visible and accessible throughout the entire program, as long as it isn’t shadowed.
  • Graph – a pictorial representation of a set of objects where some pairs of objects are connected by links, and the interconnected objects are represented by points termed as vertices, while the links that connect the vertices are called edges.
  • Hash Table – also known as a hash map, this is a data structure which implements an associative array abstract data type, a structure that can map keys to values.
  • High-Level Language – a problem-oriented programming language that uses English-like statements and symbols to create sequences of computer instructions and identify memory locations.
  • Imperative Language – a programming language that belongs to the imperative paradigm, associated with an explicit sequence of commands that update state, with statements that perform the actions.
  • Intermediate Code Generation – a process of generating intermediate code that is used for pre-compiling a program for different operating systems, which eliminates the need for a full compiler for every unique machine.
  • Interpreter – a computer program that directly executes instructions written in a programming or scripting language, without requiring them previously to have been compiled into a machine language program.
  • Lexical Analysis – the process of converting a sequence of characters in a computer program into a sequence of tokens, which are strings with an assigned and identified meaning.
  • Linker – a computer program that adjusts two or more machine-language program segments so that they may be simultaneously loaded and executed as a single unit.
  • List – a sequence of data structures, which are connected together via links.
  • Local Variable – a variable that has a local scope in the function or block in which it is declared, and overrides the same variable name in the larger scope within that function or block.
  • Logical Language – a programming language that belongs to the logical paradigm, that pertains to programming by specifying a set of facts and rules and then asking the computer questions, largely based on formal logic.
  • Loop Statement – a programming statement that executes a sequence or block of statements multiple times.
  • Machine Code – a low-level computer language that has instructions for the processing of data in a binary code that can be understood and executed by a computer’s CPU.
  • Multi-Paradigm Language – any programming language that isn’t purely based in one programming paradigm.
  • Numeric Expression – an expression that reduces to a numeric value.
  • Object-Oriented Language – a computer programming language that supports the use of objects, as an entire image, a routine, or a data structure.
  • Operator – constructs which behave generally like functions, such as arithmetic like addition (+), comparison (>), and logical operations (such as AND or &&), assignment (= or :=), and field access in a record or object with dot notation (.).
  • Optimizer – a program that writes or rewrites the instructions in a program in order to maximize efficiency and speed in retrieval, storage, or execution.
  • Order of Operations – a collection of rules that reflect conventions about which procedures to perform first in order to evaluate a given mathematical expression.
  • Package – a directory containing programming modules.
  • Paradigm – a model or a pattern by which a programming language constructs its programs; a type of template.
  • Parametric Variable – a variable that must be given a specific value during the execution of a program or of a procedure within a program.
  • Parser – a program that analyzes a string of characters in order to associate groups of characters with the syntactic units of the underlying grammar of a computer language.
  • PEMDAS – pertaining to order of operations, it stands for Parentheses, Exponents, Multiplication/Division, Addition/Subtraction.
  • Pre-Processor – a program that performs some type of processing, as organization of data or preliminary computation, in advance of another program that will perform most of the processing.
  • Primitive Data Types – types of data that are built-in or basic to a language implementation, including int, double, char, and float.
  • Procedural Language – a programming language that belongs to the procedural paradigm, that is based upon the concept of the procedure call, or subroutine, which contains a series of computational steps to be carried out by the computer.
  • Procedure – a sequence of actions or instructions to be followed in solving a problem or accomplishing a task.
  • Programming Language – a high-level language or an assembly language used to write computer programs.
  • Queue – an abstract data type, or a sequential collection of entities, in which the elements in the linear data structure are organized, and the only methods performed on the collection are 1) enqueuing, or the adding elements to the rear terminal position, and 2) dequeuing, or the removing of elements from the front terminal position.
  • Second Generation Programming Languages – known as assembly languages; these have a very strong, but not always one-to-one, relationship with machine code, i.e. one statement in assembly language often translates to one statement in machine language.
  • Semantic Analysis (Context Sensitive Analysis) – a process in compiler construction, usually after parsing, to gather necessary semantic information from the source code, including type checking, or making sure a variable is declared before its use.
  • Sequence - an enumerated collection of objects in which repetitions are allowed, such as strings, arrays, and lists.
  • Stack – an abstract data type that serves as a collection of elements, with two main operations: (1) push, which adds an element to the collection, and (2) pop, which removes the most recently added element that was not yet removed.
  • Static Typed Language – a programming language that is described as having no possibility of an unchecked runtime type error, which is referred to as type safety, e.g. Java.
  • Strong Typed Language – the use of programming language types in order to capture invariants of the code and ensure its correctness, which excludes certain classes of programming errors.
  • Symbol Table - an implementation of a data structure used by a programming language translator (such as a compiler or interpreter), where each identifier or symbol in a program's source code is associated with information relating to its declaration or appearance.
  • Syntax Analysis (Parsing) – is the process of analyzing a string of symbols in computer languages or data structures, while conforming to the rules of a formal grammar.
  • Third Generation Programming Languages – languages that brought logical structure to software development and made programming languages more programmer-friendly. This was brought about by the next higher level of abstraction these languages offered, placing them at a higher level than the previously discussed generations.
  • Tree – a widely used abstract data type (ADT)—or data structure implementing this ADT—that simulates a hierarchical tree structure, with a root value and subtrees of children with a parent node, represented as a set of linked nodes.
  • Variable – a symbol, quantity, or function that may assume any given value or set of values.
  • Visual Programming Language (VPL) – a programming language that belongs to the visual paradigm, that allows users to create programs through the manipulation of the program's graphical elements rather than by specifying them textually.
  • Weak Typed Language – a programming language that makes it easy to use a value of one type as if it were a value of another type, such as a string for an integer.

Problem Sets

1. How many different generations of programming languages are there? What are they?
2. What is a paradigm? What are six main programming paradigms?
3. What are a compiler and a decompiler?
4. What are some advantages and disadvantages of a compiler?
5. What is an interpreter?
6. List some advantages and disadvantages of an interpreter.
7. What are data structures?
8. What is a queue?

References

  1. https://en.wikipedia.org/wiki/Programming_language_generations
  2. https://en.wikipedia.org/wiki/Machine_code
  3. http://en.wikipedia.org/wiki/Assembly_language
  4. https://en.wikipedia.org/wiki/X86_assembly_language
  5. https://en.wikipedia.org/wiki/X86_instruction_listings
  6. https://software.intel.com/en-us/articles/introduction-to-x64-assembly
  7. https://en.wikipedia.org/wiki/Netwide_Assembler
  8. https://en.wikipedia.org/wiki/Fifth-generation_programming_language
  9. https://en.wikipedia.org/wiki/Imperative_programming
  10. https://en.wikipedia.org/wiki/Functional_programming
  11. https://en.wikipedia.org/wiki/Logic_programming
  12. https://en.wikipedia.org/wiki/Visual_programming_language
  13. https://en.wikipedia.org/wiki/Procedural_programming
  14. https://en.wikipedia.org/wiki/Object-oriented_programming
  15. https://en.wikipedia.org/wiki/Comparison_of_multi-paradigm_programming_languages
  16. https://en.wikipedia.org/wiki/Programming_paradigm
  17. https://www.tutorialspoint.com/computer_programming/computer_programming_variables.htm
  18. https://en.wikipedia.org/wiki/Data_type
  19. https://stackoverflow.com/questions/4728073/what-is-the-difference-between-an-expression-and-a-statement-in-python
  20. https://www.cs.auckland.ac.nz/references/unix/digital/AQTLTBTE/DOCU_075.HTM
  21. http://en.cppreference.com/w/cpp/language/statements
  22. http://preshing.com/20130618/atomic-vs-non-atomic-operations/
  23. https://en.wikipedia.org/wiki/Data_structure
  24. https://en.wikipedia.org/wiki/List_of_data_structures
  25. https://en.wikipedia.org/wiki/Data_structure
  26. https://en.wikipedia.org/wiki/Associative_array
  27. https://en.wikipedia.org/wiki/Queue_(abstract_data_type)
  28. https://en.wikipedia.org/wiki/Stack_(abstract_data_type)
  29. https://en.wikipedia.org/wiki/Tree_(data_structure)
  30. Goodrich, Michael T. Data Structures and Algorithms in C++.
  31. https://en.wikipedia.org/wiki/Hash_function_security_summary
  32. https://www.oreilly.com/library/view/computer-science-programming/9781449356835/threedot4_basic_arithmetic_operators.html
  33. https://en.wikipedia.org/wiki/Operator_(computer_programming)
  34. https://en.wikipedia.org/wiki/Calculator_input_methods


Top of Page - Prev Chapter - Next Chapter