Mimir:Draft2 Chapter2

From Openitware
Jump to: navigation, search

Back to Table of Contents


CHAPTER TODOs

  • write summary
  • exercises on variables and grammar

Chapter 2: An Overview of Programming Languages

"It has been raining all day on my elephant skull and also elsewhere." - Diary of a genius, Salvador Dali

Programming Language Generations

As our power of knowledge grows from generation to generation, so does that of which we create. Every generation of programming language is built on top of its predecessor's accomplishments, spawning more powerful grammar and therefore more powerful functionality with each epoch.

First Generation (1GL)

First Generation languages are Machine Code. These cryptic instructions coded by the programmer as certain patterns of 1s and 0s are executed directly by a CPU, there is no compilation or interpretation. The instruction is a very low level task, the lowest level--a hardware level task--such as a jump, a copy of data from one register to another, a comparison of data, or an arithmetic function on data in a register. Furthermore, not all CPUs have the same machine code instruction set, though some families of processor may.

Second Generation (2GL)

Second Generation programming languages, known as assembly languages, have a very strong, but not always one-to-one relationship with machine code, i.e. one statement in assembly language often translates to one statement in machine language. The difference between the two is in the grammar. In assembly language, the programmer can read and write code much easier using labels and operations that are represented as text, whereas in machine code, it would be represented only as binary code.

In machine code to copy data from one register to another could look something like this:

01110001 10001111 01110010

In Assembly language the same instruction could look something like:

CPY 0x71 0x72

As you can see it is much easier to decipher what the code is doing in assembly language, i.e. copying the contents at memory location 0x71 to location 0x72.

Like first generation languages, the instruction set and specific grammars of a second generation programming language are CPU dependent. Coding on this type of platform requires specification documents that include coding syntax, register names, the built in operation names, and CPU electrical hardware information, that are all specific to the CPU. Unlike first generation languages, assembly language requires a conversion process called assembly, which converts the assembly code into machine language - the only language a CPU can understand.

Below is an example listing of assembly language source code (a subroutine), generated by the Netwide Assembler (NASM), which is an assembler and disassembler for the Intel x86 architecture:

                                                  ;-----------------------------------------------------------
                                                  ; zstr_count:
                                                  ; Counts a zero-terminated ASCII string to determine its size
                                                  ; in:   eax = start address of the zero terminated string
                                                  ; out:  ecx = count = the length of the string

zstr_count:                                       ; Entry point
00000030 B9FFFFFFFF      mov  ecx, -1             ; Init the loop counter, pre-decrement
                                                  ;  to compensate for the increment
.loop:
00000035 41              inc  ecx                 ; Add 1 to the loop counter
00000036 803C0800        cmp  byte [eax + ecx], 0 ; Compare the value at the string's
                                                  ;  [starting memory address Plus the
                                                  ;  loop offset], to zero
0000003A 75F9            jne  .loop               ; If the memory value is not zero,
                                                  ;  then jump to the label called '.loop',
                                                  ;  otherwise continue to the next line
.done:
                                                  ; We don't do a final increment,
                                                  ;  because even though the count is base 1,
                                                  ;  we do not include the zero terminator in the
                                                  ;  string's length
0000003C C3              ret                      ; Return to the calling program

The x86 instruction set applies to 32-bit Intel processors, while the x64 instruction set applies to the newer 64-bit Intel processors. x64 is a generic name for the 64-bit extensions to Intel's and AMD's 32-bit x86 Instruction Set Architecture (ISA). The updated instruction set is also grouped according to the processor's architecture, i.e. i386, i486, i686, and more generally is referred to as x86 32 and x86 64 (also known as AMD64).

Resources:

Third Generation (3GL)

Third Generation programming languages brought logical structure to the software, in order to make the languages more programmer friendly. The level of abstraction is higher, thus can be considered higher level languages than first and second generation counterparts.

New features have been implemented, like improved support for aggregate data types and expressing concepts in a way that favors the programmer, not the computer.

Early examples for 3GL are Fortran, ALGOL, COBOL, first introduced in the late 1950's, with C, C++, Java, BASIC, Pascal and others to follow. 3GLs focus on structured programming and have no meaning for object encapsulation concepts. C++, Java, C# followed a different path than the one mapped by 3GL guidelines.

Third generation programming languages are characterized by a one-to-many mapping with assembly and machine language, i.e. one statement in a third generation language will translate to numerous statements in a first or second generation language. At this time compilers and interpreters were created; their role is to handle the conversion of source code to machine code, where the one-to-many translation occurs.

Another key characteristic of a third generation language is its hardware independence, as the grammar of the language is no longer dependent on the CPU.

Fourth Generation (4GL)

The need for a Fourth Generation of programming languages appeared as a new higher level of abstraction was envisioned, one that could generate from a more natural language, the equivalent of the very complicated 3GL instructions with fewer errors.

On the early machines, by using a few cards 4GL language, replacing boxes of punched cards written in a 3GL language was common.

We can identify a few types of 4GLs:

  • Table-driven programming - the programmer uses control tables instead of code to define his logic. Example: PowerBuilder
  • Report-generator - generates a report, or a program to generate a report, based on data and report format. Examples: Oracle Reports, OpenOffice Base
  • Form-generator - manage online interactions or generate programs to do so
  • Data management - provide sophisticated commands for data manipulation/documentation, for statistical analysis and reporting. Examples: SAS, SPSS, Stata
  • Other - attempts to generate whole systems from CASE tool outputs have been made

Some engineering systems were automated to use data flow diagrams, relationship diagrams and life history diagrams to generate an impressive number of lines of code as the final product.

4GL examples: DataFlex, FoxPro, IBM Rational EGL, LabView, Perl, Python, Ruby

A sample code written in a database access 4GL can be seen below:

EXTRACT ALL CUSTOMERS WHERE "PREVIOUS PURCHASES" TOTAL MORE THAN $1000

Fifth Generation (5GL)

Fifth Generation programming languages are based on solving problems using constraints rather than an algorithm written by a programmer. Most logic programming languages are considered to be 5GLs. They are designed to make the computer solve a problem without the programmer, and are used mainly in artificial intelligence research.

Examples: PROLOG, OPS5, Mercury.

A Few Words on Paradigms

What is a paradigm?

Paradigm is a model or a pattern by which a programming language constructs its programs. There are six main programming paradigms: imperative, functional, logic, visual, procedural, and object-oriented. Most of older languages are based on one of the paradigms, modern languages on the other hand are developed to include multiple paradigms.

Compiler

A compiler translates, "compiles", a program written in a high-level programming language such as C, C++, or Java into machine language. This process is needed because attempting to write complex programs in machine language for humans is tedious and can be error prone. So humans write a program in one of the high-level languages and then compiles that language into a language that the computers architecture can read an execute. How does a Compiler Work

To better understand how a compiler works we are going to separate the tasks that it does into pahses and then steps that each phase performs. The three phases and their steps that occur are the front end where the first step consists of lexical analysis, the second step is syntax analysis or parsing of the code and the third step is Type checking. The middle end is where the intermediate code generation takes place. and the back end is where register allocation gets assigned, the machine code generation, and lastly the assembly and linking occurs. Conceptually these steps operate in sequence, each step (except the first) takes the output from the previous step as its input. See here

Pictorial Representation

Pictoral View of Compiler.PNG

Pros and Cons of a Compiler

   Advantages
       A compiler bridges the semantic gap between high-level languages that programs are written in now and the low-level languages that the computer still needs to execute a program.
       Creates a program for the specified architecture that can execute efficiently.
       Since the programmer knows the architecture of the hardware they are able to optimize the code to run best for that specific hardware.
       Take up less memory when the compiled program is executed. 
   Disadvantages
       Although having a specific hardware as an advantage it can also be a disadvantage for software development companies. If they want to reach the largest audience possible then they will need to maintain multiple copies of the same software to allow the program to work on Mac OS, Windows, x32, x64, and the plethora of other OS's in existence that people use.
       The time is take to compile. If a large program that is being developed needs to be troubleshot for glitches then the repeated compiling can add up and delay the development of the project. 

Interpreter

How Does an Interpreter Work

An interpreter reads the source code one instruction at a time, converts the line into machine code and executes it. The machine code is then discarded and the next line is read. Executing each line as it is converted. This is achieved by having the interpreter simulate a virtual machine using the base set of instructions of the programming language as its machine language. To state this more clearly think of an interpreter as a program that implements a library containing the basic set of instructions of the programming language being interpreted in machine language. This is achieved by utilizing the interpreters parsing abilities to translate the grammar that the user types in for the high-level language into a more abstract grammar that the computer can better understand. Pictoral Representation

Pictoral View of Interpreter.JPG

Pros and Cons of an Interpreter

   Advantages
       Interpreters are easier to use for beginner programmers.
       You have the ability to interrupt the the process while it is running and change something in the program, then have the option to continue or start over again.
       Helpful for troubleshooting small portions of code faster than compiling an entire program during development of large software packages.
       Cross-platform, this is assumed that the interpreter exists on the platform you are running it on. 
   Disadvantages
       The program being executed must be interpreted every time you run it making an interpreted language execution 5 to 10 times slower than that of a compiled program. 

Assembler

What is an Assembler

An assembler is a program which takes in low level assembly language code. The assembler then translates that code into its equivalent machine code for that specific processor. An example of this would be if you were to write a line of code to execute a subroutine (i.e. assembly equivalent of a C/C++ function or an OOP method) like this JSR $0F5A. That instruction might get translated to something like this 1001101110. Of course these are made up values but they are just there to give you an idea of how it works. That i

Parsing

Parsing is an essential step for both Interpreters and Compilers. It is how we can pull text from files and make sense of it. This also means that it clearly defines the grammar for programming languages because how else can you parse it if you do not know the grammar for that language. That is the general purpose of most parser's but they also come with other features. A common feature is one where they check for correct syntax.

Grammars

Context Free Grammars or CFG is grammar that is made up of a series of terminals and non terminals. Non terminals are denoted by a string placed in angle brackets like <this>. Since non terminals are not used in terminating a sequence of execution, they can be recursively used if the grammar permits it. Terminals on the other hand are actual symbols and the most common example is the semi colan. They do not expand any further and serve a sole purpose to let you know when to stop. Below is an example of Backus Naur Form or BNF grammars. A production rule is something where you define how the grammar is used. In BNF, a production rule is that you need at least a non terminal on the left hand side but can have an infinite amount of non terminals and terminals on the right hand side. Note: code is a recognized command by this wiki, so cde is used in place of it.

   <cde> ::= <statements>; <cde> <-- production rule in BNF
   <cde> ::= <statement>;
   <statement> ::= <variable> <operator> <value>;
   <variable> ::= <identifier>;
   <op> ::= + | = | - |
   <value> ::= <variable> | <number>
   <identifier> ::= number ...
   <number> ::= ...
   <statement> ::= ;
   <statement> ::= <if – clause>;
   <cde - block> ::= {<cde>}
   <if – statement> ::= if(<expression>) <statement>;
   <if – statement> ::= if(<expression>) <block – statement>; 

This shows you that code can be followed by a single statement or more code. Every statement is terminated with a semi colan, like most programming languages. If you were to have a statement equal a variable. Then you know that variable traces back to a value because a grammar rule is that variables must have an identifier which is also followed by a value of some kind. Due to way BNF is constructed, a programming language written with this grammar is visually independent of what the parsing requires. Meaning that you can indent, add spaces, or anything else to visually change how the code looks without actually affecting how anything works. This makes code easier to read visually.

Paradigms with examples

  • Imperative - language is a statement driven.
    • Perl, MS-DOS batch, csh/bash
  • Functional – language based on functions
    • LISP (times (plus 1 2) 3)
  • Logic – language that uses logic to perform computations
    • Prolog : A cheetah is a mammal, a mammal is an animal, is cheetah an animal?
  • Visual – language that uses graphics to express code/algorithm
    • AppInventor, scratch, snop
  • Procedural – language groups statements of code into a procedures (aka subroutines& functions)
    • C, Pascal, fortune BASIC
  • Object-Oriented – language organizes code into classes to encapsulate information
    • C++, Java, PHP
  • Multi paradigm – language that supports more than one paradigm
    • C++, Java, PHP, Perl, Python, C#, JavaScript

Variables

Variable Types

In computer programming, types or datatypes help classify one of the many different types of values a variable can hold. Some common datatypes include boolean (true or false), string, and integer. In a typed language such as Java, variables must have a type associated with them. The type of the variable is important because it determines a variety of characteristics of that specific variable like:

  • Possible values that can be assigned
  • The actual meaning of the variable
  • The operations one can do with variables of a certain datatype

For example we have some Java code below:

int count = 10;

That one line of code is saying we have a variable called count and it is of type int which in Java and many other languages is short for Integer. The Integer type ONLY allows values that are whole numbers like: -5, 0, 5, 10, 12, 20 etc. If we were writing a Java program and we do something like int count = "10"; and then we compile this up, we're going to get a syntactical error and the compile will fail. It will fail because we specified the type of this variable count as an int.

Why are data types important?

In computers, memory is where data and information about programs is stored. That memory is made up from binary digits known as bits stored as either a 0 or 1. Computers access these bits not as individual bits, but rather as a collection of bits usually consisting of 8, 16, 32 or 64 bits. The collection length on a machine is determined by how many simultaneous operations a CPU can process. A basic rule of thumb is the larger the number of bits, the larger chunk of data can be processed.

Now that we have context of memory, let's move onto how data is stored. Any piece of data, like a variable that has the value of 10, is stored in memory at an actual physical location - a location is known as an address. Obviously it's difficult for us to know exactly where this location is or how to access it. This is why we name this address with a variable.

Variables are a much easier way of reserving a spot in memory to hold our information for our program. This way we don't have to remember random numbers like 234098.

When we specify the datatype of a variable, for instance an int, we are setting aside an actual size of memory for our data. The number of bytes that can reserved for a data type can vary based on which programming language you're using. Like we said above, we have a variable that is of type int. When this line of code gets hit, the computer knows exactly how much memory to reserve because we used the data type int.

Strong vs Weak Typed Languages

There is a significant difference between a strong vs weakly typed language. We have already gone over a lot regarding strongly typed variables like integers and strings. When using a language with strong types, that value is known to have specific characteristics and those characteristics cannot change.

Conversely with weakly typed languages, the type and characteristics are determined on how it is used. For example, if we have an expression like this:

a = 5;
b = "10";
c = a + b;

Depending on the language, the value of c will be 5 + 10 if the code can interpret "10" as the integer value 10.

We could also make the b variable equal to "ten" and the code will convert that string value to whatever ASCII values that represents.

If we did this in a strongly typed language, the compiler would through an error as any string value assigned to an integer typed variable isn't allowed.

There are of course advantages to each.

For strongly typed languages, the programmer is forced to create the behavior of the program explicitly. There is no "hidden" behavior where another programmer could be modifying legacy code and the parameter names are not descriptive enough causing that programmer to have no idea what type of variable their working with.

For weakly typed languages, the advantage is writing less code. Also, it maybe faster because there is no overhead for processing and remembering the unique data types a strongly typed language has to do.

Dynamic vs Static Typed Language

Dynamic typing and static typing cause a good amount of confusion among programmers. Below will define in detail each one and also include an example snippet of code.

Dynamic typing

A dynamic type programming language is one that the type is interpreted at runtime. The pros of this type of language is that one can write less code quicker as they don't have to specify the type of each variable they use. The downside of this is when one has to error check. Because of the type being computed at runtime, there's no type-checking before hand so in order to test, a programmer must run the program and clean-up afterwards. Common languages that are dynamically typed include Python and PHP. Below is some Python code:

firstName = "Joshua"
firstName = 10;

We have defined the variable name firstName as a string value "Joshua" - then right after that we changed the value to an integer 10. In a dynamically typed language, this would run perfectly fine and the value (until you change it again) of firstName would be 10.

Statically typing

In a statically typed language, the type is computed at compile time instead of runtime. If when compiled and there are no errors, then the code is guaranteed to work, syntactically. One of the many pros of a static typed language includes the speed at which a programmer can fix bugs and type mismatches because of this precision. A downside of course is this requires writing more code and making sure before compiling that all variables are typed properly. Popular programming languages that are statically typed include C, C++, Java, and C#.

// Example from above.
firstName = "Joshua"
firstName = 10;
string firstName = "Joshua";
int age = 22;

Above I put the example we had for a dynamically typed language and below that an example showing how it would be written correctly in Java, a statically typed language. This is a statically typed language because all variable names AND their types must be explicitly declared. If we attempted to assign the value of 10 to firstName, we would get an error at compile time telling us that it cannot evaluate an integer to a string.

Data Types

Primitive data types

Primitive data types are mostly found in statically typed languages such as Java. Like we said above, this means that all variable names and type must be declared explicitly in order to pass the type-check at compile time. Below is a table of the more common primitive types in statically typed languages.

Type Description Example
int Whole number integers 10, 45, 109
double Real numbers 10.0, 5.67, -98.5
char Single character 'j', 's', 'a'
Complex data types

When thinking of a common complex data type, the array has to be most popular. A complex datatype is any type that holds multiple independent values. An array is a complex data type because it is one object made of up a number of indepedent values. An example in Java:

int songs[10];

This statement is saying we want to define an array variable named songs and to set aside 10 integer values in memory. It's important to note here that we did NOT initialize any of those 10 integer values inside the array, but rather just allocated the space we want to in memory.

Complex data types can also be types that you define yourself as the programmer. In Object-oriented based languages, we can create a new class which will have properties and functions inside it that actually define what the class is. For example we have this Java code:

// Creating the class Car.
public class Car {
	private int year;
	private String make;
	private String model;

	.
	.
	.

	public setYear(int y) {
		year = y;
	}

	public getYear() {
		return year;
	}

	.
	.
	.
}

// Using the Car class.
Car myCar = new Car();
myCar.setYear(2013);
System.out.println(myCar.getYear());

This code above is creating a new class titled Car which has 3 properties (year, make, model) and what I showed, 2 methods. Of course there would be "setters" and "getters" for the other properties as well. However, we now can use this Class and create a new object, or variable, of type Car. Once the line Car myCar = new Car() compiles, we now have initialized a new object of type Car and can use the public properties/methods inside that class throughout the program.

One can see exactly why the difference between a primitive data type and a complex data type is significantly different.

Variable Scope

The scope of a variable is what defines the availability of it during the executing of the program. Some programmers say that the scope of the variable is actually a property in it of itself. In other words, a variables scope is the area of a program where that specific variable has meaning and thus being visible to the code within that scope.

In most programming languages, there are different levels of scope - Global, Parametric, and Local.

  • Global variables are commonly known as variables with an indefinite scope, or visible to the entire program. Programming languages handle a global variable differently. For instance, many languages like C or C++ there is no actual global identifier, however if there is a variable defined outside a function, then that variable is treated as having "file scope" which means it's visible everywhere in the file. However in PHP, there is an actual global keyword you can place in front of defining a variable, and then you can use that variable anywhere that file is included in. For example:
// config.php
<?php

global $SITE_NAME = "Josh's CD Shop";

?>
// index.php
<?php
require_once('config.php');

echo $SITE_NAME;
?>

In those two snippets of code, we are defining a GLOBAL variable $SITE_NAME in a file named config.php. Then in a completely different file, index.php, we are including that config.php file and using the global variable we defined.

Global variables are often viewed as bad practice because it can create confusing, and more error-prone code. Code in a more larger project can be both read and more importantly maintained when the scope of variables are limited to their specific code block or function. Using global variables can cause headaches because in any part of a program, they can be modified (if their not memory protected which some languages provide), which makes it difficult to remember exactly their intended use. Below we will introduce better ways to manage your variables within a program.

  • Local variables are ones that can be accessed and used within certain functions; it is given local scope to wherever it may be. For example, say we have a block of code like below:
<?php

function test() {
	$a = 5;	
}

echo $a;

?>

When we try to run this PHP code, we will get an error. That is because we are trying to echo a variable that is NOT in scope.

In our function test(), we defined a variable $a = 5. Then directly after the } which closes that scope, we went ahead and echo'd out that same variable $a. This is wrong because the PHP file doesn't have access to LOCAL variables inside functions.

  • Parametric variables can be accessed inside a method or function which accepts it as an argument. This is how you can solve the problem of using global variables in a program - by passing values to functions instead of creating global variables so that function can use them.
<?php
class Item {
	private $price;

	function setPrice($myPrice, $amountOff) {
		if($amountOff != null) {
			// We have a sale.
			$price = ($myPrice - $amountOff);
		} else {
			$price = $myPrice;
		}
	}	

	function getPrice() {
		return $price;
	}
}

.
.
.

// Somewhere else in the program ...
$item = new Item();
$item.setPrice(100, 20);

echo $item.getPrice();

?>

This snippet above demonstrates the power and simplicity of a parametric variable. We have a class which has a private property $price ... the only way to set this value would be to create a custom setter function and pass that function a value we'd like to use. In the function setPrice($myPrice, $amountOff) we specified arguments of $myPrice and $amountOff which we will accept in the function as parametric scoped variables - they can only be accessed within this function.

Using this concept, we can create a program that is harmonious because we always know what the variables are, and their intention. If we put $amountOff in a global variable, we may never know how that variable is even defined, never mind where. Using it this way, we are complete control of where that value is set and who has access to it.

Summary

A quick summary of the chapter should go here

Key Terms

  • Paradigm: Paradigm is a model or a pattern by which a programming language constructs its programs. There are six main programming paradigms: imperative, functional, logic, visual, procedural, and object-oriented. Most of older languages are based on one of the paradigms, modern languages on the other hand are developed to include multiple paradigms.
  • Variable: In computer programming, a variable or scalar is a storage location paired with an associated symbolic name (an identifier), which contains some known or unknown quantity or information referred to as a value. The variable name is the usual way to reference the stored value; this separation of name and content allows the name to be used independently of the exact information it represents.
  • Dynamic language: A term used in computer science to describe a class of high-level programming languages which, at runtime, execute many common programming behaviors that static programming languages perform during compilation. These behaviors could include extension of the program, by adding new code, by extending objects and definitions, or by modifying the type system. These behaviors can be emulated in nearly any language of sufficient complexity, but dynamic languages provide direct tools to make use of them. Many of these features were first implemented as native features in the Lisp programming language.
  • Static Typed Language: Static typed programming languages are those in which variables need not be defined before they’re used. This implies that static typing has to do with the explicit declaration (or initialization) of variables before they’re employed. Java is an example of a static typed language; C and C++ are also static typed languages. Note that in C (and C++ also), variables can be cast into other types, but they don’t get converted; you just read them assuming they are another type.
  • Data type: In computer science and computer programming, a data type or simply type is a classification identifying one of various types of data, such as real, integer or Boolean, that determines the possible values for that type; the operations that can be done on values of that type; the meaning of the data; and the way values of that type can be stored.
  • Variable Scope: The scope of a variable is what defines the availability of it during the executing of the program. Some programmers say that the scope of the variable is actually a property in it of itself. In other words, a variables scope is the area of a program where that specific variable has meaning and thus being visible to the code within that scope.

Problem Sets

A list of practice problems