Chapter 1: Introduction
"A journey of a thousand miles begins with a single step."
What You Should Learn In This Chapter:
- What is a computer programming language?
- What is syntax vs. semantics?
- What is the history of programming languages?
- Where did programming languages come from?
- How did programming languages evolve?
- What are the modern refinements?
- What makes a programming language good or bad?
- How are programming languages made?
- What are the benefits of programming languages?
- Why study programming languages?
- Who uses programming languages?
What Is A Programming Language?
A computer programming language is a formal language that, like natural languages such as English and French, serves as a means of communication. A programming language expresses a set of instructions for a digital computer, and those instructions are used to create a program that carries out a specific set of tasks.1 Programming languages are used to create and implement specific algorithms that produce various kinds of output2.
It is considered a formal language as it abides by two sets of rules3:
- Syntax: a precise set of rules that determine the structure of statements, allowed symbols, and the combination of legal expressions.
- Semantics: a precise set of rules that tell you the meanings of the symbols and legal expressions.
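The difference between the two sets of rules can be seen in a short experiment. The following is a minimal sketch in Python, chosen only for illustration; the helper name `check_syntax` is hypothetical. The first statement breaks the syntax rules, while the second is well-formed but has no valid meaning.

```python
def check_syntax(source: str) -> bool:
    """Return True if `source` obeys Python's syntax rules."""
    try:
        compile(source, "<example>", "exec")  # translate only; nothing is executed
        return True
    except SyntaxError:
        return False


# Violates the syntax: two assignment symbols in a row are not a legal statement.
print(check_syntax("total = = 1"))        # False

# Obeys the syntax, but the meaning is invalid: a number cannot be added to text.
well_formed = 'total = 1 + "one"'
print(check_syntax(well_formed))          # True
try:
    exec(well_formed, {})
except TypeError as error:
    print("semantic error:", error)
```

A syntax error is caught before the program runs, whereas this kind of semantic error only appears when the statement is executed.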
There are an estimated 2,000 computer programming languages1, with new languages or adaptations of existing languages developed each year. TIOBE, a software quality assurance company, has published its TIOBE Index since June 2001, analyzing the popularity of the most widely used programming languages4. The TIOBE Index considers a programming language worthy of analysis if it meets the following criteria:
- The programming language should have its own entry on Wikipedia, and the entry should clearly state that it concerns a programming language.
- The programming language should be Turing complete, meaning it is able to compute any calculation that a programmable computer can.
- The programming language should have at least 5,000 hits for +"<language> programming" for Google.
The TIOBE Index ratings are calculated by the number of hits on popular search engines around the world. In March 2018, the TIOBE Index determined Java as the most widely-used programming language in the world, followed by C and C++5.
History of Programming Languages and Paradigms
The earliest known programmable machine, preceding the digital computer, was an automatic flute player. It was described by the Musa brothers in Baghdad in the 9th century during the Islamic Golden Age. In the early 1800s, "programs" were used to direct the behavior of machines such as Jacquard looms, music boxes, and self-playing pianos.
The earliest computers evolved out of these programmable machines, often programmed without the help of a programming language, by writing programs in machine language. The programs, usually in decimal or binary form, were read into the early computers from punched cards, magnetic tape, or toggled in on switches on the front panel of the computer. Machine languages were later called first-generation programming languages (1GL)6.
Here are brief details on the evolution of programming languages:
- 1957 - Fortran (short for "The IBM Mathematical Formula Translating System"). A general-purpose, high-level language, introduced mainly for numeric and scientific computing. It is one of the oldest programming languages still in use today.
- 1958 - Lisp (short for "List Processor"). A high-level language created as a mathematical notation for programs. Lisp pioneered several computer science ideas, including tree data structures, automatic storage management, dynamic typing, and the self-hosting compiler.
- 1959 - COBOL (short for "Common Business-Oriented Language"). A high-level language used primarily for business computing. It was the first programming language to be mandated by the US Department of Defense.
- 1964 - BASIC (short for "Beginner's All-Purpose Symbolic Instruction Code"). A general-purpose, high-level language.
- 1970 - Pascal (named after the French mathematician and physicist Blaise Pascal). A high-level language used mainly for teaching structured programming and data structuring. Commercial versions of Pascal were widely used in the 1980s.
- 1980 - Ada (named after Ada Lovelace, often cited as the first computer programmer). A high-level language derived from Pascal. Its development was commissioned by the US Department of Defense in 1977 for building large software systems.
- 1983 - C++ (formerly "C with Classes"; ++ is the increment operator in C). An intermediate-level, object-oriented language. It extends C with enhancements such as classes, virtual functions, and templates.
- 1983 - Objective-C (an object-oriented extension of C). A general-purpose, high-level language. It expands on C by adding message-passing functionality based on the Smalltalk language.
- 1987 - Perl (a language named "PEARL" already existed, so "Pearl" wasn't an option). A general-purpose, high-level language created for report processing on Unix systems. Today it is known for its power and versatility.
- 1991 - Python (named after the British comedy troupe Monty Python; tutorials, sample code, and instructions often reference them). A general-purpose, high-level language created to support a variety of programming styles and to be fun to use.
- 1993 - Ruby (named after the birthstone of one of the creator's collaborators). A general-purpose, high-level language influenced by Perl, Ada, Lisp, Smalltalk, and others. It was designed for productive and enjoyable programming.
- 1995 - Java (named for the amount of coffee consumed while developing the language). A general-purpose, high-level language originally made for an interactive TV project. It offers cross-platform functionality and consistently ranks among the most popular languages.
- 1995 - PHP (originally "Personal Home Page"). An open-source, general-purpose language for building dynamic web pages. It is widely used by enterprises.
Where Did Programming Languages Come From?
The origins of computer programming languages are credited to Ada Lovelace, whose program for the Analytical Engine was published in 1843. Lovelace wrote an algorithm for the Analytical Engine designed by Charles Babbage, who is known as the father of the computer. Babbage invented the Difference Engine, which was meant to perform mathematical calculations, and the Analytical Engine, designed to handle more complex calculations. 27
Luigi Menabrea wrote up Babbage's lecture about the Analytical Engine in French, and Ada Lovelace was later commissioned to translate the publication into English. Intrigued by Babbage's work, she added notes of her own, which made her version longer than the original publication. 27
Her notes were labelled A to G, and in note G she describes an algorithm for the Analytical Engine to compute Bernoulli numbers. Ada Lovelace also described how the engine could work with letters and symbols as well as numbers, and a method for repeating a series of calculations, which is now known as a loop. Note G is considered the first published algorithm specifically tailored for implementation on a computer, and Ada Lovelace has often been cited as the first computer programmer.28
How Did Programming Languages Evolve?
- 1940s: Machine Language
Known as the computer's own language, machine language represents instructions and data in binary form, which makes it very difficult for humans to understand. It is the only language the computer can execute directly. Programs written in newer languages like Java and C must be translated, directly or indirectly, into machine language before the computer can run them. Advantage: it is efficient.29
The text "Hello World" written in binary (8-bit ASCII character codes):
01001000 01100101 01101100 01101100 01101111 00100000 01010111 01101111 01110010 01101100 01100100
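Each group of eight digits above is the ASCII code of one character of "Hello World", not a processor instruction. As a rough illustration, the sketch below reproduces the sequence and converts it back to text. It is written in Python, and the function names `to_binary` and `from_binary` are hypothetical.

```python
def to_binary(text: str) -> str:
    """Encode each ASCII character as an 8-bit binary group."""
    return " ".join(format(byte, "08b") for byte in text.encode("ascii"))


def from_binary(bits: str) -> str:
    """Decode space-separated 8-bit groups back into text."""
    return bytes(int(group, 2) for group in bits.split()).decode("ascii")


encoded = to_binary("Hello World")
print(encoded)               # the same eleven groups shown above
print(from_binary(encoded))  # Hello World
```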
- 1950s: Assembly Language
Assembly language (ASM) is a low-level language and was the first human-readable programming language. Programs written in assembly language are converted to machine language by assemblers. Assembly languages are specific to a particular computer architecture. They are more human-readable than machine language because names are used instead of binary codes.29
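Assembly example (x86 syntax):
count dw 0      ; reserve a 16-bit word named count, initialized to 0
MOV AL, 10      ; load the value 10 into the 8-bit register AL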
- Late 1950s - 1960s: High Level Programming Languages (HLL)
High-level languages differ from low-level languages in that they are more portable across machine architectures. They are more human-readable than assembly language, easier to use, use more natural notation, and reduce the time needed to write programs. The first high-level language designed for a computer was Plankalkül, but it was not as widely implemented as FORTRAN (Formula Translator). Others were LISP, ALGOL, COBOL (Common Business Oriented Language) and BASIC (Beginner's All-Purpose Symbolic Instruction Code).
- FORTRAN was developed in 1957 to perform scientific and statistical computations.
- COBOL was developed in 1959 for data processing in business, finance and administrative systems.
- BASIC was developed in 1964 for people without the strong scientific and mathematical background that most computers of the time required.29
COBOL Hello World example:
PROCEDURE DIVISION.
  DisplayPrompt.
    DISPLAY "Hello World".
    STOP RUN.
- Late 1960s - 1970s: System Programming Languages
These languages were built for developing system software such as operating systems, linkers, compilers, and assemblers. They are portable, fast, and very accessible. An example is C, which was developed between 1969 and 1973 and has been used to write most operating system kernels.29
- 1980s: Object Oriented Languages
Object-oriented (OO) languages came about when developers saw the need to break code down into smaller pieces, which helps keep large programs under control. Simula and Smalltalk introduced these concepts, but C++ was the first widespread OO language. Bjarne Stroustrup began developing it in 1979 as "C with Classes", and it was renamed C++ in 1983.29
- 1990s: Scripting, Component-Based & Web-Based Languages
From the 2000s, newly created languages are either more specialized, e.g. ActionScript for web page animation, or refinements of previous languages. Some languages are also created by combining components of different languages to make programming easier.29
The increased use of higher-level programming languages has created a need for lower-level programming languages, also known as system programming languages. These languages provide facilities between the assembly level and the higher levels of programming: they can be used for tasks that require immediate or direct access to hardware, while still providing higher-level control structures and error checking2.
From the 1960s to the late 1970s, development brought the major language paradigms that are now in use, including:
- Array Programming – also known as vector or multi-dimensional programming, generalizes operations on scalars so that they apply transparently to vectors, matrices, and higher-dimensional arrays6. It was introduced by APL2. Array programming expresses broad concepts about data manipulation, and the resulting concision can be dramatic: it is possible to find array programming one-liners that would require a couple of pages or more of Java code6.
- Functional Programming - a declarative programming paradigm that treats the execution of code as the evaluation of mathematical functions, while avoiding changes to the program's state or to mutable data. Functional programming is done with expressions or declarations instead of statements. The output value of a function depends only on the arguments that are passed to it. This eliminates side effects, which makes it easier to understand and predict the behavior of a program 7. Lisp, implemented in 1958, was the first dynamically typed functional language, and APL later influenced functional programming as well. In 1978, ML built a polymorphic type system on top of Lisp, becoming the first statically typed functional programming language 2.
- Procedural Programming – derived from structured programming and based upon the concept of the procedure call. Procedures, which are also known as routines, subroutines, or more commonly functions (similar to those used in functional programming), are simply a series of steps to be computed. A procedure can be called at any point during a program's execution, by other procedures, or recursively by itself. The first major procedural programming languages appeared in the late 1950s and the 1960s 8, including Fortran, COBOL, and BASIC, and procedural programming was refined substantially by ALGOL 2. Pascal and C were published during the 1970s, and Ada was released in 1980. An example of a modern procedural language is Go, which was first published in 2009 8.
- Object-Oriented Programming (OOP) - based on the principle of defining “objects” through “classes”. As an analogy, a class is like the blueprint for a house, and the objects are the houses built from it. Objects may contain data in the form of fields, known as attributes, and code in the form of procedures, known as methods. Objects communicate with each other by sending messages to the services that their classes define; these services act like Application Programming Interfaces (APIs) that allow other objects to communicate with them in specific ways. In OOP, computer programs are created by making objects interact with one another 9. In the 1960s, Simula was the first programming language that supported OOP. Later, in the mid-1970s, Smalltalk was developed as the first purely object-oriented language 2.
- Logic Programming – a programming paradigm that is largely based on formal logic. A program written in a logic programming language is a set of statements in logical form, expressing facts and rules about a problem domain 11. Prolog was designed in 1972 as the first logic language 2. Other major logic programming language families include Answer Set Programming (ASP) and Datalog 11.
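To make three of these paradigms concrete, the sketch below computes the same result (the sum of the squares of a list of numbers) in a procedural, a functional, and an object-oriented style. It is an illustration written in Python, which supports all three styles; every name in it is hypothetical, and the array and logic paradigms are left out because they are better shown in languages built around them.

```python
from functools import reduce

numbers = [1, 2, 3, 4]


# Procedural style: a procedure that updates a running total step by step.
def sum_of_squares_procedural(values):
    total = 0
    for value in values:
        total += value * value
    return total


# Functional style: a single expression built from functions, with no mutable state.
def sum_of_squares_functional(values):
    return reduce(lambda acc, value: acc + value * value, values, 0)


# Object-oriented style: data (an attribute) and behavior (a method) live in a class.
class NumberList:
    def __init__(self, values):
        self.values = list(values)

    def sum_of_squares(self):
        return sum(value * value for value in self.values)


print(sum_of_squares_procedural(numbers))      # 30
print(sum_of_squares_functional(numbers))      # 30
print(NumberList(numbers).sum_of_squares())    # 30
```

All three versions return the same value; what differs is how the computation is organized.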
Also, the following disciplines were established during the same period:
- Programming Language Specification – ALGOL became a model for how later language specifications were written. 2
- System Programming – the activity of programming a computer’s operating system software. In contrast to application programming, which aims to produce software that provides services to the user directly, such as a word processor or a video game, systems programming aims to produce software and software platforms that provide services to other software, are performance constrained, or both. System programming requires a great deal of awareness of the computer’s hardware, so that it can make efficient use of available resources, either because the software itself is performance critical, or because even minor improvements in efficiency translate directly into significant monetary savings for the service provider 10. C, developed between 1969 and 1973 for the Unix operating system, is an example of a system programming language. It remains popular today 2.
These languages were followed by descendants. Nearly all modern programming languages count at least one of the above languages in their family tree 2.
During the 1960s and 1970s, there was considerable debate over structured programming and whether programming languages should be designed to support it. In a famous 1968 letter published in the Communications of the ACM, Edsger Dijkstra argued that GOTO statements should be eliminated from all high-level programming languages 2.
What Makes A Good Programming Language?
A programming language can be classified as good based on its mode of development, usability, and execution efficiency. A programming language is what you will use to write a computer program. A good programming language can lead you to the correct result quickly, naturally, and easily. A bad language might add so much complexity that you abandon the attempt and try another approach. Some of the criteria for a good programming language are highlighted below:
- Program Design - deciding how to implement the flow of logic and how the data are to be handled. A good programming language should have a proper structure and well-defined semantics that makes it easy to reason about what your program will do.
- Simplicity - being able to easily explain and communicate the syntax, structure and semantics to others.
- Correctness - making sure that the program is correct and will produce the expected results.
- Expressiveness - less is more, meaning that short code can still accomplish a great deal.
- Readability - programs written in the language are easy to read and comprehend. Spending time trying to understand a language could be discouraging.
- Security - we all want at some point to write secure programs. A good programming language should help enforce security.
- Modularity - a good programming language should allow you to separate your program into different modules that can interface with one another. Modular programming improves manageability and code reusability.
- Error Handling - being able to detect exceptional conditions at run time and respond to them, typically through a data structure that stores information about the condition.
- Documentation - a good programming language should have documentation that helps users know how to use every function.
How Programming Languages Are Made: (From a Software Perspective)
Just like human languages, each programming language is made up of tokens and a grammar. Tokens deal with the lexical part of a language, while grammars deal with the syntactic part.
Tokens are sets of symbols or strings with meanings, and together they build up the language. Tokens are usually defined as name-value pairs. A token name is a category of symbols, characters, or sets of characters (words); the symbols or characters themselves are the token values.
Some common token names and values are:
| Token Name | Token values |
| Comment token | //comment, /*comment*/ |
Generally, values that are assigned to the same token name should exhibit the same behavior. This means it would be awkward to have the if keyword as an operator. Operators should be symbols that operate on arguments and produce results, such as "+", "-", "/", and "*".
Tokens are covered in more detail in Chapter 10 of the textbook.
Grammars are sets of rules that define the syntax, or arrangement of tokens, that yields meaningful phrases in the programming language. In programming languages, grammars define aspects such as what makes an expression or a statement valid. Grammars are covered in more detail in Chapter 9 of the textbook.
In this section, we give a brief summary of how tokens and grammars are generated and compiled together.
Tokens are recognized by a lexical analyzer, which can be generated with a tool known as Lex. Lex reads a specification file and converts it to source code in the C programming language (.c file). The extension for lex files is .l. Using lex has several advantages, including that the generated analyzer is fast and that it handles errors. Lex files are divided into three sections using two percent signs (%%): the definition section for defining macros and importing headers, the rules section for associating regular expressions with C statements, and the C code section, which contains C code. 31 Lex is covered in more detail in Chapter 12 of the textbook.
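The job of a lexical analyzer can be sketched without Lex itself. The example below is a hand-written stand-in in Python, not Lex input and not the C code that Lex generates; the token names and the `tokenize` function are assumptions made for illustration. Like the rules section of a lex file, it pairs each token name with a regular expression.

```python
import re

# Hypothetical token names, each paired with the regular expression that matches it.
# COMMENT comes before OPERATOR so that "//" is not read as two division signs.
TOKEN_RULES = [
    ("COMMENT", r"//[^\n]*"),
    ("NUMBER", r"\d+"),
    ("IDENTIFIER", r"[A-Za-z_]\w*"),
    ("OPERATOR", r"[+\-*/=]"),
    ("SKIP", r"\s+"),
]
TOKEN_PATTERN = re.compile("|".join(f"(?P<{name}>{regex})" for name, regex in TOKEN_RULES))


def tokenize(source):
    """Return a list of (token name, token value) pairs for `source`."""
    tokens = []
    position = 0
    while position < len(source):
        match = TOKEN_PATTERN.match(source, position)
        if match is None:
            raise SyntaxError(f"unexpected character {source[position]!r}")
        if match.lastgroup != "SKIP":  # whitespace separates tokens but is not one
            tokens.append((match.lastgroup, match.group()))
        position = match.end()
    return tokens


print(tokenize("total = price * 2 // double it"))
```

Each pair in the result is a token name with its token value, for example `("NUMBER", "2")`.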
Once the tokens are defined, the parser for the grammar is generated using a tool called yacc (Yet Another Compiler-Compiler). The tokens produced by the lex-generated analyzer are passed to a parser (phrase analyzer) generated by yacc. The parser builds a tree of nodes, determines whether the sequence of tokens follows the rules of the grammar, and takes predefined actions, including reporting syntax errors when tokens don't match the rules. Yacc is covered in more detail in Chapter 12 of the textbook.
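What a parser does with a stream of tokens can also be sketched by hand. The example below is a small recursive-descent parser in Python for arithmetic expressions. This is a simpler technique than the table-driven parsers that yacc generates, and the grammar, class, and function names are assumptions made for illustration. The assumed grammar is: expression = term {("+" | "-") term}, term = factor {("*" | "/") factor}, factor = NUMBER | "(" expression ")".

```python
import re


def split_tokens(source):
    """Split an arithmetic expression into number, operator, and parenthesis tokens."""
    tokens = re.findall(r"\d+|[-+*/()]", source)
    if "".join(tokens) != source.replace(" ", ""):
        raise SyntaxError("unexpected character in input")
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.position = 0

    def peek(self):
        return self.tokens[self.position] if self.position < len(self.tokens) else None

    def advance(self):
        token = self.peek()
        self.position += 1
        return token

    def parse(self):
        tree = self.expression()
        if self.peek() is not None:  # leftover tokens break the grammar
            raise SyntaxError(f"unexpected token {self.peek()!r}")
        return tree

    def expression(self):
        node = self.term()
        while self.peek() in ("+", "-"):
            operator = self.advance()
            node = (operator, node, self.term())
        return node

    def term(self):
        node = self.factor()
        while self.peek() in ("*", "/"):
            operator = self.advance()
            node = (operator, node, self.factor())
        return node

    def factor(self):
        token = self.advance()
        if token is not None and token.isdigit():
            return int(token)
        if token == "(":
            node = self.expression()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {token!r}")


print(Parser(split_tokens("1 + 2 * 3")).parse())   # ('+', 1, ('*', 2, 3))
```

The nested tuples form the tree of nodes: multiplication sits deeper in the tree than addition because the grammar gives it higher precedence. A token sequence that breaks the rules, such as `1 + * 2`, raises a syntax error.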
The last part involves interpreting or compiling the language and determining runtime configuration.
Benefits of Programming Languages
Why Study Programming Languages?
Studying programming languages will increase your productivity and success at your job. You will learn to weigh the benefits and drawbacks of a language for the project at hand. By learning programming languages, you will bring the following assets to your team:11
- An increased capacity to express programming concepts. These concepts are present in the majority of computer programming languages and are fundamental to the programming process.
  - Algorithm - a set of instructions designed to perform a specific task. Algorithms are often created as functions that serve as small programs that can be referenced by a larger program12.
  - Array - a list of related values18.
  - Class - a set of instructions to build a specific type of object21.
  - Compiler - a software program that translates source code written in a high-level programming language into low-level object code (binary code) in machine language14.
  - Conditional - an expression that evaluates to either true or false to determine the flow through if and while statements17.
  - Datatype - a classification that tells what kind of data a value can hold15.
  - Function - a module of code that performs a specific task, usually taking in data, processing it, and returning a result20.
  - Loop - a control structure that repeats a statement or block of statements until a condition becomes false19.
  - Source Code - the set of instructions and statements written in a programming language. The source code will contain declarations, instructions, functions, loops and other statements, which act as instructions for the program13.
  - Variable - a descriptive name that represents a value, so that the name can be used independently of the particular value it holds16.
- An improved background for choosing appropriate languages. Understanding the complexity, flaws, documentation, ease and flexibility of use, and technical characteristics of different programming languages will allow you, as an employee, to advise your company on using the right amount of resources, whether that be time or money1b.
- An increased ability to learn new languages. As you saw above, programming languages change constantly and new kinds of languages keep appearing, so being able to move quickly to the language best suited to a project is essential.
- Overall advancement of computing 22.
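Several of the programming concepts listed above can be seen together in a few lines of source code. The following sketch is written in Python, and the temperature scenario and all names in it are hypothetical. Each comment points to the concept being used.

```python
class Thermometer:                       # class: a blueprint for building objects
    def __init__(self, readings):
        self.readings = readings         # array (a Python list) of related values

    def average(self):                   # function: takes in data, returns a result
        total = 0.0                      # variable holding a value of float datatype
        for reading in self.readings:    # loop: repeats once per reading
            total += reading
        return total / len(self.readings)


def describe(temperature):
    if temperature > 30:                 # conditional: evaluates to True or False
        return "hot"
    return "mild"


sensor = Thermometer([28.0, 31.5, 33.5])
print(sensor.average(), describe(sensor.average()))  # 31.0 hot
```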
Who Uses Programming Languages?
Manufacturers and engineers have traditionally used FORTRAN, short for Formula Translation. Released to the public in 1957, FORTRAN was a code translator designed to approximate human language, and it could guarantee reasonable compatibility between different computer systems24. In the 1970s, the C language matched FORTRAN's portability, and it is now one of the most common programming languages used in engineering. In mechanical engineering, C is commonly used for data acquisition and real-time robotic control. C is also used in more than 90% of desktop computer programs, from operating systems to word processors25.
Programming languages are also used in the natural sciences, specifically in chemistry and biology. A study published by Wiley-VCH showcases molecular informatics, the ability to store and process information using molecules. This computer-assisted strategy would be used in the fields of drug discovery and chemical biology, protein and nucleic acid engineering and design, the design of nanomolecular structures, strategies for modeling macromolecular assemblies, molecular networks and systems, and pharmaco- and chemogenomics 26.
Review Questions
1. Explain the difference between syntax and semantics.
2. Who is credited with the first computer programming language? In what year was it created?
3. How does machine language differ from assembly language?
4. What makes a good programming language?
5. What are tokens? How are they defined?
6. What are grammars?
7. Define source code. What does it contain?
8. What is logic programming?
Glossary
- Algorithm: a set of instructions designed to perform a specific task. Algorithms are often created as functions that serve as small programs that can be referenced by a larger program.
- Array: a list of related values.
- Array Programming: also known as vector or multi-dimensional programming; generalizes operations on scalars so that they apply transparently to vectors, matrices, and higher-dimensional arrays.
- Class: a set of instructions to build a specific type of object.
- Compiler: a software program that translates source code written in a high-level programming language into low-level object code (binary code) in machine language.
- Computer Programming Language: a formal language that expresses a set of instructions for a digital computer.
- Conditional: an expression that evaluates to either true or false to determine the flow through if and while statements.
- Datatype: a classification that tells what kind of data a value can hold.
- Formal Language: a language that abides by the rules of syntax and semantics.
- Function: a module of code that performs a specific task, usually taking in data, processing it, and returning a result.
- Functional Programming: a declarative programming paradigm that treats the execution of code as the evaluation of mathematical functions, while avoiding changes to the program’s state or to mutable data.
- Grammar: a set of rules that defines the syntax, or arrangement of tokens, that yields meaningful phrases in the programming language.
- Lexical Analyzer: a program that splits an input stream into tokens. Lex is a tool that generates lexical analyzers: it reads a specification file and converts it to source code in the C programming language (.c file).
- Logic Programming: a programming paradigm that is largely based on formal logic. A program written in a logic programming language is a set of statements in logical form, expressing facts and rules about a problem domain.
- Loop: a control structure that repeats a statement or block of statements until a condition becomes false.
- Natural Language: a language that people use in everyday conversation.
- Object-Oriented Programming (OOP): a paradigm based on the principle of defining “objects” through “classes”; a class is like the blueprint for a house, and the objects are the houses built from it.
- Paradigm: the style of building the structure and elements of a computer program.
- Procedural Programming: a paradigm derived from structured programming and based upon the concept of the procedure call.
- Programming Language Specification: a document that defines a programming language; ALGOL became a model for how later language specifications were written.
- Semantics: a precise set of rules that tell you the meanings of the symbols and legal expressions.
- Side Effects: observable changes to state outside a function, beyond the value it returns.
- Source Code: the set of instructions and statements written in a programming language. The source code will contain declarations, instructions, functions, loops and other statements, which act as instructions for the program.
- Syntax: a precise set of rules that determine the structure of statements, allowed symbols, and the combination of legal expressions.
- System Programming: the activity of programming a computer’s operating system software.
- TIOBE Index: an index that ranks the popularity of programming languages.
- Tokens: sets of symbols or strings with meanings that together build up the language.
- Turing Complete: describes a language that is able to compute any calculation that a programmable computer can.
- Variable: a descriptive name that represents a value, so that the name can be used independently of the particular value it holds.