4.1. The OSI model
Last updated: 1 February 2013.
This page discusses the OSI model, which is an important concept in network architecture. Some may argue that programmers do not need to fully understand network mechanisms, but I believe that certain concepts, such as the OSI model, are prerequisites for efficient network programming.
Networking is communication between computers, for example across the Internet. Rather than being a single unified network, the Internet is an aggregation of numerous subnetworks interconnected by routers. For example, a university or a company generally has its own subnetwork. Such a subnetwork is often called a LAN (local area network) or an intranet.
To communicate with each other, computers rely on a standardized layered model called the OSI (Open Systems Interconnection) model. The OSI model is composed of 7 layers. Each layer is a conceptual level that is implemented by hardware, by the operating system or by application software, rather than being a single program. The following table lists the 7 layers of the OSI model:
|Layer||Data unit||Description|
|7. Application||Data||Software runs at this level (browsers, instant messaging software, e-mail software and so forth).|
|6. Presentation||Data||Data formatting, encryption and decryption take place in this layer, which is also known as the syntax layer.|
|5. Session||Data||This layer allows a process to establish, manage and terminate a dialog with a remote process across the network.|
|4. Transport||Segment||This layer provides end-to-end communication. It divides the data into smaller pieces and offers flow control and optionally error recovery.|
|3. Network||Packet||This layer provides routing and logical addressing technologies that allow packets to be transferred across the network.|
|2. Data link||Frame||This layer offers physical addressing, transmission error detection, synchronization and flow control.|
|1. Physical||Bit||This layer transfers binary digits through copper cables, fiber-optic cables or radio links.|
Applications run at layer 7. For example, every time you use instant messaging software to send a message to another person across the Internet, the message is passed from layer 7 down to layer 6 on your computer. Before passing the message down, layer 7 adds a header and optionally a trailer to it. The header carries information defined by a protocol, which is a set of rules that govern communication between computers. When layer 6 receives the message, it adds its own header and passes the result to layer 5.
The process continues that way down to layer 1 on your computer, which conveys the data across the Internet to layer 1 on the remote computer. There the data travels back up the stack: each layer reads its own header and, if the header says that the data is destined for a higher layer (which is the case in this example), discards the header and passes the rest of the data (the payload) up to the next layer. Layer 2 thus processes the data and hands its payload to layer 3, and so on up to layer 7, as shown in the following figure:
At each layer on the receiver, the corresponding header is analyzed. If the header says that the payload is destined for the current layer, the payload is processed according to the rules specified by the protocol. If the payload is a request that requires a response, an appropriate response is sent back to the remote computer: the response is wrapped in a header and passed down to the lower layer, and so on down to layer 1, which conveys it to the remote computer's layer 1. Thus, two computers are physically connected through layer 1, whereas the other layers are only logically related to each other.
Corresponding layers on two computers use protocols to communicate with each other, and only corresponding layers can communicate with each other. For example, layer 4 on the sender can only communicate with layer 4 on the receiver, using a protocol that determines what to do with the payload.
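The wrapping and unwrapping described above can be illustrated with a toy sketch. It is only a simplification under stated assumptions: real headers are binary and protocol-specific, whereas here each layer prepends a short text tag, and only layers 7 down to 2 are modelled. The class name `EncapsulationSketch` and the tags `[L7]` to `[L2]` are hypothetical.

```java
// Toy model of encapsulation: every layer prepends a text tag as its "header".
// The tags and the class name are illustrative assumptions, not real protocol headers.
class EncapsulationSketch {

    // Layers in the order the sender traverses them, from the top of the stack down.
    static final String[] LAYERS = {"L7", "L6", "L5", "L4", "L3", "L2"};

    // Sender side: walk down the stack; each layer wraps what it received from above.
    static String encapsulate(String message) {
        String data = message;
        for (String layer : LAYERS) {
            data = "[" + layer + "]" + data;
        }
        return data;
    }

    // Receiver side: walk up the stack; each layer checks its header, then strips it.
    static String decapsulate(String data) {
        for (int i = LAYERS.length - 1; i >= 0; i--) {
            String header = "[" + LAYERS[i] + "]";
            if (!data.startsWith(header)) {
                throw new IllegalArgumentException("Missing header " + header);
            }
            data = data.substring(header.length());
        }
        return data;
    }

    public static void main(String[] args) {
        String onTheWire = encapsulate("hello");
        System.out.println(onTheWire);              // [L2][L3][L4][L5][L6][L7]hello
        System.out.println(decapsulate(onTheWire)); // hello
    }
}
```

Note that the outermost header belongs to the lowest layer, which is why the receiver can process the data from the bottom of the stack upwards.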
The OSI model itself is defined by the International Organization for Standardization (ISO), but most of the protocols used on the Internet are standardized by the Internet Engineering Task Force (IETF). Several well-known protocols are listed in the table below:
|Protocol||Layer||Description|
|IP||3||Internet Protocol: IP defines addresses and is used for routing packets between two computers via a multitude of routers throughout the Internet.|
|TCP||4||Transmission Control Protocol: TCP provides reliable and ordered transfer of data across the Internet. It is said to be reliable because if a packet is lost, it is retransmitted.|
|UDP||4||User Datagram Protocol: UDP provides unreliable and possibly unordered transfer of packets across the Internet. Packets may be lost or arrive out of order.|
|FTP||7||File Transfer Protocol: FTP relies on TCP to provide reliable file transfer from one computer to another across the Internet.|
|HTTP||7||Hypertext Transfer Protocol: HTTP relies on TCP and allows communication between a browser and a web server. Typically, the browser sends an HTTP request to the server in order to ask for a given HTML page and the web server sends the requested page to the browser.|
The transport protocols (TCP and UDP at layer 4) are the key to network programming. Generally, before writing a program involving the network, you have to choose a transport protocol: either TCP or UDP. If you can tolerate losing some data when sending or receiving packets, you can write a program that relies on UDP. You may also prefer UDP for performance reasons: UDP has less overhead than TCP because TCP establishes a connection, acknowledges received data and retransmits lost data, whereas UDP does none of this. If you can't afford to lose packets, write a program that relies on TCP.
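As a rough illustration of what relying on UDP looks like in Java, the sketch below sends a single datagram over the loopback interface and reads it back. The class and method names (`UdpLoopbackSketch`, `sendAndReceive`) are hypothetical, and the two-second timeout and 1024-byte buffer are arbitrary choices made for the example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

// Minimal UDP sketch: one datagram, no connection, no acknowledgement.
class UdpLoopbackSketch {

    // Sends text as a single datagram to a local receiver and returns what arrived.
    static String sendAndReceive(String text) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (DatagramSocket receiver = new DatagramSocket(0, loopback); // port 0: any free port
             DatagramSocket sender = new DatagramSocket()) {
            // UDP gives no delivery guarantee, so never wait forever for a packet.
            receiver.setSoTimeout(2000);

            byte[] payload = text.getBytes(StandardCharsets.UTF_8);
            sender.send(new DatagramPacket(payload, payload.length,
                                           loopback, receiver.getLocalPort()));

            byte[] buffer = new byte[1024];
            DatagramPacket received = new DatagramPacket(buffer, buffer.length);
            receiver.receive(received); // throws SocketTimeoutException if nothing arrives
            return new String(received.getData(), 0, received.getLength(),
                              StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sendAndReceive("ping"));
    }
}
```

On the loopback interface the datagram will normally arrive, but nothing in UDP itself guarantees it. A program that needs a guarantee has to add its own acknowledgements and retransmissions, or use TCP instead.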
Packets can be lost in many ways while being transferred across the Internet. For example, when a packet arrives at a router, it is queued in a buffer until the router has finished forwarding the preceding packets in the queue and can forward it to another router or to the final recipient. When there is a lot of traffic, a router may not be able to forward packets as fast as they arrive, and eventually its buffer fills up. In that case, the router discards packets. Routers often take preventive action by discarding packets once the queue grows beyond a given threshold.
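The queueing behaviour described above can be simulated with a bounded queue. The sketch below assumes the simplest possible policy, in which a packet is discarded when the buffer is already full. Real routers use more elaborate schemes, and the class `RouterQueueSketch` and its methods are hypothetical names chosen for the illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model of a router buffer that discards packets arriving while it is full.
class RouterQueueSketch {
    private final Deque<String> buffer = new ArrayDeque<>();
    private final int capacity;
    private int dropped = 0;

    RouterQueueSketch(int capacity) {
        this.capacity = capacity;
    }

    // Returns true if the packet was queued, false if it was discarded.
    boolean enqueue(String packet) {
        if (buffer.size() >= capacity) {
            dropped++;
            return false;
        }
        buffer.addLast(packet);
        return true;
    }

    // Forwards the oldest queued packet, or returns null if the buffer is empty.
    String forward() {
        return buffer.pollFirst();
    }

    int queuedCount() {
        return buffer.size();
    }

    int droppedCount() {
        return dropped;
    }

    public static void main(String[] args) {
        RouterQueueSketch router = new RouterQueueSketch(3);
        // A burst of 5 packets arrives before the router can forward anything.
        for (int i = 1; i <= 5; i++) {
            router.enqueue("p" + i);
        }
        System.out.println("queued=" + router.queuedCount()
                + " dropped=" + router.droppedCount()); // queued=3 dropped=2
        System.out.println("forwarded=" + router.forward()); // forwarded=p1
    }
}
```

With TCP, the sender would eventually notice that the two discarded packets were never acknowledged and retransmit them. With UDP, they are simply gone.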
The Java networking API operates at layer 4 and above in the OSI model (Transport, Session, Presentation and Application). Therefore, you can only do network programming in Java at layer 4 and higher. Typically, you write a program that relies on either TCP or UDP, and the operating system handles the lower-level protocols.
In fact, layers 5, 6 and 7 of the OSI model (Session, Presentation and Application) are often merged into a single layer called the Application layer, as shown below:
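To make this concrete, here is a minimal sketch of a TCP exchange using `java.net.Socket` and `java.net.ServerSocket`. The program only names an address and a port, while everything below the transport layer (IP, the data link, the physical medium) is left to the operating system. The class name `TcpEchoSketch`, the method `echoOnce` and the one-line echo protocol are assumptions made for the example, and both ends run in the same process over the loopback interface purely to keep the sketch self-contained.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Minimal TCP sketch: a server thread echoes one line back to a client.
class TcpEchoSketch {

    static BufferedReader reader(Socket s) throws IOException {
        return new BufferedReader(
                new InputStreamReader(s.getInputStream(), StandardCharsets.UTF_8));
    }

    static PrintWriter writer(Socket s) throws IOException {
        // autoFlush = true so that println() sends the line immediately
        return new PrintWriter(
                new OutputStreamWriter(s.getOutputStream(), StandardCharsets.UTF_8), true);
    }

    // Sends one line to a local echo server and returns the server's reply.
    static String echoOnce(String line) {
        // Port 0 asks the operating system for any free port.
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            Thread serverThread = new Thread(() -> {
                try (Socket peer = server.accept();
                     BufferedReader in = reader(peer);
                     PrintWriter out = writer(peer)) {
                    out.println(in.readLine()); // echo the line back unchanged
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            serverThread.start();

            try (Socket client = new Socket(server.getInetAddress(), server.getLocalPort());
                 PrintWriter out = writer(client);
                 BufferedReader in = reader(client)) {
                client.setSoTimeout(2000); // do not block forever if the server fails
                out.println(line);
                return in.readLine();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(echoOnce("hello"));
    }
}
```

Nothing in this code deals with routing, frames or retransmission: the program reads and writes a stream, and TCP together with the lower layers takes care of the rest.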
|Layer||Protocols|
|Application||FTP (File Transfer Protocol), HTTP (Hypertext Transfer Protocol), SMTP (Simple Mail Transfer Protocol), DHCP (Dynamic Host Configuration Protocol), SNMP (Simple Network Management Protocol), DNS (Domain Name System), NTP (Network Time Protocol), Telnet, ...|
|Transport||TCP (Transmission Control Protocol), UDP (User Datagram Protocol), ...|
|Network||IP (Internet Protocol), ICMP (Internet Control Message Protocol), OSPF (Open Shortest Path First), ...|
|Data Link||PPP (Point-to-Point Protocol), 802.3 (Ethernet), ...|
|Physical||DSL (Digital Subscriber Line), ...|
The model shown above is termed the TCP/IP model, after TCP (at layer 4) and IP (at layer 3), its two core protocols. Most often, network programming consists of writing code at the application layer of the TCP/IP model and choosing a transport protocol to rely on (either TCP or UDP).
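As an illustration of what writing code at the application layer means, the sketch below builds the text of a minimal HTTP/1.1 GET request. An application-layer protocol such as HTTP is essentially an agreement on the bytes that are handed to the transport layer. The class name `HttpRequestSketch`, the method `buildGet` and the host `example.com` are placeholders chosen for the example, and the request contains only the headers needed for a simple exchange.

```java
// Builds the text of a minimal HTTP/1.1 GET request.
// The names used here are illustrative; only the request format comes from HTTP itself.
class HttpRequestSketch {

    // HTTP/1.1 lines end with CRLF, and an empty line ends the header section.
    static String buildGet(String host, String path) {
        return "GET " + path + " HTTP/1.1\r\n"
             + "Host: " + host + "\r\n"
             + "Connection: close\r\n"
             + "\r\n";
    }

    public static void main(String[] args) {
        // In a real client, this string would be written to a TCP socket
        // connected to the web server (port 80 for plain HTTP).
        System.out.print(buildGet("example.com", "/index.html"));
    }
}
```

The program decides what to say (the request text) and which transport to use (TCP in the case of HTTP). How the bytes actually reach the server is the business of the layers below.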