Let's write a TCP/IP socket program. This post is easier to follow if you watch the video first and then read the text.

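Order of TCP/IP socket function calls

TCP/IP example overview

The TCP/IP example is explained in two parts, the server and the client.

  1. The server and the client both use port 4000.
  2. After connecting to the server, the client program sends the string it was given on the command line.
  3. When the server receives data from the client, it sends the string back to the client prefixed with its length.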

Server program

The functions the server program needs, in calling order, are described below.

First, create a socket. Note that TCP/IP uses SOCK_STREAM and UDP/IP uses SOCK_DGRAM. For more detail on socket(), see "Unix C Reference, Chapter 11, Section 7: Opening and closing sockets".

int server_socket;

server_socket = socket( PF_INET, SOCK_STREAM, 0);
if (-1 == server_socket)
{
printf( "failed to create server socket\n");
exit( 1) ;
}

Use bind() to assign the server socket the information it needs and register it with the kernel.

  1. The server_socket created above is only a socket descriptor.
  2. The socket needs an address and a port number assigned, and it has to be registered with the kernel.
  3. Only after it is registered with the kernel can it communicate with other systems.
  4. More precisely, the kernel can then use the socket to receive data from outside.
  5. A sockaddr_in structure is used to assign the address and port to the socket.

    struct sockaddr_in server_addr;

    memset( &server_addr, 0, sizeof( server_addr));
    server_addr.sin_family = AF_INET; // IPv4 Internet protocol
    server_addr.sin_port = htons( 4000); // use port 4000
    server_addr.sin_addr.s_addr = htonl( INADDR_ANY); // 32-bit IPv4 address

    if( -1 == bind( server_socket, (struct sockaddr*)&server_addr, sizeof( server_addr) ) )
    {
    printf( "bind() failed\n");
    exit( 1);
    }

  6. htonl( INADDR_ANY) specifies the address; inet_addr( "this system's IP") can be used instead. However, the IP differs from one system to the next, so it is more convenient to use htonl( INADDR_ANY) than to hard-code a fixed IP.
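The two addressing choices in item 6 can be sketched as one small helper. This is only an illustration: the name make_ipv4_addr and its NULL-means-any convention are assumptions, not part of the original example, and it uses inet_pton() instead of inet_addr() so that an invalid address string can be detected.

```c
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical helper (not in the original post): fills *addr for the
 * given port. ip == NULL means "any local address" (INADDR_ANY).
 * Returns 0 on success, -1 if ip is not a valid dotted-quad string. */
int make_ipv4_addr(struct sockaddr_in *addr, const char *ip, unsigned short port)
{
    memset(addr, 0, sizeof(*addr));
    addr->sin_family = AF_INET;
    addr->sin_port = htons(port);

    if (ip == NULL) {
        addr->sin_addr.s_addr = htonl(INADDR_ANY);
        return 0;
    }
    /* inet_pton() returns 1 only when the string was parsed successfully */
    return inet_pton(AF_INET, ip, &addr->sin_addr) == 1 ? 0 : -1;
}
```

A server would call make_ipv4_addr(&server_addr, NULL, 4000) before bind(), while a client would pass "127.0.0.1" before connect().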

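Now call listen() to put the socket into a state where it can receive client connection requests.

if( -1 == listen( server_socket, 5))
{
printf( "failed to enter listening state\n");
exit( 1);
}

  1. listen() does not wait for a client. It marks the socket as a listening socket and returns immediately; the second argument (5) is the backlog, a hint for how many pending connection requests may be queued.
  2. listen() returns -1 only when an error occurs.
  3. From this point on the kernel queues incoming connection requests. The call that blocks until a request arrives is accept().
  4. The next step is to accept a connection request.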
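A small sketch makes it easy to observe where the waiting actually happens: listen() returns right away once the socket is in listening state, and it is accept() that waits for a client. The helper name open_listener and the choice of the loopback address with port 0 (let the kernel pick a free port) are assumptions made for this illustration only.

```c
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical helper: creates a TCP socket bound to 127.0.0.1 on a
 * kernel-chosen port and puts it into listening state.
 * Returns the listening descriptor, or -1 on error.
 * Note that this function returns without any client being present. */
int open_listener(int backlog)
{
    struct sockaddr_in addr;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd == -1)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(0);                      /* 0: any free port */
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) == -1 ||
        listen(fd, backlog) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}
```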

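Accept the client's connection request with accept().

  1. When accept() accepts a connection request, the kernel automatically creates a new socket for communicating with that client.
  2. We will call this socket the client socket.
  3. Declare variables to receive the client socket information, and set the size of the client address.

    struct sockaddr_in client_addr;
    socklen_t client_addr_size;

    client_addr_size = sizeof( client_addr);

  4. accept() blocks until a connection request arrives. If it completes without error, it returns the client socket created by the kernel.

    client_socket = accept( server_socket, (struct sockaddr*)&client_addr,
    &client_addr_size);

    if ( -1 == client_socket)
    {
    printf( "failed to accept client connection\n");
    exit( 1);
    }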

Now that the client socket exists, read() and write() can be used to exchange data. Use read() to read the data sent by the client.

read ( client_socket, buff_rcv, BUFF_SIZE);

  1. read() reads the data sent by the client.
  2. If the client has not sent anything yet, read() waits until it does. In other words, it blocks.
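One detail the read() call above leaves open is that read() reports how many bytes actually arrived and does not add a terminating NUL by itself. A minimal sketch of a wrapper that checks the return value and terminates the buffer follows; the name read_string is hypothetical and not part of the original example.

```c
#include <unistd.h>
#include <sys/types.h>

/* Hypothetical helper: reads at most cap-1 bytes from fd into buf and
 * always NUL-terminates what was read.
 * Returns the number of bytes read (0 means the peer closed the
 * connection), or -1 on error or if cap is 0. */
ssize_t read_string(int fd, char *buf, size_t cap)
{
    ssize_t n;

    if (cap == 0)
        return -1;

    n = read(fd, buf, cap - 1);   /* blocks until data or EOF arrives */
    if (n < 0)
        return -1;

    buf[n] = '\0';
    return n;
}
```

With this helper the server could call read_string( client_socket, buff_rcv, sizeof( buff_rcv)) and safely pass buff_rcv to strlen() or printf() afterwards.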

Next, use write() to send data to the client.

  1. Get the length of the received data and prepare the data to send.

    sprintf( buff_snd, "%zu : %s", strlen( buff_rcv), buff_rcv);

  2. Use write() to send the data to the client.

    write( client_socket, buff_snd, strlen( buff_snd)+1); // +1: send the terminating NUL as well
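The reply format used above ("length : string") can also be built with a bounded snprintf(), so that a long input cannot overflow the send buffer. This is a sketch only; the helper name format_reply is an assumption for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes "<length> : <msg>" into out.
 * Returns the number of bytes to send, including the terminating NUL,
 * or -1 if the result does not fit into cap bytes. */
int format_reply(char *out, size_t cap, const char *msg)
{
    int n = snprintf(out, cap, "%zu : %s", strlen(msg), msg);

    if (n < 0 || (size_t)n >= cap)
        return -1;      /* encoding error or output truncated */
    return n + 1;       /* +1: the NUL is sent as well */
}
```

The return value can be passed directly as the length argument of write().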

When the work is done, call close() to destroy the client socket and end the communication.

close( client_socket);

Client program

The client program is simpler than the server, so let's go straight to the explanation.

First create a socket with socket().

int client_socket;

client_socket = socket( PF_INET, SOCK_STREAM, 0);
if( -1 == client_socket)
{
printf( "failed to create socket\n");
exit( 1);
}

Use connect() to try to connect to the server.

  1. Set the server's address and port number in the address structure, and
  2. try to connect to the server.
  3. The example uses 127.0.0.1, the IP address that refers to the local system itself.

    struct sockaddr_in server_addr;

    memset( &server_addr, 0, sizeof( server_addr));
    server_addr.sin_family = AF_INET;
    server_addr.sin_port = htons( 4000);
    server_addr.sin_addr.s_addr= inet_addr( "127.0.0.1"); // server address

    if( -1 == connect( client_socket, (struct sockaddr*)&server_addr, sizeof( server_addr) ) )
    {
    printf( "connection failed\n");
    exit( 1);
    }

  1. Once the connection succeeds, send the data.

    write( client_socket, argv[1], strlen( argv[1])+1); // +1: send the terminating NUL as well

  2. Receive the reply and print it.

    read ( client_socket, buff, BUFF_SIZE);
    printf( "%s\n", buff);

  3. Destroy the socket to finish the communication.

    close( client_socket);

Server program source

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <sys/types.h>
#include <sys/socket.h>

#define  BUFF_SIZE   1024

int   main( void)
{
   int         server_socket;
   int         client_socket;
   socklen_t   client_addr_size;
   ssize_t     rcv_len;

   struct sockaddr_in   server_addr;
   struct sockaddr_in   client_addr;

   char   buff_rcv[BUFF_SIZE+5];
   char   buff_snd[BUFF_SIZE+32];                                    // room for the length prefix

   server_socket  = socket( PF_INET, SOCK_STREAM, 0);
   if( -1 == server_socket)
   {
      printf( "failed to create server socket\n");
      exit( 1);
   }

   memset( &server_addr, 0, sizeof( server_addr));
   server_addr.sin_family     = AF_INET;
   server_addr.sin_port       = htons( 4000);
   server_addr.sin_addr.s_addr= htonl( INADDR_ANY);

   if( -1 == bind( server_socket, (struct sockaddr*)&server_addr, sizeof( server_addr) ) )
   {
      printf( "bind() failed\n");
      exit( 1);
   }

   // listen() only needs to be called once, before the accept loop
   if( -1 == listen(server_socket, 5))
   {
      printf( "failed to enter listening state\n");
      exit( 1);
   }

   while( 1)
   {
      client_addr_size  = sizeof( client_addr);
      client_socket     = accept( server_socket, (struct sockaddr*)&client_addr, &client_addr_size);

      if ( -1 == client_socket)
      {
         printf( "failed to accept client connection\n");
         exit( 1);
      }

      rcv_len = read ( client_socket, buff_rcv, BUFF_SIZE);
      if( rcv_len < 0)
      {
         printf( "read() failed\n");
         close( client_socket);
         continue;
      }
      buff_rcv[rcv_len] = '\0';                                      // make sure the string is terminated
      printf( "receive: %s\n", buff_rcv);

      snprintf( buff_snd, sizeof( buff_snd), "%zu : %s", strlen( buff_rcv), buff_rcv);
      write( client_socket, buff_snd, strlen( buff_snd)+1);          // +1: send the terminating NUL as well
      close( client_socket);
   }
}

Client program source

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <sys/types.h>
#include <sys/socket.h>

#define  BUFF_SIZE   1024

int   main( int argc, char **argv)
{
   int       client_socket;
   ssize_t   rcv_len;

   struct sockaddr_in   server_addr;

   char   buff[BUFF_SIZE+5];

   if( argc < 2)                                            // the string to send is required
   {
      printf( "usage: %s <string>\n", argv[0]);
      exit( 1);
   }

   client_socket  = socket( PF_INET, SOCK_STREAM, 0);
   if( -1 == client_socket)
   {
      printf( "failed to create socket\n");
      exit( 1);
   }

   memset( &server_addr, 0, sizeof( server_addr));
   server_addr.sin_family     = AF_INET;
   server_addr.sin_port       = htons( 4000);
   server_addr.sin_addr.s_addr= inet_addr( "127.0.0.1");

   if( -1 == connect( client_socket, (struct sockaddr*)&server_addr, sizeof( server_addr) ) )
   {
      printf( "connection failed\n");
      exit( 1);
   }
   write( client_socket, argv[1], strlen( argv[1])+1);      // +1: send the terminating NUL as well
   rcv_len = read ( client_socket, buff, BUFF_SIZE);
   if( rcv_len < 0)
      rcv_len = 0;                                          // treat a failed read as an empty reply
   buff[rcv_len] = '\0';
   printf( "%s\n", buff);
   close( client_socket);
   
   return 0;
}

Reference: http://smeffect.tistory.com/entry/01-네트워크-프로그래밍-TCPIP-소켓-프로그래밍

Reference: http://nenunena.tistory.com/60

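Timeouts can be handled with select() in every case. Note, however, that connect() blocks completely when the response is slow, so set the socket to non-blocking mode beforehand and clear that mode again at the end.
int status;

socket(); // open the socket used for the connection
fcntl(); // set it to non-blocking
status = connect();

The return value at this point differs between platforms. Usually connect() returns a value less than 0 with errno set to EINPROGRESS; in that case enter a loop and call select().
while()
select(); // check the socket for writability here
connect();

I wrote this from memory, but in my experience this much works fine if you only target Linux. If you plan to port to other platforms, the error returns have to be handled in more detail. I made it work on LINUX/AIX/HPUX/SOLARIS, and each was slightly different.

If you have time, how about looking at the Mosaic source? I think it would be good to adapt it a little. Just a moment, let me find the address. Found it.

How about modifying the part below a little and using it?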

http://archive.ncsa.uiuc.edu/SDG/Software/XMosaic/

PUBLIC int HTDoConnect (char *url, char *protocol, int default_port, int *s)
{
  struct sockaddr_in soc_address;
  struct sockaddr_in *sin = &soc_address;
  int status;

  /* Set up defaults: */
  sin->sin_family = AF_INET;
  sin->sin_port = htons(default_port);
  
  /* Get node name and optional port number: */
  {
    char line[256];
    char *p1 = HTParse(url, "", PARSE_HOST);
    int status;

    sprintf (line, "Looking up %s.", p1);
    HTProgress (line);

    status = HTParseInet(sin, p1);
    if (status) 
      {
        sprintf (line, "Unable to locate remote host %s.", p1);
        HTProgress(line);
        free (p1);
        return HT_NO_DATA;
      }

    sprintf (line, "Making %s connection to %s.", protocol, p1);
    HTProgress (line);
    free (p1);
  }

  /* Now, let's get a socket set up from the server for the data: */      
  *s = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);

#ifdef SOCKS
  /* SOCKS can't yet deal with non-blocking connect request */
  HTClearActiveIcon();
  status = Rconnect(*s, (struct sockaddr*)&soc_address, sizeof(soc_address));
  if ((status == 0) && (strcmp(protocol, "FTP") == 0))
     SOCKS_ftpsrv.s_addr = soc_address.sin_addr.s_addr;
  {
    int intr;
    intr = HTCheckActiveIcon(1);
    if (intr)
      {
        if (TRACE)
          fprintf (stderr, "*** INTERRUPTED in middle of connect.\n");
        status = HT_INTERRUPTED;
        errno = EINTR;
      }
  }
  return status;
#else /* SOCKS not defined */


  /*
   * Make the socket non-blocking, so the connect can be canceled.
   * This means that when we issue the connect we should NOT
   * have to wait for the accept on the other end.
   */
  {
    int ret;
    int val = 1;
    char line[256];
    
    ret = ioctl(*s, FIONBIO, &val);
    if (ret == -1)
      {
        sprintf (line, "Could not make connection non-blocking.");
        HTProgress(line);
      }
  }
  HTClearActiveIcon();

  /*
   * Issue the connect.  Since the server can't do an instantaneous accept
   * and we are non-blocking, this will almost certainly return a negative
   * status.
   */
  status = connect(*s, (struct sockaddr*)&soc_address, sizeof(soc_address));

  /*
   * According to the Sun man page for connect:
   *     EINPROGRESS         The socket is non-blocking and the  con-
   *                         nection cannot be completed immediately.
   *                         It is possible to select(2) for  comple-
   *                         tion  by  selecting the socket for writ-
   *                         ing.
   * According to the Motorola SVR4 man page for connect:
   *     EAGAIN              The socket is non-blocking and the  con-
   *                         nection cannot be completed immediately.
   *                         It is possible to select for  completion
   *                         by  selecting  the  socket  for writing.
   *                         However, this is only  possible  if  the
   *                         socket  STREAMS  module  is  the topmost
   *                         module on  the  protocol  stack  with  a
   *                         write  service  procedure.  This will be
   *                         the normal case.
   */
#ifdef SVR4
  if ((status < 0) && ((errno == EINPROGRESS)||(errno == EAGAIN)))
#else
  if ((status < 0) && (errno == EINPROGRESS))
#endif /* SVR4 */
    {
      struct timeval timeout;
      int ret;

      ret = 0;
      while (ret <= 0)
	{
          fd_set writefds;
          int intr;
          
          FD_ZERO(&writefds);
          FD_SET(*s, &writefds);

	  /* linux (and some other os's, I think) clear timeout... 
	     let's reset it every time. bjs */
	  timeout.tv_sec = 0;
	  timeout.tv_usec = 100000;

#ifdef __hpux
          ret = select(FD_SETSIZE, NULL, (int *)&writefds, NULL, &timeout);
#else
          ret = select(FD_SETSIZE, NULL, &writefds, NULL, &timeout);
#endif
	  /*
	   * Again according to the Sun and Motorola man pages for connect:
           *     EALREADY            The socket is non-blocking and a  previ-
           *                         ous  connection attempt has not yet been
           *                         completed.
           * Thus if the errno is NOT EALREADY we have a real error, and
	   * should break out here and return that error.
           * Otherwise if it is EALREADY keep on trying to complete the
	   * connection.
	   */
          if ((ret < 0)&&(errno != EALREADY))
            {
              status = ret;
              break;
            }
          else if (ret > 0)
            {
	      /*
	       * Extra check here for connection success, if we try to connect
	       * again, and get EISCONN, it means we have a successful
	       * connection.
	       */
              status = connect(*s, (struct sockaddr*)&soc_address,
                               sizeof(soc_address));
              if ((status < 0)&&(errno == EISCONN))
                {
                  status = 0;
                }
              break;
            }
	  /*
	   * The select says we aren't ready yet.
	   * Try to connect again to make sure.  If we don't get EALREADY
	   * or EISCONN, something has gone wrong.  Break out and report it.
	   * For some reason SVR4 returns EAGAIN here instead of EALREADY,
	   * even though the man page says it should be EALREADY.
	   */
          else
            {
              status = connect(*s, (struct sockaddr*)&soc_address,
                               sizeof(soc_address));
#ifdef SVR4
              if ((status < 0)&&(errno != EALREADY)&&(errno != EAGAIN)&&
			(errno != EISCONN))
#else
              if ((status < 0)&&(errno != EALREADY)&&(errno != EISCONN))
#endif /* SVR4 */
                {
                  break;
                }
            }
          intr = HTCheckActiveIcon(1);
          if (intr)
            {
              if (TRACE)
                fprintf (stderr, "*** INTERRUPTED in middle of connect.\n");
              status = HT_INTERRUPTED;
              errno = EINTR;
              break;
            }
	}
    }

  /*
   * Make the socket blocking again on good connect
   */
  if (status >= 0)
    {
      int ret;
      int val = 0;
      char line[256];
      
      ret = ioctl(*s, FIONBIO, &val);
      if (ret == -1)
	{
          sprintf (line, "Could not restore socket to blocking.");
          HTProgress(line);
	}
    }
  /*
   * Else the connect attempt failed or was interrupted.
   * so close up the socket.
   */
  else
    {
	close(*s);
    }

  return status;
#endif /* #ifdef SOCKS */
}

Thank you for the answer... I looked around and found a few more references... ^^

I found more material, so I am posting one more source.
It is from UNIX Network Programming.
I tested both and both work well...

int connect_nonb(int sockfd, char *strIP)
{
	int len=0, status, flags;
	struct sockaddr_in address;
	fd_set rset, wset;
	struct timeval tval;

	address.sin_family = AF_INET;
	address.sin_addr.s_addr = inet_addr(strIP);
	address.sin_port = htons(SMTP_PORT);

	len = sizeof(address);

#ifdef TEST
	printf("Connect IP:%s\n", strIP);
#endif

	// Put the socket into non-blocking mode.
	flags = fcntl(sockfd, F_GETFL, 0);
	if (fcntl(sockfd, F_SETFL, flags | O_NONBLOCK) < 0) {
#ifdef TEST
		switch (errno) {
			case EACCES: fprintf(stderr,"EACCES\n"); break;
			case EAGAIN: fprintf(stderr,"EAGAIN\n"); break;
			case EBADF:  fprintf(stderr,"EBADF\n"); break;
			case EDEADLK:fprintf(stderr,"EDEADLK\n"); break;
			case EFAULT: fprintf(stderr,"EFAULT\n"); break;
			case EINTR:  fprintf(stderr,"EINTR\n"); break;
			case EINVAL: fprintf(stderr,"EINVAL\n"); break;
			case EMFILE: fprintf(stderr,"EMFILE\n"); break;
			case ENOLCK: fprintf(stderr,"ENOLCK\n"); break;
			case EPERM:  fprintf(stderr,"EPERM\n"); break;
		}
#endif
		return -1;
	}

	status = connect(sockfd, (struct sockaddr *)&address, len);

	/* EINPROGRESS is expected on a non-blocking socket; any other error is a real failure. */
	if ((status < 0) && (errno != EINPROGRESS))
	{
#ifdef TEST
		fprintf(stderr, "errno: %d\n", errno);

		switch (errno) {
			case EBADF:		fprintf(stderr,"EBADF\n"); break;
			case EFAULT:            fprintf(stderr,"EFAULT\n"); break;
			case ENOTSOCK:          fprintf(stderr,"ENOTSOCK\n"); break;
			case EISCONN:           fprintf(stderr,"EISCONN\n"); break;
			case ECONNREFUSED:      fprintf(stderr,"ECONNREFUSED\n"); break;
			case ETIMEDOUT:         fprintf(stderr,"ETIMEDOUT\n"); break;
			case ENETUNREACH:       fprintf(stderr,"ENETUNREACH\n"); break;
			case EADDRINUSE:        fprintf(stderr,"EADDRINUSE\n"); break;
			case EINPROGRESS:       fprintf(stderr,"EINPROGRESS\n"); break;
			case EALREADY:          fprintf(stderr,"EALREADY\n"); break;
			case EAGAIN:            fprintf(stderr,"EAGAIN\n"); break;
			case EAFNOSUPPORT:      fprintf(stderr,"EAFNOSUPPORT\n"); break;
			case EACCES:            fprintf(stderr,"EACCES\n"); break;
			case EPERM:             fprintf(stderr,"EPERM\n"); break;
			default:		fprintf(stderr,"unexpected errno\n"); break;
		}
#endif
		return -1;
	}

	if (status == 0) {
		errno = 0;	/* connected immediately; clear any stale errno before the check at done: */
		goto done;
	}

	FD_ZERO (&rset);
	FD_SET (sockfd, &rset);
	wset = rset;

	tval.tv_sec = TIMEOUT;
	tval.tv_usec = 0;

	if ((status = select(sockfd + 1, &rset, &wset, NULL, TIMEOUT ? & tval : NULL)) == 0) {
		close(sockfd);
		errno = ETIMEDOUT;
		return (-1);
	}

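	if (FD_ISSET(sockfd, &rset) || FD_ISSET(sockfd, &wset)) {
		int error = 0;
		socklen_t errlen = sizeof(error);
		/* Fetch the pending connect result into a local variable, not errno itself. */
		if (getsockopt(sockfd, SOL_SOCKET, SO_ERROR, &error, &errlen) < 0)
			return (-1);
		errno = error;	/* 0 if the connection succeeded */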
	} else {
		fprintf(stderr, "SELECT error: sockfd not set");
		return (-1);
	}


done:
	fcntl(sockfd, F_SETFL, flags);

	if (errno) {
		close(sockfd);
		return (-1);
	}

	return (0);
} 

It is simple if you use the alarm signal...

signal(SIGALRM, socket_timeout);
alarm(1);
state = connect(sockfd, (struct sockaddr *)&address, len); 
if (state < 0) {
	alarm(0);
	return(-1);
}
alarm(0);

....
After that, all that is left is to implement the socket_timeout function.

This makes it simple to implement~~~ Hope it helps~

Oh, this is nice. Can the same thing be applied to recv()? I think it can, but... hmm.

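Putting a timer on connect() feels hard because connect() is a blocking function.
To get around this, set the socket to non-blocking mode before waiting for the connection, and then call connect(). connect() then returns immediately, whether or not the connection has been made.
After that, call select().
select() returns when the non-blocking socket becomes connected or when the timeout expires.
The return value of select() tells you whether the connection was made or the timeout expired.

So the order is:
1. Set the socket to non-blocking
2. Call connect() --- returns immediately
3. Wait in select() with a timeout
4. Clear the socket's non-blocking mode

Something like that.

All of this is in the books, but in the hope of corrections from more experienced people,
here is the source I wrote.

The part you are asking about is probably connect_nonb().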

 int tcp_connect_timeo(const char *hostname, const char *service,int nsec)
 {
     struct addrinfo hints, *res, *ressave;
     int  sock,n;

     bzero(&hints, sizeof(struct addrinfo));
     hints.ai_family = AF_UNSPEC;
     hints.ai_socktype = SOCK_STREAM;

     if( (n=getaddrinfo(hostname,service,&hints,&res)) != 0)
         return -1;
     ressave = res;
     do
     {   
         struct  sockaddr_in *ts;
         sock = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
         if(sock < 0)
             continue;

         ts = (struct sockaddr_in *) res->ai_addr;

         if(connect_nonb(sock, (struct sockaddr *)res->ai_addr, res->ai_addrlen,nsec) == 0)
             break;
         close(sock);
     }while( (res=res->ai_next) !=NULL);
     if( res == NULL)
     {
         freeaddrinfo(ressave);   /* release the address list on failure as well */
         return -1;
     }
     freeaddrinfo(ressave);
     return sock;
 }


int connect_nonb(int sockfd, const struct sockaddr *saptr, int salen, int nsec)
{
    int             flags, n, error;
    socklen_t       len;
    fd_set          rset, wset;
    struct timeval  tval;

    flags = fcntl(sockfd, F_GETFL, 0);
    fcntl(sockfd, F_SETFL, flags | O_NONBLOCK);

    error = 0;
    if ( (n = connect(sockfd, (struct sockaddr *) saptr, salen)) < 0)
        if (errno != EINPROGRESS)
            return(-1);

    /* Do whatever we want while the connect is taking place. */

    if (n == 0)
        goto done;  /* connect completed immediately */

    FD_ZERO(&rset);
    FD_SET(sockfd, &rset);
    wset = rset;
    tval.tv_sec = nsec;
    tval.tv_usec = 0;

    if ( (n = select(sockfd+1, &rset, &wset, NULL,
                     nsec ? &tval : NULL)) == 0) {
        close(sockfd);      /* timeout */
        errno = ETIMEDOUT;
        return(-1);
    }

    if (FD_ISSET(sockfd, &rset) || FD_ISSET(sockfd, &wset)) {
        len = sizeof(error);
        if (getsockopt(sockfd, SOL_SOCKET, SO_ERROR, &error, &len) < 0)
            return(-1);         /* Solaris pending error */
    } else
        err_quit("select error: sockfd not set");

done:
    fcntl(sockfd, F_SETFL, flags);  /* restore file status flags */
    if (error) {
        close(sockfd);      /* just in case */
        errno = error;
        return(-1);
    }
    return(0);
}

It's time for web servers to handle ten thousand clients simultaneously, don't you think? After all, the web is a big place now.

And computers are big, too. You can buy a 1000MHz machine with 2 gigabytes of RAM and a 1000Mbit/sec Ethernet card for $1200 or so. Let's see - at 20000 clients, that's 50KHz, 100Kbytes, and 50Kbits/sec per client. It shouldn't take any more horsepower than that to take four kilobytes from the disk and send them to the network once a second for each of twenty thousand clients. (That works out to $0.06 per client, by the way. Those $100/client licensing fees some operating systems charge are starting to look a little heavy!) So hardware is no longer the bottleneck.
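The per-client figures in the paragraph above are simple divisions, and a few lines of C make them easy to re-run for other client counts. This is only a worked illustration; the struct and function names are made up here, and the machine figures (1000 MHz, 2 GB, 1000 Mbit/sec, $1200) are the ones quoted in the text. With the $1200 price, the cost per client at 20000 clients comes out at about $0.06.

```c
/* Hypothetical names, for illustration only. */
struct per_client {
    double hz;            /* CPU cycles per second per client */
    double bytes;         /* RAM per client */
    double bits_per_sec;  /* network bandwidth per client */
    double dollars;       /* hardware cost per client */
};

/* Divides the machine described in the text evenly among `clients`. */
struct per_client split_machine(double clients)
{
    struct per_client p;

    p.hz           = 1000e6 / clients;   /* 1000 MHz CPU        */
    p.bytes        = 2e9    / clients;   /* 2 gigabytes of RAM  */
    p.bits_per_sec = 1000e6 / clients;   /* 1000 Mbit/sec card  */
    p.dollars      = 1200.0 / clients;   /* $1200 or so         */
    return p;
}
```

For 20000 clients this gives 50 kHz, 100 KB, and 50 kbit/sec per client.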

In 1999 one of the busiest ftp sites, cdrom.com, actually handled 10000 clients simultaneously through a Gigabit Ethernet pipe. As of 2001, that same speed is now being offered by several ISPs, who expect it to become increasingly popular with large business customers.

And the thin client model of computing appears to be coming back in style -- this time with the server out on the Internet, serving thousands of clients.

With that in mind, here are a few notes on how to configure operating systems and write code to support thousands of clients. The discussion centers around Unix-like operating systems, as that's my personal area of interest, but Windows is also covered a bit.

Related Sites

See Nick Black's excellent Fast UNIX Servers page for a circa-2009 look at the situation.

In October 2003, Felix von Leitner put together an excellent web page and presentation about network scalability, complete with benchmarks comparing various networking system calls and operating systems. One of his observations is that the 2.6 Linux kernel really does beat the 2.4 kernel, but there are many, many good graphs that will give the OS developers food for thought for some time. (See also the Slashdot comments; it'll be interesting to see whether anyone does followup benchmarks improving on Felix's results.)

Book to Read First

If you haven't read it already, go out and get a copy of Unix Network Programming : Networking Apis: Sockets and Xti (Volume 1) by the late W. Richard Stevens. It describes many of the I/O strategies and pitfalls related to writing high-performance servers. It even talks about the 'thundering herd' problem. And while you're at it, go read Jeff Darcy's notes on high-performance server design.

(Another book which might be more helpful for those who are *using* rather than *writing* a web server is Building Scalable Web Sites by Cal Henderson.)

I/O frameworks

Prepackaged libraries are available that abstract some of the techniques presented below, insulating your code from the operating system and making it more portable.

  • ACE, a heavyweight C++ I/O framework, contains object-oriented implementations of some of these I/O strategies and many other useful things. In particular, its Reactor is an OO way of doing nonblocking I/O, and Proactor is an OO way of doing asynchronous I/O.
  • ASIO is a C++ I/O framework which is becoming part of the Boost library. It's like ACE updated for the STL era.
  • libevent is a lightweight C I/O framework by Niels Provos. It supports kqueue and select, and soon will support poll and epoll. It's level-triggered only, I think, which has both good and bad sides. Niels has a nice graph of time to handle one event as a function of the number of connections. It shows kqueue and sys_epoll as clear winners.
  • My own attempts at lightweight frameworks (sadly, not kept up to date):
    • Poller is a lightweight C++ I/O framework that implements a level-triggered readiness API using whatever underlying readiness API you want (poll, select, /dev/poll, kqueue, or sigio). It's useful for benchmarks that compare the performance of the various APIs. This document links to Poller subclasses below to illustrate how each of the readiness APIs can be used.
    • rn is a lightweight C I/O framework that was my second try after Poller. It's lgpl (so it's easier to use in commercial apps) and C (so it's easier to use in non-C++ apps). It was used in some commercial products.
  • Matt Welsh wrote a paper in April 2000 about how to balance the use of worker thread and event-driven techniques when building scalable servers. The paper describes part of his Sandstorm I/O framework.
  • Cory Nelson's Scale! library - an async socket, file, and pipe I/O library for Windows

I/O Strategies

Designers of networking software have many options. Here are a few:
  • Whether and how to issue multiple I/O calls from a single thread
    • Don't; use blocking/synchronous calls throughout, and possibly use multiple threads or processes to achieve concurrency
    • Use nonblocking calls (e.g. write() on a socket set to O_NONBLOCK) to start I/O, and readiness notification (e.g. poll() or /dev/poll) to know when it's OK to start the next I/O on that channel. Generally only usable with network I/O, not disk I/O.
    • Use asynchronous calls (e.g. aio_write()) to start I/O, and completion notification (e.g. signals or completion ports) to know when the I/O finishes. Good for both network and disk I/O.
  • How to control the code servicing each client
    • one process for each client (classic Unix approach, used since 1980 or so)
    • one OS-level thread handles many clients; each client is controlled by:
      • a user-level thread (e.g. GNU state threads, classic Java with green threads)
      • a state machine (a bit esoteric, but popular in some circles; my favorite)
      • a continuation (a bit esoteric, but popular in some circles)
    • one OS-level thread for each client (e.g. classic Java with native threads)
    • one OS-level thread for each active client (e.g. Tomcat with apache front end; NT completion ports; thread pools)
  • Whether to use standard O/S services, or put some code into the kernel (e.g. in a custom driver, kernel module, or VxD)

The following five combinations seem to be popular:

  1. Serve many clients with each thread, and use nonblocking I/O and level-triggered readiness notification
  2. Serve many clients with each thread, and use nonblocking I/O and readiness change notification
  3. Serve many clients with each server thread, and use asynchronous I/O
  4. serve one client with each server thread, and use blocking I/O
  5. Build the server code into the kernel

1. Serve many clients with each thread, and use nonblocking I/O and level-triggered readiness notification

... set nonblocking mode on all network handles, and use select() or poll() to tell which network handle has data waiting. This is the traditional favorite. With this scheme, the kernel tells you whether a file descriptor is ready, whether or not you've done anything with that file descriptor since the last time the kernel told you about it. (The name 'level triggered' comes from computer hardware design; it's the opposite of 'edge triggered'. Jonathan Lemon introduced the terms in his BSDCON 2000 paper on kqueue().)

Note: it's particularly important to remember that readiness notification from the kernel is only a hint; the file descriptor might not be ready anymore when you try to read from it. That's why it's important to use nonblocking mode when using readiness notification.
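As a sketch of what the note above means in code: put the descriptor into nonblocking mode, and treat EAGAIN/EWOULDBLOCK from read() as "the hint was stale, try again after the next notification" rather than as an error. The helper names set_nonblocking and try_read, and the -2 return convention, are assumptions for this illustration.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/types.h>

/* Hypothetical helper: adds O_NONBLOCK to the descriptor's status flags.
 * Returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);

    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/* Hypothetical helper: one read attempt on a nonblocking descriptor.
 * Returns bytes read (> 0), 0 on end of file, -1 on a real error,
 * and -2 when the descriptor simply was not ready. */
ssize_t try_read(int fd, void *buf, size_t len)
{
    ssize_t n = read(fd, buf, len);

    if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return -2;    /* not ready after all: go back to waiting */
    return n;
}
```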

An important bottleneck in this method is that read() or sendfile() from disk blocks if the page is not in core at the moment; setting nonblocking mode on a disk file handle has no effect. Same thing goes for memory-mapped disk files. The first time a server needs disk I/O, its process blocks, all clients must wait, and that raw nonthreaded performance goes to waste. 
This is what asynchronous I/O is for, but on systems that lack AIO, worker threads or processes that do the disk I/O can also get around this bottleneck. One approach is to use memory-mapped files, and if mincore() indicates I/O is needed, ask a worker to do the I/O, and continue handling network traffic. Jef Poskanzer mentions that Pai, Druschel, and Zwaenepoel's 1999 Flash web server uses this trick; they gave a talk at Usenix '99 on it. It looks like mincore() is available in BSD-derived Unixes like FreeBSD and Solaris, but is not part of the Single Unix Specification. It's available as part of Linux as of kernel 2.3.51, thanks to Chuck Lever.

But in November 2003 on the freebsd-hackers list, Vivek Pai et al reported very good results using system-wide profiling of their Flash web server to attack bottlenecks. One bottleneck they found was mincore (guess that wasn't such a good idea after all). Another was the fact that sendfile blocks on disk access; they improved performance by introducing a modified sendfile() that returns something like EWOULDBLOCK when the disk page it's fetching is not yet in core. (Not sure how you tell the user the page is now resident... seems to me what's really needed here is aio_sendfile().) The end result of their optimizations is a SpecWeb99 score of about 800 on a 1GHz/1GB FreeBSD box, which is better than anything on file at spec.org.

There are several ways for a single thread to tell which of a set of nonblocking sockets are ready for I/O:

  • The traditional select() 
    Unfortunately, select() is limited to FD_SETSIZE handles. This limit is compiled in to the standard library and user programs. (Some versions of the C library let you raise this limit at user app compile time.)

    See Poller_select (cc, h) for an example of how to use select() interchangeably with other readiness notification schemes.

  • The traditional poll() 
    There is no hardcoded limit to the number of file descriptors poll() can handle, but it does get slow at about a few thousand, since most of the file descriptors are idle at any one time, and scanning through thousands of file descriptors takes time.

    Some OS's (e.g. Solaris 8) speed up poll() et al by use of techniques like poll hinting, which was implemented and benchmarked by Niels Provos for Linux in 1999.

    See Poller_poll (cc, h, benchmarks) for an example of how to use poll() interchangeably with other readiness notification schemes.

  • /dev/poll
    This is the recommended poll replacement for Solaris.

    The idea behind /dev/poll is to take advantage of the fact that often poll() is called many times with the same arguments. With /dev/poll, you get an open handle to /dev/poll, and tell the OS just once what files you're interested in by writing to that handle; from then on, you just read the set of currently ready file descriptors from that handle.

    It appeared quietly in Solaris 7 (see patchid 106541) but its first public appearance was in Solaris 8; according to Sun, at 750 clients, this has 10% of the overhead of poll().

    Various implementations of /dev/poll were tried on Linux, but none of them perform as well as epoll, and were never really completed. /dev/poll use on Linux is not recommended.

    See Poller_devpoll (cc, h, benchmarks) for an example of how to use /dev/poll interchangeably with many other readiness notification schemes. (Caution - the example is for Linux /dev/poll, might not work right on Solaris.)

  • kqueue()
    This is the recommended poll replacement for FreeBSD (and, soon, NetBSD).

    See below. kqueue() can specify either edge triggering or level triggering.
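To make the poll() entry in the list above concrete, here is a minimal sketch of a level-triggered readiness check over a small set of descriptors. It is not taken from the Poller classes mentioned above; the helper name wait_readable and the fixed MAX_WATCH limit are assumptions chosen to keep the example short.

```c
#include <poll.h>

#define MAX_WATCH 64   /* illustration only; poll() itself has no such limit */

/* Hypothetical helper: waits up to timeout_ms for any of the nfds
 * descriptors to become readable. ready[i] is set to 1 if fds[i] has
 * data (or a hangup) pending, else 0.
 * Returns the value of poll(): the number of descriptors with events,
 * 0 on timeout, or -1 on error or if nfds is out of range. */
int wait_readable(const int *fds, int nfds, int *ready, int timeout_ms)
{
    struct pollfd pfds[MAX_WATCH];
    int i, n;

    if (nfds < 0 || nfds > MAX_WATCH)
        return -1;

    for (i = 0; i < nfds; i++) {
        pfds[i].fd = fds[i];
        pfds[i].events = POLLIN;
        pfds[i].revents = 0;
    }

    n = poll(pfds, (nfds_t)nfds, timeout_ms);
    if (n < 0)
        return -1;

    for (i = 0; i < nfds; i++)
        ready[i] = (pfds[i].revents & (POLLIN | POLLHUP)) != 0;
    return n;
}
```

Because this is level-triggered, calling wait_readable() again without reading from a ready descriptor reports that descriptor as ready again.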

2. Serve many clients with each thread, and use nonblocking I/O and readiness change notification

Readiness change notification (or edge-triggered readiness notification) means you give the kernel a file descriptor, and later, when that descriptor transitions from not ready to ready, the kernel notifies you somehow. It then assumes you know the file descriptor is ready, and will not send any more readiness notifications of that type for that file descriptor until you do something that causes the file descriptor to no longer be ready (e.g. until you receive the EWOULDBLOCK error on a send, recv, or accept call, or a send or recv transfers less than the requested number of bytes).

When you use readiness change notification, you must be prepared for spurious events, since one common implementation is to signal readiness whenever any packets are received, regardless of whether the file descriptor was already ready.

This is the opposite of "level-triggered" readiness notification. It's a bit less forgiving of programming mistakes, since if you miss just one event, the connection that event was for gets stuck forever. Nevertheless, I have found that edge-triggered readiness notification made programming nonblocking clients with OpenSSL easier, so it's worth trying.
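
One practical consequence of the edge-triggered rule is that, after a notification, you keep reading until the call reports EWOULDBLOCK (EAGAIN), because no further notification arrives while data is still pending. A minimal sketch in C, assuming the descriptor is in nonblocking mode; the helper names set_nonblocking and drain_fd are made up for illustration, and the sketch is deliberately independent of any particular notification API:

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: switch fd to nonblocking mode. Returns 0 or -1. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/*
 * Hypothetical helper: call after an edge-triggered "readable" event.
 * Reads until the kernel reports that the descriptor is no longer ready.
 * Returns the number of bytes consumed, or -1 on a real error.
 * Stores 1 in *eof if the peer closed the connection, else 0.
 */
long drain_fd(int fd, int *eof)
{
    char buf[4096];
    long total = 0;

    assert(eof != NULL);
    *eof = 0;
    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);
        if (n > 0) {
            total += n;          /* a real server would process buf here */
            continue;
        }
        if (n == 0) {            /* end of file: peer closed */
            *eof = 1;
            return total;
        }
        if (errno == EINTR)
            continue;            /* interrupted by a signal: retry */
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            return total;        /* drained: safe to wait for the next edge */
        return -1;
    }
}
```

Stopping after a single read() would be fine with level-triggered notification, but with edge-triggered notification it can leave unread data behind with no further event to wake the program up.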

[Banga, Mogul, Druschel '99] described this kind of scheme in 1999.

There are several APIs which let the application retrieve 'file descriptor became ready' notifications:

3. Serve many clients with each server thread, and use asynchronous I/O

This has not yet become popular in Unix, probably because few operating systems support asynchronous I/O, and possibly also because it (like nonblocking I/O) requires rethinking your application. Under standard Unix, asynchronous I/O is provided by the aio_ interface, which associates a signal and value with each I/O operation. Signals and their values are queued and delivered efficiently to the user process. This is from the POSIX 1003.1b realtime extensions, and is also in the Single Unix Specification, version 2.

AIO is normally used with edge-triggered completion notification, i.e. a signal is queued when the operation is complete. (It can also be used with level triggered completion notification by calling aio_suspend(), though I suspect few people do this.)

glibc 2.1 and later provide a generic implementation written for standards compliance rather than performance.

Ben LaHaise's implementation for Linux AIO was merged into the main Linux kernel as of 2.5.32. It doesn't use kernel threads, and has a very efficient underlying API, but (as of 2.6.0-test2) doesn't yet support sockets. (There is also an AIO patch for the 2.4 kernels, but the 2.5/2.6 implementation is somewhat different.) More info:

Suparna also suggests having a look at the DAFS API's approach to AIO.

Red Hat AS and Suse SLES both provide a high-performance implementation on the 2.4 kernel; it is related to, but not completely identical to, the 2.6 kernel implementation.

In February 2006, a new attempt is being made to provide network AIO; see the note above about Evgeniy Polyakov's kevent-based AIO.

In 1999, SGI implemented high-speed AIO for Linux. As of version 1.1, it's said to work well with both disk I/O and sockets. It seems to use kernel threads. It is still useful for people who can't wait for Ben's AIO to support sockets.

The O'Reilly book POSIX.4: Programming for the Real World is said to include a good introduction to aio.

A tutorial for the earlier, nonstandard, aio implementation on Solaris is online at Sunsite. It's probably worth a look, but keep in mind you'll need to mentally convert "aioread" to "aio_read", etc.

Note that AIO doesn't provide a way to open files without blocking for disk I/O; if you care about the sleep caused by opening a disk file, Linus suggests you should simply do the open() in a different thread rather than wishing for an aio_open() system call.

Under Windows, asynchronous I/O is associated with the terms "Overlapped I/O" and IOCP or "I/O Completion Port". Microsoft's IOCP combines techniques from the prior art like asynchronous I/O (like aio_write) and queued completion notification (like when using the aio_sigevent field with aio_write) with a new idea of holding back some requests to try to keep the number of running threads associated with a single IOCP constant. For more information, see Inside I/O Completion Ports by Mark Russinovich at sysinternals.com, Jeffrey Richter's book "Programming Server-Side Applications for Microsoft Windows 2000" (Amazon, MSPress), U.S. patent #06223207, or MSDN.

4. Serve one client with each server thread

... and let read() and write() block. Has the disadvantage of using a whole stack frame for each client, which costs memory. Many OS's also have trouble handling more than a few hundred threads. If each thread gets a 2MB stack (not an uncommon default value), you run out of *virtual memory* at (2^30 / 2^21) = 512 threads on a 32 bit machine with 1GB user-accessible VM (like, say, Linux as normally shipped on x86). You can work around this by giving each thread a smaller stack, but since most thread libraries don't allow growing thread stacks once created, doing this means designing your program to minimize stack use. You can also work around this by moving to a 64 bit processor.
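
The arithmetic above can be written down as a tiny sketch. The function name max_threads_by_vm is made up for illustration, and the bound is optimistic: it ignores the address space that code, heap, and shared libraries also need.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Upper bound on the number of threads imposed by address space alone:
 * every thread needs stack_bytes of virtual memory for its stack.
 */
uint64_t max_threads_by_vm(uint64_t vm_bytes, uint64_t stack_bytes)
{
    assert(stack_bytes > 0);
    return vm_bytes / stack_bytes;
}
```

With the figures from the text, max_threads_by_vm(1 GiB, 2 MiB) gives 512; shrinking each stack to 64 KiB raises the bound to 16384 for the same 1 GiB.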

The thread support in Linux, FreeBSD, and Solaris is improving, and 64 bit processors are just around the corner even for mainstream users. Perhaps in the not-too-distant future, those who prefer using one thread per client will be able to use that paradigm even for 10000 clients. Nevertheless, at the current time, if you actually want to support that many clients, you're probably better off using some other paradigm.

For an unabashedly pro-thread viewpoint, see Why Events Are A Bad Idea (for High-concurrency Servers) by von Behren, Condit, and Brewer, UCB, presented at HotOS IX. Anyone from the anti-thread camp care to point out a paper that rebuts this one? :-)

LinuxThreads

LinuxThreads is the name for the standard Linux thread library. It has been integrated into glibc since glibc2.0, and is mostly Posix-compliant, but with less than stellar performance and signal support.

NGPT: Next Generation Posix Threads for Linux

NGPT is a project started by IBM to bring good Posix-compliant thread support to Linux. It's at stable version 2.2 now, and works well... but the NGPT team has announced that they are putting the NGPT codebase into support-only mode because they feel it's "the best way to support the community for the long term". The NGPT team will continue working to improve Linux thread support, but now focused on improving NPTL. (Kudos to the NGPT team for their good work and the graceful way they conceded to NPTL.)

NPTL: Native Posix Thread Library for Linux

NPTL is a project by Ulrich Drepper (the benevolent dict^H^H^H^Hmaintainer of glibc) and Ingo Molnar to bring world-class Posix threading support to Linux.

As of 5 October 2003, NPTL is now merged into the glibc cvs tree as an add-on directory (just like linuxthreads), so it will almost certainly be released along with the next release of glibc.

The first major distribution to include an early snapshot of NPTL was Red Hat 9. (This was a bit inconvenient for some users, but somebody had to break the ice...)

NPTL links:

Here's my try at describing the history of NPTL (see also Jerry Cooperstein's article):

In March 2002, Bill Abt of the NGPT team, the glibc maintainer Ulrich Drepper, and others met to figure out what to do about LinuxThreads. One idea that came out of the meeting was to improve mutex performance; Rusty Russell et al subsequently implemented fast userspace mutexes (futexes), which are now used by both NGPT and NPTL. Most of the attendees figured NGPT should be merged into glibc.

Ulrich Drepper, though, didn't like NGPT, and figured he could do better. (For those who have ever tried to contribute a patch to glibc, this may not come as a big surprise :-) Over the next few months, Ulrich Drepper, Ingo Molnar, and others contributed glibc and kernel changes that make up something called the Native Posix Threads Library (NPTL). NPTL uses all the kernel enhancements designed for NGPT, and takes advantage of a few new ones. Ingo Molnar described the kernel enhancements as follows:

While NPTL uses the three kernel features introduced by NGPT: getpid() returns PID, CLONE_THREAD and futexes; NPTL also uses (and relies on) a much wider set of new kernel features, developed as part of this project.

Some of the items NGPT introduced into the kernel around 2.5.8 got modified, cleaned up and extended, such as thread group handling (CLONE_THREAD). [the CLONE_THREAD changes which impacted NGPT's compatibility got synced with the NGPT folks, to make sure NGPT does not break in any unacceptable way.]

The kernel features developed for and used by NPTL are described in the design whitepaper, http://people.redhat.com/drepper/nptl-design.pdf ...

A short list: TLS support, various clone extensions (CLONE_SETTLS, CLONE_SETTID, CLONE_CLEARTID), POSIX thread-signal handling, sys_exit() extension (release TID futex upon VM-release), the sys_exit_group() system-call, sys_execve() enhancements and support for detached threads.

There was also work put into extending the PID space - eg. procfs crashed due to 64K PID assumptions, max_pid, and pid allocation scalability work. Plus a number of performance-only improvements were done as well.

In essence the new features are a no-compromises approach to 1:1 threading - the kernel now helps in everything where it can improve threading, and we precisely do the minimally necessary set of context switches and kernel calls for every basic threading primitive.

One big difference between the two is that NPTL is a 1:1 threading model, whereas NGPT is an M:N threading model (see below). In spite of this, Ulrich's initial benchmarks seem to show that NPTL is indeed much faster than NGPT. (The NGPT team is looking forward to seeing Ulrich's benchmark code to verify the result.)

FreeBSD threading support

FreeBSD supports both LinuxThreads and a userspace threading library. Also, a M:N implementation called KSE was introduced in FreeBSD 5.0. For one overview, see www.unobvious.com/bsd/freebsd-threads.html.

On 25 Mar 2003, Jeff Roberson posted on freebsd-arch:

... Thanks to the foundation provided by Julian, David Xu, Mini, Dan Eischen, and everyone else who has participated with KSE and libpthread development Mini and I have developed a 1:1 threading implementation. This code works in parallel with KSE and does not break it in any way. It actually helps bring M:N threading closer by testing out shared bits. ...
And in July 2006, Robert Watson proposed that the 1:1 threading implementation become the default in FreeBSD 7.x:
I know this has been discussed in the past, but I figured with 7.x trundling forward, it was time to think about it again. In benchmarks for many common applications and scenarios, libthr demonstrates significantly better performance over libpthread... libthr is also implemented across a larger number of our platforms, and is already libpthread on several. The first recommendation we make to MySQL and other heavy thread users is "Switch to libthr", which is suggestive, also! ... So the strawman proposal is: make libthr the default threading library on 7.x.

NetBSD threading support

According to a note from Noriyuki Soda:
Kernel supported M:N thread library based on the Scheduler Activations model is merged into NetBSD-current on Jan 18 2003.
For details, see An Implementation of Scheduler Activations on the NetBSD Operating System by Nathan J. Williams, Wasabi Systems, Inc., presented at FREENIX '02.

Solaris threading support

The thread support in Solaris is evolving... from Solaris 2 to Solaris 8, the default threading library used an M:N model, but Solaris 9 defaults to 1:1 model thread support. See Sun's multithreaded programming guide and Sun's note about Java and Solaris threading.

Java threading support in JDK 1.3.x and earlier

As is well known, Java up to JDK1.3.x did not support any method of handling network connections other than one thread per client. Volanomark is a good microbenchmark which measures throughput in messages per second at various numbers of simultaneous connections. As of May 2003, JDK 1.3 implementations from various vendors are in fact able to handle ten thousand simultaneous connections, albeit with significant performance degradation. See Table 4 for an idea of which JVMs can handle 10000 connections, and how performance suffers as the number of connections increases.

Note: 1:1 threading vs. M:N threading

There is a choice when implementing a threading library: you can either put all the threading support in the kernel (this is called the 1:1 threading model), or you can move a fair bit of it into userspace (this is called the M:N threading model). At one point, M:N was thought to be higher performance, but it's so complex that it's hard to get right, and most people are moving away from it.

5. Build the server code into the kernel

Novell and Microsoft are both said to have done this at various times, at least one NFS implementation does this, khttpd does this for Linux and static web pages, and "TUX" (Threaded linUX webserver) is a blindingly fast and flexible kernel-space HTTP server by Ingo Molnar for Linux. Ingo's September 1, 2000 announcement says an alpha version of TUX can be downloaded from ftp://ftp.redhat.com/pub/redhat/tux, and explains how to join a mailing list for more info. 
The linux-kernel list has been discussing the pros and cons of this approach, and the consensus seems to be that instead of moving web servers into the kernel, the kernel should have the smallest possible hooks added to improve web server performance. That way, other kinds of servers can benefit. See e.g. Zach Brown's remarks about userland vs. kernel http servers. It appears that the 2.4 linux kernel provides sufficient power to user programs, as the X15 server runs about as fast as TUX, but doesn't use any kernel modifications.

Comments

Richard Gooch has written a paper discussing I/O options.

In 2001, Tim Brecht and Michal Ostrowski measured various strategies for simple select-based servers. Their data is worth a look.

In 2003, Tim Brecht posted source code for userver, a small web server put together from several servers written by Abhishek Chandra, David Mosberger, David Pariag, and Michal Ostrowski. It can use select(), poll(), epoll(), or sigio.

Back in March 1999, Dean Gaudet posted:

I keep getting asked "why don't you guys use a select/event based model like Zeus? It's clearly the fastest." ...
His reasons boiled down to "it's really hard, and the payoff isn't clear". Within a few months, though, it became clear that people were willing to work on it.

Mark Russinovich wrote an editorial and an article discussing I/O strategy issues in the 2.2 Linux kernel. Worth reading, even if he seems misinformed on some points. In particular, he seems to think that Linux 2.2's asynchronous I/O (see F_SETSIG above) doesn't notify the user process when data is ready, only when new connections arrive. This seems like a bizarre misunderstanding. See also comments on an earlier draft, Ingo Molnar's rebuttal of 30 April 1999, Russinovich's comments of 2 May 1999, a rebuttal from Alan Cox, and various posts to linux-kernel. I suspect he was trying to say that Linux doesn't support asynchronous disk I/O, which used to be true, but now that SGI has implemented KAIO, it's not so true anymore.

See these pages at sysinternals.com and MSDN for information on "completion ports", which he said were unique to NT; in a nutshell, win32's "overlapped I/O" turned out to be too low level to be convenient, and a "completion port" is a wrapper that provides a queue of completion events, plus scheduling magic that tries to keep the number of running threads constant by allowing more threads to pick up completion events if other threads that had picked up completion events from this port are sleeping (perhaps doing blocking I/O).

See also OS/400's support for I/O completion ports.

There was an interesting discussion on linux-kernel in September 1999 titled "> 15,000 Simultaneous Connections" (and the second week of the thread). Highlights:

  • Ed Hall posted a few notes on his experiences; he's achieved >1000 connects/second on a UP P2/333 running Solaris. His code used a small pool of threads (1 or 2 per CPU) each managing a large number of clients using "an event-based model".
  • Mike Jagdis posted an analysis of poll/select overhead, and said "The current select/poll implementation can be improved significantly, especially in the blocking case, but the overhead will still increase with the number of descriptors because select/poll does not, and cannot, remember what descriptors are interesting. This would be easy to fix with a new API. Suggestions are welcome..."
  • Mike posted about his work on improving select() and poll().
  • Mike posted a bit about a possible API to replace poll()/select(): "How about a 'device like' API where you write 'pollfd like' structs, the 'device' listens for events and delivers 'pollfd like' structs representing them when you read it? ... "
  • Rogier Wolff suggested using "the API that the digital guys suggested", http://www.cs.rice.edu/~gaurav/papers/usenix99.ps
  • Joerg Pommnitz pointed out that any new API along these lines should be able to wait for not just file descriptor events, but also signals and maybe SYSV-IPC. Our synchronization primitives should certainly be able to do what Win32's WaitForMultipleObjects can, at least.
  • Stephen Tweedie asserted that the combination of F_SETSIG, queued realtime signals, and sigwaitinfo() was a superset of the API proposed in http://www.cs.rice.edu/~gaurav/papers/usenix99.ps. He also mentions that you keep the signal blocked at all times if you're interested in performance; instead of the signal being delivered asynchronously, the process grabs the next one from the queue with sigwaitinfo().
  • Jayson Nordwick compared completion ports with the F_SETSIG synchronous event model, and concluded they're pretty similar.
  • Alan Cox noted that an older rev of SCT's SIGIO patch is included in 2.3.18ac.
  • Jordan Mendelson posted some example code showing how to use F_SETSIG.
  • Stephen C. Tweedie continued the comparison of completion ports and F_SETSIG, and noted: "With a signal dequeuing mechanism, your application is going to get signals destined for various library components if libraries are using the same mechanism," but the library can set up its own signal handler, so this shouldn't affect the program (much).
  • Doug Royer noted that he'd gotten 100,000 connections on Solaris 2.6 while he was working on the Sun calendar server. Others chimed in with estimates of how much RAM that would require on Linux, and what bottlenecks would be hit.

Interesting reading!

Limits on open filehandles

  • Any Unix: the limits set by ulimit or setrlimit.
  • Solaris: see the Solaris FAQ, question 3.46 (or thereabouts; they renumber the questions periodically).
  • FreeBSD:

    Edit /boot/loader.conf, add the line
    set kern.maxfiles=XXXX
    where XXXX is the desired system limit on file descriptors, and reboot. Thanks to an anonymous reader, who wrote in to say he'd achieved far more than 10000 connections on FreeBSD 4.3, and says
    "FWIW: You can't actually tune the maximum number of connections in FreeBSD trivially, via sysctl.... You have to do it in the /boot/loader.conf file. 
    The reason for this is that the zalloci() calls for initializing the sockets and tcpcb structures zones occurs very early in system startup, in order that the zone be both type stable and that it be swappable. 
    You will also need to set the number of mbufs much higher, since you will (on an unmodified kernel) chew up one mbuf per connection for tcptempl structures, which are used to implement keepalive."
    Another reader says
    "As of FreeBSD 4.4, the tcptempl structure is no longer allocated; you no longer have to worry about one mbuf being chewed up per connection."
    See also:
  • OpenBSD: A reader says
    "In OpenBSD, an additional tweak is required to increase the number of open filehandles available per process: the openfiles-cur parameter in /etc/login.conf needs to be increased. You can change kern.maxfiles either with sysctl -w or in sysctl.conf but it has no effect. This matters because as shipped, the login.conf limits are a quite low 64 for nonprivileged processes, 128 for privileged."
  • Linux: See Bodo Bauer's /proc documentation. On 2.4 kernels:
    echo 32768 > /proc/sys/fs/file-max
    
    increases the system limit on open files, and
    ulimit -n 32768
    increases the current process' limit.

    On 2.2.x kernels,

    echo 32768 > /proc/sys/fs/file-max
    echo 65536 > /proc/sys/fs/inode-max
    
    increases the system limit on open files, and
    ulimit -n 32768
    increases the current process' limit.

    I verified that a process on Red Hat 6.0 (2.2.5 or so plus patches) can open at least 31000 file descriptors this way. Another fellow has verified that a process on 2.2.12 can open at least 90000 file descriptors this way (with appropriate limits). The upper bound seems to be available memory. 
    Stephen C. Tweedie posted about how to set ulimit limits globally or per-user at boot time using initscript and pam_limit. 
    In older 2.2 kernels, though, the number of open files per process is still limited to 1024, even with the above changes. 
    See also Oskar's 1998 post, which talks about the per-process and system-wide limits on file descriptors in the 2.0.36 kernel.
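
The per-process limit that "ulimit -n" adjusts from the shell can also be raised from inside the program with setrlimit(). A minimal sketch in C, with a hypothetical helper name raise_fd_limit; it only raises the soft limit up to the existing hard limit, which an unprivileged process is allowed to do, and it does not touch the system-wide limit:

```c
#include <assert.h>
#include <stddef.h>
#include <sys/resource.h>

/*
 * Hypothetical helper: raise this process's soft limit on open file
 * descriptors to its hard limit. On success, stores the resulting soft
 * limit in *out and returns 0; returns -1 on failure.
 */
int raise_fd_limit(rlim_t *out)
{
    struct rlimit rl;

    assert(out != NULL);
    if (getrlimit(RLIMIT_NOFILE, &rl) == -1)
        return -1;
    rl.rlim_cur = rl.rlim_max;   /* soft limit may go as high as the hard limit */
    if (setrlimit(RLIMIT_NOFILE, &rl) == -1)
        return -1;
    *out = rl.rlim_cur;
    return 0;
}
```

Raising the hard limit itself normally requires privilege, so a server that needs more descriptors than the hard limit allows still depends on the administrator-level settings described above.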

Limits on threads

On any architecture, you may need to reduce the amount of stack space allocated for each thread to avoid running out of virtual memory. If you're using pthreads, you can set this at runtime by calling pthread_attr_setstacksize() on an attribute object initialized with pthread_attr_init().
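
A minimal sketch of starting a thread with a reduced stack, assuming POSIX threads. The names worker and start_small_stack_thread are made up for illustration, and the right stack size depends entirely on how much stack your per-client code really uses:

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Placeholder for the per-client work; here it just returns its argument. */
static void *worker(void *arg)
{
    return arg;
}

/*
 * Hypothetical helper: start one thread with an explicit stack size.
 * Returns 0 on success or a pthreads error number on failure.
 */
int start_small_stack_thread(pthread_t *tid, size_t stack_bytes, void *arg)
{
    pthread_attr_t attr;
    int rc;

    assert(tid != NULL);
    rc = pthread_attr_init(&attr);
    if (rc != 0)
        return rc;
    rc = pthread_attr_setstacksize(&attr, stack_bytes);
    if (rc == 0)
        rc = pthread_create(tid, &attr, worker, arg);
    pthread_attr_destroy(&attr);     /* the attribute object is no longer needed */
    return rc;
}
```

For example, start_small_stack_thread(&tid, 64 * 1024, client) asks for a 64 KiB stack. Sizes below the system's minimum are rejected by pthread_attr_setstacksize(), and on some systems the program has to be built with -pthread.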

  • Solaris: it supports as many threads as will fit in memory, I hear.
  • Linux 2.6 kernels with NPTL: /proc/sys/vm/max_map_count may need to be increased to go above 32000 or so threads. (You'll need to use very small stack threads to get anywhere near that number of threads, though, unless you're on a 64 bit processor.) See the NPTL mailing list, e.g. the thread with subject "Cannot create more than 32K threads?", for more info.
  • Linux 2.4: /proc/sys/kernel/threads-max is the max number of threads; it defaults to 2047 on my Red Hat 8 system. You can increase this as usual by echoing new values into that file, e.g. "echo 4000 > /proc/sys/kernel/threads-max"
  • Linux 2.2: Even the 2.2.13 kernel limits the number of threads, at least on Intel. I don't know what the limits are on other architectures. Mingo posted a patch for 2.1.131 on Intel that removed this limit. It appears to be integrated into 2.3.20.

    See also Volano's detailed instructions for raising file, thread, and FD_SET limits in the 2.2 kernel. Wow. This document steps you through a lot of stuff that would be hard to figure out yourself, but is somewhat dated.

  • Java: See Volano's detailed benchmark info, plus their info on how to tune various systems to handle lots of threads.

Java issues

Up through JDK 1.3, Java's standard networking libraries mostly offered the one-thread-per-client model. There was a way to do nonblocking reads, but no way to do nonblocking writes.

In May 2001, JDK 1.4 introduced the package java.nio to provide full support for nonblocking I/O (and some other goodies). See the release notes for some caveats. Try it out and give Sun feedback!

HP's java also includes a Thread Polling API.

In 2000, Matt Welsh implemented nonblocking sockets for Java; his performance benchmarks show that they have advantages over blocking sockets in servers handling many (up to 10000) connections. His class library is called java-nbio; it's part of the Sandstorm project. Benchmarks showing performance with 10000 connections are available.

See also Dean Gaudet's essay on the subject of Java, network I/O, and threads, and the paper by Matt Welsh on events vs. worker threads.

Before NIO, there were several proposals for improving Java's networking APIs:

  • Matt Welsh's Jaguar system proposes preserialized objects, new Java bytecodes, and memory management changes to allow the use of asynchronous I/O with Java.
  • Interfacing Java to the Virtual Interface Architecture, by C-C. Chang and T. von Eicken, proposes memory management changes to allow the use of asynchronous I/O with Java.
  • JSR-51 was the Sun project that came up with the java.nio package. Matt Welsh participated (who says Sun doesn't listen?).

Other tips

  • Zero-Copy
    Normally, data gets copied many times on its way from here to there. Any scheme that eliminates these copies to the bare physical minimum is called "zero-copy".
    • Thomas Ogrisegg's zero-copy send patch for mmaped files under Linux 2.4.17-2.4.20. Claims it's faster than sendfile().
    • IO-Lite is a proposal for a set of I/O primitives that gets rid of the need for many copies.
    • Back in 1999, Alan Cox noted that zero-copy is sometimes not worth the trouble. (He did like sendfile(), though.)
    • Ingo implemented a form of zero-copy TCP in the 2.4 kernel for TUX 1.0 in July 2000, and says he'll make it available to userspace soon.
    • Drew Gallatin and Robert Picco have added some zero-copy features to FreeBSD; the idea seems to be that if you call write() or read() on a socket, the pointer is page-aligned, and the amount of data transferred is at least a page, *and* you don't immediately reuse the buffer, memory management tricks will be used to avoid copies. But see followups to this message on linux-kernel for people's misgivings about the speed of those memory management tricks.

      According to a note from Noriyuki Soda:

      Sending side zero-copy is supported since NetBSD-1.6 release by specifying "SOSEND_LOAN" kernel option. This option is now default on NetBSD-current (you can disable this feature by specifying "SOSEND_NO_LOAN" in the kernel option on NetBSD_current). With this feature, zero-copy is automatically enabled, if data more than 4096 bytes are specified as data to be sent.
    • The sendfile() system call can implement zero-copy networking.
      The sendfile() function in Linux and FreeBSD lets you tell the kernel to send part or all of a file. This lets the OS do it as efficiently as possible. It can be used equally well in servers using threads or servers using nonblocking I/O. (In Linux, it's poorly documented at the moment; use _syscall4 to call it. Andi Kleen is writing new man pages that cover this. See also Exploring The sendfile System Call by Jeff Tranter in Linux Gazette issue 91.) Rumor has it, ftp.cdrom.com benefitted noticeably from sendfile().

      A zero-copy implementation of sendfile() is on its way for the 2.4 kernel. See LWN Jan 25 2001.

      One developer using sendfile() with FreeBSD reports that using POLLWRBAND instead of POLLOUT makes a big difference.

      Solaris 8 (as of the July 2001 update) has a new system call 'sendfilev'. A copy of the man page is here. The Solaris 8 7/01 release notes also mention it. I suspect that this will be most useful when sending to a socket in blocking mode; it'd be a bit of a pain to use with a nonblocking socket.

  • Avoid small frames by using writev (or TCP_CORK)
    A new socket option under Linux, TCP_CORK, tells the kernel to avoid sending partial frames, which helps a bit e.g. when there are lots of little write() calls you can't bundle together for some reason. Unsetting the option flushes the buffer. Better to use writev(), though...

    See LWN Jan 25 2001 for a summary of some very interesting discussions on linux-kernel about TCP_CORK and a possible alternative MSG_MORE.

  • Behave sensibly on overload.
    [Provos, Lever, and Tweedie 2000] notes that dropping incoming connections when the server is overloaded improved the shape of the performance curve, and reduced the overall error rate. They used a smoothed version of "number of clients with I/O ready" as a measure of overload. This technique should be easily applicable to servers written with select, poll, or any system call that returns a count of readiness events per call (e.g. /dev/poll or sigtimedwait4()).
  • Some programs can benefit from using non-Posix threads.
    Not all threads are created equal. The clone() function in Linux (and its friends in other operating systems) lets you create a thread that has its own current working directory, for instance, which can be very helpful when implementing an ftp server. See Hoser FTPd for an example of the use of native threads rather than pthreads.
  • Caching your own data can sometimes be a win.
    "Re: fix for hybrid server problems" by Vivek Sadananda Pai (vivek@cs.rice.edu) on new-httpd, May 9th, states:

    "I've compared the raw performance of a select-based server with a multiple-process server on both FreeBSD and Solaris/x86. On microbenchmarks, there's only a marginal difference in performance stemming from the software architecture. The big performance win for select-based servers stems from doing application-level caching. While multiple-process servers can do it at a higher cost, it's harder to get the same benefits on real workloads (vs microbenchmarks). I'll be presenting those measurements as part of a paper that'll appear at the next Usenix conference. If you've got postscript, the paper is available at http://www.cs.rice.edu/~vivek/flash99/"

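For the writev() tip above, here is a minimal sketch of sending a response header and body with one system call instead of two small write() calls. The helper name write_header_and_body is made up for illustration; the caller still has to cope with a short count, just as with write():

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Hypothetical helper: hand a NUL-terminated header and a body to the
 * kernel in a single writev() call. Returns whatever writev() returns:
 * the number of bytes written (possibly fewer than requested) or -1.
 */
ssize_t write_header_and_body(int fd, const char *header,
                              const void *body, size_t body_len)
{
    struct iovec iov[2];

    assert(header != NULL);
    iov[0].iov_base = (void *)header;   /* iov_base is not const-qualified */
    iov[0].iov_len  = strlen(header);
    iov[1].iov_base = (void *)body;
    iov[1].iov_len  = body_len;
    return writev(fd, iov, 2);
}
```

Because both pieces reach the kernel together, the TCP stack gets the chance to pack them into full frames without the application copying them into one buffer first.
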
Other limits

  • Old system libraries might use 16 bit variables to hold file handles, which causes trouble above 32767 handles. glibc2.1 should be ok.
  • Many systems use 16 bit variables to hold process or thread id's. It would be interesting to port the Volano scalability benchmark to C, and see what the upper limit on number of threads is for the various operating systems.
  • Too much thread-local memory is preallocated by some operating systems; if each thread gets 1MB, and total VM space is 2GB, that creates an upper limit of about 2000 threads.
  • Look at the performance comparison graph at the bottom of http://www.acme.com/software/thttpd/benchmarks.html. Notice how various servers have trouble above 128 connections, even on Solaris 2.6? Anyone who figures out why, let me know. 
    Note: if the TCP stack has a bug that causes a short (200ms) delay at SYN or FIN time, as Linux 2.2.0-2.2.6 had, and the OS or http daemon has a hard limit on the number of connections open, you would expect exactly this behavior. There may be other causes.

Kernel Issues

For Linux, it looks like kernel bottlenecks are being fixed constantly. See Linux Weekly News, Kernel Traffic, the Linux-Kernel mailing list, and my Mindcraft Redux page.

In March 1999, Microsoft sponsored a benchmark comparing NT to Linux at serving large numbers of http and smb clients, in which they failed to see good results from Linux. See also my article on Mindcraft's April 1999 Benchmarks for more info.

See also The Linux Scalability Project. They're doing interesting work, including Niels Provos' hinting poll patch, and some work on the thundering herd problem.

See also Mike Jagdis' work on improving select() and poll(); here's Mike's post about it.

Mohit Aron (aron@cs.rice.edu) writes that rate-based clocking in TCP can improve HTTP response time over 'slow' connections by 80%.

Measuring Server Performance

Two tests in particular are simple, interesting, and hard:

  1. raw connections per second (how many 512 byte files per second can you serve?)
  2. total transfer rate on large files with many slow clients (how many 28.8k modem clients can simultaneously download from your server before performance goes to pot?)

Jef Poskanzer has published benchmarks comparing many web servers. See http://www.acme.com/software/thttpd/benchmarks.html for his results.

I also have a few old notes about comparing thttpd to Apache that may be of interest to beginners.

Chuck Lever keeps reminding us about Banga and Druschel's paper on web server benchmarking. It's worth a read.

IBM has an excellent paper titled Java server benchmarks [Baylor et al, 2000]. It's worth a read.

Examples

Nginx is a web server that uses whatever high-efficiency network event mechanism is available on the target OS. It's getting popular; there are even two books about it.

Interesting select()-based servers

Interesting /dev/poll-based servers

  • N. Provos, C. Lever, "Scalable Network I/O in Linux," May, 2000. [FREENIX track, Proc. USENIX 2000, San Diego, California (June, 2000).] Describes a version of thttpd modified to support /dev/poll. Performance is compared with phhttpd.

Interesting kqueue()-based servers

Interesting realtime signal-based servers

  • Chromium's X15. This uses the 2.4 kernel's SIGIO feature together with sendfile() and TCP_CORK, and reportedly achieves higher speed than even TUX. The source is available under a community source (not open source) license. See the original announcement by Fabio Riccardi.
  • Zach Brown's phhttpd - "a quick web server that was written to showcase the sigio/siginfo event model. consider this code highly experimental and yourself highly mental if you try and use it in a production environment." Uses the siginfo features of 2.3.21 or later, and includes the needed patches for earlier kernels. Rumored to be even faster than khttpd. See his post of 31 May 1999 for some notes.

Interesting thread-based servers

Interesting in-kernel servers

Other interesting links

Translations

Belorussian translation provided by Patric Conrad at Ucallweconn



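To be more precise, this should really be called production-level code 101.[1]

  • TCP is called reliable, but that only means "as long as the connection stays alive, the data may eventually get delivered."[2] Don't be fooled by a leaky abstraction.
  • Never trust a single send() call on a TCP socket to transmit everything you gave it. Check the return value and handle the remainder.[3]
  • Always buffer what recv() returns from a TCP socket. Never assume that a particular length will arrive, and never assume that nothing longer than a particular length will arrive either.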

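Not long ago I had to read code that ignored the third point (…), and it was painful to look at.

A recv() call ends in one of three ways:

  • (Happily) exactly one application-level message arrived: hand it to the upper layer and you are done.
  • (Unfortunately) fewer bytes arrived than the application needs: stash them somewhere (…) and wait for the next TCP segment[4] to arrive.
  • (Equally unfortunately) more bytes arrived than one application message: process one message, then run whatever is left through these same three cases again.

That last case is also why you must not assume anything about length. A short application message, even one smaller than the path MTU, is not guaranteed to arrive in one piece. Say the path MTU is 1000 bytes and each message is 300 bytes: if four messages arrive bunched together (…), only the first 100 bytes of the fourth message are in the buffer.

So even a minimal message-handling routine ends up looking like this:

  1. Receive data from the TCP socket with recv().
  2. Compare the amount of buffered data with the application message length.
  3. If there is less than one message, keep the data in a buffer and go back to 1.
  4. (The application processes one message.)
  5. Get the length of what remains after step 4.
  6. Go back to 2 (note: not 1).

This handles most[5] of the three cases described above.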
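The six steps above can be sketched in C roughly as follows. This is only an illustration under stated assumptions: the message size MSG_LEN, the capacity BUF_CAP, and the names struct reassembly, rb_append, and rb_next are hypothetical and do not come from the original post, which does not say how messages are delimited. Fixed-length messages are assumed here.

```c
#include <stddef.h>
#include <string.h>

#define MSG_LEN 300   /* hypothetical fixed application message size */
#define BUF_CAP 4096  /* buffer capacity; must be at least MSG_LEN */

struct reassembly {
    unsigned char data[BUF_CAP];
    size_t used;      /* number of bytes currently buffered */
};

/* Steps 1 and 3: append bytes that recv() just returned.
 * Returns 0 on success, -1 if the chunk does not fit. */
static int rb_append(struct reassembly *r, const void *chunk, size_t n)
{
    if (n > BUF_CAP - r->used)
        return -1;
    memcpy(r->data + r->used, chunk, n);
    r->used += n;
    return 0;
}

/* Steps 2, 4, 5: if a whole message is buffered, copy it to out
 * (MSG_LEN bytes), drop it from the buffer and return 1.
 * Returns 0 when more data is needed, i.e. go back to recv(). */
static int rb_next(struct reassembly *r, unsigned char *out)
{
    if (r->used < MSG_LEN)
        return 0;
    memcpy(out, r->data, MSG_LEN);
    r->used -= MSG_LEN;
    memmove(r->data, r->data + MSG_LEN, r->used);  /* keep the tail */
    return 1;
}
```

A caller would pass each recv() result to rb_append and then call rb_next in a loop until it returns 0 (step 6 goes back to 2, not 1). With the 1000-byte example above, three messages come out and 100 bytes stay buffered for the next recv().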

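In conclusion:

TCP guarantees very little. Plan on taking the bytes TCP hands you and reassembling them into the sizes you need.

ps. This is a 101, but it is still a bit abstract. Even so, I think it is the bare minimum you need to know to build something with network programming that is not a toy program, so I jotted it down.

  1. Here 101 means a basic or introductory treatment of a subject. That is why a course for new students is often named "subject 101" (from Wikipedia, 101).
  2. Of course, TCP does give you ordering, a weak data-integrity guarantee (adequate once the lower layers are counted in), and no duplicate delivery.
  3. Of course, some OSes wrap this in a form that only succeeds or fails as a whole, e.g. Send with Win32 IOCP…
  4. Confusingly, each network layer uses a different term for its unit of transmission. The data link layer usually says "frame", IP says "packet", and UDP says "datagram". There is also a catch-all term, PDU (Protocol Data Unit), but…
  5. Not all of them, because recv() itself being interrupted (an interrupted system call) or the connection dropping is only the start. All sorts of unimaginable things happen on a network, and listing every exceptional case would no longer be a 101 (…).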

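The server switches the socket to non-blocking mode only around the calls that can block, then switches it back,
and uses I/O multiplexing to accept and service connections.

I don't like that the loop after select() has to check every file descriptor,
so I meant to keep the descriptors to check in a list,
but I couldn't be bothered and just scan them all... there won't be many anyway.

Next I should run this in one thread
and hand the data off to a thread that stores it in a DB.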

  serv_sock = socket(PF_INET, SOCK_STREAM, 0);
  if(serv_sock == -1)
    error_handling("socket() error!");
 
 
 
  memset((void*)&serv_addr, 0x00, sizeof(serv_addr));
  serv_addr.sin_family = AF_INET;
  serv_addr.sin_addr.s_addr = htonl(INADDR_ANY);  // s_addr is 32 bits: htonl, not htons
  serv_addr.sin_port = htons(atoi(argv[1]));
 
 
  if( bind(serv_sock, (struct sockaddr*)&serv_addr, sizeof(serv_addr)) == -1)
    error_handling("bind() error!");
 
 
  // listen() non-block
  if( block_switch(serv_sock, OFF) == -1)
    error_handling("fcntl error!()");
 
#ifdef __DEBUG__
  printf("block_switch mode\n");
#endif
 
  if( listen(serv_sock, 5) == -1)
    error_handling("listen() error!");
 
#ifdef __DEBUG__
  printf("listening... \n");
#endif
 
  // listen() block
  if( block_switch(serv_sock, ON) == -1)
    error_handling("fcntl error!()");
 
#ifdef __DEBUG__
  printf("block mode\n");
#endif
 
 
 
#ifdef __DEBUG__
  printf("listen() success!! sock number: %d\n", serv_sock);
#endif
 
 
  FD_ZERO(&readfds);
  FD_SET(serv_sock, &readfds);
 
  fd_max = serv_sock;
 
 
  for( ;; )   // infinite loop
  {
    int fd, str_len;
    int clnt_sock;
    socklen_t clnt_len;   // accept() expects a socklen_t*, not an int*
    struct sockaddr_in clnt_addr;
    int tempflag;
 
 
    tempfds = readfds;
 
 
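    result = select( fd_max+1, &tempfds, NULL, NULL, NULL );
    if(result < 0)
    {
      if(errno == EINTR)   // interrupted by a signal: just retry
        continue;
      error_handling("select() error!");   // tempfds is not usable after any other failure
    }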
 
#ifdef __DEBUG__
    printf("Now select() running...\n");
#endif
 
#ifdef __DEBUG__
    fputc('x', stderr);
#endif
 
 
    for( fd = 0; fd < fd_max+1; fd++ )
    {
      if(FD_ISSET(fd, &tempfds))
      {
        if(fd == serv_sock) // activity on the listening socket means a connection request
        {
          clnt_len = sizeof(clnt_addr);
 
          // accept() non-block
          if( block_switch(serv_sock, OFF) == -1)
            error_handling("fcntl error!()");
 
          clnt_sock = accept(serv_sock,
            (struct sockaddr*)&clnt_addr, &clnt_len);
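          if(clnt_sock < 0)
          {
            // EAGAIN/EWOULDBLOCK: the pending connection is already gone.
            // Any other errno is a real failure. Either way, never
            // FD_SET() a negative descriptor.
            if(errno != EAGAIN && errno != EWOULDBLOCK)
              fprintf(stderr, "accept() Failed!\n");
            block_switch(serv_sock, ON);   // restore blocking mode before skipping
            continue;
          }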
 
          // accept() block
          if( block_switch(serv_sock, ON) == -1)
            error_handling("fcntl error!()");
 
 
          FD_SET(clnt_sock, &readfds);
          if(clnt_sock > fd_max)
            fd_max = clnt_sock;
 
#ifdef __DEBUG__
          printf("client connect : fd %d\n", clnt_sock);
#endif
        }
 
 
 
 
        else  // fd != serv_sock, so this is data on a client socket
        {
          memset((void*)BUF, 0x00, BUFSIZE);


          // recv() non-block
          if( block_switch(fd, OFF) == -1)
            error_handling("fcntl error!()");

          // Read from the ready descriptor fd (clnt_sock is stale in this
          // branch), leaving one byte so BUF stays NUL-terminated for "%s".
          str_len = recv(fd, BUF, BUFSIZE - 1, 0);
          if(str_len < 0 && ( errno == EINTR || errno == EAGAIN ||
            errno == EWOULDBLOCK) )
          {
            fprintf(stderr, "recv() Failed!\n");
            block_switch(fd, ON);   // restore blocking mode before skipping
            continue;
          }

          // recv() block
          if( block_switch(fd, ON) == -1)
            error_handling("fcntl error!()");
 
 
          if( str_len <= 0 )    // 0: the peer closed the connection; < 0: hard error
          {
            FD_CLR(fd, &readfds);
            close(fd);
            printf("Connect close : fd %d\n", fd);
          }
 
          else          // data arrived
          {
            printf("\nThis is BUF-----------------\n");
            printf("%s\n", BUF);
            printf("------------------------------\n\n");
          }
        }   // end of handling a request on a client socket
      }     // end of FD_ISSET
    }   // end of the scan over all fds for pending requests
 
 
///////////////////////////////////////////////////////////////
  }   // for( ;; ) end
 
 
 
  close(serv_sock); // close serv_sock (not reached: the loop above never exits). Errors need handling here too
 
  return 0;
 
}
 
 
 
void error_handling(char *message)
{
  fputs(message, stderr);
  fputc('\n', stderr);
  exit(1);
}
 
 
int block_switch(int fd, int mode)   // mode: ON = blocking, OFF = non-blocking
{
  int flags;

  flags = fcntl( fd, F_GETFL, 0);
  if( flags == -1 )
    return -1;

  if( mode == OFF )
    return fcntl( fd, F_SETFL, flags | O_NONBLOCK);
  else
    return fcntl( fd, F_SETFL, flags & (~O_NONBLOCK));
}


Nonblocking Sockets


A socket created with socket() is blocking by default. An existing socket can be switched to nonblocking mode with fcntl().


※ Blocking socket (B) / nonblocking socket (N)
  (To use errno here, include errno.h.)

- read

  • B : blocks while the receive buffer is empty
  • N : returns -1 with errno==EWOULDBLOCK/EAGAIN while the receive buffer is empty

* On a blocking socket, if the receive buffer holds less data than the read asked for, read returns what is there and does not block.

- write

  • B : blocks while the send buffer is full
  • N : returns -1 with errno==EWOULDBLOCK/EAGAIN while the send buffer is full

- accept

  • B : blocks while the backlog (the queue of pending connection requests) is empty
  • N : returns -1 with errno==EWOULDBLOCK/EAGAIN while the backlog (the queue of pending connection requests) is empty

- connect

  • B : blocks until the connection is fully established
  • N : returns immediately even if the connection is not yet established (-1 with errno==EINPROGRESS). Whether it completed can be checked later with getsockopt.


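※ Pros and cons of nonblocking sockets

  • Pro : the program can do other work without using multiple threads.
  • Con : the program becomes more complex, and CPU usage can rise (for example, when the code keeps polling in a loop).


※ Making a socket nonblocking : use the fcntl() function.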

int flag;
flag = fcntl( sock_fd, F_GETFL, 0 );
fcntl( sock_fd, F_SETFL, flag | O_NONBLOCK );




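File Input and Output


This section describes the functions for performing primitive input and output operations on file descriptors: read, write, and lseek. These functions are declared in the header file 'unistd.h'.

    The read function reads up to size bytes from the file with descriptor filedes and stores the result in buffer. (This is not necessarily a character string, and no terminating null character is added.)

ssize_t read (int filedes, void *buffer, size_t size)

    The return value is the number of bytes actually read. This may be less than size; for example, if there aren't that many bytes left in the file, or if there aren't that many bytes immediately available. The exact behavior depends on what kind of file it is. Note that reading fewer than size bytes is not an error.

    A value of zero indicates end of file (except when the size argument is also zero). This is not considered an error. If you call read while at end of file, it does nothing except return zero.

    If read returns at least one character, there is no way to tell whether end of file was reached. But if you did reach the end, the next call to read will return zero and indicate end of file.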
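Because a short read is not an error and end of file only shows up as a return value of zero, code that needs a specific number of bytes usually loops. A minimal sketch, assuming a blocking descriptor; the helper name read_full is made up for this example, and retrying on EINTR is one common policy rather than a requirement:

```c
#include <errno.h>
#include <unistd.h>

/* Read until size bytes have arrived or end of file is reached.
 * Returns the number of bytes stored in buf (less than size only
 * at end of file), or -1 on error. */
static ssize_t read_full(int fd, void *buf, size_t size)
{
    size_t total = 0;

    while (total < size) {
        ssize_t n = read(fd, (char *)buf + total, size - total);
        if (n == 0)
            break;                /* end of file */
        if (n < 0) {
            if (errno == EINTR)
                continue;         /* interrupted by a signal: try again */
            return -1;
        }
        total += (size_t)n;       /* short read: keep going */
    }
    return (ssize_t)total;
}
```

On a descriptor with O_NONBLOCK set, this helper returns -1 with errno set to EAGAIN as soon as no data is available, so it is not a substitute for select()-driven code.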

     

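    In case of an error, read returns -1. The following errno error conditions are defined for this function:

    EAGAIN   Normally, when no input is immediately available, read waits for some input. But if the O_NONBLOCK flag is set for the file, read returns immediately without reading any data and reports this error.

             Compatibility note: most versions of BSD Unix use a different error code for this: EWOULDBLOCK. In the GNU library, EWOULDBLOCK is an alias for EAGAIN, so it doesn't matter which name you use.

             On some systems, reading a large amount of data from a character special file can also fail with EAGAIN if the kernel cannot find enough physical memory to lock down the user's pages. This is limited to devices that transfer with direct memory access into the user's memory, which means it does not include terminals, since they always use separate buffers inside the kernel.

    EBADF    The filedes argument is not a valid file descriptor.

    EINTR    read was interrupted by a signal while it was waiting for input.

    EIO      For many devices, and for disk files, this error code indicates a hardware error.

             EIO also occurs when a background process tries to read from the controlling terminal and the normal action of stopping the process by sending it a SIGTTIN signal isn't working. This might happen if the signal is being blocked or ignored, or because the process group is orphaned.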

   

   

   

   

ssize_t write (int filedes, const void *buffer, size_t size)


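    The write function writes up to size bytes from buffer to the file with descriptor filedes. The data in buffer is not necessarily a character string, and a null character is output like any other character. The return value is the number of bytes actually written. This is normally the same as size, but it may be smaller (for example, if the physical media being written to fills up). In case of an error, write returns -1.

    The following errno error conditions are defined for this function:

    EAGAIN   Normally, write blocks until the write operation is complete. But if the O_NONBLOCK flag is set for the file, it returns immediately without writing any data and reports this error.

             One situation that can cause the process to block on output is writing to a terminal device that supports flow control, where output has been suspended by receipt of a STOP character.

             Compatibility note: most versions of BSD Unix use a different error code for this: EWOULDBLOCK. In the GNU library, EWOULDBLOCK is an alias for EAGAIN, so it doesn't matter which name you use.

             On some systems, writing a large amount of data to a character special file can also fail with EAGAIN if the kernel cannot find enough physical memory to lock down the user's pages. This is limited to devices that transfer with direct memory access into the user's memory, which means it does not include terminals, since they always use separate buffers inside the kernel.

    EBADF    The filedes argument is not a valid file descriptor.

    EFBIG    The size of the file would become larger than the implementation can support.

    EINTR    The write operation was interrupted by a signal while it was blocked waiting for completion.

    EIO      For many devices, and for disk files, this error code indicates a hardware error.

             EIO can also occur when a background process tries to write to the controlling terminal and the normal action of stopping the process with a SIGTTOU signal isn't working. This might happen if the signal is being blocked or ignored, or because the process group is orphaned.

    ENOSPC   The device is full.

    EPIPE    This error is returned when you try to write to a pipe or FIFO that isn't open for reading by any process. When this happens, a SIGPIPE signal is also sent to the process.

    Unless you have arranged to prevent EINTR failures, you should check errno after each failing call to write, and if the error was EINTR, you should simply repeat the call.

    The easy way to do this is with the macro TEMP_FAILURE_RETRY (a GNU extension), as follows:

         nbytes = TEMP_FAILURE_RETRY (write (desc, buffer, count));

    The write function is the underlying primitive for all of the functions that write to streams, such as fputc.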
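TEMP_FAILURE_RETRY only repeats a call that failed with EINTR. It does not deal with the other case described above, a write that succeeds but writes fewer bytes than requested. A portable sketch that handles both, assuming a blocking descriptor; the helper name write_all is hypothetical:

```c
#include <errno.h>
#include <unistd.h>

/* Write exactly size bytes from buf. Returns 0 on success, -1 on
 * error (some bytes may already have been written by then). */
static int write_all(int fd, const void *buf, size_t size)
{
    const char *p = buf;

    while (size > 0) {
        ssize_t n = write(fd, p, size);
        if (n < 0) {
            if (errno == EINTR)
                continue;         /* interrupted by a signal: repeat the call */
            return -1;
        }
        p += n;                   /* short write: advance past what was sent */
        size -= (size_t)n;
    }
    return 0;
}
```

The same loop works with send() on a blocking TCP socket, which is the situation the earlier advice about checking the return value of send is about.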

The Ping Jitter sensor sends a series of Pings to the given URI to determine the statistical jitter. The Real Time Jitter value is updated every time a packet is received using the formula described in RFC 1889:

Jitter = Jitter + ( abs( ElapsedTime – OldElapsedTime ) – Jitter ) / 16

The Statistical Jitter value is calculated on the first x packets received as the sample standard deviation (the square root of the statistical variance):

Jitter Statistical = SquareRootOf( SumOf( ( ElapsedTime[i] – Average) ^ 2 ) / ( ReceivedPacketCount – 1 ) )
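As a worked illustration of the first formula (the running Real Time Jitter), here is a small C sketch. The function names jitter_update and jitter_series are made up for this example, the starting value of 0 is an assumption, and the elapsed times are taken to be in any consistent unit such as milliseconds.

```c
#include <stddef.h>

/* One update step: Jitter = Jitter + (|Elapsed - OldElapsed| - Jitter) / 16 */
static double jitter_update(double jitter, double elapsed, double old_elapsed)
{
    double d = elapsed - old_elapsed;

    if (d < 0)
        d = -d;                   /* abs() of the difference */
    return jitter + (d - jitter) / 16.0;
}

/* Apply the update across a series of elapsed times, starting from
 * a jitter of 0 (assumed initial value). */
static double jitter_series(const double *elapsed, size_t n)
{
    double j = 0.0;
    size_t i;

    for (i = 1; i < n; i++)
        j = jitter_update(j, elapsed[i], elapsed[i - 1]);
    return j;
}
```

For elapsed times of 14, 30, 30 the estimate goes 0, then 1.0 (a 16-unit change divided by 16), then 0.9375 (no change, so the estimate decays by one sixteenth). The division by 16 makes each new sample move the estimate only slightly.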

Client connect() Network/tcp 2011. 4. 26. 16:43

#include <sys/types.h>
#include <sys/socket.h>

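int connect(int sockfd, const struct sockaddr *serv_addr, socklen_t addrlen);

sockfd : file descriptor of a socket created beforehand.
serv_addr : structure holding the server's address information
addrlen : size of the address structure that serv_addr points to

Key points!!!!
connect() returns either when the server side has accepted the connection request (the TCP handshake has completed) or when an error aborts the attempt.
Note!!! If the request cannot be completed right away and is held up at the server's queue,
connect() stays blocked in the meantime.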

********
About the client's socket address...

When connect() is called,
the operating system (kernel) automatically assigns the socket one of the IP addresses configured on the host and one unused port.

A single server machine can have several network interfaces.

If the server IP is then set with serv_addr.sin_addr.s_addr=htonl(INADDR_ANY),
a call to listen() makes the server accept requests for that port arriving on any of its IP interfaces.

When port 600 is actually in use,
netstat -na shows it as 0.0.0.0:600 0.0.0.0:0.

Once a client connects, running the command again shows the following state.

0.0.0.0:600 0.0.0.0:0                       LISTENING

            192.168.10.103:600 192.168.10.3         ESTABLISHED

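Summary
Server : accepts requests arriving through any NIC, matching them by PORT.
Client : sends through one of its NICs, chosen by the kernel. (When relying on this, the program needs a step to check which IP address it was assigned.)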
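The "check which IP address was assigned" step can be done with getsockname() after connect() returns. A minimal IPv4 sketch; the helper name local_endpoint and its output format are assumptions made for this example:

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>

/* Write "a.b.c.d:port" for the local end of a connected IPv4 socket
 * into out. Returns 0 on success, -1 on failure. */
static int local_endpoint(int fd, char *out, size_t outlen)
{
    struct sockaddr_in local;
    socklen_t len = sizeof local;
    char ip[INET_ADDRSTRLEN];

    /* Ask the kernel which local address and port it picked. */
    if (getsockname(fd, (struct sockaddr *)&local, &len) == -1)
        return -1;
    if (local.sin_family != AF_INET)
        return -1;                /* this sketch handles IPv4 only */
    if (inet_ntop(AF_INET, &local.sin_addr, ip, sizeof ip) == NULL)
        return -1;
    snprintf(out, outlen, "%s:%u", ip, (unsigned)ntohs(local.sin_port));
    return 0;
}
```

Calling it right after a successful connect() shows which interface address and ephemeral port the kernel chose for the client side.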

That's all.