Building a Real-time Audio Chat Application with Go and WebSockets from Scratch

This guide will walk you through creating a real-time audio chat application using Go and WebSockets, built entirely from scratch without cloning an existing repository. We’ll set up the project structure, implement a WebSocket server to pair two clients for audio communication, and create a simple frontend to capture and play audio in the browser. By the end, you’ll have a functional application where two users can connect and talk in real-time.

Table of Contents

  1. Project Setup
  2. WebSocket Server Implementation
  3. Frontend Implementation
  4. Generating Self-Signed Certificates
  5. Running and Testing the Application
  6. Security Considerations
  7. Troubleshooting

Project Setup

Let’s start by creating the project directory and initializing it as a Go module.

  1. Create the project directory and initialize the Go module: Open a terminal and run the following commands:

    mkdir duoduel
    cd duoduel
    go mod init duoduel
    

    This creates a directory named duoduel and initializes it as a Go module with the name duoduel.

  2. Install the required dependency: We’ll use the Gorilla WebSocket library to handle WebSocket connections. Install it by running:

    go get github.com/gorilla/websocket
    
  3. Set up the directory structure: Create the following directories and subdirectories inside the duoduel directory:

    mkdir server
    mkdir server/websocket
    mkdir server/cert
    mkdir static
    mkdir static/js
    

    After running these commands, your project structure should look like this:

    duoduel/
    ├── server/
    │   ├── main.go           (to be created)
    │   ├── websocket/
    │   │   ├── handler.go    (to be created)
    │   │   └── client.go     (to be created)
    │   └── cert/
    │       ├── cert.pem      (to be generated)
    │       └── key.pem       (to be generated)
    └── static/
        ├── index.html        (to be created)
        └── js/
            └── main.js       (to be created)
    

We’ll populate these files in the next sections.


WebSocket Server Implementation

The server will handle WebSocket connections, pair clients together, and forward audio data between them. We’ll create three Go files to achieve this.

1. Creating server/main.go

Create a file named main.go inside the server directory and add the following code:

package main

import (
    "flag"
    "log"
    "net/http"
    "path/filepath"

    // The module is named duoduel and the package lives in server/websocket
    "duoduel/server/websocket"
)

func main() {
    // Command line flags
    var (
        useHTTPS = flag.Bool("https", true, "Use HTTPS")
        certFile = flag.String("cert", "cert/cert.pem", "Path to certificate file")
        keyFile  = flag.String("key", "cert/key.pem", "Path to key file")
        port     = flag.String("port", "8443", "Port to listen on")
    )
    flag.Parse()

    // Serve static files
    staticDir := filepath.Join("..", "static")
    fs := http.FileServer(http.Dir(staticDir))
    http.Handle("/", fs)

    // WebSocket endpoint
    http.HandleFunc("/ws", websocket.HandleConnection)

    // Start server
    if *useHTTPS {
        log.Printf("Server starting with HTTPS on port %s...", *port)
        err := http.ListenAndServeTLS(":"+*port, *certFile, *keyFile, nil)
        if err != nil {
            log.Fatal("ListenAndServeTLS: ", err)
        }
    } else {
        // Honor the -port flag in HTTP mode as well
        log.Printf("Server starting with HTTP on port %s...", *port)
        err := http.ListenAndServe(":"+*port, nil)
        if err != nil {
            log.Fatal("ListenAndServe: ", err)
        }
    }
}

Explanation:

  • This file sets up an HTTP server.
  • It serves static files (like index.html) from ../static. The path is resolved against the current working directory, so the server must be started from inside the server directory.
  • It defines a /ws endpoint for WebSocket connections, handled by a function we’ll define next.
  • It supports both HTTP and HTTPS, configurable via command-line flags, defaulting to HTTPS on port 8443.
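
To see how the file server maps request paths to files, here is a minimal sketch that exercises http.FileServer in memory, without opening a port. The helper names writeTempSite and fetchStatic are hypothetical and exist only for this illustration; the temporary directory stands in for static/.

```go
package main

import (
    "fmt"
    "log"
    "net/http"
    "net/http/httptest"
    "os"
    "path/filepath"
)

// writeTempSite (hypothetical helper) creates a throwaway directory that
// contains a single index.html, standing in for the static/ directory.
func writeTempSite(html string) (string, error) {
    dir, err := os.MkdirTemp("", "static-demo")
    if err != nil {
        return "", err
    }
    err = os.WriteFile(filepath.Join(dir, "index.html"), []byte(html), 0o644)
    return dir, err
}

// fetchStatic (hypothetical helper) sends one in-memory GET request to a
// file server rooted at dir and returns the status code and response body.
func fetchStatic(dir, path string) (int, string) {
    rec := httptest.NewRecorder()
    req := httptest.NewRequest(http.MethodGet, path, nil)
    http.FileServer(http.Dir(dir)).ServeHTTP(rec, req)
    return rec.Code, rec.Body.String()
}

func main() {
    dir, err := writeTempSite("<h1>hello</h1>")
    if err != nil {
        log.Fatal(err)
    }
    defer os.RemoveAll(dir)

    // A request for "/" is answered with the directory's index.html.
    code, body := fetchStatic(dir, "/")
    fmt.Println(code, body)

    // A file that does not exist yields a 404 status.
    code, _ = fetchStatic(dir, "/missing.js")
    fmt.Println(code)
}
```

The same lookup happens in main.go when the browser requests https://localhost:8443/ and receives static/index.html.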

2. Creating server/websocket/handler.go

Create a file named handler.go inside the server/websocket directory and add the following code:

package websocket

import (
    "log"
    "net/http"
    "sync"
    "time"

    "github.com/gorilla/websocket"
)

// Global variables to manage connections
var (
    waitingClient *Client
    mu            sync.Mutex
)

// Configure the upgrader
var upgrader = websocket.Upgrader{
    CheckOrigin: func(r *http.Request) bool {
        return true // Allow connections from any origin (for development)
    },
    ReadBufferSize:   1024,
    WriteBufferSize:  1024,
    HandshakeTimeout: 10 * time.Second,
}

// HandleConnection handles new WebSocket connections
func HandleConnection(w http.ResponseWriter, r *http.Request) {
    log.Printf("New connection request from: %s", r.RemoteAddr)

    // Upgrade HTTP connection to WebSocket
    conn, err := upgrader.Upgrade(w, r, nil)
    if err != nil {
        log.Println("Failed to upgrade connection:", err)
        return
    }

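    // Set up ping/pong to keep the connection alive. The read deadline is only
    // extended when a pong arrives, so the server must send pings itself;
    // otherwise every connection would time out 60 seconds after it opens.
    conn.SetReadDeadline(time.Now().Add(60 * time.Second))
    conn.SetPongHandler(func(string) error {
        conn.SetReadDeadline(time.Now().Add(60 * time.Second))
        return nil
    })
    go func() {
        ticker := time.NewTicker(30 * time.Second)
        defer ticker.Stop()
        for range ticker.C {
            // WriteControl may be called concurrently with other writes;
            // it fails once the connection is closed, which ends this goroutine.
            deadline := time.Now().Add(5 * time.Second)
            if err := conn.WriteControl(websocket.PingMessage, nil, deadline); err != nil {
                return
            }
        }
    }()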

    // Create new client
    client := NewClient(conn)

    mu.Lock()
    if waitingClient == nil {
        // No waiting client; this client waits
        waitingClient = client
        mu.Unlock()
        client.SendJSON(map[string]string{"type": "waiting"})
        handleClientMessages(client)
    } else {
        // Pair with the waiting client
        peer := waitingClient
        waitingClient = nil
        client.Peer = peer
        peer.Peer = client
        mu.Unlock()

        client.SendJSON(map[string]string{"type": "connected"})
        peer.SendJSON(map[string]string{"type": "connected"})
        go handleClientMessages(client)
    }
}

// handleClientMessages reads messages and forwards audio data
func handleClientMessages(client *Client) {
    defer func() {
        client.Close()
        mu.Lock()
        if waitingClient == client {
            waitingClient = nil
        }
        mu.Unlock()
    }()

    for {
        messageType, message, err := client.Conn.ReadMessage()
        if err != nil {
            log.Println("Read error:", err)
            break
        }
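        // Read the peer pointer once, under the pairing lock, so it cannot
        // become nil between the check and the send.
        mu.Lock()
        peer := client.Peer
        mu.Unlock()
        if messageType == websocket.BinaryMessage && peer != nil {
            peer.SendBinary(websocket.BinaryMessage, message)
        }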
        // Ignore other message types for now
    }
}

Explanation:

  • This file manages WebSocket connections.
  • HandleConnection upgrades HTTP requests to WebSocket connections and pairs clients:
    • If no client is waiting, the new client becomes the waitingClient and waits.
    • If a client is waiting, the new client is paired with it, and both are notified.
  • handleClientMessages runs in a loop, reading messages from a client:
    • If the message is binary (audio data) and the client has a peer, it forwards the data to the peer.
    • It handles cleanup when a client disconnects.
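
The pairing rule described above can be sketched in isolation, without any WebSocket code. This is a simplified model, not the handler itself: the lobby type and its join and leave methods are hypothetical names, and plain string IDs stand in for *Client values.

```go
package main

import (
    "fmt"
    "sync"
)

// lobby (hypothetical type) holds at most one waiting participant,
// playing the role of waitingClient and mu in handler.go.
type lobby struct {
    mu      sync.Mutex
    waiting string // empty means nobody is waiting
}

// join returns the ID of the partner if someone was waiting,
// or "" if id has become the waiting participant.
func (l *lobby) join(id string) string {
    l.mu.Lock()
    defer l.mu.Unlock()
    if l.waiting == "" {
        l.waiting = id
        return ""
    }
    peer := l.waiting
    l.waiting = ""
    return peer
}

// leave clears the waiting slot if id currently occupies it,
// like the deferred cleanup in handleClientMessages.
func (l *lobby) leave(id string) {
    l.mu.Lock()
    defer l.mu.Unlock()
    if l.waiting == id {
        l.waiting = ""
    }
}

func main() {
    var l lobby
    fmt.Printf("%q\n", l.join("alice")) // alice waits, no partner yet
    fmt.Printf("%q\n", l.join("bob"))   // bob is paired with alice
    fmt.Printf("%q\n", l.join("carol")) // the slot is free again, carol waits
    l.leave("carol")                    // carol disconnects before anyone joins
    fmt.Printf("%q\n", l.join("dave"))  // dave waits
}
```

Because the slot holds a single client, the server pairs clients strictly in arrival order; a third client simply becomes the next one waiting.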

3. Creating server/websocket/client.go

Create a file named client.go inside the server/websocket directory and add the following code:

package websocket

import (
    "log"
    "sync"
    "time"

    "github.com/gorilla/websocket"
)

type Client struct {
    Conn *websocket.Conn
    mu   sync.Mutex
    Peer *Client
}

func NewClient(conn *websocket.Conn) *Client {
    conn.SetReadLimit(1024 * 1024) // 1MB max message size
    conn.SetWriteDeadline(time.Now().Add(10 * time.Second))
    return &Client{
        Conn: conn,
        Peer: nil,
    }
}

func (c *Client) SendJSON(message interface{}) error {
    c.mu.Lock()
    defer c.mu.Unlock()
    c.Conn.SetWriteDeadline(time.Now().Add(10 * time.Second))
    return c.Conn.WriteJSON(message)
}

func (c *Client) SendBinary(messageType int, data []byte) error {
    if len(data) < 10 {
        return nil // Ignore small messages to avoid noise
    }
    c.mu.Lock()
    defer c.mu.Unlock()
    c.Conn.SetWriteDeadline(time.Now().Add(15 * time.Second))
    return c.Conn.WriteMessage(messageType, data)
}

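func (c *Client) Close() {
    // Unlink both sides under the package-level pairing lock
    // (mu declared in handler.go, not the per-client c.mu).
    mu.Lock()
    peer := c.Peer
    c.Peer = nil
    if peer != nil {
        peer.Peer = nil
    }
    mu.Unlock()

    // Notify the former peer outside the lock
    if peer != nil {
        peer.SendJSON(map[string]string{"type": "disconnected"})
    }
    msg := websocket.FormatCloseMessage(websocket.CloseNormalClosure, "Session ended")
    c.Conn.WriteControl(websocket.CloseMessage, msg, time.Now().Add(5*time.Second))
    c.Conn.Close()
}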

Explanation:

  • Defines a Client struct to represent a WebSocket client with a connection and a peer.
  • NewClient initializes a client with a connection.
  • SendJSON sends status messages (e.g., “waiting”, “connected”).
  • SendBinary sends audio data to the client.
  • Close handles cleanup, notifying the peer of disconnection.
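
The status messages are small JSON objects with a single type field. As a sketch of their shape, the following uses a typed struct instead of map[string]string; statusMessage, encodeStatus, and decodeStatus are hypothetical names that do not appear in the server code.

```go
package main

import (
    "encoding/json"
    "fmt"
    "log"
)

// statusMessage (hypothetical type) describes the JSON object that
// SendJSON emits for "waiting", "connected", and "disconnected".
type statusMessage struct {
    Type string `json:"type"`
}

// encodeStatus renders a status message as a JSON string.
func encodeStatus(kind string) (string, error) {
    b, err := json.Marshal(statusMessage{Type: kind})
    return string(b), err
}

// decodeStatus extracts the type field, as the browser does in handleMessage.
func decodeStatus(raw string) (string, error) {
    var m statusMessage
    if err := json.Unmarshal([]byte(raw), &m); err != nil {
        return "", err
    }
    return m.Type, nil
}

func main() {
    raw, err := encodeStatus("waiting")
    if err != nil {
        log.Fatal(err)
    }
    fmt.Println(raw)

    kind, err := decodeStatus(raw)
    if err != nil {
        log.Fatal(err)
    }
    fmt.Println(kind)
}
```

A struct like this could be passed to SendJSON in place of the map, since SendJSON accepts any value that encodes to JSON.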

Frontend Implementation

The frontend will connect to the WebSocket server, capture audio from the microphone, send it as raw 16-bit PCM samples in binary WebSocket messages, and play the audio received from the peer.

1. Creating static/index.html

Create a file named index.html inside the static directory and add the following code:

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Duoduel Audio Chat</title>
</head>
<body>
    <h1>Duoduel Audio Chat</h1>
    <div id="status">Connecting...</div>
    <button id="mute">Mute</button>
    <script src="js/main.js"></script>
</body>
</html>

Explanation:

  • A simple HTML page with:
    • A heading.
    • A status div to display connection status.
    • A mute button to toggle the microphone.
    • A script tag to include main.js.

2. Creating static/js/main.js

Create a file named main.js inside the static/js directory and add the following code:

let socket;
let audioContext;
let isMuted = false;
let stream;
let source;
let processor;
let gainNode;

const SAMPLE_RATE = 48000;
const CHUNK_SIZE = 4096;

function connectWebSocket() {
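    // Derive the endpoint from the page URL so it follows the server's scheme,
    // host, and port; with the default settings this is wss://localhost:8443/ws
    const scheme = window.location.protocol === 'https:' ? 'wss' : 'ws';
    socket = new WebSocket(`${scheme}://${window.location.host}/ws`);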
    socket.onopen = () => {
        console.log('WebSocket connected');
        setupAudio();
    };
    socket.onmessage = (event) => {
        if (event.data instanceof Blob) {
            // Handle binary audio data
            event.data.arrayBuffer().then(buffer => {
                playAudioData(buffer);
            });
        } else {
            // Handle JSON status messages
            const message = JSON.parse(event.data);
            handleMessage(message);
        }
    };
    socket.onclose = () => {
        console.log('WebSocket closed');
        document.getElementById('status').textContent = 'Disconnected';
    };
    socket.onerror = (error) => {
        console.error('WebSocket error:', error);
    };
}

function handleMessage(message) {
    switch (message.type) {
        case 'waiting':
            document.getElementById('status').textContent = 'Waiting for partner...';
            break;
        case 'connected':
            document.getElementById('status').textContent = 'Connected to partner';
            break;
        case 'disconnected':
            document.getElementById('status').textContent = 'Partner disconnected';
            break;
        default:
            console.log('Unknown message type:', message.type);
    }
}

async function setupAudio() {
    try {
        audioContext = new (window.AudioContext || window.webkitAudioContext)({
            sampleRate: SAMPLE_RATE
        });
        if (isIOS()) {
            // For iOS, resume audio context on user interaction
            document.addEventListener('touchstart', () => {
                audioContext.resume();
            }, { once: true });
        }
        const constraints = {
            audio: {
                echoCancellation: false,
                noiseSuppression: false,
                autoGainControl: false,
                sampleRate: SAMPLE_RATE, // keep the capture rate consistent with the AudioContext
                channelCount: 1
            },
            video: false
        };
        stream = await navigator.mediaDevices.getUserMedia(constraints);
        source = audioContext.createMediaStreamSource(stream);
        gainNode = audioContext.createGain();
        gainNode.gain.value = 2.5; // Amplify input
        source.connect(gainNode);
        processor = audioContext.createScriptProcessor(CHUNK_SIZE, 1, 1);
        processor.onaudioprocess = (e) => {
            if (!isMuted && socket.readyState === WebSocket.OPEN) {
                const inputData = e.inputBuffer.getChannelData(0);
                const pcmData = new Int16Array(inputData.length);
                for (let i = 0; i < inputData.length; i++) {
                    const s = Math.max(-1, Math.min(1, inputData[i]));
                    pcmData[i] = s < 0 ? s * 32768 : s * 32767;
                }
                socket.send(pcmData.buffer);
            }
        };
        gainNode.connect(processor);
        processor.connect(audioContext.destination); // Required to trigger processing
    } catch (error) {
        console.error('Error setting up audio:', error);
    }
}

function playAudioData(buffer) {
    const pcmData = new Int16Array(buffer);
    const floatData = new Float32Array(pcmData.length);
    for (let i = 0; i < pcmData.length; i++) {
        floatData[i] = pcmData[i] / 32768;
    }
    const audioBuffer = audioContext.createBuffer(1, floatData.length, SAMPLE_RATE);
    audioBuffer.getChannelData(0).set(floatData);
    const source = audioContext.createBufferSource();
    source.buffer = audioBuffer;
    source.connect(audioContext.destination);
    source.start();
}

function toggleMute() {
    isMuted = !isMuted;
    document.getElementById('mute').textContent = isMuted ? 'Unmute' : 'Mute';
}

function isIOS() {
    return /iPad|iPhone|iPod/.test(navigator.userAgent);
}

// Initialize
connectWebSocket();
document.getElementById('mute').addEventListener('click', toggleMute);

Explanation:

  • WebSocket Connection: Connects to wss://localhost:8443/ws and handles connection events.
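  • Audio Setup: Requests microphone access, converts the captured samples to 16-bit PCM with a ScriptProcessorNode (deprecated in favor of AudioWorklet, but simpler for a demo), and sends them over the WebSocket when not muted.
  • Audio Playback: Receives PCM data, converts it back to floating-point samples, and plays each chunk through the Web Audio API as soon as it arrives, with no buffering between chunks.
  • Mute Functionality: Stops sending audio while muted; the microphone stream itself stays open.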
  • iOS Compatibility: Resumes the audio context on touch for iOS devices.
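
The sample conversion performed in main.js can be mirrored in Go, which is handy if you later want to inspect or record the forwarded audio on the server. This is a sketch under two assumptions: the function names are hypothetical, and the bytes are little-endian, which is what Int16Array produces on common browser platforms.

```go
package main

import (
    "encoding/binary"
    "fmt"
)

// floatToPCM16 mirrors the capture path in main.js: clamp each sample to
// [-1, 1], then scale negatives by 32768 and positives by 32767.
func floatToPCM16(samples []float32) []int16 {
    out := make([]int16, len(samples))
    for i, s := range samples {
        if s > 1 {
            s = 1
        } else if s < -1 {
            s = -1
        }
        if s < 0 {
            out[i] = int16(s * 32768)
        } else {
            out[i] = int16(s * 32767)
        }
    }
    return out
}

// pcm16ToFloat mirrors playAudioData: divide each sample by 32768.
func pcm16ToFloat(pcm []int16) []float32 {
    out := make([]float32, len(pcm))
    for i, v := range pcm {
        out[i] = float32(v) / 32768
    }
    return out
}

// decodeLE interprets a binary WebSocket payload as little-endian
// 16-bit samples (assumed byte order); a trailing odd byte is ignored.
func decodeLE(data []byte) []int16 {
    out := make([]int16, len(data)/2)
    for i := range out {
        out[i] = int16(binary.LittleEndian.Uint16(data[2*i:]))
    }
    return out
}

func main() {
    pcm := floatToPCM16([]float32{-1, 0, 0.5, 1})
    fmt.Println(pcm)
    fmt.Println(pcm16ToFloat(pcm))
    fmt.Println(decodeLE([]byte{0x00, 0x80, 0xff, 0x7f}))
}
```

The asymmetric scale factors exist because a signed 16-bit integer ranges from -32768 to 32767, so the round trip is close to the original signal but not bit-exact.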

Generating Self-Signed Certificates

Since the server uses HTTPS by default, we need self-signed certificates for development. Run these commands from the duoduel directory:

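    cd server
    mkdir -p cert
    openssl req -x509 -newkey rsa:4096 -keyout cert/key.pem -out cert/cert.pem -days 365 -nodes

  • Follow the prompts to generate cert.pem and key.pem in server/cert. The -p flag keeps mkdir from failing because the cert directory already exists from Project Setup.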
  • These are for development only; use proper certificates in production.
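
If openssl is not available, a development certificate can also be produced with Go's standard library. The following is a sketch of an alternative, not part of the original setup: generateSelfSigned and pairLoads are hypothetical names, it uses an ECDSA P-256 key instead of RSA 4096, and it should live in its own throwaway directory so it does not clash with server/main.go. Run it from the server directory so the files land in cert/.

```go
package main

import (
    "crypto/ecdsa"
    "crypto/elliptic"
    "crypto/rand"
    "crypto/tls"
    "crypto/x509"
    "crypto/x509/pkix"
    "encoding/pem"
    "log"
    "math/big"
    "net"
    "os"
    "time"
)

// generateSelfSigned (hypothetical helper) returns a PEM-encoded self-signed
// certificate and private key for host, valid for the given number of days.
func generateSelfSigned(host string, days int) (certPEM, keyPEM []byte, err error) {
    key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
    if err != nil {
        return nil, nil, err
    }
    tmpl := x509.Certificate{
        SerialNumber:          big.NewInt(time.Now().UnixNano()), // good enough for development
        Subject:               pkix.Name{CommonName: host},
        NotBefore:             time.Now().Add(-time.Minute),
        NotAfter:              time.Now().Add(time.Duration(days) * 24 * time.Hour),
        KeyUsage:              x509.KeyUsageDigitalSignature,
        ExtKeyUsage:           []x509.ExtKeyUsage{x509.ExtKeyUsageServerAuth},
        BasicConstraintsValid: true,
        DNSNames:              []string{host},
        IPAddresses:           []net.IP{net.ParseIP("127.0.0.1")},
    }
    // Self-signed: the template acts as both certificate and issuer.
    der, err := x509.CreateCertificate(rand.Reader, &tmpl, &tmpl, &key.PublicKey, key)
    if err != nil {
        return nil, nil, err
    }
    keyDER, err := x509.MarshalECPrivateKey(key)
    if err != nil {
        return nil, nil, err
    }
    certPEM = pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE", Bytes: der})
    keyPEM = pem.EncodeToMemory(&pem.Block{Type: "EC PRIVATE KEY", Bytes: keyDER})
    return certPEM, keyPEM, nil
}

// pairLoads reports whether crypto/tls accepts the certificate and key together.
func pairLoads(certPEM, keyPEM []byte) bool {
    _, err := tls.X509KeyPair(certPEM, keyPEM)
    return err == nil
}

func main() {
    certPEM, keyPEM, err := generateSelfSigned("localhost", 365)
    if err != nil {
        log.Fatal(err)
    }
    if err := os.MkdirAll("cert", 0o755); err != nil {
        log.Fatal(err)
    }
    if err := os.WriteFile("cert/cert.pem", certPEM, 0o644); err != nil {
        log.Fatal(err)
    }
    if err := os.WriteFile("cert/key.pem", keyPEM, 0o600); err != nil {
        log.Fatal(err)
    }
    log.Println("wrote cert/cert.pem and cert/key.pem; pair loads:", pairLoads(certPEM, keyPEM))
}
```

As with the openssl output, browsers will still show a warning for this certificate because no trusted authority signed it.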

Running and Testing the Application

  1. Start the server: Navigate to the server directory (skip the cd below if you are already there after generating the certificates) and run:

    cd server
    go run main.go -https=true -port=8443
    

    This starts the server on https://localhost:8443.

  2. Test the application:

    • Open two browser tabs (e.g., Chrome or Firefox).
    • In each tab, navigate to https://localhost:8443.
    • Accept the self-signed certificate warning.
    • The first tab will show “Waiting for partner…”.
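    • When the second tab connects, both will show “Connected to partner”, and you can talk. Use headphones when both tabs run on the same machine: echo cancellation is disabled in main.js, so the speakers would feed back into the microphone.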
    • Use the “Mute” button to toggle your microphone.

Security Considerations

  • HTTPS: Always use HTTPS in production to encrypt WebSocket traffic.
  • Origin Checking: The CheckOrigin function allows all origins for simplicity. In production, restrict it to trusted domains.
  • Certificates: Replace self-signed certificates with ones from a trusted authority in production.
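
As a sketch of what restricting origins could look like, the function below has the signature that the Upgrader's CheckOrigin field expects. The allowlist contents (including chat.example.com) and the names allowedHosts, originAllowed, and checkOrigin are assumptions made for this example.

```go
package main

import (
    "fmt"
    "log"
    "net/http"
    "net/url"
)

// allowedHosts is a hypothetical allowlist; replace it with your own domains.
var allowedHosts = map[string]bool{
    "localhost:8443":   true,
    "chat.example.com": true,
}

// originAllowed reports whether an Origin header value names an
// allowed host served over https.
func originAllowed(origin string) bool {
    u, err := url.Parse(origin)
    if err != nil {
        return false
    }
    return u.Scheme == "https" && allowedHosts[u.Host]
}

// checkOrigin has the func(*http.Request) bool signature used by CheckOrigin.
// Requests without an Origin header are rejected in this sketch.
func checkOrigin(r *http.Request) bool {
    return originAllowed(r.Header.Get("Origin"))
}

func main() {
    req, err := http.NewRequest(http.MethodGet, "https://localhost:8443/ws", nil)
    if err != nil {
        log.Fatal(err)
    }
    for _, origin := range []string{"https://localhost:8443", "https://evil.example"} {
        req.Header.Set("Origin", origin)
        fmt.Println(origin, checkOrigin(req))
    }
}
```

In handler.go, this would replace the function literal that currently returns true. Rejecting a missing Origin header is a deliberate choice here; it also blocks non-browser clients, which you may or may not want.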

Troubleshooting

  • No Audio:

    • Check the browser console (F12) for errors.
    • Ensure microphone permissions are granted.
    • Test with different browsers or devices.
  • Connection Fails:

    • Verify the server is running and port 8443 is open.
    • Check the WebSocket URL (wss://localhost:8443/ws).
    • Ensure the certificate is accepted.
  • Choppy Audio:

    • Adjust CHUNK_SIZE in main.js (e.g., try 2048 or 8192). createScriptProcessor only accepts powers of two from 256 to 16384.
    • Test with a faster network or locally.
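
To reason about the CHUNK_SIZE trade-off, it helps to work out how much audio each message carries. The sketch below does that arithmetic for 16-bit mono audio at the 48000 Hz rate used in main.js; the function names are hypothetical.

```go
package main

import "fmt"

const (
    sampleRate     = 48000 // matches SAMPLE_RATE in main.js
    bytesPerSample = 2     // 16-bit mono PCM
)

// chunkMillis returns how many milliseconds of audio one chunk holds.
// This is also the minimum latency added by buffering a full chunk.
func chunkMillis(chunkSize int) float64 {
    return float64(chunkSize) / sampleRate * 1000
}

// chunkBytes returns the size of one binary WebSocket message.
func chunkBytes(chunkSize int) int {
    return chunkSize * bytesPerSample
}

// bytesPerSecond returns the raw audio bandwidth in each direction,
// which does not depend on the chunk size.
func bytesPerSecond() int {
    return sampleRate * bytesPerSample
}

func main() {
    for _, size := range []int{2048, 4096, 8192} {
        fmt.Printf("CHUNK_SIZE %d: %.1f ms per chunk, %d bytes per message\n",
            size, chunkMillis(size), chunkBytes(size))
    }
    fmt.Printf("bandwidth: %d bytes/s per direction\n", bytesPerSecond())
}
```

Smaller chunks lower latency but send more messages per second, while larger chunks do the opposite; the bandwidth of the uncompressed stream stays at 96000 bytes per second either way.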

Congratulations! You’ve built a real-time audio chat application from scratch. This basic version pairs two users, but you could extend it with features like multi-user rooms, better audio buffering, or a prettier UI. Happy coding! 🎤