Posts

Quantization for Small Models: A Practical, Reproducible Guide

This article outlines a clear, reproducible workflow for applying quantization to small language models. The objective is to reduce memory usage, improve inference efficiency, and retain acceptable accuracy on constrained hardware.

Purpose

Quantization converts model weights from floating-point formats (fp32 or fp16) into lower-precision representations such as int8 or int4. This reduces VRAM and RAM consumption and enables running larger models on limited devices without modifying the model architecture.

Scientific Basis

Quantization reduces the numeric precision of weights while preserving structural relationships. 4-bit methods apply additional techniques (double quantization, grouped quantization) to minimize accuracy loss. Inference remains feasible because many transformer components are resilient to reduced precision.

When to Use Quantization

Scenario                        Suitability
Running models on 4–8 GB GPUs   Highly s...
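As a rough illustration of the idea (a sketch, not any particular library's kernel), symmetric per-tensor int8 quantization fits in a few lines of NumPy. The 256×256 weight matrix and the single per-tensor scale are illustrative assumptions; real 4-bit schemes add grouping and double quantization on top of this:

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric per-tensor int8 quantization: map floats onto [-127, 127]."""
    scale = np.abs(weights).max() / 127.0
    q = np.round(weights / scale).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate fp32 weights from int8 codes."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(0, 0.02, size=(256, 256)).astype(np.float32)  # toy fp32 weights

q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

print(q.nbytes / w.nbytes)                     # 0.25 -> 4x smaller than fp32
print(float(np.abs(w - w_hat).max()) < scale)  # True: error bounded by one step
```

The same structure-preserving property the article mentions is visible here: every weight keeps its sign and relative magnitude, only the resolution drops.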
Training a Small Classifier Locally: A Practical, Reproducible Workflow

This article outlines a minimal, reproducible process for training a small machine-learning classifier on a standard laptop. The objective is to build a functioning model within minutes, using scientifically sound methods and stable tools.

Rationale

Small models remain the correct baseline for structured data. They train fast, require no GPU, and provide interpretable results. They also establish whether larger architectures are necessary, avoiding premature complexity.

Expected Output

A clean Python environment
A trained classifier using a public dataset
AUROC and accuracy metrics
A saved model file for later use

System Requirements

Component   Minimum
CPU         Any modern laptop
RAM         4–8 GB
Python      3.10 or 3.11
Disk        1 GB free

Environment Setup

Create the setup file below:

cat <<'Eof' setup_cl...
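As a minimal sketch of such a run (the dataset, model choice, and output file name are illustrative assumptions, not prescribed by this workflow), scikit-learn's built-in breast-cancer dataset trains in seconds on a CPU and produces all four expected outputs:

```python
# Minimal local training run: public dataset, small model, accuracy + AUROC,
# and a saved model file. Assumes scikit-learn is installed.
import joblib
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Scaling + logistic regression: a fast, interpretable baseline.
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
clf.fit(X_tr, y_tr)

acc = accuracy_score(y_te, clf.predict(X_te))
auroc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
print(f"accuracy={acc:.3f} auroc={auroc:.3f}")

joblib.dump(clf, "classifier.joblib")  # saved model file for later use
```

If this baseline is already adequate, the article's point about avoiding premature complexity applies directly: there is no need to reach for a larger architecture.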

Building a Lightweight Local RAG System: A Practical Workflow

This article outlines a reproducible method to build a simple retrieval-augmented generation (RAG) system on a constrained machine. The goal is to combine compact embeddings, a minimal vector index, and a small quantized language model to create a functional question–answer pipeline.

Objective

Create a local RAG setup that runs efficiently on CPU or on a small GPU (6–8 GB), with predictable latency and no external services. The workflow avoids large dependencies and focuses on core components only.

System Requirements

Component        Minimum
CPU              Any modern laptop
GPU (optional)   6–8 GB VRAM
Python           3.10 or 3.11
Disk             2–3 GB free

Architecture Overview

Embedding Model: small CPU-friendly model for document vectorization
Index: lightweight FAISS or SQLite-based store
LLM: 4-bit quantized model for question answering
Pipeline: retrieve → format → generate

Environment Setup

cat <<'Eof' ...
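The retrieve → format → generate pipeline can be sketched with stand-in components. In this illustrative sketch, a bag-of-words similarity replaces the embedding model and index, the toy documents and function names are assumptions, and the final generate step is left to whichever quantized LLM you load:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Bag-of-words 'embedding' (placeholder for a real embedding model)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the k documents most similar to the query."""
    qv = embed(query)
    return sorted(docs, key=lambda d: cosine(qv, embed(d)), reverse=True)[:k]

def format_prompt(query: str, contexts: list[str]) -> str:
    """Assemble retrieved context and question into an LLM prompt."""
    ctx = "\n".join(f"- {c}" for c in contexts)
    return f"Context:\n{ctx}\n\nQuestion: {query}\nAnswer:"

docs = [
    "FAISS builds an approximate nearest neighbour index over vectors.",
    "Quantization shrinks model weights to int8 or int4.",
    "SQLite can serve as a minimal vector store for small corpora.",
]

query = "what does quantization do"
prompt = format_prompt(query, retrieve(query, docs))
print(prompt)  # this prompt would then be passed to the quantized LLM
```

Swapping `embed` for a small sentence-embedding model and the sorted list for a FAISS or SQLite index changes the components but not the shape of the pipeline.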

Running Your First Local LLM on a 6–8 GB GPU: A Scientific Guide to Small Models

Synopsis: This guide describes practical, reproducible steps to run a compact language model (2B–7B parameter class) on a consumer GPU with ~6–8 GB VRAM. It focuses on minimal dependencies, quantization for memory reduction, and objective benchmarking, so you get useful output while preserving reproducibility and safety.

Why this approach works

Large models (tens to hundreds of billions of parameters) require large memory and specialized hardware. Smaller models (2B–7B) combined with quantization (4-bit or 8-bit) and device mapping permit reasonable latency and task utility on 6–8 GB GPUs. The underlying scientific principles are:

Model scaling law tradeoffs: smaller models have less representational capacity but are still effective for many tasks when used with retrieval or fine-tuned heads.
Quantization: reduces the memory footprint by representing weights wit...
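A back-of-the-envelope calculation shows why quantization matters at this VRAM budget. The helper below is a sketch that counts weight storage only, deliberately ignoring KV cache and activation overhead, which add to the real footprint:

```python
def weight_memory_gb(n_params_billion: float, bits_per_weight: int) -> float:
    """Approximate memory for model weights alone, in GB."""
    bytes_total = n_params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9

# A 7B model: fp16 needs ~14 GB (won't fit in 8 GB of VRAM),
# int8 needs ~7 GB (tight), int4 needs ~3.5 GB (fits comfortably).
for bits in (16, 8, 4):
    print(f"7B @ {bits}-bit ~ {weight_memory_gb(7, bits):.1f} GB")
```

The same arithmetic explains why the 2B class runs even unquantized: 2 × 10⁹ weights at 16 bits is about 4 GB.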

Run Visual Studio Code Natively on Termux Proot Ubuntu or Other Linux Distribution

I recently got back to Android because I came across an article on installing Ubuntu "natively" on Android, without systemd, via Termux and proot. I will link the relevant articles as I update this post. After I installed Ubuntu via proot, I searched for ways to get a GUI running; this can be done via a VNC server. Again, I will link the relevant articles later. Then I looked for ways to get VS Code running and found that most guides propose installing code-server and accessing Code via a browser, which has some limitations with extensions. I would propose using vscode.dev instead if you generally have a good network connection on your phone. Because I had a GUI running from step 2, I installed VS Code as you would normally on Ubuntu (from a .deb file, or using the tar.gz file available for download for arm64 on the VS Code website). I realised that I could not install .deb files on a stripped-down Ubuntu environment (it worked when I installed ubuntu-desktop instead of gnome deskto...

The Best Intel Gaming CPUs of 2020

1. Intel Core i9 10900K: Based on Intel's Comet Lake, the Intel Core i9 10900K was launched in April 2020 for a starting price of 488 USD. This CPU was ranked #1 in the review.

2. Intel Core i9 9900K: Based on Intel's Coffee Lake Refresh, the Intel Core i9 9900K was launched in October 2018 for a starting price of 499 USD. This CPU was ranked #2 in the review.

3. Intel Core i7 9700K: Based on Intel's Coffee Lake, the Intel Core i7 9700K was launched in October 2018 for a starting price of 374 USD. This CPU was ranked a joint #2 in the review.

4. Intel Core i5-10600K: Based on Intel's Comet Lake, the Intel Core i5-10600K was launched in April 2020 for a starting price of 262 USD. This CPU was ranked #3 in the review.

5. Intel Core i5 9400F: Based on Intel's Coffee Lake, the Intel Core i5 9400F was launched in January 2019 for a starting price of 182 USD. This CPU was ranked #4 in the review.

Rank   CPU
1      Intel Core i9 10900K
2      Intel Core i9 9900K
2      Intel Core i7 97...

Outlook.com with 2fa and Microsoft Authenticator is broken!

If you have enabled 2FA on your Microsoft account, then enabled Microsoft Authenticator, and later uninstall the Authenticator app, you can no longer log in to your email account. You cannot ask for help, because that requires you to log in. You can do basically nothing.

The error: when you log in to your Microsoft account, you enter your password, then the login page prompts you to approve the login in the Authenticator app. Without the app, if you try to log in using your backup email or phone number, you are routed back to the enter-password screen. This loops endlessly.

I have written to Microsoft regarding this. Let me see if they reply. I will hopefully update this post soon.