Posts

Quantization for Small Models: A Practical, Reproducible Guide

This article outlines a clear, reproducible workflow for applying quantization to small language models. The objective is to reduce memory usage, improve inference efficiency, and retain acceptable accuracy on constrained hardware.

Purpose

Quantization converts model weights from floating-point formats (fp32 or fp16) into lower-precision representations such as int8 or int4. This reduces VRAM and RAM consumption and enables running larger models on limited devices without modifying the model architecture.

Scientific Basis

Quantization reduces the numeric precision of weights while preserving structural relationships. 4-bit methods apply additional techniques (double quantization, grouped quantization) to minimize accuracy loss. Inference remains feasible because many transformer components are resilient to reduced precision.

When to Use Quantization

Scenario                        Suitability
Running models on 4–8 GB GPUs   Highly s...
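As a rough illustration of the idea (a sketch, not any particular library's kernel), symmetric per-tensor int8 quantization fits in a few lines of NumPy. The 256×256 weight matrix and the single per-tensor scale are illustrative assumptions; real 4-bit schemes add grouping and double quantization on top of this:

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric per-tensor int8 quantization: map floats onto [-127, 127]."""
    scale = np.abs(weights).max() / 127.0
    q = np.round(weights / scale).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate fp32 weights from int8 codes."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(0, 0.02, size=(256, 256)).astype(np.float32)  # toy fp32 weights

q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

print(q.nbytes / w.nbytes)                     # 0.25 -> 4x smaller than fp32
print(float(np.abs(w - w_hat).max()) < scale)  # True: error bounded by one step
```

The same structure-preserving property the article mentions is visible here: every weight keeps its sign and relative magnitude, only the resolution drops.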
Training a Small Classifier Locally: A Practical, Reproducible Workflow

This article outlines a minimal, reproducible process for training a small machine-learning classifier on a standard laptop. The objective is to build a functioning model within minutes, using scientifically sound methods and stable tools.

Rationale

Small models remain the correct baseline for structured data. They train fast, require no GPU, and provide interpretable results. They also establish whether larger architectures are necessary, avoiding premature complexity.

Expected Output

A clean Python environment
A trained classifier using a public dataset
AUROC and accuracy metrics
A saved model file for later use

System Requirements

Component   Minimum
CPU         Any modern laptop
RAM         4–8 GB
Python      3.10 or 3.11
Disk        1 GB free

Environment Setup

Create the setup file below:

cat <<'Eof' setup_cl...
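As a minimal sketch of such a run (the dataset, model choice, and output file name are illustrative assumptions, not prescribed by this workflow), scikit-learn's built-in breast-cancer dataset trains in seconds on a CPU and produces all four expected outputs:

```python
# Minimal local training run: public dataset, small model, accuracy + AUROC,
# and a saved model file. Assumes scikit-learn is installed.
import joblib
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Scaling + logistic regression: a fast, interpretable baseline.
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
clf.fit(X_tr, y_tr)

acc = accuracy_score(y_te, clf.predict(X_te))
auroc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
print(f"accuracy={acc:.3f} auroc={auroc:.3f}")

joblib.dump(clf, "classifier.joblib")  # saved model file for later use
```

If this baseline is already adequate, the article's point about avoiding premature complexity applies directly: there is no need to reach for a larger architecture.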

Building a Lightweight Local RAG System: A Practical Workflow

This article outlines a reproducible method to build a simple retrieval-augmented generation (RAG) system on a constrained machine. The goal is to combine compact embeddings, a minimal vector index, and a small quantized language model to create a functional question–answer pipeline.

Objective

Create a local RAG setup that runs efficiently on CPU or on a small GPU (6–8 GB), with predictable latency and no external services. The workflow avoids large dependencies and focuses on core components only.

System Requirements

Component        Minimum
CPU              Any modern laptop
GPU (optional)   6–8 GB VRAM
Python           3.10 or 3.11
Disk             2–3 GB free

Architecture Overview

Embedding Model: small CPU-friendly model for document vectorization
Index: lightweight FAISS or SQLite-based store
LLM: 4-bit quantized model for question answering
Pipeline: retrieve → format → generate

Environment Setup

cat <<'Eof' ...
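The retrieve → format → generate pipeline can be sketched with stand-in components. In this illustrative sketch, a bag-of-words similarity replaces the embedding model and index, the toy documents and function names are assumptions, and the final generate step is left to whichever quantized LLM you load:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Bag-of-words 'embedding' (placeholder for a real embedding model)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the k documents most similar to the query."""
    qv = embed(query)
    return sorted(docs, key=lambda d: cosine(qv, embed(d)), reverse=True)[:k]

def format_prompt(query: str, contexts: list[str]) -> str:
    """Assemble retrieved context and question into an LLM prompt."""
    ctx = "\n".join(f"- {c}" for c in contexts)
    return f"Context:\n{ctx}\n\nQuestion: {query}\nAnswer:"

docs = [
    "FAISS builds an approximate nearest neighbour index over vectors.",
    "Quantization shrinks model weights to int8 or int4.",
    "SQLite can serve as a minimal vector store for small corpora.",
]

query = "what does quantization do"
prompt = format_prompt(query, retrieve(query, docs))
print(prompt)  # this prompt would then be passed to the quantized LLM
```

Swapping `embed` for a small sentence-embedding model and the sorted list for a FAISS or SQLite index changes the components but not the shape of the pipeline.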

Running Your First Local LLM on a 6–8 GB GPU: A Scientific Guide to Small Models

Synopsis: This guide describes practical, reproducible steps to run a compact language model (2B–7B parameter class) on a consumer GPU with ~6–8 GB VRAM. It focuses on minimal dependencies, quantization for memory reduction, and objective benchmarking, so you get useful output while preserving reproducibility and safety.

Why this approach works

Large models (tens to hundreds of billions of parameters) require large memory and specialized hardware. Smaller models (2B–7B) combined with quantization (4-bit or 8-bit) and device mapping permit reasonable latency and task utility on 6–8 GB GPUs. The underlying scientific principles are:

Model scaling law tradeoffs: smaller models have less representational capacity but are still effective for many tasks when used with retrieval or fine-tuned heads.
Quantization: reduces the memory footprint by representing weights wit...
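A back-of-the-envelope calculation shows why quantization matters at this VRAM budget. The helper below is a sketch that counts weight storage only, deliberately ignoring KV cache and activation overhead, which add to the real footprint:

```python
def weight_memory_gb(n_params_billion: float, bits_per_weight: int) -> float:
    """Approximate memory for model weights alone, in GB."""
    bytes_total = n_params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9

# A 7B model: fp16 needs ~14 GB (won't fit in 8 GB of VRAM),
# int8 needs ~7 GB (tight), int4 needs ~3.5 GB (fits comfortably).
for bits in (16, 8, 4):
    print(f"7B @ {bits}-bit ~ {weight_memory_gb(7, bits):.1f} GB")
```

The same arithmetic explains why the 2B class runs even unquantized: 2 × 10⁹ weights at 16 bits is about 4 GB.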

Run Visual Studio Code Natively on Termux Proot Ubuntu or Other Linux Distribution

I recently got back to Android because I came across an article on installing Ubuntu "natively" on Android, without systemd, via Termux and proot. I will link the relevant articles as I update this post. After I installed Ubuntu via proot, I searched for ways to get a GUI running; this can be done via a VNC server. Again, I will link the relevant articles later. Then I looked for ways to get VS Code running and found that most guides propose installing code-server and accessing Code via a browser, which has some limitations with extensions. I would propose using vscode.dev instead if you generally have a good network connection on your phone. Because I had a GUI running from step 2, I installed VS Code as you would normally on Ubuntu (from a .deb file, or using the tar.gz file available for download for arm64 on the VS Code website). I realised that I could not install .deb files on a stripped-down Ubuntu environment (it worked when I installed ubuntu-desktop instead of gnome deskto...

The Best Intel Gaming CPUs of 2020

1. Intel Core i9 10900K: Based on Intel's Comet Lake, the Intel Core i9 10900K was launched in April 2020 for a starting price of 488 USD. This CPU was ranked #1 in the review.

2. Intel Core i9 9900K: Based on Intel's Coffee Lake Refresh, the Intel Core i9 9900K was launched in October 2018 for a starting price of 499 USD. This CPU was ranked #2 in the review.

3. Intel Core i7 9700K: Based on Intel's Coffee Lake, the Intel Core i7 9700K was launched in October 2018 for a starting price of 374 USD. This CPU was ranked a joint #2 in the review.

4. Intel Core i5-10600K: Based on Intel's Comet Lake, the Intel Core i5-10600K was launched in April 2020 for a starting price of 262 USD. This CPU was ranked #3 in the review.

5. Intel Core i5 9400F: Based on Intel's Coffee Lake, the Intel Core i5 9400F was launched in January 2019 for a starting price of 182 USD. This CPU was ranked #4 in the review.

Rank   CPU
1      Intel Core i9 10900K
2      Intel Core i9 9900K
2      Intel Core i7 97...

Outlook.com with 2fa and Microsoft Authenticator is broken!

If you have enabled 2FA on your Microsoft account, then enabled Microsoft Authenticator, and later uninstall the Authenticator app, you can no longer log in to your email account. You cannot ask for help, because that requires you to log in. You can do basically nothing.

The error: when you log in to your Microsoft account, you enter your password, then the login page prompts you to approve the login in the Authenticator app. Without the app, if you try to log in using your backup email or phone number, you are routed back to the enter-password screen. This loops endlessly.

I have written to Microsoft regarding this. Let me see if they reply. I will hopefully update this post soon.