• You might not have heard the term foundation model before, but you’ve almost certainly used one. In the context of LLMs (and AI in general), a “foundation model” is a model that has been trained on vast swathes of data, such that it can be used across a wide range of general tasks.

    LLM foundation models include OpenAI’s GPT series (e.g., GPT-3, GPT-4), Meta’s Llama series, and Anthropic’s Claude series. The release of a new foundation model is a major event since, as the name implies, these are general-purpose models that many LLM applications are built upon. Improved performance in a foundation model is thus akin to raising the floor upon which those applications stand.

    Moreover, foundation models are expensive to build and train in terms of computing resources, engineering manpower, and training time. As an example, OpenAI’s GPT-4 was state-of-the-art at the time of its release, boasting an estimated 1.8 trillion parameters. Training it reportedly cost an estimated 79 million USD and took several weeks, even with the compute power at OpenAI’s disposal.

    What differentiates foundation models?

    A foundation model’s performance can be measured by testing it on a variety of foundation model benchmarks. These are collections of varied tests in different fields that assess a model’s capabilities. In short, improvements across the board contribute to better foundation models.

    An increased parameter count is perhaps the most straightforward characteristic. Accompanying that, improvements to the model’s architecture can increase benchmark accuracy as well as efficiency.

    Increased quantity and quality of training data can have a positive effect as well. These days, that includes multimodal data, since foundation models can analyze images and audio in addition to text.

    Improved hardware and utilization of hardware can help decrease inference times, allowing the model to work faster.

  • RAG, or retrieval-augmented generation, is a technique that allows LLMs to access external sources of data. Normally, an LLM can rely only on the prompt fed to it and the knowledge baked into its parameters. RAG vastly expands an LLM’s capabilities: the model can incorporate information retrieved from external sources directly into its output. This proves to be a very versatile technique, allowing an LLM to draw on web search results, private documents, and more.

    How does RAG work?

    Fundamentally, RAG is rather straightforward. Once the system receives a prompt, we need to use that prompt to find relevant data from whatever source our RAG system is using. Once we have that, we narrow it down to the most relevant results, then allow the LLM to read that data along with the input prompt. I’ll go in-depth into how RAG works in a later post.
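    The retrieve-then-generate flow above can be sketched in a few lines of Python. Everything here is a simplified stand-in: the corpus is invented, retrieval is plain word overlap rather than vector embeddings, and the final LLM call is left out, with the assembled prompt as the end product.

```python
# Minimal RAG sketch. The corpus, the overlap-based scoring, and the
# prompt format are all illustrative stand-ins; a real system would use
# vector embeddings and send the prompt to an actual LLM.

def tokenize(text):
    # Crude normalization: lowercase and strip basic punctuation.
    return set(text.lower().replace("?", "").replace(".", "").split())

def retrieve(query, corpus, k=2):
    # Rank documents by how many query words they share; keep the top k.
    q = tokenize(query)
    return sorted(corpus, key=lambda doc: len(q & tokenize(doc)), reverse=True)[:k]

def build_prompt(query, docs):
    # Prepend the retrieved context so the LLM can ground its answer in it.
    context = "\n".join(f"- {d}" for d in docs)
    return f"Context:\n{context}\n\nQuestion: {query}"

corpus = [
    "The return policy allows refunds within 30 days.",
    "Shipping is free on orders over 50 dollars.",
    "Support hours are 9am to 5pm on weekdays.",
]
query = "What is the refund policy?"
prompt = build_prompt(query, retrieve(query, corpus))
```

    In a real system, `retrieve` would query a vector database and the prompt would be sent to an LLM, but the shape of the pipeline stays the same.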

    Why use RAG?

    Think of RAG like handing an LLM a textbook, or giving it access to Google. By backing the LLM up with a data source, the LLM can generate answers based on up-to-date data, as well as data from private/specialized sources. All this without retraining the model! Additionally, RAG reduces hallucinations, since the LLM has a repository of data to reference and rely on.

    Where is RAG used?

    Aside from search-backed LLM applications like Perplexity, RAG is used in areas that require the LLM to use private or specialized data.

    For instance, a customer support chatbot could utilize RAG to reference private company policy documents, and thus generate answers that accurately and specifically apply to that company.

    Similarly, RAG can be used in internal search tools, such as within companies and legal firms, to semantically sift through the large swathes of private data that may exist. The sky is really the limit when it comes to RAG: think of any use case where an LLM foundation model might not have sufficient knowledge, and you can probably use RAG to supercharge a foundation model for it!

  • What is Transfer Learning?

    Transfer learning is a machine learning technique where a model trained on one task is reused as a starting point for a different but related task. Instead of building and training a new model from scratch for every problem, transfer learning leverages the knowledge and features learned by a pre-trained model to accelerate and improve learning on the new task. This approach is especially valuable when the new task has limited labeled data, allowing models to adapt quickly and effectively by building on prior experience.

    How Transfer Learning Works

    The process typically begins with a pre-trained model that has learned generalizable features from a large dataset and task. In transfer learning, most of this model, including early layers that capture broad patterns, is usually kept unchanged or “frozen.” The final layers, which capture task-specific information, are then fine-tuned with new data for the target task. This fine-tuning adjusts the model’s parameters just enough to specialize it for the new application while retaining the foundational knowledge from the original training. Depending on the similarity and size of the new dataset, more or fewer layers may be retrained to balance adaptation and preservation of learned features.
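    As a toy illustration of the freeze-and-fine-tune recipe, here is a sketch in Python with NumPy, using invented data: a two-layer network’s first layer stands in for the pre-trained, frozen feature extractor, and only the final layer is updated on the new task.

```python
import numpy as np

# Toy transfer-learning sketch with invented data: W1 plays the role of
# a pre-trained feature extractor and stays frozen; only the task head
# W2 is fine-tuned on the "new task" by plain gradient descent.

rng = np.random.default_rng(0)

W1 = rng.normal(size=(4, 8))      # "pre-trained" early layer (frozen)
W2 = rng.normal(size=(8, 1))      # task-specific head (fine-tuned)

X = rng.normal(size=(32, 4))      # small new-task dataset
y = X @ rng.normal(size=(4, 1))   # synthetic regression targets

def features(x):
    # Frozen feature extractor: never updated during fine-tuning.
    return np.tanh(x @ W1)

def mse():
    return float(np.mean((features(X) @ W2 - y) ** 2))

init_loss = mse()
for _ in range(200):              # fine-tune only the head W2
    H = features(X)
    grad = H.T @ (H @ W2 - y) / len(X)   # MSE gradient (up to a constant)
    W2 -= 0.1 * grad
loss = mse()
```

    The same structure applies in real frameworks: mark early layers as non-trainable, then run the usual optimizer over only the remaining parameters.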

    Why Transfer Learning Matters

    Transfer learning offers key benefits such as improved efficiency, because it reduces the training time and computational resources needed compared to training from scratch. It also lowers data requirements, enabling effective learning even when labeled data are scarce. Additionally, by starting from a model with a solid base of learned representations, transfer learning often leads to better performance and generalization on the new task. These advantages make it a cost-effective and practical approach for deploying models in real-world scenarios where data and resources can be limited.

    How Transfer Learning is Used

    Transfer learning has become fundamental across multiple fields. In natural language processing (NLP), models like BERT and GPT are pre-trained on vast text corpora and then fine-tuned for tasks such as sentiment analysis, machine translation, or question answering. In computer vision, transfer learning is widely used to adapt pre-trained models like ResNet or VGG for image classification, object detection, and segmentation, which is even done in domains like medical imaging where data can be scarce! Beyond these, transfer learning finds applications in speech recognition, robotics, and even more specialized areas, enabling versatile and efficient adaptation of AI systems to diverse tasks.

  • LoRA in a Nutshell

    Low-Rank Adaptation, or LoRA, is a cutting-edge technique designed to fine-tune large language models (LLMs) efficiently and effectively. Instead of modifying the entire massive model, LoRA adapts just a small fraction of parameters through lightweight additions, enabling rapid specialization without retraining from scratch or requiring excessive computational resources.

    The Core Idea: Low-Rank Adaptation

    At its heart, LoRA takes advantage of the mathematical insight that the complex weight updates needed to fine-tune a model can be approximated by the product of two much smaller low-rank matrices. This decomposition drastically reduces the number of parameters that need to be adjusted. Essentially, LoRA freezes the original pre-trained model weights and introduces these smaller trainable matrices to capture the necessary changes, preserving the extensive knowledge already embedded in the base model.

    How It Works in Practice

    Practically, LoRA inserts these low-rank matrices into each layer of the model, which are then trained on the new, task-specific data. During fine-tuning, only these added matrices are updated, while the original weights remain untouched. Once training completes, the adjustments from the low-rank matrices are combined with the original model during inference, allowing for rapid adaptation with minimal computational overhead. This modular approach also permits multiple task-specific LoRA adapters to coexist, each tailored for different applications, without duplicating the entire model.
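    The low-rank trick can be shown in a few lines of NumPy. This is an illustrative sketch with made-up dimensions, not a real training loop: it just shows the frozen weight matrix, the two small trainable factors, and the parameter savings.

```python
import numpy as np

# Illustrative LoRA sketch with made-up sizes (not a training loop).
# A full d x k weight update is approximated by two low-rank factors
# B (d x r) and A (r x k), so only r * (d + k) values are trainable.

rng = np.random.default_rng(0)
d, k, r = 64, 64, 4

W = rng.normal(size=(d, k))         # frozen pre-trained weights
B = np.zeros((d, r))                # B starts at zero, so at initialization
A = rng.normal(size=(r, k)) * 0.01  # the adapted model equals the base model

def adapted_forward(x, alpha=1.0):
    # Effective weight is W + (alpha / r) * B @ A; W itself never changes.
    return x @ (W + (alpha / r) * (B @ A)).T

full_params = d * k          # parameters in a full update: 4096
lora_params = r * (d + k)    # trainable LoRA parameters: 512
```

    Even at this toy scale, a 4096-parameter update is captured by 512 trainable parameters; at LLM scale the savings are far more dramatic.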

    Why It Matters

    LoRA brings significant advantages to the fine-tuning landscape for large language models. It substantially reduces the computational cost and memory footprint, speeding up training times and making fine-tuning accessible even on more modest hardware. By preserving the base model’s original knowledge, LoRA helps prevent issues like catastrophic forgetting, where models lose valuable general knowledge when fine-tuned extensively. Moreover, its efficiency enables scalable deployment, letting organizations adapt a single large model across many specialized tasks cost-effectively. This balance of economy, performance, and flexibility is why LoRA is increasingly becoming a standard approach for adapting powerful LLMs to specific, real-world needs.

  • With all the buzz around OpenAI’s new Sora 2 video generation model, you might be wondering what makes it different from other previous SOTA models like Veo 3. Here’s the breakdown.

    Visual fidelity:

    Sora 2 has made improvements in visual fidelity, generating frames natively at 720p and then upscaling them, which maintains sharp textures and object edges.

    Object permanence has also been improved upon, thanks to incorporating Long Context Tuning research into the model’s architecture, allowing it to “remember” entities across cuts.

    Fluid graphics have also been improved upon, partly thanks to improvements in the model’s understanding of physics.

    Physics:

    One of Sora 2’s biggest improvements is in its understanding of physics. This is largely due to incorporating a differentiable physics engine within the generative loop, allowing real-world dynamics to be learned. Combined with a “referee model” that spots physics errors and feeds them back into training, Sora 2 has an unprecedented level of quality when it comes to modeling dynamic processes and events.

    Audio:

    I think this is Sora 2’s biggest improvement, along with physics. Sora 2 tightly couples audio with video, even baking audio spectrograms into a shared latent space with that of video. This allows for realistic, layered audio with excellent synchronization with the video. Compared to other generative models, audio in Sora 2 feels much less like an afterthought.

    Social interactions and virality:

    Sora 2’s Cameo collaboration system allows users to insert their own likeness and voice into generated videos, encouraging personalized memes, reaction videos, and branded messages. While there are concerns regarding safeguarding identity, Sora places the owner of the likeness in control of their “cameo’s” usage. Combined with the Sora app that OpenAI has released, Sora 2 seems poised to encourage social interactions.

  • Core Building Blocks

    • Goal and constraints
      A clear objective with constraints (time, cost, permissions) gives the agent boundaries. Strong prompt design or a formal task schema helps the system reason cleanly about tradeoffs.
    • Tool use
      Agents gain leverage by using tools like retrieval for background knowledge, code interpreters for precise computation, browsing for fresh information, and domain APIs. Tool outputs can become new context for the next decision.
    • Planning and decomposition
      Even simple models can accomplish more with explicit step-by-step plans. More advanced setups use dedicated “planner” components or planning prompts to structure work, track subgoals, and branch on contingencies.
    • Memory
      Short-term memory holds the current plan and intermediate results. Long-term memory stores reusable facts, learned preferences, past resolutions, and artifacts. Good memory design can reduce repetition and improve reliability.
    • Reflection and self-critique
      Reflection prompts or separate models acting as a critic can help agents catch mistakes, validate assumptions, and refine outputs. This can be as light as sanity checks or as heavy as unit tests and formal validations.
    • Safety and governance
      Policies, permissioning, rate limits, and human-in-the-loop checkpoints can be used to ensure the agent only acts within authorized scopes. Observability (logs, traces, action histories) is crucial for debugging and accountability.
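    Tool use in particular is easy to sketch. The snippet below (Python, with invented tool names) shows the common registry-and-dispatch pattern: the agent picks a tool by name, the result comes back as text, and that text can become context for the next decision.

```python
# Hypothetical tool-registry sketch; the tools and their names are
# invented for illustration, not taken from any real agent framework.

def calculator(expression):
    # A real agent would sandbox this properly; restricted eval is
    # used here purely for illustration.
    return str(eval(expression, {"__builtins__": {}}))

def lookup(term):
    # Stand-in for retrieval against a knowledge base.
    kb = {"LoRA": "low-rank adaptation"}
    return kb.get(term, "unknown")

TOOLS = {"calculator": calculator, "lookup": lookup}

def call_tool(name, argument):
    # Dispatch a requested action to the matching tool, or report failure
    # so the agent can replan instead of crashing.
    if name not in TOOLS:
        return f"error: no tool named {name}"
    return TOOLS[name](argument)

result = call_tool("calculator", "2 + 3 * 4")
```

    Returning errors as ordinary text, rather than raising, lets the agent treat a failed tool call as just another observation to react to.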

    Why Agentic AI?

    • Autonomy and efficiency
      Agents can handle multi-step tasks end-to-end, reducing human orchestration. They can run overnight research, triage tickets, and generate drafts without constant supervision.
    • Tool-augmented competence
      By calling calculators, compilers, search, and specialized APIs, agents sidestep LLM weaknesses and lean on systems designed for correctness and speed.
    • Adaptivity
      Unlike static workflows, agents react to failures, missing data, or changing requirements. They can adjust plans, try alternatives, and escalate when needed.
    • Reuse and scale
      Encapsulating workflows as policies and tools lets organizations scale patterns across teams and domains. Agents become templates for repeatable tasks.
  • Agentic AI refers to AI systems that don’t just predict the next token or classify inputs – they perceive, plan, and act to achieve goals over time. Instead of passively answering questions, agentic systems take initiative: they break down objectives into steps, call the right tools and services, monitor progress, adapt to feedback, and iterate until they succeed or fail safely.

    At a high level, an agentic AI system has three pillars: the ability to understand its environment, the ability to decide what to do next, and the ability to take actions that change the world or the task state. Wrapped around that is a feedback loop that lets it evaluate results and improve its next move.

    How Agentic AI Works

    A simple way to frame agentic systems is as a loop:

    Perceive

    The system gathers context from the user, tools, documents, APIs, or sensors. This includes reading instructions, inspecting current task state, and checking constraints like budgets, deadlines, or policies.

    Plan

    The system creates a task plan: it decomposes the goal into steps, orders them, assigns tools, and sets criteria for success. Plans can be explicit (a written checklist) or implicit (kept in hidden state), but the key is that the model is preparing to act, not just to answer.

    Act

    The agent executes steps. Actions can include:

            Calling external tools or APIs (search, databases, code execution, email, calendar, CI/CD)

            Reading and writing files

            Running simulations or tests

            Interacting with software systems (browsers, terminals, apps)

    Reflect

    After each action, the agent evaluates outcomes against the plan. Did the tool call succeed? Did the result match the criteria? If not, it revises its approach, updates the plan, or asks the user for clarification.

    Iterate

    The loop continues until criteria are met, time or budget is exhausted, or the agent decides to escalate to a human or stop.

    In practice, well-engineered agentic systems add scaffolding around this loop: memory to retain relevant facts and decisions, guardrails to enforce safety and policy, scheduling to handle long-running tasks, and monitoring to prevent runaway behavior.
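    The perceive-plan-act-reflect loop above can be sketched as a toy Python program. The “task” here is deliberately trivial (count up to a goal) and each stage is an ordinary function; in a real agent, planning would be done by an LLM and actions would call external tools.

```python
# Toy agentic loop: perceive -> plan -> act -> reflect, with a step
# budget as a simple guardrail. The task and stages are invented stand-ins.

def perceive(state):
    # Gather the current task state and constraints.
    return {"value": state["value"], "goal": state["goal"]}

def plan(observation):
    # Decide the next action based on the gap to the goal.
    return "increment" if observation["value"] < observation["goal"] else "stop"

def act(state, action):
    # Execute the chosen action, mutating task state.
    if action == "increment":
        state["value"] += 1
    return state

def reflect(state):
    # Success criterion: has the goal been reached?
    return state["value"] >= state["goal"]

state = {"value": 0, "goal": 5}
for step in range(100):          # budget guardrail against runaway loops
    action = plan(perceive(state))
    if action == "stop":
        break
    state = act(state, action)
    if reflect(state):
        break
```

    The scaffolding a production system adds (memory, guardrails, monitoring) wraps around exactly this loop without changing its shape.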

  • RISC, or Reduced Instruction Set Computer, is a processor architecture philosophy built around a small set of simple instructions, each of which executes quickly and predictably. In this project, I built a 16-bit processor with SystemVerilog and ran it on a Spartan-7 FPGA. The processor was built to follow a subset of the SLC-3 RISC ISA.

    A hardware description language (HDL) such as SystemVerilog describes digital logic that can be synthesized onto an FPGA. I used it to create efficient modules that functioned as core processor components, such as the ALU and control unit.

    Along with the processor, I implemented its memory subsystem, backed by memory-mapped I/O (MMIO) to the FPGA’s on-chip BRAM, which proved sufficient for the project. Connected via the main bus, read and write access was controlled through the processor’s control registers. Within the processor itself, I implemented modules including the ALU, register file, PC unit, and control unit.

    The control unit in particular was rather complex, with a finite state machine spanning several dozen states. It was a challenge to carve out a particular subset of the ISA while still maintaining the desired functionality.

  • Raycasting is a computer graphics technique that projects rays from a viewpoint through each pixel (or screen column) of a 2D screen into a 3D scene. The distance each ray travels before hitting geometry can be used to quickly calculate pixel color and wall-segment sizes, resulting in a render of 3D objects.
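    The core idea can be sketched in software (Python here, though my implementation was in hardware): march a ray through a 2D grid map until it hits a wall, then turn the hit distance into an on-screen column height. The map and constants below are invented for illustration.

```python
import math

# Toy raycaster sketch: step a ray from the viewpoint through a closed
# 2D grid map until it reaches a wall cell, then project the distance
# into a wall-column height. Map and constants are illustrative only.

MAP = [
    "#####",
    "#...#",
    "#...#",
    "#####",
]

def cast_ray(x, y, angle, step=0.01, max_dist=10.0):
    """Return the distance to the first wall cell along the ray."""
    dx, dy = math.cos(angle), math.sin(angle)
    dist = 0.0
    while dist < max_dist:
        cx, cy = int(x + dx * dist), int(y + dy * dist)
        if MAP[cy][cx] == "#":          # map is closed, so rays always hit
            return dist
        dist += step
    return max_dist

def column_height(dist, screen_height=200):
    """Nearer walls produce taller columns (simple inverse projection)."""
    return min(screen_height, int(screen_height / max(dist, 1e-6)))

d = cast_ray(2.0, 2.0, 0.0)   # looking toward +x from inside the room
h = column_height(d)
```

    A hardware implementation pipelines the same math per screen column; fixed step marching here stands in for the faster grid-stepping (DDA) used in practice.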

    I built a raycasting engine using SystemVerilog and ran it on a Spartan-7 FPGA (Field-Programmable Gate Array). A hardware description language (HDL) such as SystemVerilog describes digital logic that can be synthesized onto an FPGA; I used it to create efficient modules that worked together to process I/O and run the raycasting engine itself.

    The heart of the engine was a MicroBlaze soft-processor IP, loaded with custom C code via Xilinx Vitis. I then used several custom hardware modules to feed keyboard input from a MAX3421E USB controller into the processor, and to smoothly output the generated video data. These modules included a double frame buffer, a VGA-to-HDMI converter, and a module to render the generated image segments.

    SPI (Serial Peripheral Interface), UART (Universal Asynchronous Receiver/Transmitter), I2C (Inter-Integrated Circuit), and AXI (Advanced eXtensible Interface) communication protocols were critical to this project. AXI was used to connect IP modules, UART for the keyboard input, SPI for communication with the MicroBlaze processor, and I2C for components such as the frame buffer. This project helped develop my skills regarding HDL design decisions, particularly with respect to communication protocols.

    I also gained familiarity with SystemVerilog and Xilinx Vivado, including debugging with an ILA core and the Debug Wizard.

    Source code can be found at https://github.com/AdityaVersion34/ECE385_FinalProject/tree/main/raycaster_final_v1

  • Earlier this year, two friends and I set out to build a UNIX-based OS for a RISC-V processor. Dividing the work among us, I ended up primarily responsible for the operating system’s file system.

    The file system I designed and built was based on the ext2 file system. Introduced in 1993, ext2 (second extended file system) was a widely used Linux file system until roughly the year 2000, after which it was gradually replaced by its successors.

    Ext2 organizes the disk into block groups to minimize fragmentation. Files within those groups are described by index nodes, or “inodes”, which organize, point to, and relate a file’s data blocks, both directly and through layers of indirect blocks. To support the file system, I also created a write-through cache to allow for efficient reads and writes.
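    The pointer hierarchy can be illustrated with a small sketch (Python, with simplified constants) that classifies a file’s logical block index by the level of indirection needed to reach it, mirroring how an ext2 inode’s 12 direct pointers give way to single-, double-, and triple-indirect blocks:

```python
# Illustrative ext2-style block lookup. Constants are simplified:
# real ext2 has 12 direct pointers, and the fan-out depends on the
# block size (256 here corresponds to 1 KiB blocks / 4-byte pointers).

DIRECT = 12          # direct block pointers stored in the inode itself
FANOUT = 256         # pointers held by one indirect block

def resolve(block_index):
    """Classify a logical block index by the pointer level that reaches it."""
    if block_index < DIRECT:
        return ("direct", block_index)
    block_index -= DIRECT
    if block_index < FANOUT:
        return ("single-indirect", block_index)
    block_index -= FANOUT
    if block_index < FANOUT * FANOUT:
        # Which single-indirect block, and which slot within it.
        return ("double-indirect", divmod(block_index, FANOUT))
    block_index -= FANOUT * FANOUT
    return ("triple-indirect", block_index)
```

    A real implementation then reads each indirect block off disk at every level, which is exactly where a cache like the one I built pays off.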

    Due to the complex nature of the file system, its implementation alone ended up spanning almost 2000 lines of code. Combined with the intricacies of the rest of the operating system, and the fact that the file system was the backbone for the actual execution of processes, this meant that the file system’s reliability was of the utmost importance.

    Though I knew this, robustly testing the file system to the extent of use it would see in practice proved a challenge nonetheless. In a few instances, what I had thought was complete testing turned out to have gaps. Problematically, these were only discovered after painfully debugging the whole system, since the issues appeared to come from other components. Working on the file system thus greatly improved my skills when it came to testing, maintaining, and documenting sizable codebases.