• RISC, or Reduced Instruction Set Computer, is a processor architecture built around a small set of simple, granular instructions that can execute quickly. In this project, I built a 16-bit processor in SystemVerilog and ran it on a Spartan-7 FPGA. The processor was built to follow a subset of the SLC-3 RISC ISA.

    A hardware description language (HDL) such as SystemVerilog describes programmable hardware that can be synthesized onto an FPGA. I used it to create efficient modules that functioned as core processor components, such as the ALU and control unit.

    Along with the processor, I implemented its memory subsystem, accessed through memory-mapped I/O (MMIO). The memory itself was the FPGA’s on-chip BRAM, which proved sufficient for the project. It was connected via the main bus, with read and write access governed by the processor’s control registers. Within the processor itself, I implemented modules including the ALU, register file, PC unit, and control unit.

    The control unit in particular was rather complex, with a finite state machine spanning several dozen states. Implementing only a subset of the ISA while preserving the desired functionality was a challenge.

  • Raycasting is a computer graphics technique that involves projecting rays from a viewpoint through each pixel of a 2D screen into a 3D scene. This can be used to quickly calculate pixel color and segment sizes, resulting in a render of 3D objects.

    I built a raycasting engine using SystemVerilog and ran it on a Spartan-7 FPGA (Field-Programmable Gate Array). A hardware description language (HDL) such as SystemVerilog describes programmable hardware that can be synthesized onto an FPGA. I used it to create efficient modules that worked together to process I/O and run the raycasting engine itself.

    The heart of the engine was a MicroBlaze processor IP, loaded with custom C code via Xilinx Vitis HLS. I then used several custom hardware modules to feed keyboard input from a MAX3421E chip into the processor, and to smoothly output the generated video data. These modules included a double frame buffer, a VGA-to-HDMI converter, and a module to render the generated image segments.

    SPI (Serial Peripheral Interface), UART (Universal Asynchronous Receiver/Transmitter), I2C (Inter-Integrated Circuit), and AXI (Advanced eXtensible Interface) communication protocols were critical to this project. AXI was used to connect IP modules, UART for the keyboard input, SPI for communication with the MicroBlaze processor, and I2C for components such as the frame buffer. This project helped develop my skills regarding HDL design decisions, particularly with respect to communication protocols.

    I also gained familiarity with SystemVerilog and Xilinx Vivado, including debugging with an ILA core and the Debug Wizard.

    Source code can be found at https://github.com/AdityaVersion34/ECE385_FinalProject/tree/main/raycaster_final_v1

  • Earlier this year, two friends and I set out to build a UNIX-based OS for a RISC-V processor. Dividing the work among us left me primarily responsible for the operating system’s file system.

    The file system I designed and built was based on the ext2 file system. Introduced in 1993, ext2 (second extended file system) was a widely used Linux file system until roughly the year 2000, after which it was replaced by its successors.

    Ext2 organizes the disk into block groups to minimize fragmentation. Within these groups, files are described by index nodes, or “inodes”, which organize and point to the data blocks holding each file’s contents. To support the file system, I also created a write-through cache to allow for efficient reads and writes.
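    As a rough illustration of the inode idea, here is a minimal Python sketch of mapping a byte offset within a file to a disk block through an inode’s direct and single-indirect pointers. The block size, pointer counts, and helper functions are illustrative placeholders; the real implementation lived in the kernel, not Python.

    ```python
    BLOCK_SIZE = 1024
    PTRS_PER_BLOCK = BLOCK_SIZE // 4   # each block pointer is 4 bytes
    NUM_DIRECT = 12                    # direct pointers stored in the inode itself

    def read_block(block_no: int) -> bytes:
        # Placeholder for a real disk or cache read.
        return bytes(BLOCK_SIZE)

    def pointer_at(block: bytes, index: int) -> int:
        return int.from_bytes(block[index * 4:(index + 1) * 4], "little")

    def block_for_offset(direct_ptrs: list[int], single_indirect_ptr: int, offset: int) -> int:
        """Map a byte offset within a file to the disk block that holds it."""
        logical = offset // BLOCK_SIZE
        if logical < NUM_DIRECT:
            return direct_ptrs[logical]                 # direct pointer in the inode
        logical -= NUM_DIRECT
        if logical < PTRS_PER_BLOCK:
            indirect = read_block(single_indirect_ptr)  # block containing more pointers
            return pointer_at(indirect, logical)        # single-indirect pointer
        raise ValueError("offset would require double/triple indirect blocks")

    print(block_for_offset(list(range(100, 112)), 500, 5 * BLOCK_SIZE))  # -> 105
    ```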

    Due to the complex nature of the file system, its implementation alone ended up spanning almost 2000 lines of code. That complexity, combined with the intricacies of the rest of the operating system and the fact that the file system was the backbone for the actual execution of processes, meant that its reliability was of the utmost importance.

    Though I knew this, testing the file system as robustly as its practical use demanded proved a challenge nonetheless. In a few instances, what I thought was complete testing turned out to have gaps in it. Problematically, these gaps were only discovered after painfully debugging the whole system, since the issues appeared to come from other components. Working on the file system thus greatly improved my skills when it came to testing, maintaining, and documenting sizable codebases.

  • Try it out now: https://aditya-rag-app.streamlit.app/

    As LLMs have become more ubiquitous, tools to build with them have evolved as well. In fact, they’ve improved to the point that even apps like Perplexity no longer seem daunting to build at a smaller scale.

    To learn more about building with LLMs, I created a RAG (Retrieval-Augmented Generation) chatbot that integrates real-time web search information to provide up-to-date and relevant answers. This operates similarly to many popular large-scale LLM chat applications today.

    The first consideration in this project was deciding which framework to use when working with LLMs. The two popular choices are LangChain and LlamaIndex. While LlamaIndex is well-known for its specialty in RAG, I chose LangChain because of its flexibility in various aspects – such as workflows and data loaders – thanks to its modular design.

    After finishing an initial prototype, however, I faced issues with LangChain. Primarily, I couldn’t enforce a strict workflow for the LLM system. For instance, system prompts would often fail to register, making the overall app unreliable. Moreover, many LangChain features were being deprecated in favor of LangGraph, LangChain’s new framework for complex and dynamic agentic workflows. This led me to rewrite the application to use LangGraph instead.

    Although LangGraph’s graph-based architecture initially posed a slight learning curve, it soon became apparent how much more powerful it was. While LangChain is sequential in nature – “chaining” together runnable components – LangGraph allows for non-linear workflows with conditional logic. This made the agentic approach to the system much more reliable and scalable. LangGraph’s memory persistence via checkpointers also made memory management much more straightforward, and provided a useful abstraction layer compared to LangChain’s many redundant and incompatible memory types.
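    To make this concrete, here is a minimal sketch of the kind of LangGraph workflow described above: a conditional entry point decides whether to run a web search before answering, and a checkpointer provides per-thread conversation memory. The node bodies and the routing heuristic are placeholders, not the app’s actual logic.

    ```python
    from typing import TypedDict

    from langgraph.graph import StateGraph, END
    from langgraph.checkpoint.memory import MemorySaver

    class ChatState(TypedDict):
        question: str
        context: str
        answer: str

    def search_web(state: ChatState) -> dict:
        # Placeholder: call a web-search tool and store its results.
        return {"context": f"search results for: {state['question']}"}

    def generate(state: ChatState) -> dict:
        # Placeholder: call the LLM with the question and any retrieved context.
        return {"answer": f"answer grounded in: {state.get('context', 'model knowledge')}"}

    def route(state: ChatState) -> str:
        # Conditional logic: only search when the question looks time-sensitive.
        return "search" if "latest" in state["question"].lower() else "generate"

    builder = StateGraph(ChatState)
    builder.add_node("search", search_web)
    builder.add_node("generate", generate)
    builder.set_conditional_entry_point(route, {"search": "search", "generate": "generate"})
    builder.add_edge("search", "generate")
    builder.add_edge("generate", END)

    # The checkpointer persists state per thread_id, giving conversation memory.
    graph = builder.compile(checkpointer=MemorySaver())
    result = graph.invoke(
        {"question": "What are the latest LLM releases?"},
        config={"configurable": {"thread_id": "demo"}},
    )
    print(result["answer"])
    ```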

    When it came to putting together a frontend for the application, Streamlit was my first choice. It was relatively easy to build, and deployment was a smooth process!

  • With the recent release of Genie 3, attention has once again been brought to world models. These are generative models trained to simulate and model a (not necessarily real) world! Most impressively, these are real-time simulations, sensitive to input. World models are able to understand the properties of the simulated world – such as forces, motion, and spatial and temporal relations – and aim to apply them by simulating a coherent world.

    How they work

    A popular formulation breaks world models down into three components.

    The vision model component encodes each video frame into a lower-dimensional latent representation, giving a compressed description of the world. This component also ensures that the latent representation can be decoded back into the appropriate video frame.

    The memory model component uses RNNs (Recurrent Neural Networks) to process the temporal sequence of latent representations and predict the next one. Since RNNs are already commonly used for temporal predictions, they fit right into this use case! This component also takes the previous control action as an input to its prediction, which brings us to…

    The control component. This unit is responsible for choosing the next action, and therefore for how the simulated world evolves. While traditionally this was a learned neural network policy, incorporating human input into this component allows a user to control the simulation. This is all quite a simplified explanation, but it gives a good idea of how world models work.
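    The sketch below ties the three components together in a toy Python loop. The VisionModel, MemoryModel, and Controller classes are hypothetical stand-ins (think VAE, RNN, and a small policy or the user’s input), not any real library’s API.

    ```python
    import numpy as np

    class VisionModel:
        """Stand-in for a VAE: compresses frames to latents and decodes them back."""
        def __init__(self, latent_dim=32):
            self.latent_dim = latent_dim

        def encode(self, frame):
            return np.resize(frame, self.latent_dim)

        def decode(self, z):
            return np.resize(z, (64, 64))

    class MemoryModel:
        """Stand-in for an RNN that predicts the next latent from (latent, action, hidden state)."""
        def predict(self, z, action, hidden):
            hidden = z if hidden is None else 0.9 * hidden + 0.1 * z
            return hidden + 0.01 * action.mean(), hidden

    class Controller:
        """Stand-in policy: in an interactive world model, user input enters here."""
        def act(self, z, hidden):
            return np.tanh(z[:4])  # pretend 4-dimensional action

    def simulate(vision, memory, controller, first_frame, steps=10):
        z, hidden, frames = vision.encode(first_frame), None, []
        for _ in range(steps):
            action = controller.act(z, hidden)             # control: choose the next action
            z, hidden = memory.predict(z, action, hidden)  # memory: predict the next latent
            frames.append(vision.decode(z))                # vision: render it back to pixels
        return frames

    frames = simulate(VisionModel(), MemoryModel(), Controller(), np.zeros((64, 64)))
    print(len(frames), frames[0].shape)
    ```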

    Why world models?

    World models are highly valuable because they enable AI systems to simulate and understand complex, dynamic environments before acting in them. This capability drives applications across robotics, autonomous vehicles, and more. For example, robots can learn spatial awareness and plan multi-step tasks safely in simulations, reducing costly real-world trials. Autonomous vehicles use world models to train safely in diverse traffic, weather, and pedestrian scenarios that might be difficult to encounter consistently in reality. Beyond training, world models support better decision-making and safety by predicting future states and outcomes in real time. They also accelerate learning efficiency and task generalization, empowering AI to handle new and complex challenges with flexibility.

    Genie 3

    Google’s recent release of Genie 3 marks a significant step forward in world model capabilities. Genie 3 generates interactive, 3D virtual worlds in real-time, powered by a foundation model trained to simulate diverse environments accurately and responsively. What sets Genie 3 apart is its ability to maintain logical consistency and physical realism over extended interactions without relying on hard-coded physics engines. This results in an interaction horizon lasting several minutes! Through its short-term memory, it remembers past events and actions to sustain coherent experiences. Users can guide these worlds with text prompts or direct actions, creating a fluid, explorable simulation that blends imagination and grounded world understanding.

  • Have you seen this video?

    “This is gonna be scariest sound you’ll hear when they’re looking for you”

    “This is almost like two R2-D2’s having a conversation.”

    “Literally sounds like something out of a sci-fi horror with the AI looking for you hiding in the cupboard lol”

    What’s going on? Is Skynet upon us? What is Gibberlink mode?

    Gibberlink Mode

    Gibberlink mode is an AI communication protocol that allows two AI agents to communicate via a sound-based language optimized for inter-machine communications. Once two AI voice agents realize they’re in a conversation, they can switch to communicating via Gibberlink mode.

    This enables them to transmit and receive data via the GGWave protocol, which encodes digital data into bursts of sound. This compacts the data, making it an efficient communication method.

    Some mechanics the protocol uses include:

    • Frequency division multiplexing: Using multiple carrier frequencies at once, boosting throughput and signal integrity (a rough sketch follows this list).
    • Packet structure: Data bursts form packets, containing a header, payload, and end marker.
    • Encryption: Though Gibberlink data on its own is unencrypted (any listening device can decode it), some promising results demonstrate that AI voice agents can learn to encrypt data via public keys and derived secret keys.
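    Here is a toy sketch of the sound-encoding idea: each byte’s two nibbles are sent simultaneously on two frequency bands (a crude form of frequency-division multiplexing), and the payload is wrapped in a header and end marker. All frequencies, timings, and marker bytes are illustrative values, not GGWave’s actual parameters.

    ```python
    import numpy as np

    SAMPLE_RATE = 48_000
    TONE_SECONDS = 0.05
    BASE_FREQ = 1_875.0   # hypothetical lowest carrier
    FREQ_STEP = 46.875    # hypothetical spacing between the 16 tones of a band

    def byte_burst(byte: int) -> np.ndarray:
        """Render one byte as two simultaneous tones, one per nibble."""
        t = np.arange(int(SAMPLE_RATE * TONE_SECONDS)) / SAMPLE_RATE
        low = np.sin(2 * np.pi * (BASE_FREQ + (byte & 0x0F) * FREQ_STEP) * t)
        high = np.sin(2 * np.pi * (BASE_FREQ + (16 + (byte >> 4)) * FREQ_STEP) * t)
        return 0.5 * (low + high)

    def encode_packet(payload: bytes) -> np.ndarray:
        """Wrap the payload in a simple header and end marker, then render it as audio."""
        packet = b"\x01" + bytes([len(payload)]) + payload + b"\x04"  # start, length, data, end
        return np.concatenate([byte_burst(b) for b in packet])

    audio = encode_packet(b"hello")
    print(f"{audio.size / SAMPLE_RATE:.2f} seconds of audio")
    ```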

    Why Gibberlink?

    Created by Anton Pidkuiko and Boris Starkov, and demonstrated via this viral video, Gibberlink won the global top prize at the Worldwide Hackathon hosted by ElevenLabs, a leading speech synthesis company. Despite its playful appearance, the technology is seeing steady adoption among voice agents. Gibberlink’s upsides include:

    • Speed: Up to 80% faster than spoken language.
    • Efficiency: Reduced computational load, by representing data efficiently and avoiding full NLP for communication.
    • Security: Harder to intercept than spoken language.

  • Why Do LLMs Hallucinate?

    LLM hallucinations stem from several inherent factors tied to how these models are developed and operate:

    • Limitations in Training Data: LLMs learn from vast datasets, but these datasets can be incomplete, outdated, or biased. Missing information, errors in the data, or skewed representations can lead the model to generate inaccurate or misleading content.
    • Probabilistic Text Generation: LLMs generate text by predicting the most likely next word based on patterns learned during training. However, they do not possess true fact-checking capabilities. This probabilistic nature means they can produce plausible-sounding but incorrect information.
    • Ambiguous or Poorly Phrased Prompts: When user input is vague or unclear, the model struggles to interpret intent precisely. This uncertainty can cause it to fill gaps with invented or unrelated details, resulting in hallucinations.
    • Architectural and Optimization Factors: Certain design choices in model architecture and optimization techniques can impact how well the model balances creativity and accuracy, influencing hallucination rates.
    • Randomness in Generation Processes: Elements like temperature settings introduce randomness to encourage diverse outputs, but this can sometimes cause the model to produce unexpected or erroneous content (see the sketch after this list).
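    The toy example below shows the temperature effect from the last point: the same logits sampled at higher temperatures produce more low-probability (and potentially wrong) picks. The vocabulary and logits are made up for illustration.

    ```python
    import numpy as np

    def sample(logits, temperature, rng):
        # Temperature-scaled softmax, then sample a token index.
        scaled = np.asarray(logits) / temperature
        probs = np.exp(scaled - scaled.max())
        probs /= probs.sum()
        return rng.choice(len(probs), p=probs)

    vocab = ["Paris", "Lyon", "Berlin", "banana"]   # pretend next-token candidates
    logits = [5.0, 2.0, 1.0, -1.0]                  # the model strongly favors "Paris"
    rng = np.random.default_rng(0)

    for temperature in (0.2, 1.0, 2.0):
        picks = [vocab[sample(logits, temperature, rng)] for _ in range(1000)]
        print(temperature, {word: picks.count(word) for word in vocab})
    ```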

    Approaches to Mitigate Hallucinations

    While hallucinations cannot be entirely eliminated, various strategies help reduce their frequency and impact:

    • Improving Training Data Quality: Curating high-quality, comprehensive, and up-to-date datasets helps models learn more accurate and relevant information.
    • Retrieval-Augmented Generation: Integrating external knowledge sources or real-time databases allows the model to ground its responses in verifiable facts, reducing fabrication (sketched after this list).
    • Prompt Engineering: Crafting clear, specific, and well-structured prompts minimizes ambiguity and guides the model toward more accurate answers.
    • Post-Processing and Fact-Checking: Applying automated or human-in-the-loop verification processes after generation can identify and correct hallucinated content before it reaches users.
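    Here is a bare-bones sketch of the retrieval-augmented idea above: score candidate documents against the question, pick the best match, and ground the prompt in it before calling the model. A real system would use embedding vectors and an actual LLM call rather than word overlap and a print statement.

    ```python
    from collections import Counter

    DOCS = [
        "The Eiffel Tower was completed in 1889 and is 330 metres tall.",
        "Mount Everest is the highest mountain above sea level.",
    ]

    def overlap_score(question: str, doc: str) -> int:
        # Toy relevance score: count shared words between question and document.
        q, d = Counter(question.lower().split()), Counter(doc.lower().split())
        return sum((q & d).values())

    def grounded_prompt(question: str) -> str:
        best = max(DOCS, key=lambda doc: overlap_score(question, doc))
        return f"Answer using only this context:\n{best}\n\nQuestion: {question}"

    print(grounded_prompt("How tall is the Eiffel Tower?"))
    ```
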
  • What Are LLM Hallucinations?

    When it comes to LLMs, “hallucinations” refer to instances where the model generates information that is inaccurate, irrelevant, or entirely fabricated. The term is metaphorical, borrowing from human experiences of perceiving things that aren’t real, to describe how an AI model can produce outputs that appear plausible and convincing but are fundamentally false or misleading. These hallucinations pose a significant challenge because they can undermine trust and limit the usefulness of LLMs in practical applications.

    Types of LLM Hallucinations

    Hallucinations in LLMs manifest in several distinct forms:

    • Factual Inaccuracies: These occur when the model provides information that is simply wrong or misleading. Despite drawing from extensive training data, LLMs can mix facts incorrectly, invent dates, misattribute quotes, or otherwise present erroneous content as truth.
    • Nonsensical Responses: Sometimes the output lacks logical coherence—sentences or paragraphs may be grammatically correct yet make no real sense, fail to connect ideas meaningfully, or veer into absurdity without clear reason.
    • Contradictions: An LLM may produce conflicting statements either within a single response or between its response and the input prompt. This inconsistency can confuse users and reduce confidence in the model’s reasoning.
    • Irrelevant or Off-Topic Content: The model might wander away from the subject at hand, introducing information or tangents that have little or no connection to the user’s query or the surrounding context. This distracts from the conversation’s purpose and reduces clarity.

  • Common Mistakes in Evaluation

    One frequent error is over-reliance on a single metric, which fails to capture the multidimensional nature of language tasks. Using outdated benchmarks can misrepresent modern model abilities or ignore emerging challenges. Data leakage—where test data overlaps with training data—can artificially inflate scores and mislead evaluations.

    Best Practices

    Combining human and automatic evaluation leverages the speed of algorithms and the insightfulness of human judgment. Regularly updating benchmarks ensures evaluations remain relevant amid rapid LLM advancements. Robustness testing against adversarial inputs and real-world scenarios helps assess how models perform outside controlled environments.

  • Benchmarks

    Benchmarks provide standardized datasets and tasks to compare model performance. Popular benchmarks for LLMs include GLUE and SuperGLUE for language understanding, SQuAD for question answering, as well as more specialized domain tests focused on coding ability or multilingual competence. These benchmarks help track progress and identify gaps across diverse challenges.

    Core Automatic Metrics

    Common quantitative metrics include perplexity, which measures how well a model predicts text; accuracy and F1 score for classification-type tasks; and BLEU and ROUGE for evaluating text similarity in translation or summarization tasks. These metrics offer objective, reproducible ways to gauge model capability on discrete aspects of language.
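    As a quick illustration of perplexity mentioned above: it is the exponential of the average negative log-probability the model assigns to each true next token, so lower is better. The token probabilities below are made-up numbers.

    ```python
    import math

    # Probability the model assigned to each actual next token in a short passage.
    token_probs = [0.40, 0.10, 0.25, 0.05]

    avg_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    perplexity = math.exp(avg_nll)
    print(f"perplexity = {perplexity:.2f}")  # lower means the model predicts the text better
    ```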

    Limitations of Metrics

    While useful, automatic metrics have blind spots. They often miss subtleties like contextual appropriateness, creativity, or ethical risks. Some metrics can be gamed by models optimizing for score rather than quality, leading to misleading conclusions. Therefore, relying solely on metrics without complementary evaluation methods, such as human evaluation, can obscure a model’s true performance capabilities.