Introduction
- We focus primarily on the “personal” aspects of Personal LLM Agents, encompassing the analysis and utilization of users’ personal data, the use of personal resources, deployment on personal devices, and the provision of personalized services.
- Some approaches have attempted to automatically learn to support tasks through supervised learning or reinforcement learning. However, these methods also rely on a substantial amount of manual demonstrations and/or the definition of reward functions.
The main content and contributions of this paper can be summarized as follows:
- We summarize the status quo of existing IPAs in both industry and academia, while analyzing their primary limitations and future trends in the LLM era.
- We collect insights from senior domain experts in the area of LLM and personal agents, proposing a generic system architecture and a definition of intelligence levels for personal LLM agents.
- We review the literature on three important technical aspects of personal LLM agents, including fundamental capabilities, efficiency, and security & privacy.
A Brief History of Intelligent Personal Assistants
Timeline View of the Intelligent Personal Assistants History
- Speech Recognition System → Voice-based Software → Virtual Personal Assistant → LLM-based Chatbot
Technical View of the Intelligent Personal Assistants History
- Template-based Programming:
Given a user command, the agent first maps the command to the most relevant template, then follows the predefined steps to complete the task.
- Supervised Learning Methods
The key questions are how to learn a representation of the software GUI and how to train the interaction model.
Representative supervised learning approaches:
- A Transformer network learning mappings between instructions and action tuples, between UI objects and object descriptions, and between action tuples and descriptions.
- Takes a pair of UIs as input to capture semantic information.
- Mobile GUI understanding: Pixel-Words (atomic text and graphic components) → a pixel-based GUI understanding framework → screen sentence.
- A multimodal Transformer: images, structures, and language are jointly encoded and trained on 5 distinct tasks.
- An auto-regressive Transformer: language input ↔ command or question ↔ answer.
- Based on the self-aligned characteristics between components of different modalities.
- A well-designed joint image-text model.
- A vision-only approach: takes screenshots and a region of interest (the “focus”) as input.
- Composed of a vision encoder and a language decoder.
- Leverages text-based instruction manuals and user guides to curate a multimodal dataset.
- Fuses text and visual features as input to the co-attention Transformer layers.
- Reinforcement Learning Methods
During the interaction, the agent receives reward feedback that indicates the progress of task completion, and it gradually learns how to automate the tasks by maximizing the reward payoff. To train RL-based task automation agents, a reward function that indicates progress towards task completion is required.
- Early Adoption of Foundation Models
Let LLMs use tools autonomously to accomplish complex tasks.
Some commercial products have attempted to integrate LLM with IPA.
Many issues related to efficiency, security and privacy have not been adequately addressed yet.
Personal LLM Agents: Definition & Insights
Personal LLM Agents are LLM-based agents deeply integrated with personal data, personal devices, and personal services.
Key Components
The combination of these key components is analogous to an operating system.
| Main components of Personal LLM Agents | Traditional operating systems |
| --- | --- |
| Foundation model | Kernel |
| Local resources | Driver programs |
| User context and memory | Program contexts and system logs |
| Skills | Software applications |
Intelligence Levels of Personal LLM Agents
We categorize the intelligence levels of Personal LLM Agents into five levels, denoted as L1 to L5
| Level | Key Characteristics | Representative Use Cases |
| --- | --- | --- |
| L1 - Simple Step Following | Agent completes tasks by following exact steps predefined by the users or the developers. | User: “Open Messenger”; Agent opens the app named Messenger.<br>User: “Open the first unread email in my mailbox and read its content”; Agent follows the command step by step.<br>User: “Call Alice”; Agent matches a developer-defined template, finds Alice’s phone number in the address book, and calls the number. |
| L2 - Deterministic Task Automation | Based on the user’s description of a deterministic task, the agent auto-completes the necessary steps in a predefined action space. | User: “Check the weather in Beijing today”; Agent automatically calls the weather API with the parameter “Beijing” and parses the info from the response.<br>User: “Make a video call to Alice”; Agent automatically opens the address book, finds Alice’s contact, and clicks on “video chat”.<br>User: “Tell the robot vacuum to clean the room tonight”; Agent opens the robot vacuum app, clicks “schedule”, and sets the time to tonight. |
| L3 - Strategic Task Automation | Based on user-specified tasks, the agent autonomously plans the execution steps using various resources and tools, and iterates on the plan based on intermediate feedback until completion. | User: “Tell Alice about my schedule for tomorrow”; Agent gathers tomorrow’s schedule information from the user’s calendar and chat history, then summarizes and sends it to Alice via Messenger.<br>User: “Find out which city is suitable for travel recently”; Agent lists several cities suitable for travel, checks the weather in each city, summarizes the information, and returns recommendations.<br>User: “Record my sleep quality tonight”; Agent checks every 10 minutes during sleep time whether the user is using the phone, moving, or snoring (based on smartphone sensors and microphone), summarizes the information, and generates a report. |
| L4 - Memory and Context Awareness | Agent senses user context, understands user memory, and proactively provides personalized services at appropriate times. | Agent recommends suitable financial products automatically based on the user’s recent income and expenses, considering the user’s personality and risk preference.<br>Agent estimates the user’s recent anxiety level based on conversations and behaviors, recommends movies/music to help relax, and notifies the user’s friends or doctors depending on the severity.<br>When the user falls in the bathroom, Agent detects the event and decides whether to ask the user, notify the user’s family members, or call for help based on the user’s age and physical condition. |
| L5 - Autonomous Avatar | Agent fully represents the user in completing complex affairs and can interact on behalf of the user with other users or agents, ensuring safety and reliability. | Agent automatically reads emails and messages on behalf of the user, replies to questions without user intervention, and summarizes them into an abstract.<br>Agent attends the work discussion meeting on behalf of the user, expresses opinions based on the user’s work log, listens to suggestions, and writes the minutes.<br>Agent records the user’s daily diet and activities, privately researches or asks experts about any anomalies, and makes health improvement suggestions. |
Opinions on Common Problems
| Questions | Rank 1st | Rank 2nd | Rank 3rd |
| --- | --- | --- | --- |
| Where to deploy the LLM | Edge-cloud collaborated architecture | Local deployment | Cloud-only |
| How to customize the agents | Combining the advantages of both fine-tuning and in-context learning | Fine-tuning | In-context learning |
| What modalities to use | Text | Image | Video |
| Which LLM ability is the most crucial for IPA products | Language understanding | Long context (academic) | Common-sense reasoning |
| How to interact with the agents | Voice-based | Text-based | GUI |
| Which agent ability needs to be developed | Intelligent and autonomous decision-making | Continuous improvement of user experience and interaction methods | Secure handling of personal data |
What features are desired for an ideal IPA?
- Efficient Data Management and Search
- Autonomous Task Planning and Completion
- Work and Life Assistance
- Emotional Support and Social Interaction
- Personalized Services and Recommendations
- Digital Representative and Beyond
What are the most urgent technical challenges?
- Intelligence: (1) Multimodal Support; (2) Context Understanding and Context-aware Actions; (3) Enhancing Domain-specific Abilities of Lightweight LLMs
- Performance: (1) Effective LLM Compression or Compact Architectures; (2) Practical Local-Remote Collaborative Architecture
- Security & Privacy: (1) Data Security and Privacy Protection; (2) Inference Accuracy and Harmlessness
- Personalization & Storage: efficient data storage solutions to manage and leverage user-related data
- Traditional OS Support: LLM-friendly interfaces and support (APIs) in traditional operating systems like Android
Fundamental Capabilities
Task execution, context sensing, and memorization
Task Automation Methods
1. Code-based Task Automation
- Slot-filling method : slot-value pairs
- Program synthesis method
- fine-tune LLMs to use specific APIs
- utilize the chain reasoning and in-context learning abilities of LLMs → show descriptions and demonstrations of the tools (e.g. APIs, other DNNs, etc.) in context (see the sketch after this list)
2. UI-based Task Automation
- text-based GUI representation
- Multimodal representation
Autonomous Agent Frameworks
Evaluation
- Metrics and Benchmarks
Context Sensing
- Hardware-based: acquisition through various sensors
- Software-based: a form of software-based sensing (e.g. typing habits)
purposes:
- Enabling Sensing Tasks:
- Supplementing Contextual Information:
- Triggering Context-aware Services:
- Augmenting Agent Memory:
Sensing sources:
- Hardware sensors, software sensors, and combinations of multiple sensors
How to understand and utilize?
- Sensor Data as Prompt (see the sketch after this list).
- Sensor Data Encoding + Fine-tuning.
- Redirecting Sensor Data to Domain-Specific Models.
Sensing Targets:
- Sensing the Environment
- Scene sensing: more tangible environmental factors (locations and places)
- Occasion perception: deeper environmental information (religious and cultural backgrounds, …)
- Sensing the User
- Short-term user sensing
- Long-term user sensing
Memorizing
the capability to record, manage and utilize historical data
Obtaining Memory
From Logging and Inferring
Managing and Utilizing Memory
- Raw Data Management and Processing: selecting, filtering, transforming to other formats, etc.
- Memory-augmented LLM Inference (a minimal retrieval sketch follows this list):
- Short-term memory: symbolic variables used during the current decision cycle, including perceptual inputs, active knowledge, and other core information from memory or previous steps.
- Long-term memory: experiences from earlier decision cycles, including historical event flows, game trajectories from previous episodes, and interaction information.
- Agent Self-evolution:
- Learning Skills: acquire skills as code or APIs through the strategic use of prompts. Agents can also acquire novel skills by linking skills within a foundational skill set.
- Finetuning LLM, for the following reasons:
- LLMs were not specifically designed for agent-specific use cases.
- Limited device resources make it difficult to acquire new skills through prior knowledge and in-context learning abilities alone.
- New knowledge and tools change the task schemas, demanding adaptation of the LLM.
Fine-tuned smaller LLMs can outperform prompted larger LLMs, with reduced inference cost, in specific cases.
Efficiency
Three processes require careful optimization for efficiency:
- Inference (the bottleneck): consumes a lot of both computation and memory resources
- Customization: feeding different context tokens or tuning the LLM with domain-specific data
- Memory manipulation: requires access to longer contexts or external memories, which must be handled and managed efficiently
Efficient Inference
Model Compression
- Quantization: reduces the model size by using fewer bits to represent the model parameters, and also reduces computation with system-level support for quantized kernels (a minimal quantization sketch follows this list).
- Weight-only quantization (WOQ): integer quantization (INT4 and INT8) on weights only, while preserving activations in float formats (FP16 and FP32).
- INT8 quantization for both weights and activations: the activations, including KV pairs, are more difficult to quantize because of outliers.
- Low-bit floating-point quantization (FP4 and FP8): higher computational performance.
Post-training quantization (PTQ): quantize after training (readily available and flexible). Quantization-aware training (QAT): train or fine-tune the model with quantization taken into account.
- Pruning: removing less important connections in the network
Structured pruning removes weights in regular patterns, such as a rectangular block. Unstructured pruning makes it easier to maintain model accuracy but is less hardware-friendly.
- Knowledge Distillation (KD): using a well-performing teacher model (with a large number of parameters and high precision) to guide the training of a lightweight student model (with fewer parameters and lower precision).
White-box KD requires access to the teacher model’s parameters; black-box KD does not.
KD is also adopted in QAT and pruning techniques to enhance the training performance.
- Low-rank Factorization: Low-rank Factorization can be combined with quantization and pruning.
Inference Acceleration
The computational cost of attention increases nearly quadratically with the context length.
- KV Cache: storing (i.e., “caching”) and incrementally updating the Key-Value (KV) pairs across token generation steps.
- Context Compression:
- Co-quantization of weights and activations, including KV cache.
- compress the context at the prefill stage based on the varying importance of tokens (one-shot, and cannot prune the KV cache afterwards).
- a learnable mechanism to continuously determine and drop uninformative tokens.
- reduce computations of less important tokens instead of directly removing them.
- Kernel Optimization:
- efficient attention kernels including FlashAttention and FlashDecoding++
- reduce the computational complexity of attention from the algorithm aspect and achieve linear complexity for self-attention in the prefill phase.
- reduce dequantization overhead.
Small-batch or single-batch inference is especially important for edge scenarios.
The complexity of attention scales quadratically with the sequence length, while that of the FFN scales linearly.
- Speculative Decoding:
An effective approach in small-batch inference to improve the latency.
Speculative decoding mitigates this challenge by “guessing” several subsequent tokens through a lightweight “draft model”, and then validating the draft tokens in batches using the large “oracle model”.
Memory Reduction
KV cache and model weights are two major causes of this memory overhead.
Short-context scenarios → model compression matters most; long-context scenarios → the KV cache matters most.
Energy Optimization
Energy consumption increases the runtime cost and carbon footprint, hurts the quality of experience (QoE) due to increased temperature, and shortens battery lifespan.
Two major causes: computation and memory access.
Software perspective: model compression, KV cache, efficient attention kernels. Hardware perspective: utilize efficient processors including NPUs, TPUs, and FPGA-based accelerators.
The research on energy efficiency is insufficient due to the complexity of hardware deployment and the volatility of energy measurement and analysis.
Efficient Customization
Two ways: feed the LLM with different contextual prompts and tune the LLM with domain-specific data.
Efficiency: context loading efficiency and LLM fine-tuning efficiency.
Context Loading Efficiency:
- A straightforward way is to prune some redundant tokens or shorten the context length.
- Another way is to reduce the bandwidth consumption during context data transmission.
- Different input prompts may have overlapping text segments; these segments can be pre-computed, stored, and reused (a small sketch follows this list).
Fine-tuning Efficiency:
- Efficient Optimizer Design(skip)
- Training Data Curation
a small amount of high-quality data can lead to significantly reduced training cost and achieve capabilities comparable to large-scale datasets and models.
Efficient Memory Manipulation
Retrieve external memory, which is then injected through prompt concatenation or intermediate-layer cross-attention.
Search Efficiency
A brute-force approach results in a computational complexity of O(DN) for N stored vectors of dimension D → indexing is commonly employed to expedite query searching by reducing the number of required comparisons (see the sketch after this list).
- Typical Indexing Algorithms: partitioning methods (including randomization) → specialized data structures
- Hardware-aware Index Optimization: the utilization of disk-based indexes or the co-design of hardware and algorithms
- Search Mechanism Design:
Multiple similarity criteria can be employed to evaluate vector similarity. Rule-based or estimated-cost-based methods configured offline are often employed to determine the optimal search plan.
Combine vector search with metadata filters.
- Search Process Execution:
Several hardware acceleration methods: multi-threading, multi-core parallelism, GPUs, and distributed clusters.
Workflow Optimization
The traditional workflow is sequential: the inference stage sits idle during retrieval, and the retrieval stage sits idle during generation.
→ Exploit the potential of execution parallelism and the retrieval locality of requests.
- Pipelining
- Caching
Caching rationale: the temporal and spatial locality of retrieved documents; a knowledge tree can be used to organize documents across the GPU and host memory hierarchy.
When the embedding and generation requests are equivalent to earlier ones, Query Caching or Query-Doc Caching can reuse previous results and save computation overhead.
Security and Privacy
Three security principles: confidentiality, integrity, and reliability.
Confidentiality: the protection of user data privacy.
Integrity: the agent’s intended behavior is not modified or influenced by malicious parties.
Reliability: guarding against the agent’s internal mistakes.
Confidentiality
Local Processing
lightweight models, deployment frameworks, and compression techniques.
Secure Remote Processing
- Homomorphic encryption (HE): the client encrypts its input → the server conducts inference on the ciphertext → the client decrypts the result.
Challenge: certain operations in LLMs, such as max, min, and softmax, cannot be accurately performed under HE, and inference speed is slow.
- Secure multi-party computation (MPC)
- Using trusted execution environments (TEEs)
Data Masking
transform the original inputs into a form that is not privacy-sensitive while preserving the crucial information
- hide or replace sensitive content
- embedding-based data anonymization approaches (outperform HE but are more risky)
Information Flow Control
There may also exist the risks of privacy leakage in the model output.
Rule-based permission control to constrain what LLMs can do and what LLMs can access.
Integrity
output the intended content correctly, even when faced with various types of attacks
Adversarial Attacks
Attacks are mounted through carefully crafted manipulation of the model’s inputs (image, text, graph, …) or through malicious tampering with the model itself.
- Defenses include adversarial defense, abnormal input detection, input preprocessing, output security verification, and more.
- Adversarial training can be performed through parameter-efficient fine-tuning.
Backdoor Attacks
- Through data poisoning: inserting maliciously modified samples into the model’s training data, enabling the model to learn deliberate hidden decision logic.
- By modifying the model input at test time.
- By modifying the prompts, which essentially fine-tunes the model’s parameters and thus alters its decision logic.
- Defense: distill benign knowledge from poisoned pre-trained encoders and transfer it to a new encoder.
Prompt Injection Attacks
Bypass the preset security safeguards by using subtle or special diction in the prompts.
- ensure the transparency and security of the LLM’s prompts.
- distinguish third-party content from the system’s inherent prompts.
Reliability
Numerous critical actions involve sensitive operations, which makes reliability essential.
Problems
- Hallucination: incorrect answers that are coherent and fluent but ultimately erroneous.
- Unrecognized Operation: the agent is required to execute actions, which places higher requirements on the format and executability of its outputs.
- Sequential Reliability: LLMs are pre-trained on sequential data, while problems in the real world may not be fully addressed sequentially.
Improvement
- Alignment:
The use of pre-training and fine-tuning to incorporate human values and intentions into training, or reinforcement learning techniques.
- Self-Reflection:
Leverage the model’s self-reflection and check the consistency between responses, or enable multiple large-model agents to engage in mutual discussion and verification (a minimal self-consistency sketch follows this list).
- Retrieval Augmentation.
Inspection
focus on how to enhance or understand the reliability of agents based on the already generated results.
- Verification: constrained generation.
- Explanation: incorporate user opinions or human assistance.
- Intermediate Feature Analysis: harness hierarchical information across different layers of the model.
Conclusion and Outlook
The emergence of large language models presents new opportunities for the development of intelligent personal assistants, offering the potential to revolutionize the way of human-computer interaction. In this paper, we focus on Personal LLM Agents, systematically discussing several key opportunities and challenges based on domain expert feedback and extensive literature review.
Currently, research on Personal LLM Agents is in the early stages. Task execution capabilities are still relatively inadequate, and the range of supported functionalities is rather narrow, leaving significant room for improvement. Moreover, ensuring the efficiency, reliability, and usability of such personal agents requires addressing numerous critical performance and security issues. There exists an inherent tension between the need for large-scale parameters in LLMs to achieve better service quality and the constraints of resources, privacy, and security in personal agents.
Going forward, in addition to addressing the respective challenges in each specific direction, a joint effort is needed to establish the whole software/hardware stack and ecosystem for Personal LLM Agents. Researchers and engineers also need to carefully consider the responsibility of such technology to guarantee the benign and assistive nature of Personal LLM Agents.
- Author:E1ainay
- URL:https://e1ainay.top/article/PersonLLMAgents
- Copyright:All articles in this blog, except for special statements, adopt BY-NC-SA agreement. Please indicate the source!