Transformers
AutoDroid: LLM-powered Task Automation in Android
Post on: 2024-11-13
Last edited: 2024-11-15
LLM
Agent
Paper: AutoDroid: LLM-powered Task Automation in Android
https://arxiv.org/abs/2308.15272
Mobile task automation is an attractive technique that aims to enable voice-based hands-free user interaction with smartphones. However, existing approaches suffer from poor scalability due to the...
Code: AutoDroid (MobileLLM), updated Nov 13, 2024

Relevance

  • Current commercial AI assistants use developer-driven approaches with NLU modules and developer-defined functions, but face scalability challenges when supporting new tasks.
  • LLMs can now utilize tools like search engines, code interpreters, and APIs.

Purpose

  • Build an autonomous agent that completes user-specified tasks by interacting with the smartphone.

    Key Problem

    1. GUI Representation: Convert GUI states and actions to text format to help LLMs understand and make decisions.
    2. Knowledge Integration: LLMs need domain-specific knowledge to navigate and complete tasks in complex smartphone apps.
    3. Cost Optimization: Optimize LLM query efficiency to provide a responsive task automation experience.

    Method

    Task-oriented UI Prompting

    • A GUI parsing module converts the GUI into a simplified HTML representation. The module automatically scrolls the screen and records the information.
    [Image: The classes and properties of GUI elements]
    • Restricting the action space with selections: the LLM is required to answer in the format “- id=<id number> action=<tap/input> input text=<text or N/A>” (in the event of task completion, id=-1).
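The two steps above can be sketched in a few lines of Python. This is only an illustration under assumptions: the dict-based element format, the `TAGS` mapping, and the function names `gui_to_html` and `parse_action` are hypothetical and not taken from the paper. Only the answer format matched by the regular expression comes from the requirement quoted above.

```python
import re
from html import escape

# Hypothetical mapping from element kind to HTML tag (an assumption for illustration).
TAGS = {"clickable": "button", "checkable": "checkbox", "editable": "input", "text": "p"}


def gui_to_html(elements):
    """Render a flat list of GUI elements as one simplified HTML line per element."""
    lines = []
    for idx, el in enumerate(elements):
        tag = TAGS.get(el["kind"], "p")  # unknown kinds fall back to plain text
        lines.append(f"<{tag} id={idx}>{escape(el['text'])}</{tag}>")
    return "\n".join(lines)


# Matches the constrained answer format: id, action, and input text.
ACTION_RE = re.compile(
    r"id=(?P<id>-?\d+)\s+action=(?P<action>tap|input|N/A)\s+input text=(?P<text>.*)",
    re.IGNORECASE,
)


def parse_action(response):
    """Extract (id, action, text) from an LLM answer; id == -1 signals task completion."""
    m = ACTION_RE.search(response)
    if m is None:
        raise ValueError(f"answer does not follow the required format: {response!r}")
    text = m.group("text").strip()
    return int(m.group("id")), m.group("action").lower(), None if text.upper() == "N/A" else text


screen = [{"kind": "clickable", "text": "Save"}, {"kind": "editable", "text": "Title"}]
print(gui_to_html(screen))
print(parse_action("- id=1 action=input input text=Shopping list"))
```

Because the answer is limited to an element id and one of two actions, a malformed reply can be rejected and re-queried instead of being executed on the device.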

    Exploration-based Memory Injection

    Simulated Task Generation (LLM-generated)

    • UTG: UI Transition Graph
    • Simulated task: a description of the function of a UI element
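A UTG can be pictured as a list of (source state, element, target state) edges. The sketch below is a hypothetical data structure, not the paper's implementation: the class names, the state labels, and the wording of the summarization prompt are all assumptions. It shows how one edge could be turned into a question for an LLM, and how the graph can be searched for the elements that lead to a given screen.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Transition:
    source: str   # GUI state before the action
    element: str  # UI element that was acted on
    target: str   # GUI state after the action


@dataclass
class UTG:
    transitions: list = field(default_factory=list)

    def add(self, source, element, target):
        self.transitions.append(Transition(source, element, target))

    def path_to(self, start, goal):
        """Breadth-first search; returns the element sequence from start to goal, or None."""
        queue = deque([(start, [])])
        seen = {start}
        while queue:
            state, path = queue.popleft()
            if state == goal:
                return path
            for t in self.transitions:
                if t.source == state and t.target not in seen:
                    seen.add(t.target)
                    queue.append((t.target, path + [t.element]))
        return None


def simulated_task_prompt(t):
    """Hypothetical prompt asking an LLM to summarize what an element is for."""
    return (
        f"On screen '{t.source}', tapping '{t.element}' opens screen '{t.target}'. "
        "Describe in one sentence the task this element helps to accomplish."
    )


utg = UTG()
utg.add("home", "menu", "drawer")
utg.add("drawer", "settings", "settings_page")
print(utg.path_to("home", "settings_page"))
print(simulated_task_prompt(utg.transitions[0]))
```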

    Augmenting Prompts with App Memory

    1. Use an embedding model to select the relevant simulated tasks {S}.
      1. Rank by cosine similarity between the embeddings of each simulated task S and the current task T.
    2. Give hints about the UI elements in {S}.
    3. Build the prompt with PromptGenerator(T, UI, History).
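The retrieval and prompt-building steps above can be sketched as follows. To stay self-contained, the sketch uses a toy bag-of-words `embed` function as a stand-in for a real sentence-embedding model; the memory format, the function names, and the prompt layout are assumptions for illustration only.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words embedding; a stand-in for a real sentence-embedding model."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k_simulated_tasks(task, memory, k=2):
    """memory holds (simulated_task, ui_element) pairs; return the k most similar pairs."""
    t = embed(task)
    return sorted(memory, key=lambda m: cosine(t, embed(m[0])), reverse=True)[:k]


def prompt_generator(task, ui_html, history, hints):
    """Assemble the prompt from the task, the current UI, the history, and memory hints."""
    hint_lines = "\n".join(f"- '{el}' may help with: {sim}" for sim, el in hints)
    return (
        f"Task: {task}\n"
        f"Previous actions: {'; '.join(history) or 'none'}\n"
        f"Current UI:\n{ui_html}\n"
        f"Hints from app memory:\n{hint_lines}"
    )


memory = [
    ("delete a note", "Delete button"),
    ("create a new note", "Add button"),
    ("change the font size", "Settings"),
]
task = "create a note called shopping"
hints = top_k_simulated_tasks(task, memory, k=1)
print(prompt_generator(task, "<button id=0>Add</button>", [], hints))
```

A real embedding model would also match paraphrases that share no words with the stored simulated tasks, which the bag-of-words stand-in cannot do.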
     
     

    Tuning Local LLM with App-specific Data

     

    Multi-granularity Query Optimization

    1. Pruning tokens by merging functionally equivalent elements:
        • Two UI elements that lead to the same interface
        • UI leaf nodes that share the same interactive ancestor (button, checkbox, text field, etc.)
    2. Reducing the number of queries with shortcuts and GUI merging:
        • GUI merging includes several GUI states in one prompt when the LLM needs all of them to make a decision (for example, the states reached by scrolling down).
        • Shortcuts execute simple actions directly with the help of the app memory.
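The first merging rule (elements that lead to the same interface) can be sketched as below. The element format, with a `target` field naming the GUI state reached on tap, is a hypothetical stand-in for whatever the app memory actually records, and `merge_equivalent` is an illustrative name rather than the paper's function.

```python
def merge_equivalent(elements):
    """Merge elements whose tap leads to the same GUI state into a single entry.

    Each element is a dict with 'text' and 'target'; target is None when the
    destination is unknown, in which case the element is kept unchanged.
    """
    by_target = {}
    merged = []
    for el in elements:
        target = el.get("target")
        if target is None:
            merged.append(dict(el))
        elif target in by_target:
            # Same destination as an earlier element: fold the label into it.
            by_target[target]["text"] += " / " + el["text"]
        else:
            by_target[target] = dict(el)  # copy so the input is not mutated
            merged.append(by_target[target])
    return merged


screen = [
    {"text": "avatar", "target": "profile"},
    {"text": "username", "target": "profile"},
    {"text": "Settings", "target": "settings"},
    {"text": "Help", "target": None},
]
for el in merge_equivalent(screen):
    print(el)
```

Each merged entry takes one line in the prompt instead of several, which is where the token saving comes from.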

    Evaluation and Results

    • Reproducible results
    • Author: E1ainay
    • URL: https://e1ainay.top/13d94b63edfa8191ace0ec88e141c3af
    • Copyright: All articles in this blog, except where otherwise stated, adopt the BY-NC-SA agreement. Please indicate the source!