Paper walkthrough: attacking agents by injecting pop-ups

Attacking Vision-Language Computer Agents via Pop-ups

Installing and using a dual-boot setup: an introduction to installing and using Linux

DM8168 Development Guide (1): Preparing the Linux Environment

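Paper walkthrough: this is a 2023 paper, perhaps one of the earlier attempts to use LLMs to understand mobile UIs and interact with them. Its main contributions are: it divides interaction scenarios into four categories; it develops a set of prompting techniques for presenting a UI to a large model; it develops a depth-first traversal algorithm that converts the Android UI hierarchy tree into HTML text; and it runs detailed experiments on four key tasks (Screen Question-Generation, Screen Summarization, Screen Question-Answering, and Mapping Instruction to UI Action).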

Enabling Conversational Interaction with Mobile UI using Large Language Models
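The depth-first conversion of a UI hierarchy into HTML text mentioned in the summary above can be illustrated with a minimal sketch. This is not the paper's implementation: the node layout (a dict with `class`, `text`, and `children` keys), the `TAG_FOR_CLASS` mapping, and the function name `node_to_html` are all assumptions made for illustration.

```python
from html import escape

# Hypothetical mapping from Android widget classes to HTML tags (an assumption,
# not the mapping used in the paper).
TAG_FOR_CLASS = {
    "TextView": "p",
    "Button": "button",
    "ImageView": "img",
    "EditText": "input",
}


def node_to_html(node, counter=None):
    """Convert a view-hierarchy node to HTML text by depth-first traversal.

    `node` is an assumed dict layout: {"class": str, "text": str, "children": [...]}.
    Each element receives a numeric id in pre-order so a model could refer to it.
    """
    if counter is None:
        counter = [0]  # mutable counter shared across the recursion

    # Unknown widget classes (e.g. layout containers) fall back to <div>.
    tag = TAG_FOR_CLASS.get(node.get("class", ""), "div")

    node_id = counter[0]
    counter[0] += 1

    text = escape(node.get("text", ""))
    # Visit children in order; each child is fully expanded before its sibling.
    children = "".join(node_to_html(c, counter) for c in node.get("children", []))
    return f"<{tag} id={node_id}>{text}{children}</{tag}>"


if __name__ == "__main__":
    screen = {
        "class": "LinearLayout",
        "children": [
            {"class": "TextView", "text": "Hello"},
            {"class": "Button", "text": "OK"},
        ],
    }
    print(node_to_html(screen))
```

Emitting HTML keeps the screen description compact and in a format language models have seen often; the numeric ids give the model a simple handle for naming the element it wants to act on.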

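Binary indexed tree (Fenwick tree): simpler than a segment tree, and convenient for querying and maintaining information over ranges

Binary Indexed Tree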
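As a minimal sketch of the structure named above, here is a binary indexed tree supporting point updates and range-sum queries. The class and method names (`FenwickTree`, `update`, `prefix_sum`, `range_sum`) are illustrative choices, and indices are assumed to be 1-based.

```python
class FenwickTree:
    """Binary indexed tree over n elements, 1-indexed, storing sums."""

    def __init__(self, n):
        self.n = n
        self.tree = [0] * (n + 1)  # slot 0 is unused

    def update(self, i, delta):
        """Add delta to element i."""
        while i <= self.n:
            self.tree[i] += delta
            i += i & -i  # jump to the next node whose range covers i

    def prefix_sum(self, i):
        """Return the sum of elements 1..i."""
        total = 0
        while i > 0:
            total += self.tree[i]
            i -= i & -i  # strip the lowest set bit
        return total

    def range_sum(self, lo, hi):
        """Return the sum of elements lo..hi, inclusive."""
        return self.prefix_sum(hi) - self.prefix_sum(lo - 1)


if __name__ == "__main__":
    ft = FenwickTree(5)
    for index, value in enumerate([1, 2, 3, 4, 5], start=1):
        ft.update(index, value)
    print(ft.range_sum(2, 4))
```

Both operations take O(log n) time, and the whole structure is a single array, which is the main reason it is often preferred over a segment tree when only prefix-style queries are needed.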

LlamaTouch: A Faithful and Scalable Testbed for Mobile UI Task Automation

As many automated test input generation tools for Android need to instrument the system or the app, they cannot be used in some scenarios such as compatibility testing and malware analysis. We introduce DroidBot, a lightweight UI-guided test input generator, which is able to interact with an Android app on almost any device without instrumentation. The key technique behind DroidBot is that it can generate UI-guided test inputs based on a state transition model generated on-the-fly, and allow users to integrate their own strategies or algorithms. DroidBot is lightweight as it does not require app instrumentation, thus no need to worry about the inconsistency between the tested version and the original version. It is compatible with most Android apps, and able to run on almost all Android-based systems, including customized sandboxes and commodity devices. DroidBot is released as an open-source tool on GitHub [1], and the demo video can be found at https://youtu.be/3-aHG SazMY.

DroidBot: A Lightweight UI-Guided Test Input Generator for Android

Since the advent of personal computing devices, intelligent personal assistants (IPAs) have been one of the key technologies that researchers and engineers have focused on, aiming to help users efficiently obtain information and execute tasks, and provide users with more intelligent, convenient, and rich interaction experiences. With the development of smartphones and IoT, computing and sensing devices have become ubiquitous, greatly expanding the boundaries of IPAs. However, due to the lack of capabilities such as user intent understanding, task planning, tool use, and personal data management, existing IPAs still have limited practicality and scalability. Recently, the emergence of foundation models, represented by large language models (LLMs), brings new opportunities for the development of IPAs. With their powerful semantic understanding and reasoning capabilities, LLMs can enable intelligent agents to solve complex problems autonomously. In this paper, we focus on Personal LLM Agents, which are LLM-based agents that are deeply integrated with personal data and personal devices and used for personal assistance. We envision that Personal LLM Agents will become a major software paradigm for end-users in the upcoming era. To realize this vision, we take the first step to discuss several important questions about Personal LLM Agents, including their architecture, capability, efficiency and security. We start by summarizing the key components and design choices in the architecture of Personal LLM Agents, followed by an in-depth analysis of the opinions collected from domain experts. Next, we discuss several key challenges to achieve intelligent, efficient and secure Personal LLM Agents, followed by a comprehensive survey of representative solutions to address these challenges.

Personal LLM Agents: Insights and Survey about the Capability, Efficiency and Security

Mobile task automation is an attractive technique that aims to enable voice-based hands-free user interaction with smartphones. However, existing approaches suffer from poor scalability due to the limited language understanding ability and the non-trivial manual efforts required from developers or end-users. The recent advance of large language models (LLMs) in language understanding and reasoning inspires us to rethink the problem from a model-centric perspective, where task preparation, comprehension, and execution are handled by a unified language model. In this work, we introduce AutoDroid, a mobile task automation system capable of handling arbitrary tasks on any Android application without manual efforts. The key insight is to combine the commonsense knowledge of LLMs and domain-specific knowledge of apps through automated dynamic analysis. The main components include a functionality-aware UI representation method that bridges the UI with the LLM, exploration-based memory injection techniques that augment the app-specific domain knowledge of LLM, and a multi-granularity query optimization module that reduces the cost of model inference. We integrate AutoDroid with off-the-shelf LLMs including online GPT-4/GPT-3.5 and on-device Vicuna, and evaluate its performance on a new benchmark for memory-augmented Android task automation with 158 common tasks. The results demonstrated that AutoDroid is able to precisely generate actions with an accuracy of 90.9%, and complete tasks with a success rate of 71.3%, outperforming the GPT-4-powered baselines by 36.4% and 39.7%. The demo, benchmark suites, and source code of AutoDroid will be released at https://autodroid-sys.github.io/.

AutoDroid: LLM-powered Task Automation in Android

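While studying earlier, I noticed that many video comments ask about the YOLOv8 dataset not being found, yet there seems to be little material on it online, so today I am writing a post to explain it.

On the dataset relative-path error (dataset not found) in YOLOv8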

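Polynomials

Combinatorics

Number Theory

A Brief Look at Binary Search

I have moved over some of the algorithms and notes I wrote earlier; new algorithm notes will be added here as well.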