Technical Challenges
Differences between MoFA’s mobile application scenarios and traditional robot control:
Robot Control Scenario Characteristics:
- Low-latency hardware communication (microsecond level)
- Real-time processing of large sensor data
- Relies on Dora-RS shared memory for zero-copy data transfer between nodes
Mobile AI Agent Scenario Characteristics:
- Logical reasoning and decision-making
- Network operations and screen interaction as the main focus
- Dialogue and task planning oriented
- Relatively high latency tolerance (millisecond level is acceptable)
Dora-RS Porting Issues
Some Problems:
// Dora shared memory
shared-memory-server (Dora's own)
├── shared_memory_extended = "0.13.0"
├── raw_sync_2 = "0.1.5"
- Shared Memory Dependencies: Dora’s dependency on the shared_memory_extended crate has incomplete support on mobile platforms
- Inter-process Communication Mechanism: Mobile operating system IPC mechanisms differ from desktop systems
- ARM Architecture Adaptation
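The shared-memory concern can be checked from the agent side before a transport is chosen. The following is a minimal sketch, not Dora's own mechanism: it uses Python's standard multiprocessing.shared_memory module to probe whether POSIX-style shared memory works on the current platform. The function names probe_shared_memory and choose_transport, and the "socket" fallback, are assumptions made for illustration.

```python
def probe_shared_memory(size: int = 4096) -> bool:
    """Return True if a shared memory block can be created, written, and read back."""
    try:
        # Imported lazily: on platforms without POSIX shared memory the import
        # itself may fail.
        from multiprocessing import shared_memory

        block = shared_memory.SharedMemory(create=True, size=size)
    except (ImportError, OSError):
        return False
    try:
        block.buf[:4] = b"mofa"
        return bytes(block.buf[:4]) == b"mofa"
    finally:
        # Always release and remove the probe block.
        block.close()
        block.unlink()


def choose_transport() -> str:
    """Pick a transport name; 'socket' is the assumed fallback."""
    return "shared_memory" if probe_shared_memory() else "socket"
```

Calling choose_transport() once at startup would let an agent degrade to socket-based communication on devices where the probe fails, which fits the millisecond-level latency tolerance noted earlier.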
Solutions
Solution 1: Termux + Root Permissions for Complete Porting (Demo Phase)
Technical Architecture
Android Host
├── Termux (complete Arch Linux environment)
│   ├── libandroid-shmem (shared memory)
│   ├── Rust toolchain
│   ├── Dora-RS complete version
│   └── MoFA agent framework
└── Host Bridge App (referencing Auto.JS)
    ├── Screenshot API
    ├── Touch injection API
    └── Socket communication bridge
Layered Validation Plan
Basic Environment
□ Termux installation and Root permission acquisition
□ Basic Linux command availability verification
□ Network connection and file system permission testing
pkg update && pkg install -y git curl wget
su -c "whoami" # Verify root permissions
Rust Toolchain
pkg install rust
rustc --version && cargo --version
Shared Memory
# Kernel support check
cat /proc/version
ls -la /dev/shm
git clone https://github.com/termux/libandroid-shmem
cd libandroid-shmem && make && make install
gcc test/test.c -o test_shmem && ./test_shmem
Dora
git clone https://github.com/dora-rs/dora
cd dora
cargo build --bin dora-daemon
cargo build --bin dora-coordinator
# Basic operation test
./target/debug/dora-daemon --help
MoFA
pip install -e . # Install MoFA, details omitted
python examples/hello_world/hello_world.py
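The layered plan works best when each layer is verified in order and the run stops at the first failure, since later layers depend on earlier ones. Below is a minimal sketch of such a runner, assuming it is started from the dora checkout directory inside Termux. The names run_layers and PLAN are hypothetical; the commands are the ones listed in the plan above.

```python
import subprocess
import sys
from typing import List, Optional, Sequence, Tuple

Layer = Tuple[str, Sequence[str]]


def run_layers(layers: List[Layer]) -> Optional[str]:
    """Run each layer's command in order; return the first failing layer name, or None."""
    for name, command in layers:
        try:
            result = subprocess.run(command, capture_output=True, timeout=60)
            ok = result.returncode == 0
        except (OSError, subprocess.TimeoutExpired):
            # A missing binary or a hung command counts as a failed layer.
            ok = False
        print(f"[{'ok' if ok else 'FAIL'}] {name}")
        if not ok:
            return name
    return None


# Commands mirror the checks in the plan; layers after a failure are skipped.
PLAN: List[Layer] = [
    ("Basic environment (root)", ["su", "-c", "whoami"]),
    ("Rust toolchain (rustc)", ["rustc", "--version"]),
    ("Rust toolchain (cargo)", ["cargo", "--version"]),
    ("Dora daemon", ["./target/debug/dora-daemon", "--help"]),
]
```

Running run_layers(PLAN) on the device prints one status line per layer and returns the name of the first layer that needs attention.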
Host-VM Bridge Layer Design
Screen Data Flow:
// Android Host side, demo code, not yet implemented
class ScreenBridge {
    fun captureScreen(): ByteArray {
        val mediaProjection = getMediaProjection()
        val imageReader = ImageReader.newInstance(width, height, ImageFormat.JPEG, 1)
        // Elided: create a virtual display on mediaProjection that renders into
        // imageReader.surface, acquire the latest image, and compress it.
        return compressedImageData
    }

    fun sendToVM(data: ByteArray) {
        socket.write(data) // Send to VM via Socket
    }
}
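Because a stream socket does not preserve message boundaries, the receiving side needs a way to tell where one screenshot ends and the next begins. The sketch below shows one common option, a 4-byte big-endian length prefix. This framing is an assumption and is not specified in the design above; send_frame, recv_exact, and recv_frame are hypothetical helper names.

```python
import socket
import struct

HEADER = struct.Struct(">I")  # 4-byte big-endian length prefix (assumed framing)


def send_frame(sock: socket.socket, payload: bytes) -> None:
    """Send one frame: length header followed by the payload."""
    sock.sendall(HEADER.pack(len(payload)) + payload)


def recv_exact(sock: socket.socket, count: int) -> bytes:
    """Read exactly count bytes, or raise if the peer closes early."""
    chunks = []
    remaining = count
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError("socket closed mid-frame")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def recv_frame(sock: socket.socket) -> bytes:
    """Receive one length-prefixed frame."""
    (length,) = HEADER.unpack(recv_exact(sock, HEADER.size))
    return recv_exact(sock, length)
```

On the Kotlin side, DataOutputStream.writeInt emits the same big-endian 4-byte header, so the host app could write the length followed by the image bytes.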
Operation Command Callback:
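# MoFA Agent in VM, demo code, not yet implemented
import json
import socket


class AndroidOperationNode:
    def __init__(self, path: str = "/tmp/android_bridge"):
        # Connect to the host bridge over a Unix domain socket
        self.bridge_socket = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        self.bridge_socket.connect(path)

    def click(self, x: int, y: int):
        command = {"action": "click", "x": x, "y": y}
        # One JSON object per line (assumed framing so the host can split messages)
        self.bridge_socket.sendall((json.dumps(command) + "\n").encode("utf-8"))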
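The host bridge has to decode and check these commands before injecting touches. The following sketch covers the receiving half, assuming the commands arrive as newline-delimited JSON (an assumption, since the wire format is not fixed in this design). The names read_commands and validate_click are hypothetical, and the real bridge would live in the Android host app rather than in Python.

```python
import json
import socket
from typing import Dict, Iterator


def read_commands(sock: socket.socket) -> Iterator[Dict]:
    """Yield one decoded command per newline-terminated JSON line."""
    buffer = b""
    while True:
        chunk = sock.recv(4096)
        if not chunk:
            return  # Peer closed the connection.
        buffer += chunk
        while b"\n" in buffer:
            line, buffer = buffer.split(b"\n", 1)
            if line.strip():
                yield json.loads(line.decode("utf-8"))


def validate_click(command: Dict) -> bool:
    """Accept only click commands with non-negative integer coordinates."""
    return (
        command.get("action") == "click"
        and isinstance(command.get("x"), int)
        and isinstance(command.get("y"), int)
        and command["x"] >= 0
        and command["y"] >= 0
    )
```

Validating on the host side keeps malformed or out-of-range commands from reaching the touch injection API.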
Solution 2: Native Android Redesign
New Architecture Design
Android Native MoFA
├── Zenoh-Kotlin communication layer
├── Agent Runtime (Kotlin/Java)
├── Agent template system (simplified version)
├── Android Services
│   ├── AccessibilityService (screen operations)
│   ├── ScreenCapture (screenshot service)
│   └── ForegroundService (background persistence)
└── Mobile Node Hub
    ├── AutoJS integration node
    ├── UI Automator wrapper
    ├── Intent caller
    └── API call node
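The Mobile Node Hub can be thought of as a registry that maps node names to handlers. The sketch below illustrates that idea in Python for brevity, although the runtime in this design would be written in Kotlin/Java. NodeHub, its methods, and the intent_caller placeholder are all hypothetical and do not reflect an existing MoFA interface.

```python
from typing import Any, Callable, Dict

NodeHandler = Callable[[Dict[str, Any]], Dict[str, Any]]


class NodeHub:
    """Registry mapping node names to handlers (hypothetical interface)."""

    def __init__(self) -> None:
        self._nodes: Dict[str, NodeHandler] = {}

    def register(self, name: str, handler: NodeHandler) -> None:
        if name in self._nodes:
            raise ValueError(f"node already registered: {name}")
        self._nodes[name] = handler

    def dispatch(self, name: str, payload: Dict[str, Any]) -> Dict[str, Any]:
        handler = self._nodes.get(name)
        if handler is None:
            return {"ok": False, "error": f"unknown node: {name}"}
        return handler(payload)


def intent_caller(payload: Dict[str, Any]) -> Dict[str, Any]:
    # Placeholder: a real node would launch an Android Intent here.
    return {"ok": True, "launched": payload.get("action")}


hub = NodeHub()
hub.register("intent_caller", intent_caller)
```

Returning an error dictionary for unknown nodes, rather than raising, lets an agent flow decide how to recover when a node is unavailable on a given device.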
Comparison of the First Two Solutions
| Evaluation Dimension | Termux + Linux | Native Redesign |
|---|---|---|
| Development Speed | 🟢 Rapid reuse of existing code | 🔴 Requires complete rewrite |
| Runtime Performance | 🔴 Double virtualization overhead | 🟢 Native performance |
| Functional Completeness | 🟢 100% compatible | 🔴 Functionally limited |
| User Experience | 🔴 Extremely complex installation | 🟢 Native experience |
| Maintenance Cost | 🟢 Low maintenance | 🔴 Dual maintenance |
Further Scenario Analysis
A native mobile solution is justified mainly when developers have specific needs, such as agents that must operate mobile-exclusive apps (Taobao, WeChat, etc.). That market is relatively small, and pursuing it can easily pull the project away from its core direction.
Implementing a mobile solution properly is resource-intensive: the technical complexity is high and the return on investment is not ideal.
A feasible alternative is to package MoFA into containers later. Instead of operating on users’ phones, the containers would simply run flows for information crawling, document processing, and similar tasks.
Solution 3: Containerized Deployment
Container-based MoFA Mobile Stage
├── Docker/Podman container
│   ├── Complete MoFA environment
│   ├── Pre-installed common dependencies
│   └── Standardized runtime
├── Common Agents/Flows
│   ├── Web data crawling
│   ├── Document processing Flow
│   ├── API call integration
│   └── Data analysis tasks
└── Mobile adaptation
    ├── Responsive Web UI
    ├── Touch-optimized interactions
    └── Offline cache support
Benefits:
- Technology stack fully reused, no need to reinvent the wheel
- Simple deployment
Solution 4: Cloud + Web Wrapper App
Once MoFA is deployed in the cloud, build a web wrapper app for the MoFA Stage.
Cloud MoFA + Mobile Client
├── Cloud MoFA
│   ├── Agent runtime environment
│   ├── Task scheduling system
│   └── Result cache layer
├── Mobile App (web wrapper)
│   ├── MoFA Stage mobile version
│   ├── Task creation and monitoring
│   ├── Result display interface
│   └── Offline mode support
└── Communication layer
    ├── WebSocket real-time communication
    ├── RESTful API
    └── Message queue synchronization
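The interplay between the task scheduling system and the result cache layer can be sketched in a few lines. The example below is an in-memory illustration only: CachedTaskRunner and its key scheme are hypothetical, and a real deployment would presumably use a persistent store and an asynchronous scheduler.

```python
import hashlib
import json
from typing import Any, Callable, Dict


class CachedTaskRunner:
    """Run tasks and cache results keyed by task name and parameters."""

    def __init__(self) -> None:
        self._cache: Dict[str, Any] = {}
        self.executions = 0  # Counts actual task runs, excluding cache hits.

    @staticmethod
    def _key(name: str, params: Dict[str, Any]) -> str:
        # Canonical JSON so that equal parameters always hash identically.
        raw = json.dumps({"name": name, "params": params}, sort_keys=True)
        return hashlib.sha256(raw.encode("utf-8")).hexdigest()

    def submit(self, name: str, params: Dict[str, Any],
               task: Callable[[Dict[str, Any]], Any]) -> Any:
        key = self._key(name, params)
        if key not in self._cache:
            self.executions += 1
            self._cache[key] = task(params)
        return self._cache[key]
```

With a cache like this, a mobile client that re-requests the same task after a dropped connection would receive the stored result instead of triggering a second agent run.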
Benefits:
- No need to reinvent the wheel
- Consistent technology stack, simple maintenance
- Solves users’ actual needs rather than building for mobile for its own sake