Qwen-Image-Edit released: a 20B model built on Qwen-Image, with accurate bilingual text editing plus semantic- and appearance-level image editing. Qwen-Image-Edit is an image editing model built on the 20B-scale Qwen-Image base model. It supports precise text editing in Chinese and English, handling additions, deletions, and replacements while preserving the original font and layout. It covers both semantic-level editing (object rotation, style transfer, continuous IP creation) and appearance-level editing (adding/removing/modifying objects, changing colors or backgrounds, detail repair), and is available through an online demo, open-source weights, and a cloud API.
1. Core Capabilities
1) Bilingual text editing: add, delete, and replace Chinese and English text in images while keeping the original font, size, and style as consistent as possible.
2) Semantic-level editing: supports 90°/180° viewpoint rotation of objects, style transfer, character consistency, and continuous IP creation, keeping the semantics consistent with the overall style.
3) Appearance-level editing: supports adding/removing/modifying objects, changing colors or backgrounds, removing clutter, and repairing details while leaving unrelated regions unchanged.
4) Pipeline design (per official materials): the input image is fed simultaneously into a visual-semantic control branch and an appearance-reconstruction branch, balancing "content consistency" against "pixel fidelity".
5) Ecosystem completeness: provides a web demo, open-source model weights with inference examples, and a production-oriented cloud API.
2. Applicable scenarios
- E-commerce/branding: fix typos directly on posters, localize text across languages, and update promotional posters quickly.
- Social media/short video: style transfer, batch generation of stickers and avatars.
- Graphic design: add signboard text with realistic reflections, remove clutter, and repair local details.
- Photo post-production: change outfits, swap backgrounds, adjust pose and perspective.
3. Quick start (online and local)
1) Online demo: select "Image Editing" in the official chat portal, upload an image, and describe the desired modification in Chinese or English to generate results.
2) Hugging Face inference: the model card provides a local inference example using QwenImageEditPipeline; in a GPU environment you can load the weights, supply an image plus a prompt, and configure parameters such as inference steps, random seed, and negative prompt.
3) ModelScope: hosts a mirror of the model page and a demo entrance, convenient for access and download from within mainland China.
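The local inference flow above can be sketched as follows. This is a minimal sketch following the pattern on the Hugging Face model card, not a verified implementation: check the pipeline class and call signature against your installed diffusers version; file names and the example prompt are placeholders.

```python
def generation_kwargs(steps: int = 50, seed: int = 0,
                      negative_prompt: str = " ") -> dict:
    """Bundle the tunable sampling parameters (steps, seed, negative prompt)."""
    return {"num_inference_steps": steps, "seed": seed,
            "negative_prompt": negative_prompt}

if __name__ == "__main__":
    # Heavy dependencies are imported here so the helper above stays importable
    # without torch/diffusers installed.
    import torch
    from PIL import Image
    from diffusers import QwenImageEditPipeline

    pipe = QwenImageEditPipeline.from_pretrained(
        "Qwen/Qwen-Image-Edit", torch_dtype=torch.bfloat16
    ).to("cuda")

    image = Image.open("input.png").convert("RGB")  # placeholder input file
    kwargs = generation_kwargs(seed=42)
    result = pipe(
        image=image,
        prompt="Change the shirt color to red",  # Chinese prompts work as well
        negative_prompt=kwargs["negative_prompt"],
        num_inference_steps=kwargs["num_inference_steps"],
        generator=torch.Generator("cpu").manual_seed(kwargs["seed"]),
    )
    result.images[0].save("edited.png")
```

Fixing the seed makes a chained multi-step edit reproducible, which helps when iterating on a single region.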
4. Alibaba Cloud Model Studio API access points
- Model name: qwen-image-edit.
- Interface: the International-site HTTP endpoint for multimodal generation, using a JSON request body and Bearer API-key authentication.
- Input structure: input.messages[0].content contains {"image": "<URL or Base64>"} and {"text": "<Chinese or English prompt>"}.
- Field constraints: the positive prompt text is limited to roughly 800 characters and the negative prompt negative_prompt to roughly 500; prompt_extend enables intelligent prompt rewriting; watermark toggles the "Qwen-Image" watermark in the lower-right corner.
- Image restrictions: JPG/JPEG/PNG/BMP/TIFF/WEBP; width and height 512–4096 px; single image ≤ 10 MB; URLs must not contain Chinese characters; result links are valid for 24 hours.
- Billing and limits (Singapore region): approximately $0.045 per image; 100 free credits (valid for 180 days after activation); rate limits of RPS = 5 and concurrency = 2.
- Return format: the output is a structured result containing image links; download the images and move them to your own storage promptly once the integration goes live.
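A request body following the field names and constraints above can be assembled like this. This is a sketch under assumptions: the exact placement of negative_prompt, prompt_extend, and watermark in the JSON body, and the helper name build_payload, are illustrative; confirm them against the official API reference before sending real requests.

```python
import json
import re

MAX_TEXT = 800       # documented limit for the positive prompt
MAX_NEGATIVE = 500   # documented limit for the negative prompt

def build_payload(image_url: str, text: str, negative_prompt: str = "",
                  prompt_extend: bool = True, watermark: bool = False) -> dict:
    """Assemble the JSON body, enforcing the documented field constraints."""
    if len(text) > MAX_TEXT:
        raise ValueError(f"prompt exceeds {MAX_TEXT} characters")
    if negative_prompt and len(negative_prompt) > MAX_NEGATIVE:
        raise ValueError(f"negative_prompt exceeds {MAX_NEGATIVE} characters")
    if re.search(r"[\u4e00-\u9fff]", image_url):
        raise ValueError("image URLs must not contain Chinese characters")

    payload = {
        "model": "qwen-image-edit",
        "input": {"messages": [{"role": "user", "content": [
            {"image": image_url},
            {"text": text},
        ]}]},
        "parameters": {"prompt_extend": prompt_extend, "watermark": watermark},
    }
    if negative_prompt:
        payload["parameters"]["negative_prompt"] = negative_prompt
    return payload

if __name__ == "__main__":
    body = build_payload("https://example.com/poster.png",
                         "Change the headline to 'Summer Sale'")
    print(json.dumps(body, ensure_ascii=False, indent=2))
```

Validating the character limits and the no-Chinese-in-URL rule client-side avoids a round trip for requests the service would reject anyway.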
5. Practical operation and workflow suggestions
1) Chained editing is more stable: break a complex goal into multi-step fine adjustments (select a region, correct word by word or area by area) and converge on the desired result gradually.
2) Prefer region-level control: for appearance-level edits, first delineate which areas should change and which must stay fixed, to reduce pixel drift in unrelated regions.
3) Make prompts verifiable: specify objects, positions, colors, quantities, and styles explicitly; use negative prompts where needed to exclude unwanted elements.
4) Caching and fault tolerance: cloud result links expire, so design download and caching policies around object storage, allowlists, and retry queues.
5) Team collaboration: split "text editing", "semantic editing", and "appearance editing" into separate templates so that operations and design teams can reuse them.
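The caching advice in point 4 can be sketched as a generic retry wrapper for downloading expiring result links. Here `fetch` stands in for any download call (urllib, requests, an SDK); the function name and backoff scheme are illustrative, not part of any official client.

```python
import time
from typing import Callable

def download_with_retry(fetch: Callable[[str], bytes], url: str,
                        attempts: int = 3, backoff: float = 1.0) -> bytes:
    """Fetch `url`, retrying with exponential backoff on transient failure."""
    last_error = None
    for attempt in range(attempts):
        try:
            return fetch(url)
        except Exception as exc:  # narrow this to network errors in production
            last_error = exc
            if attempt < attempts - 1:
                time.sleep(backoff * (2 ** attempt))
    raise RuntimeError(f"download failed after {attempts} attempts") from last_error

if __name__ == "__main__":
    # Demonstrate with a fake fetcher that fails twice, then succeeds.
    calls = {"n": 0}
    def flaky(url: str) -> bytes:
        calls["n"] += 1
        if calls["n"] < 3:
            raise ConnectionError("transient failure")
        return b"image-bytes"

    data = download_with_retry(flaky, "https://example.com/result.png",
                               attempts=5, backoff=0.01)
    print(len(data))
```

Pair this with an immediate write to object storage so the edited image survives the 24-hour link expiry.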
6. Comparison and positioning (according to official and community materials)
- Strong Chinese text editing: preserves glyph shapes well even with small Chinese fonts and complex typesetting.
- Dual control of semantics and appearance: attends to "content consistency" and "region invariance" at the same time, reducing the risk of style drift.
- Ecosystem coverage: web demo, open-source weights, and enterprise-grade API run in parallel, shortening the path from first try to production.
7. Limitations and risk warning
- The benchmark results and "SOTA" claims come from official materials; validate on your own samples before relying on them in production.
- Extreme cases (very small fonts, strong perspective or reflections, complex backgrounds) may fail and require multiple rounds of chained fine-tuning.
- When trademarks, portraits, distinctive styles, or IP are involved, comply with copyright law and platform policies.
FAQ
Q: What core problems does Qwen-Image-Edit solve?
A: It makes Chinese and English "text changes" inside images genuinely usable, while balancing semantic-level and appearance-level editing.
Q: How to experience it online?
A: Select "Image Editing" in the official chat portal, upload an image, and enter the modification instructions in Chinese or English.
Q: How do I run inference locally?
A: Load QwenImageEditPipeline via Hugging Face, provide an image plus a prompt, and configure parameters such as inference steps, negative prompts, and the random seed.
Q: What are the key parameters of the cloud API?
A: Provide model=qwen-image-edit, with image and text inside messages; optionally set negative_prompt, prompt_extend, and watermark; images must satisfy the format, dimension, and file-size limits.
Q: How are prices and quotas calculated?
A: In the Singapore region, about $0.045 per image; 100 free credits (valid for 180 days after activation); rate limits of RPS = 5 and concurrency = 2.
Q: Why do links expire?
A: Image links returned by the cloud API are valid for 24 hours; download them and move them to your own storage as soon as possible.
References
Official blog (English/Chinese): https://qwenlm.github.io/blog/qwen-image-edit/
Hugging Face model card (with QwenImageEditPipeline example and license): https://huggingface.co/Qwen/Qwen-Image-Edit
Hugging Face Online Demo (Space): https://huggingface.co/spaces/Qwen/Qwen-Image-Edit
Alibaba Cloud Model Studio · Qwen-Image-Edit (API/price/parameters/examples): https://www.alibabacloud.com/help/en/model-studio/qwen-image-edit
Qwen Chat: https://chat.qwen.ai/?inputFeature=image_edit
GitHub · Qwen-Image repository (Apache-2.0): https://github.com/QwenLM/Qwen-Image
Qwen-Image Technical Report (arXiv): https://arxiv.org/abs/2508.02324
ModelScope Model page: https://modelscope.cn/models/Qwen/Qwen-Image-Edit