Overview
Most Reviewed
Qwen3-0.6B: Causal Language Model; Pretraining & Post-training; 0.6B parameters (0.44B non-embedding); 28 layers; GQA with 16 Q heads and 8 KV heads; 32,768-token context. Qwen3 is the latest generation of large language models in the Qwen series.

DeepSeek-Prover-V2: an open-source large language model designed for formal theorem proving in Lean 4, with initialization data collected through a recursive theorem-proving pipeline powered by DeepSeek-V3. The cold-start training procedure begins by prompting DeepSeek-V3 to decompose complex problems into a series of subgoals; the proofs of resolved subgoals are then synthesized into a chain-of-thought.

Qwen3-32B: Causal Language Model; Pretraining & Post-training; 32.8B parameters (31.2B non-embedding); 64 layers; GQA with 64 Q heads and 8 KV heads; 32,768-token context natively, 131,072 tokens with YaRN.

Qwen3-14B: Causal Language Model; Pretraining & Post-training; 14.8B parameters (13.2B non-embedding); 40 layers; GQA with 40 Q heads and 8 KV heads; 32,768-token context natively, 131,072 tokens with YaRN.

Qwen3-8B: Causal Language Model; Pretraining & Post-training; 8.2B parameters (6.95B non-embedding); 36 layers; GQA with 32 Q heads and 8 KV heads; 32,768-token context natively, 131,072 tokens with YaRN.

Qwen3-4B: Causal Language Model; Pretraining & Post-training; 4.0B parameters (3.6B non-embedding); 36 layers; GQA with 32 Q heads and 8 KV heads; 32,768-token context natively, 131,072 tokens with YaRN.

Qwen3-1.7B: Causal Language Model; Pretraining & Post-training; 1.7B parameters (1.4B non-embedding); 28 layers; GQA with 16 Q heads and 8 KV heads; 32,768-token context.
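The GQA head counts, layer counts, and context lengths listed above largely determine serving memory. As a rough sketch (assuming a head dimension of 128 and bf16 KV entries, neither of which is stated in the listings), the KV-cache footprint at full native context can be estimated like this:

```python
def kv_cache_bytes(layers, kv_heads, context, head_dim=128, bytes_per_elem=2):
    """Estimate KV-cache size for one sequence: keys + values across all layers.

    head_dim=128 and bf16 (2 bytes/element) are assumptions for illustration,
    not values taken from the listings above.
    """
    return 2 * layers * kv_heads * head_dim * context * bytes_per_elem

# Qwen3-8B from the listing: 36 layers, 8 KV heads, 32,768-token native context
size = kv_cache_bytes(layers=36, kv_heads=8, context=32_768)
print(f"{size / 2**30:.1f} GiB")  # roughly 4.5 GiB per full-length sequence
```

This is why the small KV-head counts from GQA matter in practice: with as many KV heads as Q heads (e.g. 32 instead of 8 for the 8B model), the cache would be four times larger.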
Reviews
- Grok 4 is the strongest model released by xAI as of 2025. Interestingly, it surpasses the AIME and other baselines, though it is unclear whether there is any data leakage. It is still worth trying in daily work.
- The Qwen 0.6B small-size LLM is extremely powerful in real-world applications such as search and recommendation and query intent recognition. The Qwen3 0.6B model is the state of the art compared to previous small-size counterparts such as Gemini and Llama.
- The Qwen3 32B series is among the most widely adopted and deployed models in industrial applications, striking a balance between inference speed and performance. This updated version of Qwen3 32B has both a thinking mode and a non-thinking mode, supporting common tasks such as chat and text generation as well as more complex tasks such as math and code generation. On AIME and many other math benchmarks, Qwen3 surpasses many of its open-source counterparts.
- Prompt: "How to use KL divergence to help regularize the RL training of a large reasoning model? What's the drawback of current RL algorithms?" There is no public access to test the prover model, so I tried a previous machine-learning prompt to ask the DeepSeek model to produce a proof. The question seems oversimplified, and the model only gave an introductory summary, but its thinking process is quite interesting.
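On the KL-divergence question in the prompt above: in RLHF-style training, the policy is commonly regularized with a per-token KL penalty against a frozen reference model, which keeps the policy from drifting too far from its starting distribution. A minimal sketch of one widely used low-variance estimator (Schulman's "k3" form); the exact estimator varies by algorithm, and this is an illustration, not DeepSeek's implementation:

```python
import math

def kl_penalty_k3(logp_policy, logp_ref):
    """Per-token estimate of KL(policy || ref) from the log-probs of the
    sampled token under both models (the "k3" estimator).

    Always non-negative, and exactly zero when the two models agree.
    """
    log_ratio = logp_ref - logp_policy
    return math.exp(log_ratio) - log_ratio - 1.0

# Identical log-probs -> zero penalty; any divergence -> positive penalty
print(kl_penalty_k3(-1.0, -1.0))      # 0.0
print(kl_penalty_k3(-1.0, -2.0) > 0)  # True
```

The penalty is subtracted from (or added to) the per-token reward during policy optimization; the trade-off the prompt alludes to is that a strong KL coefficient stabilizes training but can also suppress the exploratory long-chain reasoning the model is being trained to produce.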
Community
- When using Kling AI (可灵AI) to generate videos, what good and problematic experiences have you had? Please be sure to include the input prompt text and a video screenshot or short video clip.
- When using Douyin's Jimeng AI (即梦AI) to generate videos, what good and problematic experiences have you had? Please be sure to include the input prompt text and a video screenshot or short video clip.
- When using the Search and Recommendation features of the Kuaishou (Kwai) short-video app, what good and problematic experiences have you had? Please describe the conditions to reproduce the issue, such as the input prompt text, and upload screenshots.
- When using the Search and Recommendation features of the Xiaohongshu app, what good and problematic experiences have you had? Please describe the conditions to reproduce the issue, such as the input prompt text, and upload screenshots.
- When using the Search and Recommendation features of the WeChat app, what good and problematic experiences have you had? Please describe the conditions to reproduce the issue, such as the input prompt text, and upload screenshots.
- When using the AI Q&A feature of the WeChat app, what good and problematic experiences have you had? Please describe the conditions to reproduce the issue, such as the input prompt text, and upload screenshots.
- When using the Search and Recommendation features of the Zhihu app, what good and problematic experiences have you had? Please describe the conditions to reproduce the issue, such as the input prompt text, and upload screenshots.
- When using the Search and Recommendation features of the JD app, what good and problematic experiences have you had? Please describe the conditions to reproduce the issue, such as the input prompt text, and upload screenshots.
- When using the Search and Recommendation features of the Taobao app, what good and problematic experiences have you had? Please describe the conditions to reproduce the issue, such as the input prompt text, and upload screenshots.
- When using the Search and Recommendation features of the Alipay app, what good and problematic experiences have you had? Please describe the conditions to reproduce the issue, such as the input prompt text, and upload screenshots.
- When using the Search and Recommendation features of the Pinduoduo (PDD, Temu) app, what good and problematic experiences have you had? Please describe the conditions to reproduce the issue, such as the input prompt text, and upload screenshots.
- When using the Zhihu Zhida (知乎直答) AI search feature, what good and problematic experiences have you had? Please describe the input you used at the time, such as the prompt text, or upload screenshots.
- When using the Kuaishou AI search feature, what good and problematic experiences have you had? Please describe the input you used at the time, such as the prompt text, or upload screenshots.
- When using the Douyin (TikTok) AI search feature, what good and problematic experiences have you had? Please describe the input you used at the time, such as the prompt text, or upload screenshots.
- Please leave your thoughts on the best and coolest AI-generated images.
- Please leave your thoughts on free alternatives to Midjourney, Stable Diffusion, and other AI image generators.
- Please leave your thoughts on the scariest or creepiest AI-generated images.
- We are witnessing great success in the recent development of generative Artificial Intelligence across many fields, such as AI assistants, chatbots, and AI writers. Among AI-native products, AI search engines such as Perplexity, Gemini, and SearchGPT are the most attractive to website owners, bloggers, and web content publishers. An AI search engine is a new kind of tool that provides answers directly to users' questions (queries). In this blog, we give a brief introduction to the basic concepts behind AI search engines, including Large Language Models (LLMs), Retrieval-Augmented Generation (RAG), and citations and sources. We then highlight some major differences between traditional Search Engine Optimization (SEO) and Generative Engine Optimization (GEO), and cover the latest research and strategies to help website owners and content publishers optimize their content for generative AI search engines.
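The RAG pipeline mentioned above can be sketched in a few lines: retrieve the documents most similar to the query, then pack them into the prompt as numbered sources so the LLM can answer with citations. This toy version uses bag-of-words cosine similarity and a hypothetical mini-corpus; production systems use learned embeddings and a vector index, but the shape of the pipeline is the same.

```python
from collections import Counter
import math

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse bag-of-words vectors
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    # Rank documents by similarity to the query and keep the top k
    q = Counter(query.lower().split())
    return sorted(docs, key=lambda d: cosine(q, Counter(d.lower().split())),
                  reverse=True)[:k]

def build_prompt(query: str, docs: list[str]) -> str:
    # Number the retrieved sources so the model can cite them as [1], [2], ...
    context = "\n".join(f"[{i+1}] {d}" for i, d in enumerate(retrieve(query, docs)))
    return f"Answer using the sources below and cite them.\n{context}\n\nQuestion: {query}"

# Hypothetical mini-corpus for illustration
docs = [
    "GEO optimizes content for AI search engines.",
    "SEO targets traditional search engine rankings.",
    "Bananas are rich in potassium.",
]
print(build_prompt("How does GEO differ from SEO?", docs))
```

For GEO, the practical takeaway from this sketch is that the retrieval step decides whether your page ever reaches the model's context window at all, which is why content structure and answer-shaped passages matter more than classic keyword ranking signals.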
- We are seeing more deployments of robotaxis and self-driving vehicles worldwide. Many large companies such as Waymo, Tesla, and Baidu are accelerating robotaxi rollouts in multiple cities. Some human drivers, especially cab drivers, worry that they will lose their jobs to AI, arguing that lower operating costs and the ability of AI to work virtually 24 hours a day without rest give it a competitive advantage over humans. What do you think?