
Compare

Overview

Claude vs Gemini: a weighted comparison across different aspects of AI services, based on data mined from genuine user reviews and ratings, scored on ALL, Interesting, Helpfulness, and Correctness. AI store is a platform for genuine user reviews, ratings, and AI-generated content, covering a wide range of categories including AI Image Generators, AI Chatbot & Assistant, AI Productivity Tool, AI Video Generator, AI in Healthcare, AI in Education, AI in Lifestyle, AI in Finance, AI in Business, AI in Law, AI in Travel, AI in News, AI in Entertainment, AI for Kids, AI for Elderly, AI Search Engine, AI Quadruped Robot.

Reviews Comparison


  • wilsonz1989 2024-09-03 18:06
    Interesting:3,Helpfulness:5,Correctness:5

    I've tested the AI model implementation ability of multiple AI Code Generators, and for the prompt "Self Attention Layer in Transformer", Claude provides the best overall response. The PyTorch implementation of the Transformer model's self-attention layer is correct, and the detailed explanation is really helpful. For example, it introduces a breakdown of the Python code, covering the "__init__" method, the forward method, the input tensors, the attention energy computation, and so on. I will also attach the detailed text response here:

    """ Claude Sonnet 3.5

    Here's a breakdown of how this code works:

    The __init__ method initializes the linear layers for the value, key, and query projections, as well as the final linear layer that combines the multi-head outputs.

    The forward method takes in the values, keys, query, and an optional mask tensor. The input tensors are reshaped to have a separate dimension for the number of heads, so that we can perform multi-head attention in parallel. The value, key, and query tensors are projected using the respective linear layers.

    The attention energy is computed using the einsum operation, which performs the scaled dot-product attention operation across all heads in parallel. If a mask tensor is provided, the attention energies corresponding to the masked positions are set to a very large negative value, so that they get zeroed out after applying the softmax. The attention probabilities are computed by applying softmax to the attention energies.

    The values are weighted by the attention probabilities using another einsum operation, and the results are concatenated across the head dimension. Finally, the concatenated tensor is projected using the final linear layer to obtain the output of the self-attention layer.

    You can use this SelfAttention module as a part of your Transformer architecture by applying it to the input embeddings and optionally using the output as the input to the feed-forward network. """
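
    For reference, here is a minimal sketch of the kind of multi-head self-attention module the breakdown above describes. This is not Claude's exact output, just a common PyTorch implementation that follows the same steps (value/key/query linear layers, per-head reshaping, einsum-based scaled dot-product attention, optional masking, softmax, and a final output projection):

    """
    import torch
    import torch.nn as nn

    class SelfAttention(nn.Module):
        def __init__(self, embed_size, heads):
            super().__init__()
            assert embed_size % heads == 0, "embed_size must be divisible by heads"
            self.heads = heads
            self.head_dim = embed_size // heads

            # Linear projections for the values, keys, and queries (applied per head)
            self.values = nn.Linear(self.head_dim, self.head_dim, bias=False)
            self.keys = nn.Linear(self.head_dim, self.head_dim, bias=False)
            self.queries = nn.Linear(self.head_dim, self.head_dim, bias=False)
            # Final linear layer that combines the multi-head outputs
            self.fc_out = nn.Linear(embed_size, embed_size)

        def forward(self, values, keys, query, mask=None):
            N = query.shape[0]
            value_len, key_len, query_len = values.shape[1], keys.shape[1], query.shape[1]

            # Reshape so each tensor has a separate dimension for the heads,
            # then project with the per-head linear layers
            values = self.values(values.reshape(N, value_len, self.heads, self.head_dim))
            keys = self.keys(keys.reshape(N, key_len, self.heads, self.head_dim))
            queries = self.queries(query.reshape(N, query_len, self.heads, self.head_dim))

            # Scaled dot-product attention energies for all heads in parallel
            energy = torch.einsum("nqhd,nkhd->nhqk", queries, keys)
            if mask is not None:
                # Masked positions get a very large negative energy so softmax zeroes them out
                energy = energy.masked_fill(mask == 0, float("-1e20"))
            attention = torch.softmax(energy / (self.head_dim ** 0.5), dim=3)

            # Weight the values by the attention probabilities and concatenate the heads
            out = torch.einsum("nhqk,nkhd->nqhd", attention, values)
            out = out.reshape(N, query_len, self.heads * self.head_dim)
            return self.fc_out(out)
    """

    A module like SelfAttention(embed_size=256, heads=8) can then be dropped into a Transformer block, with its output feeding the position-wise feed-forward network, as the quoted explanation suggests.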




  • wilsonz1989 2024-09-03 18:19
    Interesting:3,Helpfulness:3,Correctness:4

    Overall, Gemini did a good job, though not optimal, in the PyTorch model coding test. I asked the same question on ChatGPT vs Gemini vs Claude. Among these three platforms, Gemini's Python code is correct but less concise. It also uses a less common combination of PyTorch operations, such as """ context.permute(0, 2, 1, 3).view(...) """. This is just a weighted-average tensor operation, so that implementation is uncommon and unnecessary. ChatGPT and Claude use the more idiomatic pattern for this operation, which is """ torch.einsum().reshape() """. It's not a big deal, but Gemini is not optimal at generating common, idiomatic Python code.
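
    To illustrate the difference being discussed (this is my own sketch, not the exact output of either model): the einsum-based answers already produce the per-head context in (N, query_len, heads, head_dim) order, so a single reshape merges the heads, whereas the permute-based pattern needs an extra transpose and a contiguous copy before view:

    """
    import torch

    N, heads, q_len, k_len, head_dim = 2, 4, 5, 5, 8
    attention = torch.softmax(torch.randn(N, heads, q_len, k_len), dim=-1)
    values = torch.randn(N, k_len, heads, head_dim)

    # einsum + reshape (the pattern used in the Claude/ChatGPT answers):
    # the result is already (N, q_len, heads, head_dim), so reshape merges the heads directly.
    out_a = torch.einsum("nhqk,nkhd->nqhd", attention, values).reshape(N, q_len, heads * head_dim)

    # permute + view (the pattern the review attributes to Gemini):
    # the per-head context comes out as (N, heads, q_len, head_dim), so it has to be
    # permuted first, and .view() additionally requires a contiguous copy.
    context = attention @ values.permute(0, 2, 1, 3)   # (N, heads, q_len, head_dim)
    out_b = context.permute(0, 2, 1, 3).contiguous().view(N, q_len, heads * head_dim)

    assert torch.allclose(out_a, out_b, atol=1e-6)  # both give the same (N, q_len, embed_size) tensor
    """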