ACL2023 Hot Retrieval Papers Full List

rockingdingo 2023-08-31 #nlp #ACL2023 #retrieval


This blog summarizes the latest research development of retrieval papers published in ACL2023 conferences. This year there are total 76 papers related to retrieval in ACL2023. Most of the authors' affiliations are top research institutes (Google Research, DeepMind, Meta FAIR) and universities (Stanford, Berkeley, MIT, CMU and others).

Navigation

Paper List

  • 1.Explainable Recommendation with Personalized Review Retrieval and Aspect Learning

    Hao Cheng,Shuo Wang,Wensheng Lu,Wei Zhang,Mingyang Zhou,Kezhong Lu,Hao Liao
    Download URL

    https://aclanthology.org/2023.acl-long.4/


    abstract

    AbstractExplainable recommendation is a technique that combines prediction and generation tasks to produce more persuasive results. Among these tasks, textual generation demands large amounts of data to achieve satisfactory accuracy. However, historical user reviews of items are often insufficient, making it challenging to ensure the precision of generated explanation text. To address this issue, we propose a novel model, ERRA (Explainable Recommendation by personalized Review retrieval and Aspect learning). With retrieval enhancement, ERRA can obtain additional information from the training sets. With this additional information, we can generate more accurate and informative explanations. Furthermore, to better capture usersâ?? preferences, we incorporate an aspect enhancement component into our model. By selecting the top-n aspects that users are most concerned about for different items, we can model user representation with more relevant details, making the explanation more persuasive. To verify the effectiveness of our model, extensive experiments on three datasets show that our model outperforms state-of-the-art baselines (for example, 3.4% improvement in prediction and 15.8% improvement in explanation for TripAdvisor).

  • 2.Dependency resolution at the syntax-semantics interface: psycholinguistic and computational insights on control dependencies

    Iria de-Dios-Flores,Juan Garcia Amboage,Marcos Garcia
    Download URL

    https://aclanthology.org/2023.acl-long.12/


    abstract

    AbstractUsing psycholinguistic and computational experiments we compare the ability of humans and several pre-trained masked language models to correctly identify control dependencies in Spanish sentences such as â??José le prometió/ordenó a María ser ordenado/aâ?? (â??Joseph promised/ordered Mary to be tidyâ??). These structures underlie complex anaphoric and agreement relations at the interface of syntax and semantics, allowing us to study lexically-guided antecedent retrieval processes. Our results show that while humans correctly identify the (un)acceptability of the strings, language models often fail to identify the correct antecedent in non-adjacent dependencies, showing their reliance on linearity. Additional experiments on Galician reinforce these conclusions. Our findings are equally valuable for the evaluation of language modelsâ?? ability to capture linguistic generalizations, as well as for psycholinguistic theories of anaphor resolution.

  • 3.Retrieve-and-Sample: Document-level Event Argument Extraction via Hybrid Retrieval Augmentation

    Yubing Ren,Yanan Cao,Ping Guo,Fang Fang,Wei Ma,Zheng Lin
    Download URL

    https://aclanthology.org/2023.acl-long.17/


    abstract

    AbstractRecent studies have shown the effectiveness of retrieval augmentation in many generative NLP tasks. These retrieval-augmented methods allow models to explicitly acquire prior external knowledge in a non-parametric manner and regard the retrieved reference instances as cues to augment text generation. These methods use similarity-based retrieval, which is based on a simple hypothesis: the more the retrieved demonstration resembles the original input, the more likely the demonstration label resembles the input label. However, due to the complexity of event labels and sparsity of event arguments, this hypothesis does not always hold in document-level EAE. This raises an interesting question: How do we design the retrieval strategy for document-level EAE? We investigate various retrieval settings from the input and label distribution views in this paper. We further augment document-level EAE with pseudo demonstrations sampled from event semantic regions that can cover adequate alternatives in the same context and event schema. Through extensive experiments on RAMS and WikiEvents, we demonstrate the validity of our newly introduced retrieval-augmented methods and analyze why they work.

  • 4.Fantastic Expressions and Where to Find Them: Chinese Simile Generation with Multiple Constraints

    Kexin Yang,Dayiheng Liu,Wenqiang Lei,Baosong Yang,Xiangpeng Wei,Zhengyuan Liu,Jun Xie
    Download URL

    https://aclanthology.org/2023.acl-long.28/


    abstract

    AbstractSimiles occur in the creative context of describing a concept (i.e., tenor) by making a literally false yet figuratively meaningful comparison to another (i.e., vehicle). Previous efforts form simile generation as a context-free generation task, focusing on simile-style transfer or writing a simile from a given prefix. However, generated texts under such settings might be undesirable, such as hardly meeting the simile definition (e.g., missing vehicle) or difficult to address certain preferences of content as humans wish (e.g., describe the color of apples through the simile). We believe that a simile could be more qualified and user-oriented if incorporated with pre-specified constraints. To this end, we introduce controllable simile generation (CSG), a new task that requires the model to generate a simile with multiple simile elements, e.g., context and vehicle. To facilitate this task, we present GraCe, including 61.3k simile-element annotated Chinese similes. Based on it, we propose a CSG model Similor to benchmark this task, including a vehicle retrieval module Scorer to obtain the explicable comparison for a given tenor in the vehicle-unknown situation. Both statistical and experimental analyses show that GraCe is of high quality beyond all other Chinese simile datasets, in terms of the number (8 vs. 3) of annotation elements, Is-Simile accuracy (98.9% vs. 78.7%), and increasing model-performance gains for both uncontrollable and controllable simile generation. Meanwhile, Similor can serve as a strong baseline for CSG, especially with Scorer, which beats model-based retrieval methods without any re-training.

  • 5.Revealing Single Frame Bias for Video-and-Language Learning

    Jie Lei,Tamara Berg,Mohit Bansal
    Download URL

    https://aclanthology.org/2023.acl-long.29/


    abstract

    AbstractTraining an effective video-and-language model intuitively requires multiple frames as model inputs. However, it is unclear whether using multiple frames is beneficial to downstream tasks, and if yes, whether the performance gain is worth the drastically-increased computation and memory costs resulting from using more frames. In this work, we explore single-frame models for video-and-language learning. On a diverse set of video-and-language tasks (including text-to-video retrieval and video question answering), we show the surprising result that, with large-scale pre-training and a proper frame ensemble strategy at inference time, a single-frame trained model that does not consider temporal information can achieve better performance than existing methods that use multiple frames for training. This result reveals the existence of a strong â??static appearance biasâ?? in popular video-and-language datasets. Therefore, to allow for a more comprehensive evaluation of video-and-language models, we propose two new retrieval tasks based on existing fine-grained action recognition datasets that encourage temporal modeling. Our code is available at https://github.com/jayleicn/singularity.

  • 6.ThinkSum: Probabilistic reasoning over sets using large language models

    Batu Ozturkler,Nikolay Malkin,Zhen Wang,Nebojsa Jojic
    Download URL

    https://aclanthology.org/2023.acl-long.68/


    abstract

    AbstractLarge language models (LLMs) have a substantial capacity for high-level analogical reasoning: reproducing patterns in linear text that occur in their training data (zero-shot evaluation) or in the provided context (few-shot in-context learning). However, recent studies show that even the more advanced LLMs fail in scenarios that require reasoning over multiple objects or facts and making sequences of logical deductions. We propose a two-stage probabilistic inference paradigm, ThinkSum, which reasons over sets of objects or facts in a structured manner. In the first stage (Think â?? retrieval of associations), a LLM is queried in parallel over a set of phrases extracted from the prompt or an auxiliary model call. In the second stage (Sum â?? probabilistic inference or reasoning), the results of these queries are aggregated to make the final prediction. We demonstrate the possibilities and advantages of ThinkSum on the BIG-bench suite of LLM evaluation tasks, achieving improvements over the state of the art using GPT-family models on thirteen difficult tasks, often with far smaller model variants. We also compare and contrast ThinkSum with other proposed modifications to direct prompting of LLMs, such as variants of chain-of-thought prompting. Our results suggest that because the probabilistic inference in ThinkSum is performed outside of calls to the LLM, ThinkSum is less sensitive to prompt design, yields more interpretable predictions, and can be flexibly combined with latent variable models to extract structured knowledge from LLMs. Overall, our proposed paradigm represents a promising approach for enhancing the reasoning capabilities of LLMs.

  • 7.Chain-of-Skills: A Configurable Model for Open-Domain Question Answering

    Kaixin Ma,Hao Cheng,Yu Zhang,Xiaodong Liu,Eric Nyberg,Jianfeng Gao
    Download URL

    https://aclanthology.org/2023.acl-long.89/


    abstract

    AbstractThe retrieval model is an indispensable component for real-world knowledge-intensive tasks, e.g., open-domain question answering (ODQA). As separate retrieval skills are annotated for different datasets, recent work focuses on customized methods, limiting the model transfer- ability and scalability. In this work, we propose a modular retriever where individual modules correspond to key skills that can be reused across datasets. Our approach supports flexible skill configurations based on the target domain to boost performance. To mitigate task interference, we design a novel modularization parameterization inspired by sparse Transformer. We demonstrate that our model can benefit from self-supervised pretraining on Wikipedia and fine-tuning using multiple ODQA datasets, both in a multi-task fashion. Our approach outperforms recent self-supervised retrievers in zero-shot evaluations and achieves state-of-the-art fine-tuned retrieval performance on NQ, HotpotQA and OTT-QA.

  • 8.Precise Zero-Shot Dense Retrieval without Relevance Labels

    Luyu Gao,Xueguang Ma,Jimmy Lin,Jamie Callan
    Download URL

    https://aclanthology.org/2023.acl-long.99/


    abstract

    AbstractWhile dense retrieval has been shown to be effective and efficient across tasks and languages, it remains difficult to create effective fully zero-shot dense retrieval systems when no relevance labels are available. In this paper, we recognize the difficulty of zero-shot learning and encoding relevance. Instead, we propose to pivot through Hypothetical Document Embeddings (HyDE). Given a query, HyDE first zero-shot prompts an instruction-following language model (e.g., InstructGPT) to generate a hypothetical document. The document captures relevance patterns but is â??fakeâ?? and may contain hallucinations. Then, an unsupervised contrastively learned encoder (e.g., Contriever) encodes the document into an embedding vector. This vector identifies a neighborhood in the corpus embedding space, from which similar real documents are retrieved based on vector similarity. This second step grounds the generated document to the actual corpus, with the encoderâ??s dense bottleneck filtering out the hallucinations. Our experiments show that HyDE significantly outperforms the state-of-the-art unsupervised dense retriever Contriever and shows strong performance comparable to fine-tuned retrievers across various tasks (e.g. web search, QA, fact verification) and in non-English languages (e.g., sw, ko, ja, bn).

  • 9.SimLM: Pre-training with Representation Bottleneck for Dense Passage Retrieval

    Liang Wang,Nan Yang,Xiaolong Huang,Binxing Jiao,Linjun Yang,Daxin Jiang,Rangan Majumder,Furu Wei
    Download URL

    https://aclanthology.org/2023.acl-long.125/


    abstract

    AbstractIn this paper, we propose SimLM (Similarity matching with Language Model pre-training), a simple yet effective pre-training method for dense passage retrieval. It employs a simple bottleneck architecture that learns to compress the passage information into a dense vector through self-supervised pre-training. We use a replaced language modeling objective, which is inspired by ELECTRA (Clark et al., 2020), to improve the sample efficiency and reduce the mismatch of the input distribution between pre-training and fine-tuning. SimLM only requires access to an unlabeled corpus and is more broadly applicable when there are no labeled data or queries. We conduct experiments on several large-scale passage retrieval datasets and show substantial improvements over strong baselines under various settings. Remarkably, SimLM even outperforms multi-vector approaches such as ColBERTv2 (Santhanam et al., 2021) which incurs significantly more storage cost. Our code and model checkpoints are available at https://github.com/microsoft/unilm/tree/master/simlm .

  • 10.Augmentation-Adapted Retriever Improves Generalization of Language Models as Generic Plug-In

    Zichun Yu,Chenyan Xiong,Shi Yu,Zhiyuan Liu
    Download URL

    https://aclanthology.org/2023.acl-long.136/


    abstract

    AbstractRetrieval augmentation can aid language models (LMs) in knowledge-intensive tasks by supplying them with external information. Prior works on retrieval augmentation usually jointly fine-tune the retriever and the LM, making them closely coupled. In this paper, we explore the scheme of generic retrieval plug-in: the retriever is to assist target LMs that may not be known beforehand or are unable to be fine-tuned together. To retrieve useful documents for unseen target LMs, we propose augmentation-adapted retriever (AAR), which learns LMâ??s preferences obtained from a known source LM. Experiments on the MMLU and PopQA datasets demonstrate that our AAR trained with a small source LM is able to significantly improve the zero-shot generalization of larger target LMs ranging from 250M Flan-T5 to 175B InstructGPT. Further analysis indicates that the preferences of different LMs overlap, enabling AAR trained with a single source LM to serve as a generic plug-in for various target LMs. Our code is open-sourced at https://github.com/OpenMatch/Augmentation-Adapted-Retriever.

  • 11.What Are You Token About? Dense Retrieval as Distributions Over the Vocabulary

    Ori Ram,Liat Bezalel,Adi Zicher,Yonatan Belinkov,Jonathan Berant,Amir Globerson
    Download URL

    https://aclanthology.org/2023.acl-long.140/


    abstract

    AbstractDual encoders are now the dominant architecture for dense retrieval. Yet, we have little understanding of how they represent text, and why this leads to good performance. In this work, we shed light on this question via distributions over the vocabulary. We propose to interpret the vector representations produced by dual encoders by projecting them into the modelâ??s vocabulary space. We show that the resulting projections contain rich semantic information, and draw connection between them and sparse retrieval. We find that this view can offer an explanation for some of the failure cases of dense retrievers. For example, we observe that the inability of models to handle tail entities is correlated with a tendency of the token distributions to forget some of the tokens of those entities. We leverage this insight and propose a simple way to enrich query and passage representations with lexical information at inference time, and show that this significantly improves performance compared to the original model in zero-shot settings, and specifically on the BEIR benchmark.

  • 12.CrossSum: Beyond English-Centric Cross-Lingual Summarization for 1,500+ Language Pairs

    Abhik Bhattacharjee,Tahmid Hasan,Wasi Uddin Ahmad,Yuan-Fang Li,Yong-Bin Kang,Rifat Shahriyar
    Download URL

    https://aclanthology.org/2023.acl-long.143/


    abstract

    AbstractWe present CrossSum, a large-scale cross-lingual summarization dataset comprising 1.68 million article-summary samples in 1,500+ language pairs. We create CrossSum by aligning parallel articles written in different languages via cross-lingual retrieval from a multilingual abstractive summarization dataset and perform a controlled human evaluation to validate its quality. We propose a multistage data sampling algorithm to effectively train a cross-lingual summarization model capable of summarizing an article in any target language. We also introduce LaSE, an embedding-based metric for automatically evaluating model-generated summaries. LaSE is strongly correlated with ROUGE and, unlike ROUGE, can be reliably measured even in the absence of references in the target language. Performance on ROUGE and LaSE indicate that our proposed model consistently outperforms baseline models. To the best of our knowledge, CrossSum is the largest cross-lingual summarization dataset and the first ever that is not centered around English. We are releasing the dataset, training and evaluation scripts, and models to spur future research on cross-lingual summarization. The resources can be found at https://github.com/csebuetnlp/CrossSum

  • 13.RetroMAE-2: Duplex Masked Auto-Encoder For Pre-Training Retrieval-Oriented Language Models

    Zheng Liu,Shitao Xiao,Yingxia Shao,Zhao Cao
    Download URL

    https://aclanthology.org/2023.acl-long.148/


    abstract

    AbstractTo better support information retrieval tasks such as web search and open-domain question answering, growing effort is made to develop retrieval-oriented language models, e.g., RetroMAE and many others. Most of the existing works focus on improving the semantic representation capability for the contextualized embedding of the [CLS] token. However, recent study shows that the ordinary tokens besides [CLS] may provide extra information, which help to produce a better representation effect. As such, itâ??s necessary to extend the current methods where all contextualized embeddings can be jointly pre-trained for the retrieval tasks. In this work, we propose a novel pre-training method called Duplex Masked Auto-Encoder, a.k.a. DupMAE. It is designed to improve the quality of semantic representation where all contextualized embeddings of the pre-trained model can be leveraged. It takes advantage of two complementary auto-encoding tasks: one reconstructs the input sentence on top of the [CLS] embedding; the other one predicts the bag-of-words feature of the input sentence based on the ordinary tokensâ?? embeddings. The two tasks are jointly conducted to train a unified encoder, where the whole contextualized embeddings are aggregated in a compact way to produce the final semantic representation. DupMAE is simple but empirically competitive: it substantially improves the pre-trained modelâ??s representation capability and transferability, where superior retrieval performances can be achieved on popular benchmarks, like MS MARCO and BEIR. We make our code publicly available at https://github.com/staoxiao/RetroMAE.

  • 14.CORE: Cooperative Training of Retriever-Reranker for Effective Dialogue Response Selection

    Chongyang Tao,Jiazhan Feng,Tao Shen,Chang Liu,Juntao Li,Xiubo Geng,Daxin Jiang
    Download URL

    https://aclanthology.org/2023.acl-long.174/


    abstract

    AbstractEstablishing retrieval-based dialogue systems that can select appropriate responses from the pre-built index has gained increasing attention. Recent common practice is to construct a two-stage pipeline with a fast retriever (e.g., bi-encoder) for first-stage recall followed by a smart response reranker (e.g., cross-encoder) for precise ranking. However, existing studies either optimize the retriever and reranker in independent ways, or distill the knowledge from a pre-trained reranker into the retriever in an asynchronous way, leading to sub-optimal performance of both modules. Thus, an open question remains about how to train them for a better combination of the best of both worlds. To this end, we present a cooperative training of the response retriever and the reranker whose parameters are dynamically optimized by the ground-truth labels as well as list-wise supervision signals from each other. As a result, the two modules can learn from each other and evolve together throughout the training. Experimental results on two benchmarks demonstrate the superiority of our method.

  • 15.We Understand Elliptical Sentences, and Language Models should Too: A New Dataset for Studying Ellipsis and its Interaction with Thematic Fit

    Davide Testa,Emmanuele Chersoni,Alessandro Lenci
    Download URL

    https://aclanthology.org/2023.acl-long.188/


    abstract

    AbstractEllipsis is a linguistic phenomenon characterized by the omission of one or more sentence elements. Solving such a linguistic construction is not a trivial issue in natural language processing since it involves the retrieval of non-overtly expressed verbal material, which might in turn require the model to integrate human-like syntactic and semantic knowledge. In this paper, we explored the issue of how the prototypicality of event participants affects the ability of Language Models (LMs) to handle elliptical sentences and to identify the omitted arguments at different degrees of thematic fit, ranging from highly typical participants to semantically anomalous ones. With this purpose in mind, we built ELLie, the first dataset composed entirely of utterances containing different types of elliptical constructions, and structurally suited for evaluating the effect of argument thematic fit in solving ellipsis and reconstructing the missing element. Our tests demonstrated that the probability scores assigned by the models are higher for typical events than for atypical and impossible ones in different elliptical contexts, confirming the influence of prototypicality of the event participants in interpreting such linguistic structures. Finally, we conducted a retrieval task of the elided verb in the sentence in which the low performance of LMs highlighted a considerable difficulty in reconstructing the correct event.

  • 16.MatSci-NLP: Evaluating Scientific Language Models on Materials Science Language Tasks Using Text-to-Schema Modeling

    Yu Song,Santiago Miret,Bang Liu
    Download URL

    https://aclanthology.org/2023.acl-long.201/


    abstract

    AbstractWe present MatSci-NLP, a natural language benchmark for evaluating the performance of natural language processing (NLP) models on materials science text. We construct the benchmark from publicly available materials science text data to encompass seven different NLP tasks, including conventional NLP tasks like named entity recognition and relation classification, as well as NLP tasks specific to materials science, such as synthesis action retrieval which relates to creating synthesis procedures for materials. We study various BERT-based models pretrained on different scientific text corpora on MatSci-NLP to understand the impact of pretraining strategies on understanding materials science text. Given the scarcity of high-quality annotated data in the materials science domain, we perform our fine-tuning experiments with limited training data to encourage the generalize across MatSci-NLP tasks.Our experiments in this low-resource training setting show that language models pretrained on scientific text outperform BERT trained on general text. MatBERT, a model pretrained specifically on materials science journals, generally performs best for most tasks. Moreover, we propose a unified text-to-schema for multitask learning on {pasted macro â??BENCHMARKâ??} and compare its performance with traditional fine-tuning methods. In our analysis of different training methods, we find that our proposed text-to-schema methods inspired by question-answering consistently outperform single and multitask NLP fine-tuning methods. The code and datasets are publicly available https://github.com/BangLab-UdeM-Mila/NLP4MatSci-ACL23.

  • 17.Unified Demonstration Retriever for In-Context Learning

    Xiaonan Li,Kai Lv,Hang Yan,Tianyang Lin,Wei Zhu,Yuan Ni,Guotong Xie,Xiaoling Wang,Xipeng Qiu
    Download URL

    https://aclanthology.org/2023.acl-long.256/


    abstract

    AbstractIn-context learning is a new learning paradigm where a language model conditions on a few input-output pairs (demonstrations) and a test input, and directly outputs the prediction. It has been shown sensitive to the provided demonstrations and thus promotes the research of demonstration retrieval: given a test input, relevant examples are retrieved from the training set to serve as informative demonstrations for in-context learning. While previous works train task-specific retrievers for several tasks separately, these methods are hard to transfer and scale on various tasks, and separately trained retrievers will cause a lot of parameter storage and deployment cost. In this paper, we propose Unified Demonstration Retriever (UDR), a single model to retrieve demonstrations for a wide range of tasks. To train UDR, we cast various tasksâ?? training signals into a unified list-wise ranking formulation by language modelâ??s feedback. Then we propose a multi-task list-wise ranking training framework with an iterative mining strategy to find high-quality candidates, which can help UDR fully incorporate various tasksâ?? signals. Experiments on 30+ tasks across 13 task families and multiple data domains show that UDR significantly outperforms baselines. Further analyses show the effectiveness of each proposed component and UDRâ??s strong ability in various scenarios including different LMs (1.3B 175B), unseen datasets, varying demonstration quantities, etc. We will release the code and model checkpoint after review.

  • Fengran Mo,Kelong Mao,Yutao Zhu,Yihong Wu,Kaiyu Huang,Jian-Yun Nie
    Download URL

    https://aclanthology.org/2023.acl-long.274/


    abstract

    AbstractIn conversational search, the userâ??s real search intent for the current conversation turn is dependent on the previous conversation history. It is challenging to determine a good search query from the whole conversation context. To avoid the expensive re-training of the query encoder, most existing methods try to learn a rewriting model to de-contextualize the current query by mimicking the manual query rewriting.However, manually rewritten queries are not always the best search queries.Thus, training a rewriting model on them would lead to sub-optimal queries. Another useful information to enhance the search query is the potential answer to the question. In this paper, we propose ConvGQR, a new framework to reformulate conversational queries based on generative pre-trained language models (PLMs), one for query rewriting and another for generating potential answers.By combining both, ConvGQR can produce better search queries.In addition, to relate query reformulation to the retrieval task, we propose a knowledge infusion mechanism to optimize both query reformulation and retrieval. Extensive experiments on four conversational search datasets demonstrate the effectiveness of ConvGQR.

  • 19.Cross-View Language Modeling: Towards Unified Cross-Lingual Cross-Modal Pre-training

    Yan Zeng,Wangchunshu Zhou,Ao Luo,Ziming Cheng,Xinsong Zhang
    Download URL

    https://aclanthology.org/2023.acl-long.315/


    abstract

    AbstractIn this paper, we introduce Cross-View Language Modeling, a simple and effective pre-training framework that unifies cross-lingual and cross-modal pre-training with shared architectures and objectives. Our approach is motivated by a key observation that cross-lingual and cross-modal pre-training share the same goal of aligning two different views of the same object into a common semantic space. To this end, the cross-view language modeling framework considers both multi-modal data (i.e., image-caption pairs) and multi-lingual data (i.e., parallel sentence pairs) as two different views of the same object, and trains the model to align the two views by maximizing the mutual information between them with conditional masked language modeling and contrastive learning. We pre-train CCLM, a Cross-lingual Cross-modal Language Model, with the cross-view language modeling framework. Empirical results on IGLUE, a multi-lingual multi-modal benchmark, and two multi-lingual image-text retrieval datasets show that while conceptually simpler, CCLM significantly outperforms the prior state-of-the-art with an average absolute improvement of over 10%. Moreover, CCLM is the first multi-lingual multi-modal pre-trained model that surpasses the translate-test performance of representative English vision-language models by zero-shot cross-lingual transfer.

  • 20.Bridging the Domain Gaps in Context Representations for k-Nearest Neighbor Neural Machine Translation

    Zhiwei Cao,Baosong Yang,Huan Lin,Suhang Wu,Xiangpeng Wei,Dayiheng Liu,Jun Xie,Min Zhang,Jinsong Su
    Download URL

    https://aclanthology.org/2023.acl-long.321/


    abstract

    Abstractk-Nearest neighbor machine translation (kNN-MT) has attracted increasing attention due to its ability to non-parametrically adapt to new translation domains. By using an upstream NMT model to traverse the downstream training corpus, it is equipped with a datastore containing vectorized key-value pairs, which are retrieved during inference to benefit translation.However, there often exists a significant gap between upstream and downstream domains, which hurts the datastore retrieval and the final translation quality.To deal with this issue, we propose a novel approach to boost the datastore retrieval of kNN-MT by reconstructing the original datastore.Concretely, we design a reviser to revise the key representations, making them better fit for the downstream domain. The reviser is trained using the collected semantically-related key-queries pairs, and optimized by two proposed losses: one is the key-queries semantic distance ensuring each revised key representation is semantically related to its corresponding queries, and the other is an L2-norm loss encouraging revised key representations to effectively retain the knowledge learned by the upstream NMT model. Extensive experiments on domain adaptation tasks demonstrate that our method can effectively boost the datastore retrieval and translation quality of kNN-MT.Our code is available at https://github.com/DeepLearnXMU/Revised-knn-mt.

  • 21.TOME: A Two-stage Approach for Model-based Retrieval

    Ruiyang Ren,Wayne Xin Zhao,Jing Liu,Hua Wu,Ji-Rong Wen,Haifeng Wang
    Download URL

    https://aclanthology.org/2023.acl-long.336/


    abstract

    AbstractRecently, model-based retrieval has emerged as a new paradigm in text retrieval that discards the index in the traditional retrieval model and instead memorizes the candidate corpora using model parameters. This design employs a sequence-to-sequence paradigm to generate document identifiers, which enables the complete capture of the relevance between queries and documents and simplifies the classic index-retrieval-rerank pipeline. Despite its attractive qualities, there remain several major challenges in model-based retrieval, including the discrepancy between pre-training and fine-tuning, and the discrepancy between training and inference. To deal with the above challenges, we propose a novel two-stage model-based retrieval approach called TOME, which makes two major technical contributions, including the utilization of tokenized URLs as identifiers and the design of a two-stage generation architecture. We also propose a number of training strategies to deal with the training difficulty as the corpus size increases. Extensive experiments and analysis on MS MARCO and Natural Questions demonstrate the effectiveness of our proposed approach, and we investigate the scaling laws of TOME by examining various influencing factors.

  • 22.Parallel Context Windows for Large Language Models

    Nir Ratner,Yoav Levine,Yonatan Belinkov,Ori Ram,Inbal Magar,Omri Abend,Ehud Karpas,Amnon Shashua,Kevin Leyton-Brown,Yoav Shoham
    Download URL

    https://aclanthology.org/2023.acl-long.352/


    abstract

    AbstractWhen applied to processing long text, Large Language Models (LLMs) are limited by their context window. Existing efforts to address this limitation involve training specialized architectures, and cannot be easily applied to off- the-shelf LLMs. We present Parallel Context Windows (PCW), a method that alleviates the context window restriction for any off-the-shelf LLM without further training. The key to the approach is to carve a long context into chunks (â??windowsâ??), restrict the attention mechanism to apply only within each window, and re-use the positional embeddings across the windows. Our main results test the PCW approach on in-context learning with models that range in size between 750 million and 178 billion parameters, and show substantial improvements for tasks with diverse input and output spaces. We show additional benefits in other settings where long context windows may be beneficial: multi-hop questions and retrieval-augmented question answering with multiple retrieved documents. Our results highlight Parallel Context Windows as a promising method for applying off-the-shelf LLMs in a range of settings that require long text sequences. We make our code publicly available at https://github.com/ ai21labs/parallel-context-windows.

  • 23.Retrieval-free Knowledge Injection through Multi-Document Traversal for Dialogue Models

    Rui Wang,Jianzhu Bao,Fei Mi,Yi Chen,Hongru Wang,Yasheng Wang,Yitong Li,Lifeng Shang,Kam-Fai Wong,Ruifeng Xu
    Download URL

    https://aclanthology.org/2023.acl-long.364/


    abstract

    AbstractDialogue models are often enriched with extensive external knowledge to provide informative responses through a retrieval-augmented pipeline.Nevertheless, retrieval-augmented approaches rely on finely annotated retrieval training data and knowledge-grounded response generation data, making it costly to transfer. To tackle this challenge, this paper proposed a retrieval-free approach, KiDG, by automatically turning knowledge documents into simulated multi-turn dialogues through a Multi-Document Traversal algorithm. The simulated knowledge-intensive dialogues constructed by KiDG in one domain can be easily used to train and enhance pre-trained dialogue modelsâ?? knowledge w.r.t. this domain without costly annotation.We conduct extensive experiments comparing retrieval-augmented models and a variety of retrieval-free models. We found that dialogue models enhanced with data simulated with KiDG largely outperform state-of-the-art retrieval-free methods, and it achieves comparable performance compared to retrieval-augmented methods while being better, and cheaper at domain transfer.

  • 24.BERM: Training the Balanced and Extractable Representation for Matching to Improve Generalization Ability of Dense Retrieval

    Shicheng Xu,Liang Pang,Huawei Shen,Xueqi Cheng
    Download URL

    https://aclanthology.org/2023.acl-long.365/


    abstract

    AbstractDense retrieval has shown promise in the first-stage retrieval process when trained on in-domain labeled datasets. However, previous studies have found that dense retrieval is hard to generalize to unseen domains due to its weak modeling of domain-invariant and interpretable feature (i.e., matching signal between two texts, which is the essence of information retrieval). In this paper, we propose a novel method to improve the generalization of dense retrieval via capturing matching signal called BERM. Fully fine-grained expression and query-oriented saliency are two properties of the matching signal. Thus, in BERM, a single passage is segmented into multiple units and two unit-level requirements are proposed for representation as the constraint in training to obtain the effective matching signal. One is semantic unit balance and the other is essential matching unit extractability. Unit-level view and balanced semantics make representation express the text in a fine-grained manner. Essential matching unit extractability makes passage representation sensitive to the given query to extract the pure matching information from the passage containing complex context. Experiments on BEIR show that our method can be effectively combined with different dense retrieval training methods (vanilla, hard negatives mining and knowledge distillation) to improve its generalization ability without any additional inference overhead and target domain data.

  • 25.Multiview Identifiers Enhanced Generative Retrieval

    Yongqi Li,Nan Yang,Liang Wang,Furu Wei,Wenjie Li
    Download URL

    https://aclanthology.org/2023.acl-long.366/


    abstract

    AbstractInstead of simply matching a query to pre-existing passages, generative retrieval generates identifier strings of passages as the retrieval target. At a cost, the identifier must be distinctive enough to represent a passage. Current approaches use either a numeric ID or a text piece (such as a title or substrings) as the identifier. However, these identifiers cannot cover a passageâ??s content well. As such, we are motivated to propose a new type of identifier, synthetic identifiers, that are generated based on the content of a passage and could integrate contextualized information that text pieces lack. Furthermore, we simultaneously consider multiview identifiers, including synthetic identifiers, titles, and substrings. These views of identifiers complement each other and facilitate the holistic ranking of passages from multiple perspectives. We conduct a series of experiments on three public datasets, and the results indicate that our proposed approach performs the best in generative retrieval, demonstrating its effectiveness and robustness.

  • 26.Patton: Language Model Pretraining on Text-Rich Networks

    Bowen Jin,Wentao Zhang,Yu Zhang,Yu Meng,Xinyang Zhang,Qi Zhu,Jiawei Han
    Download URL

    https://aclanthology.org/2023.acl-long.387/


    abstract

    AbstractA real-world text corpus sometimes comprises not only text documents, but also semantic links between them (e.g., academic papers in a bibliographic network are linked by citations and co-authorships).Text documents and semantic connections form a text-rich network, which empowers a wide range of downstream tasks such as classification and retrieval.However, pretraining methods for such structures are still lacking, making it difficult to build one generic model that can be adapted to various tasks on text-rich networks.Current pretraining objectives, such as masked language modeling, purely model texts and do not take inter-document structure information into consideration.To this end, we propose our PretrAining on TexT-Rich NetwOrk framework Patton.Patton includes two pretraining strategies: network-contextualized masked language modeling and masked node prediction, to capture the inherent dependency between textual attributes and network structure.We conduct experiments on four downstream tasks in five datasets from both academic and e-commerce domains, where Patton outperforms baselines significantly and consistently.

  • 27.Soft Language Clustering for Multilingual Model Pre-training

    Jiali Zeng,Yufan Jiang,Yongjing Yin,Yi Jing,Fandong Meng,Binghuai Lin,Yunbo Cao,Jie Zhou
    Download URL

    https://aclanthology.org/2023.acl-long.388/


    abstract

    AbstractMultilingual pre-trained language models have demonstrated impressive (zero-shot) cross-lingual transfer abilities, however, their performance is hindered when the target language has distant typologyfrom the source language or when pre-training data is limited in size.In this paper, we propose XLM-P, a method that contextually retrieves prompts as flexible guidance for encoding instances conditionally. Our space-efficient and model-agnostic XLM-P approach enables (1) lightweight modeling of language-invariant and language-specific knowledge across languages, and (2) easy integration with other multilingual pre-training methods. On the tasks of XTREME, which include text classification, sequence labeling, question answering, and sentence retrieval, both base- and large-size language models pre-trained with our proposed method exhibit consistent performance improvement. Furthermore, it provides substantial advantages for low-resource languages in unsupervised sentence retrieval and for target languages that differ greatly from the source language in cross-lingual transfer.

  • 28.Automatic Creation of Named Entity Recognition Datasets by Querying Phrase Representations

    Hyunjae Kim,Jaehyo Yoo,Seunghyun Yoon,Jaewoo Kang
    Download URL

    https://aclanthology.org/2023.acl-long.394/


    abstract

    AbstractMost weakly supervised named entity recognition (NER) models rely on domain-specific dictionaries provided by experts. This approach is infeasible in many domains where dictionaries do not exist. While a phrase retrieval model was used to construct pseudo-dictionaries with entities retrieved from Wikipedia automatically in a recent study, these dictionaries often have limited coverage because the retriever is likely to retrieve popular entities rather than rare ones. In this study, we present a novel framework, HighGEN, that generates NER datasets with high-coverage pseudo-dictionaries. Specifically, we create entity-rich dictionaries with a novel search method, called phrase embedding search, which encourages the retriever to search a space densely populated with various entities. In addition, we use a new verification process based on the embedding distance between candidate entity mentions and entity types to reduce the false-positive noise in weak labels generated by high-coverage dictionaries. We demonstrate that HighGEN outperforms the previous best model by an average F1 score of 4.7 across five NER benchmark datasets.

  • 29.MMDialog: A Large-scale Multi-turn Dialogue Dataset Towards Multi-modal Open-domain Conversation

    Jiazhan Feng,Qingfeng Sun,Can Xu,Pu Zhao,Yaming Yang,Chongyang Tao,Dongyan Zhao,Qingwei Lin
    Download URL

    https://aclanthology.org/2023.acl-long.405/


    abstract

    AbstractResponding with multi-modal content has been recognized as an essential capability for an intelligent conversational agent. In this paper, we introduce the MMDialog dataset to facilitate multi-modal conversation better. MMDialog is composed of a curated set of 1.08 million real-world dialogues with 1.53 million unique images across 4,184 topics. MMDialog has two main and unique advantages. First, it is the largest multi-modal conversation dataset by the number of dialogues by 88x. Second, it contains massive topics to generalize the open domain. To build an engaging dialogue system with this dataset, we propose and normalize two response prediction tasks based on retrieval and generative scenarios. In addition, we build two baselines for the above tasks with state-of-the-art techniques and report their experimental performance. We also propose a novel evaluation metric MM-Relevance to measure the multi-modal responses. Our dataset is available in https://github.com/victorsungo/MMDialog.

  • 30.Massively Multilingual Lexical Specialization of Multilingual Transformers

    Tommaso Green,Simone Paolo Ponzetto,Goran Glavaš
    Download URL

    https://aclanthology.org/2023.acl-long.426/


    abstract

    AbstractWhile pretrained language models (PLMs) primarily serve as general-purpose text encoders that can be fine-tuned for a wide variety of downstream tasks, recent work has shown that they can also be rewired to produce high-quality word representations (i.e., static word embeddings) and yield good performance in type-level lexical tasks. While existing work primarily focused on the lexical specialization of monolingual PLMs with immense quantities of monolingual constraints, in this work we expose massively multilingual transformers (MMTs, e.g., mBERT or XLM-R) to multilingual lexical knowledge at scale, leveraging BabelNet as the readily available rich source of multilingual and cross-lingual type-level lexical knowledge. Concretely, we use BabelNetâ??s multilingual synsets to create synonym pairs (or synonym-gloss pairs) across 50 languages and then subject the MMTs (mBERT and XLM-R) to a lexical specialization procedure guided by a contrastive objective. We show that such massively multilingual lexical specialization brings substantial gains in two standard cross-lingual lexical tasks, bilingual lexicon induction and cross-lingual word similarity, as well as in cross-lingual sentence retrieval. Crucially, we observe gains for languages unseen in specialization, indicating that multilingual lexical specialization enables generalization to languages with no lexical constraints. In a series of subsequent controlled experiments, we show that the number of specialization constraints plays a much greater role than the set of languages from which they originate.

  • 31.RL4F: Generating Natural Language Feedback with Reinforcement Learning for Repairing Model Outputs

    Afra Feyza Akyurek,Ekin Akyurek,Ashwin Kalyan,Peter Clark,Derry Tanti Wijaya,Niket Tandon
    Download URL

    https://aclanthology.org/2023.acl-long.427/


    abstract

    AbstractDespite their unprecedented success, even the largest language models make mistakes.Similar to how humans learn and improve using feedback, previous work proposed providing language models with natural language feedback to guide them in repairing their outputs. Because human-generated critiques are expensive to obtain, researchers have devised learned critique generators in lieu of human critics while assuming one can train downstream models to utilize generated feedback. However, this approach does not apply to black-box or limited access models such as ChatGPT, as they cannot be fine-tuned. Moreover, in the era of large general-purpose language agents, fine-tuning is neither computationally nor spatially efficient as it results in multiple copies of the network. In this work, we introduce RL4F (Reinforcement Learning for Feedback), a multi-agent collaborative framework where the critique generator is trained to maximize end-task performance of GPT-3, a fixed model more than 200 times its size. RL4F produces critiques that help GPT-3 revise its outputs. We study three datasets for action planning, summarization and alphabetization and show relative improvements up to 10% in multiple text similarity metrics over other learned, retrieval-augmented or prompting-based critique generators.

  • 32.Data Curation Alone Can Stabilize In-context Learning

    Ting-Yun Chang,Robin Jia
    Download URL

    https://aclanthology.org/2023.acl-long.452/


    abstract

    AbstractIn-context learning (ICL) enables large language models (LLMs) to perform new tasks by prompting them with a sequence of training examples. However, it is known that ICL is very sensitive to the choice of training examples: randomly sampling examples from a training set leads to high variance in performance. In this paper, we show that carefully curating a subset of training data greatly stabilizes ICL performance without any other changes to the ICL algorithm (e.g., prompt retrieval or calibration). We introduce two methods to choose training subsetsâ??both score training examples individually, then select the highest-scoring ones. CondAcc scores a training example by its average dev-set ICL accuracy when combined with random training examples, while Datamodels learns linear regressors that estimate how the presence of each training example influences LLM outputs. Across five tasks and two LLMs, sampling from stable subsets selected by CondAcc and Datamodels improves average accuracy over sampling from the entire training set by 7.7% and 6.3%, respectively.Surprisingly, the stable subset examples are not especially diverse in content or low in perplexity, in contrast with other work suggesting that diversity and perplexity are important when prompting LLMs.

  • 33.Weakly Supervised Vision-and-Language Pre-training with Relative Representations

    Chi Chen,Peng Li,Maosong Sun,Yang Liu
    Download URL

    https://aclanthology.org/2023.acl-long.464/


    abstract

    AbstractWeakly supervised vision-and-language pre-training (WVLP), which learns cross-modal representations with limited cross-modal supervision, has been shown to effectively reduce the data cost of pre-training while maintaining decent performance on downstream tasks. However, current WVLP methods use only local descriptions of images, i.e., object tags, as cross-modal anchors to construct weakly-aligned image-text pairs for pre-training. This affects the data quality and thus the effectiveness of pre-training. In this paper, we propose to directly take a small number of aligned image-text pairs as anchors, and represent each unaligned image and text by its similarities to these anchors, i.e., relative representations. We build a WVLP framework based on the relative representations, namely RELIT, which collects high-quality weakly-aligned image-text pairs from large-scale image-only and text-only data for pre-training through relative representation-based retrieval and generation. Experiments on four downstream tasks show that RELIT achieves new state-of-the-art results under the weakly supervised setting.

  • 34.Learning Answer Generation using Supervision from Automatic Question Answering Evaluators

    Matteo Gabburo,Siddhant Garg,Rik Koncel-Kedziorski,Alessandro Moschitti
    Download URL

    https://aclanthology.org/2023.acl-long.467/


    abstract

    AbstractRecent studies show that sentence-level extractive QA, i.e., based on Answer Sentence Selection (AS2), is outperformed by Generation-based QA (GenQA) models, which generate answers using the top-k answer sentences ranked by AS2 models (a la retrieval-augmented generation style). In this paper, we propose a novel training paradigm for GenQA using supervision from automatic QA evaluation models (GAVA). Specifically, we propose three strategies to transfer knowledge from these QA evaluation models to a GenQA model: (i) augmenting training data with answers generated by the GenQA model and labelled by GAVA (either statically, before training, or (ii) dynamically, at every training epoch); and (iii) using the GAVA score for weighting the generator loss during the learning of the GenQA model. We evaluate our proposed methods on two academic and one industrial dataset, obtaining a significant improvement in answering accuracy over the previous state of the art.

  • 35.RECAP: Retrieval-Enhanced Context-Aware Prefix Encoder for Personalized Dialogue Response Generation

    Shuai Liu,Hyundong Cho,Marjorie Freedman,Xuezhe Ma,Jonathan May
    Download URL

    https://aclanthology.org/2023.acl-long.468/


    abstract

    AbstractEndowing chatbots with a consistent persona is essential to an engaging conversation, yet it remains an unresolved challenge. In this work, we propose a new retrieval-enhanced approach for personalized response generation. Specifically, we design a hierarchical transformer retriever trained on dialogue domain data to perform personalized retrieval and a context-aware prefix encoder that fuses the retrieved information to the decoder more effectively. Extensive experiments on a real-world dataset demonstrate the effectiveness of our model at generating more fluent and personalized responses. We quantitatively evaluate our modelâ??s performance under a suite of human and automatic metrics and find it to be superior compared to state-of-the-art baselines on English Reddit conversations.

  • 36.End-to-end Knowledge Retrieval with Multi-modal Queries

    Man Luo,Zhiyuan Fang,Tejas Gokhale,Yezhou Yang,Chitta Baral
    Download URL

    https://aclanthology.org/2023.acl-long.478/


    abstract

    AbstractWe investigate knowledge retrieval with multi-modal queries, i.e. queries containing information split across image and text inputs, a challenging task that differs from previous work on cross-modal retrieval. We curate a new dataset called ReMuQ for benchmarking progress on this task. ReMuQ requires a system to retrieve knowledge from a large corpus by integrating contents from both text and image queries. We introduce a retriever model â??ReVizâ?? that can directly process input text and images to retrieve relevant knowledge in an end-to-end fashion without being dependent on intermediate modules such as object detectors or caption generators. We introduce a new pretraining task that is effective for learning knowledge retrieval with multimodal queries and also improves performance on downstream tasks. We demonstrate superior performance in retrieval on two datasets (ReMuQ and OK-VQA) under zero-shot settings as well as further improvements when finetuned on these datasets.

  • 37.Conflicts, Villains, Resolutions: Towards models of Narrative Media Framing

    Lea Frermann,Jiatong Li,Shima Khanehzar,Gosia Mikolajczak
    Download URL

    https://aclanthology.org/2023.acl-long.486/


    abstract

    AbstractDespite increasing interest in the automatic detection of media frames in NLP, the problem is typically simplified as single-label classification and adopts a topic-like view on frames, evading modelling the broader document-level narrative. In this work, we revisit a widely used conceptualization of framing from the communication sciences which explicitly captures elements of narratives, including conflict and its resolution, and integrate it with the narrative framing of key entities in the story as heroes, victims or villains. We adapt an effective annotation paradigm that breaks a complex annotation task into a series of simpler binary questions, and present an annotated data set of English news articles, and a case study on the framing of climate change in articles from news outlets across the political spectrum. Finally, we explore automatic multi-label prediction of our frames with supervised and semi-supervised approaches, and present a novel retrieval-based method which is both effective and transparent in its predictions. We conclude with a discussion of opportunities and challenges for future work on document-level models of narrative framing.

  • Yujia Qin,Zihan Cai,Dian Jin,Lan Yan,Shihao Liang,Kunlun Zhu,Yankai Lin,Xu Han,Ning Ding,Huadong Wang,Ruobing Xie,Fanchao Qi,Zhiyuan Liu,Maosong Sun,Jie Zhou
    Download URL

    https://aclanthology.org/2023.acl-long.499/


    abstract

    AbstractLong-form question answering (LFQA) aims at answering complex, open-ended questions with detailed, paragraph-length responses. The de facto paradigm of LFQA necessitates two procedures: information retrieval, which searches for relevant supporting facts, and information synthesis, which integrates these facts into a coherent answer. In this paper, we introduce WebCPM, the first Chinese LFQA dataset. One unique feature of WebCPM is that its information retrieval is based on interactive web search, which engages with a search engine in real time. Following WebGPT, we develop a web search interface. We recruit annotators to search for relevant information using our interface and then answer questions. Meanwhile, the web search behaviors of our annotators would be recorded. In total, we collect 5,500 high-quality question-answer pairs, together with 15,372 supporting facts and 125,954 web search actions. We fine-tune pre-trained language models to imitate human behaviors for web search and to generate answers based on the collected facts. Our LFQA pipeline, built on these fine-tuned models, generates answers that are no worse than human-written ones in 32.5% and 47.5% of the cases on our dataset and DuReader, respectively. The interface, dataset, and codes are publicly available at https://github.com/thunlp/WebCPM.

  • 39.Towards Better Entity Linking with Multi-View Enhanced Distillation

    Yi Liu,Yuan Tian,Jianxun Lian,Xinlong Wang,Yanan Cao,Fang Fang,Wen Zhang,Haizhen Huang,Weiwei Deng,Qi Zhang
    Download URL

    https://aclanthology.org/2023.acl-long.542/


    abstract

    AbstractDense retrieval is widely used for entity linking to retrieve entities from large-scale knowledge bases. Mainstream techniques are based on a dual-encoder framework, which encodes mentions and entities independently and calculates their relevances via rough interaction metrics, resulting in difficulty in explicitly modeling multiple mention-relevant parts within entities to match divergent mentions. Aiming at learning entity representations that can match divergent mentions, this paper proposes a Multi-View Enhanced Distillation (MVD) framework, which can effectively transfer knowledge of multiple fine-grained and mention-relevant parts within entities from cross-encoders to dual-encoders. Each entity is split into multiple views to avoid irrelevant information being over-squashed into the mention-relevant view. We further design cross-alignment and self-alignment mechanisms for this framework to facilitate fine-grained knowledge distillation from the teacher model to the student model. Meanwhile, we reserve a global-view that embeds the entity as a whole to prevent dispersal of uniform information. Experiments show our method achieves state-of-the-art performance on several entity linking benchmarks.

  • 40.When Not to Trust Language Models: Investigating Effectiveness of Parametric and Non-Parametric Memories

    Alex Mallen,Akari Asai,Victor Zhong,Rajarshi Das,Daniel Khashabi,Hannaneh Hajishirzi
    Download URL

    https://aclanthology.org/2023.acl-long.546/


    abstract

    AbstractDespite their impressive performance on diverse tasks, large language models (LMs) still struggle with tasks requiring rich world knowledge, implying the difficulty of encoding a wealth of world knowledge in their parameters. This paper aims to understand LMsâ?? strengths and limitations in memorizing factual knowledge, by conducting large-scale knowledge probing experiments on two open-domain entity-centric QA datasets: PopQA, our new dataset with 14k questions about long-tail entities, and EntityQuestions, a widely used open-domain QA dataset. We find that LMs struggle with less popular factual knowledge, and that retrieval augmentation helps significantly in these cases. Scaling, on the other hand, mainly improves memorization of popular knowledge, and fails to appreciably improve memorization of factual knowledge in the tail. Based on those findings, we devise a new method for retrieval-augmentation that improves performance and reduces inference costs by only retrieving non-parametric memories when necessary.

  • 41.Cognitive Reframing of Negative Thoughts through Human-Language Model Interaction

    Ashish Sharma,Kevin Rushton,Inna Lin,David Wadden,Khendra Lucas,Adam Miner,Theresa Nguyen,Tim Althoff
    Download URL

    https://aclanthology.org/2023.acl-long.555/


    abstract

    AbstractA proven therapeutic technique to overcome negative thoughts is to replace them with a more hopeful â??reframed thought.â?? Although therapy can help people practice and learn this Cognitive Reframing of Negative Thoughts, clinician shortages and mental health stigma commonly limit peopleâ??s access to therapy. In this paper, we conduct a human-centered study of how language models may assist people in reframing negative thoughts. Based on psychology literature, we define a framework of seven linguistic attributes that can be used to reframe a thought. We develop automated metrics to measure these attributes and validate them with expert judgements from mental health practitioners. We collect a dataset of 600 situations, thoughts and reframes from practitioners and use it to train a retrieval-enhanced in-context learning model that effectively generates reframed thoughts and controls their linguistic attributes. To investigate what constitutes a â??high-qualityâ?? reframe, we conduct an IRB-approved randomized field study on a large mental health website with over 2,000 participants. Amongst other findings, we show that people prefer highly empathic or specific reframes, as opposed to reframes that are overly positive. Our findings provide key implications for the use of LMs to assist people in overcoming negative thoughts.

  • 42.Interleaving Retrieval with Chain-of-Thought Reasoning for Knowledge-Intensive Multi-Step Questions

    Harsh Trivedi,Niranjan Balasubramanian,Tushar Khot,Ashish Sabharwal
    Download URL

    https://aclanthology.org/2023.acl-long.557/


    abstract

    AbstractPrompting-based large language models (LLMs) are surprisingly powerful at generating natural language reasoning steps or Chains-of-Thoughts (CoT) for multi-step question answering (QA). They struggle, however, when the necessary knowledge is either unavailable to the LLM or not up-to-date within its parameters. While using the question to retrieve relevant text from an external knowledge source helps LLMs, we observe that this one-step retrieve-and-read approach is insufficient for multi-step QA. Here, what to retrieve depends on what has already been derived, which in turn may depend on what was previously retrieved. To address this, we propose IRCoT, a new approach for multi-step QA that interleaves retrieval with steps (sentences) in a CoT, guiding the retrieval with CoT and in turn using retrieved results to improve CoT. Using IRCoT with GPT3 substantially improves retrieval (up to 21 points) as well as downstream QA (up to 15 points) on four datasets: HotpotQA, 2WikiMultihopQA, MuSiQue, and IIRC. We observe similar substantial gains in out-of-distribution (OOD) settings as well as with much smaller models such as Flan-T5-large without additional training. IRCoT reduces model hallucination, resulting in factually more accurate CoT reasoning.

  • 43.Direct Fact Retrieval from Knowledge Graphs without Entity Linking

    Jinheon Baek,Alham Fikri Aji,Jens Lehmann,Sung Ju Hwang
    Download URL

    https://aclanthology.org/2023.acl-long.558/


    abstract

    AbstractThere has been a surge of interest in utilizing Knowledge Graphs (KGs) for various natural language processing/understanding tasks. The conventional mechanism to retrieve facts in KGs usually involves three steps: entity span detection, entity disambiguation, and relation classification. However, this approach requires additional labels for training each of the three subcomponents in addition to pairs of input texts and facts, and also may accumulate errors propagated from failures in previous steps. To tackle these limitations, we propose a simple knowledge retrieval framework, which directly retrieves facts from the KGs given the input text based on their representational similarities, which we refer to as Direct Fact Retrieval (DiFaR). Specifically, we first embed all facts in KGs onto a dense embedding space by using a language model trained by only pairs of input texts and facts, and then provide the nearest facts in response to the input text. Since the fact, consisting of only two entities and one relation, has little context to encode, we propose to further refine ranks of top-k retrieved facts with a reranker that contextualizes the input text and the fact jointly. We validate our DiFaR framework on multiple fact retrieval tasks, showing that it significantly outperforms relevant baselines that use the three-step approach.

  • 44.DataFinder: Scientific Dataset Recommendation from Natural Language Descriptions

    Vijay Viswanathan,Luyu Gao,Tongshuang Wu,Pengfei Liu,Graham Neubig
    Download URL

    https://aclanthology.org/2023.acl-long.573/


    abstract

    AbstractModern machine learning relies on datasets to develop and validate research ideas. Given the growth of publicly available data, finding the right dataset to use is increasingly difficult. Any research question imposes explicit and implicit constraints on how well a given dataset will enable researchers to answer this question, such as dataset size, modality, and domain. We operationalize the task of recommending datasets given a short natural language description of a research idea, to help people find relevant datasets for their needs. Dataset recommendation poses unique challenges as an information retrieval problem; datasets are hard to directly index for search and there are no corpora readily available for this task. To facilitate this task, we build the DataFinder Dataset which consists of a larger automatically-constructed training set (17.5K queries) and a smaller expert-annotated evaluation set (392 queries). Using this data, we compare various information retrieval algorithms on our test set and present a superior bi-encoder retriever for text-based dataset recommendation. This system, trained on the DataFinder Dataset, finds more relevant search results than existing third-party dataset search engines. To encourage progress on dataset recommendation, we release our dataset and models to the public.

  • 45.Multi-Grained Knowledge Retrieval for End-to-End Task-Oriented Dialog

    Fanqi Wan,Weizhou Shen,Ke Yang,Xiaojun Quan,Wei Bi
    Download URL

    https://aclanthology.org/2023.acl-long.627/


    abstract

    AbstractRetrieving proper domain knowledge from an external database lies at the heart of end-to-end task-oriented dialog systems to generate informative responses. Most existing systems blend knowledge retrieval with response generation and optimize them with direct supervision from reference responses, leading to suboptimal retrieval performance when the knowledge base becomes large-scale. To address this, we propose to decouple knowledge retrieval from response generation and introduce a multi-grained knowledge retriever (MAKER) that includes an entity selector to search for relevant entities and an attribute selector to filter out irrelevant attributes. To train the retriever, we propose a novel distillation objective that derives supervision signals from the response generator. Experiments conducted on three standard benchmarks with both small and large-scale knowledge bases demonstrate that our retriever performs knowledge retrieval more effectively than existing methods. Our code has been made publicly available at https://github.com/18907305772/MAKER.

  • 46.AlignScore: Evaluating Factual Consistency with A Unified Alignment Function

    Yuheng Zha,Yichi Yang,Ruichen Li,Zhiting Hu
    Download URL

    https://aclanthology.org/2023.acl-long.634/


    abstract

    AbstractMany text generation applications require the generated text to be factually consistent with input information. Automatic evaluation of factual consistency is challenging. Previous work has developed various metrics that often depend on specific functions, such as natural language inference (NLI) or question answering (QA), trained on limited data. Those metrics thus can hardly assess diverse factual inconsistencies (e.g., contradictions, hallucinations) that occur in varying inputs/outputs (e.g., sentences, documents) from different tasks. In this paper, we propose AlignScore, a new holistic metric that applies to a variety of factual inconsistency scenarios as above. AlignScore is based on a general function of information alignment between two arbitrary text pieces. Crucially, we develop a unified training framework of the alignment function by integrating a large diversity of data sources, resulting in 4.7M training examples from 7 well-established tasks (NLI, QA, paraphrasing, fact verification, information retrieval, semantic similarity, and summarization). We conduct extensive experiments on large-scale benchmarks including 22 evaluation datasets, where 19 of the datasets were never seen in the alignment training. AlignScore achieves substantial improvement over a wide range of previous metrics. Moreover, AlignScore (355M parameters) matches or even outperforms metrics based on ChatGPT and GPT-4 that are orders of magnitude larger.

  • 47.CITADEL: Conditional Token Interaction via Dynamic Lexical Routing for Efficient and Effective Multi-Vector Retrieval

    Minghan Li,Sheng-Chieh Lin,Barlas Oguz,Asish Ghoshal,Jimmy Lin,Yashar Mehdad,Wen-tau Yih,Xilun Chen
    Download URL

    https://aclanthology.org/2023.acl-long.663/


    abstract

    AbstractMulti-vector retrieval methods combine the merits of sparse (e.g. BM25) and dense (e.g. DPR) retrievers and have achieved state-of-the-art performance on various retrieval tasks.These methods, however, are orders of magnitude slower and need much more space to store their indices compared to their single-vector counterparts.In this paper, we unify different multi-vector retrieval models from a token routing viewpoint and propose conditional token interaction via dynamic lexical routing, namely CITADEL, for efficient and effective multi-vector retrieval.CITADEL learns to route different token vectors to the predicted lexical keys such that a query token vector only interacts with document token vectors routed to the same key.This design significantly reduces the computation cost while maintaining high accuracy.Notably, CITADEL achieves the same or slightly better performance than the previous state of the art, ColBERT-v2, on both in-domain (MS MARCO) and out-of-domain (BEIR) evaluations, while being nearly 40 times faster. Source code and data are available at https://github.com/facebookresearch/dpr-scale/tree/citadel.

  • 48.Beyond Contrastive Learning: A Variational Generative Model for Multilingual Retrieval

    John Wieting,Jonathan Clark,William Cohen,Graham Neubig,Taylor Berg-Kirkpatrick
    Download URL

    https://aclanthology.org/2023.acl-long.673/


    abstract

    AbstractContrastive learning has been successfully used for retrieval of semantically aligned sentences, but it often requires large batch sizes or careful engineering to work well. In this paper, we instead propose a generative model for learning multilingual text embeddings which can be used to retrieve or score sentence pairs. Our model operates on parallel data in N languages and, through an approximation we introduce, efficiently encourages source separation in this multilingual setting, separating semantic information that is shared between translations from stylistic or language-specific variation. We show careful large-scale comparisons between contrastive and generation-based approaches for learning multilingual text embeddings, a comparison that has not been done to the best of our knowledge despite the popularity of these approaches. We evaluate this method on a suite of tasks including semantic similarity, bitext mining, and cross-lingual question retrieval - the last of which we introduce in this paper. Overall, our model outperforms both a strong contrastive and generative baseline on these tasks.

  • 49.XDailyDialog: A Multilingual Parallel Dialogue Corpus

    Zeming Liu,Ping Nie,Jie Cai,Haifeng Wang,Zheng-Yu Niu,Peng Zhang,Mrinmaya Sachan,Kaiping Peng
    Download URL

    https://aclanthology.org/2023.acl-long.684/


    abstract

    AbstractHigh-quality datasets are significant to the development of dialogue models.However, most existing datasets for open-domain dialogue modeling are limited to a single language.The absence of multilingual open-domain dialog datasets not only limits the research on multilingual or cross-lingual transfer learning, but also hinders the development of robust open-domain dialog systems that can be deployed in other parts of the world.In this paper, we provide a multilingual parallel open-domain dialog dataset, XDailyDialog, to enable researchers to explore the challenging task of multilingual and cross-lingual open-domain dialog. XDailyDialog includes 13K dialogues aligned across 4 languages (52K dialogues and 410K utterances in total). We then propose a dialog generation model, kNN-Chat, which has a novel kNN-search mechanism to support unified response retrieval for monolingual, multilingual, and cross-lingual dialogue. Experiment results show the effectiveness of this framework. We will make XDailyDialog and kNN-Chat publicly available soon.

  • 50.Effective Contrastive Weighting for Dense Query Expansion

    Xiao Wang,Sean MacAvaney,Craig Macdonald,Iadh Ounis
    Download URL

    https://aclanthology.org/2023.acl-long.710/


    abstract

    AbstractVerbatim queries submitted to search engines often do not sufficiently describe the userâ??s search intent. Pseudo-relevance feedback (PRF) techniques, which modify a queryâ??srepresentation using the top-ranked documents, have been shown to overcome such inadequacies and improve retrieval effectiveness for both lexical methods (e.g., BM25) and dense methods (e.g., ANCE, ColBERT). For instance, the recent ColBERT-PRF approach heuristically chooses new embeddings to add to the query representation using the inverse document frequency (IDF) of the underlying tokens. However, this heuristic potentially ignores the valuable context encoded by the embeddings. In this work, we present a contrastive solution that learns to select the most useful embeddings for expansion. More specifically, a deep language model-based contrastive weighting model, called CWPRF, is trained to learn to discriminate between relevant and non-relevant documents for semantic search. Our experimental results show that our contrastive weighting model can aid to select useful expansion embeddings and outperform various baselines. In particular, CWPRF can improve nDCG@10 by upto to 4.1% compared to an existing PRF approach for ColBERT while maintaining its efficiency.

  • 51.mCLIP: Multilingual CLIP via Cross-lingual Transfer

    Guanhua Chen,Lu Hou,Yun Chen,Wenliang Dai,Lifeng Shang,Xin Jiang,Qun Liu,Jia Pan,Wenping Wang
    Download URL

    https://aclanthology.org/2023.acl-long.728/


    abstract

    AbstractLarge-scale vision-language pretrained (VLP) models like CLIP have shown remarkable performance on various downstream cross-modal tasks. However, they are usually biased towards English due to the lack of sufficient non-English image-text pairs. Existing multilingual VLP methods often learn retrieval-inefficient single-stream models by translation-augmented non-English image-text pairs. In this paper, we introduce mCLIP, a retrieval-efficient dual-stream multilingual VLP model, trained by aligning the CLIP model and a Multilingual Text Encoder (MTE) through a novel Triangle Cross-modal Knowledge Distillation (TriKD) method. It is parameter-efficient as only two light projectors on the top of them are updated during distillation. Furthermore, to enhance the token- and sentence-level multilingual representation of the MTE, we propose to train it with machine translation and contrastive learning jointly before the TriKD to provide a better initialization. Empirical results show that mCLIP achieves new state-of-the-art performance for both zero-shot and finetuned multilingual image-text retrieval task.

  • 52.On Complementarity Objectives for Hybrid Retrieval

    Dohyeon Lee,Seung-won Hwang,Kyungjae Lee,Seungtaek Choi,Sunghyun Park
    Download URL

    https://aclanthology.org/2023.acl-long.746/


    abstract

    AbstractDense retrieval has shown promising results in various information retrieval tasks, and hybrid retrieval, combined with the strength of sparse retrieval, has also been actively studied. A key challenge in hybrid retrieval is to make sparse and dense complementary to each other. Existing models have focused on dense models to capture â??residualâ?? features neglected in the sparse models. Our key distinction is to show how this notion of residual complementarity is limited, and propose a new objective, denoted as RoC (Ratio of Complementarity), which captures a fuller notion of complementarity. We propose a two-level orthogonality designed to improve RoC, then show that the improved RoC of our model, in turn, improves the performance of hybrid retrieval. Our method outperforms all state-of-the-art methods on three representative IR benchmarks: MSMARCO-Passage, Natural Questions, and TREC Robust04, with statistical significance. Our finding is also consistent in various adversarial settings.

  • 53.MVP-Tuning: Multi-View Knowledge Retrieval with Prompt Tuning for Commonsense Reasoning

    Yongfeng Huang,Yanyang Li,Yichong Xu,Lin Zhang,Ruyi Gan,Jiaxing Zhang,Liwei Wang
    Download URL

    https://aclanthology.org/2023.acl-long.750/


    abstract

    AbstractRecent advances in pre-trained language models (PLMs) have facilitated the development ofcommonsense reasoning tasks. However, existing methods rely on multi-hop knowledgeretrieval and thus suffer low accuracy due toembedded noise in the acquired knowledge.In addition, these methods often attain highcomputational costs and nontrivial knowledgeloss because they encode the knowledge independently of the PLM, making it less relevant to the task and thus resulting in a poorlocal optimum. In this work, we propose MultiView Knowledge Retrieval with Prompt Tuning (MVP-Tuning). MVP-Tuning leveragessimilar question-answer pairs in the training setto improve knowledge retrieval and employsa single prompt-tuned PLM to model knowledge and input text jointly. We conduct our experiments on five commonsense reasoning QAbenchmarks to show that MVP-Tuning outperforms all other baselines in 4 out of 5 datasetswith less than 2% trainable parameters. MVPTuning even gets a new state-of-the-art resulton OpenBookQA and is number one on theleaderboard.

  • 54.Learning New Skills after Deployment: Improving open-domain internet-driven dialogue with human feedback

    Jing Xu,Megan Ung,Mojtaba Komeili,Kushal Arora,Y-Lan Boureau,Jason Weston
    Download URL

    https://aclanthology.org/2023.acl-long.758/


    abstract

    AbstractFrozen models trained to mimic static datasets can never improve their performance. Models that can employ internet-retrieval for up-to-date information and obtain feedback from humans during deployment provide the promise of both adapting to new information, and improving their performance. In this work we study how to improve internet-driven conversational skills in such a learning framework. We collect deployment data, which we make publicly available, of human interactions, and collect various types of human feedback â?? including binary quality measurements, free-form text feedback, and fine-grained reasons for failure. We then study various algorithms for improving from such feedback, including standard supervised learning, rejection sampling, model-guiding and reward-based learning, in order to make recommendations on which type of feed- back and algorithms work best. We find the recently introduced DIRECTOR model (Arora et al., 2022) shows significant improvements over other existing approaches.

  • 55.U-CREAT: Unsupervised Case Retrieval using Events extrAcTion

    Abhinav Joshi,Akshat Sharma,Sai Kiran Tanikella,Ashutosh Modi
    Download URL

    https://aclanthology.org/2023.acl-long.777/


    abstract

    AbstractThe task of Prior Case Retrieval (PCR) in the legal domain is about automatically citing relevant (based on facts and precedence) prior legal cases in a given query case. To further promote research in PCR, in this paper, we propose a new large benchmark (in English) for the PCR task: IL-PCR (Indian Legal Prior Case Retrieval) corpus. Given the complex nature of case relevance and the long size of legal documents, BM25 remains a strong baseline for ranking the cited prior documents. In this work, we explore the role of events in legal case retrieval and propose an unsupervised retrieval method-based pipeline U-CREAT (Unsupervised Case Retrieval using Events Extraction). We find that the proposed unsupervised retrieval method significantly increases performance compared to BM25 and makes retrieval faster by a considerable margin, making it applicable to real-time case retrieval systems. Our proposed system is generic, we show that it generalizes across two different legal systems (Indian and Canadian), and it shows state-of-the-art performance on the benchmarks for both the legal systems (IL-PCR and COLIEE corpora).

  • 56.QUEST: A Retrieval Dataset of Entity-Seeking Queries with Implicit Set Operations

    Chaitanya Malaviya,Peter Shaw,Ming-Wei Chang,Kenton Lee,Kristina Toutanova
    Download URL

    https://aclanthology.org/2023.acl-long.784/


    abstract

    AbstractFormulating selective information needs results in queries that implicitly specify set operations, such as intersection, union, and difference. For instance, one might search for â??shorebirds that are not sandpipersâ?? or â??science-fiction films shot in Englandâ??. To study the ability of retrieval systems to meet such information needs, we construct QUEST, a dataset of 3357 natural language queries with implicit set operations, that map to a set of entities corresponding to Wikipedia documents. The dataset challenges models to match multiple constraints mentioned in queries with corresponding evidence in documents and correctly perform various set operations. The dataset is constructed semi-automatically using Wikipedia category names. Queries are automatically composed from individual categories, then paraphrased and further validated for naturalness and fluency by crowdworkers. Crowdworkers also assess the relevance of entities based on their documents and highlight attribution of query constraints to spans of document text. We analyze several modern retrieval systems, finding that they often struggle on such queries. Queries involving negation and conjunction are particularly challenging and systems are further challenged with combinations of these operations.

  • 57.Ambiguous Learning from Retrieval: Towards Zero-shot Semantic Parsing

    Shan Wu,Chunlei Xin,Hongyu Lin,Xianpei Han,Cao Liu,Jiansong Chen,Fan Yang,Guanglu Wan,Le Sun
    Download URL

    https://aclanthology.org/2023.acl-long.787/


    abstract

    AbstractCurrent neural semantic parsers take a supervised approach requiring a considerable amount of training data which is expensive and difficult to obtain. Thus, minimizing the supervision effort is one of the key challenges in semantic parsing. In this paper, we propose the Retrieval as Ambiguous Supervision framework, in which we construct a retrieval system based on pretrained language models to collect high-coverage candidates. Assuming candidates always contain the correct ones, we convert zero-shot task into ambiguously supervised task. To improve the precision and coverage of such ambiguous supervision, we propose a confidence-driven self-training algorithm, in which a semantic parser is learned and exploited to disambiguate the candidates iteratively. Experimental results show that our approach significantly outperforms the state-of-the-art zero-shot semantic parsing methods.

  • 58.To Adapt or to Annotate: Challenges and Interventions for Domain Adaptation in Open-Domain Question Answering

    Dheeru Dua,Emma Strubell,Sameer Singh,Pat Verga
    Download URL

    https://aclanthology.org/2023.acl-long.807/


    abstract

    AbstractRecent advances in open-domain question answering (ODQA) have demonstrated impressive accuracy on general-purpose domains like Wikipedia. While some work has been investigating how well ODQA models perform when tested for out-of-domain (OOD) generalization, these studies have been conducted only under conservative shifts in data distribution and typically focus on a single component (i.e., retriever or reader) rather than an end-to-end system. This work proposes a more realistic end-to-end domain shift evaluation setting covering five diverse domains. We not only find that end-to-end models fail to generalize but that high retrieval scores often still yield poor answer prediction accuracy. To address these failures, we investigate several interventions, in the form of data augmentations, for improving model adaption and use our evaluation set to elucidate the relationship between the efficacy of an intervention scheme and the particular type of dataset shifts we consider. We propose a generalizability test that estimates the type of shift in a target dataset without training a model in the target domain and that the type of shift is predictive of which data augmentation schemes will be effective for domain adaption. Overall, we find that these interventions increase end-to-end performance by up to ~24 points.

  • 59.PESCO: Prompt-enhanced Self Contrastive Learning for Zero-shot Text Classification

    Yau-Shian Wang,Ta-Chung Chi,Ruohong Zhang,Yiming Yang
    Download URL

    https://aclanthology.org/2023.acl-long.832/


    abstract

    AbstractWe present PESCO, a novel contrastive learning framework that substantially improves the performance of zero-shot text classification. We formulate text classification as a neural text retrieval problem where each document is treated as a query, and the system learns the mapping from each query to the relevant class labels by (1) adding prompts to enhance label retrieval, and (2) using retrieved labels to enrich the training set in a self-training loop of contrastive learning. PESCO achieves state-of-the-art performance on four benchmark text classification datasets. On DBpedia, we achieve 98.5% accuracy without any labeled data, which is close to the fully-supervised result. Extensive experiments and analyses show all the components of PESCO are necessary for improving the performance of zero-shot text classification.

  • 60.Visually-augmented pretrained language models for NLP tasks without images

    Hangyu Guo,Kun Zhou,Wayne Xin Zhao,Qinyu Zhang,Ji-Rong Wen
    Download URL

    https://aclanthology.org/2023.acl-long.833/


    abstract

    AbstractAlthough pre-trained language models (PLMs) have shown impressive performance by text-only self-supervised training, they are found lack of visual semantics or commonsense. Existing solutions often rely on explicit images for visual knowledge augmentation (requiring time-consuming retrieval or generation), and they also conduct the augmentation for the whole input text, without considering whether it is actually needed in specific inputs or tasks. To address these issues, we propose a novel **V**isually-**A**ugmented fine-tuning approach that can be generally applied to various PLMs or NLP tasks, **W**ithout using any retrieved or generated **I**mages, namely **VAWI**. Experimental results show that our approach can consistently improve the performance of BERT, RoBERTa, BART, and T5 at different scales, and outperform several competitive baselines on ten tasks. Our codes and data are publicly available at https://github.com/RUCAIBox/VAWI.

  • 61.Bridging The Gap: Entailment Fused-T5 for Open-retrieval Conversational Machine Reading Comprehension

    Xiao Zhang,Heyan Huang,Zewen Chi,Xian-Ling Mao
    Download URL

    https://aclanthology.org/2023.acl-long.857/


    abstract

    AbstractOpen-retrieval conversational machine reading comprehension (OCMRC) simulates real-life conversational interaction scenes. Machines are required to make a decision of â??Yes/No/Inquireâ?? or generate a follow-up question when the decision is â??Inquireâ?? based on retrieved rule texts, user scenario, user question and dialogue history. Recent studies try to reduce the information gap between decision-making and question generation, in order to improve the performance of generation. However, the information gap still persists because these methods are still limited in pipeline framework, where decision-making and question generation are performed separately, making it hard to share the entailment reasoning used in decision-making across all stages. To tackle the above problem, we propose a novel one-stage end-to-end framework, called Entailment Fused-T5 (EFT), to bridge the information gap between decision-making and question generation in a global understanding manner. The extensive experimental results demonstrate that our proposed framework achieves new state-of-the-art performance on the OR-ShARC benchmark. Our model and code are publicly available at an anonymous link.

  • 62.LiveChat: A Large-Scale Personalized Dialogue Dataset Automatically Constructed from Live Streaming

    Jingsheng Gao,Yixin Lian,Ziyi Zhou,Yuzhuo Fu,Baoyuan Wang
    Download URL

    https://aclanthology.org/2023.acl-long.858/


    abstract

    AbstractOpen-domain dialogue systems have made promising progress in recent years. While the state-of-the-art dialogue agents are built upon large-scale social media data and large pre-trained models, there is no guarantee these agents could also perform well in fast-growing scenarios, such as live streaming, due to the bounded transferability of pre-trained models and biased distributions of public datasets from Reddit and Weibo, etc. To improve the essential capability of responding and establish a benchmark in the live open-domain scenario, we introduce the LiveChat dataset, composed of 1.33 million real-life Chinese dialogues with almost 3800 average sessions across 351 personas and fine-grained profiles for each persona. LiveChat is automatically constructed by processing numerous live videos on the Internet and naturally falls within the scope of multi-party conversations, where the issues of Who says What to Whom should be considered. Therefore, we target two critical tasks of response modeling and addressee recognition and propose retrieval-based baselines grounded on advanced techniques. Experimental results have validated the positive effects of leveraging persona profiles and larger average sessions per persona. In addition, we also benchmark the transferability of advanced generation-based models on LiveChat and pose some future directions for current challenges.

  • 63.Pre-trained Language Models Can be Fully Zero-Shot Learners

    Xuandong Zhao,Siqi Ouyang,Zhiguo Yu,Ming Wu,Lei Li
    Download URL

    https://aclanthology.org/2023.acl-long.869/


    abstract

    AbstractHow can we extend a pre-trained model to many language understanding tasks, without labeled or additional unlabeled data? Pre-trained language models (PLMs) have been effective for a wide range of NLP tasks. However, existing approaches either require fine-tuning on downstream labeled datasets or manually constructing proper prompts. In this paper, we propose nonparametric prompting PLM (NPPrompt) for fully zero-shot language understanding. Unlike previous methods, NPPrompt uses only pre-trained language models and does not require any labeled data or additional raw corpus for further fine-tuning, nor does it rely on humans to construct a comprehensive set of prompt label words. We evaluate NPPrompt against previous major few-shot and zero-shot learning methods on diverse NLP tasks: including text classification, text entailment, similar text retrieval, paraphrasing, and multiple-choice question answering. Experimental results demonstrate that our NPPrompt outperforms the previous best fully zero-shot method by big margins, with absolute gains of 12.8% in accuracy on text classification and 15.6% on the GLUE benchmark. Our source code is available at https://anonymous.4open. science/r/NPPrompt.

  • 64.An Empirical Analysis of Parameter-Efficient Methods for Debiasing Pre-Trained Language Models

    Zhongbin Xie,Thomas Lukasiewicz
    Download URL

    https://aclanthology.org/2023.acl-long.876/


    abstract

    AbstractThe increasingly large size of modern pre-trained language models not only makes them inherit more human-like biases from the training corpora, but also makes it computationally expensive to mitigate such biases. In this paper, we investigate recent parameter-efficient methods in combination with counterfactual data augmentation (CDA) for bias mitigation. We conduct extensive experiments with prefix tuning, prompt tuning, and adapter tuning on different language models and bias types to evaluate their debiasing performance and abilities to preserve the internal knowledge of a pre-trained model. We find that the parameter-efficient methods (i) are effective in mitigating gender bias, where adapter tuning is consistently the most effective one and prompt tuning is more suitable for GPT-2 than BERT, (ii) areless effective when it comes to racial and religious bias, which may be attributed to the limitations of CDA, and (iii) can perform similarly to or sometimes better than full fine-tuning with improved time and memory efficiency, as well as maintain the internal knowledge in BERT and GPT-2, evaluated via fact retrieval and downstream fine-tuning.

  • 65.Few-shot Reranking for Multi-hop QA via Language Model Prompting

    Muhammad Khalifa,Lajanugen Logeswaran,Moontae Lee,Honglak Lee,Lu Wang
    Download URL

    https://aclanthology.org/2023.acl-long.885/


    abstract

    AbstractWe study few-shot reranking for multi-hop QA (MQA) with open-domain questions. To alleviate the need for a large number of labeled question-document pairs for retriever training, we propose PromptRank, which relies on language model prompting for multi-hop path reranking. PromptRank first constructs an instruction-based prompt that includes a candidate document path and then computes the relevance score between a given question and the path based on the conditional likelihood of the question given the path prompt according to a language model. PromptRank yields strong retrieval performance on HotpotQA with only 128 training examples compared to state-of-the-art methods trained on thousands of examples â?? 73.6 recall@10 by PromptRank vs. 77.8 by PathRetriever and 77.5 by multi-hop dense retrieval.

  • 66.Cross-Modal Attribute Insertions for Assessing the Robustness of Vision-and-Language Learning

    Shivaen Ramshetty,Gaurav Verma,Srijan Kumar
    Download URL

    https://aclanthology.org/2023.acl-long.890/


    abstract

    AbstractThe robustness of multimodal deep learning models to realistic changes in the input text is critical for applicability on important tasks such as text-to-image retrieval and cross-modal entailment. To measure robustness, several existing approaches edit the text data, but without leveraging the cross-modal information present in multimodal data. Such information from the visual modality, such as color, size, and shape, provides additional attributes that users can include in their inputs. Thus, we propose cross-modal attribute insertions as a realistic perturbation strategy for vision-and-language data that inserts visual attributes of the objects in the image into the corresponding text (e.g., â??girl on a chairâ?? to â??little girl on a wooden chairâ??). Our proposed approach for cross-modal attribute insertions is modular, controllable, and task-agnostic. We find that augmenting input text using cross-modal insertions causes state-of-the-art approaches for text-to-image retrieval and cross-modal entailment to perform poorly, resulting in relative drops of ~15% in MRR and ~20% in F1 score, respectively. Crowd-sourced annotations demonstrate that cross-modal insertions lead to higher quality augmentations for multimodal data than augmentations using text-only data, and are equivalent in quality to original examples. We release the code to encourage robustness evaluations of deep vision-and-language models: https://github.com/claws-lab/multimodal-robustness-xmai

  • 67.A Neural Divide-and-Conquer Reasoning Framework for Image Retrieval from Linguistically Complex Text

    Yunxin Li,Baotian Hu,Yuxin Ding,Lin Ma,Min Zhang
    Download URL

    https://aclanthology.org/2023.acl-long.909/


    abstract

    AbstractPretrained Vision-Language Models (VLMs) have achieved remarkable performance in image retrieval from text. However, their performance drops drastically when confronted with linguistically complex texts that they struggle to comprehend. Inspired by the Divide-and-Conquer algorithm and dual-process theory, in this paper, we regard linguistically complex texts as compound proposition texts composed of multiple simple proposition sentences and propose an end-to-end Neural Divide-and-Conquer Reasoning framework, dubbed NDCR. It contains three main components: 1) Divide: a proposition generator divides the compound proposition text into simple proposition sentences and produces their corresponding representations, 2) Conquer: a pretrained VLMs-based visual-linguistic interactor achieves the interaction between decomposed proposition sentences and images, 3) Combine: a neural-symbolic reasoner combines the above reasoning states to obtain the final solution via a neural logic reasoning approach. According to the dual-process theory, the visual-linguistic interactor and neural-symbolic reasoner could be regarded as analogical reasoning System 1 and logical reasoning System 2. We conduct extensive experiments on a challenging image retrieval from contextual descriptions data set. Experimental results and analyses indicate NDCR significantly improves performance in the complex image-text reasoning problem.

  • 68.Multimodal Relation Extraction with Cross-Modal Retrieval and Synthesis

    Xuming Hu,Zhijiang Guo,Zhiyang Teng,Irwin King,Philip S. Yu
    Download URL

    https://aclanthology.org/2023.acl-short.27/


    abstract

    AbstractMultimodal relation extraction (MRE) is the task of identifying the semantic relationships between two entities based on the context of the sentence image pair. Existing retrieval-augmented approaches mainly focused on modeling the retrieved textual knowledge, but this may not be able to accurately identify complex relations. To improve the prediction, this research proposes to retrieve textual and visual evidence based on the object, sentence, and whole image. We further develop a novel approach to synthesize the object-level, image-level, and sentence-level information for better reasoning between the same and different modalities. Extensive experiments and analyses show that the proposed method is able to effectively select and compare evidence across modalities and significantly outperforms state-of-the-art models.

  • 69.Stop Pre-Training: Adapt Visual-Language Models to Unseen Languages

    Yasmine Karoui,Rémi Lebret,Negar Foroutan Eghlidi,Karl Aberer
    Download URL

    https://aclanthology.org/2023.acl-short.32/


    abstract

    AbstractVision-Language Pre-training (VLP) has advanced the performance of many vision-language tasks, such as image-text retrieval, visual entailment, and visual reasoning.The pre-training mostly utilizes lexical databases and image queries in English. Previous work has demonstrated that the pre-training in English does not transfer well to other languages in a zero-shot setting. However, multilingual pre-trained language models (MPLM) have excelled at a variety of single-modal language tasks. In this paper, we propose a simple yet efficient approach to adapt VLP to unseen languages using MPLM.We utilize a cross-lingual contextualised token embeddings alignment approach to train text encoders for non-English languages. Our approach does not require image input and primarily uses machine translation, eliminating the need for target language data. Our evaluation across three distinct tasks (image-text retrieval, visual entailment, and natural language visual reasoning) demonstrates that this approach outperforms the state-of-the-art multilingual vision-language models without requiring large parallel corpora. Our code is available at https://github.com/Yasminekaroui/CliCoTea.

  • 70.Surface-Based Retrieval Reduces Perplexity of Retrieval-Augmented Language Models

    Ehsan Doostmohammadi,Tobias Norlund,Marco Kuhlmann,Richard Johansson
    Download URL

    https://aclanthology.org/2023.acl-short.45/


    abstract

    AbstractAugmenting language models with a retrieval mechanism has been shown to significantly improve their performance while keeping the number of parameters low. Retrieval-augmented models commonly rely on a semantic retrieval mechanism based on the similarity between dense representations of the query chunk and potential neighbors. In this paper, we study the state-of-the-art Retro model and observe that its performance gain is better explained by surface-level similarities, such as token overlap. Inspired by this, we replace the semantic retrieval in Retro with a surface-level method based on BM25, obtaining a significant reduction in perplexity. As full BM25 retrieval can be computationally costly for large datasets, we also apply it in a re-ranking scenario, gaining part of the perplexity reduction with minimal computational overhead.

  • 71.MIReAD: Simple Method for Learning High-quality Representations from Scientific Documents

    Anastasiia Razdaibiedina,Aleksandr Brechalov
    Download URL

    https://aclanthology.org/2023.acl-short.46/


    abstract

    AbstractLearning semantically meaningful representations from scientific documents can facilitate academic literature search and improve performance of recommendation systems. Pretrained language models have been shown to learn rich textual representations, yet they cannot provide powerful document-level representations for scientific articles. We propose MIReAD, a simple method that learns highquality representations of scientific papers by fine-tuning transformer model to predict the target journal class based on the abstract. We train MIReAD on more than 500,000 PubMed and arXiv abstracts across over 2,000 journal classes. We show that MIReAD produces representations that can be used for similar papers retrieval, topic categorization and literature search. Our proposed approach outperforms six existing models for representation learning on scientific documents across four evaluation standards.

  • 72.Typo-Robust Representation Learning for Dense Retrieval

    Panuthep Tasawong,Wuttikorn Ponwitayarat,Peerat Limkonchotiwat,Can Udomcharoenchaikit,Ekapol Chuangsuwanich,Sarana Nutanong
    Download URL

    https://aclanthology.org/2023.acl-short.95/


    abstract

    AbstractDense retrieval is a basic building block of information retrieval applications. One of the main challenges of dense retrieval in real-world settings is the handling of queries containing misspelled words. A popular approach for handling misspelled queries is minimizing the representations discrepancy between misspelled queries and their pristine ones. Unlike the existing approaches, which only focus on the alignment between misspelled and pristine queries, our method also improves the contrast between each misspelled query and its surrounding queries. To assess the effectiveness of our proposed method, we compare it against the existing competitors using two benchmark datasets and two base encoders. Our method outperforms the competitors in all cases with misspelled queries. Our code and models are available at https://github.com/panuthept/DST-DenseRetrieval.

  • 73.ReAugKD: Retrieval-Augmented Knowledge Distillation For Pre-trained Language Models

    Jianyi Zhang,Aashiq Muhamed,Aditya Anantharaman,Guoyin Wang,Changyou Chen,Kai Zhong,Qingjun Cui,Yi Xu,Belinda Zeng,Trishul Chilimbi,Yiran Chen
    Download URL

    https://aclanthology.org/2023.acl-short.97/


    abstract

    AbstractKnowledge Distillation (KD) is one of the most effective approaches to deploying large-scale pre-trained language models in low-latency environments by transferring the knowledge contained in the large-scale models to smaller student models.Prior KD approaches use the soft labels and intermediate activations generated by the teacher to transfer knowledge to the student model parameters alone. In this paper, we show that having access to non-parametric memory in the form of a knowledge base with the teacherâ??s soft labels and predictions can further improve student generalization. To enable the student to retrieve from the knowledge base effectively, we propose a new framework and loss function that preserves the semantic similarities of teacher and student training examples. We show through extensive experiments that our retrieval mechanism can achieve state-of-the-art performance for task-specific knowledge distillation on the GLUE benchmark.

  • 74.RAMP: Retrieval and Attribute-Marking Enhanced Prompting for Attribute-Controlled Translation

    Gabriele Sarti,Phu Mon Htut,Xing Niu,Benjamin Hsu,Anna Currey,Georgiana Dinu,Maria Nadejde
    Download URL

    https://aclanthology.org/2023.acl-short.126/


    abstract

    AbstractAttribute-controlled translation (ACT) is a subtask of machine translation that involves controlling stylistic or linguistic attributes (like formality and gender) of translation outputs. While ACT has garnered attention in recent years due to its usefulness in real-world applications, progress in the task is currently limited by dataset availability, since most prior approaches rely on supervised methods. To address this limitation, we propose Retrieval and Attribute-Marking enhanced Prompting (RAMP), which leverages large multilingual language models to perform ACT in few-shot and zero-shot settings. RAMP improves generation accuracy over the standard prompting approach by (1) incorporating a semantic similarity retrieval component for selecting similar in-context examples, and (2) marking in-context examples with attribute annotations. Our comprehensive experiments show that RAMP is a viable approach in both zero-shot and few-shot settings.

  • 75.LI-RAGE: Late Interaction Retrieval Augmented Generation with Explicit Signals for Open-Domain Table Question Answering

    Weizhe Lin,Rexhina Blloshmi,Bill Byrne,Adria de Gispert,Gonzalo Iglesias
    Download URL

    https://aclanthology.org/2023.acl-short.133/


    abstract

    AbstractRecent open-domain TableQA models are typically implemented as retriever-reader pipelines. The retriever component is usually a variant of the Dense Passage Retriever, which computes the similarities between questions and tables based on a single representation of each.These fixed vectors can be insufficient to capture fine-grained features of potentially very big tables with heterogeneous row/column information. We address this limitation by 1) applying late interaction models which enforce a finer-grained interaction between question and table embeddings at retrieval time. In addition, we 2) incorporate a joint training scheme of the retriever and reader with explicit table-level signals, and 3) embed a binary relevance token as a prefix to the answer generated by the reader, so we can determine at inference time whether the table used to answer the question is reliable and filter accordingly. The combined strategies set a new state-to-the-art performance on two public open-domain TableQA datasets.

  • 76.Task-Aware Specialization for Efficient and Robust Dense Retrieval for Open-Domain Question Answering

    Hao Cheng,Hao Fang,Xiaodong Liu,Jianfeng Gao
    Download URL

    https://aclanthology.org/2023.acl-short.159/


    abstract

    AbstractGiven its effectiveness on knowledge-intensive natural language processing tasks, dense retrieval models have become increasingly popular. Specifically, the de-facto architecture for open-domain question answering uses two isomorphic encoders that are initialized from the same pretrained model but separately parameterized for questions and passages. This biencoder architecture is parameter-inefficient in that there is no parameter sharing between encoders. Further, recent studies show that such dense retrievers underperform BM25 in various settings. We thus propose a new architecture, Task-Aware Specialization for dEnse Retrieval (TASER), which enables parameter sharing by interleaving shared and specialized blocks in a single encoder. Our experiments on five question answering datasets show that TASER can achieve superior accuracy, surpassing BM25, while using about 60% of the parameters as bi-encoder dense retrievers. In out-of-domain evaluations, TASER is also empirically more robust than bi-encoder dense retrievers. Our code is available at https://github.com/microsoft/taser.