#nlproc


Someone from U Zurich did an undisclosed persuasion experiment on Reddit users in r/ChangeMyView using #LLM bots. This kind of social media research is absolutely unethical and the "results" should not be published.
Additional shame on the ethics committee for arguing *for* publication. In my view, this is outrageous scientific misconduct. #nlproc #academia #ethics #socialMedia
reddit.com/r/changemyview/comm


🌍 We welcome applicants from all backgrounds and nationalities.

📅 Application deadline: May 25th, 2025.
After that, the position will remain open until filled. We will consider applications as soon as they are submitted.

(4/4)

#NLProc #NLP #Postdoc

#PhD job in the Dept. of Language and Information Sciences at the University of Lausanne: my colleague Davide Picca has an open PhD position starting on October 1, 2025 in an SNSF-funded project focused on the computational analysis of Charles S. #Peirce’s manuscripts.

Deadline for application: May 19, 2025

career5.successfactors.eu/care

career5.successfactors.eu
Career Opportunities: Doctoral Student SNSF in Digital Humanities and Computational Semiotic Studies (22226)

Call from the past: This week I was contacted about k-delayed tree-local MCTAGs, a formalism I proposed with David Chiang in 2008 😍
Sadly, both I and the field at large have moved on, but it is so nice to see that someone still gets value out of this.
aclanthology.org/W08-2303/
#sigh #TAG #nlproc #academicChatter

ACL Anthology
Flexible Composition and Delayed Tree-Locality. David Chiang, Tatjana Scheffler. Proceedings of the Ninth International Workshop on Tree Adjoining Grammar and Related Frameworks (TAG+9). 2008.

Moved all my stuff out of Dropbox (didn’t have much there); Google Drive is next (but it’s a bit messier and more complicated).
I have a colleague who, years back, was concerned about keeping work stuff (e.g. paper drafts, grant proposals) on Google (we’re in #nlproc, kind of the same area as they are), and I thought he was a bit paranoid. Now I think it’s probably best to keep our stuff closer to home instead of on US clouds. #academicChatter #europe #warOnScience

Hi #nlproc people, my ARR area chairing docket is very, very far behind (even after sending reminders), and I am not able to make up the gap with office neighbours. If anyone is able to review papers, especially in LLM evaluation, please message me.

Yesterday saw the release of the podcast "Sockenpuppenzoo - Angriff auf Wikipedia" ("Sock Puppet Zoo - Attack on Wikipedia"), in which the investigative journalists @daniellaufer and @Schattleitner document how German Wikipedia articles were deliberately manipulated by far-right networks over a period of years.

In episode 3, the two met with, among others, my students and me to discuss whether automatic authorship identification could help uncover the identities. Three students then carried out projects on the Wikipedia data! #RUB #nlproc #wikipedia #forensischeLinguistik #podcast

ardaudiothek.de/sendung/socken
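
For readers wondering what such authorship-identification projects look like in practice, a common baseline is character n-gram features with a linear classifier. Below is a minimal sketch of that baseline; the texts, labels, and feature settings are illustrative assumptions, not the students' actual setup.

```python
# Minimal authorship-attribution baseline: TF-IDF over character n-grams
# plus a linear SVM. The texts and labels are toy placeholders, not the
# actual Wikipedia revision data from the projects.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Edits with known account labels (hypothetical examples).
texts = [
    "This edit, clearly, favours source X over source Y.",
    "Clearly, the article must cite source X here as well.",
    "i think the paragraph reads better without that claim",
    "i would drop the claim entirely, it reads better",
]
authors = ["A", "A", "B", "B"]

# Character 3-5-grams capture punctuation habits, casing, and spelling
# quirks, which are robust stylistic signals across topics.
model = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(3, 5)),
    LinearSVC(),
)
model.fit(texts, authors)

# Attribute a disputed edit to the closest known account.
print(model.predict(["Clearly, source X should be cited in this edit."]))
```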


8/n

[2] Jinhyuk Lee, Anthony Chen, Zhuyun Dai, Dheeru Dua, Devendra Singh Sachan, Michael Boratko, Yi Luan, Sébastien M. R. Arnold, Vincent Perot, Siddharth Dalmia, Hexiang Hu, Xudong Lin, Panupong Pasupat, Aida Amini, Jeremy R. Cole, Sebastian Riedel, Iftekhar Naim, Ming-Wei Chang, and Kelvin Guu. 2024. Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More? arxiv.org/abs/2406.13121

arXiv.org
Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More?
Long-context language models (LCLMs) have the potential to revolutionize our approach to tasks traditionally reliant on external tools like retrieval systems or databases. Leveraging LCLMs' ability to natively ingest and process entire corpora of information offers numerous advantages. It enhances user-friendliness by eliminating the need for specialized knowledge of tools, provides robust end-to-end modeling that minimizes cascading errors in complex pipelines, and allows for the application of sophisticated prompting techniques across the entire system. To assess this paradigm shift, we introduce LOFT, a benchmark of real-world tasks requiring context up to millions of tokens designed to evaluate LCLMs' performance on in-context retrieval and reasoning. Our findings reveal LCLMs' surprising ability to rival state-of-the-art retrieval and RAG systems, despite never having been explicitly trained for these tasks. However, LCLMs still face challenges in areas like compositional reasoning that are required in SQL-like tasks. Notably, prompting strategies significantly influence performance, emphasizing the need for continued research as context lengths grow. Overall, LOFT provides a rigorous testing ground for LCLMs, showcasing their potential to supplant existing paradigms and tackle novel tasks as model capabilities scale.
#NLP #NLProc #RAG

7/

REFERENCES

[1] Yifu Qiu, Varun Embar, Yizhe Zhang, Navdeep Jaitly, Shay B. Cohen, and Benjamin Han. 2025. Eliciting In-context Retrieval and Reasoning for Long-context Large Language Models. arxiv.org/abs/2501.08248

arXiv.org
Eliciting In-context Retrieval and Reasoning for Long-context Large Language Models
Recent advancements in long-context language models (LCLMs) promise to transform Retrieval-Augmented Generation (RAG) by simplifying pipelines. With their expanded context windows, LCLMs can process entire knowledge bases and perform retrieval and reasoning directly -- a capability we define as In-Context Retrieval and Reasoning (ICR^2). However, existing benchmarks like LOFT often overestimate LCLM performance by providing overly simplified contexts. To address this, we introduce ICR^2, a benchmark that evaluates LCLMs in more realistic scenarios by including confounding passages retrieved with strong retrievers. We then propose three methods to enhance LCLM performance: (1) retrieve-then-generate fine-tuning, (2) retrieval-attention-probing, which uses attention heads to filter and de-noise long contexts during decoding, and (3) joint retrieval head training alongside the generation head. Our evaluation of five well-known LCLMs on LOFT and ICR^2 demonstrates significant gains with our best approach applied to Mistral-7B: +17 and +15 points by Exact Match on LOFT, and +13 and +2 points on ICR^2, compared to vanilla RAG and supervised fine-tuning, respectively. It even outperforms GPT-4-Turbo on most tasks despite being a much smaller model.
#NLP #NLProc #RAG

6/

In extensive experiments on five LCLMs using both the LOFT and ICR² benchmarks, our best approach on Mistral-7B with a 32K token limit outperformed the vanilla RAG and SFT baselines by an average of +17 and +15 points (Exact Match) on LOFT, and by +13 and +2 points on ICR², respectively (picture). It even achieved performance comparable to the state-of-the-art GPT-4-Turbo, despite having only 7B parameters.

#NLP #NLProc #RAG

4/

With a more realistic benchmark in hand, we systematically explored three approaches to enhance model performance:

1. Retrieve-then-generate supervised fine-tuning (picture): we train LCLMs to first retrieve relevant information from the context and then generate the final responses (a toy sketch of the training-example format follows this list).

2. Retrieval-attention-probing: During inference, we probe attention heads activated for in-context retrieval, and use their top predictions to filter out confounders.
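
As a toy illustration of approach 1, here is roughly what a retrieve-then-generate training example could look like. The prompt template, tags, and the build_example helper are my own assumptions for illustration, not the paper's actual format.

```python
# Sketch of a retrieve-then-generate fine-tuning example: the model is
# trained to first emit the IDs of the relevant passages, then the final
# answer. The template and tags are illustrative assumptions.

def build_example(question, passages, gold_ids, answer):
    """passages: list of (pid, text) pairs; gold_ids: IDs of relevant ones."""
    context = "\n".join(f"[{pid}] {text}" for pid, text in passages)
    prompt = f"{context}\n\nQuestion: {question}\n"
    # Training target: the retrieval step comes before the generation step,
    # so the model learns to localize evidence before answering.
    target = f"Relevant passages: {', '.join(gold_ids)}\nAnswer: {answer}"
    return prompt, target

prompt, target = build_example(
    question="What is the capital of France?",
    passages=[("p1", "Paris is the capital of France."),
              ("p2", "Paris, Texas is a city in the USA.")],  # confounder
    gold_ids=["p1"],
    answer="Paris",
)
print(prompt + target)
```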

#NLP #NLProc #RAG

3/

This limitation often leads to inflated results. To address it, we created a more realistic dataset, ICR². It uses five retrievers to generate challenging negative documents (picture 1). Our results show a significant performance drop with standard RAG setups: for example, with GPT-4-Turbo, accuracy on NQ dropped from 0.85 to 0.67, and on HPQA it fell from 0.78 to 0.64 (picture 2).
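
To make the confounder mining concrete, here is a minimal sketch using BM25 (via the rank_bm25 package) as a stand-in for one of the five retrievers; the corpus and query are toy examples, not ICR² data.

```python
# Sketch: mine "confounding" negatives for a query by keeping the passages
# a retriever scores highest, excluding the gold ones. BM25 (rank_bm25)
# stands in for one of the five retrievers; the data is a toy example.
from rank_bm25 import BM25Okapi

corpus = [
    "Paris is the capital of France.",     # gold passage
    "Paris, Texas is a city in the USA.",  # lexically similar confounder
    "Bananas are rich in potassium.",      # random, off-topic negative
]
gold_ids = {0}

bm25 = BM25Okapi([doc.lower().split() for doc in corpus])
scores = bm25.get_scores("what is the capital of france".split())

# Rank non-gold passages by retriever score: the top ones are the hard
# (confounding) negatives, unlike uniformly sampled negatives.
ranked = sorted(
    (i for i in range(len(corpus)) if i not in gold_ids),
    key=lambda i: scores[i],
    reverse=True,
)
print(corpus[ranked[0]])  # -> the "Paris, Texas" passage
```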

#NLP #NLProc #RAG

2/

But are current LCLMs up to the task? If not, how can we improve their performance?

In our preprint [1], we evaluated five popular LCLMs using the LOFT benchmark [2], which involves answering questions paired with documents. However, LOFT relies on random sampling to create irrelevant (negative) documents for each query, failing to include confounding documents — those that are relevant but misleading — which are common in real-world scenarios.
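
For contrast, a LOFT-style context with randomly sampled negatives can be assembled roughly as below; this is a simplified sketch of the sampling idea, not the benchmark's actual format.

```python
# Sketch: LOFT-style context assembly with *randomly* sampled negatives.
# Random negatives tend to be off-topic, so they rarely mislead the model;
# topically similar confounders are what make in-context retrieval hard.
import random

def build_loft_style_context(gold_passages, passage_pool, n_negatives, seed=0):
    rng = random.Random(seed)
    pool = [p for p in passage_pool if p not in gold_passages]
    negatives = rng.sample(pool, n_negatives)
    context = gold_passages + negatives
    rng.shuffle(context)  # hide the gold passage's position
    return context

pool = [
    "gold passage about the query topic",
    "unrelated passage about cooking",
    "unrelated passage about football",
    "unrelated passage about geology",
]
print(build_loft_style_context(["gold passage about the query topic"], pool, 2))
```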

#NLP #NLProc #RAG

1/

What if #LLMs had context windows so large that an entire knowledge base could fit into a single prompt? This would revolutionize Retrieval-Augmented Generation (RAG) applications by enabling retrieval, re-ranking, reasoning, and generation all in one step. With a Long-Context Language Model (LCLM), we could simplify RAG architecture by leveraging the model’s capability for In-Context Retrieval and Reasoning (ICR²).
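
As a toy sketch of the corpus-in-context idea: put the whole (small) knowledge base into one prompt and ask the model to retrieve, cite, and answer in a single step. The template and the call_lclm placeholder are assumptions, not a specific model's API.

```python
# Sketch: corpus-in-context RAG with a long-context model. The entire
# (small) knowledge base goes into a single prompt, and the model is asked
# to retrieve, cite, and answer in one step. call_lclm is a placeholder
# for whatever LCLM API is available; it is not a real function.

def build_icr_prompt(corpus, question):
    docs = "\n".join(f"[{i}] {doc}" for i, doc in enumerate(corpus))
    return (
        "You are given a corpus of documents:\n"
        f"{docs}\n\n"
        "First list the IDs of the documents relevant to the question, "
        "then answer using only those documents.\n"
        f"Question: {question}\n"
    )

corpus = [
    "Paris is the capital of France.",
    "Bananas are rich in potassium.",
]
prompt = build_icr_prompt(corpus, "What is the capital of France?")
# answer = call_lclm(prompt)  # placeholder: any long-context model call
print(prompt)
```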

#NLP #NLProc #RAG