재형이의 성장통 일지
  • Text Processing with Prompt Engineering
    Uploaded on April 2, 2024 at 22:05:03.
    Author: 재형이

    Evaluating Answer Accuracy

    qa_data = {
        "question": [
            "In a website browser address bar, what does “www” stand for?",
            "Who was the first woman to win a Nobel Prize",
            "What is the name of the Earth’s largest ocean?",
        ],
        "answer_groundtruth": ["World Wide Web", "Marie Curie", "The Pacific Ocean"],
    }
    qa_data_df = pd.DataFrame(qa_data)
    qa_data_df

    • Let's get answers using the PaLM 2 model (loaded in the Install Vertex AI SDK section below) and store them in the answer_prediction column
    def get_answer(row):
        # Build a simple QA prompt for each question in the DataFrame
        prompt = f"""Answer the following question as precisely as possible.

    question: {row}
    answer:
    """
        return generation_model.predict(
            prompt=prompt,
        ).text
    
    
    qa_data_df["answer_prediction"] = qa_data_df["question"].apply(get_answer)
    qa_data_df

    • However, simply comparing the strings is not enough for an accurate evaluation
    • For example, if the ground truth is "The Pacific Ocean" but the prediction is "Pacific Ocean", a plain string comparison marks it as wrong
    • Let's try the fuzzywuzzy and python-Levenshtein libraries instead (see the sketch after this example):
      String1: "this is a test"
      String2: "this is a test!"

      Fuzz Ratio => 97 
      Fuzz Partial Ratio => 100  
      # Most of the characters are identical and appear in a similar order, so the algorithm scores the partial ratio as 100
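
    A minimal sketch reproducing these scores, plus the Pacific Ocean case above (assuming the two libraries installed just below):

    from fuzzywuzzy import fuzz

    # ratio compares the whole strings; partial_ratio scores the best-matching substring
    print(fuzz.ratio("this is a test", "this is a test!"))          # 97
    print(fuzz.partial_ratio("this is a test", "this is a test!"))  # 100

    # An exact comparison fails where the fuzzy match succeeds
    print("The Pacific Ocean" == "Pacific Ocean")                    # False
    print(fuzz.partial_ratio("The Pacific Ocean", "Pacific Ocean"))  # 100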
    !pip install -q python-Levenshtein --upgrade --user
    !pip install -q fuzzywuzzy --upgrade --user
    from fuzzywuzzy import fuzz
    
    
    def get_fuzzy_match(row):
        # Score the ground truth against the prediction with a partial-ratio fuzzy match (0-100)
        return fuzz.partial_ratio(row["answer_groundtruth"], row["answer_prediction"])
    
    
    qa_data_df["match_score"] = qa_data_df.apply(get_fuzzy_match, axis=1)
    qa_data_df

    print(
        "the average match score of all predicted answer from PaLM 2 is : ",
        qa_data_df["match_score"].mean(),
        " %",
    )
    
    # the average match score of all predicted answers from PaLM 2 is :  100.0  %

    Install the Vertex AI SDK & Import Libraries and Models

    !pip install google-cloud-aiplatform --upgrade --user
    
    import pandas as pd
    from vertexai.language_models import TextGenerationModel
    
    generation_model = TextGenerationModel.from_pretrained("text-bison@001")
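
    A quick sanity check that the model is loaded; the decoding parameters here are the same ones used throughout the examples below (the prompt itself is just a placeholder):

    response = generation_model.predict(
        prompt="What does LLM stand for?",  # placeholder prompt
        temperature=0.1,        # lower values make sampling more deterministic
        max_output_tokens=256,  # cap on the number of generated tokens
        top_k=40,               # sample only from the 40 most likely tokens
        top_p=0.8,              # nucleus sampling probability threshold
    )
    print(response.text)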

    Text Classification

    Topic classification

    prompt = """
    Classify a piece of text into one of several predefined topics, such as sports, politics, or entertainment. \n
    text: President Biden will be visiting India in the month of March to discuss a few opportunities. \n
    class:
    """
    
    print(
        generation_model.predict(
            prompt=prompt,
            max_output_tokens=256,
            temperature=0.1,
        ).text
    )
    
    # politics

    Spam detection

    prompt = """
    Given an email, classify it as spam or not spam. \n
    email: hi user, \n
          you have been selected as a winner of the lotery and can win upto 1 million dollar. \n
          kindly share your bank details and we can proceed from there. \n\n
    
          from, \n
          US Official Lottry Depatmint
    """
    
    print(
        generation_model.predict(
            prompt=prompt,
            max_output_tokens=256,
            temperature=0.1,
        ).text
    )
    
    # spam

    Language identification

    prompt = """
    Given a piece of text, classify the language it is written in. \n
    text: Selam nasıl gidiyor?
    language:
    """
    
    print(
        generation_model.predict(
            prompt=prompt,
            max_output_tokens=256,
            temperature=0.1,
        ).text
    )
    
    # Turkish

    Emotion detection

    prompt = """
    Given a piece of text, classify the emotion it conveys, such as happiness or anger. \n
    text: I'm still so delighted from yesterday's news
    """
    
    print(
        generation_model.predict(
            prompt=prompt,
            max_output_tokens=256,
            temperature=0.1,
        ).text
    )
    
    # happiness
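
    The same pandas apply pattern used in the accuracy-evaluation section also works for running any of these classification prompts in batch. A minimal sketch, assuming the generation_model from the setup section (the two sample texts are hypothetical):

    import pandas as pd

    # Hypothetical sample texts to classify
    texts_df = pd.DataFrame(
        {
            "text": [
                "The match went to extra time before the home side finally scored.",
                "Parliament passed the new budget bill after a long debate.",
            ]
        }
    )


    def classify_topic(text):
        # Reuse the topic-classification prompt above for a single text
        prompt = f"""Classify a piece of text into one of several predefined topics, such as sports, politics, or entertainment.
    text: {text}
    class:
    """
        return generation_model.predict(
            prompt=prompt,
            max_output_tokens=256,
            temperature=0.1,
        ).text.strip()


    texts_df["class"] = texts_df["text"].apply(classify_topic)
    texts_df  # one predicted class per row, e.g. sports / politics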

    Text Extraction

    Converting an ingredients list to JSON format

    prompt = """
    Extract the ingredients from the following recipe. Return the ingredients in JSON format with keys: ingredient, quantity, type.
    
    Ingredients:
    * 1 tablespoon olive oil
    * 1 onion, chopped
    * 2 carrots, chopped
    * 2 celery stalks, chopped
    * 1 teaspoon ground cumin
    * 1/2 teaspoon ground coriander
    * 1/4 teaspoon turmeric powder
    * 1/4 teaspoon cayenne pepper (optional)
    * Salt and pepper to taste
    * 1 (15 ounce) can black beans, rinsed and drained
    * 1 (15 ounce) can kidney beans, rinsed and drained
    * 1 (14.5 ounce) can diced tomatoes, undrained
    * 1 (10 ounce) can diced tomatoes with green chilies, undrained
    * 4 cups vegetable broth
    * 1 cup chopped fresh cilantro
    """
    
    print(
        generation_model.predict(
            prompt, temperature=0.2, max_output_tokens=1024, top_k=40, top_p=0.8
        ).text
    )
    
    ################### Result ###################
    ```
    {
      "ingredient": "olive oil",
      "quantity": "1 tablespoon",
      "type": "oil"
    },
    {
      "ingredient": "onion",
      "quantity": "1",
      "type": "vegetable"
    },
    {
      "ingredient": "carrot",
      "quantity": "2",
      "type": "vegetable"
    },
    {
      "ingredient": "diced tomatoes",
      "quantity": "1 (14.5 ounce) can",
      "type": "vegetable"
    },
    ...
    {
      "ingredient": "diced tomatoes with green chilies",
      "quantity": "1 (10 ounce) can",
      "type": "vegetable"
    },
    {
      "ingredient": "vegetable broth",
      "quantity": "4 cups",
      "type": "liquid"
    },
    {
      "ingredient": "chopped fresh cilantro",
      "quantity": "1 cup",
      "type": "herb"
    }
    ```
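
    Note that the raw output above is a comma-separated list of objects inside a Markdown fence rather than a single JSON document. A hedged sketch of post-processing it into Python objects (the fence-stripping step is an assumption about the output shape shown above):

    import json

    raw = generation_model.predict(
        prompt, temperature=0.2, max_output_tokens=1024, top_k=40, top_p=0.8
    ).text

    # Assumption: the response looks like the fenced block above.
    # Remove the surrounding ``` fence, then wrap the objects in [] so they parse as a JSON array.
    body = raw.strip().strip("`").strip()
    ingredients = json.loads(f"[{body}]")
    print(ingredients[0]["ingredient"])  # "olive oil" in the sample output above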

    Organizing the results of a text extraction


    prompt = """
    Message: Rachel Green (Jennifer Aniston), a sheltered but friendly woman, flees her wedding day and wealthy yet unfulfilling life and finds childhood friend Monica Geller (Courteney Cox), a tightly wound but caring chef.
    Rachel becomes a waitress at West Village coffee house Central Perk after she moves into Monica\'s apartment above Central Perk and joins Monica\'s group of single friends in their mid-20s:
    previous roommate Phoebe Buffay (Lisa Kudrow), an odd masseuse and musician; neighbor Joey Tribbiani (Matt LeBlanc), a dim-witted yet loyal struggling actor; Joey\'s roommate Chandler Bing (Matthew Perry),
    a sarcastic, self-deprecating data processor; and Monica\'s older brother and Chandler\'s college roommate Ross Geller (David Schwimmer), a sweet-natured but insecure paleontologist.
    
    Extract the characters and the actors who played them from above message:
    Rachel Green - Jennifer Aniston, Monica Geller - Courteney Cox, Phoebe Buffay - Lisa Kudrow, Joey Tribbiani - Matt LeBlanc, Chandler Bing - Matthew Perry, Ross Geller - David Schwimmer
    
    Message: Games such as chess, poker, Go, and many video games have always been fertile ground for AI research. Diplomacy is a seven-player game of negotiation and alliance formation, played on an old map of Europe partitioned
    into provinces, where each player controls multiple units (rules of Diplomacy). In the standard version of the game, called Press Diplomacy, each turn includes a negotiation phase, after which all players reveal their
    chosen moves simultaneously. The heart of Diplomacy is the negotiation phase, where players try to agree on their next moves. For example, one unit may support another unit, allowing it to overcome resistance by other units,
    as illustrated here: Computational approaches to Diplomacy have been researched since the 1980s, many of which were explored on a simpler version of the game called No-Press Diplomacy, where strategic communication between
    players is not allowed. Researchers have also proposed computer-friendly negotiation protocols, sometimes called “Restricted-Press”.
    
    Extract the definition of Diplomacy:
    A seven-player game of negotiation and alliance formation
    
    
    Message: Back in 2016, when we weren\'t using simulation and were using a small lab-configuration of industrial robots to learn how to grasp small objects like toys, keys and everyday household items, it took the equivalent of
    four months for one robot to learn how to perform a simple grasp with a 75% success rate. Today, a single robot learns how to perform a complex task such as opening doors with a 90% success rate with less than a day
    of real-world learning. Even more excitingly, we\'ve shown that we can build on the algorithms and learnings from door opening and apply them to a new task: straightening up chairs in our cafes. This progress gives us
    hope that our moonshot for building general purpose learning robots might just be possible.
    
    Extract the success rates of the robots in 2016 and today, respectively:
    75%, 90%
    
    Message: CapitalG was founded a decade ago to empower entrepreneurs with Alphabet and Google\'s unparalleled expertise in growth.
    We are privileged to share the lessons learned from helping to scale Google, Stripe, Airbnb, CrowdStrike, Databricks, and Zscaler with the next wave of generational tech companies-perhaps including yours.
    Alphabet is our sole LP and provides patient, long-term capital. As an independent growth fund, our priorities align with our entrepreneurs. CapitalG companies have achieved product-market fit and are ready to scale. We maintain a small, concentrated portfolio so every company receives substantial capital and hands-on support.
    
    Extract the companies funded by CapitalG:
    """
    
    print(
        generation_model.predict(
            prompt, temperature=0.2, max_output_tokens=256, top_k=1, top_p=0.8
        ).text
    )
    
    # Google, Stripe, Airbnb, CrowdStrike, Databricks, and Zscaler

    Text Summarization

    Transcript summarization

    prompt = """
    Provide a very short summary, no more than three sentences, for the following article:
    
    Our quantum computers work by manipulating qubits in an orchestrated fashion that we call quantum algorithms.
    The challenge is that qubits are so sensitive that even stray light can cause calculation errors — and the problem worsens as quantum computers grow.
    This has significant consequences, since the best quantum algorithms that we know for running useful applications require the error rates of our qubits to be far lower than we have today.
    To bridge this gap, we will need quantum error correction.
    Quantum error correction protects information by encoding it across multiple physical qubits to form a “logical qubit,” and is believed to be the only way to produce a large-scale quantum computer with error rates low enough for useful calculations.
    Instead of computing on the individual qubits themselves, we will then compute on logical qubits. By encoding larger numbers of physical qubits on our quantum processor into one logical qubit, we hope to reduce the error rates to enable useful quantum algorithms.
    
    Summary:
    
    """
    
    print(
        generation_model.predict(
            prompt, temperature=0.2, max_output_tokens=1024, top_k=40, top_p=0.8
        ).text
    )
    
    ################### Result ###################
    Quantum computers are very sensitive and prone to errors.
    Quantum error correction protects information by encoding it across multiple physical qubits to form a “logical qubit”.
    By encoding larger numbers of physical qubits on our quantum processor into one logical qubit, we hope to reduce the error rates to enable useful quantum algorithms.

    Summarize text into bullet points

    prompt = """
    Provide a very short summary in four bullet points for the following article:
    
    Our quantum computers work by manipulating qubits in an orchestrated fashion that we call quantum algorithms.
    The challenge is that qubits are so sensitive that even stray light can cause calculation errors — and the problem worsens as quantum computers grow.
    This has significant consequences, since the best quantum algorithms that we know for running useful applications require the error rates of our qubits to be far lower than we have today.
    To bridge this gap, we will need quantum error correction.
    Quantum error correction protects information by encoding it across multiple physical qubits to form a “logical qubit,” and is believed to be the only way to produce a large-scale quantum computer with error rates low enough for useful calculations.
    Instead of computing on the individual qubits themselves, we will then compute on logical qubits. By encoding larger numbers of physical qubits on our quantum processor into one logical qubit, we hope to reduce the error rates to enable useful quantum algorithms.
    
    Bulletpoints:
    
    """
    
    print(
        generation_model.predict(
            prompt, temperature=0.2, max_output_tokens=256, top_k=1, top_p=0.8
        ).text
    )
    
    ################### Result ###################
    - Quantum computers work by manipulating qubits in an orchestrated fashion that we call quantum algorithms.
    - Qubits are so sensitive that even stray light can cause calculation errors.
    - The best quantum algorithms require the error rates of our qubits to be far lower than we have today.
    - Quantum error correction protects information by encoding it across multiple physical qubits to form a “logical qubit”.

    Title & heading generation

    prompt = """
    Write a title for this text, give me five options:
    Whether helping physicians identify disease or finding photos of “hugs,” AI is behind a lot of the work we do at Google. And at our Arts & Culture Lab in Paris, we’ve been experimenting with how AI can be used for the benefit of culture.
    Today, we’re sharing our latest experiments—prototypes that build on seven years of work in partnership with 1,500 cultural institutions around the world.
    Each of these experimental applications runs AI algorithms in the background to let you unearth cultural connections hidden in archives—and even find artworks that match your home decor.
    """
    
    print(
        generation_model.predict(
            prompt, temperature=0.8, max_output_tokens=256, top_k=1, top_p=0.8
        ).text
    )
    
    ################### Result ###################
    1. How AI is used for the benefit of culture
    2. Google Arts & Culture Lab experiments with AI
    3. AI in the Arts & Culture Lab
    4. AI for culture
    5. AI in the Arts
