Review Prediction Project

2023. 6. 23. 10:28 · Python

VS Code basics

ํŒŒ์ด์ฌ_ํ”„๋กœ์ ํŠธ ํด๋” ํ•˜๋‚˜ ์ƒ์„ฑํ•˜์—ฌ vscode์—์„œ ์—ด๊ธฐ

ํŒŒ์ผ ์ƒ์„ฑ test.py

 

Useful extensions to install

Snippet

Trailing Spaces

Prettier - Code formatter

Tabnine AI Autocomplete for Javascript, Python, Typescript

Todo Tree

 

Useful settings

Format on save: automatically fixes things like spacing when the file is saved

Window reload:

Ctrl + P

When the search bar appears,

type '>' followed by a command name to access VS Code commands and settings

Developer: Reload Window — use when VS Code freezes or stops running properly


ํ„ฐ๋ฏธ๋„ ์ผœ๊ธฐ

์ œ๋ชฉ์ค„์˜ Terminal > new Terminal ํด๋ฆญ

+ ์•„์ด์ฝ˜ ๋ˆŒ๋Ÿฌ์„œ git bash ํด๋ฆญ

๊ฐ€์ƒ ํ™˜๊ฒฝ ์„ค์ • 

ํ„ฐ๋ฏธ๋„์— ์•„๋ž˜ ๋ช…๋ น์–ด ์น˜๊ธฐ

python -m venv venv(๊ฐ€์ƒ ํ™˜๊ฒฝ ์ด๋ฆ„)

 

ํด๋” ๊ฒฝ๋กœ๋กœ ์ด๋™ํ•˜๋ฉด ๊ฐ€์ƒ ํ™˜๊ฒฝ ํด๋” ์ƒ์„ฑ๋œ ๊ฒƒ ํ™•์ธ ๊ฐ€๋Šฅ!

 

๊ฐ€์ƒ ํ™˜๊ฒฝ ์ง„์ž… ๋ช…๋ น์–ด

source ./venv/Scripts/activate

๊ฐ€์ƒ ํ™˜๊ฒฝ์— ์ง„์ž… ์„ฑ๊ณตํ•˜๋ฉด ์•„๋ž˜ ์ด๋ฏธ์ง€์ฒ˜๋Ÿผ ๊ฐ€์ƒํ™˜๊ฒฝ์ด๋ฆ„์ด ๋œธ!

 

*Note: command to leave the virtual environment

deactivate
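As a quick sanity check, this small sketch (standard library only; `in_venv` is just an illustrative helper name) reports whether the interpreter you are running is the one inside a venv:

```python
import sys

# Inside an activated venv, sys.prefix points at the venv folder,
# while sys.base_prefix still points at the base interpreter.
def in_venv() -> bool:
    return sys.prefix != sys.base_prefix

print(in_venv())
```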

 

 

ํด๋ž˜์Šค ์ƒ์„ฑ ๋ฐ ๊ฐ์ฒด ์ƒ์„ฑ ๋ฐฉ๋ฒ•

#ํด๋ž˜์Šค ์ƒ์„ฑ
class TestClass:
    var1 =  1

    #๊ฐ์ฒด ์ƒ์„ฑ ->์˜ ์˜๋ฏธ : None - ํ•จ์ˆ˜ ์•„๋ฌด๊ฒƒ๋„ ๋ฐ˜ํ™˜ x,  int - ์ •์ˆ˜ํ˜•์„ ๋ฐ˜ํ™˜
    # __init__ : java์—์„œ์˜ ์ƒ์„ฑ์ž
    def __init__(self) -> None:
        pass #์—†์œผ๋ฉด error

    def func1(self): #self ๋งค๊ฐœ๋ณ€์ˆ˜๋กœ ๋„ฃ์–ด์ฃผ๋ฉด java์˜ this์™€ ๋™์ผ ๊ธฐ๋Šฅ
        self.var1 = 2
        return self.var1
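Instantiating the class above works like this (a small usage sketch, repeating the class so it runs standalone):

```python
class TestClass:
    var1 = 1

    def __init__(self) -> None:
        pass

    def func1(self):
        self.var1 = 2
        return self.var1

obj = TestClass()       # calls __init__
print(obj.var1)         # 1  (class attribute, before func1 runs)
print(obj.func1())      # 2  (func1 sets the instance attribute and returns it)
```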


Review Prediction Project

์ƒˆ ํŒŒ์ผ ์ƒ์„ฑ

review_project.py

colab์—์„œ ๋ฆฌ๋ทฐ ์˜ˆ์ธกํ•œ ์ฝ”๋“œ ์ฐจ๋ก€๋Œ€๋กœ ๊ฐ€์ ธ์˜ค๊ธฐ

*Note

imports and installs must be repeated for every project (each virtual environment keeps its own packages)

 

ํ„ฐ๋ฏธ๋„์—์„œ ๋ช…๋ น์–ด ์‹คํ–‰  

pip install konlpy 

 

Copy the import block and paste it at the top of review_predict.py:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import re
import json
import urllib.request
from konlpy.tag import Okt
from tensorflow.keras.preprocessing.text import Tokenizer, tokenizer_from_json
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import load_model

print(1)

class ReviewPredict():
    review_model = None  # placeholder for the loaded model
    def __init__(self) -> None:
        pass

ํ„ฐ๋ฏธ๋„์—์„œ ๋ช…๋ น์–ด ์‹คํ–‰ํ•˜์—ฌ ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ ์„ค์น˜

pip install pandas numpy matplotlib

pip install tensorflow

 

ํ„ฐ๋ฏธ๋„์—์„œ ๋ช…๋ น์–ด ์‹คํ–‰ํ•˜์—ฌ 1 ์ถœ๋ ฅ๋˜๋Š”์ง€ ํ™•์ธ

python review_predict.py

 

If an error like the one below appears:

open the URL near the bottom of the error message,

download the package for your architecture, then run the command again

 

*Tip

To rename a variable everywhere in VS Code, press F2 on the name and type the new one; every occurrence changes at once.

 

๋ฐ์ดํ„ฐ ์ „์ฒ˜๋ฆฌ ์ž‘์—…

#๋ฐ์ดํ„ฐ ์ „์ฒ˜๋ฆฌ
class ReviewPredict():
    data = pd.DataFrame
    loaded_model = load_model("best_model.h5")
    #์ƒ์„ฑ์ž
    def __init__(self, data:pd.DataFrame, model_name) -> None:
        self.data = data
        # self.loaded_model = load_model(model_name)

    def  process_data(self): #type: ignore
        tmp_data = self.data.dropna(how = 'any') #null ๊ฐ’์ด ์กด์žฌํ•˜๋Š” ํ–‰ ์ œ๊ฑฐ
        # ํ•œ๊ธ€๊ณผ ๊ณต๋ฐฑ์„ ์ œ์™ธํ•˜๊ณ  ๋ชจ๋‘ ์ œ๊ฑฐ, \s : ๊ณต๋ฐฑ(๋นˆ ์นธ)์˜๋ฏธ
        tmp_data['document'] = tmp_data['document'].str.replace("[^ใ„ฑ-ใ…Žใ…-ใ…ฃ๊ฐ€-ํžฃ\s]", "")
        #๊ณต๋ฐฑ์œผ๋กœ ์‹œ์ž‘ํ•˜๋Š” ๋ฐ์ดํ„ฐ๋ฅผ ๋นˆ ๊ฐ’์œผ๋กœ
        tmp_data['document'] = tmp_data['document'].str.replace('^ +', "")
        #๋นˆ ๊ฐ’์„ null ๊ฐ’์œผ๋กœ
        tmp_data['document'].replace('', np.nan, inplace=True)
        return tmp_data
    
    def sentiment_predict(self, new_sentence):
        
        okt = Okt()
        tokenizer = Tokenizer()
        stopwords = ['์˜','๊ฐ€','์ด','์€','๋“ค',
             '๋Š”','์ข€','์ž˜','๊ฑ','๊ณผ','๋„',
             '๋ฅผ','์œผ๋กœ','์ž','์—','์™€','ํ•œ','ํ•˜๋‹ค']
        max_len = 30
        new_sentence = re.sub(r'[^ใ„ฑ-ใ…Žใ…-ใ…ฃ๊ฐ€-ํžฃ ]', '', new_sentence)
        new_sentence = okt.morphs(new_sentence, stem=True) #ํ† ํฐํ™”
        new_sentence = [word for word in new_sentence if not word in stopwords] #๋ถˆ์šฉ์–ด ์ œ๊ฑฐ
        encoded = tokenizer.texts_to_sequences([new_sentence]) #์ •์ˆ˜ ์ธ์ฝ”๋”ฉ
        pad_new = pad_sequences(encoded, maxlen = max_len) #ํŒจ๋”ฉ
        score = float(self.loaded_model.predict(pad_new)) #์˜ˆ์ธก
        if(score > 0.5):
            print("{:.2f}% ํ™•๋ฅ ๋กœ ๊ธ์ • ๋ฆฌ๋ทฐ์ž…๋‹ˆ๋‹ค.\n".format(score * 100))
        else :
            print("{:.2f}% ํ™•๋ฅ ๋กœ ๋ถ€์ • ๋ฆฌ๋ทฐ์ž…๋‹ˆ๋‹ค.\n".format((1-score) *100 ))

๋ชจ๋ธ ๋ถˆ๋Ÿฌ์˜ค๊ธฐ

best_model.h5 ํŒŒ์ผ ํ”„๋กœ์ ํŠธ์™€ ๊ฐ™์€ ๊ฒฝ๋กœ์— ์ถ”๊ฐ€

 

Create an app.py file:

from flask import Flask
app = Flask(__name__)

@app.route('/test1')
def test():
    return "Hello"

if __name__ == '__main__':
    app.run(debug=True, port=5000)
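The route above can be checked without even starting the server, using Flask's built-in test client (same app and route as in the snippet above):

```python
from flask import Flask

app = Flask(__name__)

@app.route('/test1')
def test():
    return "Hello"

# The test client issues requests directly against the app object
client = app.test_client()
resp = client.get('/test1')
print(resp.status_code)              # 200
print(resp.get_data(as_text=True))   # Hello
```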

Installing Flask

Command in the terminal:

pip install Flask

 

After installing, run the command

pip freeze > requirements.txt

> a new file is created

> it records the list of packages installed in this environment

When working on another project, install every package listed in requirements.txt with the command below:

pip install -r requirements.txt

One command installs everything without redoing the earlier install steps.

 

 

- Receiving a querystring in Flask via the GET method

from flask import Flask, request
import review_predict as rp
import os
os.environ['JAVA_HOME'] = r'C:\Program Files\Java\jdk-17\bin\server'
app = Flask(__name__)


@app.route("/test")
def test():
    sentence = request.args.get("sentence")
    return sentence

@app.route("/predict")
def predict_review_good_or_bad():
    sentence = request.args.get("sentence")
    if sentence is None:
        return "sentence๋ฅผ ์ž…๋ ฅํ•ด์ฃผ์„ธ์š”."  # guard: ask the caller to supply a sentence
    reviewPredict = rp.ReviewPredict()
    result = reviewPredict.sentiment_predict(sentence)
    return result

if __name__ == '__main__':
    app.run(debug=True, port=5000)
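How request.args behaves can be sketched with the model call replaced by an echo, so the example stays self-contained (route name and parameter match the code above; the echo is a stand-in for sentiment_predict):

```python
from flask import Flask, request

app = Flask(__name__)

@app.route("/predict")
def predict():
    # request.args.get returns None when the query parameter is absent
    sentence = request.args.get("sentence")
    if sentence is None:
        return "sentence๋ฅผ ์ž…๋ ฅํ•ด์ฃผ์„ธ์š”."
    return sentence  # stand-in for sentiment_predict(sentence)

client = app.test_client()
print(client.get("/predict").get_data(as_text=True))                 # sentence๋ฅผ ์ž…๋ ฅํ•ด์ฃผ์„ธ์š”.
print(client.get("/predict?sentence=hello").get_data(as_text=True))  # hello
```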

 

If a JVM error occurs

Copy the path below into the environment variable settings:

open Environment Variables from System Properties,

under System variables, create a new variable with the path, then restart VS Code

If it still fails after that, add this code:

import os
os.environ['JAVA_HOME'] = r'C:\Program Files\Java\jdk-17\bin\server'
print('JAVA_HOME' in os.environ)

If this prints True, the variable is set correctly!

 

Add the tokenizer.json file to the project in VS Code

Install a JSON extension

After installing, search for json-zain in the settings and check the corresponding item

Modified code:

#๋ฐ์ดํ„ฐ ์ „์ฒ˜๋ฆฌ
class ReviewPredict():
    data:pd.DataFrame
    loaded_model = load_model("best_model.h5")
    tokenizer = None

    #์ƒ์„ฑ์ž
    def __init__(self) -> None:
        with open('tokenizer.json') as f:
            data = json.load(f)
            self.tokenizer = tokenizer_from_json(data)

        pass
        # self.data = data
        # self.loaded_model = load_model(model_name)
        

    def  process_data(self): #type: ignore
        tmp_data = self.data.dropna(how = 'any') #null ๊ฐ’์ด ์กด์žฌํ•˜๋Š” ํ–‰ ์ œ๊ฑฐ
        # ํ•œ๊ธ€๊ณผ ๊ณต๋ฐฑ์„ ์ œ์™ธํ•˜๊ณ  ๋ชจ๋‘ ์ œ๊ฑฐ, \s : ๊ณต๋ฐฑ(๋นˆ ์นธ)์˜๋ฏธ
        tmp_data['document'] = tmp_data['document'].str.replace("[^ใ„ฑ-ใ…Žใ…-ใ…ฃ๊ฐ€-ํžฃ\s]", "")
        #๊ณต๋ฐฑ์œผ๋กœ ์‹œ์ž‘ํ•˜๋Š” ๋ฐ์ดํ„ฐ๋ฅผ ๋นˆ ๊ฐ’์œผ๋กœ
        tmp_data['document'] = tmp_data['document'].str.replace('^ +', "")
        #๋นˆ ๊ฐ’์„ null ๊ฐ’์œผ๋กœ
        tmp_data['document'].replace('', np.nan, inplace=True)
        return tmp_data
    
    def sentiment_predict(self, new_sentence):
        okt = Okt()
        # tokenizer = Tokenizer()
        stopwords = ['์˜','๊ฐ€','์ด','์€','๋“ค',
             '๋Š”','์ข€','์ž˜','๊ฑ','๊ณผ','๋„',
             '๋ฅผ','์œผ๋กœ','์ž','์—','์™€','ํ•œ','ํ•˜๋‹ค']
        max_len = 30
        new_sentence = re.sub(r'[^ใ„ฑ-ใ…Žใ…-ใ…ฃ๊ฐ€-ํžฃ ]', '', new_sentence)
        new_sentence = okt.morphs(new_sentence, stem=True) #ํ† ํฐํ™”
        new_sentence = [word for word in new_sentence if not word in stopwords] #๋ถˆ์šฉ์–ด ์ œ๊ฑฐ
        encoded = self.tokenizer.texts_to_sequences([new_sentence]) #์ •์ˆ˜ ์ธ์ฝ”๋”ฉ
        pad_new = pad_sequences(encoded, maxlen = max_len) #ํŒจ๋”ฉ
        score = float(self.loaded_model.predict(pad_new)) #์˜ˆ์ธก
        if(score > 0.5):
            return "{:.2f}% ํ™•๋ฅ ๋กœ ๊ธ์ • ๋ฆฌ๋ทฐ์ž…๋‹ˆ๋‹ค.\n".format(score * 100)
        else :
            return "{:.2f}% ํ™•๋ฅ ๋กœ ๋ถ€์ • ๋ฆฌ๋ทฐ์ž…๋‹ˆ๋‹ค.\n".format((1-score) *100 )

Open a Git Bash terminal and run:

python app.py

 

ํ•ด๋‹น ํฌํŠธ๋กœ ์ ‘์†

url+/๊ฒฝ๋กœ?sentence=๊ธ/๋ถ€์ • ํ™•์ธํ•  ๋ฌธ์žฅ์น˜๋ฉด ์•„๋ž˜์ฒ˜๋Ÿผ ๋‚˜์˜ด!