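The Hugging Face Hub already hosts many pretrained text summarization models, but for specific domains you often still need to train or fine-tune your own. In this post we train a bilingual summarization model, where "bilingual" means English and Spanish. You can try out the finished model via the model link.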



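For the bilingual corpus we use the Multilingual Amazon Reviews Corpus to train our summarizer. It contains Amazon product reviews in six languages and is typically used to benchmark multilingual classifiers. Because every review comes with a short title, we can use the titles as the target summaries for our model. First, download the English and Spanish subsets: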

```python
from datasets import load_dataset

spanish_dataset = load_dataset("amazon_reviews_multi", "es")
english_dataset = load_dataset("amazon_reviews_multi", "en")
english_dataset
```

```
DatasetDict({
    train: Dataset({
        features: ['review_id', 'product_id', 'reviewer_id', 'stars', 'review_body', 'review_title', 'language', 'product_category'],
        num_rows: 200000
    })
    validation: Dataset({
        features: ['review_id', 'product_id', 'reviewer_id', 'stars', 'review_body', 'review_title', 'language', 'product_category'],
        num_rows: 5000
    })
    test: Dataset({
        features: ['review_id', 'product_id', 'reviewer_id', 'stars', 'review_body', 'review_title', 'language', 'product_category'],
        num_rows: 5000
    })
})
```


```python
def show_samples(dataset, num_samples=3, seed=42):
    sample = dataset["train"].shuffle(seed=seed).select(range(num_samples))
    for example in sample:
        print(f"\n'>> Title: {example['review_title']}'")
        print(f"'>> Review: {example['review_body']}'")


show_samples(english_dataset)
```

```
'>> Title: Worked in front position, not rear'
'>> Review: 3 stars because these are not rear brakes as stated in the item description. At least the mount adapter only worked on the front fork of the bike that I got it for.'

'>> Title: meh'
'>> Review: Does it’s job and it’s gorgeous but mine is falling apart, I had to basically put it together again with hot glue'

'>> Title: Can't beat these for the money'
'>> Review: Bought this for handling miscellaneous aircraft parts and hanger "stuff" that I needed to organize; it really fit the bill. The unit arrived quickly, was well packaged and arrived intact (always a good sign). There are five wall mounts-- three on the top and two on the bottom. I wanted to mount it on the wall, so all I had to do was to remove the top two layers of plastic drawers, as well as the bottom corner drawers, place it when I wanted and mark it; I then used some of the new plastic screw in wall anchors (the 50 pound variety) and it easily mounted to the wall. Some have remarked that they wanted dividers for the drawers, and that they made those. Good idea. My application was that I needed something that I can see the contents at about eye level, so I wanted the fuller-sized drawers. I also like that these are the new plastic that doesn't get brittle and split like my older plastic drawers did. I like the all-plastic construction. It's heavy duty enough to hold metal parts, but being made of plastic it's not as heavy as a metal frame, so you can easily mount it to the wall and still load it up with heavy stuff, or light stuff. No problem there. For the money, you can't beat it. Best one of these I've bought to date-- and I've been using some version of these for over forty years.'
```

By changing the seed passed to Dataset.shuffle(), you can look at other examples from the corpus. If you speak Spanish, take a look at some of the reviews in spanish_dataset as well, mainly to check whether the titles really read like summaries of the review bodies.

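After some exploratory analysis we can see that these are typical online reviews: some positive, some negative, and some neutral. The second example above is a bit of a cheat, since the title "meh" hardly captures the gist of the review body. The other titles look reasonable, though (noisy labels can sometimes be corrected, but that is costly). Training on all 400,000 reviews (200,000 per language) would take a very long time on a single GPU (perhaps less so on an 80 GB A100?), so to save time we focus on a single product domain. To pick the domain, we convert english_dataset to a pandas.DataFrame and count the number of reviews per product category: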

```python
english_dataset.set_format("pandas")
english_df = english_dataset["train"][:]
# Show counts for top 20 products
english_df["product_category"].value_counts()[:20]
```

```
home                      17679
apparel                   15951
wireless                  15717
other                     13418
beauty                    12091
drugstore                 11730
kitchen                   10382
toy                        8745
sports                     8277
automotive                 7506
lawn_and_garden            7327
home_improvement           7136
pet_products               7082
digital_ebook_purchase     6749
pc                         6401
electronics                6186
office_product             5521
shoes                      5197
grocery                    4730
book                       3756
Name: product_category, dtype: int64
```


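Staying with the Amazon theme, we keep only book reviews, which covers both the book and the digital_ebook_purchase categories:

```python
def filter_books(example):
    return (
        example["product_category"] == "book"
        or example["product_category"] == "digital_ebook_purchase"
    )
```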




```python
# Switch english_dataset back from the "pandas" format set above before filtering
english_dataset.reset_format()

spanish_books = spanish_dataset.filter(filter_books)
english_books = english_dataset.filter(filter_books)
show_samples(english_books)
```

```
'>> Title: I'm dissapointed.'
'>> Review: I guess I had higher expectations for this book from the reviews. I really thought I'd at least like it. The plot idea was great. I loved Ash but, it just didnt go anywhere. Most of the book was about their radio show and talking to callers. I wanted the author to dig deeper so we could really get to know the characters. All we know about Grace is that she is attractive looking, Latino and is kind of a brat. I'm dissapointed.'

'>> Title: Good art, good price, poor design'
'>> Review: I had gotten the DC Vintage calendar the past two years, but it was on backorder forever this year and I saw they had shrunk the dimensions for no good reason. This one has good art choices but the design has the fold going through the picture, so it's less aesthetically pleasing, especially if you want to keep a picture to hang. For the price, a good calendar'

'>> Title: Helpful'
'>> Review: Nearly all the tips useful and. I consider myself an intermediate to advanced user of OneNote. I would highly recommend.'
```


Next, combine the English and Spanish book reviews into a single DatasetDict and shuffle each split so the two languages are mixed:

```python
from datasets import concatenate_datasets, DatasetDict

books_dataset = DatasetDict()

for split in english_books.keys():
    books_dataset[split] = concatenate_datasets(
        [english_books[split], spanish_books[split]]
    )
    books_dataset[split] = books_dataset[split].shuffle(seed=42)

# Peek at a few examples
show_samples(books_dataset)
```

```
'>> Title: Easy to follow!!!!'
'>> Review: I loved The dash diet weight loss Solution. Never hungry. I would recommend this diet. Also the menus are well rounded. Try it. Has lots of the information need thanks.'

'>> Title: PARCIALMENTE DAÑADO'
'>> Review: Me llegó el día que tocaba, junto a otros libros que pedí, pero la caja llegó en mal estado lo cual dañó las esquinas de los libros porque venían sin protección (forro).'

'>> Title: no lo he podido descargar'
'>> Review: igual que el anterior'
```



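Very short titles, such as "meh", make poor summaries, so we keep only the examples whose titles have more than two words:

```python
books_dataset = books_dataset.filter(lambda x: len(x["review_title"].split()) > 2)
```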
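As a quick sanity check, here is the same word-count predicate applied in plain Python to a few titles from the samples above. This is only an illustration. keep_review is a hypothetical helper name, and Dataset.filter() already applies the lambda above to every example for you.

```python
# Same rule as the lambda above: keep a review only if its title has more than two words
def keep_review(title: str) -> bool:
    return len(title.split()) > 2


titles = ["meh", "Helpful", "Good art, good price, poor design", "I'm dissapointed."]
kept = [t for t in titles if keep_review(t)]
print(kept)  # ['Good art, good price, poor design']
```

Note that a two-word title such as "I'm dissapointed." is also dropped, because the threshold is strictly greater than two.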




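| Transformer model | Description | Multilingual? |
| --- | --- | --- |
| GPT-2 | Although GPT-2 is an autoregressive language model, it can also generate summaries if you append "TL;DR" to the end of the input text (a prompt, or an incantation if you like). | ❌ |
| PEGASUS | Masks out whole sentences in multi-sentence texts and trains the model to predict the masked sentences. This pretraining objective is much closer to summarization than plain language modeling, and the model scores highly on popular summarization benchmarks. | ❌ |
| T5 | Casts every NLP task into a unified text-to-text framework. For summarization, for example, the input text is prefixed with summarize, as in summarize: ARTICLE. | ❌ |
| mT5 | A multilingual version of T5, pretrained on the multilingual Common Crawl corpus (mC4), which covers 101 languages. | ✅ |
| BART | A vanilla Transformer architecture with both an encoder and a decoder. Its pretraining combines the objectives of BERT and GPT-2, using both masked prediction and autoregressive prediction. | ❌ |
| mBART-50 | A multilingual version of BART, pretrained on 50 languages. | ✅ |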
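To make the prompt conventions in the table concrete, here is a minimal sketch of how an input would be formatted for GPT-2 (appending "TL;DR:") and for T5 (prefixing "summarize: "). The helper names are hypothetical. The strings follow the conventions described for these models, and the fine-tuning below does not rely on them, since our mT5 preprocessing adds no prefix at all.

```python
# Hypothetical helpers illustrating the prompt formats mentioned in the table
def gpt2_tldr_prompt(text: str) -> str:
    # GPT-2 style: put "TL;DR:" after the text and let the model continue from there
    return f"{text}\nTL;DR:"


def t5_summarize_prompt(text: str) -> str:
    # T5 style: prefix the task name so one text-to-text model can handle many tasks
    return f"summarize: {text}"


review = "I loved reading the Hunger Games!"
print(gpt2_tldr_prompt(review))
print(t5_summarize_prompt(review))
```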






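Since we need a multilingual model, we use the mT5 small checkpoint and start by loading its tokenizer:

```python
from transformers import AutoTokenizer

model_checkpoint = "google/mt5-small"
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)
```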



Let's test the tokenizer on a small example:

```python
inputs = tokenizer("I loved reading the Hunger Games!")
inputs
```

```
{'input_ids': [336, 259, 28387, 11807, 287, 62893, 295, 12507, 1], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1]}
```

The result contains input_ids and attention_mask, which are the token IDs and the attention mask respectively. We can use the tokenizer's convert_ids_to_tokens() function to turn the IDs back into tokens:

```python
tokenizer.convert_ids_to_tokens(inputs.input_ids)
```

```
['▁I', '▁', 'loved', '▁reading', '▁the', '▁Hung', 'er', '▁Games', '</s>']
```
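In SentencePiece, the tokenizer used by mT5, the ▁ character marks the start of a new word, and </s> is the end-of-sequence token. The simplified sketch below turns the token list above back into text: it drops special tokens and replaces ▁ with spaces. It only illustrates the idea. In practice, use tokenizer.decode() or tokenizer.convert_tokens_to_string(), which handle the details.

```python
tokens = ['▁I', '▁', 'loved', '▁reading', '▁the', '▁Hung', 'er', '▁Games', '</s>']


def sentencepiece_detokenize(pieces, special_tokens=("</s>", "<pad>", "<unk>")):
    # Drop special tokens, glue the pieces together, then turn "▁" word markers into spaces
    text = "".join(p for p in pieces if p not in special_tokens)
    return text.replace("▁", " ").strip()


print(sentencepiece_detokenize(tokens))  # I loved reading the Hunger Games
```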



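Next, we tokenize the review bodies as model inputs and the titles as labels. Inputs are truncated to max_input_length tokens and titles to max_target_length tokens:

```python
max_input_length = 512
max_target_length = 30


def preprocess_function(examples):
    model_inputs = tokenizer(
        examples["review_body"], max_length=max_input_length, truncation=True
    )
    # Tokenize the titles as targets (older transformers versions did this
    # inside a `with tokenizer.as_target_tokenizer():` block instead)
    labels = tokenizer(
        text_target=examples["review_title"],
        max_length=max_target_length,
        truncation=True,
    )
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs
```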



```python
tokenized_datasets = books_dataset.map(preprocess_function, batched=True)
```

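Depending on the size of the dataset, this step can take a while. Note that we passed batched=True to Dataset.map(). With this setting, examples are processed in batches of 1,000 (the default batch size), which lets the fast tokenizer use its multithreading. Fast tokenizers are implemented in Rust and can process a batch in parallel, whereas a pure-Python tokenizer would be single-threaded. Whenever your hardware allows, use batched processing during preprocessing. You can also pass num_proc to use multiple processes for a further speedup.

Next, let's look at the metrics commonly used for summarization. Judging the quality of machine-generated text is genuinely hard. RLHF tackles this by training a reward model to judge generated text: as the saying goes, only magic can beat magic.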


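Compared with most of the other tasks in this tutorial, measuring the performance of text generation tasks such as summarization or translation is not straightforward. For example, given a review like "I loved reading the Hunger Games", there are several valid summaries, such as "I loved the Hunger Games" or "Hunger Games is a great read". Exact string matching between the generated summary and the label is therefore not a good way to evaluate summaries. Even human evaluation is difficult, because everyone has their own writing style (a thousand readers, a thousand Hamlets).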

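For summarization, one of the most commonly used metrics is the ROUGE score (Recall-Oriented Understudy for Gisting Evaluation). Like BLEU, it compares overlapping n-grams between the generated summary and the reference. Unlike BLEU, it is recall-oriented, and its ROUGE-L variant is based on the longest common subsequence, so the matched words do not need to be contiguous. To try out the metric, consider the following example: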

```python
generated_summary = "I absolutely loved reading the Hunger Games"
reference_summary = "I loved reading the Hunger Games"
```




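ROUGE recall measures how much of the reference summary is captured by the generated one. When counting words, it is defined as:

$$
\text{Recall} = \frac{\text{number of overlapping words}}{\text{total number of words in the reference summary}}
$$

For the example above, this formula gives a recall of 6/6 = 100%, because every word of the reference summary appears in the generated summary. That looks great. However, suppose the generated summary were "I really really loved reading the Hunger Games all night". It would also get 100% recall, yet it is clearly a worse summary because it is padded with extra words. To handle such cases, ROUGE also computes precision, which measures how much of the generated summary is relevant to the reference:

$$
\text{Precision} = \frac{\text{number of overlapping words}}{\text{total number of words in the generated summary}}
$$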
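The two formulas are easy to check by hand. The sketch below counts overlapping words with clipped counts, meaning a word matches at most as many times as it appears in both texts. It splits on whitespace after lowercasing, which is a simplification of what the rouge_score package does: the package also strips punctuation and can apply stemming.

```python
from collections import Counter


def unigram_overlap(generated: str, reference: str) -> int:
    # Clipped overlap: Counter intersection keeps the minimum count of each shared word
    gen_counts = Counter(generated.lower().split())
    ref_counts = Counter(reference.lower().split())
    return sum((gen_counts & ref_counts).values())


def rouge1_recall(generated: str, reference: str) -> float:
    return unigram_overlap(generated, reference) / len(reference.split())


def rouge1_precision(generated: str, reference: str) -> float:
    return unigram_overlap(generated, reference) / len(generated.split())


reference = "I loved reading the Hunger Games"
for generated in [
    "I absolutely loved reading the Hunger Games",
    "I really really loved reading the Hunger Games all night",
]:
    print(
        f"{generated!r}: recall={rouge1_recall(generated, reference):.2f}, "
        f"precision={rouge1_precision(generated, reference):.2f}"
    )
```

Both generated summaries reach a recall of 1.00, but the precision drops from 0.86 (6/7) to 0.60 (6/10) for the padded one.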

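Applying this to the padded summary gives a precision of 6/10 = 60%, much lower than the 6/7 ≈ 86% of our first example (a clear difference!). In practice, both precision and recall are computed and then combined into an F1 score, their harmonic mean. The rouge_score Python package makes this easy to compute. If you are in mainland China, you may want to point pip at a local mirror with `-i` and add `--default-timeout=2000`.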
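Extending the same idea from single words to n-grams gives ROUGE-N. The sketch below is a simplified re-implementation, not the rouge_score code. It computes precision, recall and their harmonic mean (F1) for ROUGE-1 and ROUGE-2 on our example pair.

```python
from collections import Counter


def ngrams(text: str, n: int) -> Counter:
    words = text.lower().split()
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))


def rouge_n(generated: str, reference: str, n: int):
    gen, ref = ngrams(generated, n), ngrams(reference, n)
    overlap = sum((gen & ref).values())  # clipped n-gram matches
    if overlap == 0:
        return 0.0, 0.0, 0.0
    precision = overlap / sum(gen.values())
    recall = overlap / sum(ref.values())
    # F1 is the harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


generated = "I absolutely loved reading the Hunger Games"
reference = "I loved reading the Hunger Games"
for n in (1, 2):
    p, r, f = rouge_n(generated, reference, n)
    print(f"ROUGE-{n}: precision={p:.2f}, recall={r:.2f}, f1={f:.2f}")
# ROUGE-1: precision=0.86, recall=1.00, f1=0.92
# ROUGE-2: precision=0.67, recall=0.80, f1=0.73
```

ROUGE-2 is lower because inserting "absolutely" breaks the bigram "I loved". Only 4 of the 5 reference bigrams are matched, out of 6 generated bigrams.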

```
!pip install evaluate rouge_score
```


```python
import evaluate

rouge_score = evaluate.load("rouge")
```


```python
scores = rouge_score.compute(
    predictions=[generated_summary], references=[reference_summary]
)
scores
```

With the evaluate library, compute() returns one aggregated F-measure per ROUGE variant, here approximately {'rouge1': 0.92, 'rouge2': 0.73, 'rougeL': 0.92, 'rougeLsum': 0.92}. The ROUGE-1 value of about 0.92 is exactly the F1 of precision 0.86 and recall 1.0 from the hand calculation above. The older datasets.load_metric("rouge") API instead returned AggregateScore objects with low/mid/high precision, recall and fmeasure, accessed as scores["rouge1"].mid. With evaluate, scores["rouge1"] is already a plain number, and that is how it is used in the rest of this post.
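ROUGE-L replaces n-gram overlap with the longest common subsequence (LCS) of the two word sequences, which does not require the matched words to be contiguous. Here is a minimal dynamic-programming sketch, a simplified re-implementation rather than the library code. For our example pair, the LCS is the six words of the reference, which is why rougeL equals rouge1 here.

```python
def lcs_length(a, b):
    # Classic DP table: dp[i][j] = LCS length of a[:i] and b[:j]
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            if x == y:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]


def rouge_l(generated: str, reference: str):
    gen, ref = generated.lower().split(), reference.lower().split()
    lcs = lcs_length(gen, ref)
    if lcs == 0:
        return 0.0, 0.0, 0.0
    precision, recall = lcs / len(gen), lcs / len(ref)
    return precision, recall, 2 * precision * recall / (precision + recall)


p, r, f = rouge_l("I absolutely loved reading the Hunger Games", "I loved reading the Hunger Games")
print(f"ROUGE-L: precision={p:.2f}, recall={r:.2f}, f1={f:.2f}")
# ROUGE-L: precision=0.86, recall=1.00, f1=0.92
```

The subsequence need not be contiguous. For example, "i really loved the games" and "i loved the hunger games" share the LCS "i loved the games", which has length 4.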





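A common baseline for summarization is simply to take the first three sentences of the text, often called the lead-3 baseline. We use NLTK to split the text into sentences:

```
!pip install nltk
```

```python
import nltk

nltk.download("punkt")
# Newer NLTK releases may additionally need: nltk.download("punkt_tab")
```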


```python
from nltk.tokenize import sent_tokenize


def three_sentence_summary(text):
    return "\n".join(sent_tokenize(text)[:3])


print(three_sentence_summary(books_dataset["train"][1]["review_body"]))
```

```
'I grew up reading Koontz, and years ago, I stopped,convinced i had "outgrown" him.'
'Still,when a friend was looking for something suspenseful too read, I suggested Koontz.'
'She found Strangers.'
```


```python
def evaluate_baseline(dataset, metric):
    summaries = [three_sentence_summary(text) for text in dataset["review_body"]]
    return metric.compute(predictions=summaries, references=dataset["review_title"])
```


```python
score = evaluate_baseline(books_dataset["validation"], rouge_score)
rouge_names = ["rouge1", "rouge2", "rougeL", "rougeLsum"]
# evaluate returns one aggregated F-measure per ROUGE variant
rouge_dict = dict((rn, round(score[rn] * 100, 2)) for rn in rouge_names)
rouge_dict
```

```
{'rouge1': 16.74, 'rouge2': 8.83, 'rougeL': 15.6, 'rougeLsum': 15.96}
```




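```python
from transformers import AutoModelForSeq2SeqLM

model = AutoModelForSeq2SeqLM.from_pretrained(model_checkpoint)
```

When we loaded a BERT checkpoint earlier, we got a warning because part of the checkpoint's weights were not used and the task head had to be newly initialized. No such warning appears for mT5. For sequence-to-sequence tasks we keep all the weights of the network, including the head, so nothing needs to be randomly initialized. The difference lies in the model head.

To push the model to the Hub, log in first. In a notebook:

```python
from huggingface_hub import notebook_login

notebook_login()
```

In a terminal, use:

```
huggingface-cli login
```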


Next, define the training arguments:

```python
from transformers import Seq2SeqTrainingArguments

batch_size = 8
num_train_epochs = 8
# Show the training loss with every epoch
logging_steps = len(tokenized_datasets["train"]) // batch_size
model_name = model_checkpoint.split("/")[-1]

args = Seq2SeqTrainingArguments(
    output_dir=f"{model_name}-finetuned-amazon-en-es",
    eval_strategy="epoch",  # named evaluation_strategy in older transformers versions
    learning_rate=5.6e-5,
    per_device_train_batch_size=batch_size,
    per_device_eval_batch_size=batch_size,
    weight_decay=0.01,
    save_total_limit=3,
    num_train_epochs=num_train_epochs,
    predict_with_generate=True,
    logging_steps=logging_steps,
    push_to_hub=True,
)
```


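Setting push_to_hub=True lets us upload the model to the Hub after training, and this happens automatically in the background (very convenient). By default, the repository is named after output_dir. You can choose a different name with the hub_model_id argument, which can also include an organization or user name. For example, to push the model to the huggingface-course organization, we set hub_model_id="huggingface-course/mt5-finetuned-amazon-en-es" in Seq2SeqTrainingArguments.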


To report ROUGE during training, we define a compute_metrics() function that decodes the generated summaries and the labels and scores them:

```python
import numpy as np


def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    # Predictions can also contain -100 where batches of different lengths were concatenated
    predictions = np.where(predictions != -100, predictions, tokenizer.pad_token_id)
    # Decode generated summaries into text
    decoded_preds = tokenizer.batch_decode(predictions, skip_special_tokens=True)
    # Replace -100 in the labels as we can't decode them
    labels = np.where(labels != -100, labels, tokenizer.pad_token_id)
    # Decode reference summaries into text
    decoded_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)
    # ROUGE expects a newline after each sentence
    decoded_preds = ["\n".join(sent_tokenize(pred.strip())) for pred in decoded_preds]
    decoded_labels = ["\n".join(sent_tokenize(label.strip())) for label in decoded_labels]
    # Compute ROUGE scores (evaluate returns aggregated F-measures)
    result = rouge_score.compute(
        predictions=decoded_preds, references=decoded_labels, use_stemmer=True
    )
    result = {key: value * 100 for key, value in result.items()}
    return {k: round(v, 4) for k, v in result.items()}
```

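Next, we need a data collator for our sequence-to-sequence task. Because mT5 is an encoder-decoder Transformer, preparing a batch involves an extra step: the labels are shifted one position to the right to form the decoder inputs. This way, the decoder only sees the previous ground-truth tokens and never the current or future ones, which mirrors how causal (GPT-style) language models are trained. BART uses the same scheme.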

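Conveniently, 🤗 Transformers provides DataCollatorForSeq2Seq, which dynamically pads both the inputs and the labels. To instantiate it, pass the tokenizer and the model. The model is used to build the decoder inputs from the labels.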

```python
from transformers import DataCollatorForSeq2Seq

data_collator = DataCollatorForSeq2Seq(tokenizer, model=model)
```

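Let's see what this collator produces on a couple of examples. First we need to remove the columns that contain strings, because the collator doesn't know how to pad them and would raise an error: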

```python
tokenized_datasets = tokenized_datasets.remove_columns(
    books_dataset["train"].column_names
)
```

The collator expects a list of dicts, where each dict represents a single example from the dataset, so we put the data into that form before passing it to the collator:

```python
features = [tokenized_datasets["train"][i] for i in range(2)]
data_collator(features)
```

```
{'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
                           [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]]),
 'input_ids': tensor([[  1494,    259,   8622,    390,    259,    262,   2316,   3435,    955,
                           772,    281,    772,   1617,    263,    305,  14701,    260,   1385,
                          3031,    259,  24146,    332,   1037,    259,  43906,    305,    336,
                           260,      1,      0,      0,      0,      0,      0,      0],
                      [   259,  27531,  13483,    259,   7505,    260, 112240,  15192,    305,
                         53198,    276,    259,  74060,    263,    260,    459,  25640,    776,
                          2119,    336,    259,   2220,    259,  18896,    288,   4906,    288,
                          1037,   3931,    260,   7083, 101476,   1143,    260,      1]]),
 'labels': tensor([[ 7483,   259,  2364, 15695,     1,  -100],
                   [  259, 27531, 13483,   259,  7505,     1]]),
 'decoder_input_ids': tensor([[    0,  7483,   259,  2364, 15695,     1],
                              [    0,   259, 27531, 13483,   259,  7505]])}
```

The first example is shorter than the second, so its input_ids and attention_mask are padded on the right with 0, the pad token ID. Its labels are padded with -100, which the loss function ignores. The decoder_input_ids contain the labels shifted one position to the right, with a leading 0.
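The decoder_input_ids above can be reproduced from the labels with a few lines of NumPy. For mT5, the decoder start token has ID 0, the same as the pad token. The transformation prepends that start token, drops the last label, and replaces any -100 with the pad ID. shift_right is a hypothetical helper that mirrors the idea behind what the model prepares for the collator. It is not the library's own function.

```python
import numpy as np


def shift_right(labels, decoder_start_token_id=0, pad_token_id=0):
    # Shift every row one position to the right and put the start token in front
    labels = np.asarray(labels)
    shifted = np.full_like(labels, pad_token_id)
    shifted[:, 1:] = labels[:, :-1]
    shifted[:, 0] = decoder_start_token_id
    # -100 is only meaningful for the loss; the decoder needs a real token ID instead
    return np.where(shifted == -100, pad_token_id, shifted)


labels = [[7483, 259, 2364, 15695, 1, -100],
          [259, 27531, 13483, 259, 7505, 1]]
print(shift_right(labels))
# [[    0  7483   259  2364 15695     1]
#  [    0   259 27531 13483   259  7505]]
```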



```python
from transformers import Seq2SeqTrainer

trainer = Seq2SeqTrainer(
    model,
    args,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["validation"],
    data_collator=data_collator,
    tokenizer=tokenizer,  # newer transformers versions name this argument processing_class
    compute_metrics=compute_metrics,
)
```




Now we can train the model and then evaluate it on the validation set:

```python
trainer.train()
trainer.evaluate()
```

```
{'eval_loss': 3.028524398803711,
 'eval_rouge1': 16.9728,
 'eval_rouge2': 8.2969,
 'eval_rougeL': 16.8366,
 'eval_rougeLsum': 16.851,
 'eval_gen_len': 10.1597,
 'eval_runtime': 6.1054,
 'eval_samples_per_second': 38.982,
 'eval_steps_per_second': 4.914}
```


```python
trainer.push_to_hub(commit_message="Training complete", tags="summarization")
```









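Next, let's write the training loop ourselves with 🤗 Accelerate. First, set the dataset format to PyTorch tensors and reload the pretrained model, so that training starts again from the original checkpoint:

```python
tokenized_datasets.set_format("torch")
model = AutoModelForSeq2SeqLM.from_pretrained(model_checkpoint)
```

Then instantiate the data collator and define our dataloaders:

```python
from torch.utils.data import DataLoader

data_collator = DataCollatorForSeq2Seq(tokenizer, model=model)

batch_size = 8
train_dataloader = DataLoader(
    tokenized_datasets["train"],
    shuffle=True,
    collate_fn=data_collator,
    batch_size=batch_size,
)
eval_dataloader = DataLoader(
    tokenized_datasets["validation"], collate_fn=data_collator, batch_size=batch_size
)
```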


We use the AdamW optimizer:

```python
from torch.optim import AdamW

optimizer = AdamW(model.parameters(), lr=2e-5)
```


Then pass the model, the optimizer and the dataloaders to accelerator.prepare():

```python
from accelerate import Accelerator

accelerator = Accelerator()
model, optimizer, train_dataloader, eval_dataloader = accelerator.prepare(
    model, optimizer, train_dataloader, eval_dataloader
)
```





Next, set up a linear learning-rate schedule over the total number of training steps:

```python
from transformers import get_scheduler

num_train_epochs = 10
num_update_steps_per_epoch = len(train_dataloader)
num_training_steps = num_train_epochs * num_update_steps_per_epoch

lr_scheduler = get_scheduler(
    "linear",
    optimizer=optimizer,
    num_warmup_steps=0,
    num_training_steps=num_training_steps,
)
```
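With the "linear" schedule and num_warmup_steps=0, the learning rate starts at the optimizer's value (2e-5 here) and decays linearly to 0 over num_training_steps. The plain-Python sketch below reproduces that formula, including the optional linear warmup phase. linear_lr is a hypothetical helper for illustration only; get_scheduler() remains what the training loop uses.

```python
def linear_lr(step, base_lr, num_training_steps, num_warmup_steps=0):
    # Warmup: ramp linearly from 0 to base_lr over num_warmup_steps
    if step < num_warmup_steps:
        return base_lr * step / max(1, num_warmup_steps)
    # Decay: go linearly from base_lr down to 0 at num_training_steps
    remaining = max(0, num_training_steps - step)
    return base_lr * remaining / max(1, num_training_steps - num_warmup_steps)


total_steps = 1000
for step in (0, 250, 500, 1000):
    print(step, linear_lr(step, 2e-5, total_steps))
```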


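For evaluation, the decoded predictions and labels need the same postprocessing as in compute_metrics(): strip whitespace and put each sentence on its own line.

```python
def postprocess_text(preds, labels):
    preds = [pred.strip() for pred in preds]
    labels = [label.strip() for label in labels]

    # ROUGE expects a newline after each sentence
    preds = ["\n".join(nltk.sent_tokenize(pred)) for pred in preds]
    labels = ["\n".join(nltk.sent_tokenize(label)) for label in labels]

    return preds, labels
```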



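Next, create a repository name on the Hub. get_full_repo_name() prefixes the model name with your username:

```python
from huggingface_hub import get_full_repo_name

model_name = "mt5-finetuned-amazon-en-es-accelerate"
repo_name = get_full_repo_name(model_name)
repo_name
```

```
'lewtun/mt5-finetuned-amazon-en-es-accelerate'
```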


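Then clone the repository into a local folder, which also serves as the output directory for training:

```python
from huggingface_hub import Repository

output_dir = "results-mt5-finetuned-amazon-en-es-accelerate"
repo = Repository(output_dir, clone_from=repo_name)
```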






```python
from tqdm.auto import tqdm
import torch
import numpy as np

progress_bar = tqdm(range(num_training_steps))

for epoch in range(num_train_epochs):
    # Training
    model.train()
    for step, batch in enumerate(train_dataloader):
        outputs = model(**batch)
        loss = outputs.loss
        accelerator.backward(loss)

        optimizer.step()
        lr_scheduler.step()
        optimizer.zero_grad()
        progress_bar.update(1)

    # Evaluation
    model.eval()
    for step, batch in enumerate(eval_dataloader):
        with torch.no_grad():
            generated_tokens = accelerator.unwrap_model(model).generate(
                batch["input_ids"],
                attention_mask=batch["attention_mask"],
            )

            generated_tokens = accelerator.pad_across_processes(
                generated_tokens, dim=1, pad_index=tokenizer.pad_token_id
            )
            labels = batch["labels"]

            # If we did not pad to max length, we need to pad the labels too
            labels = accelerator.pad_across_processes(
                batch["labels"], dim=1, pad_index=tokenizer.pad_token_id
            )

            generated_tokens = accelerator.gather(generated_tokens).cpu().numpy()
            labels = accelerator.gather(labels).cpu().numpy()

            # Replace -100 in the labels as we can't decode them
            labels = np.where(labels != -100, labels, tokenizer.pad_token_id)
            if isinstance(generated_tokens, tuple):
                generated_tokens = generated_tokens[0]
            decoded_preds = tokenizer.batch_decode(
                generated_tokens, skip_special_tokens=True
            )
            decoded_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)

            decoded_preds, decoded_labels = postprocess_text(
                decoded_preds, decoded_labels
            )

            rouge_score.add_batch(predictions=decoded_preds, references=decoded_labels)

    # Compute metrics (evaluate returns one aggregated F-measure per ROUGE variant)
    result = rouge_score.compute()
    result = {key: value * 100 for key, value in result.items()}
    result = {k: round(v, 4) for k, v in result.items()}
    print(f"Epoch {epoch}:", result)

    # Save and upload
    accelerator.wait_for_everyone()
    unwrapped_model = accelerator.unwrap_model(model)
    unwrapped_model.save_pretrained(output_dir, save_function=accelerator.save)
    if accelerator.is_main_process:
        tokenizer.save_pretrained(output_dir)
        repo.push_to_hub(
            commit_message=f"Training in progress epoch {epoch}", blocking=False
        )
```

```
Epoch 0: {'rouge1': 5.6351, 'rouge2': 1.1625, 'rougeL': 5.4866, 'rougeLsum': 5.5005}
Epoch 1: {'rouge1': 9.8646, 'rouge2': 3.4106, 'rougeL': 9.9439, 'rougeLsum': 9.9306}
Epoch 2: {'rouge1': 11.0872, 'rouge2': 3.3273, 'rougeL': 11.0508, 'rougeLsum': 10.9468}
Epoch 3: {'rouge1': 11.8587, 'rouge2': 4.8167, 'rougeL': 11.7986, 'rougeLsum': 11.7518}
Epoch 4: {'rouge1': 12.9842, 'rouge2': 5.5887, 'rougeL': 12.7546, 'rougeLsum': 12.7029}
Epoch 5: {'rouge1': 13.4628, 'rouge2': 6.4598, 'rougeL': 13.312, 'rougeLsum': 13.2913}
Epoch 6: {'rouge1': 12.9131, 'rouge2': 5.8914, 'rougeL': 12.6896, 'rougeLsum': 12.5701}
Epoch 7: {'rouge1': 13.3079, 'rouge2': 6.2994, 'rougeL': 13.1536, 'rougeLsum': 13.1194}
Epoch 8: {'rouge1': 13.96, 'rouge2': 6.5998, 'rougeL': 13.9123, 'rougeLsum': 13.7744}
Epoch 9: {'rouge1': 14.1192, 'rouge2': 7.0059, 'rougeL': 14.1172, 'rougeLsum': 13.9509}
```
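The loop calls pad_across_processes() before gather() for a reason: generate() can return sequences of different lengths on each process, but gathering requires tensors of the same shape. The sketch below imitates that padding step on a single machine with NumPy. It pads along dim=1 with the pad index and then concatenates the arrays, as a gather would. It only illustrates the idea and is not how Accelerate implements it. The token IDs are made up for the example.

```python
import numpy as np


def pad_to_same_length(batches, pad_index):
    # Pad every [batch, seq_len] array on dim=1 to the longest seq_len,
    # then concatenate along the batch dimension (what gathering would produce)
    max_len = max(b.shape[1] for b in batches)
    padded = [
        np.pad(b, ((0, 0), (0, max_len - b.shape[1])), constant_values=pad_index)
        for b in batches
    ]
    return np.concatenate(padded, axis=0)


# Generated token IDs from two "processes" with different sequence lengths
proc0 = np.array([[0, 7483, 259, 1]])
proc1 = np.array([[0, 259, 27531, 13483, 259, 1]])
gathered = pad_to_same_length([proc0, proc1], pad_index=0)
print(gathered)
```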




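Once the model has been uploaded, we can load it from the Hub with the summarization pipeline. Here we use the checkpoint published under the huggingface-course organization:

```python
from transformers import pipeline

hub_model_id = "huggingface-course/mt5-small-finetuned-amazon-en-es"
summarizer = pipeline("summarization", model=hub_model_id)
```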


The following helper prints a test review, its title, and the generated summary:

```python
def print_summary(idx):
    review = books_dataset["test"][idx]["review_body"]
    title = books_dataset["test"][idx]["review_title"]
    summary = summarizer(books_dataset["test"][idx]["review_body"])[0]["summary_text"]
    print(f"'>>> Review: {review}'")
    print(f"\n'>>> Title: {title}'")
    print(f"\n'>>> Summary: {summary}'")
```


```python
print_summary(100)
```

```
'>>> Review: Nothing special at all about this product... the book is too small and stiff and hard to write in. The huge sticker on the back doesn’t come off and looks super tacky. I would not purchase this again. I could have just bought a journal from the dollar store and it would be basically the same thing. It’s also really expensive for what it is.'

'>>> Title: Not impressed at all... buy something else'

'>>> Summary: Nothing special at all about this product'
```


```python
print_summary(0)
```

```
'>>> Review: Es una trilogia que se hace muy facil de leer. Me ha gustado, no me esperaba el final para nada'

'>>> Title: Buena literatura para adolescentes'

'>>> Summary: Muy facil de leer'
```






