A content generator that uses an LLM and internet scraping to dynamically generate well-structured documents of varying lengths, rich in information and conventionally divided into sections.
Here is the introduction from the sample essay provided:
This introduction was generated by the Llama3.2 3B model, using the ContentGenerator as a framework.
This generator was created for educational and scientific purposes and should by no means be used for cheating or as a professional tool.
Keep in mind that if you run this generator on an external, pay-per-token model, you will have to spend a pretty penny to obtain proper results, especially if the requested content is large.
The ContentGenerator was tested using a locally hosted Meta Llama3.2 3B model with Ollama. Your mileage may vary with a different large language model.
The ContentGenerator is not complete yet, as only the Essay Generator is available. Expect a Story Generator soon.
Setting up the settings.json File:
Despite being simple, the settings file is quite important.
{
    "LANGUAGE_MODEL": {
        "OPENAI": true,
        "MODEL": "---",
        "BASE_URL": "---",
        "API_KEY": "?"
    },
    "DOCUMENT_SETTINGS": {
        "WATERMARK_PDF_LOCATION": "---"
    }
}
The moment you import and run the package without a saved settings file, a settings.json file will be created and an exception will be raised telling you to fill it in.
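The first-run behaviour described above can be sketched roughly as follows. This is a hypothetical illustration, not the package's actual code: the function name `ensure_settings` and the `SettingsNotConfigured` exception are assumptions, and the template simply mirrors the settings.json layout shown above.

```python
import json
from pathlib import Path

# Template mirroring the settings.json layout shown above.
DEFAULT_SETTINGS = {
    "LANGUAGE_MODEL": {
        "OPENAI": True,
        "MODEL": "---",
        "BASE_URL": "---",
        "API_KEY": "?",
    },
    "DOCUMENT_SETTINGS": {
        "WATERMARK_PDF_LOCATION": "---",
    },
}


class SettingsNotConfigured(Exception):
    """Hypothetical exception raised when settings.json had to be created."""


def ensure_settings(path="settings.json"):
    """Load settings.json, or create the template and raise if it is missing."""
    settings_path = Path(path)
    if not settings_path.exists():
        # Write the blank template so the user has something to fill in.
        settings_path.write_text(json.dumps(DEFAULT_SETTINGS, indent=4))
        raise SettingsNotConfigured(
            f"{settings_path} was created; fill it in and run again."
        )
    # On later runs the file exists, so it is simply loaded.
    return json.loads(settings_path.read_text())
```

The first call creates the file and raises; once you have filled it in, the same call returns the parsed settings as a dictionary.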
The most important part is setting up the LLM. If you wish to use an OpenAI model, set OPENAI to true, choose which model you want to use, and don't forget your API key. Do NOT fill in the BASE_URL value, as it is preset within the OpenAI package.
If you wish to use a custom or locally hosted LLM, set OPENAI to false, then specify the model you want to use and the BASE_URL where requests to the LLM will be made.
I used Ollama for my testing, so my settings.json looked like this:
{
    "LANGUAGE_MODEL": {
        "OPENAI": false,
        "MODEL": "llama3.2",
        "BASE_URL": "http://localhost:11434/v1",
        "API_KEY": "?"
    },
    "DOCUMENT_SETTINGS": {
        "WATERMARK_PDF_LOCATION": "WatermarkTemplate.pdf"
    }
}
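To make the difference between the two configurations concrete, here is a hedged sketch of how the LANGUAGE_MODEL block could be turned into client arguments. The helper `client_kwargs` is hypothetical and not necessarily how ContentGenerator wires things internally; the returned keys match the `api_key` and `base_url` keyword arguments of the `openai` package's `OpenAI` client. This also likely explains why API_KEY can stay as "?" with Ollama: the client expects some key, while a local Ollama server ignores it.

```python
def client_kwargs(settings):
    """Build keyword arguments for an OpenAI-compatible client (sketch)."""
    llm = settings["LANGUAGE_MODEL"]
    kwargs = {"api_key": llm["API_KEY"]}
    if not llm["OPENAI"]:
        # Custom or local servers (e.g. Ollama) need an explicit endpoint.
        kwargs["base_url"] = llm["BASE_URL"]
    # With OPENAI set to true, BASE_URL is left out so the client's default is used.
    return kwargs, llm["MODEL"]
```

With the `openai` package installed, the result could be passed on as `OpenAI(**kwargs)`, with `model` used in `client.chat.completions.create(model=model, messages=...)`.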
Let's start with the essay section. First, we import the class that manages essays:
from ContentGenerators.Essay import Document
The essay has multiple parameters that determine the size of the document and the information it contains. The Document class already comes with preset default values to avoid crashes caused by human error.
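# These default values generate a generic essay about "The Evolution of Artificial Intelligence".
class Document:
    def __init__(self,
                 id=None,               # optional unique identifier for the essay
                 path=".",              # directory where the finished document is saved
                 topic="The Evolution of Artificial Intelligence.",  # subject of the essay
                 commands="",           # extra instructions for the LLM
                 paragraphs=5,          # number of paragraphs
                 wordcount=1000,        # target word count for the whole essay
                 pictures=False,        # whether to include pictures
                 language="en",         # language the essay is written in
                 search_language="en",  # language used when scraping the web
                 citation_format="MLA"  # citation style, e.g. "MLA" or "APA"
                 ):
        ...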
To use the essay generator and get a human-readable essay, we can use this code:
from ContentGenerators.Essay import Document
Essay = Document(pictures=True, topic="World War II")
Essay.Generate()
Essay.Assemble(watermark=True)
The Essay.Assemble() function packages all the information generated by Essay.Generate(), such as paragraphs, citations, and pictures, into a final document, which can be either a DOCX or a PDF file. After both functions have run successfully, the document should appear in your ./ path as essay_XXXXXXXXXX.(docx/pdf), where you can access the final result.
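Because the filename suffix is generated by the package, a small helper can locate the newest essay in the output folder. This is a sketch: `find_latest_essay` is hypothetical and only assumes the `essay_*.docx` / `essay_*.pdf` naming described above.

```python
from pathlib import Path


def find_latest_essay(path=".", pdf=False):
    """Return the most recently modified essay_* file in `path`, or None."""
    extension = "pdf" if pdf else "docx"
    candidates = list(Path(path).glob(f"essay_*.{extension}"))
    if not candidates:
        return None
    # Pick the newest file, so a fresh run wins over older essays.
    return max(candidates, key=lambda p: p.stat().st_mtime)
```

For example, `find_latest_essay(".", pdf=True)` after `Essay.Assemble(pdf=True)` should point at the PDF you just generated.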
Keep in mind that you need Microsoft Office or LibreOffice installed, depending on your operating system, but ONLY if you want to convert your document to PDF. You do NOT need any document-management software or office suite to generate a DOCX file, as that is handled by the imported Python library.
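If you plan to export PDFs, a quick pre-flight check can tell you whether a LibreOffice launcher is on your PATH. This sketch only uses `shutil.which`; `soffice` and `libreoffice` are the usual LibreOffice executable names, and the check does not detect Microsoft Office, which is typically not exposed as a command-line program. On Windows and macOS the conversion may rely on Microsoft Office instead, so a missing LibreOffice there is not conclusive.

```python
import shutil


def find_libreoffice():
    """Return the path of a LibreOffice launcher on PATH, or None."""
    for name in ("soffice", "libreoffice"):
        found = shutil.which(name)
        if found:
            return found
    return None


if __name__ == "__main__":
    exe = find_libreoffice()
    if exe:
        print(f"LibreOffice found at {exe}; PDF conversion should be possible.")
    else:
        print("No LibreOffice launcher on PATH; stick to DOCX or install an office suite.")
```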
Speaking of Python libraries, make sure you install all of the required packages using the provided requirements.txt file:
pip install -r requirements.txt
Expect more optimisations, updates, and generation features in the future!
As we saw in the previous section, the essay generator is composed of two main pieces: the Content Generator and the Content Assembler.
Let's start with the Content Generator. These are the main parameters of the Document class, along with their default values:
id=None,
path=".",
topic="The Evolution of Artificial Intelligence.",
commands="",
paragraphs=5,
wordcount=1000,
pictures=False,
language="en",
search_language="en",
citation_format="MLA"
The id parameter was originally created for handling large numbers of essays stored in a database: it keeps essay filenames unique so the essays can easily be served to the user base.
The value of this parameter can be as simple as a random string:
Essay = Document(id="JSH3UI43H29RIJS83")
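If you don't have an ID scheme of your own, a random string like the one above can be generated with Python's `secrets` module. The 17-character length and the uppercase-plus-digits alphabet only mirror the example value; nothing in the package is known to require this exact format, and `random_essay_id` is a hypothetical helper.

```python
import secrets
import string

ALPHABET = string.ascii_uppercase + string.digits


def random_essay_id(length=17):
    """Return a random ID made of uppercase letters and digits."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


# Usage: Essay = Document(id=random_essay_id())
```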
The path parameter sets where the final product, the essay, will be stored. It defaults to ".", which means the essay will be generated in the same directory the project is in.
If you want to place the document in a separate folder within your project, specify it in the parameter:
Essay = Document(path="my_folder")
The topic parameter is the most important one for generation, as it determines what the requested essay will be about. It is used to generate ideas, pictures, and, most importantly, paragraphs.
The value of this parameter can be set this way:
Essay = Document(topic="The Cold War.")
The commands parameter can be used to alter the information the LLM generates. Examples include introducing bias (siding with one party in a conflict), focusing on one aspect of a topic rather than covering it broadly, or requiring a specific writing style or language.
You can tell the LLM what to do like this:
Essay = Document(commands="Side with the Soviet Union on this one, and describe how their ideology is right. Use extremely Complicated and Detailed Language.")
or (for neutrality's sake):
Essay = Document(commands="Side with the Americans on this one, and describe how their ideology is right. Use extremely simple and broad Language.")
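The package's actual prompts are not shown here, but conceptually the commands string is extra instruction text layered on top of the topic. The hypothetical prompt builder below illustrates that idea; `build_paragraph_prompt` and its wording are assumptions, not ContentGenerator's real prompt.

```python
def build_paragraph_prompt(topic, words, commands=""):
    """Compose a prompt for one paragraph (illustrative only)."""
    prompt = f"Write a paragraph of about {words} words for an essay on: {topic}"
    if commands:
        # User-supplied steering instructions are appended verbatim.
        prompt += f"\nAdditional instructions: {commands}"
    return prompt
```

With an empty commands string the prompt contains only the topic and length, which matches the "neutral" default behaviour.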
The paragraphs parameter determines the overall size of the essay. The word count depends heavily on the number of paragraphs, because token generation resets between paragraphs; this is very handy with online models, which have stricter generation limits.
Let's say we need 8 paragraphs in this Essay:
Essay = Document(paragraphs=8)
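The paragraph count drives length because each paragraph comes from its own request, so each one starts with a fresh generation budget. Here is a minimal sketch under that assumption: `complete` stands in for any function that sends a prompt to the LLM and returns text, and `generate_paragraphs` is hypothetical.

```python
from typing import Callable, List


def generate_paragraphs(topic: str, paragraphs: int, words_per_paragraph: int,
                        complete: Callable[[str], str]) -> List[str]:
    """Request each paragraph separately, one LLM call per paragraph."""
    results = []
    for index in range(1, paragraphs + 1):
        prompt = (f"Write paragraph {index} of {paragraphs} for an essay on "
                  f"{topic}, about {words_per_paragraph} words long.")
        # Each call is independent, so any output limit applies per paragraph.
        results.append(complete(prompt))
    return results


if __name__ == "__main__":
    def fake_llm(prompt: str) -> str:
        # Stand-in model: echoes the first few words of each prompt.
        return " ".join(prompt.split()[:4])

    for paragraph in generate_paragraphs("The Cold War.", 3, 250, fake_llm):
        print(paragraph)
```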
The wordcount parameter sets the target length of the essay, which is turned into a words-per-paragraph value that determines how many words each paragraph will contain:
words_per_paragraph = word_count / paragraph_count
# The value of this variable is then fed into the paragraph generating prompt.
Let's say we will need a 2000-word essay:
Essay = Document(wordcount=2000)
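Plugging the example values into the words-per-paragraph equation: with wordcount=2000 and paragraphs=8, each paragraph targets 250 words, while the defaults (1000 and 5) give 200. The helper below is a sketch; rounding to a whole number is an assumption, since the equation above uses plain division.

```python
def words_per_paragraph(wordcount=1000, paragraphs=5):
    """Target words per paragraph, rounded to a whole number (sketch)."""
    if paragraphs <= 0:
        raise ValueError("paragraphs must be a positive integer")
    return round(wordcount / paragraphs)


print(words_per_paragraph(2000, 8))  # 250
print(words_per_paragraph())         # 200
```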
The pictures parameter determines whether the essay will contain pictures.
Let's say we do want pictures in this essay:
Essay = Document(pictures=True)
The language parameter decides which language the essay will be written in.
Here is a list of all the languages you can generate your essay in:
English - en | Afrikaans - af | Albanian - sq | Amharic - am | Arabic - ar | Armenian - hy |
---|---|---|---|---|---|
Assamese - as | Aymara - ay | Azerbaijani - az | Bambara - bm | Basque - eu | Belarusian - be |
Bengali - bn | Bhojpuri - bho | Bosnian - bs | Bulgarian - bg | Catalan - ca | Cebuano - ceb |
Chichewa - ny | Chinese - zh-CN | Corsican - co | Croatian - hr | Czech - cs | Danish - da |
Dhivehi - dv | Dogri - doi | Dutch - nl | Esperanto - eo | Estonian - et | Ewe - ee |
Filipino - tl | Finnish - fi | French - fr | Frisian - fy | Galician - gl | Georgian - ka |
German - de | Greek - el | Guarani - gn | Gujarati - gu | Haitian - ht | Hausa - ha |
Hawaiian - haw | Hebrew - iw | Hindi - hi | Hmong - hmn | Hungarian - hu | Icelandic - is |
Igbo - ig | Ilocano - ilo | Indonesian - id | Irish - ga | Italian - it | Japanese - ja |
Javanese - jw | Kannada - kn | Kazakh - kk | Khmer - km | Kinyarwanda - rw | Konkani - gom |
Korean - ko | Krio - kri | Kurdish (K) - ku | Kurdish (S) - ckb | Kyrgyz - ky | Lao - lo |
Latin - la | Latvian - lv | Lingala - ln | Lithuanian - lt | Luganda - lg | Luxembourgish - lb |
Macedonian - mk | Maithili - mai | Malagasy - mg | Malay - ms | Malayalam - ml | Maltese - mt |
Maori - mi | Marathi - mr | Meiteilon - mni-Mtei | Mizo - lus | Mongolian - mn | Burmese - my |
Nepali - ne | Norwegian - no | Odia - or | Oromo - om | Pashto - ps | Persian - fa |
Polish - pl | Portuguese - pt | Punjabi - pa | Quechua - qu | Romanian - ro | Russian - ru |
Samoan - sm | Sanskrit - sa | Scots Gaelic - gd | Sepedi - nso | Serbian - sr | Sesotho - st |
Shona - sn | Sindhi - sd | Sinhala - si | Slovak - sk | Slovenian - sl | Somali - so |
Spanish - es | Sundanese - su | Swahili - sw | Swedish - sv | Tajik - tg | Tamil - ta |
Tatar - tt | Telugu - te | Thai - th | Tigrinya - ti | Tsonga - ts | Turkish - tr |
Turkmen - tk | Twi - ak | Ukrainian - uk | Urdu - ur | Uyghur - ug | Uzbek - uz |
Vietnamese - vi | Welsh - cy | Xhosa - xh | Yiddish - yi | Yoruba - yo | Zulu - zu |
Let's generate the essay in English:
Essay = Document(language="en")
The search_language parameter determines which language is used when searching for information (internet scraping). This matters for region-specific topics whose sources might only be available in one language.
Since the Cold War is widely covered in English, let's stick with English:
Essay = Document(search_language="en")
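To catch a mistyped code before starting a long generation run, you could validate language and search_language up front. This sketch copies a handful of codes from the table above; `SUPPORTED_LANGUAGES` and `check_language` are hypothetical and not part of the package, so extend the dictionary with whichever codes you need. Note that the table uses some legacy codes, such as iw for Hebrew and jw for Javanese.

```python
# Subset of the language table above (name -> code); extend as needed.
SUPPORTED_LANGUAGES = {
    "English": "en",
    "Arabic": "ar",
    "Chinese": "zh-CN",
    "French": "fr",
    "German": "de",
    "Hebrew": "iw",
    "Russian": "ru",
    "Spanish": "es",
}
VALID_CODES = set(SUPPORTED_LANGUAGES.values())


def check_language(code):
    """Return the code unchanged if it is in the table, else raise ValueError."""
    if code not in VALID_CODES:
        raise ValueError(f"Unsupported language code: {code!r}")
    return code


# Usage: Document(language=check_language("en"), search_language=check_language("en"))
```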
The citation_format parameter dictates the format of the citations inserted at the end of the essay, which is important for organisations with specific research requirements. Here are examples of citation formats:
- APA Citation:
Grady, J. S., Her, M., Moreno, G., Perez, C., & Yelinek, J. (2019). Emotions in storybooks: A comparison of storybooks that represent ethnic and racial groups in the United States. Psychology of Popular Media Culture, 8(3), 207–217. https://doi.org/10.1037/ppm0000185
- MLA Citation:
Del Castillo, Inigo. "How Not to Kill Your Houseplants, According to Botanists." Apartment Therapy, 29 Jan. 2020, www.apartmenttherapy.com/houseplant-tips-botanists-36710191.
- Chicago Style:
Kossinets, Gueorgi, and Duncan J. Watts. 2009. “Origins of Homophily in an Evolving Social Network.” American Journal of Sociology 115:405–50. Accessed February 28, 2010. doi:10.1086/599247.
- IEEE:
[6] A. Altun, “Understanding hypertext in the context of reading on the web: Language learners’ experience,” Current Issues in Education, vol. 6, no. 12, July, 2005. [Online serial]. Available: http://cie.ed.asu.edu/volume6/number12/. [Accessed Dec. 2, 2007].
Expect 80% to 90% accuracy from the Content Generator's citations.
Let's use APA format for this essay:
Essay = Document(citation_format="APA")
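To illustrate what actually differs between styles, here is a simplified formatter for a single web article in MLA and APA. It is a hypothetical sketch, not the package's citation code: the `Source` fields are assumptions, and the APA branch uses only the year and plain text, whereas APA web citations usually give the full date and italicize the title.

```python
from dataclasses import dataclass
from datetime import date

# MLA abbreviates most month names (May, June, and July are written in full).
MLA_MONTHS = ["Jan.", "Feb.", "Mar.", "Apr.", "May", "June",
              "July", "Aug.", "Sept.", "Oct.", "Nov.", "Dec."]


@dataclass
class Source:
    last: str        # author's last name
    first: str       # author's first name
    title: str       # article title, without trailing punctuation
    site: str        # website or publication name
    published: date
    url: str


def format_citation(src: Source, style: str) -> str:
    """Format one web article in a simplified MLA or APA style."""
    if style == "MLA":
        d = src.published
        return (f'{src.last}, {src.first}. "{src.title}." {src.site}, '
                f"{d.day} {MLA_MONTHS[d.month - 1]} {d.year}, {src.url}.")
    if style == "APA":
        return (f"{src.last}, {src.first[0]}. ({src.published.year}). "
                f"{src.title}. {src.site}. {src.url}")
    raise ValueError(f"Unsupported style: {style}")
```

Feeding it the Apartment Therapy article (Del Castillo, Inigo, published 29 January 2020) with style="MLA" reproduces the MLA example shown above.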
Now let's talk about the Content Assembler. This function packages all the information created by the Content Generator and returns a file, either DOCX or PDF. Here is the function's signature along with its default values:
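def Assemble(self,
             pdf=False,       # True: export a PDF; False: export a DOCX
             watermark=False  # True: apply the watermark template set in settings.json
             ):
    ...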
The pdf parameter decides whether the generated document will be a PDF or a DOCX file.
Let's pick PDF for this essay:
Essay.Assemble(pdf=True)
When enabled, the watermark parameter applies a watermark to the generated document.
This requires a watermark PDF template, similar to the one in the repo, with its path specified in the settings.json file:
"DOCUMENT_SETTINGS": {
"WATERMARK_PDF_LOCATION": "WatermarkTemplate.pdf"
}
Let's say we need a watermark applied to our document:
Essay.Assemble(watermark=True)
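Because the watermark option depends on the path stored in settings.json, a quick check before calling Assemble(watermark=True) can save a failed run. This is a hypothetical helper that only reads the DOCUMENT_SETTINGS block shown above and verifies the file exists; it does not apply the watermark itself, and it resolves relative paths against the current working directory, which may differ from how the package resolves them.

```python
import json
from pathlib import Path


def watermark_template(settings_path="settings.json"):
    """Return the configured watermark PDF path, or raise if it is unusable."""
    settings = json.loads(Path(settings_path).read_text())
    location = settings["DOCUMENT_SETTINGS"]["WATERMARK_PDF_LOCATION"]
    template = Path(location)
    # "---" is the unfilled placeholder from the generated template.
    if location == "---" or not template.is_file():
        raise FileNotFoundError(f"Watermark template not found: {location!r}")
    if template.suffix.lower() != ".pdf":
        raise ValueError(f"Watermark template must be a PDF: {location!r}")
    return template
```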
Now that we've covered the parameters and functions, here is the script that generates an essay using the examples above:
from ContentGenerators.Essay import Document
Essay = Document(
    id="JSH3UI43H29RIJS83",
    path="readmeAssets",
    topic="The Cold War.",
    commands="Side with the Americans on this one, and describe how their ideology is right. Use extremely simple and broad Language.",
    paragraphs=8,
    wordcount=2000,
    pictures=True,
    language="en",
    search_language="en",
    citation_format="APA"
)
Essay.Generate()
Essay.Assemble(pdf=True, watermark=True)
The Content Generator produced an essay that meets the requirements provided.
You can check out the complete essay here.