Data Sets

Data Sets, Explainable AI, Featured Posts, Generative AI, Natural Language Processing, Python

Breakthrough: Zero-Weight LLM for Accurate Predictions and High-Performance Clustering

While most AI companies keep building LLMs with more weights and tokens (one trillion is now a standard number), I went in the opposite direction. Of course, zero weight means that there is no neural network behind the scenes. More specifically, it means that there is no lengthy black-box process to find the “best” weights […]

Data Sets, Explainable AI, Featured Posts, Generative AI, Natural Language Processing

Genome: Synthesizing DNA Sequences with LLM Techniques

This methodology is not focused on genome data alone. The purpose is to design a generic solution that may also work in other contexts, such as synthesizing molecules. The problem involves dealing with a large amount of “text”. Indeed, the sequences discussed here consist of letter arrangements, from an alphabet that has 5 symbols: A, […]

Courses, Data Sets, Generative AI, Python

10 GenAI Notebooks: OpenAI, LLM, RAG, GPT, and More

For developers and AI/ML professionals. This comprehensive free resource offered by our sponsor is designed to provide you with hands-on experience and deeper insights into building cutting-edge GenAI applications. 🌟 Special Opportunity: You can win a pair of Apple AirPods simply by following the tutorial and learning something new. How to Participate: Follow these 2 […]

Data Sets, Experimental Math, Featured Posts, Python, Statistical Science

Number Theory: Longest Runs of Zeros in Binary Digits of Square Root of 2

Studying the longest head runs in coin tossing has a very long history, starting in gaming and probability theory. Today, it has applications in cryptography and insurance. For random sequences or Bernoulli trials, the associated statistical properties and distributions have been studied in detail, even when the proportions of zeros and ones are different. Yet, […]
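
To make the quantity in the title concrete, here is a minimal Python sketch that extracts the first n binary digits of the square root of 2 using exact integer arithmetic, then finds the longest run of zeros among them. The function names are illustrative choices and are not taken from the article.

```python
from math import isqrt

def sqrt2_binary_digits(n):
    """Return the first n binary digits of sqrt(2) after the binary point."""
    # isqrt(2 * 4**n) equals floor(sqrt(2) * 2**n), computed without floats.
    scaled = isqrt(2 << (2 * n))
    bits = bin(scaled)[2:]  # leading '1' (integer part) plus n fractional bits
    return bits[1:]

def longest_zero_run(bits):
    """Return the length of the longest run of consecutive '0' characters."""
    longest = current = 0
    for b in bits:
        current = current + 1 if b == "0" else 0
        longest = max(longest, current)
    return longest

if __name__ == "__main__":
    digits = sqrt2_binary_digits(1000)
    print(longest_zero_run(digits))
```

Because the digits come from an integer square root, the result is exact for any n, which matters when looking for long runs far into the expansion.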

Data Sets, Explainable AI, Featured Posts, Generative AI, Machine Learning, Synthetic Data

New Python Library to Evaluate AI-generated Data and Compare Models

Called GenAI-Evaluation, it can be used, for instance, to assess the quality of tabular synthetic data. In this case, it measures how faithfully the synthetization mimics the real data it is derived from, by comparing the full joint empirical distributions (ECDF) attached to the two datasets. It works with both categorical and numerical features, and returns […]
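
The comparison of joint empirical distributions described above can be illustrated with a small sketch. This is not the library's API: the function names are hypothetical, only numerical features are handled, and the distance used here (the largest absolute gap between the two joint ECDFs, evaluated at the pooled observations) is one simple choice among several.

```python
def joint_ecdf(data, point):
    """Fraction of rows in `data` that are <= `point` in every coordinate."""
    count = sum(all(x <= p for x, p in zip(row, point)) for row in data)
    return count / len(data)

def ecdf_distance(real, synth):
    """Largest absolute gap between the two joint ECDFs over the pooled rows."""
    points = list(real) + list(synth)
    return max(abs(joint_ecdf(real, p) - joint_ecdf(synth, p)) for p in points)

if __name__ == "__main__":
    real = [(0.1, 2.0), (0.4, 1.5), (0.9, 3.2)]
    synth = [(0.2, 2.1), (0.5, 1.4), (0.8, 3.0)]
    print(ecdf_distance(real, synth))
```

A value of 0 means the two ECDFs agree at every pooled observation, while a value of 1 means that at some point one dataset lies entirely below it and the other has no rows below it.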

Data Sets, Featured Posts, Generative AI, Machine Learning, Python, Synthetic Data

How to Fix a Failing Generative Adversarial Network

In this article, I explore different front-end strategies to improve a generative adversarial network (GAN) that produces poor synthetizations, in the context of tabular data generation. It is well known that tabular data is a lot more challenging than images when using deep neural networks for synthetization purposes. An algorithm may work very well […]

Data Sets, Explainable AI, Featured Posts, Generative AI, Synthetic Data, Time Series

Synthesizing Geospatial Data with a Simple NoGAN Technique

If you regularly read my articles, you know that I developed several different techniques for data synthetization. Many are explained in detail in my upcoming book Synthetic Data and Generative AI (Elsevier), available here. It includes generative adversarial networks (GANs), copulas, agent-based modeling, methods based on interpolation, correlated noise mixtures, and more. The technique presented […]

Data Sets, Experimental Math, Featured Posts, Generative AI, Synthetic Data, Time Series

Sound Generation in Python: Turning Your Data into Music

Not long ago, I published an article here entitled “The Sound that Data Makes”. The goal was to turn data (random noise in this case) into music. The hope was that by “listening” to your data, you could gain a different kind of insight, not conveyed by visualizations or tabular summaries. This article is […]

Data Sets, Featured Posts, Synthetic Data

Generative AI: Synthetic Data Vendor Comparison and Benchmarking Best Practices

The goal of data synthetization is to produce artificial data that mimics the patterns and features present in existing, real data. Many generation methods and evaluation techniques are available, depending on purposes, the type of data, and the application field. Everyone is familiar with synthetic images in the context of computer vision, or synthetic text […]

Courses, Data Sets, Deep Learning, Explainable AI, Featured Posts

Massively Speed-Up your Learning Algorithm, with Stochastic Thinning

Dramatically speed up your learning algorithm with stochastic thinning. Includes a use case, Python code, and regression and neural network illustrations.
